From R scripts to shiny applications – use case in the spare parts business
How we developed a system of eight shiny applications for more than 500 users worldwide, starting with a single R script
The main benefits of using R compared to other languages are speed and the handling of large data sets. This is why we started implementing pricing models for one of our customers in R in 2010. Today, this customer runs a shiny server with eight shiny applications for more than 500 users. Image 1 depicts the architecture of the shiny applications and their internal and external interfaces.
The initial R script used different data sets to run mathematical models (cluster analysis and regression models) that evaluate a fair market value for surplus spare parts. The script was originally created as the result of a pilot study on pricing models for spare parts. The first applications arose immediately: trading managers wanted to use the script as a stand-alone tool for evaluating requests or offers for single parts. Updates were distributed manually via USB flash drive, and there were at most 10 users. Once they had started working with the script, users proposed multiple new features to get rid of large Excel sheets with opaque VLOOKUPs. At the same time, the first stable version of shiny was released, and our journey with shiny began. We started with a local shiny server and a single application; from there, the range of business cases, use cases and users grew steadily.
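To make the idea of combining cluster analysis with regression more concrete, here is a minimal base R sketch. It is not the customer's model: the data are synthetic, and the column names (`weight_kg`, `list_price`, `age_years`, `sale_price`) as well as the choice of k-means followed by one linear model per cluster are assumptions made for illustration only.

```r
# Hypothetical sketch: segment parts with k-means, then fit one linear
# regression per cluster to estimate a fair market value (FMV).
# All data and column names are synthetic assumptions.
set.seed(42)

n <- 200
parts <- data.frame(
  weight_kg  = runif(n, 0.1, 50),
  list_price = runif(n, 10, 5000),
  age_years  = runif(n, 0, 20)
)
# Synthetic "observed" sale prices with noise
parts$sale_price <- 0.6 * parts$list_price - 15 * parts$age_years + rnorm(n, sd = 50)

# 1) Cluster analysis on scaled part attributes
features <- scale(parts[, c("weight_kg", "list_price", "age_years")])
km <- kmeans(features, centers = 3, nstart = 10)
parts$cluster <- km$cluster

# 2) One regression model per cluster
models <- lapply(split(parts, parts$cluster), function(d) {
  lm(sale_price ~ list_price + age_years, data = d)
})

# 3) FMV for a new part: scale it like the training data, assign it to
#    the nearest cluster centre, then use that cluster's model
fair_market_value <- function(new_part) {
  x <- scale(new_part[, c("weight_kg", "list_price", "age_years")],
             center = attr(features, "scaled:center"),
             scale  = attr(features, "scaled:scale"))
  dists <- apply(km$centers, 1, function(ctr) sum((x - ctr)^2))
  cl <- which.min(dists)
  unname(predict(models[[as.character(cl)]], newdata = new_part))
}

fair_market_value(data.frame(weight_kg = 2, list_price = 1200, age_years = 5))
```

Clustering first lets each segment of similar parts get its own price relationship instead of forcing a single global model onto very different parts.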
Today, besides features such as KPI dashboards and parameter administration, the functionality of the system includes:
- the evaluation of packages containing more than 1,000 surplus parts
- a rule-based model to derive sales prices from the fair market value for surplus parts in stock
- a dynamic model to propose the best utilization for a given surplus part based on demand and the value of the part
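A rule-based derivation of sales prices from the fair market value could look roughly like the following sketch. The article does not disclose the actual rules, so the discount tiers by days in stock, the price floor relative to book value and all names (`pricing_rules`, `derive_sales_price`, `days_in_stock`, `book_value`, `min_markup`) are hypothetical.

```r
# Hypothetical rule table: the longer a part sits in stock, the larger
# the discount applied to its fair market value (FMV).
pricing_rules <- data.frame(
  min_days = c(0, 180, 365, 730),
  factor   = c(1.00, 0.90, 0.80, 0.65)
)

# Derive a sales price from the FMV by applying the matching rule,
# but never go below a (hypothetical) floor of min_markup x book value.
derive_sales_price <- function(fmv, days_in_stock, book_value,
                               rules = pricing_rules, min_markup = 1.1) {
  stopifnot(all(days_in_stock >= 0))  # rules start at 0 days
  # findInterval() returns, for each part, the index of the last rule
  # whose min_days threshold is <= days_in_stock
  idx   <- findInterval(days_in_stock, rules$min_days)
  price <- fmv * rules$factor[idx]
  round(pmax(price, book_value * min_markup), 2)
}

derive_sales_price(
  fmv           = c(1000, 1000, 1000, 1000),
  days_in_stock = c(30, 200, 400, 900),
  book_value    = c(500, 500, 500, 700)
)
# → 1000  900  800  770
```

Keeping the rules in a data frame rather than in nested `if` statements means they can be edited through a parameter administration screen without touching code.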
The number of users grew as fast as the number of functions and modules. Today, more than 500 users from all over the world are registered in our system. This fast and diverse growth was not always easy and straightforward. We had to learn some hard lessons to get where we are today: how to store data effectively, how to ensure good performance on a heavily restricted server infrastructure, how to synchronize data across different shiny applications and, last but not least, how to document processes and models.
We addressed these challenges as follows: we store all data in RDS files on disk and load the required data as global variables when the shiny apps start, we continuously improve the efficiency of our implementation and algorithms, and we established a file-based notification concept to synchronize the data across all applications. Holding all data tables in memory as global variables is expensive in terms of hardware resources; however, as long as the data is not too large, it provides fast access and full support for the reactivity features of shiny applications.
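The storage and notification concept can be sketched in base R. In this minimal sketch, which is an assumption about how such a concept might look rather than the project's code, each table lives in its own RDS file, the tables are loaded into a global environment, and every write also writes a fresh version token to a notification file. Each app compares that token with the one it saw at its last load and reloads only when it has changed. The directory, file names and helpers (`save_table`, `refresh_if_needed`) are hypothetical.

```r
# Hypothetical data directory and notification ("flag") file
data_dir    <- file.path(tempdir(), "app_data")
notify_file <- file.path(data_dir, "data_version.txt")
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)

# Global store: loaded once per R process and shared by all sessions
app_data <- new.env()
app_data$version <- ""

# Writer side: save one table as RDS, then write a new version token
save_table <- function(name, df) {
  saveRDS(df, file.path(data_dir, paste0(name, ".rds")))
  token <- paste(format(Sys.time(), "%Y%m%d%H%M%OS6"),
                 Sys.getpid(), sample.int(1e9, 1))
  writeLines(token, notify_file)
}

# Reader side: reload every RDS file only if the version token changed
refresh_if_needed <- function() {
  if (!file.exists(notify_file)) return(FALSE)
  current <- readLines(notify_file, n = 1)
  if (identical(current, app_data$version)) return(FALSE)
  rds_files <- list.files(data_dir, pattern = "\\.rds$", full.names = TRUE)
  for (f in rds_files) {
    table_name <- tools::file_path_sans_ext(basename(f))
    assign(table_name, readRDS(f), envir = app_data)
  }
  app_data$version <- current
  TRUE
}
```

Comparing a token instead of the file's modification time avoids missing updates on file systems with coarse timestamp resolution. In a shiny app, the cheap token check could be passed as `checkFunc` to `shiny::reactivePoll()`, so that dependent outputs refresh automatically. The sketch does not protect a reader from loading a file that is still being written.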
A new challenge arises when the system becomes increasingly integrated into existing IT processes: interfaces to other IT systems have to be set up to transfer data, and multiple users start to interact with and manipulate the same data set simultaneously. Our first approach was to give each source of changes its own data file and to let a regularly scheduled cron job combine all of this information into the actual data file loaded by the shiny applications. Later, we started using the plumber package to set up a web service that handles the data and coordinates all changes. As a next step, we want to move our data storage to an SQLite database with data handling layers to enable reactivity and synchronization across all shiny applications.
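The first approach, one data file per source of changes plus a scheduled job that merges them, might be sketched as follows. The file naming scheme, the columns `part_id`, `price` and `updated_at`, and the rule "the latest change per part wins" are assumptions for illustration; a script like this could be scheduled with cron, for example via a hypothetical `Rscript combine.R`.

```r
# Hypothetical layout: every source of changes (interface, app, user group)
# writes its own file "changes_<source>.rds" with columns
# part_id, price, updated_at. A scheduled job merges them into the single
# file that the shiny apps load.
combine_changes <- function(input_dir, output_file) {
  files <- list.files(input_dir, pattern = "^changes_.*\\.rds$",
                      full.names = TRUE)
  if (length(files) == 0) return(invisible(NULL))

  all_changes <- do.call(rbind, lapply(files, readRDS))

  # Sort newest first, then keep the first (= latest) row per part
  all_changes <- all_changes[order(all_changes$updated_at, decreasing = TRUE), ]
  latest <- all_changes[!duplicated(all_changes$part_id), ]
  latest <- latest[order(latest$part_id), ]
  rownames(latest) <- NULL

  # Write to a temporary file first and then rename it; on POSIX file
  # systems the rename is atomic, so readers never see a half-written file
  tmp <- paste0(output_file, ".tmp")
  saveRDS(latest, tmp)
  file.rename(tmp, output_file)
  invisible(latest)
}
```

The weakness of this design, and a reason to move to a coordinating service, is that conflicting edits are only resolved at the next scheduled run, so users may work on stale data in between.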
Performance challenges during the evolution of our applications are mainly triggered by hardware restrictions and sometimes by a sudden and significant increase in users. As hardware extensions are heavily restricted by our customer's IT department, every time performance dropped, we scrutinized our implementation and source code and looked for memory leaks and inefficiencies. This led, for example, to an extensive and differentiated use of global reactive structures in our shiny apps. One of our most beneficial insights in this field is that global eventReactive or reactive expressions, defined outside the server function and therefore shared by all user sessions, can significantly reduce memory and CPU usage. Global observeEvent, however, offers no significant benefit and may even increase support effort, because a crashed observer is not reactivated until the entire app is restarted.
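The effect of a global reactive expression can be illustrated with a small app sketch, assuming the shiny package is installed. A `reactive()` created at the top level of `app.R`, outside the `server` function, exists once per R process, so its cached result is computed once and reused by every connected session instead of once per session. The data, `refresh_trigger` and the counter `n_evaluations` are hypothetical placeholders; in line with the observation above, the `observeEvent()` stays inside `server`.

```r
library(shiny)

# Global scope: executed once per R process, shared by all sessions.
# Hypothetical stand-in for the surplus-parts table loaded at startup.
parts <- data.frame(part_id = 1:1000, fmv = runif(1000, 10, 5000))

# A global reactive value that an admin action (or a file check) could
# bump to signal that the underlying data changed
refresh_trigger <- reactiveVal(0)

# Global reactive: the expensive aggregation runs once and its cached
# result is reused by every session until refresh_trigger() changes
n_evaluations <- 0
price_summary <- reactive({
  refresh_trigger()
  n_evaluations <<- n_evaluations + 1
  summary(parts$fmv)
})

ui <- fluidPage(
  verbatimTextOutput("summary"),
  actionButton("refresh", "Reload data")
)

server <- function(input, output, session) {
  # Each session only reads the shared cached value
  output$summary <- renderPrint(price_summary())
  # Session-level observer: a crash affects only this session
  observeEvent(input$refresh, refresh_trigger(refresh_trigger() + 1))
}

# shinyApp(ui, server)  # uncomment to run the app
```

With 50 open sessions, a per-session `reactive()` would hold 50 copies of the result and compute it 50 times; the global one holds and computes it once per invalidation.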
When we started with single R scripts used by a few experts, comments within the code were more than sufficient as documentation. Nowadays, we increasingly face the challenge of providing more detailed documentation of background algorithms and user interfaces. After some attempts at basic documentation in the form of handbooks or specification sheets, we established a wiki-based documentation process. It fits the agile and iterative development process of this project and allows multiple users to collaborate on documentation tasks. To help users find the right wiki page in a specific situation, we added help buttons to the pages and boxes of our applications.
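A help button that opens the matching wiki page could be built with a small helper like the one below. The wiki base URL, the page name and the helper `help_button()` are hypothetical; the sketch only uses standard shiny tag functions.

```r
library(shiny)

# Hypothetical wiki base URL; replace with the real documentation host
wiki_base_url <- "https://wiki.example.com/spare-parts/"

# Returns a link styled as a small button that opens the given wiki page
# in a new browser tab
help_button <- function(page, label = "Help") {
  tags$a(
    href   = paste0(wiki_base_url, utils::URLencode(page)),
    target = "_blank",
    class  = "btn btn-default btn-xs",
    icon("question"),
    label
  )
}

# Example: a box whose title carries a help button for its wiki page
ui <- fluidPage(
  wellPanel(
    h4("Package evaluation", help_button("Package_evaluation")),
    p("Evaluation results would be shown here.")
  )
)
```

Centralizing the URL in one helper means a move of the wiki only requires changing `wiki_base_url`, not every page of every app.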
Summarizing the lessons we learned over the last three years of this project, we conclude, first, that shiny and R are a perfect platform for data-driven analysis and algorithms, even for a system with more than 500 users. Second, we encourage everybody to use shiny and R to establish an agile software development process. Doing so enabled us to exploit the rapid development of R, shiny and their packages, and to outperform competing software systems and programming languages. And there is still fantastic work in progress as the R and shiny community keeps growing.