What is...Shiny?

R’s Shiny package is a tool for creating web apps. It’s highly versatile and used to build everything from simple KPI dashboards, to complex data modelling tools, to beautiful, dynamic client reports.


Why is it good?

You can go from rough sketch to dynamic, interactive prototype in hours. But you can also craft highly functional, infinitely customisable user interfaces.

Because it is part of the R ecosystem, it’s easy to run data processing and modelling tasks behind the scenes. And it’s free!
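To give a flavour of how little code a working app needs, here is a minimal Shiny sketch (the layout and built-in dataset are illustrative, not from a real client project):

```r
library(shiny)

# UI: a slider controlling the number of histogram bins
ui <- fluidPage(
  titlePanel("Old Faithful eruptions"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server: re-draws the plot whenever the slider moves
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(faithful$eruptions, breaks = input$bins,
         main = "Eruption times", xlab = "Minutes")
  })
}

app <- shinyApp(ui, server)  # serve locally with runApp(app)
```

Running `runApp(app)` opens the app in your browser; the reactive link between the slider and the plot comes for free.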

Why not just use Tableau or PowerBI?

If you want to stay away from the world of code and have the budget, Tableau or any of its many competitors might be the right choice for you. But these tools have their limitations.

They are designed to be highly general purpose, which often means they’re not quite able to produce exactly what you want. You inevitably hit the edge of what is possible in the world of ‘drag-and-drop’.

While it is easy to connect to databases from these tools, often what you want to visualise in your app isn’t raw data. And Tableau et al are emphatically not good tools for data processing or data analysis. For these tasks you have to turn to R or Python, and if you’re in those environments already, why not use their web frameworks to build your user interface?

Check out our recent blog post on 6 Reasons To Make The Leap From Excel To R

How secure is it?

It is straightforward to host your apps on cloud services, such as Google Cloud, AWS or Azure. It is also relatively painless to add password protection and restrict access to designated users.

What are the challenges?

The major hurdle is finding people who not only know R, but who can develop well-produced, well-documented, packaged applications. There is undoubtedly a steep initial learning curve for R more generally, and Shiny has its own idiosyncrasies. Producing client- or customer-facing applications will probably also involve knowing some JavaScript.

While many organisations use Shiny to build client- or customer-ready applications, a very common alternative approach is to use Shiny at the proof-of-concept stage, and then hand this over to web developers and designers to build the final product.

DS Analytics are a data science consulting company. We provide clients with support, mentoring and training. We help companies embed data science capabilities within their organisations. We provide R support and training, as well as training in other tools and techniques in data science.


Get in touch to find out more!

6 Reasons To Make The Leap From Excel To R

I am a big fan of Excel. It has been one of the biggest productivity-boosting technologies invented in recent times. But it has its limitations.

Learning to code, like learning any language, is both challenging and rewarding. It takes time, effort and perseverance.

Learning R is no different. The initial learning phase can be daunting and there are many moments when, deadlines looming, you abandon ship and return to the safety of Excel. But once you build up confidence, you’ll find the benefits are huge. Tasks can be done much faster. New types of analysis are possible. Repetitive tasks are automatable and workflows can be shared with colleagues.

Here are 6 (of many) reasons to take the leap from Excel to R. (P.S. it’s also free!)

1. Cleaning data

Transforming messy raw data into clean, usable data is generally the first task of a new data analysis project. There are many subtasks - merging different datasets together, checking for missing or erroneous data, reshaping tables, and reformatting and transforming variables.

Most of these are just about possible in Excel. Merging datasets generally involves getting very friendly with the VLOOKUP() function. Searching for missing values involves a lot of filtering and sorting. Reshaping data is difficult, usually involving a lot of playing about with pivot tables.

In R these tasks are all straightforward. Packages from the tidyverse contain clean, well-written functions, and each task can be neatly stacked into a logical workflow.
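As a sketch (with made-up column names), the merging, missing-value and reshaping steps might look like this in the tidyverse:

```r
library(dplyr)
library(tidyr)

sales  <- tibble(id = 1:3, revenue = c(100, NA, 250))
region <- tibble(id = 1:3, region = c("North", "South", "North"))

clean <- sales %>%
  left_join(region, by = "id") %>%   # merge datasets (no VLOOKUP required)
  filter(!is.na(revenue))            # drop rows with missing revenue

# Reshape from long to wide: one revenue column per region
wide <- clean %>%
  pivot_wider(names_from = region, values_from = revenue)
```

Each step reads top to bottom as a pipeline, so the whole cleaning stage is visible at a glance.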


2. Automating tasks

If you receive the same messy raw data periodically, you might want to automate the data cleaning process. Coding in general allows many repetitive tasks to be automated. At best this can be achieved with VBA in the Excel environment, but it’s not ideal.


At DS Analytics we have a whole host of R scripts we recycle that clean, model and visualise data. Once written, they can be executed with a single command.
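A minimal sketch of such a recyclable script (the file paths are generated here so the example is self-contained; in practice they would point at your real data):

```r
# clean_sales.R - run end-to-end with one command: Rscript clean_sales.R
raw_path   <- tempfile(fileext = ".csv")
clean_path <- tempfile(fileext = ".csv")

# Stand-in for the messy file you receive each period
write.csv(data.frame(id = 1:3, revenue = c(100, NA, 250)),
          raw_path, row.names = FALSE)

raw     <- read.csv(raw_path)
cleaned <- raw[!is.na(raw$revenue), ]  # drop rows with missing revenue
write.csv(cleaned, clean_path, row.names = FALSE)
```

Schedule that one command (with cron or a CI runner, say) and the periodic cleaning job takes care of itself.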

3. Sharing workflows

Most analysts will at one time or another have been on the receiving end of a bad project handover: a myriad of miscellaneous .xls and .csv files, the former filled with hideous long formulae and conditional formatting. Generally it is not possible to trace back the stages of the analysis.


This agony can be relieved by replacing these messy workflows with a series of well commented and documented R scripts. Workflows can be packaged up into custom-built R functions, which themselves can be wrapped up into your own internal R package.
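For example, a cleaning step can be wrapped into a documented function (the name and logic here are illustrative) and later collected into an internal package:

```r
#' Clean a raw sales table: standardise column names and drop missing revenue
clean_sales <- function(raw) {
  names(raw) <- tolower(names(raw))   # consistent lower-case names
  raw[!is.na(raw$revenue), ]          # keep only complete rows
}

messy  <- data.frame(ID = 1:3, Revenue = c(100, NA, 250))
result <- clean_sales(messy)
```

A colleague inheriting the project can now read one function, with its documentation, instead of reverse-engineering a spreadsheet.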

Using version control (GitHub or Bitbucket, for example), code can be shared and improved in an organised, safe environment.



4. Modelling

You can just about build models in Excel, but it isn’t pretty; without doubt it is not the right tool for that job. Whatever kind of modelling you’re doing - correlation analysis, analysis of variance, regression modelling or cluster analysis - you’re better off in the land of R. There are a clutch of other programs you could turn to - Stata, SPSS, SAS or EViews - but they are all fast becoming obsolete. The only other contender worth considering when it comes to data modelling is Python (which is in fact complementary to R, rather than a competitor).


In R you can build every conceivable type of model, from simple linear regression to the most intricate machine learning algorithms.
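Two quick sketches using base R and its built-in mtcars data - a linear regression and a k-means cluster analysis:

```r
# Linear regression: fuel efficiency as a function of weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)
r2  <- summary(fit)$r.squared  # proportion of variance explained

# Cluster analysis: group cars on scaled numeric features
set.seed(42)
clusters <- kmeans(scale(mtcars[, c("mpg", "wt", "hp")]), centers = 3)
sizes <- table(clusters$cluster)   # how many cars fall in each cluster
```

Both models are one function call each, with rich summary and diagnostic output available immediately.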

Check out our recent blog on how to build price optimisation models in R here!


5. Handling larger datasets

When you’ve got datasets with over ~50k rows, Excel starts to splutter to a halt. I can’t count the number of times I’ve accidentally opened datasets too large for Excel to handle and had to wait while it freezes to death.

On most personal laptops R is comfortable with millions of rows of data. When data gets very large (exceeding your laptop’s RAM), there are an increasing number of options available, including packages for efficient memory usage and parallelisation. And it’s now relatively painless for a non-computer-scientist to fire up high-memory virtual machines with R pre-installed, through cloud services like Google Cloud Platform and Amazon Web Services.
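One such option is the data.table package, which handles millions of rows comfortably (the data here is simulated purely for illustration):

```r
library(data.table)

# A million rows of simulated data
dt <- data.table(group = sample(letters[1:5], 1e6, replace = TRUE),
                 value = rnorm(1e6))

# Fast grouped aggregation: mean and count per group
summary_dt <- dt[, .(mean_value = mean(value), n = .N), by = group]

# For data on disk, fread() is a very fast CSV reader:
# dt <- fread("big_file.csv")
```

The aggregation above runs in a fraction of a second - a table of this size would bring Excel to its knees.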


Very, very large data will be stored in distributed big data platforms, such as Hadoop or Spark. These often work well with R, offering easy-to-use APIs to either pull subsets of the data into R, or to push R code out to the cluster.
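As a sketch of the push-code-to-the-cluster pattern, the sparklyr package lets you write ordinary dplyr code that executes inside Spark (this assumes a local Spark installation; the table used is illustrative):

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Copy a small table up, aggregate inside Spark, pull back only the result
cars_tbl <- copy_to(sc, mtcars, "cars")
by_gear  <- cars_tbl %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()   # only the tiny summary table returns to R

spark_disconnect(sc)
```

The heavy lifting happens on the cluster; only the small aggregated result crosses the wire back into your R session.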


6. Creating better data visualisations

Excel plots have got a lot better in the latest versions, and sometimes they’re right for the job at hand. But if you want to produce dynamic, highly customised plots, perhaps within a web application available to users within your organisation, then you’ll need another solution.


One route would be to buy an expensive licence for a dashboarding program, e.g. Tableau or PowerBI. While these are extremely powerful tools, they come with drawbacks: visualisation is often a subtask nested within a series of others.

For example your workflow might be:

1. Read in raw data
2. Clean data
3. Transform variables
4. Visualise with simple plots
5. Run statistical tests
6. More visualisation
7. Build models
8. More visualisation
9. Write a report with analysis results and visualisations, or build a data app so others can use the information


Drag-and-drop tools like Tableau and PowerBI can only do elements of this workflow, so you’ll have to keep jumping in and out of different programs. ALL of the above can be done in R, including writing up tidy reports (R Markdown) and building data apps (Shiny).
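The visualisation steps sit naturally alongside the rest of the workflow. A short ggplot2 sketch using built-in data:

```r
library(ggplot2)

# A customised scatter plot with a fitted regression line
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "steelblue") +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Fuel efficiency vs weight",
       x = "Weight (1000 lbs)", y = "Miles per gallon")
```

The same `p` object can be dropped straight into an R Markdown report or a Shiny app, so the plot never has to leave the pipeline that produced it.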


A great many organisations have adopted open-source software like R and Python. Not everyone in your organisation, or even within your analytics team, needs coding and R skills. But having some in-house R skills can greatly improve analytical efficiency.

Where next? Read a previous blog on 8 Skills Every Data Analytics Team Should Have


What is...a p-value?

Experimentation is a great way to boost performance. It’s becoming easier and easier to run tests on website design, marketing copy, product features and much else besides. Amazon is famous for running thousands of experiments day-in, day-out, constantly learning and improving its products.


Tools like Optimizely’s Web Experimentation allow non-statisticians to easily run controlled experiments on websites to optimise KPIs such as dwell times, click-through rates or sales.

The outputs these tools provide can often be confusing. One such output is called a p-value. A p-value is a statistical concept which, like all statistical concepts, can be a little tricky to get your head around if you don’t spend all day looking at them.

In this post we unpack in simple terms what p-values are and how to interpret them.

A/B tests can be used to determine the best attributes for your website. They are a great way to boost website performance and to understand what ideas, colours and designs your target audience responds well to.

An A/B test is an example of a ‘two sample hypothesis test’: that is, a statistical test that compares average values from two samples.

For example, is the conversion to sales rate higher for email campaign group A than group B?

Two samples are collected, for example by sending web visitors to two different versions of the website at random, or randomly selecting recipients for one of two email campaigns. Summary stats are calculated, for example average conversion rates. These stats are then compared.



Making judgements from data

You’ve run your experiment and collected the data. Your old site design (version A) gets a sales conversion rate of 5%; your new design (version B) gets a sales conversion rate of 10%.


In this experiment version B beats version A by 5 percentage points. But that does not necessarily mean B will beat A every time such an experiment is run. The result could have been due to random chance. What if you only had 20 visitors to your website in each sample? Is that enough evidence to make a decision?

Statistics is all about how you make these judgements. If you flip a coin five times, and each time you get heads, do you conclude the coin must be biased? Intuitively you know the larger the sample size, the more times you flip the coin, the more confident you can be in your judgement.
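That coin-flip intuition can be made exact with base R’s binom.test: how surprising are five heads from five flips of a fair coin?

```r
# One-sided test: the chance of getting 5 or more heads out of 5
res <- binom.test(x = 5, n = 5, p = 0.5, alternative = "greater")
res$p.value  # 0.5^5 = 0.03125: unlikely, but not impossible, for a fair coin
```

A 3% chance is suggestive but hardly conclusive; flip the coin 20 times and get 20 heads, and the same calculation gives a probability of about one in a million.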


And p-values, in a neat single figure, help you make this call. A p-value is the probability you would have observed a difference as big as, or bigger than, the one seen across the two samples if no true difference exists.

Assume for a moment there is in fact, across all potential visitors to your site, no preference for one version or the other. That is, if you took thousands of samples of visitors over and over again for months you’d find on average an equal chance of conversion to sales for both the A and the B version of your site.

But for any one of those samples there’s some chance you would see a difference anyway. This is called sampling variation, and the probability of getting a difference at least as large as the one you observed, purely through sampling variation, is the p-value.

If the p-value is small, say 1%, it is very unlikely you would have seen such a difference through random chance alone. Therefore, you conclude there probably IS a real difference.

It is typical to use a threshold of 5%: if your p-value is lower than 5%, you conclude the difference observed in your experiment is statistically significant.
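In R, the website example can be checked with the base prop.test function, comparing the two conversion rates at two different sample sizes:

```r
# 20 visitors per version: 1/20 = 5% vs 2/20 = 10% conversions
small <- prop.test(x = c(1, 2), n = c(20, 20))

# The same conversion rates with 1,000 visitors per version
large <- prop.test(x = c(50, 100), n = c(1000, 1000))

small$p.value  # well above 0.05: not enough evidence of a real difference
large$p.value  # well below 0.05: the difference is statistically significant
```

The rates are identical in both experiments; only the sample size changes the verdict, which is exactly the intuition the p-value captures.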

Statistics is a tricky but increasingly valuable subject. It is very easy to misinterpret results and make bad decisions without sufficiently understanding what the numbers are telling you.

DS Analytics are a data consultancy. We help our clients get value from their data. Get in touch to find out more or email us at contact@dsanalytics.co.uk.