10 techniques any analytics team should be capable of...

Many companies now have in-house business intelligence or data science teams. Some will be developing complex machine learning prediction algorithms in Python; others will be producing basic charts in Excel. Here’s a look at the key techniques a solid analytics team should have under their belt:

1. Turning business questions into testable hypotheses

A PhD in machine learning is of little use to a company if you can’t solve problems. At a basic level this means taking questions and concerns from decision makers and converting them into statistically testable hypotheses.

For example, the CEO asks: ‘Was our big expensive website renovation worth the investment?’

How do we go about answering this? What is the metric of interest? Revenue? Or brand awareness? We might formulate a hypothesis that says: investment is worth it if, over some time horizon, the increase in revenue resulting from an increase in conversion rate outweighs the cost. We might then reduce this to a statistical significance test looking at the conversion rate distributions before and after the change.
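To make the hypothesis concrete, it helps to work out how big the conversion rate increase has to be before the renovation pays for itself. The sketch below does that arithmetic. Every figure in it (cost, visitor numbers, revenue per conversion) is hypothetical and only there to show the calculation; the function name is ours too.

```python
# All figures below are hypothetical, chosen only to illustrate the arithmetic.

def breakeven_conversion_uplift(cost, visitors, revenue_per_conversion):
    """Absolute rise in conversion rate needed for the extra revenue to cover the cost."""
    return cost / (visitors * revenue_per_conversion)


renovation_cost = 50_000.0          # hypothetical cost of the renovation (GBP)
visitors_over_horizon = 1_000_000   # hypothetical number of visits over the time horizon
revenue_per_conversion = 40.0       # hypothetical average revenue per conversion (GBP)

required_uplift = breakeven_conversion_uplift(
    renovation_cost, visitors_over_horizon, revenue_per_conversion
)
print(f"Conversion rate must rise by at least {required_uplift:.4%} to break even")
```

The testable hypothesis then becomes: the conversion rate after the change exceeds the rate before it by more than this break-even uplift.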

2. Preparing great-looking visualisations and plots

Microsoft have hugely improved Excel’s charting capabilities and you can now produce decent-looking basic plots. Tableau still dominates the next tier, though there’s a large and growing army of drag-and-drop competitors out there, e.g. Power BI. For those wading into the world of coding, the two best options are R and JavaScript. The former is easier to learn and great for a range of analytical jobs; the latter is capable of producing dynamic, infinitely customisable web-based visualisations.

Whatever the tool, the most important point is that visual representations of data should tell a story.

3. Correlation and simple linear regression analyses

Finding associations between things and establishing cause and effect lie at the heart of statistical analysis. It’s easy to calculate correlation coefficients or build regression models with any stats package. The harder bit is understanding the theory and assumptions underpinning them.
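To show how little is going on under the hood, here is a minimal sketch of Pearson's correlation coefficient and a simple least-squares line, written from the textbook formulas rather than with a stats package. The data (ad spend against sales) are made up, and the function name is ours.

```python
from math import sqrt


def pearson_and_ols(x, y):
    """Return (correlation, slope, intercept) for paired observations x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)            # Pearson correlation coefficient
    slope = sxy / sxx                    # least-squares slope
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return r, slope, intercept


# Hypothetical data: weekly ad spend and weekly sales, both in thousands of pounds
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

r, slope, intercept = pearson_and_ols(ad_spend, sales)
print(f"r = {r:.3f}, sales = {intercept:.2f} + {slope:.2f} * ad_spend")
```

A high r here says the two series move together in a straight line. It says nothing on its own about whether the spend caused the sales, which is where the theory and assumptions come in.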

4. Cluster analysis

Customer segmentations are a useful way to improve targeted marketing. There are lots of tools and agencies out there (including DS Analytics!) that can build complex segmentations using third-party data sources and intricate algorithms. But you can get a long way with simple k-means cluster analysis. Base R comes with a kmeans() function to do just that.
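For a feel of what k-means actually does, here is a minimal sketch of the basic assign-then-update loop (Lloyd's algorithm) in plain Python. It is not a reimplementation of R's kmeans(), which may use a different algorithm by default. The starting centroids are picked with a simple deterministic farthest-point rule, which is our simplification, and the customer data are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, iterations=100):
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # nothing moved, so the assignments are stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers described by two features: (orders per year, average basket size)
customers = [
    (1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]
centroids, labels = k_means(customers, k=2)
print(centroids, labels)
```

One thing worth knowing: the result depends on where the centroids start, and a bad start can leave the algorithm stuck in a poor solution. That is why it is common to run k-means from several starting points and keep the best result.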

5. Cleaning messy data

Data processing / wrangling / munging is the process of converting data from a raw, messy state to a clean, usable structure. This could be a one-off job in Excel using pivot tables and the INDEX() and MATCH() functions, or a more automated procedure using Python’s pandas library. The key skills needed are a) handling different data types (strings, dates, integers etc.), b) handling missing values and c) joining data sets. R’s dplyr and tidyr packages are great for data wrangling.
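The three skills above can be sketched in a few lines. To keep the example self-contained it uses only the Python standard library rather than pandas, and the tables, column names and date formats are all made up for illustration.

```python
from datetime import datetime

# Hypothetical raw extracts: everything arrives as text, with gaps and mixed date formats
raw_orders = [
    {"customer_id": "001", "order_date": "2017-01-05", "amount": "19.99"},
    {"customer_id": "002", "order_date": "05/02/2017", "amount": ""},
    {"customer_id": "003", "order_date": "2017-03-10", "amount": " 7.50 "},
]
raw_customers = [
    {"customer_id": "001", "region": "North"},
    {"customer_id": "003", "region": "South"},
]


def parse_date(text):
    """a) Data types: try each known date format, return None if none fits."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """b) Missing values: an empty string becomes None rather than zero."""
    text = text.strip()
    return float(text) if text else None


def clean_and_join(orders, customers):
    """c) Joining: a left join of orders onto customers by customer_id."""
    region_by_id = {c["customer_id"]: c["region"] for c in customers}
    cleaned = []
    for row in orders:
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            "order_date": parse_date(row["order_date"]),
            "amount": parse_amount(row["amount"]),
            "region": region_by_id.get(row["customer_id"]),  # None when there is no match
        })
    return cleaned


clean_orders = clean_and_join(raw_orders, raw_customers)
for row in clean_orders:
    print(row)
```

Keeping missing amounts as None, instead of silently turning them into 0, is a deliberate choice: it forces whoever does the analysis to decide how to treat them.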

6. Benchmarking / creating indices

7. Significance testing / Z-tests and A/B testing

A/B testing is where you compare a treatment group (B), who’ve experienced some intervention, for example a new website feature, with a control group (A), who have not. The core assumption is that without the intervention, B would see the same results as A. A significant difference between the two indicates the intervention had some effect. The actual testing part is easy (a good old GCSE stats z-test will probably do). What’s harder is choosing the groups and ensuring assumptions are satisfied.
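As a rough illustration of the 'easy part', here is a minimal two-proportion z-test for conversion rates, using only the Python standard library. The visitor and conversion counts are hypothetical, and the sketch assumes the groups are large enough for the normal approximation to hold.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between groups A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert at the same rate
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: 10,000 visitors per group, 200 vs 260 conversions
z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value only tells you the difference is unlikely to be chance. Whether it was caused by the intervention still depends on how the groups were chosen.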

8. Survey response analysis

Qualitative business intelligence teams often commission surveys. These typically aim to collect data on a group’s stated preferences and opinions, where revealed preferences in behavioural data are not available. The depth and intricacy of the analysis will depend on the business questions and quantity of data available. At its simplest this could be building contingency tables and testing for independence across cells using a chi-squared test (chisq.test() function in R). More involved analyses might include log-linear regression modelling and visualisations, e.g. a dendrogram.
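The contingency table test can be sketched by hand as follows. The survey counts are invented, and the p-value helper only covers the 2x2 case (one degree of freedom), where it can be written with the standard library's erfc. Note this sketch applies no continuity correction, whereas R's chisq.test() applies one to 2x2 tables by default, so the two will not agree exactly.

```python
from math import erfc, sqrt


def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


def p_value_one_df(statistic):
    """Upper-tail probability of a chi-squared variable with one degree of freedom."""
    return erfc(sqrt(statistic / 2))


# Hypothetical survey: rows are customer groups, columns are 'satisfied' / 'not satisfied'
survey = [
    [30, 20],
    [20, 30],
]
statistic, degrees_of_freedom = chi_squared_independence(survey)
p_value = p_value_one_df(statistic)
print(f"chi-squared = {statistic:.2f}, df = {degrees_of_freedom}, p-value = {p_value:.4f}")
```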

9. Time series analysis and forecasting

Time series decomposition and forecasting are essential tools for business statistics. The stl() function in R allows you to decompose a time series into its seasonal, trend and noise components. KPI and cost forecasts can help decision makers plan inventory and resource allocation. The forecast package comes with a suite of forecasting functions. More recently, the bsts package has added support for more complex Bayesian structural time series models and dynamic regressions.
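To illustrate the idea of splitting a series into trend, seasonal and noise components, here is a minimal sketch of a classical additive decomposition using a centred moving average. This is a simpler technique than the loess-based method behind stl(), swapped in to keep the example short. The quarterly series is synthetic, and the sketch assumes at least two full seasonal cycles of data.

```python
def centred_moving_average(series, period):
    """Trend estimate; None at the ends, where the window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: half-weight the two end points so the window spans one full cycle
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose_additive(series, period):
    """Return (trend, seasonal, remainder), where series = trend + seasonal + remainder."""
    trend = centred_moving_average(series, period)
    # Average the detrended values for each position in the cycle
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    means = [sum(bucket) / len(bucket) for bucket in buckets]
    offset = sum(means) / period
    seasonal = [m - offset for m in means]  # centred so the effects sum to zero
    remainder = [
        None if level is None else value - level - seasonal[t % period]
        for t, (value, level) in enumerate(zip(series, trend))
    ]
    return trend, seasonal, remainder


# Synthetic quarterly series: a straight-line trend plus a repeating seasonal pattern
pattern = [1.0, -1.0, 2.0, -2.0]
series = [float(t) + pattern[t % 4] for t in range(12)]

trend, seasonal, remainder = decompose_additive(series, period=4)
print("seasonal effects by quarter:", seasonal)
```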

10. Optimisation / operational research

‘What should our Q2 price schedule look like?’ ‘What is the optimal way to allocate marketing budget across the year?’ To answer these types of questions we need methods from operational research. This means setting up a mathematical model that describes the business operation, estimating the model parameters, then using the model to understand how inputs affect outputs. Often it’s best to move away from your laptop with a pen and paper and draw out the system first. How the parameters are estimated will depend on the situation. It could be you’ve got data available to parameterise the model, or you might just use common sense (or combine the two with a Bayesian treatment). The optimisation stage might involve simple calculus or a computer simulation.
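As a toy version of the budget allocation question, the sketch below splits a fixed budget between two channels whose returns diminish with spend. The square-root response curve, the channel coefficients and the budget are all assumptions made for illustration, not estimates from real data. It solves the problem twice: once with simple calculus and once by brute-force search over a grid.

```python
from math import sqrt


def total_response(spend_a, spend_b, coef_a, coef_b):
    """Assumed model: each channel returns coefficient * sqrt(spend), i.e. diminishing returns."""
    return coef_a * sqrt(spend_a) + coef_b * sqrt(spend_b)


def best_split_by_grid(budget, coef_a, coef_b, step=1):
    """Try every split of the budget in increments of `step` and keep the best."""
    candidates = range(0, budget + 1, step)
    best_a = max(
        candidates, key=lambda a: total_response(a, budget - a, coef_a, coef_b)
    )
    return best_a, budget - best_a


def best_split_by_calculus(budget, coef_a, coef_b):
    """Setting the derivative to zero gives a share of a^2 / (a^2 + b^2) for channel A."""
    share_a = coef_a ** 2 / (coef_a ** 2 + coef_b ** 2)
    return budget * share_a, budget * (1 - share_a)


# Hypothetical inputs: a budget of 100 (in thousands of pounds) and two channel coefficients
grid_split = best_split_by_grid(100, coef_a=3.0, coef_b=4.0)
calculus_split = best_split_by_calculus(100, coef_a=3.0, coef_b=4.0)
print("grid search:", grid_split, "calculus:", calculus_split)
```

The grid search is slower but carries over to models too messy to differentiate, which is where simulation earns its keep.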

Why Bayesian?

Buzzwords are ubiquitous in data science (itself very much a buzzword/term). The word Bayesian has the distinct whiff of buzzword about it. Those in the commercial world (like us) have a habit of appropriating academics’ jargon and ‘monetising’ it (shudder). Well, Bayesian analysis is actually a thing, and this is a mostly jargon-free explanation of what it is.

Mathematical models are used to simplify real world cause and effect phenomena. They allow us to draw out nuggets of knowledge that can be used to make better decisions. How much should we invest in marketing? What should our pricing schedule look like? Whom should we target with direct marketing and when should we reach out to them? What will demand for our service be in Q1? Data is used to calibrate and fine tune models. In and of itself, data is not valuable. What makes it valuable is good analysis.

Marketing Mix Modelling or econometrics are terms used (and misused) to describe statistical modelling that attempts to quantify the causal impact of marketing, price changes etc. on sales, web traffic, brand awareness etc. Using historic data we try to control for other influencing factors, like weather or Christmas, and isolate the effects we’re interested in. We can then use this knowledge to optimise marketing investments.

Traditionally, this involves specifying a model that looks like a reasonable representation of the real world. Then chucking whatever data is available at it. Some fine tuning and adjustments, some statistical tests. Then read off the result: ‘£1 investment in print advertising drives £2.30 in sales revenue…’. Or more often: ‘£1 investment in print advertising drives…-£500 in sales revenue…’ and try again…

Bayesian analysis is all about conditional probability: what is the probability of X, given you know Y? How does that probability change now you know Z? You see a coin on a table at a distance. What is the probability the side up is a head? Someone tells you they flipped that coin 2000 times and every time it came out heads...now what is the probability? Bayesian statistics is the mathematical formulation of that thought process.
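One simple way to put numbers on the coin example is the classic Beta-Binomial model, sketched below. It rests on assumptions that are ours rather than part of the story: the coin on the table is treated as the result of one more flip, and before hearing anything we consider every possible bias of the coin equally likely (a uniform prior).

```python
from fractions import Fraction


def prob_next_head(heads, flips, prior_heads=1, prior_tails=1):
    """Probability the next flip is a head, given the flips seen so far.

    The prior is Beta(prior_heads, prior_tails); the default (1, 1) is uniform.
    """
    return Fraction(prior_heads + heads, prior_heads + prior_tails + flips)


before = prob_next_head(heads=0, flips=0)        # nothing known about the coin yet
after = prob_next_head(heads=2000, flips=2000)   # told it came up heads 2000 times running

print(f"before: {float(before):.4f}, after: {float(after):.4f}")
```

With no information the answer is a half. After 2000 heads in a row it is very close to, but never quite, one.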

There are three reasons why a Bayesian approach to modelling is an improvement on the traditional approach:

1. Incorporates business knowledge


Bayesian analysis allows you to support your data with what you know: industry knowledge and intuition. Maybe you’ve only got a few years’ worth of data, or you’re modelling a new product. If you know price increases will always have a negative impact on sales, or the industry benchmark for TV ROI is £3.00, it makes sense to tell the model. Faced with that prior knowledge, the model will look at the data and update your knowledge accordingly. There is no way of achieving this in traditional modelling. If your data tells you the average return on investment is £350 when you know it’s closer to £5, all you can do is fix it at £5…in which case, why model the data at all? With Bayesian modelling you tell the model you’re reasonably confident it’s around £5 (that confidence is expressed mathematically using a probability distribution), then it takes the data and pushes the estimate up or down, while keeping it pegged to the realms of reality.
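Here is a minimal sketch of that 'pegging' in action, using the simplest case where both the prior and the data estimate are treated as normal distributions. The £5 and £350 come from the example above; the uncertainty attached to each (a prior standard deviation of £2 and a standard error of £50 on the data estimate) is made up for illustration.

```python
from math import sqrt


def update_normal(prior_mean, prior_sd, data_mean, data_se):
    """Combine a normal prior with a normal data estimate (conjugate update).

    Each source is weighted by its precision, i.e. one over its variance.
    """
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / data_se ** 2
    posterior_precision = prior_precision + data_precision
    posterior_mean = (
        prior_mean * prior_precision + data_mean * data_precision
    ) / posterior_precision
    return posterior_mean, sqrt(1 / posterior_precision)


# Prior: 'reasonably confident ROI is around 5'. Data: a noisy estimate of 350.
# The standard deviations (2 and 50) are hypothetical.
posterior_mean, posterior_sd = update_normal(
    prior_mean=5.0, prior_sd=2.0, data_mean=350.0, data_se=50.0
)
print(f"posterior ROI: {posterior_mean:.2f} (sd {posterior_sd:.2f})")
```

The noisy data nudge the estimate upwards, but because the prior is much more certain than the data, the answer stays close to £5 rather than leaping to £350.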

2. Reports uncertainty


The output of a traditional model is a number. Price elasticity = -1.5. Or TV ROI = £2.30. The output of a Bayesian model is a distribution:

[Figure: Bayesian modelling]

This says ROI is likely to be around £2.30. There’s about a 25% chance it’s between £2.00 and £2.60, and a 10% chance it’s less than £1.
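Statements like these are read straight off the distribution. As a sketch, suppose the ROI distribution were normal with a mean of £2.30 and a standard deviation of £1.00. Both the shape and the standard deviation are our assumptions, picked because they roughly reproduce the figures quoted above; the real distribution need not be normal.

```python
from statistics import NormalDist

# Assumed shape: normal, mean 2.30, standard deviation 1.00 (both illustrative assumptions)
roi = NormalDist(mu=2.30, sigma=1.00)

prob_between = roi.cdf(2.60) - roi.cdf(2.00)  # chance ROI lies between 2.00 and 2.60
prob_below_one = roi.cdf(1.00)                # chance ROI is less than 1

print(f"P(2.00 < ROI < 2.60) = {prob_between:.1%}")
print(f"P(ROI < 1.00) = {prob_below_one:.1%}")
```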

3. Models learn


Bayesian models are inherently learning models: they take existing knowledge and update it with data. That makes them well suited to repeated analysis. They can be set up to take in new data continually, or at weekly / monthly / quarterly / yearly intervals, and update the outputs without having to laboriously calibrate a new model on the full data history every time. This not only saves months of analyst time but also allows you to track changes in a statistically valid way.
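The 'yesterday's answer is today's starting point' idea can be sketched with a deliberately simple model: a conversion rate with a Beta prior, updated one week at a time. The weekly figures are hypothetical, and a real marketing model would be far richer, but the principle is the same. Updating batch by batch lands on exactly the same answer as refitting on the full history.

```python
def update_beta(prior, conversions, visitors):
    """Update a Beta(alpha, beta) belief about a conversion rate with one batch of data."""
    alpha, beta = prior
    return (alpha + conversions, beta + visitors - conversions)


def beta_mean(params):
    alpha, beta = params
    return alpha / (alpha + beta)


# Hypothetical weekly batches of (conversions, visitors)
weekly_batches = [(12, 400), (9, 350), (15, 420)]

# Sequential learning: each week's posterior becomes the next week's prior
posterior = (1, 1)  # uniform prior before any data
weekly_estimates = []
for conversions, visitors in weekly_batches:
    posterior = update_beta(posterior, conversions, visitors)
    weekly_estimates.append(beta_mean(posterior))

# One-shot fit on the full history, for comparison
all_conversions = sum(c for c, _ in weekly_batches)
all_visitors = sum(v for _, v in weekly_batches)
one_shot = update_beta((1, 1), all_conversions, all_visitors)

print("sequential:", posterior, "one-shot:", one_shot)
print("estimate after each week:", [round(e, 4) for e in weekly_estimates])
```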

So, why isn’t everyone doing Bayesian modelling? Firstly, though the theories and maths underlying it have been about for hundreds of years (since Thomas Bayes and others c. mid 18th century in fact), the processing power required has only been available at reasonable cost for 5 to 10 years or so. Secondly, whilst conceptually it’s fairly straightforward, the maths and coding behind the scenes can be daunting. And lastly, there’s the issue of legacy. Agencies struggle to shake up and revitalise their approaches to problems and struggle further to explain them to clients.

DS Analytics are very keen on Bayesian modelling. Get in touch to find out more.