Many companies now have in-house business intelligence or data science teams. Some will be developing complex machine learning prediction algorithms in Python; others will be producing basic charts in Excel. Here’s a look at the key techniques a solid analytics team should have under their belt:
1. Turning business questions into testable hypotheses
A PhD in machine learning is of little use to a company if you can’t solve problems. At a basic level this means taking questions and concerns from decision makers and converting them into statistically testable hypotheses.
For example, the CEO asks: ‘Was our big, expensive website renovation worth the investment?’
How do we go about answering this? What is the metric of interest? Revenue? Or brand awareness? We might formulate a hypothesis that says: the investment was worth it if, over some time horizon, the increase in revenue resulting from an increase in conversion rate outweighs the cost. We might then reduce this to a statistical significance test comparing the conversion rate distributions before and after the change.
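To make the hypothesis concrete, here is a back-of-envelope sketch of the revenue-versus-cost comparison. Every figure below (traffic, conversion rates, order value, build cost) is invented purely for illustration:

```python
# Hypothetical break-even check: is the conversion-rate uplift worth
# the renovation cost over a one-year horizon? All figures are made up.
monthly_visitors = 200_000                 # assumed site traffic
conv_before, conv_after = 0.020, 0.023     # conversion rates before/after
revenue_per_order = 45.0                   # assumed average order value (£)
build_cost = 150_000.0                     # assumed renovation cost (£)
horizon_months = 12

extra_orders = monthly_visitors * (conv_after - conv_before) * horizon_months
extra_revenue = extra_orders * revenue_per_order
print(f"extra revenue over horizon: £{extra_revenue:,.0f}")
print("worth it" if extra_revenue > build_cost else "not worth it")
```

A real analysis would then ask whether the observed uplift in conversion rate is statistically significant rather than noise, which is where the significance test comes in.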
2. Preparing great-looking visualisations and plots
Whatever the tool, the most important point is that visual representations of data should tell a story.
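As one possible sketch, here is a matplotlib chart whose title carries the story rather than just labelling the axes. The channel names and revenue figures are invented:

```python
# A minimal matplotlib sketch: a bar chart that tells one story, namely
# which channel drives the most revenue. Data and labels are invented.
import matplotlib
matplotlib.use("Agg")            # render off-screen so this runs headless
import matplotlib.pyplot as plt

channels = ["Organic", "Paid search", "Email", "Social"]
revenue = [412, 298, 187, 94]    # £k, illustrative only

fig, ax = plt.subplots(figsize=(6, 3.5))
ax.barh(channels, revenue, color="steelblue")
ax.set_xlabel("Revenue (£k)")
ax.set_title("Organic search drives the most revenue")  # headline, not a label
ax.invert_yaxis()                # biggest bar on top
fig.tight_layout()
fig.savefig("revenue_by_channel.png", dpi=150)
```

The design choice worth copying is the headline title: a reader should get the conclusion without studying the axes.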
3. Correlation and simple linear regression analyses
Finding associations between variables and, where the study design allows, establishing cause and effect: these lie at the heart of statistical analysis. It’s easy to calculate correlation coefficients or build regression models with any stats package. The harder part is understanding the theory and assumptions underpinning them, not least that correlation alone does not imply causation.
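A minimal sketch of both calculations using NumPy, on synthetic data where the true relationship (slope 0.8 plus noise) is known, so we can see the estimates recover it:

```python
import numpy as np

# Synthetic data: sales generated from ad spend with a known slope of 0.8
rng = np.random.default_rng(0)
ad_spend = rng.uniform(10, 100, size=50)              # £k per week, invented
sales = 5.0 + 0.8 * ad_spend + rng.normal(0, 5, 50)   # true slope + noise

# Pearson correlation coefficient
r = np.corrcoef(ad_spend, sales)[0, 1]

# Simple linear regression by least squares: sales ≈ a + b * ad_spend
b, a = np.polyfit(ad_spend, sales, deg=1)             # slope first
print(f"r = {r:.2f}, intercept = {a:.2f}, slope = {b:.2f}")
```

With real data, of course, you don’t know the true slope, which is why checking the regression assumptions (linearity, independent errors, constant variance) matters more than running the fit.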
4. Cluster analysis
Customer segmentations are a useful way to improve targeted marketing. There are lots of tools and agencies out there (including DS Analytics!) that can build complex segmentations, using third party data sources and intricate algorithms. But you can get a long way with simple k-means cluster analysis. Base R comes with a kmeans() function to do just that.
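The document points at R’s kmeans(); as a rough Python equivalent, here is Lloyd’s algorithm written out in NumPy and run on two invented customer segments (spend vs purchase frequency):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's algorithm: a minimal stand-in for R's kmeans()."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centre (squared Euclidean distance)
        labels = np.argmin(((X[:, None] - centres) ** 2).sum(axis=2), axis=1)
        # move each centre to the mean of its points; keep it if the cluster is empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres

# Two invented customer segments: low spend/frequency vs high spend/frequency
rng = np.random.default_rng(1)
low = rng.normal([20, 2], 3, size=(50, 2))
high = rng.normal([80, 10], 3, size=(50, 2))
X = np.vstack([low, high])
labels, centres = kmeans(X, k=2)
```

In practice you would use a library implementation, but writing the loop out once makes the algorithm’s two steps (assign, then re-centre) obvious.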
5. Cleaning messy data
Data processing / wrangling / munging is the process of converting data from a raw, messy state to a clean, usable structure. This could be a one-off job in Excel using pivot tables and the INDEX() and MATCH() functions, or a more automated procedure using Python’s pandas library. The key skills needed are a) handling different data types (strings, dates, integers etc.), b) handling missing values and c) joining data sets. R’s dplyr and tidyr packages are also great for data wrangling.
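A small pandas sketch covering the three skills listed above, on two invented tables (orders and customers):

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "order_date": ["2024-01-05", "2024-01-09", "2024-02-01", "2024-02-14"],
    "value": ["12.50", "8.00", None, "22.10"],   # strings plus a missing value
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "region": ["North", "South", "East"],
})

# a) fix the data types: strings -> dates, strings -> numbers
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["value"] = pd.to_numeric(orders["value"])

# b) handle missing values (here, fill with the column mean)
orders["value"] = orders["value"].fillna(orders["value"].mean())

# c) join the data sets (a left join keeps orders with no customer match)
merged = orders.merge(customers, on="customer_id", how="left")
print(merged)
```

The mean-fill in step b) is one of several defensible choices; dropping the row or flagging it can be better depending on why the value is missing.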
6. Benchmarking / creating indices
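One common version of this task is rebasing a KPI series to an index (base period = 100) so that metrics on different scales can be compared or benchmarked against each other. A minimal sketch with invented monthly revenue figures:

```python
# Rebase a series so the first period = 100; all figures are invented (£k).
revenue = [250.0, 265.0, 240.0, 300.0]
base = revenue[0]
index = [round(100 * v / base, 1) for v in revenue]
print(index)   # each value is now "percent of the base month"
```

Rebasing several KPIs to the same base period puts them on one chart without the largest series drowning out the rest.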
7. Significance testing / Z-tests and A/B testing
A/B testing is where you compare a treatment group (B), who’ve experienced some intervention, for example a new website feature, with a control group (A), who have not. The core assumption is that without the intervention, B would see the same results as A. A significant difference between the two indicates the intervention had some effect. The actual testing part is easy (a good old GCSE stats z-test will probably do). What’s harder is choosing the groups and ensuring assumptions are satisfied.
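The z-test in question, written out with the standard library on invented conversion counts (a two-proportion z-test with a pooled rate under the null hypothesis):

```python
from math import sqrt, erf

# Invented A/B figures: control (A) vs new feature (B)
n_a, conv_a = 10_000, 210     # 2.1% conversion
n_b, conv_b = 10_000, 260     # 2.6% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se

# two-sided p-value from the normal CDF, Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The arithmetic is the easy part; whether the result means anything depends on the groups being randomly assigned and the samples sized in advance, not peeked at mid-test.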
8. Survey response analysis
Business intelligence teams often commission qualitative surveys. These typically aim to collect data on a group’s stated preferences and opinions where revealed preferences in behavioural data are not available. The depth and intricacy of the analysis will depend on the business questions and the quantity of data available. At its simplest this could mean building contingency tables and testing for independence across cells with a chi-squared test (the chisq.test() function in R). More involved analyses might include log-linear regression modelling and visualisations, e.g. a dendrogram.
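The document names R’s chisq.test(); the same independence test can be written out by hand for a 2×2 table (so one degree of freedom), here on invented satisfaction-by-segment responses:

```python
from math import sqrt, erf

# Invented survey responses: satisfaction by customer segment
#                 satisfied  unsatisfied
observed = [[60, 40],        # segment A
            [45, 55]]        # segment B

row = [sum(r) for r in observed]
col = [sum(c) for c in zip(*observed)]
total = sum(row)

# chi-squared statistic against independence: expected[i][j] = row*col/total
chi2 = sum((observed[i][j] - row[i] * col[j] / total) ** 2
           / (row[i] * col[j] / total)
           for i in range(2) for j in range(2))

# for a 2x2 table df = 1, so P(X > chi2) = 2 * (1 - Phi(sqrt(chi2)))
p_value = 2 * (1 - 0.5 * (1 + erf(sqrt(chi2) / sqrt(2))))
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

The df = 1 shortcut to the normal CDF only works for 2×2 tables; bigger tables need a proper chi-squared distribution function.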
9. Time series analysis and forecasting
Time series decomposition and forecasting are essential tools for business statistics. The stl() function in R allows you to decompose a time series into its seasonal, trend and noise components. KPI and cost forecasts can help decision makers plan inventory and resource allocation. The forecast package comes with a suite of forecasting functions. More recently, the bsts package provides more complex Bayesian structural time series models and dynamic regressions.
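As a rough stand-in for stl(), here is a classical additive decomposition in NumPy, run on a synthetic monthly series built from a known trend, annual seasonality and noise:

```python
import numpy as np

# Synthetic monthly KPI: linear trend + annual seasonality + noise
rng = np.random.default_rng(0)
t = np.arange(48)                                        # four years, monthly
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 48)

# trend: 2x12 centred moving average (half weights at the window ends)
period = 12
kernel = np.ones(period + 1)
kernel[0] = kernel[-1] = 0.5
kernel /= period
trend = np.full(len(y), np.nan)
trend[period // 2 : period // 2 + len(y) - period] = \
    np.convolve(y, kernel, mode="valid")

# seasonal: average detrended value for each month, centred around zero
seasonal_means = np.array(
    [np.nanmean((y - trend)[m::period]) for m in range(period)])
seasonal_means -= seasonal_means.mean()
seasonal = np.tile(seasonal_means, len(y) // period)

residual = y - trend - seasonal                          # the noise component
```

Because the data were generated with a seasonal amplitude of 10, the recovered seasonal_means should peak near ±10; stl() does the same job more robustly, with loess smoothing instead of a moving average.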
10. Optimisation / operational research
‘What should our Q2 price schedule look like?’, ‘What is the optimal way to allocate marketing budget across the year?’ To answer these types of questions we need methods from operational research. This means setting up a mathematical model that describes the business operation, estimating the model parameters, then using the model to understand how inputs affect outputs. Often it’s best to step away from your laptop and sketch the system out with pen and paper first. How the parameters are estimated will depend on the situation: you might have data available to parameterise the model, you might rely on common sense, or you might combine the two with a Bayesian treatment. The optimisation stage might involve simple calculus or a computer simulation.
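A toy example of the whole loop: model, parameters, optimisation by simple calculus. The demand model and every parameter below are invented for illustration:

```python
# Toy pricing model: weekly demand falls linearly with price, q(p) = a - b*p,
# with unit cost c. Profit f(p) = (p - c) * (a - b * p) is a downward
# parabola, so calculus gives the optimum p* = (a + b*c) / (2*b).
a, b, c = 1000.0, 20.0, 10.0          # invented demand and cost parameters

p_star = (a + b * c) / (2 * b)
profit = lambda p: (p - c) * (a - b * p)

# sanity-check the calculus with a brute-force grid search over prices
grid = [p / 100 for p in range(0, 5001)]
p_grid = max(grid, key=profit)
print(f"optimal price: {p_star:.2f} (grid search agrees: {p_grid:.2f})")
```

Real pricing problems rarely have a closed form, which is when the grid search (or a proper simulation) stops being a sanity check and becomes the method.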