Many companies now have in-house business intelligence or data science teams. Some will be developing complex machine learning prediction algorithms in Python, others will be producing basic charts in Excel.
Here’s a look at the key techniques a solid analytics team should have under their belt:
1. Turning business questions into testable hypotheses
A PhD in machine learning is of little use to a company if you can’t solve problems. At a basic level this means taking questions and concerns from decision makers and converting them into statistically testable hypotheses.
For example, a key stakeholder asks: ‘Should we be discounting product X?’
What knowledge do we need to answer this question? What do we need to predict? What data do we need to make that prediction? And what judgements should we make given our findings?
Once these questions are set up we need to think about how we're going to put together a model and how we're going to tackle uncertainty.
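As a rough illustration of turning the discounting question into something testable, the sketch below frames it as a null hypothesis ("discounting makes no difference to daily unit sales") and checks it with a one-sided permutation test. The sales figures, the function name `permutation_test` and the choice of test are all assumptions made for this example, not a prescribed method.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(control, treated, n_permutations=5000, seed=0):
    """One-sided permutation test: is the mean of `treated` above `control`?

    Returns the observed difference in means and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(control) + list(treated)
    n_treated = len(treated)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value

# Made-up daily unit sales for product X, for illustration only.
full_price_sales = [20, 22, 19, 24, 21, 23, 20, 22]
discounted_sales = [25, 27, 24, 29, 26, 28, 25, 30]

effect, p_value = permutation_test(full_price_sales, discounted_sales)
print(f"Mean uplift: {effect:.2f} units/day, p-value: {p_value:.4f}")
```

A small p-value here only says the uplift in units is unlikely to be chance. Whether the discount is worth it still depends on margin, which is a separate question to set up.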
2. Cleaning messy data
Data nearly always arrives in a messy format. You might have been sent a bunch of miscellaneous CSVs, or an unhelpfully formatted automated report. Or maybe you're scraping data from a website and you've got a JSON file to sort out.
The key things to consider when cleaning messy data are:
Column formats - are dates coded as dates? Do some text variables need to be converted to binary 0s and 1s?
Missing values - are there thousands of NAs? Are you going to exclude these or impute values? Have text strings got mixed in amongst numeric values?
Data errors - when plotting variables do you see huge spikes or zeros?
Long / wide format - do you need to gather your data into long format?
You can get quite far in Excel, but at some point you'll find you need to graduate to the world of coding. R and Python are fantastic tools for data processing.
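The checks listed above can be sketched in a few lines of Python. This example sticks to the standard library so it runs anywhere; in practice a data frame library would usually do the same job more concisely. The CSV content, column names and the `parse_number` helper are invented for illustration.

```python
import csv
import io
from datetime import datetime

# Made-up raw export, for illustration only: text dates, a yes/no flag,
# a missing value ("NA") and a stray text string in a numeric column.
RAW_CSV = """date,subscribed,north,south
01/02/2023,yes,120,95
02/02/2023,no,NA,101
03/02/2023,yes,n/a,0
"""

def parse_number(text):
    """Return a float, or None when the value is missing or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None

long_rows = []
for row in csv.DictReader(io.StringIO(RAW_CSV)):
    # Column formats: text date -> date object, yes/no -> 1/0.
    date = datetime.strptime(row["date"], "%d/%m/%Y").date()
    subscribed = 1 if row["subscribed"].strip().lower() == "yes" else 0
    # Wide -> long: one output row per (date, region) pair.
    for region in ("north", "south"):
        long_rows.append({
            "date": date,
            "subscribed": subscribed,
            "region": region,
            "sales": parse_number(row[region]),
        })

# Missing values and possible data errors are flagged, not silently dropped.
missing = [r for r in long_rows if r["sales"] is None]
zeros = [r for r in long_rows if r["sales"] == 0]
print(f"{len(long_rows)} long rows, {len(missing)} missing, {len(zeros)} suspicious zeros")
```

Whether to exclude or impute the flagged rows is a judgement call that depends on the analysis. The sketch only makes them visible.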
3. Preparing great-looking visualisations
Humans are good at spotting patterns. Data visualisations allow you to quickly see relationships in your data.
Visualisations are used to explore data and to communicate results. They're also often just used to show off!
There are many great tools available to visualise data, including Tableau, PowerBI, d3, R and good old Excel.
R’s ggplot2 package is a great place to start. Visualisations can be customised across every dimension. You can use it to create rough and ready plots to visualise model outputs, for example, or slick dynamic maps as part of a web-based dashboard.
4. Correlation and regression analysis
Understanding causal relationships and correlations between variables is essential for providing robust recommendations.
What impact will price changes have on KPIs? Are your marketing channels working together to drive engagement? What features of your website are driving click-throughs?
Regression models allow you to understand the impact of factors on outcomes. They allow you to predict what scenarios are likely given your decisions.
Every analyst should be able to fit and interpret a regression model, and every mainstream statistical package supports it.
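To make the price question concrete, here is a minimal sketch of a one-variable least squares regression written out by hand. The price and sales figures and the `fit_line` helper are made up for illustration; a real analysis would normally use a statistics package and include more explanatory variables.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up weekly price (GBP) and units sold, for illustration only.
prices = [8.0, 9.0, 10.0, 11.0, 12.0]
units = [205, 180, 162, 139, 121]

intercept, slope = fit_line(prices, units)

# Scenario: what would we expect to sell at a price of GBP 9.50?
predicted_units = intercept + slope * 9.5
print(f"Change in units per extra GBP 1: {slope:.1f}")
print(f"Predicted units at GBP 9.50: {predicted_units:.0f}")
```

The slope describes the association seen in this data. Reading it as a causal effect of price needs further care, for example ruling out promotions or seasonality that moved price and sales together.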
5. Cluster analysis
Cluster analysis is a great technique for understanding different segments. These could be different customer groups, cities, website visitors, consumer journeys etc.
Machine learning methods such as k-means look for natural separations in your data. Subjects are bucketed into different groups and each group can be described by the prevalence of certain characteristics.
There is a huge array of libraries in R and Python, covering everything from the most basic clustering to highly involved machine learning algorithms.
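The idea behind k-means can be sketched in plain Python. This is a bare-bones version of the standard algorithm for illustration only, with made-up customer data. A library implementation would add better initialisation, convergence checks and feature scaling.

```python
import random

def kmeans(points, k, n_iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(n_iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Made-up customers: (orders per year, average basket value in GBP).
customers = [(2, 15), (3, 18), (1, 12), (2, 20), (20, 60), (22, 65), (19, 58), (21, 70)]

labels, centroids = kmeans(customers, k=2)
for j, centre in enumerate(centroids):
    size = labels.count(j)
    print(f"Segment {j}: {size} customers, typical orders/year {centre[0]:.1f}, basket {centre[1]:.1f}")
```

Each centroid is a summary of its segment, which is what lets you describe a group by its typical characteristics.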
6. Time series analysis and forecasting
Time series are data points gathered periodically over time, for example weekly website traffic over the past 3 years, or UK interest rates by quarter. Time series have characteristics such as trend, cyclicality and seasonality.
Techniques such as time series decomposition and time series forecasting (e.g. ARIMA) allow you to understand how time series evolve over time and to predict future values.
The prophet package, available for both R and Python, allows you to produce great time series forecasts.
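To show what decomposition means in practice, the sketch below splits a quarterly series into a trend (centred moving average) and additive seasonal effects, then makes a naive one-step forecast by extending the trend. It is a simplified classical decomposition, not ARIMA or prophet, and the sales figures and function names are invented for the example.

```python
def moving_average_trend(series, period):
    """Centred moving average; None where the window does not fit.

    For an even period the two end points of the window get half weight.
    """
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period
    return trend

def seasonal_effects(series, trend, period):
    """Average detrended value for each position in the seasonal cycle."""
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    effects = [sum(b) / len(b) for b in buckets]
    mean_effect = sum(effects) / period
    return [e - mean_effect for e in effects]  # centre so effects sum to zero

PERIOD = 4  # quarterly data

# Made-up quarterly sales for three years, for illustration only.
sales = [110, 97, 89, 116, 118, 105, 97, 124, 126, 113, 105, 132]

trend = moving_average_trend(sales, PERIOD)
effects = seasonal_effects(sales, trend, PERIOD)

# Naive forecast for the next quarter: extend the trend by its average
# step and add the seasonal effect for that quarter.
known = [(t, level) for t, level in enumerate(trend) if level is not None]
(first_t, first_level), (last_t, last_level) = known[0], known[-1]
step = (last_level - first_level) / (last_t - first_t)
next_t = len(sales)
forecast = last_level + step * (next_t - last_t) + effects[next_t % PERIOD]
print(f"Seasonal effects by quarter: {effects}")
print(f"Forecast for next quarter: {forecast:.1f}")
```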
7. Predictive modelling
Machine learning and AI are all the rage. A plethora of machine learning algorithms with wild-sounding names have been developed in recent years (with implementations in R and Python).
There is a long tail in the distribution of usage for these algorithms. By far the most common predictive modelling approach is regression.
Being able to put together a simple predictive model using regression (then maybe graduating to decision trees or neural networks) is a key component of the toolkit.
How likely is recipient x to open an email with a given subject line? How likely are customers to accept an insurance quote? How likely is a customer to default on a loan? To address these questions you need predictive modelling.
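"How likely" questions like these are often a natural fit for logistic regression. The sketch below fits one by simple gradient descent on made-up insurance quote data. The data, the single predictor and the helper names are assumptions for illustration, and a real model would use a tested library and hold back data for validation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, n_epochs=5000):
    """Fit P(y=1 | x) = sigmoid(intercept + slope * x) by gradient descent on log loss."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(n_epochs):
        grad_intercept = grad_slope = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(intercept + slope * x) - y
            grad_intercept += error
            grad_slope += error * x
        intercept -= learning_rate * grad_intercept / n
        slope -= learning_rate * grad_slope / n
    return intercept, slope

# Made-up quotes: premium in hundreds of GBP, and whether it was accepted (1) or not (0).
premiums = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
accepted = [1, 1, 1, 0, 1, 0, 0, 0]

intercept, slope = fit_logistic(premiums, accepted)

# Predicted acceptance probability for a cheap and an expensive quote.
p_cheap = sigmoid(intercept + slope * 1.5)
p_expensive = sigmoid(intercept + slope * 4.0)
print(f"P(accept | GBP 150) = {p_cheap:.2f}, P(accept | GBP 400) = {p_expensive:.2f}")
```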
8. Providing clear recommendations
Communicating results is a skill that takes time for most analysts to acquire. We're all so interested in learning new analytical techniques that we neglect to build up our communication skills.
And communicating the results of statistical analysis is hard. Probability distributions are not natural concepts unless you spend all day looking at them. Even simple averages are often not easily understood.
The key principles are:
Keep the focus recommendation-led. Do not jump straight into the methodology. Start with the judgements you draw from your analysis.
Communicate with simple plots. Prefer bar plots and line plots over complex multi-axis, multi-dimension horror shows. And no pie charts!
Try to incorporate uncertainty. It is often easier to communicate single point estimate recommendations. But you should strive to incorporate uncertainty into your feedback. This can be most easily done by giving 'best', 'likely' and 'worst' case scenarios.
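One possible way to arrive at 'best', 'likely' and 'worst' cases is to bootstrap an estimate and read off percentiles. The sketch below does this for an average weekly uplift. Mapping the three scenarios to the 5th, 50th and 95th percentiles is an assumption made for this example, as are the figures and the `scenario_summary` name.

```python
import random
import statistics

def scenario_summary(samples, n_resamples=2000, seed=0):
    """Bootstrap the mean and report the 5th, 50th and 95th percentiles."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )

    def percentile(p):
        return means[round(p * (len(means) - 1))]

    return {
        "worst": percentile(0.05),
        "likely": percentile(0.50),
        "best": percentile(0.95),
    }

# Made-up weekly revenue uplift from a trial, in GBP thousands.
weekly_uplift = [1.2, 0.8, 2.1, -0.3, 1.5, 0.9, 1.8, 0.4, 1.1, 1.6]

summary = scenario_summary(weekly_uplift)
for name in ("worst", "likely", "best"):
    print(f"{name.capitalize()} case: GBP {summary[name]:.2f}k per week")
```

Three numbers like these are usually easier for decision makers to act on than a full probability distribution.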
DS Analytics are a data consultancy. We help our clients get value from their data. Get in touch to find out more or email us at firstname.lastname@example.org.