Is Bayesian MMM worth the faff?
Key takeaways
Traditional MMM is being stretched beyond its already quite limited capabilities
Bayesian MMM can address these issues and generate huge additional value IF used properly
If all you’re doing is swapping a Bayesian model in for an old-school one, you’re getting all the faff and none of the benefits.
We can use priors to do away with null hypothesis significance testing, which wastes huge amounts of time and doesn’t answer the questions anyone cares about
Care is needed when we generate recommendations from Bayesian MMMs - and no one cares about ‘uncertainty’!
Introduction
Marketing Mix Modelling (MMM) is once again the de facto choice for marketers who want to understand and optimise their marketing investments.
This is both good and bad. Good because there is widespread appetite for statistics-backed decision making. Bad because many of the poor practices in traditional MMM are still with us. And as the breadth and depth of questions asked of these models increases, they’re being stretched beyond their already quite constrained capabilities.
Amidst this resurgence of interest in MMM is a resurgence of interest in Bayesian modelling, long thought to be ideal for the problem. Bayesian models can be hugely valuable - more accurate, nuanced and abundant in output than traditional MMMs. But this comes at a cost, and care and expertise are needed to extract the benefits.
In this post, I describe how to think about this problem and how to tackle it using Bayesian analysis.
The limitations of the old school
Old school MMM relied on a few key tenets. Gather 3+ years of weekly data. Apply some variable transformations - diminishing returns, adstocks. Build a regression model, attempting to control for confounders like seasonality or economic fluctuations. Then use the framework of null hypothesis significance testing (NHST) to determine whether your sample of data contains enough information to reject the claim that advertising has no effect on sales across the channels you care about. Ideally do some model validation. Finally, use your model to calculate returns on investment and optimal budget allocations.
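For concreteness, here is a minimal numpy sketch of that traditional pipeline - the data, channel names, decay and saturation parameters are all invented for illustration, not any particular vendor’s implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_weeks = 156                                     # ~3 years of weekly data

# Made-up spend series for two channels (illustrative only)
spend_tv = rng.gamma(2.0, 50.0, n_weeks)
spend_ooh = rng.gamma(2.0, 20.0, n_weeks)

def adstock(x, decay=0.6):
    """Geometric adstock: past spend carries over into the current week."""
    out = np.zeros_like(x)
    for t in range(len(x)):
        out[t] = x[t] + (decay * out[t - 1] if t > 0 else 0.0)
    return out

def saturate(x, half_sat=100.0):
    """Simple diminishing-returns (Hill) transform."""
    return x / (x + half_sat)

X = np.column_stack([
    np.ones(n_weeks),                             # intercept
    saturate(adstock(spend_tv)),
    saturate(adstock(spend_ooh)),
    np.sin(2 * np.pi * np.arange(n_weeks) / 52),  # crude seasonality control
])
sales = X @ np.array([500.0, 300.0, 120.0, 50.0]) + rng.normal(0, 40, n_weeks)

# OLS fit; NHST then asks whether each coefficient 'differs from zero'
beta_hat, *_ = np.linalg.lstsq(X, sales, rcond=None)
```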
There are two fundamental problems with this approach: 1) the signal is usually a faint whisper amongst a cacophony of noise, so rejecting the claim that marketing has no effect is tricky, and 2) no one asked that question in the first place; they asked how the data can guide us towards a better decision.
Leaning too heavily on a small sample of data and the already not fit-for-purpose framework of NHST leads to dubious conclusions and an enormous amount of wasted time.
A framework for learning and scenario planning
Marketers don’t care about the methods, they care about making better decisions. We can better orientate the analysis around that goal by thinking about the problem not as one of building a bunch of models to estimate ROIs, but as building a framework for learning and scenario planning.
This framework should be able to do the following:
Learn from the best information we have available and to keep on learning in a stable manner
Capture the complexity and nuance
Generate recommendations that are practicable and trusted
The art of making priors
A Bayesian model is a collection of probability distributions. Before we introduce any data, these distributions are called prior distributions. We can use priors to construct a solid foundation for learning from data. The foundation should encode what we know about the relationship between advertising and outcomes. This knowledge can come from a wide range of sources — other models, sector benchmarks, domain knowledge and importantly, from experiments.
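As a sketch of what this looks like in practice, here is how informative priors might be declared in PyMC. The channel names, numbers and choice of distributions are all illustrative assumptions, and the media transforms and likelihood are omitted for brevity:

```python
import pymc as pm

with pm.Model() as mmm:
    # TV: a recent geo experiment suggested an ROI around 1.5 (se ~0.4),
    # so centre the prior there rather than at zero
    roi_tv = pm.TruncatedNormal("roi_tv", mu=1.5, sigma=0.4, lower=0)
    # Outdoor: only a broad sector benchmark, so a wide prior that the
    # data can move substantially
    roi_ooh = pm.TruncatedNormal("roi_ooh", mu=1.0, sigma=0.8, lower=0)
    # Carryover priors from domain knowledge: TV decays slower than Outdoor
    decay_tv = pm.Beta("decay_tv", alpha=3, beta=2)
    decay_ooh = pm.Beta("decay_ooh", alpha=2, beta=3)
    # ...media transforms, control variables and likelihood would follow
```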
We can use priors to break the crazy, time-consuming loop analysts get into, where they lean too heavily on small amounts of noisy data and the ritual of NHST, and burn through weeks trying to get a stable model that makes reasonable recommendations.
Instead, we should use priors to embed current thinking about what a reasonable marketing plan looks like, and use the data to pull us in one direction or another, learning (updating priors) as we go.
This is like regularisation in machine learning, where we anchor parameters to zero to avoid overfitting on noisy data - except instead of ‘shrinking’ towards zero, we’re pulling the parameters towards the status quo.
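To make the analogy concrete, here is the textbook conjugate-normal update that does this ‘pulling’ (all numbers invented for illustration):

```python
# The posterior mean is a precision-weighted average of the prior
# (status-quo) mean and the data estimate - pulled towards 1.2, not 0.
prior_mean, prior_sd = 1.2, 0.3      # status-quo ROI belief (assumed)
data_mean, data_se = 0.6, 0.6        # noisy estimate from this dataset
w = prior_sd**-2 / (prior_sd**-2 + data_se**-2)
post_mean = w * prior_mean + (1 - w) * data_mean   # = 1.08 here
post_sd = (prior_sd**-2 + data_se**-2) ** -0.5
```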
Because Bayesian models are generative, we can use our priors to simulate optimised plans and scenarios, and sense-check that they align with our thinking.
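A sketch of that sense-check, assuming two channels and simple Hill-curve responses (the priors, budget and parameter values are all invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, budget = 2000, 1_000_000
shares = np.linspace(0, 1, 101)                  # TV's share of the budget
spend = np.stack([budget * shares, budget * (1 - shares)], axis=1)  # (101, 2)

# Draw Hill-curve parameters from (assumed) priors for TV and Outdoor
beta = rng.lognormal(np.log([800.0, 500.0]), 0.3, (n_draws, 2))   # max effect
half_sat = rng.lognormal(np.log([400_000.0, 250_000.0]), 0.2, (n_draws, 2))

# Total response for every (draw, share) combination, summed over channels
resp = beta[:, None, :] * spend[None] / (spend[None] + half_sat[:, None, :])
best_share = shares[resp.sum(axis=2).argmax(axis=1)]  # optimal TV share per draw

# If this disagrees wildly with the current plan, revisit the priors
print(np.percentile(best_share, [10, 50, 90]))
```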
Dummy example
Fig. 1 shows prior response curves, constructed around a current proposed annual allocation between two channels, TV and Outdoor. Fig. 2 shows the posterior response curves for each channel, after updating with data (or equally, experiment results). We see that the curve for Outdoor has dropped - this implies that we’ll want to reduce spend on this channel.
It does not say ‘there is no evidence the effect of Outdoor is different from zero, so stop spending on it’. The key thing to remember is that we are not simply surrendering to the data and asking what it thinks we should do. We’re constructing a framework that uses available information to guide us towards the best decision.
And of course, we can continue to update the curves with new data in an ongoing evolving chain, with the previous model’s posteriors acting as our new priors. This massively speeds up the model refresh process, meaning we can continually report stable, trusted results, confident they haven’t been totally thrown off course by some outliers.
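In code, the chaining step can be as simple as summarising last period’s posterior draws and declaring them as the next prior. A normal approximation is used below; widening it slightly is a judgement call to guard against over-confidence:

```python
import numpy as np

# `posterior_draws`: last quarter's posterior samples of a channel's ROI
# (simulated here; in practice, taken from the fitted model)
posterior_draws = np.random.default_rng(2).normal(1.1, 0.2, 4000)

new_prior_mu = posterior_draws.mean()
new_prior_sd = posterior_draws.std() * 1.25   # widen slightly (judgement call)
```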
No one cares about uncertainty
Bayesian MMMs generate a huge amount of output (‘exhaust’, as Richard McElreath calls it), which can be daunting. It can also be tempting to bludgeon clients to extreme boredom talking about uncertainty and credible intervals.
In my view, there are three key principles for forming recommendations using this framework:
Guardrails, not rules
Instead of presenting a set of optimised budget allocations, we should present the range of optimised plans, acknowledging the uncertainty.
These can act as guardrails within which planners can move depending on their other objectives and ambitions.
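One way to produce such guardrails: repeat the budget optimisation across posterior draws and report the spread of optimal plans rather than a single point. The parameter values below are stand-ins for your model’s actual posterior:

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws, budget = 2000, 1_000_000
shares = np.linspace(0, 1, 101)                   # TV's share of the budget
spend = np.stack([budget * shares, budget * (1 - shares)], axis=1)

# Stand-ins for posterior draws of the Hill-curve parameters
beta = rng.lognormal(np.log([900.0, 400.0]), 0.15, (n_draws, 2))
half_sat = rng.lognormal(np.log([450_000.0, 200_000.0]), 0.1, (n_draws, 2))

resp = beta[:, None, :] * spend[None] / (spend[None] + half_sat[:, None, :])
best_share = shares[resp.sum(axis=2).argmax(axis=1)]   # optimum per draw

lo, mid, hi = np.percentile(best_share, [10, 50, 90])
print(f"TV share guardrails: {lo:.0%}-{hi:.0%} (central plan {mid:.0%})")
```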
Balancing risk
Is there a tradeoff between expected return and variance across different recommended plans? Can we present a menu of options for different brand risk profiles?
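A sketch of such a menu: for each candidate plan, compute the distribution of returns across posterior draws and report both the expectation and a downside measure (plan names and all numbers are invented stand-ins):

```python
import numpy as np

rng = np.random.default_rng(4)
budget = 1_000_000
beta_draws = rng.lognormal(np.log([900.0, 400.0]), 0.15, (2000, 2))  # posterior stand-ins
half_sat = np.array([450_000.0, 200_000.0])
plans = {"safe 50/50": [0.5, 0.5], "lean TV": [0.7, 0.3], "all-in TV": [1.0, 0.0]}

for name, w in plans.items():
    spend = budget * np.array(w)
    ret = (beta_draws * spend / (spend + half_sat)).sum(axis=1)  # return per draw
    print(f"{name:>10}: expected {ret.mean():,.0f}, "
          f"10th percentile {np.percentile(ret, 10):,.0f}")
```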
Balancing future learning
What is the optimal allocation to maximise long-run profitability, balancing exploiting what we know works in the short-run vs. long-run learning?
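One pragmatic answer - my suggestion, not a claim about any standard MMM tool - is Thompson-style planning: each period, allocate optimally for a single posterior draw, so uncertain channels keep getting explored in proportion to how plausible their upside remains:

```python
import numpy as np

rng = np.random.default_rng(5)
budget, shares = 1_000_000, np.linspace(0, 1, 101)
spend = np.stack([budget * shares, budget * (1 - shares)], axis=1)

# One posterior draw of the curve parameters for this planning period
beta = rng.lognormal(np.log([900.0, 400.0]), 0.15, 2)
half_sat = rng.lognormal(np.log([450_000.0, 200_000.0]), 0.1, 2)

resp = (beta * spend / (spend + half_sat)).sum(axis=1)
this_period_share = shares[resp.argmax()]
# ...run the plan, observe outcomes, refit the model, repeat next period
```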
Concluding thoughts
The purpose of analysis is to guide decision making. This guidance should be framed in the context of existing business knowledge and should be presented as guardrails, sensitive to risk and other constraints beyond the scope of your work.
Bayesian methods allow you to develop sophisticated tools oriented around the goal of better decision making. For MMM, they allow us to move away from the framework of null hypothesis significance testing and dubious practices like ‘p-value hacking’, and instead answer directly the questions decision makers want us to answer.
About me
I am a data / marketing scientist and statistician, with 15+ years’ experience building models to solve problems. My focus areas are Bayesian modelling, customer analytics, causal inference and simulation. Client list includes Scribd, the Economist, ITV and Elvie.
I’m always keen to discuss and debate things statistical, especially Bayesian and especially MMM - so get in touch!