Is Bayesian MMM worth the faff?

Key takeaways

  • Traditional MMM is being stretched beyond its already quite limited capabilities

  • Bayesian MMM can address these issues and generate huge additional value IF used properly

  • If all you’re doing is swapping a Bayesian model in for an old-school model, you’re getting all the faff and none of the benefits.

  • We can use priors to do away with null hypothesis significance testing, which wastes huge amounts of time and doesn’t answer the questions anyone cares about

  • Care is needed when we generate recommendations from Bayesian MMMs - and no one cares about ‘uncertainty’!

Introduction

Marketing Mix Modelling (MMM) is once again the de facto choice for marketers who want to understand and optimise their marketing investments.

This is both good and bad. Good because there is widespread appetite for statistics-backed decision making. Bad because many of the poor practices in traditional MMM are still with us. And as the breadth and depth of questions asked of these models increases, they’re being stretched beyond their already quite constrained capabilities.

Amidst this resurgence of interest in MMM is a resurgence of interest in Bayesian modelling, long thought to be ideal for the problem. Bayesian models can be hugely valuable - more accurate, nuanced and abundant in output than traditional MMMs. But this comes at a cost, and care and expertise are needed to extract the benefits.

If all you’re doing is swapping a Bayesian model in for an old-school model, you’re getting all the faff and none of the benefits.

In this post, I describe how to think about this problem and how to tackle it using Bayesian analysis.


The limitations of the old school

Old school MMM relied on a few key tenets. Gather 3+ years of weekly data. Apply some variable transformations - diminishing returns, adstocks. Build a regression model, attempting to control for confounders like seasonality or economic fluctuations. Then use the framework of null hypothesis significance testing (NHST) to determine whether your sample of data contains enough information to reject the claim that advertising has no effect on sales across the channels you care about. Ideally, do some model validation. Finally, use your model to calculate returns on investment and optimal budget allocations.
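For concreteness, here’s a minimal sketch of those two classic transformations in Python (the decay and half-saturation values are purely illustrative):

```python
import numpy as np

def adstock(x, decay=0.5):
    """Geometric adstock: a fraction of each week's advertising
    effect carries over into subsequent weeks."""
    out = np.empty_like(x, dtype=float)
    carry = 0.0
    for t, spend in enumerate(x):
        carry = spend + decay * carry
        out[t] = carry
    return out

def saturate(x, half_sat=80.0):
    """Diminishing returns: response is 0.5 at half_sat and
    flattens towards 1 as spend grows (a simple Hill-type curve)."""
    return x / (x + half_sat)

# Weekly TV spend (£k) -> transformed regressor for the regression stage
tv_spend = np.array([0, 100, 80, 0, 0, 120, 60], dtype=float)
tv_effect = saturate(adstock(tv_spend, decay=0.6))
```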

There are two fundamental problems with this approach: 1) the signal is usually a faint whisper amongst a cacophony of noise, so rejecting the ‘marketing has no effect’ claim is tricky, and 2) no one asked you that in the first place; they asked what the data can do to guide us towards a better decision.

Leaning too heavily on a small sample of data, and on the already not-fit-for-purpose framework of NHST, leads to dubious conclusions and an enormous amount of wasted time.

A framework for learning and scenario planning

Marketers don’t care about the methods, they care about making better decisions. We can better orientate the analysis around that goal by thinking about the problem not as one of building a bunch of models to estimate ROIs, but as building a framework for learning and scenario planning. 

This framework should be able to do the following:

  • Learn from the best information we have available, and keep on learning in a stable manner

  • Capture the complexity and nuance of how marketing actually drives outcomes

  • Generate recommendations that are practicable and trusted

The art of making priors

A Bayesian model is a collection of probability distributions. Before we introduce any data, these distributions are called prior distributions. We can use priors to construct a solid foundation for learning from data. The foundation should encode what we know about the relationship between advertising and outcomes. This knowledge can come from a wide range of sources — other models, sector benchmarks, domain knowledge and importantly, from experiments.
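As a toy example of that last point, an experiment result can be written straight into the model as a prior. A sketch, with entirely hypothetical numbers:

```python
from scipy import stats

# Hypothetical geo-lift test for Outdoor: estimated ROI of £1.30
# per £1 spent, with a standard error of £0.50.
experiment_roi, experiment_se = 1.3, 0.5

# Encode the result directly as the MMM's prior for Outdoor's ROI.
outdoor_roi_prior = stats.norm(loc=experiment_roi, scale=experiment_se)

# The prior already answers decision-relevant questions, e.g. the
# probability that Outdoor at least breaks even:
p_breakeven = 1 - outdoor_roi_prior.cdf(1.0)
print(f"P(ROI > £1) under the prior: {p_breakeven:.0%}")  # ~73%
```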

We can use priors to do away with the crazy, time-consuming loop that analysts get into, where they lean too heavily on small amounts of noisy data and the ritual of NHST, and burn through weeks trying to get a stable model that makes reasonable recommendations.

Instead, we should use priors to embed current thinking about what a reasonable marketing plan looks like, and use the data to pull us in one direction or another, learning (updating priors) as we go.

This is like regularisation in machine learning, where we anchor parameters to zero to avoid overfitting to noisy data - but instead of ‘shrinking’ towards zero, we’re pulling the parameters towards the status quo.

Because Bayesian models are generative, we can use our priors to simulate optimised plans and scenarios, to sense-check that they align with our thinking.
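Here’s a minimal sketch of what this looks like in PyMC - a deliberately simplified model with linear ROIs (no adstock or saturation), where the channel names, spends and ROI beliefs are all invented for illustration:

```python
import numpy as np
import pymc as pm

# Hypothetical status-quo beliefs: the ROI we currently assume per channel
assumed_roi = {"tv": 1.8, "outdoor": 1.2}   # £ return per £ spent
rng = np.random.default_rng(1)
spend = {c: rng.gamma(5.0, 20.0, size=104) for c in assumed_roi}  # £k/week

with pm.Model() as mmm:
    # Priors centred on the status quo rather than zero - the
    # 'regularise towards what we believe' idea described above
    roi = {c: pm.Normal(f"roi_{c}", mu=assumed_roi[c], sigma=0.4)
           for c in assumed_roi}
    baseline = pm.Normal("baseline", mu=500.0, sigma=100.0)
    noise = pm.HalfNormal("noise", sigma=50.0)

    expected = baseline + sum(roi[c] * spend[c] for c in assumed_roi)
    pm.Normal("sales", mu=expected, sigma=noise)

    # Because the model is generative, we can simulate sales under the
    # priors alone and sense-check that implied plans look reasonable
    prior_draws = pm.sample_prior_predictive()
```

With observed sales attached to the likelihood, calling pm.sample() would then produce the posterior curves of the kind shown in Fig. 2 below.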

Dummy example

Fig. 1 shows prior response curves, constructed around a currently proposed annual allocation between two channels, TV and Outdoor. Fig. 2 shows the posterior response curves for each channel after updating with data (or equally, experiment results). We see that the curve for Outdoor has dropped - this implies that we’ll want to reduce spend on this channel.

It does not say ‘there is no evidence that the effect of Outdoor is different from zero, so stop spending on it’. The key thing to remember is that we are not simply surrendering to the data and asking what it thinks we should do. We’re constructing a framework that uses the available information to guide us towards the best decision.

Fig 1: Prior response curves, constructed such that the ‘null’ hypothesis is the current proposed plan

Fig 2: Posterior response curves - Outdoor is dragged down, but not to zero, as might be the case with traditional MMM - we’re regularising towards what we believe.

And of course, we can continue to update the curves with new data in an ongoing evolving chain, with the previous model’s posteriors acting as our new priors. This massively speeds up the model refresh process, meaning we can continually report stable, trusted results, confident they haven’t been totally thrown off course by some outliers.
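In practice, ‘posterior becomes prior’ needs an approximation, since the posterior rarely has a neat closed form. A common shortcut is to moment-match the posterior draws to a parametric distribution - a sketch with made-up numbers (note it ignores correlations between parameters):

```python
import numpy as np

def posterior_to_prior(draws):
    """Moment-match posterior draws to a Normal(mu, sd), to be used
    as the prior on the same parameter in the next model refresh."""
    return float(np.mean(draws)), float(np.std(draws))

# e.g. draws of Outdoor's ROI from the previous refresh
# (random numbers standing in for real posterior samples)
old_draws = np.random.default_rng(0).normal(0.9, 0.25, size=4000)
mu_next, sd_next = posterior_to_prior(old_draws)

# In the next refresh's model:
#   pm.Normal("roi_outdoor", mu=mu_next, sigma=sd_next)
```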


No one cares about uncertainty

Bayesian MMMs generate a huge amount of output (‘exhaust’, as Richard McElreath calls it), which can be daunting. It can also be tempting to bludgeon clients into extreme boredom with talk of uncertainty and credible intervals.

In my view, there are three key principles for forming recommendations using this framework:

  1. Guardrails, not rules

    • Instead of presenting a set of optimised budget allocations, we should present the range of optimised plans, acknowledging the uncertainty.

    • These can act as guardrails within which planners can move depending on their other objectives and ambitions.

  2. Balancing risk

    • Is there a tradeoff between expected return and variance across different recommended plans? Can we present a menu of options for different brand risk profiles?

  3. Balancing future learning

    • What is the optimal allocation to maximise long-run profitability, balancing exploiting what we know works in the short-run vs. long-run learning?

Fig 3: Fried egg!

Here we are simulating thousands of optimised budget allocations between the two channels using the posterior distributions.

The ‘fried egg’ shows the spend region which most of the simulated scenarios recommend. The black ‘horizon’ line shows the spend levels giving ROI = £1 (using the means of the distributions).

We can see the highest (bluest) point is a total budget of around £25m, with a split of 1:2 Outdoor : TV.

We can also see that the proposed total budget and split between TV and Outdoor sits somewhat outside the region of optimality, but well within the region where campaign ROI > £1.
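A sketch of how such a simulation can work: for each posterior draw of the two channels’ response-curve parameters, brute-force the profit-maximising split over a grid of spend levels, and collect the winners. The functional form and all numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 2000

# Stand-in posterior draws for a saturating response curve per channel:
# revenue(spend) = beta * (1 - exp(-spend / scale))
beta = {"tv": rng.normal(48, 6, n_draws), "outdoor": rng.normal(23, 5, n_draws)}
scale = {"tv": rng.normal(12, 1.5, n_draws), "outdoor": rng.normal(8, 1.0, n_draws)}

def revenue(spend, b, s):
    """Expected revenue from one channel at a given spend (£m)."""
    return b * (1 - np.exp(-spend / s))

# Grid of candidate spend levels per channel (£m); brute force is cheap in 2D
grid = np.linspace(0, 30, 151)
tv, od = np.meshgrid(grid, grid, indexing="ij")

optimal_plans = []
for i in range(n_draws):
    profit = (revenue(tv, beta["tv"][i], scale["tv"][i])
              + revenue(od, beta["outdoor"][i], scale["outdoor"][i])
              - tv - od)
    best = np.unravel_index(np.argmax(profit), profit.shape)
    optimal_plans.append((grid[best[0]], grid[best[1]]))  # (TV, Outdoor) spend

# Scatter-plot optimal_plans: the dense region is the 'fried egg', and its
# spread is exactly the 'guardrails' idea - a region, not a single point.
```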


Concluding thoughts

The purpose of analysis is to guide decision making. This guidance should be framed in the context of existing business knowledge and should be presented as guardrails, sensitive to risk and other constraints beyond the scope of your work.

Bayesian methods allow you to develop sophisticated tools oriented around the goal of better decision making. For MMM, they allow us to move away from the framework of null hypothesis significance testing and dubious practices like ‘p-hacking’, and instead directly answer the questions decision makers want us to answer.

GET IN TOUCH to arrange a Bayesian MMM demo!


About me

I am a data / marketing scientist and statistician, with 15+ years’ experience building models to solve problems. My focus areas are Bayesian modelling, customer analytics, causal inference and simulation. Clients include Scribd, the Economist, ITV and Elvie.

I’m always keen to discuss and debate things statistical, especially Bayesian and especially MMM - so get in touch!
