Subscriber Lifetime Value with Hierarchical Bayesian Survival Analysis

4 Apr

We’ve been doing a lot of work in the area of customer lifetime value modelling. It’s a satisfying domain to focus on, not just because of some interesting statistical challenges, but also because you can really see how it helps clients make better decisions.

We’ve worked on some interesting use cases, for example:

‘how can we measure customer lifetime value (CLV) in new markets?’
‘how can we measure the impact of switching from quarterly to monthly subscription durations on profitability?’
‘how can we combine customer lifetime value analysis with marketing mix modelling / media attribution analysis, to properly understand the return on advertising spend?’

There are various approaches for estimating CLV, depending on the business context, the data and the types of questions you want to answer.

An approach we’ve pursued with a number of clients is survival analysis. This is powerful because it nicely balances accuracy with light data requirements, and the outputs are intuitive and easy to understand.

The survival analysis part of CLV is all about estimating customer retention. The output is a survival curve, which gives, for each point in time, the probability that a customer is still ‘surviving’ (i.e. still subscribed) at that point.

Here we’re focusing only on subscription businesses (‘contractual’ settings). We’ve combined survival analysis with a probability model of subscriber retention, within a Bayesian framework. This has proved an extremely powerful approach.

Why survival analysis?

Survival analysis is used to model time-to-event data, such as time to death, product failure or subscription cancellation. A key aspect of these models is handling censored data, i.e. being able to include observations for which the relevant ‘event’ hasn’t happened yet (here, active subscribers). We want to keep the information these observations carry: we know that each of them has ‘survived’ at least this long.
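To make censoring concrete, here is a minimal Python sketch. It assumes the simplest possible retention model, a single constant per-period churn probability `p` shared by every subscriber (no heterogeneity yet), and a hypothetical toy dataset of `(tenure, cancelled)` pairs. Cancelled subscribers contribute the probability of churning exactly at their tenure; active (censored) subscribers contribute only the probability of surviving at least that long.

```python
import math

# Hypothetical toy data: (tenure in renewal periods, cancelled flag).
# cancelled = 1: churned at the end of `tenure` periods (event observed)
# cancelled = 0: still active after `tenure` periods (right-censored)
data = [(3, 1), (5, 0), (1, 1), (8, 0), (2, 1)]


def geometric_log_lik(p, rows):
    """Log-likelihood of a constant per-period churn probability p, with censoring."""
    ll = 0.0
    for tenure, cancelled in rows:
        if cancelled:
            # P(T = t) = (1 - p)^(t - 1) * p: renewed t - 1 times, then churned
            ll += (tenure - 1) * math.log(1 - p) + math.log(p)
        else:
            # P(T > t) = (1 - p)^t: all we know is that they survived t periods
            ll += tenure * math.log(1 - p)
    return ll


def geometric_mle(rows):
    """Closed-form maximum-likelihood estimate: observed churns / total tenure at risk."""
    churns = sum(cancelled for _, cancelled in rows)
    exposure = sum(tenure for tenure, _ in rows)
    return churns / exposure


p_hat = geometric_mle(data)
print(round(p_hat, 3))  # 3 churns / 19 periods at risk, about 0.158

# Dropping the censored rows instead gives 3 / 6 = 0.5, badly overstating churn.
naive = geometric_mle([row for row in data if row[1] == 1])
print(naive)
```

The closed form follows from setting the derivative of the log-likelihood to zero: churns divided by the total number of periods every subscriber (censored or not) was at risk.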

Two appealing aspects of survival analysis in this context are 1) you get a survival curve, which is easy to interpret and can be plugged directly into a CLV formula, and 2) you only need one row of data per customer, rather than a row for each point in time in their history. This latter point is important and can save you months of stitching stubborn datasets together!
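As a rough illustration of plugging a survival curve into a CLV formula, here is one common textbook form, sketched in Python: expected discounted margin = Σ_t m · S(t) / (1 + d)^t. It assumes a fixed margin `m` collected at the start of each period the subscriber is active, a per-period discount rate `d` and a truncated horizon; all of these are illustrative assumptions rather than a prescribed formula.

```python
def clv_from_survival(survival, margin, discount_rate):
    """Expected discounted margin for one subscriber.

    survival[t] is the probability the subscriber is still active at the start of
    period t (so survival[0] == 1). Assumes a fixed `margin` is collected at the
    start of every active period and cash flows are discounted per period.
    """
    return sum(margin * s / (1 + discount_rate) ** t for t, s in enumerate(survival))


# Usage with a constant-churn survival curve, S(t) = (1 - p)^t, over 2,000 periods.
p, margin, d = 0.1, 10.0, 0.01
curve = [(1 - p) ** t for t in range(2000)]
clv = clv_from_survival(curve, margin, d)

# For this particular curve the infinite sum has a closed form: m * (1 + d) / (d + p).
print(round(clv, 2), round(margin * (1 + d) / (d + p), 2))
```

Any survival curve can be passed in, including one computed from a fitted retention model, which is what makes the curve such a convenient intermediate output.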

Why hierarchical Bayesian?

Bayesian models, while sometimes tricky to implement, have a number of well-known benefits, especially in business contexts:

Directly model and communicate uncertainty
Include prior knowledge where you don’t have much data - either other results / data, human knowledge or benchmarks
Flexible model design

Hierarchical models are particularly useful because they let you generate estimates and predictions for segments where you don’t have many data points (e.g. a new market, a new product or an under-represented demographic) by anchoring them to segments where you do have plenty of data. The estimate for a new market, for example, starts close to the ‘all markets’ estimate and drifts away from it as you acquire more data about that market.
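A minimal sketch of this anchoring effect, using the simplest possible setting so the arithmetic is visible. It assumes each market has a single churn probability, that market churn probabilities share a Beta(a, b) parent distribution, and that the parent is held fixed (in a full hierarchical model the parent is itself estimated from all markets jointly). With a Beta prior and geometric churn the posterior mean has a closed form. All numbers are hypothetical.

```python
def pooled_churn_estimate(churns, exposure, a, b):
    """Posterior mean churn probability for one market.

    Assumes a Beta(a, b) parent prior on the market's churn probability and a
    geometric (constant-churn) likelihood, where `exposure` is the total tenure at
    risk, including censored subscribers. Conjugacy gives the posterior
    Beta(a + churns, b + exposure - churns), whose mean is returned.
    """
    return (a + churns) / (a + b + exposure)


a, b = 2.0, 18.0  # hypothetical parent: prior mean churn 2 / 20 = 0.10

# Both markets show the same raw churn rate (0.20), with very different amounts of data.
market_a = pooled_churn_estimate(churns=400, exposure=2000, a=a, b=b)
market_b = pooled_churn_estimate(churns=4, exposure=20, a=a, b=b)

print(round(market_a, 3))  # plenty of data: stays close to the raw 0.20
print(round(market_b, 3))  # little data: pulled halfway back towards the parent's 0.10
```

As Market B accumulates exposure, the data term dominates the prior term and its estimate moves towards its own raw rate.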

Why probability models of retention?

There are some commonly used parametric (exponential, Weibull), non-parametric (Kaplan-Meier) and semi-parametric (Cox) approaches to survival analysis. But tying your survival function to a carefully specified probability model of renewal and churn is, in our experience, a far more powerful approach. Crucially, these probability models capture customer heterogeneity, which makes a big difference to retention projections into the distant future (and consequently to CLV estimates).
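To see why heterogeneity matters for long-range projections, here is a small Python sketch comparing two hypothetical populations with the same 80% retention after the first period: one where everybody has a 20% churn probability, and one made of two equal-sized segments with 5% and 35% churn probabilities. The two-segment mix is purely illustrative; a beta distribution generalises it to a continuum of churn probabilities.

```python
def survival_homogeneous(t, churn=0.20):
    # Everyone has the same churn probability: S(t) = (1 - churn)^t
    return (1 - churn) ** t


def survival_mixture(t, churns=(0.05, 0.35), weights=(0.5, 0.5)):
    # Population is a mix of segments: S(t) is the weighted average of segment curves
    return sum(w * (1 - c) ** t for c, w in zip(churns, weights))


horizon = 24
for t in (1, 6, 12, horizon):
    print(t, round(survival_homogeneous(t), 3), round(survival_mixture(t), 3))

# Aggregate retention rate r(t) = S(t) / S(t - 1): constant for the homogeneous
# population, but rising over time for the mixture as high-churn subscribers leave.
mixture_rates = [survival_mixture(t) / survival_mixture(t - 1) for t in range(1, horizon + 1)]
print([round(r, 3) for r in mixture_rates[:4]], round(mixture_rates[-1], 3))
```

Both curves agree after one period, but after 24 periods the heterogeneous population retains many times more subscribers, so a model that ignores heterogeneity would understate long-run value.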

These models (see various papers by Peter Fader and Bruce Hardie) describe each component of customer behaviour with a sensible probability distribution: each comes with a backstory, if you like. For example, the (shifted) beta-geometric model starts with a story about churn: at each renewal opportunity, every subscriber flips a biased coin to decide whether to renew or churn. The geometric distribution captures how many flips it takes to get the first head (churn). For any individual subscriber, the bias of the coin (their churn probability) stays the same over time: they don’t become more or less loyal. But churn probabilities differ between subscribers, and we capture that heterogeneity with a beta distribution.
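The shifted-beta-geometric (sBG) model, which formalises this coin-flipping story, has simple closed forms, as set out in Fader and Hardie’s work on projecting customer retention. Below is a minimal Python sketch using only the standard library, where `alpha` and `beta` are the parameters of the beta distribution of churn probabilities; the example values are hypothetical.

```python
import math


def log_beta(a, b):
    # log of the beta function B(a, b), computed via log-gamma for numerical stability
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def sbg_pmf(t, alpha, beta):
    """P(T = t) = B(alpha + 1, beta + t - 1) / B(alpha, beta): churn at renewal t (t >= 1)."""
    return math.exp(log_beta(alpha + 1, beta + t - 1) - log_beta(alpha, beta))


def sbg_survival(t, alpha, beta):
    """S(t) = P(T > t) = B(alpha, beta + t) / B(alpha, beta): still subscribed after t periods."""
    return math.exp(log_beta(alpha, beta + t) - log_beta(alpha, beta))


def sbg_retention_rate(t, alpha, beta):
    """r(t) = S(t) / S(t - 1) = (beta + t - 1) / (alpha + beta + t - 1)."""
    return (beta + t - 1) / (alpha + beta + t - 1)


alpha, beta = 1.0, 3.0  # hypothetical: mean churn probability alpha / (alpha + beta) = 0.25
print([round(sbg_survival(t, alpha, beta), 3) for t in range(6)])
print([round(sbg_retention_rate(t, alpha, beta), 3) for t in range(1, 6)])
```

Even though each subscriber’s churn probability is constant, the aggregate retention rate r(t) rises with t: the heterogeneity effect in closed form.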

Show me the money

We’ve got two markets, A and B. We’ve operated in market A for a number of years and have a good history of subscriber data. We’re fairly new to market B.

The objective is to estimate retention (and so CLV) for each market.

Here is the number of new subscriptions by month for the two markets:

Here are the lifetime plots for a sample of the subscribers. Those with green dots are censored - they haven’t cancelled yet, so we don’t know their final lifetime length.

The only variables we need in our dataset are:

tenure to date
cancelled (0,1)
market
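With just those three columns, each subscriber contributes one term to the likelihood: the shifted-beta-geometric (sBG) probability of churning at their tenure if they cancelled, or the probability of surviving beyond their tenure if they are still active. The Python sketch below uses hypothetical rows and shows only the likelihood (a Bayesian model combines it with priors); it is an illustration, not the centricity implementation.

```python
import math
from collections import defaultdict

# Hypothetical one-row-per-subscriber data: tenure to date, cancelled (0/1), market.
rows = [
    {"tenure": 3, "cancelled": 1, "market": "A"},
    {"tenure": 12, "cancelled": 0, "market": "A"},
    {"tenure": 7, "cancelled": 1, "market": "A"},
    {"tenure": 2, "cancelled": 0, "market": "B"},
    {"tenure": 1, "cancelled": 1, "market": "B"},
]


def log_beta(a, b):
    # log of the beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def sbg_log_lik(alpha, beta, data):
    """Censored sBG log-likelihood for one market's rows."""
    ll = 0.0
    for row in data:
        t = row["tenure"]
        if row["cancelled"]:
            # churned at renewal t: log P(T = t)
            ll += log_beta(alpha + 1, beta + t - 1) - log_beta(alpha, beta)
        else:
            # still active (censored): log P(T > t)
            ll += log_beta(alpha, beta + t) - log_beta(alpha, beta)
    return ll


# Group rows by market so each market can have its own (alpha, beta).
by_market = defaultdict(list)
for row in rows:
    by_market[row["market"]].append(row)

for market, market_rows in sorted(by_market.items()):
    print(market, len(market_rows), round(sbg_log_lik(1.0, 3.0, market_rows), 3))
```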

We’ve developed a custom R package (centricity) for fitting these (and many other) CLV models. It’s built on top of the probabilistic programming language Stan.

The model is hierarchical, so the parameters for the beta-geometric distribution for each market are drawn from the same parent distribution.
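One way to picture the hierarchy is as a generative simulation. The Python sketch below assumes a particular, hypothetical parameterisation (market-level log α and log β drawn from shared normal ‘parent’ distributions) and a single fixed observation window per market; the parent values, window lengths and sample sizes are all made up for illustration. Fitting the model amounts to running this story in reverse: inferring the parent and market parameters from the observed rows.

```python
import math
import random


def draw_market_params(rng, mu_log_alpha=0.0, mu_log_beta=1.0, sigma=0.3):
    """Draw one market's (alpha, beta) from hypothetical log-normal parent distributions."""
    alpha = math.exp(rng.gauss(mu_log_alpha, sigma))
    beta = math.exp(rng.gauss(mu_log_beta, sigma))
    return alpha, beta


def simulate_market(n_subscribers, alpha, beta, window, rng):
    """Simulate (tenure, cancelled) rows, censoring everyone still active after `window` periods."""
    rows = []
    for _ in range(n_subscribers):
        theta = rng.betavariate(alpha, beta)  # this subscriber's constant churn probability
        tenure, cancelled = 0, 0
        while tenure < window:
            tenure += 1
            if rng.random() < theta:  # the coin comes up "churn" at this renewal
                cancelled = 1
                break
        rows.append((tenure, cancelled))
    return rows


rng = random.Random(2024)
markets = {"A": (2000, 36), "B": (100, 6)}  # hypothetical (subscribers, periods observed)
simulated = {}
for name, (n, window) in markets.items():
    alpha, beta = draw_market_params(rng)
    simulated[name] = simulate_market(n, alpha, beta, window, rng)
    churn_share = sum(c for _, c in simulated[name]) / n
    print(name, round(alpha, 2), round(beta, 2), round(churn_share, 2))
```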

Because we have far fewer data points for Market B, its estimated survival curve has much wider uncertainty bands.

Similarly, when you plug the monetary inputs into the CLV formula, you get more uncertainty around the value for Market B than for Market A.
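Propagating uncertainty into CLV is mechanical once you have posterior draws: compute CLV once per draw and summarise the resulting distribution. In the Python sketch below the ‘posterior draws’ are synthetic stand-ins (log-normal noise around hypothetical values, deliberately noisier for Market B), and the margin, discount rate and horizon are illustrative assumptions; in practice the draws would come from the fitted Stan model.

```python
import math
import random
import statistics


def sbg_clv(alpha, beta, margin=10.0, discount_rate=0.01, horizon=120):
    """Discounted expected margin over `horizon` periods under the sBG model.

    Uses the recursion S(0) = 1, S(t) = S(t - 1) * (beta + t - 1) / (alpha + beta + t - 1),
    and assumes the margin is collected at the start of each active period.
    """
    total, survival = 0.0, 1.0
    for t in range(horizon):
        if t > 0:
            survival *= (beta + t - 1) / (alpha + beta + t - 1)
        total += margin * survival / (1 + discount_rate) ** t
    return total


def synthetic_draws(rng, alpha, beta, spread, n=500):
    # Hypothetical stand-in for posterior samples of (alpha, beta)
    return [(alpha * math.exp(rng.gauss(0, spread)), beta * math.exp(rng.gauss(0, spread)))
            for _ in range(n)]


rng = random.Random(7)
posterior = {
    "A": synthetic_draws(rng, 1.0, 3.0, spread=0.05),  # lots of data: tight draws
    "B": synthetic_draws(rng, 1.0, 3.0, spread=0.40),  # little data: wide draws
}

intervals = {}
for market, draws in posterior.items():
    clvs = [sbg_clv(a, b) for a, b in draws]
    cuts = statistics.quantiles(clvs, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    intervals[market] = (cuts[0], statistics.median(clvs), cuts[-1])
    low, mid, high = intervals[market]
    print(market, round(low, 1), round(mid, 1), round(high, 1))
```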

There are several ways to validate these models properly, e.g. splitting the data for a given set of cohorts into a calibration period and a forecast period, then tracking cumulative renewals.
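The calibration/forecast idea can be sketched in a few lines of Python. Here `censor_at` (a hypothetical helper, not part of centricity) hides everything after a cutoff tenure, so a model fitted on the calibration rows can be checked against what actually happened. `expected_active` (also hypothetical) then forecasts how many of the subscribers still active at the cutoff should remain active at a later tenure, using P(T > t | T > c) = S(t) / S(c).

```python
def censor_at(rows, cutoff):
    """Artificially censor (tenure, cancelled) rows at `cutoff` periods.

    Churn at or before the cutoff is kept as observed; anyone whose recorded
    tenure goes beyond the cutoff becomes an active (censored) subscriber at the cutoff.
    """
    calibration = []
    for tenure, cancelled in rows:
        if tenure > cutoff:
            calibration.append((cutoff, 0))
        else:
            calibration.append((tenure, cancelled))
    return calibration


def expected_active(n_active_at_cutoff, survival, cutoff, t):
    """Forecast survivors at tenure t among those active at the cutoff: n * S(t) / S(cutoff)."""
    return n_active_at_cutoff * survival(t) / survival(cutoff)


rows = [(3, 1), (10, 0), (12, 1), (5, 0), (6, 1)]
calibration = censor_at(rows, cutoff=6)
print(calibration)  # [(3, 1), (6, 0), (6, 0), (5, 0), (6, 1)]

# Usage with a hypothetical fitted constant-churn curve S(t) = 0.9^t
print(round(expected_active(100, lambda t: 0.9 ** t, cutoff=6, t=8), 1))  # 81.0
```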

This plot simply compares the actual and predicted number of ‘full’ tenures for a set of artificially censored observations.

In Market A, where we have a lot of data points, the model fits the data well. In Market B, where we have few data points, the Bayesian model avoids overfitting to noise, and we still get a sensible curve.

Concluding thoughts

Customer lifetime value is a hugely valuable metric. We can use it to track performance, to predict residual customer value, to calibrate acquisition and retention costs and to optimise marketing spend. It is core to the notion of customer centricity.

A Bayesian survival analysis approach balances the trade-offs between effort and accuracy, and between intuition / explainability and sophistication.

We have successfully deployed these models across a wide range of businesses, often within analytical tools that enable our clients to draw out insights, simulate decisions and optimise.

We have worked with a wide range of businesses on customer retention and lifetime value problems. Drop us a note to find out more about how we deliver value for our clients.


Duncan Stoddard


DS Analytics & Machine Learning LTD