Using data and R to optimise price

Let’s imagine you’re a coffee shop chain and you want to know how to optimally price your flat whites to maximise profit. You’ve got some historic data on price and sales, and maybe some external data on weather or macroeconomic variables. How do you proceed?

In this post we give a simple example of how to estimate price elasticity and how to optimise pricing using R.


Write down a model

Models capture how variables (price and sales) are related to one another. Writing down the model structure is an essential first step to understanding how to analyse the data.

Here we have price and sales and maybe some other variables. Here is a simple model:

$$Sales = \alpha + \beta_1Price + \beta_2Weekday + \epsilon$$

In this model we have two predictor variables: price and a binary indicator variable for whether it's a weekday or a weekend.
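To make the model concrete, here is a sketch that simulates data from it. All parameter values and the data frame name df are invented for illustration; they are not from the original post:

```r
set.seed(42)

n <- 200
# Simulated inputs: prices around £3 and a weekday indicator
p     <- round(runif(n, 2.5, 3.5), 2)   # price
wkday <- rbinom(n, 1, 5 / 7)            # 1 = weekday, 0 = weekend

# Illustrative "true" parameters (made up)
alpha <- 500; beta1 <- -120; beta2 <- 40

# Sales generated from the linear model plus noise
y <- alpha + beta1 * p + beta2 * wkday + rnorm(n, sd = 20)

df <- data.frame(y = y, p = p, wkday = wkday)
head(df)
```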


Explore the data

Exploratory data analysis is an iterative process whereby you get to know the variables in your data, identify data issues (missing data, outliers) and discover patterns and relationships.

Key to exploratory data analysis is data visualisation. A great R package for pre-modelling variable visualisation is GGally. This plot is easily created using the ggpairs() function (check out our R package of the week blog post on GGally!).
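For instance, assuming the data sit in a data frame df with columns y (sales), p (price) and wkday, the whole pairs plot comes from a single call (the data here are simulated as a stand-in for the original):

```r
library(GGally)   # install.packages("GGally") if needed

# Toy data standing in for the coffee shop data (our assumption)
set.seed(1)
df <- data.frame(
  p     = runif(100, 2.5, 3.5),
  wkday = factor(rbinom(100, 1, 5 / 7), labels = c("weekend", "weekday"))
)
df$y <- 500 - 120 * df$p + 40 * (df$wkday == "weekday") + rnorm(100, sd = 20)

# One call produces distributions, correlations and scatter plots
pm <- ggpairs(df, columns = c("y", "p", "wkday"))
# print(pm) to display the figure
```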

[Figure: ggpairs plot of sales (y), price (p) and weekday (wkday)]

We can see from this that sales (y) is right-skewed (top left distribution) and price (p) looks approximately normal (middle distribution). There is a correlation of -0.6 between the two variables and the scatter plot shows this negative relationship (mid-left plot). We can see from the wkday variable (1 if it's a weekday, 0 if it's a weekend) that sales look higher on average on weekdays (top right boxplots).

Another crucial phase in exploratory data analysis is data transformations.

Variable transformations are used to:

  • Normalise non-normally distributed data

  • Linearise nonlinear relationships so we can get good coefficient estimates

  • Help ease model coefficient interpretation.


The most common variable transformation is to take the natural logarithm. This is especially common practice in demand modelling because it gives us an elasticity estimate that is constant no matter what the price level.


Here we're taking logs of both sales and price (a so-called log-log model):

$$ln(Sales) = \alpha + \beta_1ln(Price) + \beta_2Weekday + \epsilon$$

The coefficient on price in this model gives us the price elasticity.


Elasticity is the % change in sales for a 1% change in price. If our elasticity estimate is < -1 we say demand is 'elastic': a 1% increase in price leads to a more than 1% fall in sales. This means sales revenue (price * sales) increases when price falls. Conversely, if elasticity is between -1 and 0 we say demand is 'inelastic': a price fall leads to a fall in revenue.
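A quick numerical check of the revenue logic (the starting price and sales figures are invented):

```r
# Revenue impact of a 1% price cut under two elasticities
price   <- 3.00; sales <- 100   # made-up starting point
revenue <- price * sales        # 300

change_in_revenue <- function(elasticity, pct_price_change = -0.01) {
  new_price <- price * (1 + pct_price_change)
  new_sales <- sales * (1 + elasticity * pct_price_change)
  new_price * new_sales - revenue
}

change_in_revenue(-1.5)   # elastic: revenue rises when price falls
change_in_revenue(-0.5)   # inelastic: revenue falls when price falls
```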

Fit the data to the model

A single line of code fits a linear regression model in R. Here y is a vector of sales, p is a vector of corresponding prices and wkday is a binary variable indicating whether it's a weekday or not.

lm1 <- lm(log(y) ~ log(p) + wkday)

Giving the model output:

[Figure: summary() output of the fitted log-log model]

We can see the log price variable has a coefficient of -1.2. What does this mean? A 1% increase in the price level leads to a 1.2% fall in sales, so price is elastic: revenue will increase if we drop the price. Of course, as we drop the price and sales increase, costs also increase, so the profit-maximising price is not zero.
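One way to pull the elasticity estimate out of the fitted model is via coef(). Since we don't have the original data, this sketch simulates log-log data with a known elasticity of -1.2:

```r
# Simulate log-log data with a "true" elasticity of -1.2 (illustrative)
set.seed(123)
n     <- 500
p     <- runif(n, 2.5, 3.5)
wkday <- rbinom(n, 1, 5 / 7)
y     <- exp(8 - 1.2 * log(p) + 0.3 * wkday + rnorm(n, sd = 0.1))

lm1 <- lm(log(y) ~ log(p) + wkday)

# The coefficient on log(p) is the estimated price elasticity
elasticity <- coef(lm1)[["log(p)"]]
round(elasticity, 2)   # should recover roughly -1.2
```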


Plotting a Demand Curve

We can use our model to plot a demand curve: that is, predicted sales at different prices, controlling for other confounding factors (in this case, day of the week).
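As a sketch (again on simulated data, since the original isn't available), we can predict sales over a grid of prices while holding the weekday indicator fixed, then plot the curve. Note that predictions are made on the log scale and exponentiated back:

```r
# Fit a log-log model on simulated data (illustrative parameters)
set.seed(123)
n     <- 500
p     <- runif(n, 2.5, 3.5)
wkday <- rbinom(n, 1, 5 / 7)
y     <- exp(8 - 1.2 * log(p) + 0.3 * wkday + rnorm(n, sd = 0.1))

lm1 <- lm(log(y) ~ log(p) + wkday)

# Predict sales over a price grid, holding wkday fixed at weekday
grid       <- data.frame(p = seq(2.5, 3.5, by = 0.05), wkday = 1)
grid$sales <- exp(predict(lm1, newdata = grid))

plot(grid$p, grid$sales, type = "l",
     xlab = "Price", ylab = "Predicted sales",
     main = "Estimated demand curve (weekdays)")
```

Because the elasticity estimate is negative, the predicted curve slopes downward, as a demand curve should.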