Is data science becoming a market for lemons?

28 May

We’ve noticed a concerning trend. More and more in enquiries or at pitches we’re being asked questions like ‘can you help us build a neural network?’, or ‘we want to use machine learning, what sort of data do we need?’.

This is like saying to a builder ‘we’re looking for someone who can use a Phillips screwdriver. Are you familiar with this tool?’. It’s not an invalid question per se, but it’s a very strange way to start the conversation. Why do you want to build a neural network? What problems are you trying to solve? The cart seems to have bolted off in front of its horse.

Why is this happening? I think the answer could lie in the economics.

The market for lemons

A ‘lemon’ in 1950s American automotive parlance is a rubbish second hand car, which the buyer can’t determine is rubbish until after they’ve bought it. Think Matilda’s dad stuffing sawdust into engines.

50 years ago, Nobel Prize-winning economist George Akerlof wrote a seminal paper that analysed markets under conditions of asymmetric information. That is, where the buyer has access to less information about the quality of the product or service than the seller. He called it ‘the market for lemons’.

In markets with perfect information, the buyer and the seller know the true quality of the products being sold. Good quality second hand cars (‘peaches’) are sold for some price, X, and rubbish second hand cars are sold for some lower price, Y.

But what if the buyer can’t distinguish between a peach and a lemon? Then all cars will sell at a single going price, Z, somewhere between the two: lower than X, but higher than Y. Sellers with peaches won’t accept a price below what their cars are worth, so they’ll exit the market. Buyers will be left paying too high a price, Z, for the only cars left, the lemons. Eventually, the theory goes, they’ll realise all the cars left in the market are lemons and the price will come down to Y. But they’re still only able to get hold of rubbish cars.
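To make the arithmetic concrete, here is a minimal sketch of that unravelling in Python. The figures are invented for illustration and are not taken from Akerlof's paper. It also assumes that Z is the expected value of a randomly chosen car and that peach sellers have a hypothetical minimum acceptable price (`peach_reserve`) and leave as soon as the going price falls below it.

```python
def market_price(peach_value, lemon_value, peach_share):
    """Price paid by a buyer who cannot tell cars apart.

    Assumption: the buyer pays the expected value of a random car.
    """
    return peach_share * peach_value + (1 - peach_share) * lemon_value


def simulate(peach_value, lemon_value, peach_share, peach_reserve, rounds=3):
    """Return the going price in each round of trading.

    peach_reserve is a hypothetical minimum price that peach sellers
    will accept. If the going price drops below it, they all exit.
    """
    prices = []
    for _ in range(rounds):
        price = market_price(peach_value, lemon_value, peach_share)
        prices.append(price)
        if price < peach_reserve:
            peach_share = 0.0  # peach sellers leave; only lemons remain
    return prices


# Illustrative numbers only: X is the peach price, Y the lemon price.
X, Y = 10_000, 4_000
history = simulate(X, Y, peach_share=0.5, peach_reserve=9_000)
print(history)  # [7000.0, 4000.0, 4000.0]
```

In the first round the price is Z = 7,000, between Y and X. That is below what peach sellers will accept, so they exit, and from the second round onwards the price settles at Y = 4,000 with only lemons on sale.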

So, is data science becoming such a market? Are buyers struggling to distinguish between good and bad data scientists? Perhaps these questions about neural networks are misguided attempts at weeding out the bad ones. Opening the bonnet and poking the engine.

Certainly it seems likely there’s information asymmetry. The subject is complicated, spoken in the language of mathematics and beset by hype. But there’s also clearly an enormous amount of great work being done, delivering huge value to all sorts of different organisations in different sectors.

The problem with ‘data science’

I think part of the problem is the term ‘data science’ itself. It’s a vague, catchall buzzword that makes a very clumsy attempt at categorising what’s become a hugely multifaceted discipline. Those now labelling themselves ‘data scientists’ have tended either to come from a statistics background, or a computer science background, or perhaps have just got a new degree in data science.

Those data scientists from a statistics background tend to be good at using maths to understand cause and effect relationships. They can add huge value to organisations by telling them what to set their prices at, or how much to spend on marketing. They can predict when a water pipe will burst or where outbreaks of crime are likely to occur.

Those from a computer science background tend to be better at building software: models sitting in a production environment that recommend what film you might want to watch next, or whom you’re most likely to swipe right on.

Obviously there’s plenty of overlap and experienced data scientists have a solid mix of analytical, programming, business and communication skills. But it’s possible that buyers are often ending up with ‘lemons’ simply because they’re hiring the wrong kind of data scientist.

What can be done about it?

The fundamental challenges are education and communication. There’s a myriad of confusing, often highly misleading buzzwords. New technologies with exotic sounding names come blurring in and often straight out of view every day. It is our job as data scientists to get better at communicating what we do, why we do it and why it’s valuable. And to teach new practitioners to resist the temptation to throw fancy sounding algorithms at every problem without first thoroughly understanding stakeholder needs.

At DS Analytics we help organisations get value from data. We build statistical and machine learning models, provide expert training and mentoring in data science, R and Python, and develop tools and software.

Get in touch to find out more!

Duncan Stoddard

DS Analytics & Machine Learning LTD