April 21, 2021

You might be wasting your time with clustering for segmentation

By Nick Radcliffe

Does your team make use of clustering & segmentation in your business? Many analytics teams will answer in the affirmative. Several data leaders will be proud of the customer segmentation they have delivered. Perhaps it has been enthusiastically adopted as a new language by your marketing team.

But you might be barking up the wrong tree. It often helps our thinking to listen to contrary voices. In this blog post, Prof. Nick Radcliffe presents the statistical case against customer clustering. It is well worth hearing, not least because of Nick’s robust statistical thinking. His previous blog posts helped us retain a sense of scale in our thinking. As a reminder, Nick is a visiting professor of Maths at the University of Edinburgh, as well as CEO of Stochastic Solutions & Data Lead for the Global Open Finance Centre of Excellence.

Whether you agree with Nick or not, I recommend that analytics leaders & data scientists read his series of posts. Below I have shared his outline post on this topic. At the end of his summary, I include links to his blog, so that you can read the 3 full-length posts that he has published so far. Once you have, I’d love to hear your response. There is a debate to be had here…

The universe is full of clustering

It’s pretty obvious the distribution of matter in space is ‘lumpy’. Matter clumps into planets, planets orbit stars, stars clump into galaxies, galaxies group together into clusters, and, wouldn’t you know it, clusters form superclusters. Thank gravity for that. There are clumps of matter and other things at smaller scales too.

Atoms are mostly empty space, with a lot of stuff at the centre (the nucleus), and electrons like to hang around at particular distances away from the nucleus (the shells), though it’s hard to pin them down. Similarly, people clump together on the Earth. London, Tokyo & São Paulo are pretty crowded. The Sahara, the Highlands of Scotland & central Australia mostly aren’t. People are quite hard to pin down, too.

Cluster analysis is a set of techniques for taking the coordinates of a lot of objects (stars, particles, people…) and figuring out something about where the lumps are. There are lots of ways to do it.
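To make that concrete, here is a minimal sketch of k-means (Lloyd’s algorithm), just one of the many clustering methods, applied to a handful of made-up 2-D coordinates. The points, the function name `kmeans` and the choice to initialise centroids from the first k points are illustrative assumptions, not a production implementation.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on a list of (x, y) tuples.

    Centroids are initialised to the first k points for reproducibility;
    real implementations use smarter (often randomised) initialisation.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids

# Two hypothetical "towns", one near (0, 0) and one near (10, 10).
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

Here the two lumps come out cleanly because both dimensions are genuine coordinates, measured in the same units.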

But is customer clustering such a great idea?

Someone, deep in the mists of time, had the bright idea of applying cluster analysis to customers to figure out “where the clumps are”. The idea wasn’t to use geographical coordinates (of their houses, say) but to replace coordinates with customer characteristics. For instance, demographics (age, income etc), behavioural measurements (spend levels, frequencies, balances etc) or maybe attitudinal things like psychographics. That way, they thought, they might uncover the “natural groupings” of customers. Those could be useful for understanding their dynamics and for segmenting them.

Although it was far from a stupid idea, it turns out to have been an extremely bad one. At best, it has wasted countless thousands of hours of analyst time; at worst, it has led to baseless conclusions and highly suboptimal marketing.

There are far too many problems to do justice to in a single blog post, so I won’t. Instead, I’ll list some headlines here, and via links at the bottom of this post I’ll go into more detail.

Why customer clustering is a bad idea

Here are some of the headline reasons that clustering customer characteristics isn’t useful.

  • There’s no real evidence that customers cluster.
  • Different customer characteristics are non-commensurate.
  • Circularity: practitioners think they’re just finding “the natural clusters”, but in fact the results are entirely dictated by decisions made up-front (often without realizing it) about scaling. Different scaling choices lead to different clusters, so the clusters are unstable.
  • The curse of dimensionality means that clustering doesn’t really work in more than a few dimensions.
  • Clustering is undirected.
  • Clusters are hard to interpret. So people give them names. And then the names become the meaning.
  • For (almost) every problem tackled with undirected clustering, there’s a directed approach that will (almost always) work better.
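The non-commensurability and scaling points can be seen with just three customers. In the sketch below (hypothetical customers, plain Euclidean distance as used by many clustering methods; `nearest_to` is an illustrative helper, not a library function), the answer to “which customer is most like A?” flips depending only on whether income is recorded in pounds or in thousands of pounds. Distance-based clustering is built out of exactly these similarity judgements, so it inherits the same arbitrariness.

```python
import math

# Hypothetical customers: (age in years, annual income in £).
customers = {
    "A": (25, 30_000),
    "B": (60, 31_000),
    "C": (26, 45_000),
}

def nearest_to(target, data, scale):
    """Return the customer closest to `target` (Euclidean distance) after
    multiplying each feature by the corresponding factor in `scale`."""
    def scaled(v):
        return tuple(x * s for x, s in zip(v, scale))
    t = scaled(data[target])
    others = (name for name in data if name != target)
    return min(others, key=lambda name: math.dist(t, scaled(data[name])))

# Income in pounds: the £1k vs £15k income gaps swamp the age gaps.
print(nearest_to("A", customers, scale=(1, 1)))         # B
# Income in £k: now A-B differ by 35 "units" of age, A-C by only ~15.
print(nearest_to("A", customers, scale=(1, 1 / 1000)))  # C
```

Standardising every feature to unit variance, a common default, is simply a third choice of scale: it does not remove the decision, it only hides it.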

For the avoidance of doubt (as lawyers say), and notwithstanding the impression the title may give, the problem isn’t with cluster analysis per se. That is a perfectly fine collection of statistical techniques. If you want to find the clumps in a low-dimensional space with commensurate dimensions, it’s exactly what you need. It’s just that that isn’t a very good description of a customer base.
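One facet of the curse of dimensionality is that distances “concentrate”: as the number of dimensions grows, the nearest and farthest points from any given point end up almost equally far away, leaving little structure for a distance-based method to find. The sketch below illustrates this with uniformly random points, an assumption chosen for simplicity rather than a model of any real customer base; `relative_contrast` is a hypothetical helper name.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance from one random query
    point to n_points random points, all uniform in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

With these settings the contrast should fall from a large value in 2 dimensions to a small fraction in 1,000 dimensions; the exact numbers depend on the seed.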

Here are the detailed posts that I have posted so far:

Opening the customer clustering debate

Thanks to Nick for making a robust case against clustering customers. As a data leader who is proud of some of the customer clustering & segmentation work delivered by my team, I’m tempted to offer a rebuttal. But it would be better to hear from today’s practitioners. Analysts or leaders, does anyone want to make the case for clustering your customers (despite Nick’s health warning)?

Please get in contact with me directly to offer a blog post in response or add your comments using the boxes below. I look forward to the intelligent debate. Thanks again, Nick.

P.S. This blog post was originally published on The Scientific Marketer.