Art of Statistics
February 16, 2021

Mastering the Art of Statistics with this brilliant book

By Paul Laughlin

Continuing our focus on statistics this month, here is a review recommending “The Art of Statistics” by Prof David Spiegelhalter.

You may well have already heard of the Faraday Prize-winning author through his regular TV appearances. He is also a frequent guest on the BBC Radio 4 programme “More or Less”. It is no surprise that he is in demand from the media, given his ability to explain even complex or nuanced statistics clearly.

His latest book, “The Art of Statistics: Learning from Data”, is a valuable contribution to public understanding. It strikes the difficult balance of being both a useful introduction for the novice & a useful reminder/challenge for those with a statistical background. For that and a few other reasons, which I’ll highlight at the end of this review, I recommend it to data & analytics leaders.

First, let me give you an overview of what to expect from this book.

Applications of statistics to real-world problems

An engaging theme throughout this book is David sharing insights into the work he has done to help with public inquiries or criminal cases. This really helps to bring to life the wealth of issues & questions that require statistics. It lifts the material above a theoretical textbook into one that inspires us to spot where statistics may be needed to understand what is really happening.

Cases cited (often with graphs of data) include:

  • Inquiry into Harold Shipman murders
  • Inquiry into child deaths at Bristol hospital
  • News report on the risk of eating bacon sandwiches
  • Report on number of lifetime opposite-sex partners
  • Report on relationship between parent & child heights
  • Inquiry into accuracy of breast cancer screening
  • UK election polling
  • Sex ratio of UK births
  • Inquiry into doping in sport

A tip here for analytics leaders is to find ways to share analysis of topics that are currently newsworthy in your business or sector. With a subject like statistics, which many find intimidating, it pays to hook people with relevance first.

A thorough grounding in the key statistical concepts

But beyond being relevant, Prof Spiegelhalter’s book is a usefully thorough introduction to Statistics for the layperson or new analyst. Step by step, with the help of examples, he explains the key concepts that underpin most mainstream statistical practice.

Aside from the more challenging chapters (9 & 10) on topics including Confidence Intervals, the Central Limit Theorem, Hypothesis Testing & P-Values, this text is very accessible if read in order. David’s experience as an educator shows through as he builds principle on principle to help us understand concepts. He even introduces a fresh perspective on inference, with the best introduction to Bayesian statistics that I’ve read.

The statistical concepts that you can learn from this book include:

  1. Proportions, Relative Risk, Expected Frequency & need for Data Viz
  2. Distributions, Summary Statistics, Data Transformations & Factors
  3. Inductive Inference, Biases, Sampling, Normal distribution
  4. Correlation vs Causation, RCTs, adjusting for background factors
  5. Regression Models, Coefficients, regression to the mean
  6. Algorithms, over-fitting, black boxes & AI models
  7. Uncertainty intervals, Bootstrapping, sample statistics
  8. Probability Theory, chance, predictability & uncertainty
  9. Confidence intervals, Central Limit Theorem, Margins of Error
  10. Null hypotheses, P-value, hypothesis testing, Neyman-Pearson theory
  11. Bayesian methods, prior & posterior odds, likelihood ratios, Bayes factors
  12. Poor statistical practice (questionable interpretation & deception)
  13. How producers, communicators & audiences can do better

A call to all data & analytics leaders to think more methodically

Prof Spiegelhalter concludes his book with another theme that pervades each chapter: a call for all of us to think more critically about both the statistics we read & those we produce. I advise anyone reading this book to make a copy of the 10 rules for better practice in the concluding chapter. They make a great summary & day-to-day aide-memoire for leaders.

This echoes David’s early recommendation of the PPDAC cycle. I’ve previously lamented the lack of appropriate methodology in too much data science practice. The same charge could be levelled at most analytics teams. What is needed is not just an agile working methodology for project management, but more methodical analytical thinking.

I agree with his recommendation to use the PPDAC cycle, which is:

  • P = Problem (get really clear on the question you need to answer)
  • P = Plan (what measure? how to collect? analysis design?)
  • D = Data (collection, permission, cleaning, prep/transform)
  • A = Analysis (EDA, Data Viz, hypothesis generation & testing)
  • C = Conclusion (interpretation, next steps, communication/Data Viz)

Don’t miss out on this great statistics companion

As I mentioned earlier, I think the genius of this book is how it works both as an introductory text & as a challenge to practitioners. That’s one reason why I heartily recommend that data & analytics leaders buy a copy of this handy-sized paperback. Then, when we go back to public meetings, it is a useful guide to keep with you, both as an aide-memoire & to lend to peers in your business to help improve their statistical literacy.

It’s always a pleasure to see experts appreciate one another’s work, & I was pleased to see praise for the work of both Tim Harford (author of the next book I’m reading) and Alberto Cairo. I’ve reviewed Alberto’s “How Charts Lie” and it is an ideal companion to carry with you. Then you can help others improve their graphical literacy too and, armed with the twin weapons of statistics & data viz, battle fake news wherever you go.

Enjoy your statistical thinking journey!