Thinking more deeply about Analytics, OR & AI with The OR Society
I had the pleasure of attending & speaking at The Operational Research (OR) Society's ninth annual summit (Analytics & AI Summit 2021 or #AS21).
Due to the global pandemic & UK lockdown, this was held online but was nonetheless well worth attending & engaging with. A great line-up of technical expertise gave delegates time to think more deeply about their work.
Having previously spoken at the London Business School with The OR Society, I was glad to also join them for this national event. Many may assume that such a society is only concerned with the more limited remit of traditional Operational Research. My experience was quite the reverse: there was content here of relevance to most Analysts & Data Scientists.
How can you measure & improve fairness in your models?
In a time when society is thinking more critically about many forms of potential bias (racial, gender, etc.) and has had reason to distrust algorithms, the first talk was very timely. Mark Somers, Managing Director of credit risk consultancy 4Most Europe, explored the concept of fairness.
First, Mark explored the philosophical question of what we mean by fairness. It is more multidimensional & complex than it first appears. He also usefully explained why people are often not happy with an algorithm-driven rather than human-made decision. He highlighted 3 reasons:
- Only humans can be sanctioned in a way that feels like a disincentive
- Only humans understand wider context & social taboos
- Only humans can be expected to have empathy in decision making
To address this gap, Mark suggested a number of more tangible proxies for fairness that could be measured & thus considered by a model. These are a useful checklist to consider when reviewing your model inputs:
- Causal (a helpful reference to Judea Pearl’s guidance on causality)
Mark also presented a Decision Tree he has produced to help guide modellers through points to consider when reviewing model parameters for fairness. The example model he reviewed made the case well for simplifying models (even with a small loss of accuracy) to both improve fairness & improve how well they generalise for different scenarios.
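To make the idea of measurable fairness concrete, here is a minimal sketch (my own illustration, not Mark's method or 4Most's tooling) of one widely used fairness proxy, the disparate impact ratio, which compares approval rates across demographic groups:

```python
def selection_rates(decisions, groups):
    """Approval rate per demographic group (1 = approved, 0 = declined)."""
    rates = {}
    for g in set(groups):
        picks = [d for d, gr in zip(decisions, groups) if gr == g]
        rates[g] = sum(picks) / len(picks)
    return rates

def disparate_impact_ratio(decisions, groups):
    """Ratio of lowest to highest group approval rate (1.0 = parity).
    The 'four-fifths rule' of thumb flags ratios below 0.8 for review."""
    rates = selection_rates(decisions, groups)
    return min(rates.values()) / max(rates.values())

# Toy example: group A approved 3 of 4, group B approved 1 of 4
decisions = [1, 1, 0, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(disparate_impact_ratio(decisions, groups))  # 0.25 / 0.75 -> 0.333...
```

A single ratio like this is, of course, only one of the proxies a modeller should check; Mark's point was precisely that fairness is multidimensional.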
To read more of Mark’s thinking on this important ethical topic, consider downloading his paper available here:
Summarising the People Skills that analysts need to have impact
Following Mark, I shared a brief summary of the Laughlin Consultancy 9-Step Model of the Softer Skills that analysts need. I both shared my personal experience of why these skills matter so much & how other guests on the Customer Insight Leader podcast have echoed that thinking.
This brief talk summarised the meaning of each step in my model and provided a tip for the 6 steps within the Contracting & Delivering stages. You can read more about the relevance of this model to all analysts in this earlier blog post.
I have also uploaded a copy of the slides I presented to my SlideShare channel:
The Data Science of Hollywood Movies (using Emotional Arcs)
A fascinating presentation followed my talk, one that took us into the glamorous world of Hollywood blockbuster movies. Ganna Pogrebna, Lead for Behavioural Science at the Alan Turing Institute, shared work they have done to help movie makers create more successful movies.
Past research by scriptwriters had proposed that there were just 6 different emotional arcs used in the narratives of films. Ganna and her team used Text Analytics (sentiment analysis) to identify the emotional arcs of over 6k movies. Interestingly, they initially sought to use all 157k movies listed on IMDB, but data preparation & clustering meant the useful data for analysis reduced to just 6k (a helpful reminder of the vital need for that step).
Anyway, her analysis (outlined in a very popular paper in the Journal of the Operational Research Society) confirmed that previous qualitative research. They identified 6 clusters with consistent patterns to their emotional arcs:
- Rags to Riches (emotional arc of film is a rise in positive emotions)
- Tragedy (fall)
- Man in a Hole (fall then rise)
- Icarus (rise then fall)
- Cinderella (rise then fall then rise)
- Oedipus (fall then rise then fall)
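The six arcs above can be thought of as template shapes that a film's sentiment-over-time curve is matched against. As a hedged sketch (my own toy illustration, not Ganna's actual clustering pipeline, and the template values are invented), one simple matching approach is correlation against hand-drawn arc shapes:

```python
from statistics import mean

# Illustrative template shapes for the six arcs, sampled at seven story beats
ARCS = {
    "Rags to Riches": [0, 1, 2, 3, 4, 5, 6],   # rise
    "Tragedy":        [6, 5, 4, 3, 2, 1, 0],   # fall
    "Man in a Hole":  [4, 2, 0, 0, 1, 2, 4],   # fall then rise
    "Icarus":         [0, 2, 4, 4, 2, 1, 0],   # rise then fall
    "Cinderella":     [0, 2, 4, 2, 0, 2, 4],   # rise, fall, rise
    "Oedipus":        [4, 2, 0, 2, 4, 2, 0],   # fall, rise, fall
}

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def classify_arc(sentiment_curve):
    """Label a film's sentiment curve with the best-correlated arc template."""
    return max(ARCS, key=lambda name: pearson(sentiment_curve, ARCS[name]))

# A script whose sentiment falls then recovers matches "Man in a Hole"
print(classify_arc([0.5, 0.1, -0.4, -0.3, 0.0, 0.3, 0.6]))
```

The real study clustered thousands of sentiment curves rather than matching fixed templates, but the intuition is the same: films group into a small number of characteristic emotional shapes.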
Modelling the compound effect of emotional arc & budget showed that the most profitable template is “Man in a Hole”. This is not the most popular with critics or fans, but it is the most talked about. It also produces the greatest return for your spend. Interesting multivariate analysis also enabled them to guide scriptwriters on elements needed for other genres to succeed.
Ganna also shared her analysis of the effectiveness of those brief film trailers. Her modelling was focussed on the complex metric of Emotional Valence (following Russell’s research on the importance of Arousal & Valence for emotional impact). Such brief trailers contain video, music & text, so the impact of all three was cleverly combined into a Cumulative Valence measure. Interestingly, her analysis showed that the trailers that resulted in the best first-night revenue were emotional rollercoasters.
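To illustrate the idea of combining channels into one cumulative measure (a rough sketch under my own assumptions; the channel weights and the "rollercoaster" score below are invented for illustration, not Ganna's actual model):

```python
def cumulative_valence(video, music, text, weights=(0.4, 0.3, 0.3)):
    """Blend per-scene valence scores from three channels into a single
    running (cumulative) valence series. Weights are illustrative only."""
    wv, wm, wt = weights
    combined = [wv * v + wm * m + wt * t for v, m, t in zip(video, music, text)]
    series, total = [], 0.0
    for x in combined:
        total += x
        series.append(total)
    return series

def direction_changes(series):
    """Count rise/fall flips in the series: a crude 'rollercoaster' score."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)

# A trailer whose blended valence alternates sharply scores high on flips
series = cumulative_valence([1, -1, 1, -1], [1, -1, 1, -1], [1, -1, 1, -1])
print(direction_changes(series))  # 2 direction changes in 4 scenes
```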
If you ever wanted to share with someone a sexy use case for advanced analytics, this is surely a great example. You can read more about Ganna’s work in this article she shared on LinkedIn:
Experian presentation on Open Banking
To bring us back down to earth, our next presenter explored the world of Open Banking & why data analytics is needed there too. Ganna’s movies presentation was a tough act to follow, but Marilena Karanika did a good job of bringing to life the importance of this change for consumers.
As I have shared before when reviewing Tony Boobier’s book, Open Banking gives people the right to share their (UK regulated bank) banking data with other regulated providers. This can help make switching easier but it should also enable consumers to have a better view of all their finances. Marilena brought to life how this can help enable fairer (that word again) lending decisions & make form filling less tedious.
A part of this new world that is too rarely talked about is the categorising of financial transactions (in order to better understand behaviour). She shared Experian’s model for doing this, which introduced (to me at least) the new nomenclature of Categorisation as a Service (CaaS). They manage to categorise banking transactions into 180+ categories with 95% accuracy & have an aggregation of these categories to provide consumer insights.
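For readers new to this problem, the simplest form of transaction categorisation is keyword matching on the free-text description. A minimal sketch (my own toy rules, not Experian's CaaS model, which is far richer than this):

```python
# Hypothetical keyword rules; a real service like CaaS covers 180+ categories
RULES = {
    "Groceries": ["tesco", "sainsbury", "aldi"],
    "Transport": ["tfl", "trainline", "uber"],
    "Utilities": ["british gas", "edf", "thames water"],
}

def categorise(description):
    """Assign a transaction to the first category with a matching keyword."""
    desc = description.lower()
    for category, keywords in RULES.items():
        if any(k in desc for k in keywords):
            return category
    return "Uncategorised"

print(categorise("TESCO STORES 2941 LONDON"))  # Groceries
```

Even this toy version hints at the challenges Marilena described: new providers and changed description formats break keyword rules, which is why production systems need continual maintenance & more sophisticated models.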
She shared a number of the challenges of such categorisation work, including a limited data picture (due to consumers not opting in all their accounts or not all providers yet being online). Another challenge is new providers & providers changing the descriptions they use in transactions. Much of what she shared reminded me of the challenges we faced when doing this work decades ago in Lloyds Bank, plus the benefits I saw for insurers. You can find out more about this service here:
Jack from Transport for the North
Our next presenter was the very engaging Jack Snape, who is Principal Data Analytics and Modelling Officer at Transport for the North. Jack gave us a real insight into the complexity of what his team need to model & forecast. He also provocatively raised the question of whether analytics can really help.
It was fascinating to see an overview of the Forecasting System they have developed. It includes both a tier focussed on modelling economic activity & land use and a tier modelling transport solutions & usage. These are joined together by a Python-based integration (or ‘translator’) tool to help model implications of changes.
Jack also helpfully explained the complexity of what they need to model compared to other common modelling challenges. He summarised these as:
- Small world systems (e.g. use cases for Probability Theory)
- Large complex world, but Stationary, systems (e.g. weather forecasting, requiring many layers of mathematical modelling, but laws of meteorology are consistent)
- Large complex world & Non-Stationary system (e.g. transport for a changing world when you don’t know how it will change in future)
Predictive Analytics does however help Jack’s work, through the building of simplified meta-models. The first step is to identify four (validated) viable future scenarios for need (Just About Managing; Prioritised Places; Digitally Disrupted; Urban Zero Carbon). Then they build synthetic large data sets to represent such a future world. Then they simplify the dimensions into a meta-model. They test that model against the synthetic dataset to ensure it fits & generalises well. Finally, they can use that model as a way to test out hundreds of transport options & see a forecasted impact. A clever approach, worth chatting to Jack about if relevant to you.
How are you ensuring your AI is validated & reliable?
Finally, we heard from the brilliant Prof David Hand from Imperial College London (and former president of the Royal Statistical Society). He shared advice on the need to think about AI or model validation & reliability.
His talk was structured around the 3 fundamental questions we need to ask ourselves when considering the suitability of a specific algorithm or modelling approach:
- What are we trying to do?
- What data do we have?
- How do we choose, validate, monitor & update our model?
David shared many examples that are relevant for modellers & data scientists. Highlights for me were recognising that different questions drive the suitability of different designs, error measures & sample sizes, plus how to deal with the challenges of both ‘the data exhaust’ & dark data.
He concluded by advising the kind of effective model monitoring that I recall happening at Lloyds Bank and which can be so important. David also warned us to be cautious when using “off the shelf” algorithms; to be alert to the risks of default settings & biases within what may be a black box.
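One standard tool for the kind of ongoing model monitoring mentioned above (a sketch of a well-known credit-risk technique, not something David presented) is the Population Stability Index, which flags when the distribution of scores or inputs has drifted from the baseline the model was built on:

```python
from math import log

def psi(baseline, recent, bins=10):
    """Population Stability Index between a baseline sample and a recent one.
    Rule of thumb: < 0.1 stable, 0.1-0.25 some drift, > 0.25 act."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0

    def bin_share(sample, i):
        left = lo + i * width if i > 0 else float("-inf")
        right = lo + (i + 1) * width if i < bins - 1 else float("inf")
        count = sum(1 for x in sample if left <= x < right)
        return max(count / len(sample), 1e-6)  # floor avoids log(0)

    return sum(
        (bin_share(recent, i) - bin_share(baseline, i))
        * log(bin_share(recent, i) / bin_share(baseline, i))
        for i in range(bins)
    )

baseline = [float(i) for i in range(100)]
print(psi(baseline, baseline))                     # identical samples: 0.0
print(psi(baseline, [x + 50 for x in baseline]))   # shifted sample: large drift
```

Running a check like this on each scoring cycle is exactly the sort of routine discipline that tells you when a model needs the updating David warned about.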
Some final questions for modellers to consider both in initial design & ongoing monitoring were also very timely:
- How will your model respond to unexpected changes in world/data?
- How will AI & humans work together? (allow for human error/context)
- How will AI systems work together? (IT ecosystem context)
- How will AI systems be updated? (when monitoring shows needed)
So many reasons to read this brilliant academic & prolific author:
Well worth attending and a society whose events you should follow
I hope that debrief was useful for you & gave you an insight into what an engaging online event this was. One to help your CPD as an analyst, statistician or data scientist. Something worth focussing on during confinement or working from home.
Having presented at two previous OR Society events, I recommend checking them out, especially the Analytics Network within that community. Who knows, perhaps you will even be persuaded to become a member. You can find out more about all their events here: