More thinking to save money – reduce, reuse and recycle your data
Continuing our thinking about ways for data leaders to save money during a recession, this post drills into saving on your data usage.
Since my last post, reminiscing about the lessons I learned during past recessions, the early environmentalist slogan "reduce, reuse, recycle" has stayed in my mind. Beyond the workload & team thinking in that post, I have begun to muse about how those approaches could apply to data.
Such reflection also caused me to recognise that some of the abundance of this century has perhaps become a pitfall. What I mean is the growth in available computing power & the increased affordability of massive data storage. Now, please don't think I'm a Luddite; I don't regret either of these technological breakthroughs. But since scarcity is often the mother of invention, the counterpoint must surely be that an abundance of supply can lead to lazy thinking or stasis.
How could we think differently about our use of data?
To aid us in avoiding such a pitfall, & potentially to save money in the process, I recommend noticing a number of converging forces. Organisations & their leaders these days must juggle several expectations & constraints. Let me call out just three.
Firstly, environmental concerns with regard to energy consumption (including data centres & thus the scale of data storage). Secondly, ethical & data protection concerns & regulations that curb the inappropriate use of data, especially personal data. Thirdly, economic realities, including reduced budgets & the organisational search for ways to cut costs without reducing quality or output.
Rather than becoming depressed by the weight of expectations on today's data leaders, I recommend seeing these as a classic Venn diagram. In other words, I believe they are overlapping sets of interests, suggesting that data leaders should aim for the proverbial 'sweet spot' of addressing all 3 at once.
Inspiring an approach with Reduce, Reuse, Recycle
So, let me lay out my initial thoughts on this. They are far from complete or wise, but I hope they point the way for better brains to think their way to a better solution. I will apply the phrase above as a framework to share what I believe could be 3 ways to address the 3 concerns identified above.
(1) Reduce your use of data
We are so far down the yellow brick road on the way to Big Data City that such a title sounds like heresy, doesn't it? I feel I am in good company, though: even Dr. David Spiegelhalter, in his book "The Art of Statistics", shares ways that statistical thinking was more robust when analysts had to sample data. I simply suggest that we have slipped into mindless storage of everything, just in case it helps.
I believe this mindset falls foul of all 3 of the concerns listed above. Who knows how much energy is being wasted (& carbon released) by server farms storing data that will never be used? Plus, I'm certainly aware that too many organisations have 'survived' GDPR rather than radically changing the way they think about what data is really needed to meet their customers' needs. This must be an opportunity to save money, if only on the growing cost of cloud provider subscriptions.
Although this will not be welcomed by many, I recommend going back to basics in the same way that I have seen work well for BI. Most organisations these days have too many dashboards & reports, most of which are not being used to drive decisions & actions. So, BI leaders have learnt that periodic pruning is needed: stop the automatic issuing of dashboards & see who complains. The same thinking could be applied to deleting data that is not being used. Challenge analysts to either finally get around to their planned analysis/model building or lose the data being held 'just in case'. Then calculate the cost saving.
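To make that last step concrete, here is a minimal sketch in Python. The dataset catalogue, grace period & storage price are all hypothetical, so treat the numbers as illustrative only: the point is simply to flag data untouched for six months & total the potential saving.

```python
from datetime import date, timedelta

# Hypothetical catalogue of stored datasets: name, size in GB, last access date.
CATALOGUE = [
    ("orders_2015_archive", 850,  date(2022, 1, 10)),
    ("clickstream_raw",     4200, date(2023, 9, 1)),
    ("campaign_responses",  120,  date(2023, 11, 20)),
]

STALE_AFTER = timedelta(days=180)   # 'just in case' grace period
COST_PER_GB_MONTH = 0.023           # assumed cloud storage price, $/GB/month

def stale_datasets(catalogue, today):
    """Return (name, size) for datasets not accessed within the grace period."""
    return [(name, gb) for name, gb, last in catalogue
            if today - last > STALE_AFTER]

today = date(2023, 12, 1)
stale = stale_datasets(CATALOGUE, today)
saving = sum(gb for _, gb in stale) * COST_PER_GB_MONTH
print(f"{len(stale)} stale dataset(s); potential saving ${saving:.2f}/month")
```

In practice, of course, the catalogue would come from your cloud provider's access logs & billing data, & the grace period would be agreed with the analysts concerned before anything is deleted.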
(2) Reuse your data & datasets to meet other needs
One of the useful insights in Bill Schmarzo's book "The Economics of Data, Analytics and Digital Transformation" was to watch out for opportunities for reuse. What Bill means by this tip is spotting opportunities to reuse data or analytics. It is akin to the call I made in my last post for secondary-research-style thinking in the world of analytics.
I was recently interviewing a data leader for my podcast, & he mentioned the need to think outside silos. He identified risks of myopic thinking in terms of role, function, business, sector & geography. Analysts & Data Scientists need to be mindful of creating value by maximising reuse of data & existing analytics: identifying transferable processes, customer understanding, models, transformations & insights that could help with other business challenges.
A good discipline here can be to develop a common modelling dataset, drawing from the wider pool of data still held in a data lake or similar. Develop a sort of corporate memory by starting with a narrow, high-quality dataset & only adding variables as they prove to add value, while also including those which prove important when explaining context or identifying implications. As well as it being less costly to optimise performance for this smaller dataset, it helps prompt analysts as to what to consider using.
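As a toy illustration of variables having to 'earn their place' in that common dataset, here is a sketch in plain Python. The scoring table is entirely made up & stands in for whatever cross-validated metric your team actually uses:

```python
# Hypothetical validation scores a model achieves with a given set of
# variables. In practice this would be a cross-validated metric.
SCORES = {
    frozenset():                                      0.60,
    frozenset({"tenure"}):                            0.71,
    frozenset({"tenure", "spend"}):                   0.78,
    frozenset({"tenure", "spend", "postcode_area"}):  0.781,
}

def validation_score(variables):
    return SCORES.get(frozenset(variables), 0.0)

def grow_dataset(candidates, min_gain=0.01):
    """Greedily add variables, keeping only those that earn their place."""
    kept, best = [], validation_score([])
    for var in candidates:
        trial = validation_score(kept + [var])
        if trial - best >= min_gain:   # variable must add real value
            kept.append(var)
            best = trial
    return kept, best

kept, score = grow_dataset(["tenure", "spend", "postcode_area"])
print(kept, score)   # postcode_area adds only 0.001, so it is left out
```

The same gatekeeping step could, as argued above, be relaxed for variables that earn their place by explaining context rather than lifting the headline metric.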
(3) Recycle your data by applying the scientific method
A number of my guests on the Customer Insight Leader podcast have come from a science background, not just science degrees or PhDs but in some cases successful careers in academia. When you talk with such data leaders, you begin to spot some of the rigour that they can bring to data science functions, very much in line with the call for a workflow that produces repeatable results in Enda Ridge's helpfully practical book "Guerrilla Analytics".
Within academia, at least for those focussed on published peer-reviewed research, there can be a greater focus on the Scientific Method. That is, a methodology with a robust feedback loop: not just deploying a model & then moving on to the next project, but ensuring accurate data capture of the impact/outcome of such an intervention & its continued effectiveness.
These concerns are most often raised with the goal of improved statistical robustness. But there is also a cost-saving angle. Implementing effective feedback loops, coupled with the use of ceremonies like retrospectives, will improve future assumptions, analytics & models, often reducing the time taken to investigate from scratch. Continual monitoring of the performance of all deployed models/data products can also automatically identify when they need rebuilding. This too offers cost savings, both in terms of avoiding an unnoticed reduction in benefits & in avoiding unnecessary rebuilds where models are still working.
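A minimal sketch of that monitoring loop, again with hypothetical model names, metric values & tolerance: each deployed model's recent performance is compared against its baseline, & a rebuild is flagged only when degradation crosses the agreed threshold, avoiding both unnoticed decay & needless rebuilds.

```python
# Hypothetical deployed models: the metric recorded at deployment
# (baseline) vs. the metric measured on recent data via the feedback loop.
DEPLOYED = {
    "churn_model":      {"baseline": 0.82, "recent": 0.80},
    "propensity_model": {"baseline": 0.75, "recent": 0.61},
}

MAX_DEGRADATION = 0.05   # assumed tolerance before a rebuild is triggered

def needs_rebuild(baseline, recent, tolerance=MAX_DEGRADATION):
    """Flag a rebuild only when performance has decayed past tolerance."""
    return (baseline - recent) > tolerance

for name, m in DEPLOYED.items():
    status = "REBUILD" if needs_rebuild(m["baseline"], m["recent"]) else "ok"
    print(f"{name}: {status}")
```

The tolerance is a judgment call for each model, ideally set when the model is first deployed so that the decision to rebuild is automatic rather than debated from scratch each time.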
Ok, so that won’t be right, but what could work?
I am painfully aware that I share the above as someone who is a lot less hands-on these days. I talk regularly with my clients & friends, but that is not the same as still leading a data or analytics function. So, I'm not going to pretend that my thinking above can be implemented as outlined. I recognise that teams, processes, organisations & indeed the world are not that simple. But I hope I have explained my thought process well enough to inspire yours.
You are close to the action. You have your own data teams or are a specialist working within one. What would you recommend? What are the pragmatic opportunities to save money here (whilst also being more sustainable & ethical in our data usage)? I look forward to hearing about your brilliant ideas. It would also be great to hear how much money you save your organisation by implementing them.