Questionable practices
August 23, 2017

Questionable practices by Data Scientists – where do you stand?

By Paul Laughlin

Just because you can, doesn’t mean you should; this is perhaps the simplest way to introduce questionable practices.

Last year we posted on the desire amongst many data scientists to achieve social good through their work. As with all disciplines, there is also a potential “dark side” to the capabilities of data scientists. Some firms have sought, or still seek, to use data as a weapon or to persist with questionable analytical activities.

In this post we will explore three examples that should prompt your own reflections on data ethics and its implications for you.

I’ll share examples of data science being used for employee surveillance, winning elections and proliferating fake news. More encouragingly, I will close with how two organisations are working to advance a code of ethics for data scientists, as a positive response to this challenge.

But first, let’s explore the darker side of Data Science use.

Questionable practices 1: Live monitoring of employee behaviour

Our first example to consider is this interesting article by Leo King for Raconteur magazine. He usefully explains the improved productivity goal that has driven the use of IoT devices to monitor employees. From cameras, room sensors & wearables, a range of devices are being used to provide real-time data on what employees are doing.

Although there is potential upside in terms of more efficient use of office space & potentially even prediction of employee stress or health risks – there are also data consent & privacy risks. Using Data Science to analyse such a stream of data and understand workforce behaviour may work analytically, but without consent it will breach GDPR regulations.
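One partial mitigation when analysing such streams is to pseudonymise identities before any modelling, so workforce-level patterns can be studied without raw employee IDs sitting in the analytics layer. Here is a minimal sketch of that idea; the secret key, record layout and helper names are my own illustrative assumptions, and pseudonymisation alone is not a substitute for consent or a lawful basis under GDPR:

```python
import hashlib
import hmac

# Illustrative secret key -- in practice this would live in a key vault
# and be rotated; anyone holding it could re-link tokens to people.
SECRET_KEY = b"rotate-me-regularly"

def pseudonymise(employee_id: str) -> str:
    """Keyed hash so the same person maps to the same stable token."""
    return hmac.new(SECRET_KEY, employee_id.encode(), hashlib.sha256).hexdigest()[:16]

def aggregate_room_usage(events):
    """Count distinct (pseudonymous) people per room, discarding raw IDs."""
    rooms = {}
    for employee_id, room in events:
        rooms.setdefault(room, set()).add(pseudonymise(employee_id))
    return {room: len(people) for room, people in rooms.items()}

events = [("emp-001", "meeting-a"), ("emp-002", "meeting-a"), ("emp-001", "desk-3")]
print(aggregate_room_usage(events))  # {'meeting-a': 2, 'desk-3': 1}
```

The aggregate answers the office-layout question without the analysis ever handling a raw identity, which narrows (but does not remove) the privacy risk.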

Plus, as Leo points out, the bigger challenge is one of transparency and trust. A feeling that “Big Brother is watching you” may do more harm to culture & motivation than any advanced modelling improvements to office layout or workflows. Interesting new ethical challenges for HR use of analytics:

New workplace monitoring has to get workers’ consent – Raconteur

Businesses have long generated data with nearly every operation, but until recently employee behaviour was not automatically tracked. The opportunity to improve productivity has changed practice. Using monitoring technology, companies are discovering which parts of the office work well, what times staff are active and where people congregate.

Questionable practices 2: Winning elections due to social media data

Plenty of news coverage in recent years has touched on the “dark arts” of Data Scientists being used by campaign teams in both the US & UK to win surprise election victories. From Barack Obama’s success, to Donald Trump, to the Brexit referendum – all have had their success attributed to canny use of Data Science.

This article, in the Guardian newspaper, reports on plans by the Information Commissioner’s Office (ICO) to investigate the use of data science firms (including Cambridge Analytica). The investigation was prompted by allegations that the Brexit referendum result was influenced by misuse of personal data.

You can judge the controversial & heated nature of this topic by the fact that this very article is currently the subject of a legal complaint by Cambridge Analytica. I’m not surprised, as this touches on the financial backing of the Mercer family in the US and the potential for subsidised use of expensive data science capabilities to get around laws on campaign funding.

This article also usefully highlights the current differences between US & UK laws with regards to use of personal data to inform campaign strategies and targeted messages.

Watchdog to launch inquiry into misuse of data in politics

The UK’s privacy watchdog is launching an inquiry into how voters’ personal data is being captured and exploited in political campaigns, cited as a key factor in both the Brexit and Trump victories last year.

Questionable practices 3: Proliferating ‘fake news’ to influence a generation

Continuing with the topic of targeted communications based on Data Science models, here is an article from the other end of the political spectrum of UK ‘broadsheets’ (as they used to be).

In this interview with The Telegraph newspaper, Sir Tim Berners-Lee outlines his concerns about the use of Data Science and chatbots to intelligently proliferate ‘fake news’ stories. Such stories became infamous in association with the Trump presidential campaign, and Sir Tim argues that targeted fake news is becoming a major societal risk.

He makes a good case for the dangers of unfettered data harvesting & use (through Data Science) to ‘game the system’ of targeted social media content. Perhaps it is a significant risk to democracy that political groups can now personalise content to different groups simply to achieve appeal (even if the messages are untrue and completely contradictory).

As the credited creator of the World Wide Web, Sir Tim is no Luddite and is worth hearing on this topic. The article also ends with a useful definition of 5 types of “fake news”.

Sir Tim Berners-Lee, World Wide Web inventor, urges crackdown on ‘shocking’ fake news

“The net result is that these sites show us content they think we’ll click on – meaning that misinformation, or fake news, which is surprising, shocking, or designed to appeal to our biases, can spread like wildfire,” he added.

Responding to Questionable Practices: The case for Data Ethics

Let’s end this post on a more hopeful note. As well as the commentators above, other organisations in the UK and Netherlands have been alert to these risks and are proposing a mitigation.

Starting in the UK, within the Alan Turing Institute, the Society of Data Miners is working to propose a Professional Code of Ethics for Data Mining. This was discussed at a public meeting in March this year. Hopefully further progress will be reported this year.

One of those within that working group, Dr Sandra Wachter, has also published this useful article on how modern use of AI requires more than just minimal GDPR compliance. She makes a good case for the “Right to Explanation”, something firms (especially those using Data Science) would do well to plan towards:
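To make the “Right to Explanation” idea concrete, here is a minimal sketch of what it could mean in practice: alongside an automated score, return each feature’s contribution so the decision can be explained to the person affected. The hand-set linear scorecard, feature names and threshold below are my own illustrative assumptions, not any firm’s actual model:

```python
# Hypothetical interpretable scorecard: weights and threshold are invented
# for illustration. A linear model is used precisely because each feature's
# contribution to the decision can be read off directly.
WEIGHTS = {"income_band": 2.0, "years_at_address": 0.5, "missed_payments": -3.0}
THRESHOLD = 4.0

def score_with_explanation(applicant):
    contributions = {f: WEIGHTS[f] * applicant[f] for f in WEIGHTS}
    total = sum(contributions.values())
    return {
        "approved": total >= THRESHOLD,
        "score": total,
        # Sorted so the biggest drivers of the decision come first
        "explanation": sorted(contributions.items(), key=lambda kv: -abs(kv[1])),
    }

result = score_with_explanation({"income_band": 3, "years_at_address": 2, "missed_payments": 1})
print(result["approved"], result["score"])  # True 4.0
```

The point is not the model itself but the contract: every automated decision ships with a human-readable account of what drove it, which is far harder to provide for opaque models.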

Across the North Sea, in Holland, a collaboration between a number of Dutch universities has created the Responsible Data Science (RDS) initiative. This looks very promising as it faces into the challenges created by the use of Data Science, with a view to future-proofing the methods taught, as well as guidelines and regulations.

As well as publishing a number of useful resources & events, they have usefully identified four main challenges on which to focus. These are a neat summary of the ethical challenges touched on in all our above news stories:

  • Data science without prejudice – How to avoid unfair conclusions even if they are true?
  • Data science without guesswork – How to answer questions with a guaranteed level of accuracy?
  • Data science that ensures confidentiality – How to answer questions without revealing secrets?
  • Data science that provides transparency – How to clarify answers such that they become indisputable?

Here is the official RDS site with their full mission statement and event links:

I wish the RDS team well with their work in this area and hope it is adequately publicised to help work towards the international collaboration that will be needed for any meaningful future standards.

Do you have any questionable practices in your use of Data Science?

OK, I don’t really expect this post to become a confessional or prompt for industry whistleblowers. But, perhaps it could prompt you to stop for a moment and consider the ethical implications of your work or others you have seen.

Are there any other examples of Data Science applications that trouble your conscience? If so, please share in our comments below or on social media – it would be great to provoke a wider debate on these key issues for the future development of our profession.

Enjoy some time to think about data science ethics. It might also prompt more ethical creative ideas.