Customer Data Quality Reporting – the next 6 metrics you need
Time to return to the important topic of Customer Data Quality Reporting. Which metrics should you report on to ensure robust data quality controls?
After a week of training analysts in data visualisation, and speaking to marketers about how to run an NPS programme, it’s back to part 2 of our new series.
You may recall that, in part 1 of his series, our newest guest blogger (Paul Weston) introduced this topic. In that post, Paul explained the importance of this reporting and shared the first 5 of his recommended metrics.
In this post, part 2 of his series, Paul shares the rest of his “great set of measures”: a final 6 that will help complete your view of current data quality and any issues to address.
So, back over to Paul to complete this list…
Field-by-Field Quality Metrics (the final 6)
Entry Replication
This measure often identifies where humans are being lazy or trying to cheat: cheating either system constraints (e.g. mandatory fields) or performance measures around data capture. It normally involves entering their own details, or a standard set of fictitious details, instead of asking customers for theirs. Typical examples are email addresses, identity numbers and mobile phone numbers. The maximum allowed number of records sharing a single value needs to be set for each field. Records are then counted where entries in the relevant fields are repeated beyond that limit (i.e. “this is the 287th time we have seen the email address p.weston@btinternet.com”).
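To make the counting step concrete, here is a minimal Python sketch. The record structure, field names and thresholds (`entry_replication_report`, `MAX_ALLOWED`) are illustrative assumptions, not Paul’s own tooling:

```python
from collections import Counter

# Illustrative per-field limits: the maximum number of records allowed
# to share a single value before that value is treated as replicated.
MAX_ALLOWED = {"email": 3, "identity_number": 1, "mobile": 5}

def entry_replication_report(records, limits=MAX_ALLOWED):
    """Count, per field, the records carrying an over-replicated value."""
    report = {}
    for field, max_allowed in limits.items():
        counts = Counter(r[field] for r in records if r.get(field))
        # A value seen more often than allowed makes every record
        # holding it a failure for this field.
        over = {v: n for v, n in counts.items() if n > max_allowed}
        report[field] = {
            "failing_records": sum(over.values()),
            "replicated_values": sorted(over),
        }
    return report
```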
Key Linkage Integrity
This is a more advanced measure, but potentially one of the most important. The whole concept of a relational database is to avoid storing data multiple times. So if the database stores 5 people who live at the same address, or 50 people who work at the same company, it gives each address or company a reference number, then stores only this reference number (a foreign key) against each person. As an absolute minimum, the quality measurement should check that each foreign key refers to an address or company that still exists. Ideally, it should also identify whether the linked records are themselves of good quality. There is little value in having a customer linked to an address that is not mailable, or to a company with the name “aaaaa”.
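As a rough illustration of both levels of check, here is a Python sketch that flags orphaned foreign keys and links to poor-quality addresses. The `people`/`addresses` structures and the `is_mailable` flag are assumptions for the example:

```python
def key_linkage_report(people, addresses):
    """Flag people whose address_id is orphaned or points at a poor-quality address."""
    valid_ids = {a["id"] for a in addresses}
    mailable_ids = {a["id"] for a in addresses if a.get("is_mailable")}

    # Minimum check: the foreign key must refer to an address that still exists.
    orphaned = [p for p in people if p.get("address_id") not in valid_ids]
    # Stronger check: the linked address must itself pass quality checks.
    badly_linked = [p for p in people
                    if p.get("address_id") in valid_ids - mailable_ids]
    return {"orphaned": orphaned, "badly_linked": badly_linked}
```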
Field Interdependency
The values in some fields should work in conjunction with those in other fields, or at least not conflict with them. A gender of “M” should (probably) not be in the same record as a salutation of “Ms.”. A company called “Industries plc” should not appear in a record with an enterprise type of “Partnership”. Individual checks, sometimes more than one per field, are required, and the corrective action is not always obvious from a record failing one of them. Sometimes the check is for specific values; sometimes a simple data-presence check is enough. For instance, a ‘Date of Death’ should not appear unless the customer status is deceased.
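One common way to implement this is as a list of named rules, each a predicate over the record. The sketch below is a minimal illustration built from the examples above; the field names and rule names are assumptions:

```python
# Each rule pairs a name with a predicate that returns True when the
# record *violates* the rule. Field names are illustrative.
RULES = [
    ("gender_vs_salutation",
     lambda r: r.get("gender") == "M" and r.get("salutation") == "Ms."),
    ("plc_vs_partnership",
     lambda r: "plc" in r.get("company_name", "").lower()
               and r.get("enterprise_type") == "Partnership"),
    ("death_date_without_deceased_status",
     lambda r: r.get("date_of_death") and r.get("status") != "deceased"),
]

def field_interdependency_report(records, rules=RULES):
    """Return the number of records failing each cross-field rule."""
    return {name: sum(1 for r in records if violates(r))
            for name, violates in rules}
```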
Exclusions Avoidance
This measure is, to some extent, a ‘mopping-up’ measure. It is designed to identify spurious values in fields, based on growing experience of the invalid entries that customers or staff provide. It involves comparison against a regularly maintained exclusions list for each relevant field. Exclusions for phone numbers may include widely advertised numbers that stick in people’s minds. Exclusions for names would include the organisation’s standard profanity and offensiveness checks, as well as certain famous names.
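A minimal sketch of the comparison step, assuming the exclusions are kept as simple per-field sets (the example values are invented placeholders):

```python
# Illustrative exclusions lists; in practice these are maintained centrally
# and extended as new spurious entries are spotted.
EXCLUSIONS = {
    "phone": {"0800 123 4567"},           # e.g. widely advertised numbers
    "last_name": {"Mouse", "Bond"},       # famous or joke names
}

def exclusions_report(records, exclusions=EXCLUSIONS):
    """Count records whose value for a field appears on that field's exclusions list."""
    return {field: sum(1 for r in records if r.get(field) in banned)
            for field, banned in exclusions.items()}
```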
Capture/Update/Validation Recency
The ability to implement this measure at field level is rare. It requires that a ‘Date-of-last-change’, or similar attribute, is stored against each field, as opposed to once per record. The audit functions of some CRM systems and Customer Databases provide this facility, but turning it on can impact performance. There is an increasing focus on recording more granular ‘last-change’ or ‘last-validation’ data, especially as it bears on the requirements of privacy and data protection legislation, such as GDPR in Europe.
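Where a per-field audit trail does exist, the measure itself is straightforward. A hedged Python sketch, assuming the audit data can be extracted as a mapping of record to per-field ‘last changed’ timestamps:

```python
from datetime import datetime, timedelta

def recency_report(field_audit, max_age_days=365, now=None):
    """Count stale fields, given a per-field audit trail.

    `field_audit` is an assumed structure: a mapping of record id to
    {field_name: datetime of last change/validation}, i.e. the granular
    audit data some CRM systems can expose.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=max_age_days)
    stale_counts = {}
    for fields in field_audit.values():
        for field, last_changed in fields.items():
            if last_changed < cutoff:
                stale_counts[field] = stale_counts.get(field, 0) + 1
    return stale_counts
```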
User Confidence
This measure is also rarely seen, but it is extremely insightful. It relies on a research approach among users of the data, normally relatively small-scale and not overly formal. It may be done as screen pop-ups inviting (or requiring) users to score their level of confidence in the data they are accessing at the time. The measure can help identify the user communication activity needed: comms to accompany data improvement work, in order to maximise its value.
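If the pop-up scores can be exported, summarising them is simple. A small illustrative sketch, assuming the responses arrive as (data area, score) pairs on a 1–5 scale:

```python
from statistics import mean

def confidence_summary(responses):
    """Average pop-up confidence score per data area.

    `responses` is an assumed export of (data_area, score) pairs,
    e.g. ("customer_address", 4) on a 1-5 scale.
    """
    by_area = {}
    for area, score in responses:
        by_area.setdefault(area, []).append(score)
    return {area: round(mean(scores), 2) for area, scores in by_area.items()}
```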
Customer Data Quality Reporting – keep on learning
Thanks to Paul for completing his list with more advanced metrics that go beyond what is currently in place for most teams. I was especially pleased to see consideration of the implications of GDPR for field-level recency data.
I hope all 11 of Paul’s “great set of measures” have inspired you to revisit your customer data quality reporting. If you do, please let us know your views on Paul’s recommended metrics.
Are there some you’ve decided you need? Do you have others beyond the scope of these recommendations?
Customer Data Quality Reporting gets so little air time compared to Big Data or Data Science. Let’s spend time sharing best practice on this important quality control.
Next week, in Paul’s final post, he will move on to the topic of record-by-record measures. What more could you want as you prepare for Christmas? Thanks again, ‘Santa’ Weston, ho ho ho.