Data Quality – Why You Simply Cannot Ignore It
Data being an important corporate asset is a given. Placing a definitive value on data, however, is a lot harder than doing so for tangible assets. But if poor data quality costs organisations, how can they address this effectively?
Start by identifying the main problem: poor Data Quality almost always adds to your costs and, consequently, reduces your profitability.
Organisations and people rely on information all the time and this information is created from data.
You rely on it to run operations, make decisions and take appropriate action. However, if the data which you rely on is incomplete, invalid or not compliant with business rules, you then have a Data Quality problem.
Take this one step further. If the unreliable data is then used by business applications, this then turns into an Information Quality problem.
Is there a distinction between Data Quality and Information Quality? You bet.
Is there a way to find out whether you are on the right track and what sort of first steps you should initiate if you are new to this? Yes there is.
To get a good sense of what impact poor quality data has on organisations and the consequences this may bring, I spoke to Di Joseph, an Information Quality Specialist based in Cape Town, South Africa who has shared the same platform with international speakers such as John Zachman and Ronald Ross. Having been involved in running and supporting many large Data Migration projects in the past and having conducted many Data Quality Assessments, Di walks us through this perilous journey, keeping jargon to a minimum and helping to make things easy to understand.
What is Data Quality?
The first thing you need to wrap your head around is how things are defined.
Di explains that quality is a measure of fit for purpose, the absence of defects (an objective measure) and the possession of the desired features (subjective). Take, for example, a new car or a meal consumed at a restaurant. You decide, using a range of criteria, how the quality of that product is to be perceived.
The criteria and weighting used to determine quality can, and frequently do, differ from person to person. Consequently, quality should be seen as a subjective measure.
The same can be said of Data Quality.
“The quality is determined based on a number of combined criteria. What makes it more difficult to obtain and measure is the fact that the same data can be used by many different people across the organisation, often for different purposes, at the same time,” Di elaborates.
It is this factor that makes it important, because in order for Data to meet Quality expectations, it needs to satisfy the criteria of purpose and value for every one of the different users.
“The quality of one attribute of data can simultaneously affect the results of many business processes, across the Information Value Chain; therefore, Data Quality must consider the requirements of all users of the data. Data Quality is, quite simply, data that is compliant with the requirements of all the business processes that use that data”, Di expands.
Let’s take a look at some formal definitions for Data Quality.
“Consistently meeting all knowledge workers’ and end-customers’ expectations”.
– Larry English, one of the most highly respected authorities in the world on how to apply Quality Management principles to Total Information Quality Management, and author of Information Quality Applied
“Data is of high quality if it is fit for its intended use in operations, decision making and planning”.
– Tom Redman, “the Data Doc”, Navesink Consulting Group (whose definition is based on Joseph Juran)
Data Quality, Data Management, Data Governance – how does it all fit in?
Understand that these terms are not interchangeable; each has a very specific meaning. Data Quality and Data Governance are two aspects of the much larger topic of Data Management, and they refer to two very different things.
Data Governance:
- is the organisation’s corporate commitment to managing data and information with the same rigour, formality and control ‘mindset’ as managing finances or any other high-value asset;
- involves senior management;
- is the act of leading the data resource function and managing related investments; and
- ensures that processes and standards are put in place to address all aspects of Data Management (including Data Quality) and that the required roles are addressed to implement the processes.
Data Quality Management:
- is the implementation and execution of specific processes to measure and improve the quality of the data;
- includes monitoring Data Quality, reporting to Stakeholders, correcting Data Defects and investigating Root Causes; and
- is supported by Data Governance which appoints the necessary roles (for example, Data Owners and Data Stewards) to ensure that Data Quality requirements are implemented and monitored.
Ramifications of poor Data Quality
The impact of Data Quality problems is often not recognised.
According to Di, these problems do not receive the attention they should; consequently, organisations continue to waste money. Until organisations pay sufficient attention to the scale of the problem and can accurately gauge the resulting losses, they may be unable to fully appreciate how much relying on such data puts them at a disadvantage.
The problems associated with poor data quality are wide-ranging. Data problems can lower customer service levels and lead to a loss of customers. There may be non-compliance with regulatory requirements, resulting in fines payable to the relevant regulatory bodies. There may be ineffective or flawed decision making, due to incorrect data or simply poor information.
There may be wrong or inappropriate communication within the organisation itself. There may be a never-ending stream of expensive data correction work, and this scrap and rework increases operational costs. If any of these problems become publicly known or generate media interest, the resulting public embarrassment can damage credibility and, in turn, cost market share.
Understand the subtle difference between Data and Information
Di explains the difference between Data and Information.
Data is a representation of a fact about a thing. These may be raw facts about real world things (entities) and useful features of those things (attributes). These may comprise a set of discrete, objective facts. When each data attribute complies with the rules relating to that piece of Data, we have Data Quality.
When context is added to data, data then begins to have meaning. It becomes information. For example, the data value 01-12-2015 could refer to your client’s date of birth. When all the appropriate data is presented to users, in a concise and meaningful manner, you then have Information Quality.
According to Di, data problems, if they exist, are inherent in the facts about the thing and are often related to the accuracy and validity of the attribute.
“If a date attribute contains ’13’ in the month field, this is clearly a data quality problem. Information problems are evident when the data is presented for use in a specific context. All Data Quality problems will result in Information Quality problems when the data is used by a business process. In addition, problems with the context or presentation of the data can result in Information Quality problems. For example, a poorly formatted report can lead the user to misunderstand the data provided, even when the data itself is not incorrect, and this will result in issues with the quality of the information.”
When a data item complies with all the rules associated with the attribute, you then have Data Quality. When the right data is presented to the right person at the right time, in a concise, useable and meaningful manner, you have Information Quality.
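The distinction can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the rule for the date attribute is "a real calendar date in day-month-year order"; the helper name `parse_date` and the format strings are hypothetical and not taken from any specific tool. The first two checks are Data Quality checks on the raw value. The last two show how the same valid value takes on different meanings depending on the context in which it is read, which is where Information Quality problems can creep in.

```python
from datetime import datetime
from typing import Optional


def parse_date(value: str, fmt: str = "%d-%m-%Y") -> Optional[datetime]:
    """Return the parsed date, or None if the value breaks the format rule."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


# Data Quality: does the raw value comply with the rule for the attribute?
valid = parse_date("01-12-2015")    # a real calendar date
invalid = parse_date("01-13-2015")  # month 13 does not exist, so this is None

# Information Quality: the same valid value, read under two different
# assumed contexts (day-first versus month-first), means two different dates.
day_first = parse_date("01-12-2015", "%d-%m-%Y")
month_first = parse_date("01-12-2015", "%m-%d-%Y")

print("valid value accepted:", valid is not None)
print("month 13 rejected:", invalid is None)
print("read day-first:", day_first.strftime("%d %B %Y"))
print("read month-first:", month_first.strftime("%d %B %Y"))
```

In this sketch the data itself passes the rule in both readings; only an agreed context tells the user which date is meant.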
Measuring Data Quality
We measure Data Quality using Data Quality Dimensions and Business Rules.
Some of the most common Data Quality Dimensions with which data should comply include:
- uniqueness / duplication;
- consistency; and
What is a Business Rule? A business rule is a statement that defines or constrains some aspect of the business. Business Rules describe the operations, definitions and constraints that apply to an organisation [2].
Each of these dimensions can and should be measured using automated software which checks every item against the defined Rules for that attribute and provides detailed statistics on each Data Quality Dimension.
When we measure Data Quality, we measure it in terms of the Data Quality Dimensions and the Business Rules. Results are reported back in the form of a Scorecard intended to provide the overall status of the data in scope. The scorecard is presented graphically for ease of use by the stakeholders.
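As a rough illustration of automated measurement, the sketch below scores a handful of records against two dimensions (completeness and uniqueness) and one business rule. Everything in it is an assumption made for the example: the sample records, the field names, the function names and the deliberately simplistic email rule. Real assessments typically use dedicated profiling software rather than hand-written checks.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical sample records; None marks a missing value.
records: List[Record] = [
    {"client_id": "C001", "email": "ann@example.com"},
    {"client_id": "C002", "email": None},
    {"client_id": "C002", "email": "bob@example"},
    {"client_id": "C004", "email": "cy@example.com"},
]


def populated(rows: List[Record], field: str) -> List[str]:
    """Values of the attribute that are actually present."""
    return [r[field] for r in rows if r.get(field)]


def completeness(rows: List[Record], field: str) -> float:
    """Share of records where the attribute is populated."""
    return len(populated(rows, field)) / len(rows) if rows else 0.0


def uniqueness(rows: List[Record], field: str) -> float:
    """Share of populated values that occur exactly once."""
    values = populated(rows, field)
    counts = Counter(values)
    return sum(1 for v in values if counts[v] == 1) / len(values) if values else 0.0


def rule_compliance(rows: List[Record], field: str, rule: Callable[[str], bool]) -> float:
    """Share of populated values that satisfy a business rule."""
    values = populated(rows, field)
    return sum(1 for v in values if rule(v)) / len(values) if values else 0.0


def email_rule(value: str) -> bool:
    # Illustrative rule only: something before the "@" and a dot after it.
    local, _, domain = value.partition("@")
    return bool(local) and "." in domain


scorecard = {
    "client_id uniqueness": uniqueness(records, "client_id"),
    "email completeness": completeness(records, "email"),
    "email rule compliance": rule_compliance(records, "email", email_rule),
}

for measure, score in scorecard.items():
    print(f"{measure:<24}{score:>6.0%}")
```

Each score is simply the fraction of items that pass, so every dimension is reported on the same scale and can be tracked over time.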
Thought preceding action – improving the quality of data
As there are several ways in which data can be defective, there are several ways in which you can improve the quality of this data.
Depending on the problems you find when measuring the data, you can then determine how best to improve its quality. For example, data may be incomplete, it may not be unique (duplicated) or it may not comply with the Business Rules.
Data Correction can be undertaken using manual or automated processes.
- data that is inaccurate and does not comply with business rules needs to be corrected to represent the accurate values;
- data that is missing needs to be obtained. It may be possible to derive it from other existing data or it may need to be obtained from external sources;
- data that is duplicated needs to be matched, then merged and de-duplicated where appropriate, or managed to be kept aligned.
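Two of the correction steps above, deriving missing data and matching then merging duplicates, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the customer records, the field names, the choice of a normalised email address as the match key, and the "first non-missing value wins" merge policy are all hypothetical. In practice, matching and survivorship rules need to be agreed with the business.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical customer records; field names are illustrative only.
customers: List[Record] = [
    {"first_name": "Ann", "last_name": "Lee", "full_name": None,
     "email": "Ann.Lee@example.com", "phone": None},
    {"first_name": "Ann", "last_name": "Lee", "full_name": "Ann Lee",
     "email": " ann.lee@example.com", "phone": "555-0100"},
    {"first_name": "Bob", "last_name": "Nkosi", "full_name": None,
     "email": "bob@example.com", "phone": None},
]


def derive_full_name(record: Record) -> Record:
    """Derive a missing full_name from other attributes on the same record."""
    if not record.get("full_name") and record.get("first_name") and record.get("last_name"):
        return {**record, "full_name": f"{record['first_name']} {record['last_name']}"}
    return record


def match_key(record: Record) -> str:
    """Normalise the email so trivially different spellings match."""
    return (record.get("email") or "").strip().lower()


def merge(a: Record, b: Record) -> Record:
    """Keep a's values and fill its gaps from b."""
    return {field: a.get(field) or b.get(field) for field in {**a, **b}}


def deduplicate(rows: List[Record]) -> List[Record]:
    by_key: Dict[str, Record] = {}
    unmatched: List[Record] = []
    for row in rows:
        key = match_key(row)
        if not key:
            unmatched.append(row)  # no usable key: keep separate, do not guess
        elif key in by_key:
            by_key[key] = merge(by_key[key], row)
        else:
            by_key[key] = row
    return list(by_key.values()) + unmatched


cleaned = deduplicate([derive_full_name(c) for c in customers])
for customer in cleaned:
    print(customer["full_name"], customer["email"], customer["phone"])
```

Records without a usable match key are kept apart on purpose, since merging on a guess would introduce new defects.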
Data Improvement is often a long process and uses techniques such as:
- Data Derivation;
- Data Enrichment or Augmentation.
However, it is important to, firstly, understand what went wrong! You do this by determining the root causes of the Data Quality problems, preferably prior to fixing the data.
Good quality all the way
How do you ensure that new data collected, from the initial process to the final steps, is of good quality? The short answer is Continuous Process Improvement across all areas of data collection and usage.
Di explains that this typically involves implementing preventative steps to improve and monitor processes to prevent Data Quality problems, as opposed to correcting existing data defects. She suggests that a data model should be developed to reflect all the data of interest to the solution.
The solution built on this data model should address the following:
- The solution must be developed based on a rigorous Systems Development Life Cycle (SDLC) methodology;
- Data Security must be considered to ensure that the relevant information is protected from inappropriate access;
- Data Quality requirements must be considered as part of the development process and not as an afterthought;
- Data Governance must ensure that the data in scope has an owner who is accountable for the quality of the Data.
Di stresses that Data Quality should always be considered within the bigger picture of Data Management (DM). You can get an appreciation for DM by looking at a framework such as DAMA DMBOK. The DAMA DMBOK Framework shows the functional areas that need to be considered to ensure that the data which is collected and used is of good quality. An example is provided below.
Beginning your Data Quality initiative
Step 1 – Measure the quality of existing data
Start by understanding the quality of existing data that you consider critical to your business processes. This is best done by conducting a formal and objective Data Assessment.
Step 2 – Tabulate your findings
On completion of your assessment, you should be able to provide accurate and complete measures of the current quality of data. Report the statistics and findings in summary scorecards and detailed tables.
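One possible way to organise the output, sketched in Python, is to keep every individual check result and derive both views from it: a summary scorecard per attribute and dimension, and a detailed table of the failed checks. The `Finding` structure, the sample results and the text-based bar are assumptions made for this illustration, not the format of any particular assessment tool.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Finding(NamedTuple):
    record_id: str
    attribute: str
    dimension: str
    passed: bool


# Hypothetical assessment output: one finding per record and check.
findings: List[Finding] = [
    Finding("C001", "email", "completeness", True),
    Finding("C002", "email", "completeness", False),
    Finding("C003", "email", "completeness", True),
    Finding("C004", "email", "completeness", True),
    Finding("C001", "client_id", "uniqueness", True),
    Finding("C002", "client_id", "uniqueness", False),
    Finding("C003", "client_id", "uniqueness", False),
    Finding("C004", "client_id", "uniqueness", True),
]


def summarise(results: List[Finding]) -> Dict[Tuple[str, str], float]:
    """Pass rate per (attribute, dimension) pair."""
    totals: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for f in results:
        entry = totals[(f.attribute, f.dimension)]
        entry[0] += int(f.passed)  # checks passed
        entry[1] += 1              # checks performed
    return {key: passed / checked for key, (passed, checked) in totals.items()}


def failures(results: List[Finding]) -> List[Finding]:
    """Detail rows: every check that did not pass."""
    return [f for f in results if not f.passed]


summary = summarise(findings)

print("Summary scorecard")
for (attribute, dimension), score in sorted(summary.items()):
    bar = "#" * int(score * 20)  # crude text stand-in for a chart
    print(f"  {attribute:<10} {dimension:<13} {score:>5.0%}  {bar}")

print("Detail: failed checks")
for f in failures(findings):
    print(f"  {f.record_id:<6} {f.attribute:<10} {f.dimension}")
```

Keeping the detail rows alongside the summary means each headline figure can be traced back to the specific records behind it.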
With these findings, you can act in two areas. Firstly, identify the problems that exist within the data. Secondly, identify the actions that can be taken to further analyse and improve the quality of that data.
Step 3 – Present your results
When presenting your results, you should always start with the end in mind. This means that you interpret the results, where possible, and relate them back to the business.
Ask yourself these questions:
- What risks exist due to the problems identified?
- What costs are being incurred due to these issues?
- What potential benefits would fixing the problems provide?
Go on to identify the areas which are being impacted by these problems and consider whether improvement activity should be implemented to support them.
As you prepare your presentation, consider who will benefit from Data Quality improvement, whether directly or indirectly. As much as possible, report the statistics graphically, so that they are easier to digest and understand.
Data Quality Management presents many significant challenges. In order to ensure success, there must be a champion leading the way, who can effectively impart, to all relevant parties involved, the impetus necessary not only to recognise the problems but also take responsibility for resolving them.
Based in Cape Town, South Africa, Di Joseph is an Information Quality Specialist. Di has been involved in running and supporting large Data Migration projects, developing Data Management Strategies, developing and implementing Data Management Methodologies, conducting Data Quality Assessments and developing and delivering courses. Over the past 15 years, Di has spoken at conferences throughout Europe and the United States and has always been well received. Di regularly conducts training courses in South Africa for well-known corporate clients.
1 from Paper 098 – 29 Data Quality Management: The Most Critical Initiative You Can Implement by Jonathan G Geiger
2 as defined by Ronald G Ross