Is Data Quality the Same As Data Accuracy?
Digital Age, Data, and Data Quality
We are living in the digital age, in which data are the fuel of the new economy.
“Using dirty data to fuel your business is like putting the wrong kind of fuel in your car”
While good data are a valuable asset and a source of myriad opportunities, bad data are a tremendous burden and an organizational liability. Hence, data quality is an important topic of discussion.
Data quality can be defined as the degree to which data are fit for use (that is, serve their purpose) in a given context.
However, data quality isn’t only an aspect of data that determines their fitness for use, but is also a function or subdiscipline of data management.
There are several myths around data quality. In this article we discuss one of the most common, which appears in several forms:
Data quality is data accuracy.
Data quality is the same as data accuracy.
Data quality is only about data accuracy.
Data Quality Myth
Sustaining high-quality data is a challenge that most organizations face, and the data quality arena is surrounded by its own set of myths. These myths mislead people when they make decisions related to data quality management. They can slow down, hinder, or put a stop to an organization’s data quality management efforts or the deployment of data quality projects or initiatives.
“Data quality is data accuracy” is one of the most common myths of data quality. The general misconceptions are that data quality is synonymous with data accuracy, or that data quality is only about data accuracy. When people think about high quality in relation to data, they tend to think about the accuracy aspect only. When an organization is under the influence of this myth, data accuracy becomes its only data quality improvement goal.
What is Data Accuracy?
Data accuracy refers to how closely the data stored in a system reflect reality. It is the degree to which data correctly describe the characteristics of the real-world object, entity, situation, phenomenon, or event.
Measuring data accuracy requires that an authoritative source of reference be identified and available to compare the data against. This makes data accuracy difficult to measure, because acquiring an authoritative source of reference is not easy. For example, if the data show that the place of birth for Ren Ray is Sydney, Australia, but his actual place of birth is Melbourne, Australia, then the data are inaccurate. However, without an authoritative source of reference, such as a birth certificate or passport that documents an individual’s place of birth, it is not possible to ascertain where Ren Ray was actually born.
Data must not only reflect reality, they must also be complete, valid, and consistent. For data to be accurate, they need to be complete in the first place (that is, values need to be present). For data to be valid, they must conform to some sort of standard. As a validity example, as per ISO’s list of country codes, AU is a valid country code, but AAA is not. Data can be valid but not accurate. For example, if a person’s postal address records “AU” as the country code when the person is actually residing in the United States, then the data are complete and valid (because AU is a valid code) but fail the accuracy test.
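The distinction between completeness, validity, and accuracy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name check_country_code and the small set of country codes are hypothetical (the set is a hand-picked subset, not the full ISO list), and the "authoritative" value is assumed to come from some trusted reference that the sketch does not model.

```python
# Hypothetical, hand-picked subset of ISO country codes, for illustration only.
VALID_COUNTRY_CODES = {"AU", "US", "GB", "IN"}


def check_country_code(recorded, authoritative=None):
    """Assess one recorded country code on three dimensions.

    'accurate' is None when no authoritative reference value is supplied,
    because accuracy cannot be judged without one.
    """
    # Completeness: a value must be present.
    complete = recorded is not None and recorded.strip() != ""
    # Validity: the value must conform to the standard (here, the code list).
    valid = complete and recorded.strip().upper() in VALID_COUNTRY_CODES
    # Accuracy: the value must match reality, as given by the reference.
    accurate = None
    if authoritative is not None:
        accurate = complete and recorded.strip().upper() == authoritative.strip().upper()
    return {"complete": complete, "valid": valid, "accurate": accurate}


# The record says AU, but the person actually resides in the United States.
print(check_country_code("AU", authoritative="US"))
# {'complete': True, 'valid': True, 'accurate': False}
```

Calling check_country_code("AAA") reports the value as complete but not valid, and leaves accuracy undetermined because no reference value was given.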
Consistency means that exactly the same data appear the same way across different data sets. As a consistency example, if one data set records a name as John Smith, but the other data set reports this person’s name as John Smyth, then the data are inconsistent; at least one of the sets is inaccurate.
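A consistency check of this kind might look like the following minimal sketch. The data set names (crm, billing), the customer identifiers, and the function find_inconsistencies are all hypothetical; the sketch assumes both data sets share a common key for each person.

```python
def find_inconsistencies(set_a, set_b):
    """Return the keys present in both data sets whose values differ."""
    shared_keys = set_a.keys() & set_b.keys()
    return {
        key: (set_a[key], set_b[key])
        for key in sorted(shared_keys)
        if set_a[key] != set_b[key]
    }


# Two hypothetical data sets that should hold the same names.
crm = {"C001": "John Smith", "C002": "Ren Ray"}
billing = {"C001": "John Smyth", "C002": "Ren Ray"}

print(find_inconsistencies(crm, billing))
# {'C001': ('John Smith', 'John Smyth')}
```

Note that the check only flags the disagreement. Deciding which spelling is accurate still requires an authoritative source of reference.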
If data are accurate, then they meet all the tests above. Because data accuracy has a close relationship with other data quality dimensions such as completeness, validity, and consistency, data accuracy is often seen as an overarching dimension and data quality is often equated with data accuracy.
Why is Data Quality Not the Same As Data Accuracy?
Although data accuracy is one of the important characteristics or dimensions of data quality, and therefore shouldn’t be overlooked, accuracy alone doesn’t completely characterize data quality. For data to be of high quality, they should be fit for their intended purposes.
Data quality has several dimensions, known as data quality dimensions, that enable the measurement of the quality of data. These dimensions include, but are not limited to, completeness, uniqueness, granularity, precision, consistency, accessibility, security, traceability, trustworthiness, conformity/validity, timeliness, integrity, currency, and volatility.
For example, if data are accurate, but not delivered in time for reporting purposes, the data wouldn’t be considered of high quality because the intended purpose wasn’t served. Data might also be accurate but not granular enough to serve the business need. Data might be accurate but not trustworthy, which would leave stakeholders hesitant to use them. If data are accurate but not accessible to authorized people, or conversely hacked by unauthorized people (that is, data security is poor), they are also not of much use and, thus, the data quality is poor.
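The idea that accurate data can still fail their purpose can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function is_fit_for_use, the dimension names used as dictionary keys, and the reporting deadline and delivery time are hypothetical and do not come from any particular data quality tool.

```python
from datetime import datetime


def is_fit_for_use(checks, required_dimensions):
    """Data are fit for use only if every required dimension passes.

    A dimension that was never assessed is treated as a failure.
    """
    return all(checks.get(dim, False) for dim in required_dimensions)


# Hypothetical reporting deadline and actual delivery time.
deadline = datetime(2022, 3, 31, 9, 0)
delivered = datetime(2022, 3, 31, 14, 0)

checks = {
    "accuracy": True,                     # values match the reference source
    "timeliness": delivered <= deadline,  # False: delivered five hours late
}

# Judged on accuracy alone, the data look fine.
print(is_fit_for_use(checks, ["accuracy"]))                # True
# Judged against the reporting purpose, they are not fit for use.
print(is_fit_for_use(checks, ["accuracy", "timeliness"]))  # False
```

Which dimensions belong in the required list depends on the purpose the data must serve, so the same data set can pass for one use and fail for another.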
Concluding Thoughts and Future Research
Undeniably, data are normally considered of poor quality if erroneous values are associated with the real-world entity or event. However, data quality is about striking a balance between all data quality dimensions. Depending on context, situation, the data themselves (e.g., master data, transactional data, reference data), business needs, and the industry sector, different permutations and combinations of data-quality dimensions would need to be applied.
Ironically, while data quality dimensions have been in existence for a few decades, there is no universal agreement on the number or definitions of data quality dimensions. Future research will be focused on this aspect.
To learn more about data quality and its myths, challenges, critical success factors, strategy, data quality dimensions, and data profiling, including how to measure data quality dimensions, implement methodologies for data quality management, and account for data quality when undertaking data-intensive projects, please read Data Quality: Dimensions, Measurement, Strategy, Management and Governance (Quality Press, 2019). This article draws significantly from the research presented in that book.
If you have any questions or any inputs you want to share, comment here or connect on LinkedIn.
References:
Mahanti, Rupa. Data Quality: Dimensions, Measurement, Strategy, Management and Governance. Quality Press. 2019.
A version of this article was first published on QualityDigest.com in March 2022 and later in Data Driven Investors Newsletter on Medium in November 2022.
Biography: Rupa Mahanti is a consultant, researcher, speaker, data enthusiast, and author of several books on data (data quality, data governance, and data analytics). You can connect with Rupa on LinkedIn or Research Gate (Research Gate has most of her published work, some of which can be downloaded for free) or Medium.