What is Metadata?
Metadata are data. More precisely, metadata are data about data. This is the most common definition of metadata, but it is not very helpful.
Dr. Malcolm Chisholm, President of Data Millennium and a thought leader, author, and speaker on data governance and data management, provides a more useful definition. He defines metadata as
“the information needed to understand and manage the data assets of an enterprise.”
Metadata help answer fundamental questions about the data: who, when, what, why, where, and how. Metadata can include, but are not limited to, the following elements:
Names
Definitions
Descriptions
Subject area
Context of usage
Classifications
Sensitivity
Creator/author of the data
Origin/source of the data and lineage
Ownership / roles and access permissions
Relationship with other data
Version information
Location
Technical attributes (for example, length, data type, nullability, formats, and cardinality)
Quality thresholds
Valid values or range of values
Sample values
Other attributes
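To make the list above concrete, here is a minimal sketch of how a few of these elements could be captured for a single data element. The class name ElementMetadata, its fields, and the sample values are assumptions made for illustration; they do not come from any particular metadata standard or tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ElementMetadata:
    """Hypothetical metadata record for a single data element (column)."""
    name: str
    definition: str
    data_type: str                        # technical attribute
    nullable: bool = True                 # technical attribute
    max_length: Optional[int] = None      # technical attribute
    sensitivity: str = "unclassified"     # classification of the element
    owner: Optional[str] = None           # ownership
    source: Optional[str] = None          # origin / lineage
    valid_values: List[str] = field(default_factory=list)
    sample_values: List[str] = field(default_factory=list)

    def label(self) -> str:
        """Render a short, human-readable label for the element."""
        null_text = "nullable" if self.nullable else "not nullable"
        return f"{self.name} ({self.data_type}, {null_text}): {self.definition}"


# Illustrative values only; they are not taken from the article's tables.
country_code = ElementMetadata(
    name="country_code",
    definition="Two-letter code of the customer's country of residence",
    data_type="string",
    nullable=False,
    max_length=2,
    valid_values=["AU", "IN", "US"],
    sample_values=["AU", "US"],
)

print(country_code.label())
```

Fields left at their defaults (owner and source here) are themselves a signal: they show where the metadata are still incomplete.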
Are Metadata Important?
Metadata are important! Jonathan Sunderland, in a LinkedIn post, uses a beautiful analogy to explain why:
"Data without metadata is like a supermarket full of tins with no labels."
He further elaborates on the analogy by explaining the implications of not having metadata. In his words,
"Customers would have to come along and open each tin until they found what they wanted, spoiling the contents for others (and themselves). The supermarket would not trust customers, as they damaged the stock. The customers would not trust the supermarket as they don't help them decide what to buy."
In the same way, without metadata, it would be hard to say what a dataset is about.
And just as you might not be able to fully make out what is inside an unlabeled tin even after opening it, you might end up making wrong assumptions about data that have no metadata. There is one difference: opening a tin may damage its contents, whereas using data wrongly may leave the data intact but damage us.
Is Metadata Quality Important?
Yes, it is! In fact, the quality of metadata is as important as the quality of data.
Data quality dimensions such as completeness, accuracy, and currency, among others, apply to metadata too.
Responding to Jonathan Sunderland's post, Pete Youngs extends the supermarket tin analogy to explain why the quality of metadata matters:
"Actually, some tins have labels but there are no ingredients. And others just have ingredients listed but there’s no description of what it is. The quality of your metadata is important!"
The completeness and level of detail of your metadata matter when determining whether data are fit for use and when making decisions based on the data. The following examples demonstrate this:
Data without Metadata—An Example
The data set in Table 1 has no metadata. Just by looking at the rows and columns in this data set, it is impossible to tell what this data set represents.
Data with Metadata—Example 1
Table 2 tells us that the data represents customer data, but it does not tell us what type of customers this data set contains or what the individual columns or data elements in this set represent.
Data with Metadata—Example 2
Table 3 not only tells us that the data represents customer data, but it also tells us, through the column headers, what the individual columns or data elements in this set represent.
Data with Metadata—Example 3
Figure 1 tells us not only that the data represents customer data and what the individual columns or data elements in this set represent, but also how the individual columns are defined and what values they hold.
However, there is still scope for improvement. For example, the length and quality thresholds are not defined for the following data elements: first name, last name, and organization name.
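Gaps like these can be found mechanically. The sketch below measures the completeness of metadata for a small, hypothetical catalog; the required fields, the element names, and their values are assumptions chosen to mirror the gap described above, not a reproduction of Figure 1.

```python
# Metadata fields we assume every data element should have.
REQUIRED_FIELDS = ["definition", "data_type", "max_length", "quality_threshold"]


def missing_metadata(catalog, required=REQUIRED_FIELDS):
    """Return {element: [missing fields]} for elements with metadata gaps."""
    gaps = {}
    for element, meta in catalog.items():
        missing = [f for f in required if meta.get(f) in (None, "")]
        if missing:
            gaps[element] = missing
    return gaps


def completeness(catalog, required=REQUIRED_FIELDS):
    """Share of required metadata fields that are populated (0.0 to 1.0)."""
    total = len(catalog) * len(required)
    if total == 0:
        return 1.0
    missing = sum(len(v) for v in missing_metadata(catalog, required).values())
    return (total - missing) / total


# Hypothetical catalog: only customer_id is fully described.
catalog = {
    "customer_id": {"definition": "Unique customer identifier",
                    "data_type": "integer", "max_length": 10,
                    "quality_threshold": 1.0},
    "first_name": {"definition": "Customer's given name",
                   "data_type": "string"},
    "last_name": {"definition": "Customer's family name",
                  "data_type": "string"},
    "organization_name": {"definition": "Name of the customer's organization",
                          "data_type": "string"},
}

gaps = missing_metadata(catalog)
score = completeness(catalog)
print(gaps)
print(f"Metadata completeness: {score:.1%}")
```

In this made-up catalog, 10 of the 16 required fields are populated, and the three name elements are reported as lacking a length and a quality threshold.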
Concluding Thoughts and Further Research
In the absence of metadata, it is impossible to determine what data values represent, and hence impossible to assess their quality. Metadata are the first input to the process of measuring data quality, so they too need to be of good quality.
Metadata have always been important, but with the vastness and universal presence of data, they have become even more important. Effective metadata management is essential to having complete, accurate, and consistent metadata, and it is critical for any organization trying to make the best use of its data. Future research will focus on different aspects of metadata management.
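As a rough illustration of metadata acting as the input to a data quality measurement, the sketch below scores a column of values against its metadata. The function name validity_ratio, the rule keys (nullable, valid_values, max_length), and the sample values are all assumptions made for this example.

```python
def validity_ratio(values, meta):
    """Share of values that conform to the rules stated in the metadata."""
    def conforms(value):
        if value is None:
            # A missing value is acceptable only if the element is nullable.
            return meta.get("nullable", True)
        if "valid_values" in meta and value not in meta["valid_values"]:
            return False
        if "max_length" in meta and len(str(value)) > meta["max_length"]:
            return False
        return True

    if not values:
        return 1.0
    return sum(conforms(v) for v in values) / len(values)


# Hypothetical metadata for a customer type code: "I" = individual,
# "O" = organization.
customer_type_meta = {"nullable": False,
                      "valid_values": {"I", "O"},
                      "max_length": 1}

values = ["I", "O", "X", None, "I"]
ratio = validity_ratio(values, customer_type_meta)
print(f"Validity: {ratio:.0%}")
```

Here "X" fails the valid-values rule and the missing value fails the nullability rule, so three of five values conform. Without the metadata, there would be no rule to measure against, and if the metadata were wrong, the score would be wrong too.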
To learn more about data quality, including its myths, challenges, critical success factors, strategy, dimensions and how to measure them, data profiling, methodologies for data quality management, and the data quality aspects to consider when undertaking data-intensive projects, please read Data Quality: Dimensions, Measurement, Strategy, Management and Governance (Quality Press, 2019).
If you have any questions or input to share, comment here or connect on LinkedIn.
References:
Mahanti, Rupa. Data Quality: Dimensions, Measurement, Strategy, Management and Governance. Quality Press, 2019.
Mahanti, Rupa. Statistics Spotlight: Data About Data. Quality Progress Magazine, ASQ, Vol. 55, Iss. 7, 2022.
Biography: Rupa Mahanti is a consultant, researcher, speaker, data enthusiast, and author of several books on data (data quality, data governance, and data analytics). You can connect with Rupa on LinkedIn, ResearchGate (which has most of her published work, some of it free to download), or Medium.