Understanding Data Decay
“Latte must be consumed fresh, Data must be kept fresh!” - Cesar Augusto Lima
Cesar Augusto Lima wrote the above in response to a LinkedIn post by George Firican about the natural decay of both latte and data.
Data decay, or aging of data, is data degradation over time, resulting in bad data quality (Mahanti 2019). Some data, such as date of birth and place of birth, are evergreen and not subject to decay; that is, if you captured the data values correctly in the first place and the data are left untouched, they will not change. Time has no impact on such data. However, other data are subject to aging and decay, even if left untouched. What varies is the time frame and what triggers the decay.
George Firican, the founder of Light On Data and a thought leader, made a very interesting comment in a LinkedIn post featuring a cup of latte (see image at the top) on October 31, 2022. His statement was as follows:
“The design of the latte starts to be ruined after a while even if left untouched. Same with certain data…”
What George Firican said is very true, and the latte analogy illustrates it beautifully. For example, stock market data are extremely volatile and change every few seconds.
Other data, such as passport expiration dates, are comparatively less volatile, with validity periods ranging from five to 10 years.
For still other data, such as contact data, decay is event driven: addresses change when a rental lease expires or when someone relocates for a job, telephone numbers change after a move to a new country or a switch of operator, and so on. Under normal circumstances, 25 to 30 percent of an organization’s contact data can go bad each year (Neubarth 2013). Hence, if an organization’s customer database has 12 million customer contact records, then approximately 3 million to 3.6 million of those records will become obsolete annually, resulting in significant dollar costs in terms of postage as well as missed opportunities (Mahanti 2019).
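The arithmetic behind that estimate can be reproduced in a few lines. The sketch below is only an illustration: the function name is hypothetical, and it assumes the 25 to 30 percent annual rate applies uniformly across all 12 million records.

```python
def records_going_bad_per_year(total_records: int, annual_decay_percent: int) -> int:
    """Estimate how many contact records become obsolete in one year.

    Assumes a flat annual decay rate, expressed as a whole-number percentage.
    """
    if not 0 <= annual_decay_percent <= 100:
        raise ValueError("annual_decay_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return total_records * annual_decay_percent // 100


total_records = 12_000_000  # database size used in the example above
low_estimate = records_going_bad_per_year(total_records, 25)
high_estimate = records_going_bad_per_year(total_records, 30)
print(f"{low_estimate:,} to {high_estimate:,} records per year")
```

Swapping in your own record count and an observed decay rate gives a first rough figure for how many records need attention each year.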
How do you keep contact data relatively fresh?
Contact data are critical to an organization and need regular maintenance. To ensure that data are up to date, it’s important to set guidelines for how often each field should be updated. For example, as per Reserve Bank of India guidelines, Indian banks require Know Your Customer (KYC) updates once every three years, and more frequently if a transaction has not occurred. KYC standards are designed to protect financial institutions against fraud, corruption, money laundering, and terrorist financing, but because the KYC process establishes customer identity, contact details are also updated along the way.
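A per-field update guideline of this kind can be turned into a simple automated check. The following is a minimal sketch, not a prescribed implementation: the field names and the one-year intervals are hypothetical placeholders, and only the three-year figure echoes the KYC example above.

```python
from datetime import date, timedelta

# Hypothetical refresh intervals per field, in days. Only the three-year
# KYC figure comes from the example in the text; the rest are placeholders.
REFRESH_INTERVAL_DAYS = {
    "kyc": 3 * 365,
    "postal_address": 365,
    "phone_number": 365,
}


def fields_due_for_refresh(last_verified: dict[str, date], today: date) -> list[str]:
    """Return the fields whose last verification is older than their interval."""
    due = []
    for field, interval_days in REFRESH_INTERVAL_DAYS.items():
        verified_on = last_verified.get(field)
        # A field that was never verified is treated as due.
        if verified_on is None or today - verified_on > timedelta(days=interval_days):
            due.append(field)
    return due


# Example customer record: the phone number has never been verified.
customer_last_verified = {
    "kyc": date(2020, 1, 15),
    "postal_address": date(2022, 11, 1),
}
due_fields = fields_due_for_refresh(customer_last_verified, today=date(2023, 3, 16))
print(due_fields)
```

Running such a check on a schedule flags records for re-verification before they go stale, rather than after a mailing bounces.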
Concluding Thoughts
Guidelines and processes for updating data should be defined for critical data elements in an organization. This will ensure that data are up to date, of high quality, and fit for use.
Acknowledgments: Many thanks to Cesar Augusto Lima for allowing me to use his quote and to George Firican for permitting me to use his photograph as the cover image of this article.
This article draws significantly from the research presented in the book Data Quality: Dimensions, Measurement, Strategy, Management and Governance (ASQ Quality Press, 2019). Future research will focus on how to measure data quality and on data quality strategy.
Note: This article was first published on QualityDigest.com on March 16, 2023.
Biography: Rupa Mahanti is a consultant, researcher, speaker, data enthusiast, and author of several books on data (data quality, data governance, and data analytics). You can connect with Rupa on LinkedIn, ResearchGate (which has most of her published work, some of which can be downloaded for free), or Medium.