Nahla Davies examines what constitutes a proper data integrity framework, and how inadequate frameworks harm data quality.
If you asked most companies whether they have a data integrity framework, they’d say yes without hesitation. They’d point you to a shared drive, maybe a Confluence page, possibly a colour-coded spreadsheet with tabs labelled ‘Validation Rules’ and ‘Ownership Matrix’. It looks official. It’s got a logo on it. Someone even added conditional formatting.
But here’s the thing: looking like a framework and actually functioning as one are two wildly different realities. Across industries, organisations are confusing documentation with governance, and the gap between those two things is where data quality quietly falls apart. The problem isn’t that teams don’t care. It’s that they’ve convinced themselves the spreadsheet is enough.
The spreadsheet trap is more common than anyone admits
There’s a pattern that plays out in nearly every mid-size org that’s undergone some kind of digital transformation push in the last five years. Someone in data engineering or analytics gets tasked with ‘building a data integrity framework’. They do their research, pull together some best practices, and create a document. Maybe it lives in Google Sheets, maybe it’s a Notion database, maybe it’s an actual PDF that got emailed around once and then forgotten about. Whatever form it takes, it checks a box. Leadership sees it and feels reassured.
The trouble starts when that document has to survive contact with reality. Data pipelines change. New sources get added. Team members rotate. And that spreadsheet? It doesn’t update itself. It doesn’t send alerts when a schema shifts or when a critical field starts returning nulls at twice the usual rate. It just sits there, frozen in the moment it was created, slowly becoming a historical artifact rather than an operational tool.
What’s worse is that people keep referencing it as if it’s still accurate. Decisions get made based on validation rules that haven’t been reviewed in months. Ownership columns list people who’ve left the company. It’s the organisational equivalent of navigating with a map from 2019 and wondering why you keep hitting dead ends.
And it’s not a niche problem. A 2023 Gartner survey found that poor data quality costs organisations an average of $12.9m per year. That number doesn’t come from dramatic, headline-grabbing breaches. It comes from the slow, invisible accumulation of bad records, missed anomalies, and unchecked assumptions that a static document simply can’t catch.
What a real framework actually looks like
So what separates a functioning data integrity framework from a well-formatted spreadsheet? It comes down to whether the thing can operate without someone manually babysitting it. A real framework is embedded in your infrastructure. It’s automated, observable and responsive.
That means validation checks run as part of your data pipelines, not as a quarterly audit someone remembers to do in the last week of the quarter. It means the data is correctly annotated and that there’s monitoring in place that flags anomalies in real time, whether that’s a sudden spike in null values or a mismatch between source and destination row counts. Tools like Great Expectations, Monte Carlo and dbt tests exist specifically to bring this kind of rigour into the workflow.
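To make the idea of checks that run inside a pipeline step a little more concrete, here is a minimal Python sketch. It is not how Great Expectations, Monte Carlo or dbt implement their checks. The function names, the `max_ratio` threshold and the shape of the rows are all assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def null_rate(rows, field):
    """Fraction of rows where `field` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


def check_null_spike(rows, field, baseline_rate, max_ratio=2.0):
    """Fail when the null rate exceeds `max_ratio` times the usual rate.

    `baseline_rate` and `max_ratio` are hypothetical tuning values.
    """
    rate = null_rate(rows, field)
    passed = rate <= baseline_rate * max_ratio
    detail = f"rate={rate:.2%}, baseline={baseline_rate:.2%}"
    return CheckResult(f"null_spike:{field}", passed, detail)


def check_row_counts(source_count, destination_count):
    """Fail when the destination does not hold as many rows as the source."""
    passed = source_count == destination_count
    detail = f"source={source_count}, destination={destination_count}"
    return CheckResult("row_count_match", passed, detail)


def run_checks(checks):
    """Raise if any check failed, so the pipeline step stops before loading."""
    failures = [c for c in checks if not c.passed]
    if failures:
        raise ValueError("; ".join(f"{c.name} ({c.detail})" for c in failures))
    return checks
```

Calling `run_checks` at the end of a load step means a failed check halts the run, instead of waiting for someone to open a spreadsheet and notice.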
It also means ownership is enforced through tooling, not just documented in a tab. When a data asset has a registered owner in a data catalogue, and that catalogue integrates with your alerting system, accountability becomes structural. It stops being something you have to chase people about in Slack.
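As a hedged sketch of what ‘structural’ accountability might look like, the Python below routes an alert to whoever the catalogue lists as the owner of an asset. The catalogue here is just an in-memory dictionary, and the asset names, team names and channel names are hypothetical. A real data catalogue and alerting system would sit behind their own APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Owner:
    team: str
    channel: str


# Hypothetical catalogue entries: asset name -> registered owner.
CATALOGUE = {
    "warehouse.orders": Owner("commerce-data", "#commerce-data-alerts"),
    "warehouse.customers": Owner("crm-platform", "#crm-alerts"),
}

# Hypothetical catch-all channel for assets nobody has claimed.
FALLBACK_CHANNEL = "#data-governance"


def route_alert(asset, message, catalogue=CATALOGUE):
    """Build an alert addressed to the asset's registered owner."""
    owner = catalogue.get(asset)
    if owner is None:
        # An unowned asset is itself a finding, so make it visible.
        return {
            "channel": FALLBACK_CHANNEL,
            "text": f"[unowned asset] {asset}: {message}",
        }
    return {"channel": owner.channel, "text": f"[{owner.team}] {asset}: {message}"}


def unowned_assets(assets, catalogue=CATALOGUE):
    """List assets that have no registered owner."""
    return sorted(a for a in assets if a not in catalogue)
```

The design choice worth noting is the fallback: an asset with no owner still produces an alert, so gaps in ownership surface on their own instead of sitting unnoticed in a tab.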
There’s a cultural component here, too. Organisations with mature data integrity practices treat data quality as a product concern and are better prepared to establish proper AI governance. Product managers care about it. Analysts flag issues proactively instead of working around them. Engineers write tests for data the same way they write tests for code. That kind of culture doesn’t emerge from a spreadsheet. It emerges from leadership making it clear that data integrity is a priority, not a side project someone handles when things are slow.
The companies getting this right tend to share a few traits. They’ve invested in observability across their data stack. They treat schema changes as events that require review, not things that just happen silently. And they’ve moved past the idea that documentation alone equals governance.
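One way to picture schema changes as events is a small diff between the schema a pipeline expects and the one it actually observes. The sketch below is an assumption-laden illustration, not any particular tool’s behaviour: schemas are plain `{column: type}` dictionaries, and the event tuples and the `requires_review` rule are made up for this example.

```python
def diff_schema(expected, observed):
    """Compare two {column: type} mappings and return one event per change.

    Each event is a tuple: (kind, column, expected_type, observed_type).
    """
    events = []
    for column in sorted(expected.keys() - observed.keys()):
        events.append(("removed", column, expected[column], None))
    for column in sorted(observed.keys() - expected.keys()):
        events.append(("added", column, None, observed[column]))
    for column in sorted(expected.keys() & observed.keys()):
        if expected[column] != observed[column]:
            events.append(
                ("type_changed", column, expected[column], observed[column])
            )
    return events


def requires_review(events):
    """Hypothetical policy: any schema change at all needs a human sign-off."""
    return len(events) > 0
```

A pipeline could run this diff before each load and open a review ticket whenever `requires_review` returns true, so that a renamed or retyped column never slips through silently.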
Why it matters more now than it did five years ago
The stakes around data integrity have shifted significantly. Five years ago, a bad record in a reporting dashboard was annoying but manageable. Today, that same bad record might be feeding a machine learning model that’s making automated decisions about credit, hiring or patient care. The blast radius of poor data quality has expanded because the systems consuming that data have become more autonomous and more consequential.
Regulatory pressure is also mounting. Frameworks like the EU’s AI Act and evolving data privacy regulations are putting more scrutiny on how organisations manage the data that powers their products. It’s getting harder to shrug off data quality issues as ‘technical debt we’ll get to eventually’. Regulators want to see evidence of governance, and a spreadsheet with last year’s date on it won’t cut it.
There’s also the competitive angle. Companies that can trust their data move faster. They make decisions with more confidence. They spend less time reconciling conflicting reports and more time actually acting on insights. Data integrity isn’t glamorous, but it’s one of those foundational things that quietly determines whether an organisation can execute on its strategy or just talk about it.
Final thoughts
The uncomfortable truth is that most data integrity frameworks weren’t built to be frameworks at all. They were built to satisfy a request, to check a compliance box, or to give someone something to present in a meeting.
And that’s fine as a starting point. Every mature system started somewhere. But if your ‘framework’ is still a spreadsheet that no one’s touched in six months, it’s time to be honest about what you actually have.
Real integrity requires automation, observability and cultural buy-in. The spreadsheet was never the destination. Treat it as the rough draft it always was, and start building something that can actually keep up with your data.
By Nahla Davies
Nahla Davies is a software developer and tech writer. Before devoting her work full time to technical writing, she managed – among other intriguing things – to serve as a lead programmer at an Inc. 5,000 experiential branding organisation, whose clients include Samsung, Time Warner, Netflix and Sony.
Don’t miss out on the knowledge you need to succeed. Sign up for the Daily Brief, Silicon Republic’s digest of need-to-know sci-tech news.
