If you haven’t experienced the frustration of trying to wade through duplicate and incorrect data, you’re one of the very few. Dirty data clogs up our databases and integration projects, and it creates obstacles to getting the information we need from the data. It can be like trying to paddle through a sea of junk.
The value of our data lies in accurate reporting and in business intelligence that enables good business decisions. Good data governance is critical to a successful business as well as to meeting compliance requirements.
So how do we avoid the pitfalls of poor data quality?
Perform quality assurance activities for each step of the process. Data quality results from frequent and ongoing efforts to reduce duplication and update information. If that sounds like a daunting task, remember that using the right tools can save substantial time and money, as well as create better results.
Take the time to set clear and consistent rules for setting up your data. If you inherited a database, then you can still update the governance to improve your data quality.
How do you update data governance?
Recommendation: Updating data governance will almost always require adding new code segments to existing data import, scrub, and validation processes. A side effect of adding that code is the need for a “cleanup”. When code is updated to promote data governance, it is usually applied only to new data entering the system. What about the data that was already in the system before the new rules existed? The new data governance rules should apply to existing data as well as new data. You’ll need to build the new code segments into separate processes for (hopefully) a one-time cleanup of the existing data. Applying the updated data governance code together with the cleanup brings data governance current, updates existing data, and keeps the dataset uniform.
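To make the idea concrete, here is a minimal Python sketch of one rule shared by both paths: the import process for new data and a one-time cleanup for existing data. Everything in it is an assumption for illustration only: the rule itself (trimming whitespace and lowercasing email addresses), the function names, and the in-memory list standing in for a database table.

```python
def apply_governance(record: dict) -> dict:
    """Hypothetical governance rule: collapse whitespace in names, normalize emails."""
    cleaned = dict(record)
    cleaned["name"] = " ".join(record["name"].split())
    cleaned["email"] = record["email"].strip().lower()
    return cleaned


def import_record(table: list, record: dict) -> None:
    """Import path: every new record passes through the rule before it is stored."""
    table.append(apply_governance(record))


def cleanup_existing(table: list) -> int:
    """One-time cleanup: apply the same rule to rows that are already stored.

    Returns the number of rows that had to be changed.
    """
    changed = 0
    for i, row in enumerate(table):
        cleaned = apply_governance(row)
        if cleaned != row:
            table[i] = cleaned
            changed += 1
    return changed


# Rows that were loaded before the rule existed (a list stands in for a table).
customers = [
    {"name": "Ada  Lovelace ", "email": " ADA@Example.com"},
    {"name": "Grace Hopper", "email": "grace@example.com"},
]

rows_fixed = cleanup_existing(customers)  # bring the existing data current
import_record(customers, {"name": " Alan Turing", "email": "Alan@Example.com "})
```

Because both paths call the same function, old and new rows end up in the same shape, and running the cleanup a second time should change nothing.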
Which are the most important things to update?
- Translation Tables
- Stored Procedures
- Database Views
- Validation Lookups, Tables, and Rules
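As a rough illustration of the first and last items, the sketch below reads "translation table" as a mapping from the variants found in source data to one canonical value, and "validation lookup" as a set of allowed values. That reading, the state codes, and the function names are all assumptions; in a real system these would more likely live in database tables than in Python literals.

```python
# Hypothetical translation table: variant spellings mapped to a canonical code.
STATE_TRANSLATION = {
    "ca": "CA", "calif.": "CA", "california": "CA",
    "ny": "NY", "n.y.": "NY", "new york": "NY",
}

# Hypothetical validation lookup: the only values allowed into the system.
VALID_STATES = {"CA", "NY"}


def translate_state(raw: str):
    """Return the canonical code for a raw value, or None if it is unknown."""
    return STATE_TRANSLATION.get(raw.strip().lower())


def validate(rows: list) -> tuple:
    """Split rows into accepted (translated) rows and rejected (original) rows."""
    accepted, rejected = [], []
    for row in rows:
        code = translate_state(row["state"])
        if code in VALID_STATES:
            accepted.append({**row, "state": code})
        else:
            rejected.append(row)
    return accepted, rejected


incoming = [
    {"id": 1, "state": "Calif."},
    {"id": 2, "state": " new york "},
    {"id": 3, "state": "Narnia"},
]
accepted, rejected = validate(incoming)
```

Rejected rows are kept rather than dropped so that someone can review them and, where appropriate, add a new entry to the translation table.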
GIGO – garbage in = garbage out. Rid your data of the garbage early and avoid a massive cleanup later. The C-suite will also appreciate the more efficient projects and processes that result.