Additionally, attendees will join our panel of experts for a round-table discussion on the Big Data & Integration challenges CIOs face today. Talk with Actian Chief Technologist Jim Falgout about Hadoop, Big Data analytics, and more.
As CEO of Emprise Technologies, I’ve seen just about every cause of integration project failure. Often more than one issue is slowing a project down, and sometimes a confluence of events creates a “perfect storm” that derails the project entirely. I’m teaming up with Actian’s Chief Technologist, Jim Falgout, to share the secrets we’ve learned for ensuring data integration and big data project success.
Don’t miss the opportunity to be part of the Big Data & Integration Summit NYC 2013. Register now! Do you have topics to suggest for the Summit? Share them in the comments below. This is YOUR Summit!
In his latest post on the Data Integration blog hosted by Actian Corporation, data management industry analyst Robin Bloor laid out his vision of data flow architecture. He wrote, “We organize software within networks of computers to run applications (i.e., provide capability) for the benefit of users and the organization as a whole. Exactly how we do this is determined by the workloads and the service levels we try to meet. Different applications have different workloads. This whole activity is complicated by the fact that, nowadays, most of these applications pass information or even commands to each other. For that reason, even though the computer hardware needed for most applications is not particularly expensive, we cannot build applications within silos in the way that we once did. It’s now about networks and grids of computers.”
Bloor said, “The natural outcome of successfully analyzing a collection of event data is the discovery of actionable knowledge.” He went on to say, “Data analysis is thus a two-step activity. The first step is knowledge discovery, which involves iterative analysis on mountains of data to discover useful knowledge. The second step is knowledge implementation, which may also involve on-going analytical activity on critical data but also involves the implementation of the knowledge.”
If you haven’t experienced the frustration of trying to wade through duplicate and incorrect data, you’re one of the very few. Dirty data clogs our databases and integration projects and creates obstacles to getting the information we need. It can feel like paddling through a sea of junk.
The value of our data lies in accurate reporting and in business intelligence that enables good business decisions. Good data governance is critical to business success as well as to meeting compliance requirements.
So how do we avoid the pitfalls of poor data quality?
Perform quality assurance activities for each step of the process. Data quality results from frequent and ongoing efforts to reduce duplication and update information. If that sounds like a daunting task, remember that using the right tools can save substantial time and money, as well as create better results.
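To make “reducing duplication” a little more concrete, here is a minimal sketch of a duplicate check. It assumes records are simple Python dictionaries and that two records count as duplicates when their normalized name and email match; the field names and that matching rule are hypothetical illustrations, not the behavior of any particular tool.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def match_key(record: Record) -> Tuple[str, str]:
    """Build a comparison key from normalized fields (hypothetical matching rule)."""
    # Collapse repeated whitespace and ignore case in the name.
    name = " ".join(record.get("name", "").split()).lower()
    # Trim and lowercase the email address.
    email = record.get("email", "").strip().lower()
    return (name, email)


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first record seen for each match key and drop later duplicates."""
    seen = set()
    unique = []
    for record in records:
        key = match_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    rows = [
        {"name": "Jane  Doe", "email": "JANE@example.com"},
        {"name": "jane doe", "email": "jane@example.com "},
        {"name": "John Roe", "email": "john@example.com"},
    ]
    print(len(deduplicate(rows)))  # 2
```

Real matching rules are usually fuzzier than exact key equality, but the idea is the same: normalize first, then compare, and run the check as an ongoing step rather than a one-off.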
Take the time to set clear, consistent rules for how your data is structured and entered. If you inherited a database, you can still update its governance rules to improve your data quality.
How do you update data governance?
Recommendation: Updating data governance almost always requires adding new code segments to existing data import/scrub/validation processes. A side effect of adding those segments is the need for a “cleanup.” When code is updated to promote data governance, it is usually applied only to new data entering the system. What about the data that was already in the system before the new governance code? We want the new data governance rules to apply to existing data as well as new data, so you’ll need to build the new code segments into separate processes for a (hopefully) one-time cleanup of the existing data. Applying the updated data governance code together with the “cleanup” brings data governance current, updates existing data, and maintains a uniform dataset.
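Here is one way to sketch that recommendation, assuming a toy in-memory table of dictionaries. The key idea is a single shared list of governance rules that both the ongoing import path and the one-time cleanup process call, so new and existing data end up held to the same standard. The rule functions, field names, and helper names are hypothetical examples, not part of any specific product.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Rule = Callable[[Record], Record]


# Hypothetical governance rules: each returns a corrected copy of the record.
def normalize_email(record: Record) -> Record:
    fixed = dict(record)
    fixed["email"] = fixed.get("email", "").strip().lower()
    return fixed


def normalize_state(record: Record) -> Record:
    fixed = dict(record)
    fixed["state"] = fixed.get("state", "").strip().upper()
    return fixed


# One shared rule list, so the import code and the cleanup code cannot drift apart.
GOVERNANCE_RULES: List[Rule] = [normalize_email, normalize_state]


def apply_rules(record: Record, rules: Iterable[Rule] = GOVERNANCE_RULES) -> Record:
    """Run a record through every governance rule in order."""
    for rule in rules:
        record = rule(record)
    return record


def import_record(table: List[Record], incoming: Record) -> None:
    """Ongoing path: every new record passes through the rules before it is stored."""
    table.append(apply_rules(incoming))


def one_time_cleanup(table: List[Record]) -> int:
    """Separate process: re-run the same rules over data that predates them.

    Returns the number of existing records that were changed.
    """
    changed = 0
    for i, record in enumerate(table):
        cleaned = apply_rules(record)
        if cleaned != record:
            table[i] = cleaned
            changed += 1
    return changed


if __name__ == "__main__":
    # This row was loaded before the new rules existed.
    table = [{"email": " Old@Example.com", "state": "ny"}]
    import_record(table, {"email": "New@Example.com ", "state": "tx "})
    print(one_time_cleanup(table))  # 1
    print(table)
```

Because the rules are written to be idempotent, running the cleanup a second time changes nothing, which makes it safe to rerun if the first pass is interrupted.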
Which are the most important things to update?
Validation Lookups, Tables, and Rules
GIGO: garbage in, garbage out. Rid your data of the garbage early and avoid a massive cleanup later. The C-suite will appreciate that your projects and processes run more efficiently, too.
We all know the kind of profiling that is completely unacceptable, and that’s not what I’m talking about here. I neither condone nor practice any kind of socially unacceptable profiling. But there IS one type of profiling that I strongly recommend: data profiling, especially before you migrate your data.
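As a rough illustration of catching garbage at the door, the sketch below checks incoming rows against a validation lookup table and a simple rule, then quarantines failing rows with their reasons instead of loading them. The state-code set and the email check are hypothetical stand-ins; in a real system the lookup table would normally live in the database.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical validation lookup table (normally stored in the database).
VALID_STATE_CODES = {"NY", "NJ", "CT", "TX"}


def validate(record: Record) -> List[str]:
    """Return a list of rule violations for one record (empty means valid)."""
    errors = []
    state = record.get("state", "").strip().upper()
    if state not in VALID_STATE_CODES:
        errors.append(f"unknown state code: {record.get('state', '')!r}")
    if "@" not in record.get("email", ""):
        errors.append("email missing '@'")
    return errors


def split_batch(batch: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Accept valid rows; quarantine invalid rows together with their reasons."""
    accepted: List[Record] = []
    quarantined: List[Tuple[Record, List[str]]] = []
    for record in batch:
        errors = validate(record)
        if errors:
            quarantined.append((record, errors))
        else:
            accepted.append(record)
    return accepted, quarantined


if __name__ == "__main__":
    good, bad = split_batch([
        {"state": "ny", "email": "a@example.com"},
        {"state": "ZZ", "email": "b.example.com"},
    ])
    print(len(good), len(bad))  # 1 1
    print(bad[0][1])
```

Keeping rejected rows (rather than silently dropping them) gives someone a worklist to fix at the source, which is where the cleanup is cheapest.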
If you think that sounds like a luxury you don’t have time to fit into your project’s schedule, consider this: Bloor Research conducted a study and found that data migration projects that used data profiling best practices were significantly more likely (72% compared to 52%) to come in on time and on budget. That’s a big difference, and organizations realize many more benefits when they use data profiling in their projects.
Data Profiling enables better regulatory compliance, more efficient master data management, and better data governance. It all goes back to the old adage that “you have to measure what you want to manage.” Profiling is the first step in measuring the quality of your data before you migrate or integrate it, and it also lets you monitor that quality throughout the life of the data. Data deteriorates at around 1.25-1.5% per month, which adds up to a lot of bad data over the course of a year or two. The lower your data quality, the lower your process and project efficiency. No one wants that. Download the Bloor Research “Data Profiling – The Business Case” white paper to learn more about the results of this study.
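A basic profile can be surprisingly small. The sketch below computes a per-column blank rate, a distinct-value count, and a duplicate count on a key, which is the kind of measurement you would take before a migration and repeat over time. It also estimates how the 1.25-1.5% monthly deterioration figure adds up, under our own simplifying assumption that a fixed share of the still-accurate records goes bad each month (compounding). The function and field names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]


def profile(records: List[Record], columns: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-column profile: share of blank values and number of distinct non-blank values."""
    total = len(records)
    report: Dict[str, Dict[str, float]] = {}
    for col in columns:
        values = [(r.get(col) or "").strip() for r in records]
        blanks = sum(1 for v in values if not v)
        report[col] = {
            "blank_rate": blanks / total if total else 0.0,
            "distinct": len({v for v in values if v}),
        }
    return report


def duplicate_count(records: List[Record], key_columns: List[str]) -> int:
    """Count records beyond the first for each case-insensitive key."""
    keys = Counter(
        tuple((r.get(c) or "").strip().lower() for c in key_columns) for r in records
    )
    return sum(n - 1 for n in keys.values() if n > 1)


def share_still_accurate(monthly_decay: float, months: int) -> float:
    """Fraction of records still accurate, assuming a fixed monthly decay rate that compounds."""
    return (1 - monthly_decay) ** months


if __name__ == "__main__":
    for rate in (0.0125, 0.015):
        for months in (12, 24):
            stale = 1 - share_still_accurate(rate, months)
            print(f"{rate:.2%}/month over {months} months: {stale:.1%} stale")
```

Under that compounding assumption, roughly 14 to 17 percent of records go stale in one year and roughly 26 to 30 percent in two, which is why profiling once before a migration is not enough.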
Pervasive has recently developed an effective utility for migrating Data Integrator v9 projects into Pervasive Data Integrator v10. The process is quick and relatively smooth; however, challenges can arise because of the complex nature of most DI projects. If you are thinking about moving from v9 to v10, please reach out to Emprise to learn how our team of Certified Pervasive Developers can help make your transition successful.
Emprise Technologies is proud to be a Platinum sponsor of Pervasive IntegrationWorld 2013, and we are also sponsoring the Data Clinic. If you will be at IntegrationWorld, come by the Data Clinic and ask one of our Pervasive-certified consultants your questions about Data Integrator. Bring your toughest data integration questions: the Emprise team has a collective 30,000 hours of Pervasive experience, so we doubt you’ll be able to stump us. But we’re open to your trying! See you at IntegrationWorld 2013. We’ll be in the Hyatt Hill Country Ballroom A-C from 10:15 a.m. until 4:00 p.m. on Monday, April 15, and again on Tuesday, April 16, from 9:20 a.m. until 12:00 p.m.