Data Integration Dos and Don’ts

David Linthicum, CTO of Cloud Technology Partners, recently discussed the dos and don’ts of data integration.

In this article, David notes that many enterprises have deployed some sort of data integration technology within the last 20 years. While many enterprise insiders believe they have the problem solved, most don’t. His advice? Keep a continued focus on what the technology does and what value it brings to the organization.

Data integration is not something you just drop in and hope for the best. There needs to be careful planning around its use. IT is the typical choice to do the planning, select the technology, and handle ongoing operations.

However, the need for data integration typically comes from outside of IT. Those who understand that data should be shared between systems, as needed and when needed, in support of core business processes, are typically the ones crying out for more and better data integration technology. IT responds to those requests reactively.

David continues by explaining that things are now changing more quickly than they have in the past, with new impacts on both IT and end users. Specifically, these changes include:

  • The use of public cloud resources as a place to host and operate applications and data stores. This increases the integration challenges for enterprise IT, and requires a new way of thinking about data integration and data integration technology.
  • The rise of big data systems, both in the cloud and on-premises, where the amount of data stored can go beyond a petabyte. These systems have very specialized data integration requirements, not to mention the need for the data integration solution to scale.
  • The rise of complex and mixed data models. This includes NoSQL-type databases that typically serve a single purpose. Moreover, databases are emerging that focus on high performance, and thus need a data integration solution that can keep up.

To support these newer systems, those who leverage data integration approaches and technology have more decisions to make. Indeed, these can be boiled down to some simple dos and don’ts.

Do create a data integration plan, and architecture. Whether or not you have existing data integration solutions in place, you need to consider your data integration requirements, which typically include lists of source and target data stores, performance, security, governance, data cleansing, etc. This needs to be defined in enough detail that both IT and non-IT parts of the organization can understand and follow the plan. It should also include a logical and physical data integration architecture, as well as a detailed roadmap, so that the amount of ambiguity is reduced.
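As a purely illustrative sketch (the field names and example values below are assumptions, not anything from Linthicum’s article), such a requirements list can be captured as simple structured data that both IT and business stakeholders can review and keep current:

    # Hypothetical requirements inventory for a data integration plan.
    # Every field name and value here is illustrative, not a standard.
    from dataclasses import dataclass, field

    @dataclass
    class IntegrationRequirement:
        source: str                                          # source data store
        target: str                                          # target data store
        latency: str                                         # e.g. "nightly batch", "near-real-time"
        security: list[str] = field(default_factory=list)    # e.g. encryption, masking
        governance: list[str] = field(default_factory=list)  # e.g. ownership, retention
        cleansing: list[str] = field(default_factory=list)   # e.g. dedup, standardization

    requirements = [
        IntegrationRequirement(
            source="crm_orders",                  # hypothetical source system
            target="cloud_warehouse_sales",       # hypothetical target store
            latency="near-real-time",
            security=["encrypt-in-transit", "mask-pii"],
            governance=["owner: sales-ops", "retention: 7 years"],
            cleansing=["dedupe-customers", "normalize-country-codes"],
        ),
    ]

    for r in requirements:
        print(f"{r.source} -> {r.target} ({r.latency})")

Even a rough inventory like this gives the roadmap something concrete to trace against.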

Do allocate enough budget. In many cases, there are just not enough resources focused on the data integration problem. If you develop a plan, the tasks and technology in that plan need to be funded. Lack of funding typically means data integration efforts die the death of a thousand cuts, and the data integration solutions don’t solve the problems they should solve. That costs far more than any money you think you’re saving.

Don’t take the technology for granted. Many enterprises believe that most data integration solutions are the same, and don’t spend the time they should to evaluate and test data integration technology. Available data integration technology varies a great deal in terms of function and the problem patterns it can address. You need to become an expert of sorts in what’s available, what it does, and how it will work and play within your infrastructure to solve your business problems.

Don’t neglect security, governance, and performance. Many who implement data integration solutions overlook security, governance, and even performance. They do this for a few reasons: typically, they lack an understanding of how these concepts relate to data integration, and/or they lack an adequate budget (see above). The reality is that these concepts must be baked into the data integration solution from logical architecture to physical deployment. If you miss these items, you’ll have to retrofit them down the line. That is almost impossible, certainly costly, and let’s not forget the cost of the risk you’ll incur.

Linthicum believes that while some of this seems obvious, most of what’s stated here is not followed by enterprise managers when they define, design, and deploy data integration solutions and technology. The end result is a system that misses some of the core reasons for deploying data integration in the first place, and does not deliver the huge value that this technology can bring.

The good news for most enterprises is that data integration technology continues to improve and has adapted around emerging infrastructure changes, including the use of cloud, big data, etc. However, a certain amount of discipline and planning must still occur.


Emerging IT Trends – The Age of Data

Big Data is Big Business

These are truly exciting times! The volume and velocity of data available to every business is astounding and continues to grow. IT industry leaders are talking about where technology is going, what the future holds and the impact all of this will have on the world.

Robin Bloor took a minute to review the path to the present in his guest post, “The Age of Data,” on the Actian blog this week, before revealing the vision that he and IT industry thought leader Mike Hoskins have for the future of data.

“Mike Hoskins, CTO of Actian (formerly Pervasive Software), suggested to me in a recent conversation that we have entered the Age of Data. Is this the case?” Bloor begins his post with a review of history: “The dawn of the IT industry could certainly be described as the Age of Iron. Even in the mainframe days there were many hardware companies.” I agree. In the past, the focus was on the machines and what they could do for humans.

Bloor continues, “Despite the fact that computers are only useful if you have applications, the money was made primarily from selling hardware, and the large and growing companies in that Age of Iron made money from the machines.” You can guess the moniker Bloor gives the next phase of IT history: “The Age of Software.” The volume of databases and applications available for organizations to buy exploded. And that got messy. Lots of file types, formats, languages, and programs led to multiple versions of records and interoperability nightmares.

What’s next? Bloor suggests it’s the Age of Data. It’s about the data and the analytics it can provide us. This is the Cambrian explosion that will be one of the primary topics discussed at the Big Data & Integration Summit NYC 2013. Actian Chief Technologist Jim Falgout and I will present our views on emerging trends and lead a roundtable discussion with other industry leaders about the impact all of this will have on business. I invite you to join what promises to be a lively conversation and attend the Summit.

Based on feedback from industry leaders and customers, the Emprise Technologies and Actian teams have created a handful of sessions designed to deliver best practices that IT professionals can take home and use immediately to improve IT project success. These include “How to Win Business Buy-in for IT Projects”, “Avoiding the Pitfalls of Data Quality” and “Creating Workflows That Ensure Project Success”. I hope you’ll come join us. If you can’t make it to New York, we’re planning to take the Big Data & Integration Summit on the road, so leave us your requested cities and topics in the comments below. We look forward to hearing from you.

Announcing! The Big Data & Integration Summit NYC 2013

Actian Corporation and Emprise Technologies are co-hosting The Big Data & Integration Summit on September 26, 2013 in NYC and invite CIOs and IT Directors to attend and join in the conversations (#BDISNYC13). This event is free and features a fast-paced agenda that includes these topics and more (see the Summit Agenda below).

Register Now for the Big Data & Integration Summit NYC 2013

Additionally, attendees will join our panel of experts for a round-table discussion on the Big Data & Integration challenges facing CIOs now. Talk with Actian Chief Technologist, Jim Falgout, about Hadoop, Big Data Analytics, and more.

As CEO of Emprise Technologies, I’ve seen just about every cause there is for integration project failure. Often, there is more than one issue slowing down the project; sometimes a confluence of events, a periodic “perfect storm,” develops that derails integration projects and causes failure. I’m teaming up with Actian’s Chief Technologist, Jim Falgout, to share the secrets we’ve learned for ensuring data integration and big data project success.

Don’t miss out on the opportunity to be part of the Big Data & Integration Summit NYC 2013. Register Now! Do you have any topics to suggest for the Summit? Provide us with your comments below. This is YOUR Summit!

Summit Agenda

Register Now for the Big Data & Integration Summit NYC 2013

The Data Flow Architecture Two-Step

The Two-Step Data Process

In his latest post on the Actian-hosted Data Integration blog, data management industry analyst Robin Bloor laid out his vision of data flow architecture. He wrote, “We organize software within networks of computers to run applications (i.e., provide capability) for the benefit of users and the organization as a whole. Exactly how we do this is determined by the workloads and the service levels we try to meet. Different applications have different workloads. This whole activity is complicated by the fact that, nowadays, most of these applications pass information or even commands to each other. For that reason, even though the computer hardware needed for most applications is not particularly expensive, we cannot build applications within silos in the way that we once did. It’s now about networks and grids of computers.”

Bloor said, “The natural outcome of successfully analyzing a collection of event data is the discovery of actionable knowledge.” He went on to say, “Data analysis is thus a two-step activity. The first step is knowledge discovery, which involves iterative analysis on mountains of data to discover useful knowledge. The second step is knowledge implementation, which may also involve on-going analytical activity on critical data but also involves the implementation of the knowledge.” Read more->

“The Costs of Poor Data Management”

It is surprising that data quality is still viewed as a luxury rather than a necessity. As an unapologetic data quality advocate, I’ve written white papers and blog posts about the value of good data management. It takes the efforts of many to change habits. In her blog post, The Costs of Poor Data Management, on the Data Integration Blog, Julie Hunt breaks down the impact data quality has on business.

Here’s an infographic on the cost poor data quality can impose on a business.

Global research - Bad customer data costs you millions

She points out that the areas of data quality deserving the greatest focus are specific to each organization. If you read my post, “Avoid Data Quality Pitfalls”, you know that I’m a proponent of good data governance. Update early and often. My top four suggestions are:

  • Translation Tables
  • Stored Procedures
  • Database Views
  • Validation Lookups, Tables, and Rules

What are yours? Read Julie’s post, and send me your comments.

Avoid Data Quality Pitfalls

If you haven’t experienced the frustration of trying to wade through duplicate and incorrect data, you’re one of the very few. Dirty data clogs up our databases and integration projects, and creates obstacles to getting the information we need from the data. It can be like trying to paddle through a sea of junk.

The value of our data lies in providing accurate reporting and business intelligence that enables good business decisions. Good data governance is critical to a successful business as well as to meeting compliance requirements.


So how do we avoid the pitfalls of poor data quality?

Perform quality assurance activities for each step of the process. Data quality results from frequent and ongoing efforts to reduce duplication and update information. If that sounds like a daunting task, remember that using the right tools can save substantial time and money, as well as create better results.
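As a minimal illustration of one such ongoing effort (the field names and matching rule below are assumptions, not a prescription), a recurring deduplication pass can be as simple as normalizing a key field and flagging the records that collapse onto the same key:

    # Hypothetical deduplication sketch: normalize a key field, then flag
    # groups of records that collapse onto the same key. The "email" and
    # "name" fields and the matching rule are illustrative assumptions.
    from collections import defaultdict

    def normalize(value: str) -> str:
        """Lowercase and collapse whitespace so trivially different values match."""
        return " ".join(value.lower().split())

    def find_duplicates(records: list[dict]) -> dict[str, list[dict]]:
        """Group records by normalized email; groups with more than one entry are duplicates."""
        groups = defaultdict(list)
        for record in records:
            groups[normalize(record.get("email", ""))].append(record)
        return {key: recs for key, recs in groups.items() if len(recs) > 1}

    records = [
        {"name": "Acme Corp", "email": "Info@Acme.com "},
        {"name": "ACME Corporation", "email": "info@acme.com"},
    ]
    print(find_duplicates(records))  # both rows collapse onto "info@acme.com"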

Take the time to set clear and consistent rules for setting up your data. Even if you inherited a database, you can still update its governance to improve your data quality.

How do you update data governance?

Recommendation: Updating data governance almost always requires adding new code segments to existing data import/scrub/validation processes. A side effect of adding new code segments is a “cleanup.” When code is updated to enforce data governance, it is usually applied only to new data entering the system. What about the data that was in the system before the new governance code? We want the new data governance rules to hit existing data as well as new data. You’ll need to build the new code segments into separate processes for (hopefully) a one-time cleanup of the existing data. Applying the updated governance code in conjunction with executing the “cleanup” brings data governance current, updates existing data, and maintains a uniform dataset.
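A minimal sketch of that recommendation, assuming a simple record-scrubbing pipeline (the rules, field names, and function names below are hypothetical), is to put the governance rules in one shared function and call it from both the ongoing import process and a separate one-time cleanup of existing data:

    # Hypothetical sketch: one set of governance rules shared by the ongoing
    # import path and a one-time cleanup of existing records. The rules and
    # field names are illustrative assumptions only.

    COUNTRY_CODES = {"usa": "US", "u.s.": "US", "united states": "US"}

    def apply_governance(record: dict) -> dict:
        """Scrub and standardize a single record according to the governance rules."""
        cleaned = dict(record)
        cleaned["email"] = cleaned.get("email", "").strip().lower()
        country = cleaned.get("country", "").strip().lower()
        cleaned["country"] = COUNTRY_CODES.get(country, country.upper())
        return cleaned

    def import_new_records(incoming: list[dict]) -> list[dict]:
        """Ongoing path: every new record passes through the same rules."""
        return [apply_governance(r) for r in incoming]

    def cleanup_existing_records(existing: list[dict]) -> list[dict]:
        """One-time backfill: run the identical rules over data already in the system."""
        return [apply_governance(r) for r in existing]

Because both paths call the same function, new data and previously loaded data end up governed by identical rules, which is what keeps the dataset uniform.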

Which are the most important things to update? (See the sketch after this list.)

  • Translation Tables
  • Stored Procedures
  • Database Views
  • Validation Lookups, Tables, and Rules
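For two of these, a translation table and a validation lookup, here is a purely illustrative sketch (the status codes, allowed values, and field names are assumptions, not from the post) of small, centrally maintained mappings that every load process can consult:

    # Hypothetical translation table and validation lookup.
    # The status codes, allowed values, and field names are illustrative only.

    # Translation table: map source-system codes to the governed standard values.
    STATUS_TRANSLATION = {
        "A": "active",
        "ACT": "active",
        "I": "inactive",
        "TERM": "inactive",
    }

    # Validation lookup: the only values allowed to land in the target store.
    VALID_STATUSES = {"active", "inactive"}

    def translate_and_validate(raw_status: str) -> str:
        """Translate a source code, then confirm it is an allowed target value."""
        status = STATUS_TRANSLATION.get(raw_status.strip().upper(), raw_status.strip().lower())
        if status not in VALID_STATUSES:
            raise ValueError(f"Status {raw_status!r} failed the validation lookup")
        return status

    print(translate_and_validate("ACT"))  # -> "active"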

GIGO: garbage in, garbage out. Rid your data of the garbage early and avoid a massive cleanup later. The C-suite will appreciate that you’ll run more efficient projects and processes as well.