Data-driven organizations know that the key to success with data is making it available to as many people across the organization as possible. But in most organizations, data is growing rapidly and becoming more complex. This growth and complexity make current data preparation processes more challenging and often inefficient.
To get the right data to the right people as quickly as possible, organizations need to focus on reducing the time required to gather and prepare data. At the same time, they need to ensure that the management and use of data align with their governance programs.
Why data preparation is so important
Organizations rely on data for strategic, operational and financial decision-making, and if that data is incomplete, inaccurate or otherwise poor quality, the consequences can be severe. That is why data preparation is so important. It covers the whole path of the data: from the moment raw data (structured and unstructured) is first ingested, through how it is defined and structured, consolidated, transformed and connected to other datasets.
In the TDWI Best Practices report “Improving Data Preparation for Business Analytics: Applying Technologies and Methods for Establishing Trusted Data Assets for More Productive Users”, the author, senior director of TDWI Research for business intelligence (BI), looks at the current challenges of data profiling and preparation and at how new technologies are helping to improve the process.
As part of the report, TDWI surveyed business and IT executives about data preparation in their organizations. Only 7% said they were “very satisfied” with how easily they could find relevant data and understand how to use it. And while 70% said IT was responsible for making data easier to find, 44% said it was important to increase users’ ability to perform self-service data preparation.
Traditional methods of data preparation are no longer enough. Manual, time-consuming processes that rely heavily on IT make it hard for organizations to keep pace with changing markets. New practices and technologies are helping organizations get a better handle on their data from the start.
These new technologies support:
- Self-service data preparation
- Data catalogs that enable more effective data preparation through data federation
- Automatic cataloging of datasets based on higher-level business rules
- Data federation
- Stronger governance and stewardship of data
The benefits of improved data preparation
Organizations that focus on improving data preparation look for tools that let them capture business definitions alongside technical metadata. Recording tribal wisdom about data assets is important because it provides a level of understanding that technical metadata alone doesn’t offer: how the data has been used, whether certain metrics and algorithms apply to it, and different ways to look at the data during ad hoc discovery. A data catalog is an essential tool for consolidating and coordinating data definitions that include both business and technical metadata.
The TDWI report found that, on their most recent BI project, 37% of respondents spent 61-80% of their time preparing data, and another 28% spent 41-60%. Data preparation is not only time-consuming. Analytics workflows also slow down considerably when data preparation and analytic processes are completely separated.
It’s interesting that BI and data exploration have benefited from self-service tools for some time, yet we are only now starting to look at self-service tools for data preparation and transformation. Fortunately, the technology and business practices needed to support the growing needs of data-driven organizations are already available.
The TDWI report is an excellent read, packed with insights into the challenges and opportunities of improved data preparation. I encourage you to take a look and download your copy now.