At Attivio, we work with some of the world's largest banks and manufacturing companies. As they invest more in Hadoop, they also require more from it. They recognize its value in dealing with extremely large and diverse data sets. But they're also looking for enterprise features, and data governance is often at the top of the list.
When we and a number of our customers joined Atlas—the Hortonworks data governance initiative—we brought our unique capabilities around data discovery and dealing with unstructured data. Many companies have chosen Hadoop as their new data platform and need a way to integrate their legacy data sources. We can connect the Hadoop ecosystem with external legacy systems.
At the end of May, the city of Boston named Andrew Therriault as its first CDO (chief data officer). Therriault, former Director of Data Science for the Democratic National Committee, comes with an impressive background. He has a B.A., M.A., and Ph.D. in politics from NYU. Before joining the DNC, he served as senior data scientist with Greenberg Quinlan Rosner Research, whose client list includes global companies, advocacy groups, and political organizations as well as former presidents. So, he's certainly qualified.
You understand that data is the lifeblood of innovation and competitive differentiation. The key is to get the right data into the hands of business analysts when they need it. Sounds simple in theory, but challenges abound. However, for every challenge enterprises face in surfacing and connecting the right data, there is an answer.
Let me explain:
From Process Bottlenecks to Busting Bottlenecks
For any data gathering process to work well, business and IT must be aligned. When they aren’t on the same page, and there needs to be a continual back-and-forth discussion about what data is needed, there is a bottleneck. IT doesn’t know what the data means, and business analysts and data stewards don’t necessarily know what data is there.
The answer to this question is straightforward within the context of quantitative disciplines like mathematics in which “linear” and “non-linear” are well defined and differentiated. The answer is less obvious in reference to data management and analysis. The industry acknowledges that a traditional, strictly linear IT-centric approach is ineffective in view of today’s evolving data landscape. Many organizations advocate bypassing IT altogether with non-linear solutions emphasizing self-service data discovery, preparation and analysis in order to accelerate the transition from raw data to insights.
When it comes to gathering the right data and finding the relationships that make that data more meaningful, there’s one role that knows how to do it best: the data steward. That’s why data stewards are often referred to as data detectives.
The Trustee of an Organization’s Data
Data stewardship is an important role for an organization. The data steward is a trustee of an organization’s data. They don’t own the data, but with so many internal and external data sources available for use, the data steward’s responsibility is to understand what is available and how it connects to provide real value.
Your organization’s data is a competitive advantage. But how can you take advantage of it if you don’t know what you have? How can you leverage it to provide data-driven insights to decision makers? There are several roles in the organization that can help, starting with IT.
Data, Data Everywhere
IT organizations are responsible for an ever-increasing amount of data every year. This data comes from many sources including websites, CRM, ERP, contact center logs, social media, documents, IoT and much more. Data is stored in business applications, data warehouses, file shares and other locations within the organization.
The Chief Data Officer has never been a more necessary role in the organization than it is today. Organizations capture and store more data than ever before, and it’s growing exponentially every year.
Not only is business data growing, but we are seeing new types of data continually entering the mix. Data is structured, unstructured and semi-structured. It’s stored in big data lakes, in business applications, in file shares, and other places across the organization. There’s so much data that even the CDO isn’t completely aware of what’s out there.
And then there’s the external data. CDOs are becoming aware of the need to bring in external data sources that provide relevant, and sometimes essential, information to support decision making.
There's a lot of talk these days about how to streamline the data supply chain. And the discussions often boil down to how to control an organization's data and how difficult and time-consuming it is for business users to access it. As I wrote recently for DataInformed, highly structured systems for managing data like master data management (MDM) and enterprise data warehouses (EDWs) put a kink in the data supply chain. They aspire to a single version of the truth but at a cost in time-to-insight few enterprises can afford to pay.