If you're a CDO, how would you describe your most important role: as gatekeeper or innovator? Or are you walking a tightrope between the two? Those questions figured prominently at the 10th annual MIT Chief Data Officer & Information Quality Symposium held in July.
When the role first emerged, gatekeeping probably occupied most of a CDO's waking hours. But as more organizations have become aware of just how much value can be derived from their data, expectations have changed. Yes, CDOs still need to keep enterprise data safe, but they also need to keep the data supply chain running smoothly for data scientists and business analysts.
As any developer knows, perfect software doesn’t just happen; it, pardon the pun, “develops” over time. Developers engage in a seemingly everlasting iterative process of bug fixes and changes that can last for the lifetime of an application. But writing the software is only half the battle; it must then be deployed.
For big data companies like ours that run software across distributed networks, this is no small task. The cycle is familiar: a developer makes changes, runs tests, identifies errors or processing improvements to address, and then makes more changes.
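To make that loop concrete, here's a minimal sketch of the kind of automated gate that keeps it moving. The test command and deploy script below are hypothetical placeholders, not our actual pipeline; the point is simply that nothing ships until the tests pass.

```python
# A minimal sketch of the change/test/deploy gate described above.
# The commands and paths are hypothetical placeholders; substitute
# your own build, test, and deployment steps.
import subprocess
import sys

def run(cmd: list) -> bool:
    """Run a shell command and report whether it succeeded."""
    result = subprocess.run(cmd)
    return result.returncode == 0

def main() -> None:
    # 1. Run the test suite against the latest changes.
    if not run(["pytest", "tests/"]):
        sys.exit("Tests failed; fix errors and iterate before deploying.")
    # 2. Only a passing build gets pushed out to the cluster nodes.
    if not run(["./deploy.sh", "--cluster", "staging"]):
        sys.exit("Deployment failed; roll back and investigate.")
    print("Deployed to staging; promote to production after validation.")

if __name__ == "__main__":
    main()
```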
I recently attended Hadoop Summit 2016, where, not surprisingly, there was a lot of conversation about topics other than Hadoop. One example: the importance of ecosystem partners to any Big Data solution.
It was a great conversation. Carey pointed out that although data scientists spend a lot of time on analytics, they spend just as much time, or more, "wrangling" their data environments, trying to find data and move it where they need it. And that's why EMC turned to Attivio and Zaloni. Check out the rest of the discussion.
Attivio is excited to be a part of EMC’s new Big Data Solution. It’s not generally available yet, so we thought we’d have a chat with Ted Bardasz from EMC to give you a look at what this new platform offering is and how Attivio fits in.
Q: Tell me about your role at EMC.
TB: I am the Senior Director of Product Management in the Converged Platform Division, responsible for our Big Data and Native Hybrid Cloud solutions.
At Attivio, we work with some of the world's largest banks and manufacturing companies. As they invest more in Hadoop, they also require more from it. They recognize its value in dealing with extremely large and diverse data sets. But they're also looking for enterprise features, and data governance is often at the top of the list.
When we and a number of our customers joined Atlas—the Hortonworks data governance initiative—we brought our unique capabilities around data discovery and dealing with unstructured data. Many companies have chosen Hadoop as their new data platform and need a way to integrate their legacy data sources. We can connect the Hadoop ecosystem with external legacy systems.
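To give a flavor of what that integration can look like, here's a hedged sketch that registers a legacy database table with Apache Atlas through its v2 REST API, so it appears in the governance catalog alongside Hadoop assets. The host, credentials, and the rdbms_table type name are illustrative assumptions, not a prescription for how our connectors work.

```python
# A sketch of registering an external (non-Hadoop) data source in
# Apache Atlas via its v2 REST API. The endpoint follows Atlas's
# documented /api/atlas/v2/entity path; the host, credentials, and
# the "rdbms_table" type name are illustrative assumptions.
import requests

ATLAS_URL = "http://atlas.example.com:21000/api/atlas/v2/entity"  # hypothetical host

entity = {
    "entity": {
        "typeName": "rdbms_table",  # assumes an RDBMS type is defined in your Atlas instance
        "attributes": {
            "qualifiedName": "legacy_crm.customers@prod",
            "name": "customers",
            "description": "Customer master table from the legacy CRM system",
        },
    }
}

resp = requests.post(ATLAS_URL, json=entity, auth=("admin", "admin"))
resp.raise_for_status()
print("Registered entity:", resp.json().get("guidAssignments"))
```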
You understand that data is the lifeblood of innovation and competitive differentiation. The key is to get the right data into the hands of business analysts when they need it. That sounds simple in theory, but challenges abound. However, for every challenge enterprises face in surfacing and connecting the right data, there is an answer.
Let me explain:
From Process Bottlenecks to Busting Bottlenecks
For any data gathering process to work well, business and IT must be aligned. When they aren’t on the same page, and every request sparks a continual back-and-forth about what data is needed, a bottleneck forms: IT doesn’t know what the data means, and business analysts and data stewards don’t necessarily know what data is there.
The answer to this question is straightforward in quantitative disciplines like mathematics, in which “linear” and “non-linear” are well defined and clearly differentiated. It is less obvious in data management and analysis. The industry acknowledges that a traditional, strictly linear, IT-centric approach is ineffective in view of today’s evolving data landscape. Many organizations advocate bypassing IT altogether with non-linear solutions that emphasize self-service data discovery, preparation, and analysis to accelerate the transition from raw data to insights.
When it comes to gathering the right data and finding the relationships that make that data more meaningful, one role knows how to do it best: the data steward. That’s why data stewards are often referred to as data detectives.
The Trustee of an Organization’s Data
Data stewardship is an important role in any organization. The data steward is a trustee of the organization’s data. They don’t own the data; rather, with so many internal and external data sources available for use, their responsibility is to understand what is available and how it connects to provide real value.
There's a lot of talk these days about how to streamline the data supply chain. The discussions often boil down to how to control an organization's data and how difficult and time-consuming it is for business users to access it. As I wrote recently for DataInformed, highly structured systems for managing data, like master data management (MDM) and enterprise data warehouses (EDWs), put a kink in the data supply chain. They aspire to a single version of the truth, but at a cost in time-to-insight that few enterprises can afford to pay.
There’s no mistaking that you need to leverage your data to gain a competitive advantage. We hear it from analysts, experts, and those who have been there and know firsthand what data can do.
Finding the Right Data
But it's not as simple as pulling up all your data sources, connecting them, and then doing the analysis. What happens instead is that you spend days, weeks, even months trying to find the right information across all the silos of applications and data repositories inside your organization, many of which are hidden from view.
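To see why, consider even a naive search across silos. The toy inventory below is entirely hypothetical, but it illustrates the mechanics: every silo's catalog has to be walked, and that's before you account for the silos nobody remembered to list.

```python
# A toy illustration of searching for "the right data" across silos.
# The silo inventory is hypothetical; in practice each entry would be
# a live connection to an application, warehouse, or file share.
SILOS = {
    "crm": ["customers", "contacts", "opportunities"],
    "erp": ["orders", "invoices", "suppliers"],
    "data_lake": ["clickstream_raw", "customer_360", "support_tickets"],
}

def find_datasets(term: str) -> list:
    """Return 'silo.dataset' names whose dataset name matches the term."""
    return [
        f"{silo}.{dataset}"
        for silo, datasets in SILOS.items()
        for dataset in datasets
        if term.lower() in dataset.lower()
    ]

print(find_datasets("customer"))  # ['crm.customers', 'data_lake.customer_360']
```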