A Data Catalog for a Modern Data Architecture


Wayne Eckerson points out in a recent report that, "Business intelligence (BI) is fueled by two opposing forces: top-down BI, in which the corporate IT group imposes standards on the delivery of data and reports to ensure a single version of truth, and bottom-up BI, in which business unit analysts create their own reports with custom data sets." Or more simply, it's the data warehouse folks versus the fans of Hadoop and the data lake when building a modern data architecture.

As more and more organizations adopt Hadoop, it's just a matter of time before these two groups have to reach some kind of working agreement—aka the modern or "hybrid" data architecture.

Most organizations take on Hadoop with a legacy data architecture already well established. That architecture supports multiple application and data silos with various access controls and governance policies.

It's easy for net new data to go directly to Hadoop, but what about all the enterprise data residing in data warehouses, one-off data marts, and other silos? Some may have aged out of value. While a good-size subset might be worth moving to Hadoop, most organizations are moving forward with a hybrid data architecture.

The question is: where does the valuable data live, and how do you evaluate it regardless of source? How do you create agile data access via a modern data architecture?


Bridging the Gap between Data Warehouse and Data Lake

The more data sources, structured and unstructured, that business users and data scientists can quickly find, aggregate, and understand, the more precise and insightful their analytics will be. After all, you can't analyze data that you can't find or don't know exists.

Attivio addresses this problem by building a virtual data catalog that spans everything from file stores and application silos to EDWs and Hadoop. As it builds the catalog, Attivio creates an intelligent semantic layer that discovers the meaning and relationships of all information in the ecosystem, from structured data to unstructured content.

This catalog "knows" where all your data lives and how to access it. That knowledge is critical when it's time to classify the business value of data, decide where it should go (a data lake, for example), and move it there.
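As a rough illustration of what it means for a catalog to know where data lives and how to access it, here is a minimal Python sketch. It is not Attivio's API or implementation. The `CatalogEntry` and `DataCatalog` classes, their fields, and the sample data sets are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """One data set known to the catalog (all field names are illustrative)."""
    name: str
    system: str      # which silo holds it, e.g. "edw", "hadoop", "file_store"
    location: str    # where the data lives inside that system
    access: str      # how to reach it, e.g. "jdbc", "hdfs"
    tags: frozenset = field(default_factory=frozenset)


class DataCatalog:
    """Hypothetical in-memory catalog: it records where data is, not the data itself."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Add or overwrite the record for a data set.
        self._entries[entry.name] = entry

    def find(self, tag):
        # Look up data sets by business tag, regardless of which system holds them.
        matches = (e for e in self._entries.values() if tag in e.tags)
        return sorted(matches, key=lambda e: e.name)

    def relocate(self, name, system, location, access):
        # Record that a data set has moved; its name and tags stay the same,
        # so consumers that search by tag are unaffected.
        old = self._entries[name]
        self._entries[name] = CatalogEntry(name, system, location, access, old.tags)


catalog = DataCatalog()
catalog.register(CatalogEntry(
    "orders", "edw", "warehouse.sales.orders", "jdbc", frozenset({"sales"})))
catalog.register(CatalogEntry(
    "clickstream", "hadoop", "/data/raw/clicks", "hdfs", frozenset({"web", "sales"})))

for entry in catalog.find("sales"):
    print(entry.name, entry.system, entry.location, entry.access)
```

In this sketch, an analyst searches by tag rather than by system. If `orders` is later moved from the warehouse to Hadoop, a call to `relocate` updates the record and the same `find("sales")` query keeps working.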

Right now, as chief data officers try to manage the transition from legacy data architectures to Hadoop, a data catalog removes IT from its gatekeeper role. When business analysts and data scientists need access to data, the catalog streamlines the data supply chain for standalone analytic tools and embedded analytics, even when the underlying data storage changes.

To find out more about agile data access while transitioning to a modern data architecture, look for us at Strata+Hadoop, New York, September 27-29.

