Accelerate ROI on Hadoop by Making All Data Accessible

Despite Gartner’s observation that 41% of organizations are unsure whether their Big Data ROI will be positive or negative, they remain keenly interested in investing in Big Data technology to take advantage of data-driven use cases: improving predictions and forecasts, exploiting IoT opportunities, identifying new products and services, and improving operational efficiency. Other tactical business drivers include real-time decisions and insights. It’s tempting for organizations to climb on the Big Data bandwagon while overlooking the unique set of corresponding challenges attributable to the:

  • High volume and variety of data requiring analysis
  • Requirement to store and process data in multiple formats
  • Necessity to combine this data with other enterprise data to derive valuable insights.

Hadoop is a commonly used platform to support Big Data initiatives. It seemingly offers a cost-effective means of capturing and storing data in raw form, and it provides robust processing capabilities. However, data in Hadoop is typically stored in many different formats that are not always readily accessible. This is mitigated by building metadata that provides an understanding of the data in Hadoop and facilitates the construction of logical or physical schemas to enable data access. The metadata catalog, combined with the ability to easily access the data, is key to accelerating ROI on Hadoop, which in turn contributes to overall ROI from Big Data initiatives.
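To make the schema-on-read idea concrete, the following is a minimal PySpark sketch of imposing a logical schema on raw files at read time; the HDFS path, field names, and types are illustrative assumptions rather than details of any particular deployment.

    # Minimal schema-on-read sketch (PySpark). Path and fields are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    spark = SparkSession.builder.appName("schema-on-read-example").getOrCreate()

    # The raw files carry no declared schema; one is applied only at read time,
    # so the data becomes queryable without reshaping what is stored in HDFS.
    order_schema = StructType([
        StructField("order_id",    StringType(),    True),
        StructField("customer_id", StringType(),    True),
        StructField("amount",      DoubleType(),    True),
        StructField("placed_at",   TimestampType(), True),
    ])

    orders = (spark.read
              .schema(order_schema)           # logical schema applied on read
              .json("hdfs:///raw/orders/"))   # raw JSON landed in Hadoop

    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT customer_id, SUM(amount) AS total "
              "FROM orders GROUP BY customer_id").show()

Without a catalog recording that such a schema exists, each consumer has to rediscover it; capturing it as metadata is what makes the data broadly accessible.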

Challenges to Achieving ROI on Hadoop

As a multi-purpose platform hosted on inexpensive commodity hardware, Hadoop is viewed by organizations as a cost-effective approach to supporting vertical (domain-specific) use cases as well as horizontal use cases, including:

  • Operational data store
  • Data consolidation
  • Data archiving
  • Data lake (centralized pool of disparate data sources)
  • Digital shoebox (dump data here and figure out its usefulness, if any, later)
  • Staging area for analytics

Although Hadoop might appear to be a relatively inexpensive investment at the outset, challenges emerge once an organization shifts from merely storing data in Hadoop to processing and analyzing it: the lack of semantics and predefined schemas (schema-on-read), the difficulty of determining which data has value, and the need to acquire specialized skillsets all complicate realization of Hadoop ROI. Additionally, organizations need to recognize that Hadoop ROI extends beyond the Hadoop platform itself: it must incorporate the downstream pipelines and applications that leverage Hadoop in support of decision-making, since downstream processing can be impaired by incomplete or inaccurate data.

While some of the data in Hadoop may originate from well-structured data sources, the very capability to store data in multiple formats makes it challenging to infer relationships in the data and to easily combine data stored in Hadoop, issues that are not encountered with a traditional data platform. Traditional data access methods are simply not efficient, or not the most suitable, for accessing data in Hadoop.
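As an illustration of that gap, the sketch below joins a well-structured Parquet export with a semi-structured JSON feed in PySpark; the paths, column names, and join key are assumptions made purely for the example.

    # Hypothetical sketch: combining two differently formatted data sets in Hadoop.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("combine-formats-example").getOrCreate()

    # A structured export (Parquet) and a semi-structured log feed (JSON) sit
    # side by side in HDFS with no declared relationship between them.
    customers = spark.read.parquet("hdfs:///warehouse/customers/")
    clicks    = spark.read.json("hdfs:///raw/clickstream/")

    # The join key (customer_id) must be known or inferred before the two
    # sources can be combined, which is exactly the knowledge a metadata
    # catalog holds.
    enriched = clicks.join(customers, on="customer_id", how="left")
    enriched.write.mode("overwrite").parquet("hdfs:///curated/clicks_enriched/")

The code itself is trivial; the hard part is knowing which data sets exist, what they contain, and which fields relate them, which is precisely where metadata comes in.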

Metadata is foundational to enabling Big Data initiatives and a must-have for working effectively with data in Hadoop. It empowers users to collaborate on the data by providing visibility into the data, facilitating understanding of it, and centralizing knowledge. Metadata is especially crucial for government-regulated organizations that must inventory and categorize data assets to comply with data retention and preservation requirements.
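One hedged example of capturing such metadata at the technical level is registering raw files as an external table in the Hive metastore so the data set becomes discoverable and queryable by name. The database, table, columns, and location below are illustrative only, and a semantic catalog would layer far richer business context on top of this.

    # Register raw Parquet files in the Hive metastore (illustrative names/paths).
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("register-metadata-example")
             .enableHiveSupport()   # assumes a configured Hive metastore
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS raw_zone")
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS raw_zone.orders (
            order_id    STRING,
            customer_id STRING,
            amount      DOUBLE,
            placed_at   TIMESTAMP
        )
        STORED AS PARQUET
        LOCATION 'hdfs:///raw/orders_parquet/'
    """)

    # Anyone browsing the catalog can now find and query the data set by name.
    spark.sql("DESCRIBE FORMATTED raw_zone.orders").show(truncate=False)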

How Attivio Accelerates ROI on Hadoop

Attivio targets the major hurdles to maximizing value on Hadoop: discovering the data, understanding the data, and accessing the data. Attivio overcomes these challenges by automatically generating a semantic data catalog and offering a convenient eCommerce-like experience for searching and unifying data, which significantly reduces the time and effort required to work with data in Hadoop. Attivio delivers the following key capabilities:

  • Spiders across all data sources and builds a semantic metadata catalog
  • Provides an automated way to find, understand, and correlate disparate data sources (structured, semi-structured, and unstructured)
  • Employs different advanced analytical techniques depending on the information it encounters
  • Offers an eCommerce-like shopping experience for data that is easily accessible to the business
  • Unifies and provisions data sets for data-driven applications and BI tools

In order to successfully execute Big Data initiatives and realize the corresponding ROI, organizations must prepare to support Big Data use cases by facilitating fast and easy access to disparate data sources, which opens the door to the untapped potential in the data.

Gartner Magic Quadrant for Insight Engines 2019
Attivio was recognized for its completeness of vision and ability to execute.