One of the biggest changes in the upcoming Sherlock release of the Attivio Platform is the move to a Hadoop-based architecture for big data applications. Hadoop provides a number of low-level capabilities that we no longer have to manage ourselves at this scale, such as resource management (YARN), coordination (ZooKeeper), storage (HDFS), and bookkeeping (HBase). One side benefit of YARN-based resource allocation is the ability to scale your system's footprint up or down with a few simple YARN commands. At Attivio, we've used this ability to scale our index up and down, for total flexibility in performance, cost, and index management.
Good question. What is the point? The point is to create measurable business value from enterprise data. Of course, before measurable business value comes insight. The Modern Data Architecture (MDA) recognizes that insight can lie hidden in data of all types: structured or unstructured, messy or modeled, historical or real-time.
As I’ve mentioned in prior blogs, the biggest use cases we see in Hadoop these days come from the risk and compliance functions of large banks. Initially, many banks and other financial services institutions (FSIs) adopted Hadoop out of sheer necessity despite its early immaturity on the governance front. With primary analytic use cases such as Know Your Customer, eCommunications Surveillance, and Anti-Money Laundering, FSIs need analytic solutions that can run at massive scale—the Hadoop sweet spot.
Writing on the O’Reilly.com site back in August, Jesse Anderson, CEO of Smoking Hand, a training company for Big Data technologies, commented on the overall complexity of Big Data, NoSQL technologies, and the distributed systems that run them.
Chief Data Officers certainly have first-hand knowledge of this complexity and the hurdles it presents to extracting the maximum value out of business data. Complexity takes a variety of forms throughout the Big Data stack. Let’s start at the bottom.
If you missed the Strata+Hadoop Conference in New York City last week, here’s a quick recap.
From September 26-29, 10,000 experts came together to share best practices and innovative technology news, and to network with their peers around all things data: data science, big data, and data in the enterprise. Sessions and keynote speeches covered a wide array of topics. Members of the Attivio team had a few favorites, including:
As many of our customers move their entire compute environment—including analytics and storage—into the Hortonworks distribution of Hadoop, Attivio has focused on gaining greater integration with Hortonworks. Having one platform repository in which to execute makes everyone's life simpler.
Organizations recognize the value of Hadoop in dealing with extremely large and diverse datasets. But they're also looking for enterprise features such as data governance. Attivio can help in both areas. Here's Attivio CTO and co-founder Will Johnson discussing Hadoop integration at a recent Hortonworks user conference.
Forrester just released its latest Wave report. Unlike many Wave reports on more mature technologies, this one, on native Hadoop BI platforms, included only six vendors, and Attivio was one of them. In Forrester's estimation, there are no leaders in this market yet, only contenders and strong performers.
Hadoop is fast becoming the centerpiece of a modern data architecture. And Cloudera Enterprise provides the centralized management and robust support you need to operate Hadoop effectively.
The modern data architecture stores data as is; it doesn't require modeling up front. It must accommodate volume, velocity, and variety, including both structured and unstructured information. Hadoop does this very well.
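To make the "store as is, model at read time" idea concrete, here is a minimal schema-on-read sketch in Python. The records and field names are hypothetical illustrations, and a real deployment would read from HDFS rather than an in-memory list; the point is only that structure is imposed per query, not on write:

```python
import json

# Raw records landed "as is": heterogeneous, no schema enforced on write.
# (Hypothetical example data, not from any real system.)
raw_records = [
    '{"customer": "acme", "revenue": 1200, "region": "EMEA"}',
    '{"customer": "globex", "revenue": 800}',              # missing region
    '{"customer": "initech", "note": "unstructured free text"}',
]

def project(record_json, fields, default=None):
    """Apply a schema at read time: pull out only the fields a query needs."""
    record = json.loads(record_json)
    return {f: record.get(f, default) for f in fields}

# Two different "schemas" over the same stored data, chosen per query.
revenue_view = [project(r, ["customer", "revenue"], default=0) for r in raw_records]
region_view = [project(r, ["customer", "region"], default="unknown") for r in raw_records]

print(revenue_view)
print(region_view)
```

Because nothing was modeled on write, adding a third view later (say, over the free-text `note` field) requires no migration of the stored data.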
Many of our customers use Hadoop. And Attivio's full-platform certification on Cloudera Enterprise will help them streamline their modern data architecture by offering agile data access wherever data resides.
I recently attended Hadoop Summit 2016 where not surprisingly there was a lot of conversation about topics other than Hadoop. For example, the importance of ecosystem partners to any Big Data solution.
It was a great conversation. Carey pointed out that although data scientists do spend a lot of time on analytics, they spend just as much time, or more, "wrangling" their data environments: finding data and moving it where they need it. And that's why EMC turned to Attivio and Zaloni. Check out the rest of the discussion.
At Attivio, we work with some of the world's largest banks and manufacturing companies. As they invest more in Hadoop, they also require more from it. They recognize its value in dealing with extremely large and diverse data sets. But they're also looking for enterprise features, and data governance is often at the top of the list.
When we and a number of our customers joined Atlas, the Hortonworks-led data governance initiative, we brought our unique capabilities in data discovery and in handling unstructured data. Many companies have chosen Hadoop as their new data platform and need a way to integrate their legacy data sources. We can connect the Hadoop ecosystem with external legacy systems.