Missing Some Key Points with Big Data

It's not surprising to see innovative companies like EMC getting involved with Hadoop. Interest in analyzing and extracting value from the deluge of customer, machine and sensor-generated data (log files, click streams, position data, etc.) is at an all-time high. As companies increasingly engage with customers online, we can expect the deluge to increase ... and increase ... and increase! This is the world of Big Data.

Many people say the Big Data movement is about "unstructured data". But they are missing an important point. Log files and click streams are not really unstructured; they just have relatively unfamiliar, and sometimes variable, structures. But even if all this information were traditional, structured data, the average database still couldn't handle the deluge cost-effectively. The sheer volume is a key aspect of the Big Data challenge: the more successfully you engage with end users, and the more interaction you offer them, the more data you have to deal with in return.
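To make the point about log files concrete, here is a minimal sketch that pulls named fields out of a single web access-log line. The sample line, the regular expression and the `parse_log_line` helper are all illustrative assumptions (the line follows the widely used Common Log Format); real logs vary, which is exactly the "variable structure" mentioned above.

```python
import re

# Hypothetical access-log line in Common Log Format (an assumption for illustration).
LINE = '203.0.113.7 - - [10/Oct/2011:13:55:36 -0700] "GET /products/42 HTTP/1.1" 200 2326'

# One named group per field: the "unstructured" line has a clear structure.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" size means no body was sent; treat it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

record = parse_log_line(LINE)
print(record)
```

Once each line is a record like this, it can be loaded and queried like any other structured data; the hard part is the volume, not the shape.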

What about the other sources that contain important information about customers and buying habits? They hold a wealth of value that today is largely untapped. Emails, open-ended survey questions, web forms, call logs, discussion boards, SharePoint and Wiki sites: this is the true "unstructured content" that completes the picture of customer perception. It is also the best source for building a useful internal view, for example of employee and partner behavior.

Unstructured data is not that different from structured data. It tells you what happened, and probably where. Unstructured content, on the other hand, explains WHY things happen. The inability to process and analyze this unstructured content is what prevents most of the Big Data players from presenting a comprehensive view.

The challenge of aggregating and analyzing unstructured content is significant. Human expression is shockingly diverse, varies by location and changes over time. Assembling the elements required to analyze and mine unstructured content requires a lot of expertise and software.

Another Big Data challenge can be the rate, or "velocity", at which the information arrives, and the rate at which it may be desirable to analyze it. And beyond velocity, complexity is a big challenge. Hadoop, for example, assumes all data is equal, and that analysis need consider no more than a single "slice" of that data. But many analytics require analysis and correlation across the entire set.
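A toy example shows why slice-by-slice results cannot always simply be added up. The sketch below is plain Python, not Hadoop code, and the event data and function names are made up for illustration: counting distinct users inside each slice and summing those counts overstates the real number, because the same user can appear in more than one slice.

```python
# Hypothetical click events, split into slices the way a distributed job might see them.
slices = [
    [("alice", "view"), ("bob", "view")],
    [("alice", "buy"), ("carol", "view")],
    [("bob", "buy")],
]

def distinct_users_per_slice(slices):
    """Count distinct users separately inside each slice."""
    return [len({user for user, _ in events}) for events in slices]

def distinct_users_overall(slices):
    """Count distinct users across the entire data set."""
    return len({user for events in slices for user, _ in events})

per_slice = distinct_users_per_slice(slices)
naive_total = sum(per_slice)                  # adds up slice-local answers
true_total = distinct_users_overall(slices)   # correlates across all slices

print(per_slice, naive_total, true_total)
```

Here the slice-local counts sum to 5 while only 3 distinct users exist. Getting the right answer needs a step that looks across all slices at once.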

On the plus side: companies like Attivio have been focused on aggregating, enriching and analyzing unstructured content (as well as data!) for years. Attivio AIE complements Big Data infrastructure by bringing unstructured content into the analysis framework and by presenting Big Data in context to remove information blind-spots in business applications and automated business processes. AIE includes the essential text analytic capabilities such as Entity, Concept, Key Phrase and Sentiment Analysis that help transform unstructured content into meaningful insight. You can load enriched, summarized output into Hadoop using connectors. AIE scales linearly across commodity hardware, so you can also index your analysis, then set it up so that your Hadoop job can look across all of it by querying AIE. AIE supports both SQL and keyword querying, so it will be quite familiar to your developers.

Ultimately, Attivio completes the Big Data picture by delivering what Gartner has defined as Extreme Information: volume, velocity, variety and complexity. Add unstructured content to your Big Data stack.
