It's hard to recall a more overused yet loosely defined term than Big Data. It's in the title of almost every industry event, email promotion, and marketing blast that comes across my desk. The Red Sox even had a Big Data Bobblehead the other night (for the record, I didn't think it looked anything like the real guy). Yet despite Big Data's meteoric rise into mainstream culture, we are hardly at a point where an agreed-upon definition exists.
Some people focus on volume, others on velocity, and we have even seen descriptions like this:
"Big Data is a collection of data sets so large and complex that they become difficult to process using on-hand database management tools" ~ Wikipedia
Hmmm. Not exactly language you want to use at the next board meeting. And don't get me wrong, I fully embrace the trend toward analytics-driven decision making and firmly believe that Big Data will profoundly change the world that we live in (see: .com, amazon). But unless we put some parameters around what Big Data is, and more importantly, what it isn't, the term will continue to be diluted by software vendors, consulting firms, and bandwagoners alike (see: computing, cloud).
So let's start off with a couple of basic definitions that outline the core components of the information landscape. After all, you can't have Big Data without first identifying Little Data™ (copyright, Randy McLaughlin):
Structured Data - rows and columns. The hallmark of early data management programs, structured data is typically stored in relational databases, where the "relationship" between each row and column offers additional analytical value. For example, a stock price is of limited use unless we also know the name, currency, and date. The only problem? Structured data comprises just a small fraction of the information in a given organization. Most estimates put it somewhere between 15% and 20% of the total volume. So by definition, a Big Data strategy that focuses exclusively on structured data will omit vast quantities of information, both inside and outside of the firewall.
Unstructured Data - a little more progressive. This data type, sometimes referred to as semi-structured data, consists of machine-generated information that does not fit neatly into a relational database. Examples include log files, clickstream data, and XML. Technologies such as Hadoop and vendors such as Splunk have made headlines by supporting unstructured data, but as my colleague Julio Gomez notes, Big Data is about much more than storing all of the log files and web activity that you used to throw away. Agreed: crunching log files is so 2011...
Unstructured Content - the jackpot! This category includes human-generated information, such as emails, documents, analyst reports, and social media: rich content that goes well beyond structured data and explains why things are actually happening. For example, structured data can tell you that IBM's share price is down 8.3% year over year, but the answer to why that is the case lives in the unstructured content: in the analyst report that forecasts a down year for tech stocks, in the industry blog that voices concern over IBM's cash flows, and in the broker email that notes the market has reacted negatively to Big Blue's most recent string of software acquisitions. Unstructured content offers opinion, explanation, and most importantly, insight. A Big Data strategy that fails to incorporate unstructured content will be not only limited in scope, but also light on business value.
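For readers who think better in code, here is a rough Python sketch of how the three categories differ in practice. Everything in it is an assumption made for illustration: the `StockPrice` fields, the log line format, the `NEGATIVE_TERMS` list, and the sample values are all hypothetical, and the keyword check is a crude stand-in for real text analytics, not how any particular product works.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List


# Structured data: every value sits in a named, typed column, like a table row.
# The field names are hypothetical.
@dataclass
class StockPrice:
    name: str
    currency: str
    day: date
    price: float


# Machine-generated data: a log line has a regular shape, but no schema is
# attached to it, so it must be parsed first. This log format is made up.
LOG_PATTERN = re.compile(
    r"(?P<ip>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})"
)


def parse_log_line(line: str) -> dict:
    """Turn one log line in the assumed format into named fields."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


# Human-generated content: free text with no schema at all. A naive keyword
# scan stands in here for text analytics or sentiment analysis.
NEGATIVE_TERMS = ("concern", "negatively", "down year")


def negative_terms_found(text: str) -> List[str]:
    """Return the assumed negative terms that appear in the text."""
    lowered = text.lower()
    return [term for term in NEGATIVE_TERMS if term in lowered]


if __name__ == "__main__":
    # Illustrative values only, not real market data.
    print(StockPrice("EXAMPLE", "USD", date(2013, 1, 2), 100.0))
    print(parse_log_line("10.0.0.1 GET /quotes/example 200"))
    print(negative_terms_found("The blog voices concern over cash flows."))
```

The structured row is usable the moment it is loaded, the log line becomes usable after one parsing step, and the free text still needs interpretation after it is read. That last gap is where the "why" lives.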
So has Big Data jumped the shark? Not by a long shot. It has simply evolved, morphed if you will, into an approach that focuses not only on high volumes of structured data, but also on a wide variety of unstructured content. Big Content. This is where functionality such as text analytics, sentiment analysis, and dynamic correlations enables firms to gain insights from massive amounts of human-generated information. The kinds of insights that help businesses generate new analytics and drive competitive advantage. And that's something that will never jump the shark.
Big Data & New Business Value Creation
This paper will explore this vital Big Data shift toward business value creation, and the new technology essential to realizing that value. You will discover why database and information management systems are not enough, and explore unified information access: a highly flexible, agile “Analyze Everything” platform well suited to delivering new Big Data value – without ripping and replacing existing information infrastructure.