Unified Information Access Blog

Welcome to Attivio's Unified Information Access Blog. Join us for discussions on topics ranging from enterprise search solutions, information access insights, Agile software development methodology to programming with Java. We hope you'll find the articles informative and participate in the discussions by leaving a comment.


The following is an excerpt from a new whitepaper, “Beyond the Data Warehouse: A Unified Store for Data and Content,” by Dr. Barry Devlin, one of the foremost authorities on business insight and data warehousing.

Content, or soft information, has always been of interest to the business in a wide range of processes, from marketing to executive decision-making. The explosion in volume and variety of soft information driven, in particular, by the Internet has sharpened that interest. However, with years of experience in business intelligence and data warehousing behind them, many users are clear that what they really need is an integrated view of soft information with the harder data already available in the warehouse. While soft information on its own does have value, the real business advantage will come from exploring the entire set of hard and soft information free from the limitations of the pervasive, predefined data structures of hard information.

Content and data are closely related. Data is what IT has made of content in order to control and process it in the structured world of computers. Content as simple as “I’ll buy that red car” is transformed into a purchase transaction, with defined fields, allowed value ranges and keys normalized in a database. The use of two distinct words, “data” and “content,” is unfortunate, since both are the same concept: information. Content is softer information, while data is harder; the two terms sit at opposite ends of a continuum. At the softer end, information exists as commonly used and interpreted by humans: documents, images, etc. Hard information is the structured records and fields suitable for logical and numerical computer processing, for example, in operational systems.

Conceptually, soft information is the original source of all hard information. In designing operational databases, hard information is defined up-front through a person modeling soft information for computer use. Simply put, modeling separates the meaning (what is an order) and relationships of terms (price and quantity as part of an order) from the values (ten items at $100 each) they may take in a particular instance. Meaning and relationships can also be distilled from soft information on the fly, during ingestion into a content store or even during use of the content, using text mining and analytic tooling that essentially automates the same modeling process.
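To make the distinction concrete, here is a minimal Python sketch of the two routes described above. It is an illustration only and does not come from the whitepaper: the `OrderLine` class, the `extract_order_line` function, the small number-word table and the regular expression are all hypothetical names and simplifications. The class plays the role of the up-front model (meaning and relationships), while the function stands in, very crudely, for text mining that distills the same structure from soft information on the fly.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderLine:
    """Hard information: the model fixes meaning and relationships up front."""
    quantity: int
    unit_price: float

    @property
    def total(self) -> float:
        # The relationship "price and quantity as part of an order".
        return self.quantity * self.unit_price


# Hypothetical, deliberately tiny vocabulary for spelled-out quantities.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "five": 5, "ten": 10}

# Hypothetical pattern for phrases such as "ten items at $100 each".
PATTERN = re.compile(r"(\w+) items? at \$(\d+(?:\.\d+)?) each", re.IGNORECASE)


def extract_order_line(text: str) -> Optional[OrderLine]:
    """Distill an OrderLine from soft information, or return None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    raw_qty, raw_price = match.groups()
    qty = NUMBER_WORDS.get(raw_qty.lower())
    if qty is None:
        if not raw_qty.isdecimal():
            return None  # quantity is neither a known word nor a number
        qty = int(raw_qty)
    return OrderLine(quantity=qty, unit_price=float(raw_price))
```

For example, `extract_order_line("Please send ten items at $100 each.")` yields an `OrderLine` with quantity 10 and unit price 100.0, whereas “I’ll buy that red car” yields `None`, because this toy pattern knows nothing about cars. Real text analytics tooling is far more capable; the point is only that the values are separated from a model that gives them meaning.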

Perhaps because of its more formal structure, data is often assumed to be more accurate and reliable than content. The aim of a “single version of the truth” is widespread in data warehousing. In reality, both assumptions are misleading. Reliability and accuracy of information depend solely on its source, and the format doesn’t affect the quality of the information. Some sources are simply more or less dependable than others. Like accuracy and reliability, truth is also a relative term, as any reading of eye-witness reports can confirm. Resetting these erroneous beliefs is vital, especially for data warehouse experts, as we bring data and content together.

Harder information exists today in the regimented databases of operational systems, data warehouses, and so on. Softer information is found in a wide variety of content stores, from the “world wild west” of the Web and social media to well-managed stores of e-mails, documents, call center logs, etc. in enterprises. To meet the demand to provide access to all relevant information, regardless of its source or form, technology leaders need to look for methods that unite information without losing either the relationships so valued in the database realm or the context and nuances so important in content.

Dr. Devlin then goes on to define a unified information store (UIS) architecture as the approach to unification. The heart of this store is a core set of business information, indexes and metadata, originating from up-front enterprise modeling and text analytics of information when loaded and at the point of use, which ensure both data quality and agility. The business outcome is analytics that combine the precision of data querying with the relevance of content search, independent of the information source and structure.

Interested in learning more? Click here to download “Beyond the Data Warehouse: A Unified Store for Data and Content”
