Unified Information Access Blog

Welcome to Attivio's Unified Information Access Blog. Join us for discussions on topics ranging from enterprise search solutions and information access insights to Agile software development methodology and programming with Java. We hope you'll find the articles informative and will join the discussions by leaving a comment.


Too often, getting access to all the relevant business information we need has meant a journey across multiple sources, using different technologies to discover what we need, and often settling for less than the complete picture. Until recently, there has been no unified source or single access method for finding everything.

In his recent white paper, "Beyond the Data Warehouse: A Unified Store for Data and Content," data warehousing expert Dr. Barry Devlin points out that legacy technologies forced information into content and data silos. According to Dr. Devlin, "The use of two distinct words, 'data' and 'content' is unfortunate, since both are the same concept - information. Content is softer information, while data is harder; two terms at opposite ends of a continuum. At the softer end, information exists as commonly used and interpreted by humans - documents, images, etc. Hard information is the structured records and fields suitable for logical and numerical computer processing, for example, in operational systems.... placed rigorously in defined fields, with only certain values allowed."

One of the challenges of the transaction-focused database is that it relies on a schema and data modeling that are determined in advance of actual use of the data. Dr. Devlin writes that "for hard information, modeling is performed at design time, and permanently stored in the database structure and metadata. While this provides data quality and consistency as well as efficiency in use, it lacks agility to respond to unexpected queries.... Hard information has its structure hardened when the schema is created. Because, in practice, schema change is cumbersome, all information must conform to the model (one part of the large cost of the 'T' of ETL)."

Dr. Devlin talks about the need to unite data and content because "the business neither understands nor accepts the difference [between data and content].... To meet the demand to provide access to all relevant information - regardless of its source or form - technology leaders are looking for methods that unite information without losing either the relationships so valued in the database realm or the context and nuances so important in content." In calling for a unifying architecture, Dr. Devlin addresses two approaches to creating what he terms a "unified information store": efforts to expand access to content within data applications, and unified information access technologies that approach the problem by using a universal index for all types of information.

Integrating Text Analytics with Database Data

A unified information store provides the platform for deeper insight and examination by bringing together data and content from diverse sources. For this store to be as useful in combining data and content as each source is in its native application, it needs to retain the most powerful information aspects of both. For data, these are the relationships, normalization and cardinality; for content, they are the relevance and context. And, of course, there is a requirement for sophisticated yet flexible querying, which means supporting both SQL and the fuzzy search queries that deliver the Google experience online.

Unifying information brings another key benefit: just as applications add value to data by computation and by taking advantage of normalization and cardinality, it's possible to gain additional value from content by applying text analytics. These processes, such as sentiment analysis and classification, do more than identify data items in text. They also perform complex analysis such as sorting and linking related material, perceiving the attitude expressed in an author's opinion, and identifying interesting elements in text by extracting statistically improbable phrases for use in discovery and navigation. Some methods produce numerical data that can be charted and graphed along with traditional data from databases.
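
To make the idea of text analytics producing numerical data concrete, here is a minimal sketch of a dictionary-based sentiment score. It is a toy illustration only and is not how AIE or any particular product computes sentiment: the class name ToySentiment, the six-word lexicon and the scoring rule (average polarity of the matched words) are all assumptions made for this example.

```java
import java.util.Map;

class ToySentiment {

    // Tiny illustrative lexicon (an assumption for this sketch).
    // Real systems use far larger dictionaries or trained models.
    static final Map<String, Integer> LEXICON = Map.of(
            "great", 1, "reliable", 1, "love", 1,
            "slow", -1, "broken", -1, "poor", -1);

    // Returns a score in [-1.0, 1.0]: the average polarity of the
    // lexicon words found in the text, or 0.0 if none are found.
    static double score(String text) {
        int sum = 0;
        int hits = 0;
        // Lowercase the text and split on anything that is not a letter.
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            Integer polarity = LEXICON.get(token);
            if (polarity != null) {
                sum += polarity;
                hits++;
            }
        }
        return hits == 0 ? 0.0 : (double) sum / hits;
    }
}
```

Because the result is an ordinary number per document, it can be stored, averaged per product or per month, and plotted next to figures that come from a database.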

Using AIE Text Analytics in Database Applications

Attivio's Active Intelligence Engine™ (AIE) offers text analytics that include entity extraction (pattern- and dictionary-based and statistical), sentiment analysis (document and entity level), document classification, dynamic clustering and key phrases. In addition to applying text analytics in AIE applications, you can also easily bring the data derived from text analytics directly into a database or business intelligence application. This is possible because AIE includes a JDBC driver that lets the database or BI application treat the AIE index like any JDBC data source. For example, a BI dashboard can include the results of sentiment analysis about each product whose performance is tracked in the BI application.
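
As a rough sketch of what treating the index "like any JDBC data source" could look like from the application side, the following uses only the standard java.sql API. Everything specific to AIE here is an assumption: the table name passed in, the column names product, sentiment_score and doc_date, and the SQL features the driver accepts are hypothetical, so check the driver's own documentation before relying on any of them. The averaging is done in Java so the sketch depends only on a plain SELECT.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

class SentimentByProduct {

    // Builds the query text. Column names are hypothetical; the table
    // name is validated because it cannot be bound as a parameter.
    static String buildQuery(String table) {
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unsafe table name: " + table);
        }
        return "SELECT product, sentiment_score FROM " + table
                + " WHERE doc_date >= ?";
    }

    // Reads per-document sentiment rows over an already-open JDBC
    // connection and returns the average score for each product.
    static Map<String, Double> averageSentiment(
            Connection conn, String table, Date since) throws SQLException {
        Map<String, double[]> totals = new LinkedHashMap<>(); // product -> {sum, count}
        try (PreparedStatement ps = conn.prepareStatement(buildQuery(table))) {
            ps.setDate(1, since);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    double[] t = totals.computeIfAbsent(
                            rs.getString("product"), k -> new double[2]);
                    t[0] += rs.getDouble("sentiment_score");
                    t[1] += 1;
                }
            }
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        totals.forEach((product, t) -> averages.put(product, t[0] / t[1]));
        return averages;
    }
}
```

A caller would obtain the Connection from java.sql.DriverManager.getConnection using whatever connection URL the driver documents (not shown here, since it is not given in this post) and could then feed the resulting map to a dashboard alongside its existing product metrics.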

This flexibility allows businesses to extend their existing data applications with AIE's unified information access and text analytics functionality, or to create dashboards built directly atop AIE. Either way, AIE lets you increase the return on information assets by offering a unified information store that provides access to data, content and text analytics, along with the diverse query methods needed to explore all insight relevant to the issue at hand.
