Unified Information Access Blog

Welcome to Attivio's Unified Information Access Blog. Join us for discussions on topics ranging from enterprise search solutions and information access insights to Agile software development methodology and programming with Java. We hope you'll find the articles informative and join the discussion by leaving a comment.

Introducing Key Phrases

Last month Attivio announced general availability of the Attivio Active Intelligence Engine™ (AIE) version 2.1. AIE 2.1 includes a slew of new features and refinements that together realize our widely accepted (and often imitated) vision: Unified Information Access (UIA). Simply put, UIA holds that information, regardless of type, must be available to users through a single interface. Users remain free to consume that information in a variety of ways: search interfaces, Business Intelligence (BI) tools, reporting engines, and applications, whether search-based, BI-based, or entirely new combinations of the two.

One of the new core features in AIE 2.1 is the Key Phrase, or Statistically Improbable Phrase, extractor. It compares the frequency of words and phrases in a document with their frequency across a large collection of "background" documents, and selects the phrases that best distinguish the document from that collection. These phrases usually aren't named entities such as people, places, or organizations, but they are informative about the document's content. The extracted key phrases are aggregated in a multi-value field, where they can be presented as a facet.
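To make the frequency comparison concrete, here is a minimal Python sketch of statistically improbable phrase scoring. It is an illustration under stated assumptions, not AIE's extractor: the function names, the log-ratio score, add-one smoothing of the background counts, and the `min_count` cutoff are all choices made for this example. The background collection is assumed to be available as a table of phrase counts.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-word phrases in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def improbable_phrases(doc_tokens, background_counts, background_total,
                       n=2, top_k=5, min_count=2):
    """Rank a document's phrases by how over-represented they are
    relative to a background collection (hypothetical helper).

    Score = log(P_doc(phrase) / P_background(phrase)), where the background
    estimate uses add-one smoothing so that phrases never seen in the
    background still get a small, non-zero probability.
    """
    doc_counts = Counter(ngrams(doc_tokens, n))
    doc_total = sum(doc_counts.values())
    vocab = len(background_counts) + 1  # +1 slot for unseen phrases
    scores = {}
    for phrase, count in doc_counts.items():
        if count < min_count:
            continue  # a phrase seen only once is weak evidence
        p_doc = count / doc_total
        p_bg = (background_counts.get(phrase, 0) + 1) / (background_total + vocab)
        scores[phrase] = math.log(p_doc / p_bg)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


background = Counter({"of the": 500, "in the": 450,
                      "interest rates": 20, "the bank": 60})
doc = ("the central bank raised interest rates and the central bank "
       "said interest rates of the").split()
print(improbable_phrases(doc, background, sum(background.values())))
```

In this toy run, "the central" scores exactly as high as "central bank", and "interest rates" ranks lower because it is common in the background. Frequency contrast alone cannot tell a clean phrase from a fragment, which is why the quality check from a language model matters.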

Here are some example key phrases for queries run against an index of typical news content:

[Figure: example key phrases for queries against a news index (AIE_KP1.jpg)]

Key phrases are a natural fit for navigation. They are particularly effective at helping users disambiguate broad queries and drill down to the more specific topics within the results.
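As a rough illustration of key phrases as a facet, the Python sketch below assumes each search result is a dict whose hypothetical `keyphrases` entry holds the multi-value field. It counts how many results carry each phrase (the facet values) and narrows the result set when a user selects one. This is not AIE's faceting API, just the underlying idea.

```python
from collections import Counter


def key_phrase_facet(results, field="keyphrases", top_k=10):
    """Return (phrase, document count) pairs, most common first."""
    counts = Counter()
    for doc in results:
        # set(): count each phrase at most once per document
        counts.update(set(doc.get(field, [])))
    return counts.most_common(top_k)


def drill_down(results, phrase, field="keyphrases"):
    """Keep only the results tagged with the selected key phrase."""
    return [doc for doc in results if phrase in doc.get(field, [])]


# A broad query such as "jaguar" mixes several senses of the word.
results = [
    {"id": 1, "keyphrases": ["v12 engine", "luxury sedan"]},
    {"id": 2, "keyphrases": ["rain forest", "big cat"]},
    {"id": 3, "keyphrases": ["luxury sedan", "dealer network"]},
]
print(key_phrase_facet(results)[0])  # ('luxury sedan', 2)
print([d["id"] for d in drill_down(results, "luxury sedan")])  # [1, 3]
```

Selecting "luxury sedan" separates the car results from the wildlife ones, which is the kind of disambiguation described above.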

To implement key phrase extraction, AIE uses a language model built from ten billion words of text and compressed to less than 20 MB. A bigram model, for example, can be used:

[Figure: bigram language model example (AIE_LM1.jpg)]
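The post doesn't describe the internals of AIE's model, so the following is only a generic illustration of what a bigram model computes: the probability of each word given the word before it. This minimal Python sketch assumes simple linear interpolation with an add-one smoothed unigram model; the class name, the `lam` weight, and the smoothing scheme are illustrative choices, and nothing here reflects how AIE compresses its model.

```python
import math
from collections import Counter


class BigramModel:
    """Tiny interpolated bigram language model (illustrative only).

    P(w2 | w1) = lam * c(w1 w2) / c(w1) + (1 - lam) * P_uni(w2),
    where P_uni is add-one smoothed so unseen words get non-zero mass.
    """

    def __init__(self, tokens, lam=0.7):
        self.lam = lam
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.total = len(tokens)
        self.vocab = len(self.unigrams) + 1  # +1 reserves mass for unknown words

    def p_unigram(self, w):
        return (self.unigrams[w] + 1) / (self.total + self.vocab)

    def p_bigram(self, w1, w2):
        ctx = self.unigrams[w1]
        mle = self.bigrams[(w1, w2)] / ctx if ctx else 0.0
        return self.lam * mle + (1 - self.lam) * self.p_unigram(w2)

    def log_prob(self, phrase):
        """Log probability: unigram for the first word, bigrams after it."""
        words = phrase.split()
        lp = math.log(self.p_unigram(words[0]))
        for w1, w2 in zip(words, words[1:]):
            lp += math.log(self.p_bigram(w1, w2))
        return lp


corpus = ("the bank raised interest rates . the bank cut interest rates . "
          "rates of the bank").split()
m = BigramModel(corpus)
print(m.log_prob("interest rates") > m.log_prob("rates interest"))  # True
```

Because the model has seen "interest" followed by "rates", it assigns that word order far more probability than the reverse, which is the kind of evidence a phrase-quality check can draw on.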

The algorithm tries to strike a balance between phrases that are rare and phrases that are good and informative. Rareness is derived from the document content, while the quality of each phrase is determined by the language model.

[Figure: language model example (AIE_LM2.jpg)]
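One way to picture that balance is the Python sketch below. It is a hypothetical scoring scheme, not AIE's algorithm: `p_uni` and `p_pair` stand in for a precomputed background language model, rareness is measured as how much more often a word pair occurs in the document than the model predicts, quality is measured as pointwise mutual information (PMI) under the model, and the two are simply multiplied.

```python
import math
from collections import Counter


def score_phrases(doc_tokens, p_uni, p_pair):
    """Rank two-word phrases by rareness x quality (hypothetical scheme).

    p_uni and p_pair are assumed background-model tables:
    p_uni[w] ~ P(w), p_pair[(w1, w2)] ~ P(w1 w2) as a consecutive pair.
    """
    pairs = Counter(zip(doc_tokens, doc_tokens[1:]))
    n_pairs = sum(pairs.values())
    scores = {}
    for (w1, w2), count in pairs.items():
        if (w1, w2) not in p_pair or w1 not in p_uni or w2 not in p_uni:
            continue  # no language-model evidence about this phrase's quality
        p = p_pair[(w1, w2)]
        # Rareness: over-representation in this document vs. the model.
        rareness = math.log((count / n_pairs) / p)
        # Quality: PMI, i.e. how much more often the two words co-occur
        # than two independent words would.
        quality = math.log(p / (p_uni[w1] * p_uni[w2]))
        if rareness > 0 and quality > 0:
            scores[f"{w1} {w2}"] = rareness * quality
    return sorted(scores, key=scores.get, reverse=True)


p_uni = {"the": 0.06, "of": 0.03, "interest": 0.001, "rates": 0.001}
p_pair = {("the", "interest"): 0.00002,
          ("interest", "rates"): 0.0004,
          ("of", "the"): 0.008}
doc = "the interest rates of the interest rates".split()
print(score_phrases(doc, p_uni, p_pair))  # ['interest rates', 'of the']
```

In this toy example, rareness alone would rank "the interest" first, since it is frequent in the document and uncommon in the background; its negative PMI marks it as a fragment rather than a real phrase, so it is dropped. The product is only one way to combine the two signals; the post does not say how AIE weights them.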

Key phrases can also link data together conceptually. Documents can be clustered rapidly at query time by ORing different sets of phrases together. More on this in a subsequent post.
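The details are left for a later post, but the basic idea of ORing phrase sets can be sketched in a few lines of Python. This assumes each result carries its key phrases in a hypothetical `keyphrases` list; each cluster is defined by a set of phrases, and a document joins every cluster whose set shares at least one phrase with its own. The cluster names and phrase sets are invented for illustration, and how AIE chooses or executes such sets is not described here.

```python
def cluster_by_phrase_sets(results, clusters, field="keyphrases"):
    """Group result ids: a document joins a cluster if it has ANY of the
    cluster's phrases (OR semantics). Documents may join several clusters
    or none."""
    groups = {name: [] for name in clusters}
    for doc in results:
        phrases = set(doc.get(field, []))
        for name, phrase_set in clusters.items():
            if phrases & phrase_set:  # non-empty intersection == OR match
                groups[name].append(doc["id"])
    return groups


clusters = {
    "monetary policy": {"interest rates", "central bank"},
    "labor market": {"jobless claims", "payroll growth"},
}
results = [
    {"id": 1, "keyphrases": ["interest rates", "bond yields"]},
    {"id": 2, "keyphrases": ["jobless claims"]},
    {"id": 3, "keyphrases": ["central bank", "payroll growth"]},
    {"id": 4, "keyphrases": ["box office"]},
]
print(cluster_by_phrase_sets(results, clusters))
# {'monetary policy': [1, 3], 'labor market': [2, 3]}
```

Because membership is a cheap set intersection over phrases already stored with each document, this kind of grouping can run over a result set at query time without any offline clustering step.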
