Unified Information Access Blog

Welcome to Attivio's Unified Information Access Blog. Join us for discussions on topics ranging from enterprise search solutions, information access insights, Agile software development methodology to programming with Java. We hope you'll find the articles informative and participate in the discussions by leaving a comment.


Last year my colleague Jonathan Young wrote a blog post, "Untangling the Semantic Web: Finding Threads of Gold," in which he noted that "Although the semantic web sounds like a panacea in theory, it does not have a great track record in practice."

At least one reader took this to mean we don't find the core problem interesting. Nothing could be further from the truth! A plurality of our customers and partners place great value on understanding and discovering entities and the relationships between them. Jonathan explains this later in his post: "...in designing [AIE] we picked a few of the golden threads from the semantic web and combined them with a number of techniques which have been shown to improve the search experience." Some of the relevant capabilities include:

  • Dictionary and statistical named entity extraction in multiple languages

  • Regular expression extraction of any pattern, including common natural language templates like "Sid Probstein is the CTO of Attivio"; you can also combine named entity discovery with other search operations, e.g., finding the term "Attivio" near a person entity

  • Relationship modeling, e.g., friend-of-a-friend, using our query-side JOIN() operator.
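
To make the template idea concrete, here is a minimal sketch of regular expression extraction in plain Python. It is not AIE's API: the pattern, the `extract_roles` helper, and the use of capitalized words as a stand-in for real named entity extraction are all assumptions made for illustration.

```python
import re

# Hypothetical template: "<Person> is the <Title> of <Organization>".
# Runs of capitalized words stand in for a real named entity extractor.
ROLE_PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+) is the "
    r"(?P<title>[A-Za-z ]+?) of "
    r"(?P<org>[A-Z][A-Za-z]+)"
)


def extract_roles(text):
    """Return (person, title, organization) tuples matched by the template."""
    return [
        (m.group("person"), m.group("title"), m.group("org"))
        for m in ROLE_PATTERN.finditer(text)
    ]


print(extract_roles("Sid Probstein is the CTO of Attivio."))
```

Each tuple is effectively a small relationship record (person, role, organization) that could then be indexed or joined against other content.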

We can also easily deal with many of the semantic web standards that are XML-based - we speak XML "out of the box". For example, it is easy to ingest content in RDF (Resource Description Framework), and to emit search results (as well as extracted entities and relationships) as RDF triples.
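
As a rough illustration of what emitting a triple involves, the sketch below serializes one relationship in the line-based N-Triples syntax, chosen here only because it is short; RDF/XML would carry the same triple. The `to_ntriple` helper and the example.org IRIs are hypothetical and are not part of AIE.

```python
def to_ntriple(subject_iri, predicate_iri, obj, obj_is_iri=True):
    """Serialize one subject-predicate-object triple as an N-Triples line."""
    if obj_is_iri:
        obj_term = f"<{obj}>"
    else:
        # Escape backslashes, quotes and newlines inside a plain literal.
        escaped = (
            obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj_term = f'"{escaped}"'
    return f"<{subject_iri}> <{predicate_iri}> {obj_term} ."


# Hypothetical IRIs for the "Sid Probstein is the CTO of Attivio" relationship.
print(to_ntriple(
    "http://example.org/person/sid-probstein",
    "http://example.org/rel/ctoOf",
    "http://example.org/org/attivio",
))
print(to_ntriple(
    "http://example.org/person/sid-probstein",
    "http://example.org/rel/name",
    "Sid Probstein",
    obj_is_iri=False,
))
```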

OWL (Web Ontology Language) is also an XML standard that we can make use of. Our automatic classifier can be trained to tag documents organized in almost any structure - ontology, taxonomy or controlled vocabulary.

In the near future we are looking at implementing automated extraction of relationships. Initially this will be done by extracting and analyzing more generic patterns (e.g., Noun Phrase - Verb - Noun Phrase) as well as fuzzier approaches to templates (as noted above) and free text.
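
A toy sketch of the Noun Phrase - Verb - Noun Phrase idea follows. It assumes a tiny hand-written part-of-speech lexicon (`LEXICON`) and a hypothetical `extract_np_v_np` helper; a real implementation would rely on a trained tagger and chunker, and nothing here reflects how AIE does or will do it.

```python
# Toy part-of-speech lexicon; a real system would use a trained tagger.
LEXICON = {
    "the": "DET", "a": "DET",
    "company": "NOUN", "startup": "NOUN", "search": "NOUN",
    "acquired": "VERB", "hired": "VERB",
}


def tag(sentence):
    """Lowercase, split on whitespace, and tag each word (unknowns as NOUN)."""
    words = sentence.lower().rstrip(".").split()
    return [(word, LEXICON.get(word, "NOUN")) for word in words]


def extract_np_v_np(sentence):
    """Return (subject, verb, object) if the sentence has exactly one verb."""
    tagged = tag(sentence)
    verb_positions = [i for i, (_, pos) in enumerate(tagged) if pos == "VERB"]
    if len(verb_positions) != 1:
        return None
    i = verb_positions[0]
    # Treat everything before/after the verb as a noun phrase, minus determiners.
    subject = " ".join(w for w, pos in tagged[:i] if pos != "DET")
    obj = " ".join(w for w, pos in tagged[i + 1:] if pos != "DET")
    if not subject or not obj:
        return None
    return (subject, tagged[i][0], obj)


print(extract_np_v_np("The company acquired a search startup."))
```

The output has the same subject-verb-object shape as a triple, which is why this kind of generic pattern is a natural first step toward automated relationship extraction.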

Fundamentally, our view is that we have an extremely elegant solution to the underlying problem of discovering, modeling and using entities and relationships. Triples are 'hot' because they are simple and easy for people (and legacy tools) to work with. AIE can handle much richer data structures, and by combining those with proximity and relational JOINs you can achieve truly remarkable insight into entities and how they relate to the rest of the universe.

Comments (2)

Anonymous said:

...
Saying that OWL and RDF are an XML standard is misguided - XML is only a data exchange format. For RDF, better (shorter, easier to write) alternatives such as N3/Turtle exist.
OWL2 is in fact moving away from XML (and RDF) as its main syntax [1] - it prefers a simpler syntax that is more intuitive and more in line with its semantics.

[1] http://www.w3.org/TR/2009/CR-owl2-syntax-20090611/
July 31, 2009

JO'N said:

...
I admit that the phrase "XML standard" is ambiguous between "a standard for XML" and "a standard using XML", but it's pretty clear in context what Sid meant. In any case, there are a nearly infinite number of possible languages that could be used as a data interchange format for RDF/OWL, but it's a pretty well-known fact that almost all of the semantic web uses XML as an underlying data format. It would certainly be a welcome change if semantic web standards used a data format more easily read by humans, whether it's as simple as s-expressions or anything else. However, given the current not-so-blistering rate of semantic web standards releases (as well as the danger that whatever momentum the semantic web has will be stymied by mandated format changes), I decline to hold my breath for OWL2 to be finalized, let alone implemented in the wild.
July 31, 2009
