Unified Information Access Blog
Welcome to Attivio's Unified Information Access Blog. Join us for discussions on topics ranging from enterprise search solutions, information access insights, Agile software development methodology to programming with Java. We hope you'll find the articles informative and participate in the discussions by leaving a comment.
Last year my colleague Jonathan Young wrote a blog post, "Untangling the Semantic Web: Finding Threads of Gold", in which he noted that "Although the semantic web sounds like a panacea in theory, it does not have a great track record in practice."
At least one reader took this to mean we don't find the core problem interesting. Nothing could be further from the truth! A plurality of our customers and partners place great value on understanding and discovering entities and the relationships between them. Jonathan goes on to explain this later in his post - "...in designing [AIE] we picked a few of the golden threads from the semantic web and combined them with a number of techniques which have been shown to improve the search experience". Some of the relevant capabilities include:
- Dictionary and statistical named entity extraction in multiple languages
- Regular expression extraction of any pattern, including common natural language templates like "Sid Probstein is the CTO of Attivio"; you can combine named entity discovery with other search, e.g., find the term "Attivio" near a person entity, etc.
- Relationship modeling, e.g., friend-of-a-friend, using our query-side JOIN() operator.
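To make the template idea in the list above concrete, here is a minimal sketch in plain Python. It is not AIE's API or configuration; the pattern, the `NAME` approximation (runs of capitalized words) and the function name `extract_roles` are all assumptions made for illustration. A real deployment would rely on named entity extraction rather than capitalization to find people and organizations.

```python
import re

# Assumption: a name is a run of capitalized words. This is a crude stand-in
# for real named entity extraction and is only meant to illustrate templates.
NAME = r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*"

# Hypothetical template: "<Person> is the <Role> of <Organization>".
TEMPLATE = re.compile(
    rf"(?P<person>{NAME}) is the (?P<role>[A-Za-z][A-Za-z ]*?) of (?P<org>{NAME})"
)


def extract_roles(text):
    """Return (person, role, organization) tuples for every template match."""
    return [(m["person"], m["role"], m["org"]) for m in TEMPLATE.finditer(text)]


if __name__ == "__main__":
    print(extract_roles("Sid Probstein is the CTO of Attivio."))
```

Each match yields a small structured record that could then be indexed alongside the document it came from.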
We can also easily deal with many of the semantic web standards that are XML-based: we speak XML "out of the box". For example, it is easy to ingest content in RDF (Resource Description Framework), and emit search results (as well as extracted entities and relationships) as RDF triples.
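As a rough illustration of what emitting extracted relationships as triples can look like, the sketch below serializes (person, role, organization) records as N-Triples, a line-based RDF syntax chosen here only because it is shorter than RDF/XML. This is not AIE output; the `http://example.org/` namespace, the `worksFor` and `role` predicates, and the function names are hypothetical.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def iri(local_name):
    """Build an IRI reference in the hypothetical namespace."""
    return f"<{BASE}{quote(local_name.replace(' ', '_'))}>"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(facts):
    """Serialize (person, role, organization) records as N-Triples lines."""
    lines = []
    for person, role, org in facts:
        # One triple linking the two entities, one carrying the role as a literal.
        lines.append(f"{iri(person)} {iri('worksFor')} {iri(org)} .")
        lines.append(f'{iri(person)} {iri("role")} "{escape_literal(role)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples([("Sid Probstein", "CTO", "Attivio")]))
```

Each input record becomes two subject-predicate-object statements, which is the sense in which richer structures can be flattened into triples when a downstream tool expects them.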
OWL (Web Ontology Language) is also an XML standard that we can make use of. Our automatic classifier can be trained to tag documents organized in almost any structure: ontology, taxonomy or controlled vocabulary.
In the near future we are looking at implementing automated extraction of relationships. Initially this will be done by extracting and analyzing more generic patterns (e.g., Noun Phrase - Verb - Noun Phrase) as well as fuzzier approaches to templates (as noted above) and free text.
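The generic Noun Phrase - Verb - Noun Phrase pattern can be sketched over part-of-speech-tagged tokens. The example below is a toy under stated assumptions, not the planned AIE implementation: it expects the caller to supply (word, tag) pairs from some tagger, uses a coarse tag set (DET, ADJ, NOUN, PROPN, VERB), and defines a noun phrase simply as an optional determiner, any adjectives, then one or more nouns. The function names are hypothetical.

```python
def _read_noun_phrase(tokens, start):
    """Return the end index of a DET? ADJ* (NOUN|PROPN)+ phrase, or start if none."""
    pos = start
    if pos < len(tokens) and tokens[pos][1] == "DET":
        pos += 1
    while pos < len(tokens) and tokens[pos][1] == "ADJ":
        pos += 1
    noun_start = pos
    while pos < len(tokens) and tokens[pos][1] in ("NOUN", "PROPN"):
        pos += 1
    return pos if pos > noun_start else start


def extract_np_verb_np(tokens):
    """Find (noun phrase, verb, noun phrase) triples in (word, tag) pairs."""
    triples = []
    i = 0
    while i < len(tokens):
        subj_end = _read_noun_phrase(tokens, i)
        if i < subj_end < len(tokens) and tokens[subj_end][1] == "VERB":
            obj_start = subj_end + 1
            obj_end = _read_noun_phrase(tokens, obj_start)
            if obj_end > obj_start:
                subject = " ".join(word for word, _ in tokens[i:subj_end])
                obj = " ".join(word for word, _ in tokens[obj_start:obj_end])
                triples.append((subject, tokens[subj_end][0], obj))
                # Resume at the object so it can act as the next subject.
                i = obj_start
                continue
        # No match here: skip past the phrase just read, or one token.
        i = max(subj_end, i + 1)
    return triples


if __name__ == "__main__":
    tagged = [("The", "DET"), ("classifier", "NOUN"), ("tags", "VERB"),
              ("new", "ADJ"), ("documents", "NOUN")]
    print(extract_np_verb_np(tagged))
```

A pattern this rigid misses passives, auxiliaries and prepositional phrases, which is one reason fuzzier approaches are worth considering as well.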
Fundamentally, our view is that we have an extremely elegant solution to the underlying problem of discovering, modeling and using entities and relationships. Triples are 'hot' because they are simple and easy for people (and legacy tools) to work with. AIE can handle much richer data structures, and by combining those with proximity and relational JOINs you can achieve truly remarkable insight into entities and how they relate to the rest of the universe.

Anonymous said:
... Saying that OWL and RDF are an XML standard is misguided - XML is only a data exchange format, for RDF better (shorter, easier to write) alternatives such as N3/Turtle exist. OWL2 is in fact moving away from XML (and RDF) as its main syntax [1] - it prefers a simpler syntax that is more intuitive and more in line with its semantics. [1] http://www.w3.org/TR/2009/CR-owl2-syntax-20090611/
JO'N said:
... I admit that the phrase "XML standard" is ambiguous between "a standard for XML" and "a standard using XML", but it's pretty clear in context what Sid meant. In any case, there are nearly an infinite number of possible languages that could be used for a data interchange format for RDF/OWL, but it's a pretty well-known fact that almost all the semantic web uses XML as an underlying data format. It would certainly be a welcome change if semantic web standards used a data format more easily read by humans, whether it's as simple as s-expressions or anything else. However, given the current not-so-blistering rate of semantic web standards releases (as well as the danger that whatever momentum that the semantic web has will be stymied by mandated format changes), I decline to hold my breath for OWL2 to be finalized, let alone be implemented in the wild.

