Unified Information Access Blog

Welcome to Attivio's Unified Information Access Blog. Join us for discussions on topics ranging from enterprise search solutions and information access insights to Agile software development methodology and programming with Java. We hope you'll find the articles informative and participate in the discussions by leaving a comment.

We think about the semantic web in two complementary (and equivalent) ways. It can be viewed as:

• A large set of subject-verb-object triples, where the verb is a relation and the subject and object are entities

OR

• A large graph or network, where the nodes of the graph are entities and the graph's directed edges (arrows) are the relations between nodes.

As a reminder, entities are proper names, like people, places, companies, and so on. Relations are meaningful events, outcomes or states, like BORN-IN, WORKS-FOR, MARRIED-TO, and so on. Each entity (like "John O'Neil", "Attivio" or "Newton, MA") has a type (like "PERSON", "COMPANY" or "LOCATION") and each relation is constrained to only accept certain types of entities. For example, WORKS-FOR may require a PERSON as the subject and a COMPANY as the object.
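To make the two equivalent views concrete, here is a minimal Python sketch that stores a few triples, converts them into a directed graph with labeled edges and back again, and checks each triple against the type signature of its relation. The entity types and the WORKS-FOR signature follow the examples above; the LOCATED-IN relation, its signature, and the facts in the triples are illustrative assumptions, not part of any real ontology.

```python
from collections import defaultdict

# Entity types, following the examples in the text.
ENTITY_TYPES = {
    "John O'Neil": "PERSON",
    "Attivio": "COMPANY",
    "Newton, MA": "LOCATION",
}

# Each relation accepts only one (subject type, object type) pair.
# LOCATED-IN is a hypothetical relation added for this example.
RELATION_SIGNATURES = {
    "WORKS-FOR": ("PERSON", "COMPANY"),
    "LOCATED-IN": ("COMPANY", "LOCATION"),
}


def is_valid(triple):
    """Return True if the triple's entity types match its relation's signature."""
    subject, relation, obj = triple
    signature = RELATION_SIGNATURES.get(relation)
    if signature is None:
        return False  # unknown relation
    return (ENTITY_TYPES.get(subject), ENTITY_TYPES.get(obj)) == signature


def to_graph(triples):
    """Triples -> directed graph: node -> list of (relation, target) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


def to_triples(graph):
    """Directed graph -> triples, recovering the original representation."""
    return [(s, r, o) for s, edges in graph.items() for r, o in edges]


# Illustrative triples (facts assumed for the example).
triples = [
    ("John O'Neil", "WORKS-FOR", "Attivio"),
    ("Attivio", "LOCATED-IN", "Newton, MA"),
]

graph = to_graph(triples)
print(graph["John O'Neil"])            # [('WORKS-FOR', 'Attivio')]
print(to_triples(graph) == triples)    # True
print([is_valid(t) for t in triples])  # [True, True]
print(is_valid(("Attivio", "WORKS-FOR", "John O'Neil")))  # False: a COMPANY cannot be the subject
```

Because the graph keeps one edge per triple, converting in either direction loses nothing, which is the sense in which the two views are equivalent.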

How semantic web information is organized and transmitted is described by a blizzard of technical standards and XML namespaces. Once you escape from that, the basic goals of the semantic web are (1) to allow a lot of useful information about the world to be simply expressed, in a way that (2) allows computers to do useful things with it.

Almost immediately, some problems crop up. As generations of artificial intelligence researchers have learned, it can be really difficult to encode real-world knowledge into predicate logic, which is more or less what the semantic web is. The same AI researchers also learned that different people will almost inevitably create knowledge encodings that can't easily be compared, because they use different (sometimes subtly, maddeningly different) basic definitions and concepts. Another difficult problem is deciding when entity names refer to the "same" real-world thing. Even worse, if the entity names are defined in two separate places, when and how should they be merged? For example, do an Internet search for "John O'Neil" and try to decide how many different people the results refer to. Believe me, they are not all the same person.
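The entity-identity problem shows up even in a toy matcher. The sketch below assumes a hypothetical record format (a name plus a dictionary of known attributes) and treats two records as possibly the same entity only when their normalized names match and no shared attribute conflicts. It shows why a name alone is not enough, and why records with missing attributes leave you unsure rather than certain. The records and the "Example Corp" employer are made up for illustration; this is a naive heuristic, not a real entity-resolution method.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivial variants of a name compare equal."""
    return " ".join(name.lower().split())


def may_be_same(a, b):
    """Heuristic match: same normalized name and no conflicting shared attribute.

    True means "possibly the same entity", not "certainly the same".
    """
    if normalize(a["name"]) != normalize(b["name"]):
        return False
    shared = a["attrs"].keys() & b["attrs"].keys()
    return all(a["attrs"][k] == b["attrs"][k] for k in shared)


# Hypothetical records: same name, different people.
rec1 = {"name": "John O'Neil", "attrs": {"type": "PERSON", "employer": "Attivio"}}
rec2 = {"name": "john  o'neil", "attrs": {"type": "PERSON", "employer": "Example Corp"}}
# Same name, but too little information to tell either way.
rec3 = {"name": "John O'Neil", "attrs": {"type": "PERSON"}}

print(may_be_same(rec1, rec2))  # False: the employers conflict
print(may_be_same(rec1, rec3))  # True, but only because nothing conflicts
```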

As for relations, it's difficult to tell when they really mean the same thing across different knowledge encodings. No matter how careful you are, if you want to use relations to infer new facts, you have few ways to check whether the combined information is valid.

So, when each web site can define its own entities and relations, independently of any other web site, how do you reconcile entities and relations defined by different people?

One technique is to require (or STRONGLY SUGGEST) the use of a shared ontology. (For our purposes, an ontology is one person's, or one company's, semantic web.)

Perhaps, if it were carefully designed, it would be possible to allow anyone to add to it without making it unusable. Wikipedia might serve as an inspiration here. However, this is generally impractical, for a number of reasons:

  1. A lot of smart people have tried to do this in the past, and they've obviously failed.
  2. Wikipedia has grown a community that is good (perhaps too good) at discussing how articles should be written. However, it's not clear that any community could become competent to discuss semantic web issues in detail, and then reach agreement about them.

The major problem is the "open-world" requirement implicit in the semantic web. In a closed world or a limited domain (even if that domain isn't small), it's possible to agree on the ontological issues and get to work. Many companies have put a lot of effort into creating their domain ontologies, and some have even found a day-to-day use for them. However, maintaining a good domain ontology takes a lot of work, and that work never stops.

Even if companies were willing to open-source their ontologies, their domains are closed, and once you start trying to knit different domain ontologies together, you quickly run into the problems discussed above.

By the way, I think the semantic web's failure to be widely adopted has a simple explanation: it's really difficult, much more so than learning HTML, and the practical payoff is, to put it mildly, not obvious.

As an aside, Attivio's unified information access architecture allows corporate ontologies to be directly imported, so a user can search through them, or perform SQL queries on them, including joins. Joins, in particular, are a powerful tool for understanding semantic web ontologies, and for using them to improve search and other kinds of business intelligence work. (You can read about our newly awarded join patent here.)
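As an illustration of why joins help, the sketch below loads a handful of hypothetical triples into an in-memory SQLite table (using Python's standard sqlite3 module, not Attivio's query interface) and uses a self-join to follow a two-hop path from a person to the location of their employer. The table layout, the LOCATED-IN relation, and the data are assumptions made for the example.

```python
import sqlite3

# One row per subject-verb-object triple (hypothetical layout and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, relation TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Alice", "WORKS-FOR", "Acme"),
        ("Bob", "WORKS-FOR", "Acme"),
        ("Acme", "LOCATED-IN", "Boston, MA"),
    ],
)

# Self-join: the object of a WORKS-FOR triple is the subject of a LOCATED-IN triple,
# so joining on it walks two edges of the graph in a single query.
rows = conn.execute(
    """
    SELECT w.subject AS person, l.object AS location
    FROM triples AS w
    JOIN triples AS l ON w.object = l.subject
    WHERE w.relation = 'WORKS-FOR' AND l.relation = 'LOCATED-IN'
    ORDER BY person
    """
).fetchall()
print(rows)  # [('Alice', 'Boston, MA'), ('Bob', 'Boston, MA')]
```

Each additional join follows one more edge, which is what makes joins a natural fit for exploring an ontology stored as triples.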

Is there a solution? Can the creation of domain ontologies be automated — or at least made easier? Will something make it possible to combine different domain (and different site) semantic webs — at least with some minimum guarantees about reliability? I think so, and here's why.

At Attivio, we've been working on using statistical machine learning to learn how to extract relations from plain text. We're still working on it — it's a difficult problem — but we're making real progress and I'm pretty sure that we'll discuss the details of our work in future blog posts. For now, though, it's clear to us that there's a real advantage in being able to associate probabilities with the entities and relations that we find in a document, especially when we can accumulate information from millions of documents (or more). If we build a knowledge graph with weights on the entity nodes and relational edges, we start having a way to measure the reliability of different parts of a semantic web. We can also determine, for two separate semantic webs, what entities and relations we know are the same or different, and where we're unsure.
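One simple way to attach and accumulate probabilities is sketched below. It is not the method this post alludes to (those details are left for future posts). It assumes each document's extraction yields a confidence for a triple and combines the confidences with a noisy-OR, 1 - prod(1 - p_i), which treats each document as independent evidence. It then compares two weighted graphs and labels each relation as confirmed by both, confirmed by only one, or unsure. The 0.9 threshold, the triples, and the confidence values are all hypothetical, and only relation edges are weighted here; entity nodes could be weighted the same way.

```python
from collections import defaultdict
from math import prod


def accumulate(extractions):
    """Combine per-document confidences into one weight per (subject, relation, object) triple.

    Noisy-OR: P(true) = 1 - prod(1 - p_i). This assumes each document is
    independent evidence, which is a simplification.
    """
    evidence = defaultdict(list)
    for triple, p in extractions:
        evidence[triple].append(p)
    return {t: 1 - prod(1 - p for p in ps) for t, ps in evidence.items()}


def compare(graph_a, graph_b, confident=0.9):
    """Label every triple in either weighted graph by which graphs confidently support it."""
    labels = {}
    for triple in graph_a.keys() | graph_b.keys():
        in_a = graph_a.get(triple, 0.0) >= confident
        in_b = graph_b.get(triple, 0.0) >= confident
        if in_a and in_b:
            labels[triple] = "confirmed by both"
        elif in_a or in_b:
            labels[triple] = "confirmed by one"
        else:
            labels[triple] = "unsure"
    return labels


# Hypothetical extractions feeding two separately built knowledge graphs.
t1 = ("Alice", "WORKS-FOR", "Acme")
t2 = ("Bob", "WORKS-FOR", "Acme")
t3 = ("Acme", "LOCATED-IN", "Boston, MA")

graph_a = accumulate([(t1, 0.7), (t1, 0.8), (t2, 0.5)])  # t1: 1 - 0.3 * 0.2 = 0.94
graph_b = accumulate([(t1, 0.95), (t2, 0.6), (t3, 0.97)])

labels = compare(graph_a, graph_b)
for t in (t1, t2, t3):
    print(t, labels[t])
# t1 is confirmed by both, t2 is unsure, t3 is confirmed by one
```

Note how two weak extractions of t1 (0.7 and 0.8) add up to a confident 0.94; accumulating evidence across many documents is what gives the weights their value.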

Human ontology builders can't create probabilities like that, since humans are even worse at statistics than they are at semantics. (No blame here — both are really confusing to think about!) However, there's been a lot of research into relation and event extraction, as well as in machine learning using big data (or extreme information, if you prefer). So it's now possible to create tools that substantially help the process of building ontologies.

And, making no promises we'll regret, we hope that we'll be able to talk more about it soon.

Author Bio

John O'Neil has written and designed software for search, natural language processing and machine learning for 10 years. After receiving a Ph.D. in computational linguistics from Harvard University, he worked for Lingo Motors, where he designed their main commercial product and ended up with his name on a number of their patents, as well as for other search engine companies, where he worked to increase search relevancy and accuracy. He also worked for over five years at Basis Technology, Inc., where he was the designer and lead developer for the Rosette Linguistics Platform, their language processing and entity extraction suite of products.

In my last post, I described how a convergence strategy (building a platform that enables consolidation across products and even entire industries) produces massive profits and growth. Now I want to focus on how a convergence strategy can be applied within the enterprise.

As I have previously mentioned, unified information access (UIA) is a fundamental element of a successful convergence strategy. It's not possible to offer a consolidated experience if the information required by end-users is trapped in silos. But information is only part of the problem; end users also need applications and supporting business logic. In an era where lean IT is the standard, is it reasonable to expect that the latter two will be easily integrated?

While it is possible to extract the information and unify it, can we really imagine having one application to do our entire, complicated job?

The answer is that this is exactly the problem Apple faced in the B2C market, and the same solution will work here and yield extreme results. Instead of taking the evolutionary view that the application silos will be knitted together through modest, "baby step" improvements over many years, take a revolutionary view: do it all at once! It's the only way it will really work.

Here's a cookbook for adopting a successful convergence strategy.

1. Pick a set of applications that matter to some group of end users; customers who generate revenue are probably the best choice.

2. If you haven't already, set up teams who will sustain these applications.

3. Extract the information from each application and unify it with a UIA platform. This will take some time and effort, but it is the prerequisite for everything else you will do, and for the significant benefits that will follow.

4. Link and organize the information together automatically, using taxonomies, ontologies and text analytics like entity extraction.

5. Wrap the underlying application APIs up into new, converged APIs that implement the common operations people will want to apply to the information. For example, if you have 10 systems that handle sales, you want one new sellStuff() method that handles the behavior for all 10 systems. You can use an Enterprise Service Bus (ESB) or some other middleware to do this, but a complete UIA platform like Attivio's Active Intelligence Engine has built-in workflow systems that can already handle this task.

6. Build a convergence application that showcases the information effectively, regardless of which application it came from. At a minimum, it should let users query across all silos and then invoke the most common operations on whichever results they select as relevant. Make sure this convergence application, which (along with the back end) is really the platform for offering a consolidated experience, can be expanded through a modular approach rather than by rewriting it.

7. Layer on capabilities, such as personalization and collaboration, that further drive productivity and conversion.
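Steps 5 and 6 can be sketched in plain Python as a facade: each existing sales application is wrapped in an adapter with a common interface, and one converged object offers a cross-silo search plus a single sell operation (called sell_stuff here, in the spirit of the sellStuff() example) that routes to whichever system owns the selected record. This is a minimal sketch under assumed names: Record, SalesSystem, InMemorySalesSystem, ConvergedSales and the sample catalogs are all hypothetical. It does not use an ESB or Attivio's workflow system, and a real adapter would call each application's own API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Record:
    """A unified search result, tagged with the silo it came from."""
    source: str
    record_id: str
    title: str


class SalesSystem(Protocol):
    """Common interface that each existing sales application is wrapped in (hypothetical)."""
    name: str

    def search(self, text: str) -> list[Record]: ...

    def sell(self, record_id: str, quantity: int) -> str: ...


class InMemorySalesSystem:
    """Stand-in for one legacy sales application; a real adapter would call its API."""

    def __init__(self, name: str, catalog: dict[str, str]):
        self.name = name
        self.catalog = catalog  # record_id -> title

    def search(self, text: str) -> list[Record]:
        return [Record(self.name, rid, title) for rid, title in self.catalog.items()
                if text.lower() in title.lower()]

    def sell(self, record_id: str, quantity: int) -> str:
        return f"{self.name}: sold {quantity} x {self.catalog[record_id]}"


class ConvergedSales:
    """One converged API in front of many sales systems."""

    def __init__(self, systems: list[SalesSystem]):
        self.systems = {s.name: s for s in systems}

    def search(self, text: str) -> list[Record]:
        # Step 6: query across all silos and return one merged result list.
        return [r for s in self.systems.values() for r in s.search(text)]

    def sell_stuff(self, record: Record, quantity: int = 1) -> str:
        # Step 5: one operation that routes to whichever system owns the record.
        return self.systems[record.source].sell(record.record_id, quantity)


crm = InMemorySalesSystem("crm", {"c1": "Widget Pro"})
web = InMemorySalesSystem("webstore", {"w7": "Widget Lite", "w8": "Gadget"})
sales = ConvergedSales([crm, web])

hits = sales.search("widget")  # one result from each silo
print(sales.sell_stuff(hits[1], quantity=2))  # webstore: sold 2 x Widget Lite
```

Because the converged object depends only on the shared interface, adding an eleventh system means writing one more adapter rather than changing the convergence application, which matches the modular approach recommended in step 6.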

The architecture ends up looking like this:

[Figure: Attivio Convergence Application Architecture]

In time, you will realize massive cost savings as you stop investing in each application (and each update) for the end users. The converged application will cost less in training and support, while keeping people engaged through a truly integrated approach to their interactions with your company, yielding pervasive new revenue opportunities.

Attivio is proud to work with several organizations that have successfully implemented enterprise information convergence. If you'd like to learn more about this, please contact us.

There is a clear benefit to assembling the things needed for a particular task in a single place. This is especially true in business, where managing the distance between design, manufacturing, delivery and service is vital. Doing so not only reduces costs, but also drives meaningful competitive advantage.

In the world of tangible goods, this strategy has manifested itself in many interesting ways. In the '70s and '80s, many companies began building platforms on which variations of the same product could be assembled, such as the Chrysler K-Car and the Sony Walkman. A common "core" drives down cost and thus increases profits. Another variation was the "platform store", offering vast numbers of products at the lowest possible prices. Companies like Wal-Mart and Lowe's used their buying power to expand their reach, pass on greater savings to customers, and produce record profits.

In the '90s, the Internet emerged as an exciting new mechanism to deliver and support unprecedented combinations of goods and services. Companies like RCN began offering phone, Internet and cable TV services as a single product with a single bill. In retail, Amazon consolidated what would previously have been an entire mall into a single website. Markets that were once safe and discrete, like movie rental, are now gone: movies have become mere products in a much larger catalog, available from a few huge online players (Amazon, Netflix) at historically low prices. The ultimate evolution of this strategy is a platform that supports consolidation, which in turn helps provide a complete, high-level customer experience.

The emergence of mobile computing produced an even bigger consolidation opportunity: to grab entire markets with a single platform. Apple's iTunes, combined with the iPod, iPhone and/or iPad, literally replaces thousands of products in the B2C space. Here's the list of products one might reasonably expect to replace with an iPhone 5:

Phone, Rolodex, Calendar, Still Camera, Video Camera, TV, VCR, DVD Player, Digital Picture Frame, Map, Globe, GPS, Browser, Weather Report, Calculator, Note Pad, Email, iPod, Music Store, Voice Recorder, Flashlight, Travel Clock, Stock Portfolio tracker, Newspaper, Zagat Guide, Compass, Dictionary, Language Translator, Magic 8-Ball, Portable Hard Disk/Flash Drive, Metronome, Video Game Console, Encyclopedia, Book, Remote Control, Radio, White Noise Generator, Guitar Tuner, Barcode Scanner, Video Conferencing System, Guitar Amplifier, DJ Station...

How many companies will lose huge amounts of market share because of Apple's juggernaut of a platform? It's hard to imagine, but one might be tempted to say "most of them", at least in B2C.

Google has its own version of this, though not quite as complete. Google's web search and its various other services cut across many online services, and its Android phones offer much of the same replacement effect as Apple's, especially in light of Google's just-announced purchase of Motorola Mobility.

It is not short-term analysis that leads to Apple's massive valuation. Apple is pursuing strategies that will lock consumers in for decades and produce massive profits and growth beyond what any single market leader could expect. It is also no accident that the mobile Internet enabled the company. Intangible goods are a growing currency with worldwide adoption potential. Companies like Microsoft and Sony, which couldn't move past incremental improvements in their huge product lines, will need to work very hard to catch up, and it may well be too late.


Many people, myself included, have begun to refer to the general approach of building a platform that enables consolidation as a "convergence" strategy. The telecom industry has used the term for years, and the dictionary definition certainly supports it: "...the merging of distinct technologies, industries, or devices into a unified whole ..."

In my next post, I will discuss how a convergence strategy can produce equally great results within the enterprise.
