Unified Information Access Blog

Welcome to Attivio's Unified Information Access Blog. Join us for discussions on topics ranging from enterprise search solutions, information access insights, Agile software development methodology to programming with Java. We hope you'll find the articles informative and participate in the discussions by leaving a comment.

There has been a lot of press recently on unified information access and how it enables business users and IT staff to reduce the time it takes to provide information to make better business decisions. The Active Intelligence Engine (AIE) provides value not only to business users; it also offers developers a number of advantages that make life much easier:

Index now, think later

One of the greatest advantages of AIE is the ability to facet and join on fields without having to do a lot of preprocessing or, more importantly, design work. For example, you might not know ahead of time that your customer database records are linkable to customer comments on your website, but you can easily find out with a single query after both information sources are indexed. A number of POCs and development spikes we have conducted have followed the pattern of indexing everything possible and then trying to infer relationships using queries. Unlike a database, where primary and foreign keys must be set up ahead of time, the index does not require this sort of predefined and rigid schema definition. Also, many of these features are so cheap to leave on that tuning isn't always necessary.

That's not to say that there aren't advantages to doing some tuning of the schema or the ingestion workflows, but it isn't required in order to start seeing the power and ROI of using our system.
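To make the "index everything, then infer relationships" pattern concrete, here is a minimal sketch in plain Python. It is not AIE's API or query language; the function `candidate_joins`, the sample records, and the 0.5 overlap threshold are all assumptions made up for illustration. It simply looks for field pairs across two sources whose values overlap, which is one rough way to spot a join key nobody designed in advance.

```python
from itertools import product


def candidate_joins(source_a, source_b, min_overlap=0.5):
    """Return (field_a, field_b, overlap) for field pairs that share values.

    Hypothetical helper, not part of any product API.
    """
    def values_by_field(records):
        # Collect the normalized set of values seen for each field name.
        fields = {}
        for record in records:
            for name, value in record.items():
                fields.setdefault(name, set()).add(str(value).strip().lower())
        return fields

    a_fields = values_by_field(source_a)
    b_fields = values_by_field(source_b)
    results = []
    for (fa, va), (fb, vb) in product(a_fields.items(), b_fields.items()):
        # Share of the smaller value set that also appears in the other field.
        overlap = len(va & vb) / min(len(va), len(vb))
        if overlap >= min_overlap:
            results.append((fa, fb, overlap))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Made-up sample data: a customer table and website comments.
customers = [
    {"customer_id": 1, "email": "ann@example.com", "region": "east"},
    {"customer_id": 2, "email": "bob@example.com", "region": "west"},
]
comments = [
    {"comment_id": 10, "author_email": "Ann@example.com", "text": "Great service"},
    {"comment_id": 11, "author_email": "carol@example.com", "text": "Slow shipping"},
]

joins = candidate_joins(customers, comments)
print(joins)
```

In this toy data the only pair that clears the threshold is `email` and `author_email`, which is the kind of link you would then confirm with a real join query.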

Develop locally, deploy globally

Developing for a single-node, single-JVM system is a straightforward process on almost any platform. Some platforms also make it fairly easy to write business logic for a large distributed system. The key advantage we've found is being able to develop and test in a small localized environment, but then deploy to a large distributed system and not be surprised by system behaviors. In addition, it's important to be able to use a standard debugger when developing locally in a fully functional system, but then also to be able to use the same debugger in a distributed environment. AIE topology files provide an abstraction that separates the system functionality from the system deployment. This allows operations teams to scale the system for QA, staging and production environments without having to worry about functional issues with the configuration.

Some other systems, like Hadoop, force users into somewhat complex development models. AIE's development models strive to support the "I want to do X to Y" style of task in the simplest possible manner. We ship a sample transformer that implements some simple business logic and, more importantly, we ship a unit test for the transformer.

Learn one API, let us handle the details

One of the hardest parts of building enterprise-wide applications is the need to work with multiple different APIs. In addition, each system has its own idea of what a user is, what it means to have permissions to read a document and, more importantly, what a document is to begin with. If you can't define and normalize all of these concepts, it's impossible to join, group, categorize and make decisions based on the data. AIE not only provides connectors to these back-end systems, it also normalizes each system's data to a standard format that is accessible via our API. A user in Active Directory can have permissions to a document in Documentum and a document in SharePoint. More importantly, the permissions are applied transparently at search time so that developers don't have to worry about doing any sort of post-filtering of results.

Attivio's development environment is meant to hide all of the enterprise ugliness from developers and present a single user, group, document, ACL, and query concept. If userX can see records in 10 repositories, we handle the details. If you want to join data from your internal SharePoint server to your CRM system based on a support person's contact information, we can do that for you as well.
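As a rough illustration of what "permissions applied at search time" means, the sketch below uses plain Python rather than AIE's API. The `Document` class, the `search` function, and the sample users, groups and documents are hypothetical. The assumption is that each document, whatever repository it came from, carries a normalized set of principals allowed to read it, and the query checks that set while matching instead of filtering results afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical normalized document, regardless of source repository."""
    doc_id: str
    source: str
    text: str
    allowed: frozenset  # normalized users or groups with read access


def principals_for(user, group_membership):
    # A user acts as themselves plus every group they belong to.
    return {user} | group_membership.get(user, set())


def search(index, query, user, group_membership):
    """Return ids of matching documents the user is allowed to read."""
    principals = principals_for(user, group_membership)
    return [
        doc.doc_id
        for doc in index
        # The ACL check is part of matching, so no post-filtering is needed.
        if query.lower() in doc.text.lower() and doc.allowed & principals
    ]


# Made-up sample data spanning two repositories.
index = [
    Document("dctm-1", "documentum", "Q3 budget plan", frozenset({"finance"})),
    Document("sp-7", "sharepoint", "Budget review notes", frozenset({"userX"})),
    Document("sp-9", "sharepoint", "Budget for HR", frozenset({"hr"})),
]
groups = {"userX": {"finance"}}

visible = search(index, "budget", "userX", groups)
print(visible)
```

Here userX sees the Documentum record through the finance group and the SharePoint record through a direct grant, while the HR document never shows up in the results at all.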

Author Bio

Since graduating from MIT with a degree in Computer Science, Will Johnson has worked for Altavista and FAST for over 7 years. At Altavista, Will developed AV's real-time indexing solution used by news aggregators who demanded instantaneous access to news as it arrived. In addition, he was one of two engineers responsible for developing the Altavista QIndexer product that was used by the large majority of AV's customers. At FAST, Will developed high-speed database connectors as well as search UIs and tool sets used across the organization. Will also worked on many of the largest and most complex sales engagements and deployments for customers around the world, specializing in distributed systems for many of the largest internet publishers and directories, as well as internal knowledge management systems. Will is a founder, one of the Chief Architects at Attivio and a really nice guy.

In a recent blog entry posted at IDG Connect, I wrote about "connecting the dots", in this case recasting the dots as the silos in which enterprise data remains heavily trapped: incompatible, and difficult and costly to combine. Anyone who regularly reads this blog understands that I firmly believe that unified information access (UIA) offers real solutions to this challenge.

However, there is a lot more to say about connecting the dots. Some organizations view the problem differently; they can, for some job functions, teams or even departments, bring together an awful lot of information in one place, especially if it is all of the same or relatively similar type. Analysts who depend on published news, for example, probably worry less about the silo problem, because these silos are getting easier and easier to unify. A feed aggregator, more often than not, can provide a lot of value in this regard.

The challenge, in this case, then becomes one of finding the dots of interest in an ocean of dots.

Databases have long had an approach to this...write a better query! Many BI applications ask users to reduce the universe of possible answers very early on by focusing on a date range, specific source, or other "dimensions". The goal is to reduce the scope of the query so the user can more easily comb through the results. The problem, of course, is that a reduction driven by dates may eliminate the interesting "dots" or, more problematically, that there simply are not enough dimensions defined.

In the UIA world, techniques like faceted browsing can be used to disambiguate any query using any structured data. For example, if you search for "bond" it may offer you a choice of more specific noun phrases such as "James Bond", "Bond Trading" or "Chemical Bond", or even higher-level categories like "Movies", "Finance" and "Chemistry". The advantage of faceted browsing is that it is dynamic (based on the data available) and interactive (as you click on a facet, you reduce the result set and are presented with new relevant facets). Attivio's Active Intelligence Engine (AIE) actually recommends the best facets for each query, and can be used on a per-source basis to show facets that cut across silos, and those that drill into them.
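The "bond" example can be sketched in a few lines of plain Python. This is only a toy model of faceted browsing, not how AIE computes or recommends facets; the `search` and `facet_counts` functions and the four sample documents are assumptions for illustration. Facet counts are computed from whatever the current result set contains, and clicking a facet becomes a filter that yields a smaller result set with its own fresh facets.

```python
from collections import Counter


def search(docs, term, filters=None):
    """Return docs whose title contains term and that match every filter."""
    filters = filters or {}
    return [
        doc for doc in docs
        if term.lower() in doc["title"].lower()
        and all(doc.get(name) == value for name, value in filters.items())
    ]


def facet_counts(hits, facet_fields):
    """Count the values of each facet field across the current hits."""
    counts = {name: Counter() for name in facet_fields}
    for doc in hits:
        for name in facet_fields:
            if name in doc:
                counts[name][doc[name]] += 1
    return counts


# Made-up sample documents.
docs = [
    {"title": "James Bond returns", "category": "Movies", "source": "news"},
    {"title": "Bond trading volumes rise", "category": "Finance", "source": "news"},
    {"title": "Covalent bond basics", "category": "Chemistry", "source": "wiki"},
    {"title": "Municipal bond yields", "category": "Finance", "source": "wiki"},
]

# Broad query: facets are derived from the data in the result set.
hits = search(docs, "bond")
facets = facet_counts(hits, ["category", "source"])

# "Clicking" the Finance facet narrows the results and refreshes the facets.
narrowed = search(docs, "bond", {"category": "Finance"})
narrowed_facets = facet_counts(narrowed, ["category", "source"])
print(facets["category"], narrowed_facets["source"])
```

Nothing about the categories is declared up front; add a document with a new category and it appears as a facet on the next query.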

Exposing useful information to allow the user to disambiguate a somewhat broad query is just the first step, however. It's also important to create additional metadata that can aid in this process. Identifying and fielding entities (such as people, locations or organizations) is an example of this. Analyzing the sentiment of documents, or entities within documents, is another. Being able to classify content to a particular set of subjects is another. All of these things create more facets that aid the user and help filter down a huge result set to a more modest and hopefully manageable one.

Another important step to help users identify dots of interest is to retain and use the relationships between them. In a previous post on this topic, I explained the difficulties of handling highly structured data in a search engine. Flattening information loses relationships. In the context of identifying interesting dots, however, relationships are extremely important. A "dot" may be interesting because it has performed certain transactions, is a member of a particular group, or is related to another dot.

Using a UIA approach of combining relationships and content enrichment approaches, along with traditional dimensional analysis, brings a whole new set of possibilities to those seeking to "connect the dots".

I am in Tel Aviv enjoying a beautiful summer afternoon with a new customer when my mood is altered by a credit card snafu. Back in the States, a wary bank has decided to turn off my credit card due to unfamiliar charges. I quickly explain who I am and provide the security information necessary to activate my card...and pay for the meal.

My colleagues at the dinner begin to laugh as it is often the case that international visitors, particularly Americans, have this issue in Israel. Israel has a very large percentage of foreign travellers, some of whom are not law abiding and have a habit of stealing credit card information and using it to pay for their holidays.

Of course to me, this just represents another opportunity for Attivio to assist the world. What if the bank fraud department was using Attivio?

The suspicious charge would be received and flagged. Using the workflow capabilities in AIE, the bank could create a set of rules that are triggered when a transaction is flagged by the system. This could include a set of auto-run queries executed against the data pulled from the bank's systems, so that a credit administrator would receive an alert on their dashboard with all the related charges and could quickly surmise that the charge was not fraudulent. These workflows would return results that show the credit card had been used to purchase my airline ticket to Israel, confirm the ATM withdrawal at Newark prior to departing for Tel Aviv and show that I checked into the hotel successfully.
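Here is a small sketch of that rule in plain Python. It is not AIE's workflow API; the function names, the transaction fields, the 14-day window, the "two pieces of evidence" threshold and all of the sample charges are assumptions invented for illustration. When a charge is flagged, an auto-run query pulls recent charges on the same card, and a simple rule checks whether any of them explain why the cardholder would be in that country.

```python
from datetime import date, timedelta


def related_charges(history, flagged, window_days=14):
    """Auto-run query: same card, within window_days before the flagged charge."""
    start = flagged["date"] - timedelta(days=window_days)
    return [
        t for t in history
        if t["card"] == flagged["card"] and start <= t["date"] <= flagged["date"]
    ]


def supports_travel(charge, country):
    """Hypothetical rule: an airline ticket to, or a hotel stay in, the country."""
    if charge["category"] == "airline":
        return charge.get("destination") == country
    if charge["category"] == "hotel":
        return charge["country"] == country
    return False


def build_alert(history, flagged):
    """Assemble what a credit administrator would see on the dashboard."""
    related = related_charges(history, flagged)
    evidence = [t for t in related if supports_travel(t, flagged["country"])]
    return {
        "flagged": flagged,
        "related": related,
        "evidence": evidence,
        # Arbitrary threshold chosen for this sketch.
        "likely_legitimate": len(evidence) >= 2,
    }


# Made-up charges loosely following the story above.
history = [
    {"card": "1234", "date": date(2010, 6, 1), "category": "airline",
     "country": "US", "destination": "IL"},
    {"card": "1234", "date": date(2010, 6, 9), "category": "atm", "country": "US"},
    {"card": "1234", "date": date(2010, 6, 10), "category": "hotel", "country": "IL"},
    {"card": "9999", "date": date(2010, 6, 10), "category": "hotel", "country": "IL"},
]
flagged = {"card": "1234", "date": date(2010, 6, 11),
           "category": "restaurant", "country": "IL"}

alert = build_alert(history, flagged)
print(alert["likely_legitimate"], len(alert["related"]))
```

The alert carries all the related charges, not just the verdict, so the administrator can still see the ATM withdrawal and make the final call.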

With Attivio, the bank not only has the ability to detect fraud and avoid annoying a customer, it also has the opportunity to invoke a revenue generating campaign by offering potential travel or purchase suggestions to me. Perhaps a rental car special, travel insurance, accident coverage, special dinner promotion at a great restaurant or maybe a night in Jerusalem. Attivio enables savings, new revenues and most importantly...brand loyalty, as a card as smart as one powered by Attivio will be used forever.

Shalom

