The Patented Composite Join in the Attivio Platform
One of the foundational technology differentiators of the Attivio Platform is the ability to perform query-time joins across both structured and unstructured data. Last year we received our latest patent on an extension of that technology called a “Composite Join,” and it has enabled us to deliver some awesome solutions for our customers.
The Query Time Join
Before we get into composite join, let’s take a step back. The concept of a join between two tables is well understood in the realm of databases. For example:
- The ‘Customers’ table has information about your customers, such as their customer id, name, address, email, etc.
- The ‘Orders’ table has information about which products they bought, at what price, and when, usually along with a key such as the customer id to link each order back to the ‘Customers’ table.
In a database, it’s simple to run queries across these tables using a SQL JOIN. This enables us to answer questions like:
Show me all the customers who live in Myrtle Beach and ordered Product X last year
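As a rough illustration of the question above, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, sample rows, and the customers_who_ordered helper are all hypothetical, invented for this example, and "last year" is passed in as an explicit year.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         product TEXT, price REAL, order_date TEXT);
    INSERT INTO customers VALUES
        (1, 'Ada', 'Myrtle Beach'),
        (2, 'Ben', 'Boston'),
        (3, 'Cy', 'Myrtle Beach');
    INSERT INTO orders VALUES
        (10, 1, 'Product X', 19.99, '2023-03-14'),
        (11, 2, 'Product X', 19.99, '2023-06-01'),
        (12, 3, 'Product Y', 5.00, '2023-07-09'),
        (13, 3, 'Product X', 19.99, '2022-11-30');
""")


def customers_who_ordered(conn, city, product, year):
    """Return names of customers in `city` who ordered `product` during `year`."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.name
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        WHERE c.city = ?
          AND o.product = ?
          AND strftime('%Y', o.order_date) = ?
        ORDER BY c.name
        """,
        (city, product, str(year)),
    ).fetchall()
    return [name for (name,) in rows]


result = customers_who_ordered(conn, "Myrtle Beach", "Product X", 2023)
print(result)
```

The customer id column is the join key: it is the only thing the two tables need to share for the query to combine them.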
If you extend this into a knowledge management or general search context, it’s easy to see how you might want to split up document content from document metadata. The content of a research report or news article might never change, but the metadata (tagged keywords, page-view counts, permissions, etc.) might change daily. If the system had to reprocess the entire document every time the metadata changed, it would be very expensive. Separating the two solves this problem elegantly.
Attivio has been doing this for years using our patented JOIN technology.
The problem with this solution, without composite join, is that you need to know which part of a document your query terms are in. This is true even in a database. For example, if you just want to find documents that contain the keywords “machine learning” (ML) and “artificial intelligence” (AI), you don’t necessarily care whether they match in the document content or the metadata. Yet you would have to express this query as:
Show me content parts that contain ML + AI
OR
Show me content parts that contain ML + metadata parts that contain AI
OR
Show me content parts that contain AI + metadata parts that contain ML
OR
Show me metadata parts that contain ML + AI
As you can see, having to enumerate all the combinations gets expensive as the number of terms in your query increases. It gets even worse if a document has more than two parts.
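To make the growth concrete, the following sketch counts the clauses you would have to write by hand. It assumes each query term may match in any one part, so the count is the number of parts raised to the number of terms. The function name and the part labels are made up for illustration.

```python
from itertools import product


def enumerate_part_assignments(terms, parts):
    """List every way to assign each query term to one document part."""
    return [dict(zip(terms, combo)) for combo in product(parts, repeat=len(terms))]


# Two terms over two parts: the four clauses spelled out above.
two_by_two = enumerate_part_assignments(["ML", "AI"], ["content", "metadata"])
print(len(two_by_two))  # 4

# Three terms over four parts (hypothetical part names).
three_by_four = enumerate_part_assignments(
    ["AI", "ML", "Attivio"],
    ["content", "metadata", "comments", "notes"],
)
print(len(three_by_four))  # 64
```

Each extra term multiplies the number of clauses by the number of parts, which is why enumerating them by hand does not scale.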
Composite Join
Enter composite join. This capability allows us to treat all parts of a document as a single entity for matching purposes, but maintain the separation for update purposes. Even in an incredibly complex use case, you can express the query in a simple fashion:
Show me all COMPOSITE_JOIN(documents + metadata + user comments + admin notes) records that contain AI + ML + Attivio
In the example above, when metadata changes (users add new comments about a document, or admins make editorial notes), the change is immediately reflected in the index without having to reprocess the source document. The query results reflect the change.
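The behavior described above can be mimicked with a toy in-memory model. This is only a sketch of the idea and not how the Attivio Platform implements composite join: the ToyCompositeIndex class, its methods, and the sample text are all hypothetical, and matching is a naive whitespace-token lookup on single-word terms.

```python
from collections import defaultdict


class ToyCompositeIndex:
    """Toy model: parts are stored separately, but matched as one document."""

    def __init__(self):
        # doc_id -> part name -> set of lowercase tokens
        self.parts = defaultdict(dict)

    def upsert_part(self, doc_id, part, text):
        """Replace one part of a document without touching its other parts."""
        self.parts[doc_id][part] = set(text.lower().split())

    def search(self, *terms):
        """Return ids of documents whose parts, taken together, contain every term."""
        wanted = {term.lower() for term in terms}
        hits = []
        for doc_id, parts in self.parts.items():
            combined = set().union(*parts.values())
            if wanted <= combined:
                hits.append(doc_id)
        return sorted(hits)


index = ToyCompositeIndex()
index.upsert_part("doc1", "content", "A report on AI in search")
index.upsert_part("doc1", "metadata", "tags: ML")

before = index.search("AI", "ML", "Attivio")  # no part mentions Attivio yet

# A user adds a comment; only the comments part is written.
index.upsert_part("doc1", "comments", "Built with Attivio")
after = index.search("AI", "ML", "Attivio")

print(before, after)
```

The query never names a part, and adding the comment touches only the comments entry, which mirrors the split between matching and updating described above.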
We have used this composite join technology to drive the use case above in a number of customer implementations, and we are starting to see its applicability in some new and interesting use cases, such as expert location. Instead of using documents as the main result, you can use people as the main record type. Then you can bring in authored content and personal profiles via composite join. Using our text analytics capabilities, we can identify entities and skill sets in the unstructured text, while letting users see people/experts, rather than documents, when they run a search.