
Search Is Not Enough

We are a company deeply rooted in search. Together, our team has more than 100 years of combined experience at some of the biggest companies in the search industry. We came together at Attivio to build something better. Our products are built from scratch, free of legacy technology, to bring you the sleekest, easiest-to-use and most powerful engine available. Read on to learn why we dominate our competition.

Workflow

AIE gives you complete control over indexing content, processing queries and returning results by passing each item through multiple processing stages before it reaches its destination. These stages are organized into workflows that support branching, conditional logic and parallel processing. Most stages are provided out of the box (e.g., content decomposition, term extraction, result sorting), but you can also create your own.

Workflows are unique to AIE. For example, zip files and emails with attachments have historically been problematic for existing search engines. With AIE, they are processed automatically through a simple looping workflow that indexes the container first and then each contained item.

Video and audio workflows can first process files for their metadata (defined and derived properties), making them visible to searchers immediately. They can then spawn a separate task to generate a speech-to-text transcription which, once complete, is added to the file’s metadata in the index. Transcription is deferred this way because it takes a long time: it runs in real time, at the playback speed of the media.
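The container-first workflow described above can be sketched as a chain of stages applied to each item, with a loop back into the same workflow for every contained item. This is a minimal, hypothetical Python sketch, not AIE’s API: `Document`, `run_workflow` and the two sample stages are illustrative names, and branching and parallel execution are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Document:
    doc_id: str
    text: str = ""
    children: List["Document"] = field(default_factory=list)  # attachments or archive members
    props: dict = field(default_factory=dict)                 # values produced by the stages


Stage = Callable[[Document], Document]


def normalize_text(doc: Document) -> Document:
    doc.props["normalized"] = doc.text.lower()
    return doc


def extract_terms(doc: Document) -> Document:
    doc.props["terms"] = sorted(set(doc.props.get("normalized", doc.text).split()))
    return doc


def run_workflow(doc: Document, stages: List[Stage], index: List[Document],
                 parent_id: Optional[str] = None) -> None:
    """Run every stage, index the item, then loop back for each contained item."""
    for stage in stages:
        doc = stage(doc)
    if parent_id is not None:
        doc.props["parent"] = parent_id
    index.append(doc)  # the container is indexed before its contents
    for child in doc.children:
        run_workflow(child, stages, index, parent_id=doc.doc_id)


# Usage: an email with a PDF attachment and a zip file that itself contains a CSV.
email = Document("mail-1", "Quarterly Report attached", children=[
    Document("mail-1/report.pdf", "Revenue grew"),
    Document("mail-1/data.zip", "", children=[Document("mail-1/data.zip/a.csv", "Q3 Totals")]),
])
index: List[Document] = []
run_workflow(email, [normalize_text, extract_terms], index)
print([d.doc_id for d in index])
# ['mail-1', 'mail-1/report.pdf', 'mail-1/data.zip', 'mail-1/data.zip/a.csv']
```

Recursion keeps each contained item linked to its container through the `parent` property, so results can be shown in context.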

Active

Most enterprise search vendors can turn a query into a persistent alert that watches new content as it enters the index and checks whether it satisfies the query conditions. Some of them also let you define a simple action, such as sending an email. AIE’s active capabilities go much further: you can define an alert or trigger anywhere in a workflow and use any of the bi-directional connectors to deliver the action to its target. Example actions include sending a notification by email or by SMS to a mobile device, writing enriched data back into a database through SQL, triggering another application to take action, or posting an event to an MQSeries event queue.
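As a rough illustration of an alert placed inside a workflow, the hypothetical stage below stores query conditions together with actions and checks every document that passes through it. The names `Alert` and `AlertStage` are assumptions, and the action is stubbed as an append to a list where a real deployment would call an email, SMS, SQL or message-queue connector.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Doc = Dict[str, str]


@dataclass
class Alert:
    name: str
    matches: Callable[[Doc], bool]   # the persisted query condition
    action: Callable[[Doc], None]    # stub for a connector call (email, SMS, SQL, queue)


class AlertStage:
    """Hypothetical workflow stage that checks each incoming document against stored alerts."""

    def __init__(self) -> None:
        self.alerts: List[Alert] = []

    def register(self, alert: Alert) -> None:
        self.alerts.append(alert)

    def __call__(self, doc: Doc) -> Doc:
        for alert in self.alerts:
            if alert.matches(doc):
                alert.action(doc)
        return doc  # the document continues down the workflow unchanged


# Usage: notify whenever newly ingested content mentions a recall.
outbox: List[str] = []
stage = AlertStage()
stage.register(Alert(
    name="recall-watch",
    matches=lambda d: "recall" in d.get("body", "").lower(),
    action=lambda d: outbox.append(f"notify: {d['id']}"),
))
for doc in [{"id": "n1", "body": "Quarterly results"}, {"id": "n2", "body": "Product RECALL issued"}]:
    stage(doc)
print(outbox)  # ['notify: n2']
```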

Facets

For exploration, the search bar is not the user interface of choice; the navigator, or “facet”, is. A facet is usually a piece of text on the screen (representing a property of an object, a general concept or a term) or part of a graphical tag cloud or heat map that, when clicked, returns information relevant to that facet. Facets are generally used to refine a set of search results by “drilling down” through terms or properties, progressively narrowing the query until only the “right” results remain. Facets are especially powerful for discovery because they guide you through the content based on what the content itself says.
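The drill-down pattern amounts to counting the values of a property across the current results and then filtering on the value the user clicks. A minimal sketch, assuming results are plain dictionaries of field values; `facet_counts` and `drill_down` are illustrative helpers, not part of any product API.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]


def facet_counts(results: List[Record], field: str) -> Counter:
    """Count how many results carry each value of a property (what a facet displays)."""
    return Counter(r[field] for r in results if field in r)


def drill_down(results: List[Record], field: str, value: str) -> List[Record]:
    """Keep only the results matching the clicked facet value."""
    return [r for r in results if r.get(field) == value]


results = [
    {"title": "UltraBook 13", "brand": "Acme", "screen": "13in"},
    {"title": "WorkBook 15", "brand": "Acme", "screen": "15in"},
    {"title": "Slate 13", "brand": "Zenith", "screen": "13in"},
]
print(facet_counts(results, "brand"))    # Counter({'Acme': 2, 'Zenith': 1})
refined = drill_down(results, "brand", "Acme")
print(facet_counts(refined, "screen"))   # Counter({'13in': 1, '15in': 1})
```

Each click narrows the result set, and the facet counts are recomputed on what remains.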

The problem with search engines today is that you must define your facets before indexing even starts, and if you want to change them, you will likely have to re-index the content. That is not a pleasant proposition when building an index can take weeks. AIE’s unique, patent-pending approach dynamically recommends the best facets for each query based on that query’s results, along with the order in which they should be displayed. For example, a search for “laptop” might recommend the facets “CPU”, “screen size” and “weight”, whereas a search for “server” might recommend “CPU”, “memory” and “number of rack units”. AIE still lets you create static facets, and combining the two approaches is often the best solution. In our example, “price” could be defined as the first facet for every search.
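AIE’s patent-pending recommendation method is not described here, so the following is only a generic, hypothetical heuristic illustrating the idea of choosing facets from the result set itself: a field is a candidate if it actually splits the results (at least two values, some value repeated, not too many distinct values), candidates are ranked by how many results carry the field, and static facets such as “price” are pinned first. The function name, thresholds and sample fields are all assumptions.

```python
from typing import Dict, List, Sequence

Record = Dict[str, object]


def recommend_facets(results: List[Record], static: Sequence[str] = (),
                     max_values: int = 10, top_k: int = 3) -> List[str]:
    """Hypothetical heuristic: pick facets from the fields present in the results."""
    if not results:
        return list(static)
    coverage: Dict[str, float] = {}
    for name in {key for record in results for key in record}:
        if name in static:
            continue  # static facets are pinned separately
        values = [record[name] for record in results if name in record]
        distinct = len(set(values))
        # Keep fields that split the results: 2+ values, some repetition, not too many.
        if 2 <= distinct <= max_values and distinct < len(values):
            coverage[name] = len(values) / len(results)
    # Most widely populated fields first; ties broken alphabetically for stable output.
    ranked = sorted(coverage, key=lambda name: (-coverage[name], name))
    return list(static) + ranked[:top_k]


laptops = [
    {"title": "A", "price": 999, "cpu": "i5", "screen": "13in", "weight": "1.2kg"},
    {"title": "B", "price": 1299, "cpu": "i7", "screen": "15in", "weight": "1.8kg"},
    {"title": "C", "price": 899, "cpu": "i5", "screen": "13in", "weight": "1.2kg"},
    {"title": "D", "price": 1499, "cpu": "i7", "screen": "15in"},
]
servers = [
    {"title": "S1", "price": 4000, "cpu": "Xeon", "memory": "64GB", "rack_units": 1},
    {"title": "S2", "price": 6000, "cpu": "Xeon", "memory": "128GB", "rack_units": 2},
    {"title": "S3", "price": 5000, "cpu": "EPYC", "memory": "128GB", "rack_units": 1},
]
print(recommend_facets(laptops, static=["price"]))  # ['price', 'cpu', 'screen', 'weight']
print(recommend_facets(servers, static=["price"]))  # ['price', 'cpu', 'memory', 'rack_units']
```

Because the candidates come from the results rather than from a schema fixed before indexing, changing the facets needs no re-index in this sketch.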

JOINs

The JOIN command in the SQL language is the key operation of the relational database environment. It combines rows from two or more database tables wherever they share a related value. For example, answering “our 100 best-selling products in the last quarter” means JOINing the table of all products with the table of all invoices, keeping only the invoices dated in the last quarter, adding up the invoice amounts for each product, and returning the 100 products with the largest totals. The JOIN is possible because each row in the invoice table contains a product ID that links it to the corresponding product in the product table.
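The best-sellers query can be written in standard SQL. This sketch uses Python’s built-in `sqlite3` module with a made-up two-table schema (`products`, `invoices`) and a fixed quarter (January to March 2010) as assumptions; the limit is set to 2 instead of 100 to keep the sample data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoices (id INTEGER PRIMARY KEY,
                       product_id INTEGER REFERENCES products(id),
                       amount REAL, invoice_date TEXT);
INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');
INSERT INTO invoices (product_id, amount, invoice_date) VALUES
  (1, 500.0, '2010-01-15'), (1, 700.0, '2010-02-20'),
  (2, 900.0, '2010-03-05'),
  (3, 2000.0, '2009-12-30'), (3, 100.0, '2010-03-31');
""")

TOP_N = 2  # the text uses 100; 2 keeps the demo readable
rows = conn.execute("""
    SELECT p.name, SUM(i.amount) AS total
    FROM products AS p
    JOIN invoices AS i ON i.product_id = p.id   -- the shared product ID links the tables
    WHERE i.invoice_date BETWEEN '2010-01-01' AND '2010-03-31'
    GROUP BY p.id, p.name
    ORDER BY total DESC
    LIMIT ?
""", (TOP_N,)).fetchall()
print(rows)  # [('Widget', 1200.0), ('Gadget', 900.0)]
```

Gizmo’s large December invoice falls outside the quarter, so it drops out of the ranking.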

Now, imagine extending the JOIN to unstructured content such as documents and email. This is another unique feature of AIE. To illustrate, let’s change our example to “blog and press coverage of our 100 best-selling products in the last quarter”. A database engine would have to reshape the blogs and RSS news feeds to fit the database and then perform the JOIN; the challenge would be deciding which posts and feeds are relevant enough to include in the first place. A search engine, on the other hand, could select the relevant posts and feeds, but identifying the products would be hard. At the very least, the final search query would be very long, “OR’ing” together every product name. AIE’s unique JOIN feature understands how to JOIN any two objects that conceptually share a common property. That property could be a field in a database, a tag in a document or an entity extracted from the text itself. Our example is now doable.

How search engines handle databases in general is a broader problem. The conventional practice is to run a fixed SQL query a priori, at index time rather than at query time. As a result, the index only ever holds one static set of database results, and running a different query means reconfiguring the index and re-indexing the content. AIE extracts all the data from every table in the database at index time, but performs the JOINs at query time within the AIE engine, using techniques such as MapReduce. Besides giving more flexible answers, the system performs much faster.
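To make JOINing on a shared property concrete, here is a single-process hash join between structured rows (product name and quarterly total) and unstructured posts, where the join key is a product name extracted from the text. This is a hypothetical sketch rather than AIE’s implementation: the naive substring-based `extract_products` stands in for real entity extraction, and a distributed engine might run the same build-and-probe steps as MapReduce jobs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def extract_products(text: str, known: List[str]) -> List[str]:
    """Naive entity extraction (assumption): case-insensitive substring match."""
    lowered = text.lower()
    return [name for name in known if name.lower() in lowered]


def query_time_join(rows: List[Tuple[str, float]], docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Hash join: build a map from extracted product name to documents, then probe it per row."""
    known = [name for name, _ in rows]
    by_product: Dict[str, List[str]] = defaultdict(list)
    for doc_id, text in docs.items():            # build side: unstructured content
        for name in extract_products(text, known):
            by_product[name].append(doc_id)
    return {name: by_product.get(name, []) for name, _ in rows}  # probe side: structured rows


# Usage: the structured rows could be the output of the SQL best-sellers query.
top_products = [("Widget", 1200.0), ("Gadget", 900.0)]
posts = {
    "blog-1": "Hands-on review: the new Widget is fast.",
    "press-7": "Acme ships Gadget and Widget bundles.",
    "blog-2": "Unrelated post about gardening.",
}
print(query_time_join(top_products, posts))
# {'Widget': ['blog-1', 'press-7'], 'Gadget': ['press-7']}
```

Because the join runs at query time, a different product list needs no re-index; only the probe changes.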

Scale & Performance

AIE was designed from the beginning with scale and performance in mind.

Here are some key statistics:
• The engine core occupies 20MB of disk space – most search engines take around 1GB
• Content in the index is 20-40% of its original size – with some search engines it is 300%
• The engine can ingest content at a rate of 1,000 docs/sec (about 30GB/hour) – possibly the fastest to date in the industry
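The ingest figures can be cross-checked with a little arithmetic. Assuming decimal units (1GB = 10^9 bytes), 1,000 docs/sec at about 30GB/hour implies an average document of roughly 8KB; the 1TB corpus in the last line is a hypothetical example of what the 20 to 40% versus 300% index-size ratios mean in practice.

```latex
\begin{aligned}
1{,}000\ \text{docs/s} \times 3{,}600\ \text{s/h} &= 3.6 \times 10^{6}\ \text{docs/h} \\
\frac{30 \times 10^{9}\ \text{bytes/h}}{3.6 \times 10^{6}\ \text{docs/h}} &\approx 8.3 \times 10^{3}\ \text{bytes/doc} \approx 8.3\ \text{KB/doc} \\
\text{index for a 1 TB corpus} &= 0.2\ \text{to}\ 0.4\ \text{TB at 20 to 40\%, versus } 3\ \text{TB at 300\%}
\end{aligned}
```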

These are real-world numbers, all achieved on the same server (a standard production box typical for search index storage) using documents of a size typical of enterprise content. The system was not tuned separately for each test. Like many systems, the index scales linearly, but with AIE you can add hardware when needed without affecting the running system, because you do not have to re-index to expand the index capacity. This means lower setup cost. You can even physically partition data into separate data silos.
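One common way to grow capacity without re-indexing is to direct new content to a new partition and fan each query out to every partition. The sketch below is a hypothetical, in-memory illustration of that general technique (`PartitionedIndex` and its fixed per-partition `capacity` are assumptions), not a description of how AIE distributes its index.

```python
from typing import Dict, List, Tuple


class PartitionedIndex:
    """Hypothetical sketch: new documents go to the newest partition; existing
    partitions are never rebuilt, and queries fan out to all partitions."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.partitions: List[Dict[str, str]] = [{}]

    def add_partition(self) -> None:
        self.partitions.append({})  # e.g. new hardware; existing data is not re-indexed

    def add(self, doc_id: str, text: str) -> None:
        if len(self.partitions[-1]) >= self.capacity:
            self.add_partition()
        self.partitions[-1][doc_id] = text

    def search(self, term: str) -> List[Tuple[int, str]]:
        hits: List[Tuple[int, str]] = []
        for pid, part in enumerate(self.partitions):  # fan out to every partition
            hits.extend((pid, doc_id) for doc_id, text in part.items()
                        if term in text.lower().split())
        return hits


# Usage: the third document overflows the first partition and lands in a new one.
idx = PartitionedIndex(capacity=2)
for i, text in enumerate(["search engine", "index scale", "search cluster"]):
    idx.add(f"d{i}", text)
print(len(idx.partitions), idx.search("search"))  # 2 [(0, 'd0'), (1, 'd2')]
```

Old partitions stay untouched as capacity grows; the cost is that every query must visit and merge results from all partitions.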
