Attivio's Active Intelligence Engine (AIE) is built on a scalable, parallel, asynchronous messaging system (more detail later). One consequence of such a design is that messages are not usually processed in the order in which they were sent. For a high-speed, scalable content ingestion system, paying this price is usually a no-brainer. But there are cases where ordered processing is nice to have, and some where it is absolutely required for correctness. Some examples:
The ability to support these types of operations while maintaining a high-throughput, scalable system that can ingest both structured and unstructured content is a key requirement of unified information access (UIA). The alternative is to pre-join all your data, which dramatically limits the types of queries that can be executed and the way updates to content are processed. AIE has always been able to handle these use cases, but until recently doing so was computationally expensive.
New in release 2.2 is the ability to process the messages within a group together and in order, while all other messages and groups continue to be processed independently. This capability is unique among ETL-like scalable data processing systems such as ours.
Grouped message processing allows a (usually small) set of related messages to be processed as a group, in order, when needed. This capability can be turned on or off for each individual document processing transformer. When grouping isn't required (the transformer has no side effects that depend on the group ordering), the messages are processed independently, delivering maximum throughput. When a document transformer does require this transaction-like behavior, a simple configuration change is all that is necessary. This change causes the following semantics to come into play for the component:
The component that most heavily uses message groups is the ContentDispatcher (the gateway component for the AIE index).
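To make the idea of "together and in order within a group, independent across groups" concrete, here is a minimal sketch in plain Java. It is not AIE code and does not reflect how AIE implements grouping internally: the class `GroupedDispatchSketch`, the method `process`, and the extra message `doc4` are all hypothetical names chosen for illustration. The assumption is simply that each group is handed to a worker as one unit, so its messages are handled sequentially, while separate units run in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names exist in AIE.
class GroupedDispatchSketch {

    // Each inner list is one unit of work: a group of related messages,
    // or a single ungrouped message. Units run in parallel on the pool;
    // messages inside a unit are processed strictly in list order.
    static List<List<String>> process(List<List<String>> units, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> unit : units) {
                Callable<List<String>> task = () -> {
                    List<String> log = new ArrayList<>();
                    for (String message : unit) {
                        log.add("processed " + message); // in-order within the group
                    }
                    return log;
                };
                futures.add(pool.submit(task));
            }
            // Collect the per-unit logs in submission order.
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> units = Arrays.asList(
            Arrays.asList("doc1", "doc2"),                        // first group
            Arrays.asList("deleteByQuery parentid:doc3", "doc3"), // second group
            Collections.singletonList("doc4"));                   // ungrouped message
        System.out.println(process(units, 3));
    }
}
```

In this sketch the delete for `parentid:doc3` is always handled before `doc3` itself, because both belong to the same unit, while nothing is guaranteed about how that unit interleaves with the others.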
Message grouping is easy to use. In the client API example below, doc1 and doc2 are processed as one group, and a DeleteByQuery followed by doc3 is processed as a second group.
Content feeder example
ContentFeeder feeder = new ContentFeeder(...);
feeder.startMessageGroup();
feeder.feed(new AttivioDocument("doc1"));
feeder.feed(new AttivioDocument("doc2"));
feeder.endMessageGroup(); // must always be called at the end of a group.
feeder.startMessageGroup();
// an example of replacing a doc and all children as a single atomic event
feeder.deleteByQuery(new WorkflowQueue("defaultQuery"), "parentid:doc3", QueryLanguages.SIMPLE);
feeder.feed(new AttivioDocument("doc3"));
feeder.endMessageGroup();
The architecture of AIE is based on the Staged Event-Driven Architecture (SEDA) pattern. In SEDA, each pool of components has a work queue in front of it. Components (document transformers are one type of component) take work from their input queue and forward system messages to the queue of the next component in the workflow. SEDA allows AIE to manage processing by sizing the queues and the number of component instances, all while processing content asynchronously. As a result, back pressure can be applied if one component is overwhelmed, and that back pressure eventually flows all the way back to the client application.
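The queue-per-stage idea and the resulting back pressure can be sketched with bounded queues from the Java standard library. This is a generic SEDA-style illustration, not AIE's implementation: the class `SedaSketch`, the stage tags, and the sentinel value are assumptions made for the example, and each stage has a single worker rather than a pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical two-stage pipeline; not AIE code.
class SedaSketch {
    static final String POISON = "__END__"; // sentinel that shuts a stage down

    // Starts one worker that drains `in`, tags each message, and forwards it to `out`.
    static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out, String tag) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String msg = in.take();      // waits while the input queue is empty
                    if (msg.equals(POISON)) {
                        out.put(POISON);         // pass the shutdown signal downstream
                        return;
                    }
                    out.put(msg + tag);          // blocks while the next queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        return worker;
    }

    static List<String> run(int messageCount, int queueCapacity) {
        BlockingQueue<String> first = new ArrayBlockingQueue<>(queueCapacity);
        BlockingQueue<String> second = new ArrayBlockingQueue<>(queueCapacity);
        BlockingQueue<String> sink = new ArrayBlockingQueue<>(queueCapacity);
        Thread parse = stage(first, second, ">parsed");
        Thread index = stage(second, sink, ">indexed");

        // The client blocks on put() whenever the first queue is full, which is
        // how back pressure from a slow stage reaches the sender.
        Thread client = new Thread(() -> {
            try {
                for (int i = 0; i < messageCount; i++) {
                    first.put("doc" + i);
                }
                first.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        client.start();

        List<String> results = new ArrayList<>();
        try {
            while (true) {
                String msg = sink.take();
                if (msg.equals(POISON)) {
                    break;
                }
                results.add(msg);
            }
            client.join();
            parse.join();
            index.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(5, 2));
    }
}
```

Because every queue is bounded, a stage that falls behind fills its input queue, the upstream `put` call blocks, and the stall propagates back to the client thread. With one worker per stage the messages also stay in order; once a stage has several workers, as in a real component pool, that ordering is no longer guaranteed.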

AIE supplies pluggable transports that allow components to be located in separate processes and on separate machines. In this way, processing can be scaled across multiple machines as needed. When a message (which may contain multiple documents) is received, it is transferred to one of the available instances of the component for processing.
