The ability to control who has access to structured data is a capability we largely take for granted, yet it remains one of the biggest challenges organizations face when dealing with their unstructured content. While most information retrieval solutions have added some form of access control, there are still major issues with the two most commonly used approaches: early binding and late binding. Attivio's Active Security capabilities were specifically designed to address these shortcomings.
Early binding was the first and simplest approach to controlling access to information. During ingestion, each piece of content is tagged with the list of users and groups who have access rights to it. At query time, the security system (AD/LDAP/etc.) retrieves the list of groups that the user belongs to, and this information is turned into a filter that is applied to the query. This approach does provide controlled access, but it has several significant drawbacks that make it almost unusable.
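The early binding flow described above can be illustrated with a minimal sketch. It uses plain Python data structures rather than any real search engine, and the names `IndexedDoc`, `ingest` and `search` are hypothetical. The user's groups are passed in directly, where a real system would fetch them from AD/LDAP at query time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    # Early binding: the principals allowed to read the document are
    # stored on the document itself at ingestion time.
    allowed: frozenset


def ingest(doc_id, text, acl):
    """Index a document together with its access control list."""
    return IndexedDoc(doc_id, text, frozenset(acl))


def search(index, term, user, user_groups):
    """Match on the term, then apply the user's principals as a filter."""
    principals = {user} | set(user_groups)  # assumed to come from AD/LDAP
    return [
        d.doc_id
        for d in index
        if term in d.text.lower() and d.allowed & principals
    ]


index = [
    ingest("d1", "Quarterly budget forecast", {"finance"}),
    ingest("d2", "Budget memo for all staff", {"everyone"}),
    ingest("d3", "Lunch menu", {"everyone"}),
]

print(search(index, "budget", "alice", ["everyone"]))
```

Because the ACL lives inside each `IndexedDoc`, changing who may read `d1` means rebuilding that record, which is the root of the drawbacks that follow.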
First, because content is tagged directly with its access control lists (ACLs), this method requires that the content be re-ingested and re-processed every time the set of users or groups for a document changes. This becomes an almost insurmountable problem when processing is expensive. For very large content (for example, a 20 MB PDF), image content that must be OCR'd, or content that goes through complex text analytics, completely re-processing the content is extremely time- and resource-intensive.
Second, as is often the case, changes to ACLs are not applied to single documents, but rather to large sets of content (folders, drives, etc.), so the number of documents that must be re-processed can be huge.
Third, because documents are tagged directly with the complete list of users/groups that have access, including groups-of-groups, even minor changes to the structure of your organization can trigger massive re-processing efforts.
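To make the groups-of-groups problem concrete, here is a small sketch under assumed data. The group names, the 1,000-document folder and the helpers `expand` and `flattened_acl` are all hypothetical. It flattens nested groups onto each document, as an early-bound index would, and then counts how many documents become stale after one structural change.

```python
def expand(group, children):
    """Return the group plus every group nested beneath it."""
    seen, stack = set(), [group]
    while stack:
        g = stack.pop()
        if g not in seen:
            seen.add(g)
            stack.extend(children.get(g, ()))
    return seen


def flattened_acl(granted, children):
    """The complete principal list an early-bound index stores per document."""
    out = set()
    for g in granted:
        out |= expand(g, children)
    return frozenset(out)


# Hypothetical organization: "engineering" contains two sub-groups.
children = {"engineering": ["platform", "search"]}

# 1,000 documents granted to "engineering", one granted to "hr".
doc_grants = {f"doc{i}": ["engineering"] for i in range(1000)}
doc_grants["handbook"] = ["hr"]

before = {d: flattened_acl(g, children) for d, g in doc_grants.items()}

# One organizational change: a new sub-group joins "engineering".
children["engineering"].append("contractors")

after = {d: flattened_acl(g, children) for d, g in doc_grants.items()}
stale = [d for d in doc_grants if before[d] != after[d]]

print(len(stale))
```

No document ACL was edited here, yet every document granted to `engineering` now carries an out-of-date principal list and would need re-processing.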
The direct result of these drawbacks is tremendous latency (days in some cases) in updating the access control of content. This leaves significant security gaps, which solutions must work around by limiting either the content they expose or the functionality they offer. In addition, because systems using early binding must constantly re-process content, they require significantly more hardware in order to keep up with the required changes.
In short, early binding is a brittle solution that forces content to be re-processed whenever ACLs or group structures change, introduces long delays before access control changes take effect, and requires significantly more hardware to keep up.
The late binding approach arose in response to the latency and brittleness issues of early binding described above. Late binding requires no access control processing of content during ingestion. When a query is issued, it is executed with no access control filtering, and each result in the full result set is then looked up in the system it came from in order to determine whether the user executing the query has access to it. This method also has several significant drawbacks.
First, because access control must be applied at query time, there is a substantial impact on the performance of the system. For example, if a result set is 10,000 results and only 20 of them are available to the current user, the system still must process and check all 10,000 results to eventually supply only the small subset the user is allowed to see.
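The 10,000-versus-20 example above can be sketched as follows. `SourceSystem` is a hypothetical stand-in for the repository or directory that owns the permissions. The sketch only counts lookups and does not model the cost of each one.

```python
class SourceSystem:
    """Hypothetical stand-in for the system that owns the permissions."""

    def __init__(self, readers_by_doc):
        self._readers = readers_by_doc
        self.lookups = 0

    def can_read(self, user, doc_id):
        self.lookups += 1  # every check is a round trip to the source
        return user in self._readers.get(doc_id, ())


def late_bound_search(matches, user, source):
    """Late binding: run the query unfiltered, then check each result."""
    return [doc_id for doc_id in matches if source.can_read(user, doc_id)]


# 10,000 documents match the query; the user may read only 20 of them.
matches = [f"doc{i}" for i in range(10_000)]
source = SourceSystem({f"doc{i}": {"alice"} for i in range(20)})

visible = late_bound_search(matches, "alice", source)
print(len(visible), source.lookups)
```

The user sees 20 results, but the source system has answered 10,000 permission checks to produce them.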
Second, performing lookups for each document in the result set in real time puts an enormous strain on the source and security systems. These systems typically are not designed and scaled for this type of interaction, which can negatively impact performance across the entire organization (e.g., for others using Active Directory) or require additional hardware investment to scale these systems out to support the additional load.
Third, because of the extensive processing that's needed to provide a filtered result set, most solutions using this approach either do not support paging through the results or, if they do, page very slowly.
Lastly, because late binding is applied to an unfiltered result set, useful features such as faceted navigation and spelling suggestions cannot be used, because the facets and suggestions returned could include details of documents the user is not privileged to see. In addition, the counts and summations provided along with a result set, including the total result count, will not be accurate. As a result, extremely valuable tools either cannot be used at all or can be used only in a limited capacity because of the security concerns they introduce.
The late binding approach attempts, unsuccessfully, to close the latency gap found in the early binding approach, and because of this it is rarely used.
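A short sketch of the facet problem, using made-up documents and a hypothetical `project` facet field: facets computed over the unfiltered result set reveal a value, and counts, that the user should never see.

```python
from collections import Counter

# Hypothetical matches for a query, before any access control is applied.
matches = [
    {"id": "d1", "project": "apollo", "readers": {"alice", "bob"}},
    {"id": "d2", "project": "apollo", "readers": {"bob"}},
    {"id": "d3", "project": "merger-x", "readers": {"bob"}},
]


def facet_counts(docs):
    """Count documents per value of the 'project' facet."""
    return Counter(d["project"] for d in docs)


# What a late-bound system can compute cheaply: facets over everything.
unfiltered = facet_counts(matches)

# What alice is actually entitled to see.
visible = [d for d in matches if "alice" in d["readers"]]
filtered = facet_counts(visible)

print(dict(unfiltered))
print(dict(filtered))
```

The unfiltered facets tell alice that a project called `merger-x` exists and overstate the `apollo` count, even though the documents themselves are trimmed from her results.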
In summary, late binding has the following shortcomings: slow query performance, heavy load on source and security systems, poor or missing support for paging, and facets, suggestions and counts that are either inaccurate or unsafe to expose.
In order to mitigate the issues that plague early and late binding, these approaches are sometimes combined to create a hybrid access control system. This hybrid approach provides a base level of access control using early binding, while attempting to reduce the latency inherent in early binding by layering late-bound access control on top to close the gap. While this approach can help plug the latency holes introduced by early binding, it also brings with it all of the negatives of both systems (content reprocessing, security system load, data leakage, etc.). In addition, the latency improvement applies only when access to content is reduced. Because the late-bound check can only remove results, newly granted access still waits on re-processing, which can lead to over-trimming of result sets.
Attivio's Active Intelligence Engine is the only software on the market today that actually solves the problems described above, plus much more. In order to accomplish this, we first had to take a step back and understand the way access controls are actually used within organizations. The first thing we came to understand was that changes in access control are primarily changes in access to sets - sometimes large - of content, as well as changes to the user/group structure itself. In addition, we found that within organizations where access control is critical (e.g. finance, intelligence), other factors often play a critical role in controlling access to content. These factors include location, device, time of day, temporary access and many more. From this understanding, we designed and built Attivio Active Security.
At the root of our Active Security model is the idea of breaking up the access control problem into its constituent parts: users, groups, documents and ACLs. To accomplish this, Active Security models documents, ACLs and user/group hierarchies as independent records within the Attivio universal index, enabling discrete control by allowing for independent updates to any part of the system. At query time these pieces are brought together, in a single query execution, using a combination of Attivio's patented JOIN operator and Attivio's GRAPH operator.
First, using the GRAPH operator, the hierarchy of groups - including groups-of-groups - that a user belongs to is traversed. This set of user/group records is then bound to the ACLs using a specialized JOIN operation (appropriately called ACL) which performs both ALLOW and DENY operations. This set of ACLs is then bound to the original document set using an INNER JOIN that produces a properly filtered set of results.
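The three steps above can be approximated in plain Python to show the shape of the model. This is only an illustrative sketch and not Attivio's GRAPH or JOIN operator syntax. The record layouts, the function names and the rule that a DENY overrides an ALLOW are all assumptions made for the example.

```python
from collections import deque

# Independent records (all hypothetical): membership edges, ACLs, documents.
member_of = {
    "alice": ["search-team"],
    "search-team": ["engineering"],  # a group inside a group
    "bob": ["contractors"],
}
acls = [
    {"doc_id": "d1", "principal": "engineering", "effect": "ALLOW"},
    {"doc_id": "d2", "principal": "engineering", "effect": "ALLOW"},
    {"doc_id": "d2", "principal": "alice", "effect": "DENY"},
    {"doc_id": "d3", "principal": "contractors", "effect": "ALLOW"},
]
documents = {
    "d1": "search roadmap",
    "d2": "search budget",
    "d3": "search vendor list",
}


def principals_for(user):
    """Step 1 (graph traversal): the user plus every group reachable from it."""
    seen, queue = {user}, deque([user])
    while queue:
        for parent in member_of.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def readable_doc_ids(principals):
    """Step 2 (ACL join): collect ALLOWs, then remove DENYs (assumed rule)."""
    allowed, denied = set(), set()
    for acl in acls:
        if acl["principal"] in principals:
            target = allowed if acl["effect"] == "ALLOW" else denied
            target.add(acl["doc_id"])
    return allowed - denied


def secure_search(term, user):
    """Step 3 (inner join): keep only matches that have a readable ACL."""
    readable = readable_doc_ids(principals_for(user))
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if term in text and doc_id in readable
    )


print(secure_search("search", "alice"))
print(secure_search("search", "bob"))
```

Because `member_of`, `acls` and `documents` are separate records, any one of them can be updated without rebuilding the others, and the filter is applied before results (and anything derived from them) are returned.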
Attivio Active Security brings huge improvements in minimizing update latency for access control changes. As ACLs are not bound to the underlying document content, changes, even to large sets, can be made very quickly because the original content does not have to be re-processed.
Unlike late bound security systems, Attivio Active Security does not interact with security/directory systems at query time. All access control is performed within Attivio AIE. By taking the security/directory system out of the query, Active Security both minimizes load on those systems and substantially improves query performance compared to late-bound systems. The only interaction Active Security has with security/directory systems is in periodically updating the user/group hierarchy stored within AIE. This update mechanism takes advantage of incremental update capabilities provided by security systems in order to both minimize load and reduce change latency.
Attivio Active Security has been implemented by some of our largest customers. We've deployed the model on systems containing hundreds of millions of documents and in systems with hundreds of thousands of users and groups, delivering sub-second query response times.
One of the greatest things about Attivio Active Security is how easy it is for both administrators and developers to work with. Deploying Active Security is simple: just point it at your Active Directory or LDAP system to pull in your users/groups and then start ingesting content using one of our connectors. For developers building user interfaces, it is even easier, as all they have to do is pass the ID of the user to Attivio AIE when executing queries. Active Security will automatically inject all access controls into those queries, no matter how complex.
While Attivio Active Security is very simple to deploy out of the box, it is also extremely extensible. It was built from the ground up with the understanding that oftentimes the access control requirements imposed by business and legal compliance rules do not map to a standard user/group model. Using the same building blocks, we've customized Active Security to support ephemeral group membership based upon time of day, type of device being used or where in the world the query is issued, simply by adjusting the model. Active Security also supports customization of how documents and ACLs are modeled, allowing for folder-level ACLs, instead of per-record ACLs, to minimize update times even further for larger systems. All of this customization is encapsulated within the Active Security model so, from an administrative and development perspective, interactions with the system do not change.
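Two of the customizations mentioned above, ephemeral group membership and folder-level ACLs, can be sketched with plain Python. This is a hedged illustration and not Attivio's configuration or API. The `on-call` group, the time window, the folder paths and all function names are hypothetical.

```python
from datetime import time

# Folder-level ACLs (hypothetical): documents point at a folder, and the
# folder carries the ACL, so one ACL record covers many documents.
folder_acl = {"/finance": {"finance"}, "/runbooks": {"on-call"}}
doc_folder = {"d1": "/finance", "d2": "/runbooks", "d3": "/runbooks"}


def effective_groups(static_groups, now, window=(time(18, 0), time(23, 59))):
    """Ephemeral membership: add 'on-call' only inside the time window."""
    groups = set(static_groups)
    start, end = window
    if start <= now <= end:
        groups.add("on-call")
    return groups


def visible_docs(groups):
    """A document is visible when its folder's ACL shares a group with the user."""
    return sorted(d for d, f in doc_folder.items() if folder_acl[f] & groups)


daytime = visible_docs(effective_groups({"finance"}, time(9, 0)))
evening = visible_docs(effective_groups({"finance"}, time(20, 0)))
print(daytime, evening)

# A single folder-level update changes access to every document in the
# folder without touching the document records themselves.
folder_acl["/runbooks"] = {"sre"}
after_update = visible_docs(effective_groups({"finance"}, time(20, 0)))
print(after_update)
```

The query-side code (`visible_docs`) is the same in every case. Only the model data and the membership rule change, which mirrors the point that interactions with the system stay the same under customization.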
Since graduating from Rochester Institute of Technology with a Bachelor's Degree in Computer Science, Steve Bower has had extensive experience working with and developing enterprise search and large-scale information retrieval systems. Steve was an early employee at FAST Search and Transfer (now Microsoft) where he worked within their R&D organization to design and develop early versions of the FAST enterprise search platform. Afterwards Steve worked within FAST's professional services organization to deliver enterprise search solutions for some of FAST's largest customers. Steve has worked as a Principal Software Engineer with Attivio's R&D organization and is the Director of Client Engineering at Attivio.
Attivio Active Security Technical Brief
This brief will explore the significant technical problems that accompany early binding and late binding user permissioning for unstructured information, problems that render early and late binding systems impossible to use in practical terms. You will also discover how Attivio’s Active Security model eliminates these issues to provide security access controls that are as easily created and maintained as those for a database, as evidenced by two case studies in which Active Security made the difference between business success and failure.