Have you ever wanted to get all the documents that match your query? I'm not talking about the top 10, or even the top 100. I'm talking about all of them. Trying this with a few popular search engines reveals some interesting results. Google gives you a nice error message.

Yahoo, on the other hand, quietly returns the last page of results (numbered 990-999) without any error message, no matter how many you ask for.
MarkMail (a site we really love, by the way) isn't quite sure how many results there are.

Most eCommerce sites powered by traditional enterprise search engines are even worse: they never let you see more than 10-50 results, period, since allowing more would make catalog scraping easier. A number of other legacy search vendors either have hard-coded limits on the number of results that can be retrieved for any given query or suffer significant performance degradation as you ask for more and more results.
While this model may work well for internet portals, it can have disastrous results inside an enterprise. Imagine telling a federal judge that you can only produce the top 1000 emails from your client about an insider trading case, or telling your CEO that you can't get all of the recent sales data into the quarterly report but that the top 500 should be enough.
Traditionally speaking, search engines are very good at getting you the top 10, 20, or maybe even 100 results. Again, this is fine for portals, since most users of those systems never get past the first few results, much less the tenth page. Databases, on the other hand, are very good at returning all the results that exactly match a query, but they have no concept of the 'top 10' or a 'fuzzy match'.
Each of these systems serves certain purposes quite well, but each has its limitations. As mentioned above, a legal discovery situation might require finding all emails from person 'X' that mention any form of the company name 'Y'. Alternatively, you might want to retrieve all the news stories that mention a certain set of keywords for further review. A database is good at the 'return all' aspect of these queries, while a search engine is good at the 'any form of' and 'keyword' aspects.
On the input side, databases are very good at joining large sets of filter criteria against a result set, but search engines generally require you to express the full filter set as a single query expression (a OR b OR c OR d OR ...).
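To make the contrast concrete, here is a toy Python sketch. It is not AIE code: the corpus, the term-count `score` function, and the names `top_k` and `stream_all` are all hypothetical. `top_k` mirrors the search-engine model, keeping only the k best-scoring hits, while `stream_all` mirrors the database model, yielding every match as it is found without ranking or buffering the full result set.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical document shape: (doc_id, text). A real engine would read from an index.
Doc = Tuple[str, str]


def score(text: str, terms: set) -> int:
    """Toy relevance score: number of query-term occurrences in the text."""
    return sum(1 for word in text.lower().split() if word in terms)


def top_k(docs: Iterable[Doc], terms: set, k: int) -> list:
    """Search-engine style: return only the k best (score, doc_id) matches.
    heapq.nlargest keeps at most k candidates while scanning."""
    scored = ((score(text, terms), doc_id) for doc_id, text in docs)
    return heapq.nlargest(k, (s for s in scored if s[0] > 0))


def stream_all(docs: Iterable[Doc], terms: set) -> Iterator[str]:
    """Streaming style: yield every matching doc_id as soon as it is found,
    without collecting or sorting the full result set."""
    for doc_id, text in docs:
        if score(text, terms) > 0:
            yield doc_id
```

With `docs = [("e1", "insider trading memo"), ("e2", "lunch plans"), ("e3", "trading desk trading update")]` and `terms = {"trading"}`, `top_k(docs, terms, 1)` returns `[(2, "e3")]`, while `list(stream_all(docs, terms))` returns `["e1", "e3"]`: the first answer is ranked but truncated, the second is complete but unranked.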
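The two input styles can be sketched in Python as follows. This is a toy illustration assuming a hypothetical `field:"value"` query syntax and hypothetical `(doc_id, sender)` records; none of these names come from AIE or from any particular search engine.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record shape: (doc_id, sender). Field names are illustrative only.
Record = Tuple[str, str]


def or_expression(field: str, values: Iterable[str]) -> str:
    """Search-engine style: fold every criterion into one query string.
    The string (and any parse tree built from it) grows with the list."""
    return " OR ".join(f'{field}:"{v}"' for v in values)


def join_filter(records: Iterable[Record], values: Iterable[str]) -> Iterator[str]:
    """Database style: load the criteria into a hash set once, then stream
    records past it and yield the ids whose sender is in the set."""
    allowed = set(values)
    for doc_id, sender in records:
        if sender in allowed:
            yield doc_id
```

For example, `or_expression("sender", ["alice", "bob"])` produces `'sender:"alice" OR sender:"bob"'`, a string that gets longer with every value added, while `list(join_filter([("m1", "alice"), ("m2", "carol"), ("m3", "bob")], ["alice", "bob"]))` returns `["m1", "m3"]` by probing a set, which is roughly what a database join does.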
Getting one system to meet all of these requirements has been very difficult with existing tools. Where you could request this kind of information at all, it usually meant provisioning very large amounts of memory or dealing with the possibility of results changing underneath you while a search was executing. Even if you managed to make all of this work, it was very sensitive to data volume, query type, and a host of other issues... until now.
In AIE version 2.2 we shipped a new beta feature, which we plan to roll out officially in our next release, that lets users request all results for a given query. It also lets you request all the facet values for a particular query. On the input side, it lets you stream in a list of filter criteria instead of building a huge filter expression. Most importantly, it works with any type of query and has no memory overhead on the client or server.
So far we're using search result streaming in integration projects that federate between AIE and large databases, with tens to hundreds of millions of records flowing in either direction. We send large result sets back to BI tools and databases, they send us large lists of filter criteria, and no one has to worry about the edge cases that might cause system instability.
We're sure there are other use cases, though. Imagine what you could do if you could request all the results from an internet search provider for a given keyword, or even all of its pages, period. We'd love to hear your ideas for this kind of functionality.
