Predictive Analytics Module

Attivio’s Predictive Analytics Module adds the power of prediction to the Active Intelligence Engine® (AIE), making information quickly actionable across every possible source. With Attivio’s Predictive Analytics, data scientists and other professionals who have moderate programming and data analysis experience can easily create predictive models, using large amounts of data and human-generated content.

Recent research from MIT shows that the error rates of traditional predictive analytics fall by as much as 30% when the models are combined with unstructured information that more accurately reflects sentiment along with actual behavior.¹

The three key components of the Attivio Active Intelligence Engine Predictive Analytics module are classification, regression, and pair correlation.


Classification

Attivio’s Predictive Analytics module includes a generalized classification component that can be trained on a wide variety of data. For example, it can learn how to assign users in a social network to groups based on their emails.

Several classification techniques are offered, including support vector machines and logistic regression.
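
The email-grouping example can be illustrated with a minimal sketch of logistic regression over word counts. This is not Attivio's implementation or API: the function names, the sample emails, the two groups, and the training settings are all hypothetical, chosen only to show the idea of learning labels from text.

```python
import math
from collections import Counter

def sigmoid(z):
    # Clamp z so math.exp never overflows on extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def word_counts(text):
    return Counter(text.lower().split())

def train_logistic(texts, labels, epochs=200, lr=0.5):
    """Fit binary logistic regression on word counts by stochastic gradient ascent."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            x = word_counts(text)
            p = sigmoid(bias + sum(weights[w] * c for w, c in x.items()))
            err = y - p  # gradient of the log-likelihood for this example
            bias += lr * err
            for w, c in x.items():
                weights[w] += lr * err * c
    return weights, bias

def predict_proba(model, text):
    weights, bias = model
    x = word_counts(text)
    return sigmoid(bias + sum(weights[w] * c for w, c in x.items()))

# Hypothetical training data: 1 = engineering group, 0 = sales group.
emails = [
    "build failed on the release branch",
    "please review my pull request",
    "quarterly sales targets and pipeline",
    "customer discount for the renewal deal",
]
groups = [1, 1, 0, 0]

model = train_logistic(emails, groups)
print(predict_proba(model, "review the release build"))  # above 0.5: engineering
print(predict_proba(model, "renewal sales pipeline"))    # below 0.5: sales
```

A production system would use far more data, regularization, and a held-out test set; the sketch only shows how labeled text becomes a model that scores new text.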


Regression

Using the same approach described for classification, the regression component trains on and predicts numbers rather than labels. One example is learning how many stars to assign in a product rating, based on the review text. Another is predicting stock market movements from news articles.

As with classification, the regression component includes a number of techniques, among them least-squares regression.
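
To make the star-rating example concrete, here is a small sketch of least-squares regression over word counts, fitted by stochastic gradient descent on the squared error. The reviews, ratings, function names, and learning settings are hypothetical and do not describe Attivio's own component.

```python
from collections import Counter

def word_counts(text):
    return Counter(text.lower().split())

def score(weights, bias, x):
    return bias + sum(weights[w] * c for w, c in x.items())

def fit_least_squares(texts, targets, epochs=500, lr=0.05):
    """Minimize squared error between predicted and actual numbers."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for text, y in zip(texts, targets):
            x = word_counts(text)
            err = y - score(weights, bias, x)  # residual for this review
            bias += lr * err
            for w, c in x.items():
                weights[w] += lr * err * c
    return weights, bias

def predict(model, text):
    weights, bias = model
    return score(weights, bias, word_counts(text))

# Hypothetical reviews with their star ratings.
reviews = [
    "great product works perfectly",
    "terrible quality broke quickly",
    "good value works fine",
    "poor quality very disappointing",
]
stars = [5, 1, 4, 2]

model = fit_least_squares(reviews, stars)
print(round(predict(model, "great value works perfectly"), 2))
print(round(predict(model, "terrible poor quality"), 2))
```

The only change from a classifier is the target: the model outputs a number, so the first unseen review scores well above the second.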

Pair Correlation

Another common predictive analytics challenge is finding statistically significant co-occurrences between pairs of items. For example, a retailer might have a stream of sales transactions, within which are pairs of items that a customer bought at the same time. We can find the items that people tend to buy together most often. Then, when the retailer sees that a customer has selected product A, it can recommend product B, since many other customers who bought A also bought B.
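
A rough sketch of the idea follows. The text above does not say which statistic is used, so this example assumes lift (how much more often two items appear together than chance would predict) as a simple stand-in for a significance test. The baskets, function names, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_lift(transactions, min_support=2):
    """Score item pairs by lift = P(a and b) / (P(a) * P(b))."""
    n = len(transactions)
    item_counts, pair_counts = Counter(), Counter()
    for basket in transactions:
        items = sorted(set(basket))  # de-duplicate and fix pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    scores = {}
    for (a, b), together in pair_counts.items():
        if together >= min_support:  # skip pairs seen too rarely to trust
            scores[(a, b)] = together * n / (item_counts[a] * item_counts[b])
    return scores

def recommend(scores, item, top_n=3):
    """Return items bought with `item` more often than chance (lift > 1)."""
    found = []
    for (a, b), lift in scores.items():
        if lift > 1.0 and item in (a, b):
            found.append((lift, b if a == item else a))
    return [other for _, other in sorted(found, reverse=True)[:top_n]]

# Hypothetical sales transactions.
transactions = [
    ["printer", "ink", "paper"],
    ["printer", "ink"],
    ["paper", "pens"],
    ["printer", "ink", "pens"],
    ["paper", "stapler"],
    ["pens", "paper"],
]

scores = pair_lift(transactions)
print(scores[("ink", "printer")])    # 2.0: twice as often as chance
print(recommend(scores, "printer"))  # ['ink']
```

Here paper and pens appear together twice, but that is exactly what chance predicts given how common each item is, so the pair has a lift of 1.0 and is not recommended.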

Use Case Example: Stock Prediction

In a recent test, we collected a month’s worth of financial news (approximately 10 million articles) plus the opening and closing prices of a dozen diverse stocks. For each stock, we found all the articles that mentioned that stock and then combined all the words from those articles into a “bucket.” This produced, for each stock, data vectors covering a month of trading days, together with the percentage the stock went up or down each day. We then fit a least-squares regression model for each stock to predict its movement each day.

We tested the model over ten days by making virtual market trades, starting the test period with $100 in each of the dozen stocks. On each day, if the model predicted a stock would go up, we bought it at the start of the trading day and sold it at the end of the day. If the model predicted the stock would go down, we sold it at the start of the trading day and bought it back at the end of the day.
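
The trading rule can be sketched as a short simulation. The prices and predictions below are hypothetical, and the sketch assumes each day's gain or loss is reinvested in the same stock, which the description above does not specify. Transaction costs are ignored.

```python
def simulate(days, stake=100.0):
    """Apply the daily rule to one stock.

    days: list of (open_price, close_price, predicted_up) tuples.
    """
    balance = stake
    for open_price, close_price, predicted_up in days:
        change = (close_price - open_price) / open_price
        if predicted_up:
            balance *= 1 + change  # buy at the open, sell at the close
        else:
            balance *= 1 - change  # sell at the open, buy back at the close
    return balance

def portfolio_profit_pct(stocks, stake=100.0):
    """Percentage profit across all stocks, each starting with `stake`."""
    final = sum(simulate(days, stake) for days in stocks.values())
    invested = stake * len(stocks)
    return 100.0 * (final - invested) / invested

# Hypothetical three-day history for one stock.
days = [
    (10.0, 11.0, True),   # predicted up, rose 10%: gain
    (11.0, 9.9, False),   # predicted down, fell 10%: gain
    (10.0, 10.5, False),  # predicted down, rose 5%: loss
]

print(round(simulate(days), 2))                         # 114.95
print(round(portfolio_profit_pct({"EXAMPLE": days}), 2))  # 14.95
```

A correct prediction earns the size of the day's move in either direction, and a wrong one loses it, so the result depends entirely on how often the model calls the direction correctly.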

At the end of our ten virtual days, we had made a 4.7% profit. Our control, a random strategy, would have lost about 0.1%. Note that the results were statistically significant, so we weren't just lucky!

This is a particularly good example because these results did not come from a deep understanding of stocks, but rather from a straightforward application of regression models to unstructured content.

¹ The Future of Prediction: How Google Searches Foreshadow Housing Prices and Sales, Professor Erik Brynjolfsson, Massachusetts Institute of Technology (MIT) - Sloan School of Management; National Bureau of Economic Research (NBER), August 20, 2013.