CTO Will Johnson on Attivio Past, Present, and Future
The search market has come a long way from 2007 when Attivio was founded. We took some time to chat with Attivio co-founder and CTO Will Johnson about just how far Attivio has come and how the search market itself has evolved.
Q: You are one of Attivio’s founders. What’s the story behind Attivio’s origins? What challenges were you addressing?
WJ: The original founders of Attivio came from various search backgrounds: FAST (Fast Search and Transfer), which was sold to Microsoft; AltaVista; and Northern Light. We had a lot of deep search expertise.
More importantly, though, we saw some common threads in terms of problems that the search technology used at the time wasn’t suited to solve. We referred to these as “repeat offenders” regarding problem statements and friction points with customers when deploying solutions.
A lot of the problems were with structured, or relational, data. Think about Google - you’re looking for documents using keywords. But with a database you’re looking for analytics or relationships between data, or maybe you want to understand aggregations across relationships of data and things like that. A search engine isn’t really able to handle structured data well. It can handle structured flat data, but not relational data.
We had this idea to use what we knew about search and what we understood about these common problems to build a better search engine (with stronger technical capabilities) that could solve these problems in a more efficient manner. So we came out with this concept of how we would handle relational data and got our first patent on it.
That patent is still the core foundational component of our technology. It brings together different types of data. For example, from a compliance standpoint, it might be bringing together emails and trades to understand how people are discussing potentially risky behaviour. Or it might be addressing security concerns and making sure the right people have access to the right information and the right documents at the right time. Or it might just be more traditional structured queries around customer service notes or things like that. The technology is applied to a lot of different applications, but under the hood it’s the same foundational capability that handles both structured and unstructured data in a search engine.
Q: What were the key drivers in the search market in 2007?
WJ: The big driver was simply the increasing complexity of the requirements. In 2007 and even before then, search mostly focused on document searches for email, content management systems, website search and things like that. But even with document management people wanted to bring in metadata and security, and the requirements were starting to get more complex around structured data.
We started seeing requirements around customer 360 applications where you need to bring in every piece of information about a customer - emails, support tickets, news articles, and so on.
Q: How has the market evolved since then?
WJ: In general, most search players have started to focus on specific areas, including Attivio. We started to focus a lot on risk and compliance for financial services, and we’ve also done a lot with Big Data. It’s really about getting deep expertise in specific verticals and problem statements. There’s still general website and Internet search, but these are close to being commoditized and aren’t at the top of everyone’s mind. It’s much more about business applications, such as anti-money laundering, email surveillance, drug research and things like that.
Q: What makes Attivio’s technology unique?
WJ: So it’s not just about finding a list of documents, it’s about finding a list of documents as they relate to some other structured or unstructured data. That’s where Attivio’s technology is ideally suited - our ability to handle both structured and unstructured data and let people consume it in the way they feel most comfortable.
For me, I’m a search person, so I feel most comfortable writing search queries, poking around at facets and drilling into the data. But we also support SQL, ODBC, and JDBC. There are thousands of Tableau analysts or people who understand SQL; they can get all the same insights and analytics from the data using the tool they feel most comfortable with, whether it’s visual, scripting or development. Attivio enables everyone to search the data the way they want.
Q: Attivio adapted to the market, and launched into the Big Data and BI space in 2015. What major changes did it go through?
WJ: We did two main things. First, we took the platform itself and ported it to run natively on the Hadoop ecosystem. We recognized that Hadoop is the de facto standard for large distributed computation and storage and started porting over 18 months ago, to take advantage of all that plumbing that’s out there for us. We get to write and support a lot less code and leverage the community. A lot of customers are pulling us in that direction as well.
Second, we recognized this trend of people building data lakes to bring together lots of different types of data, but at the same time not having a good sense of what they have or where their customer records live (or transactions or support tickets). We built a Semantic Data Catalog application on top of our platform that can go out and connect to various backend systems, something we already knew how to do from our former search-centric days. It pulls in the metadata and uses our text analytics to understand what data is out there and what it means, enabling data scientists or analysts to find datasets to work with for an analytic purpose.
Q: There are hundreds of vendors in the Big Data ecosystem, and customers are struggling to build their data stacks. How does Attivio help with this?
WJ: That’s the key; there’s a stack involved. A lot of vendors focus on the backend, the Hadoop ecosystem. There are new types spun out every day for query/database type functionality. There are also legacy vendors like Teradata and Oracle making changes in their products to facilitate storing and querying more data.
There’s also a lot of work being done on the front end from vendors like Tableau or other visual analytic and tooling companies. But there’s a piece in between. How do you get the data sets or know what data is available to use?
Attivio sits right in the middle, as a catalog and a facilitator between the people that own and produce the data and the people that want to consume it. We sit in that best-of-breed catalog space. We also OEM our product into larger solutions with companies like Dell EMC and Tibco, who are trying to deliver more of the full stack.
Q: Could you tell me about a customer project that highlights the value Attivio offers?
WJ: Let’s talk about Dell EMC. Things started with us getting involved in an internal IT project. EMC was looking to build their own catalog for their internal EMC data lake and selected Attivio to build the catalog on. This was before we had the catalog solution; it was one of the motivating market events we saw that led us to start building our own solution. EMC is very happy and moving to the Hadoop-based version of our product in the next month or so.
Once we were in with their IT group, one of their product groups in their Global Services division saw what we were doing there and wanted to bring us into their Analytics Insight Module. The module is a full stack data lake from hardware and storage all the way up through software and visualization type technology. Attivio is embedded as the catalog for onboarding data from outside the lake: it facilitates identification of what’s relevant outside the EMC ecosystem and helps pull it in.
Also, EMC built a search-based application for part of the software of the Analytics Insight Module. It needed search capabilities, and EMC uses Attivio behind the scenes in more of a complete OEM white label solution.
Finally, all of EMC’s customers that use the data lake can use our product to do text analytics and build high-end search applications.
It’s the same platform, same technology, using all of our different capabilities from the catalog to the platform to text analytics, both internally and externally.
Q: Attivio recently announced a new patent. Can you tell us about it?
WJ: One of our core capabilities is around text analytics, but doing sentiment analysis is a pretty old trick, and we wanted to take it a step further to understand the sentiment for specific entities, even inside a sentence. The best example I can give is: “I love my iPhone; I hate <insert carrier here>.” Those are very different sentiments, but if you looked at the sentiment for the document it would be neutral (one positive note and one negative note). However, if you are doing analytics over mentions of the word iPhone, it’s actually positive.
Our patent is a method and implementation to tease out the differences between those two sentiments and enable analytics on top of them using our platform. We have customers using it in production today; it’s on by default. We’ve done all the work to package it up and make it available to our customers out of the box, or they can train/tune their models. (Editor's Note: Read the press release to learn more about our latest patent.)
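(Editor's illustration: The iPhone example in the answer above can be made concrete with a toy sketch. This is not Attivio's patented method; it is a minimal clause-level approach, assuming a tiny hand-made word lexicon and a hypothetical "carrier" standing in for the elided carrier name. The function names are ours, chosen for illustration only.)

```python
import re

# Tiny illustrative lexicon (an assumption); real systems use trained models.
LEXICON = {"love": 1, "great": 1, "hate": -1, "terrible": -1}


def clause_score(clause):
    """Sum the lexicon scores of the words in one piece of text."""
    words = re.findall(r"[a-z']+", clause.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def document_sentiment(text):
    """Score the whole document at once, ignoring which entity is discussed."""
    return clause_score(text)


def entity_sentiment(text, entities):
    """Attribute each clause's score only to the entities named in that clause."""
    scores = {entity: 0 for entity in entities}
    for clause in re.split(r"[;.!?]", text):
        score = clause_score(clause)
        for entity in entities:
            if entity.lower() in clause.lower():
                scores[entity] += score
    return scores


text = "I love my iPhone; I hate my carrier."
print(document_sentiment(text))                      # 0
print(entity_sentiment(text, ["iPhone", "carrier"]))  # {'iPhone': 1, 'carrier': -1}
```

The document-level score cancels out to neutral, while the per-entity scores keep the positive and negative opinions apart, which is the distinction described in the answer. Splitting on punctuation is a deliberate simplification; it would not handle two entities with different sentiments inside a single clause.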
Q: What's next for the market? And for Attivio?
WJ: It’s about focusing on the main products we offer. We want to get deeper into the text analytics market. We recently announced new capabilities, and we have additional capabilities coming out over the next year.
For the search application, it’s about getting more into risk and compliance. We have new features coming out this month speaking to policy management and rule-based alerting around risk and compliance solutions. The focus here is to have better tooling, better capabilities, and smarter solutions.
And then there’s the catalog. We are continuing to improve the semantic nature of the catalog and do more integrations into backends and frontends. If the catalog is the glue between data and consumers, we want to ensure we have as many points of integration to as many systems as possible.