Text Analytics: A Spectrum of Enrichment Techniques
For cognitive search to work, you need text analytics, which is why it’s a key component of the Attivio Cognitive Search Platform. There is a range of capabilities within text analytics to understand, so I thought I’d take you through them and explain how they work.
From Directed to Discovery
Text analytics is the process of extracting valuable information from text-based (unstructured) content for business purposes. It seems simple, but it’s far from it.
You can direct your text analytics. When directed, you know what you want; you simply need a tool to extract the information. For example, you have a list of product names or stock tickers, and you want to gather information about them from external media sites.
In other cases, you don’t know what you want. You need the system to tell you something interesting about the content. In this discovery mode, you look to text analytics to surface insights you might not be aware of.
From directed to discovery, there are many types of text analytics capabilities you can leverage. You can see these in the diagram above, so let’s take a look at each one.
Taxonomies/Ontologies: Attivio maintains a number of taxonomies for different verticals, but you may also maintain your own lists. Products produced and industries followed are examples of vertical taxonomies you might currently maintain. Attivio can pull your taxonomies into the solution in different formats and use them to drive capabilities such as auto-tagging or synonyms.
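To make taxonomy-driven auto-tagging concrete, here is a minimal Python sketch. It is not Attivio's implementation: the taxonomy, the synonym list, and the auto_tag function are all hypothetical, and the matching is plain substring search. The idea it illustrates is that a match on a term (or one of its synonyms) tags the document with that term and every broader category above it.

```python
# Hypothetical taxonomy: each term maps to its broader parent term.
TAXONOMY = {
    "smartphones": "consumer electronics",
    "laptops": "consumer electronics",
    "consumer electronics": "technology",
}

# Hypothetical synonyms: alternate surface form -> canonical taxonomy term.
SYNONYMS = {
    "cell phones": "smartphones",
    "notebooks": "laptops",
}


def auto_tag(text):
    """Return taxonomy tags for text, including all ancestor categories."""
    lowered = text.lower()
    tags = []
    for surface in list(TAXONOMY) + list(SYNONYMS):
        # Naive substring match; a real system would tokenize first.
        if surface in lowered:
            node = SYNONYMS.get(surface, surface)
            # Walk up the hierarchy so broader categories are tagged too.
            while node is not None:
                if node not in tags:
                    tags.append(node)
                node = TAXONOMY.get(node)
    return tags


tags = auto_tag("New cell phones announced this week")
```

Here a document that only mentions "cell phones" still ends up tagged with "smartphones", "consumer electronics", and "technology", which is what makes taxonomies useful for faceting and synonym search.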
Dictionaries: Dictionaries are lists of common things, such as vendor names, product names, and error codes. In this case, you want the system to both tag terms in place and extract them as metadata. Attivio has a full system designed to allow business users to manage their dictionaries, as well as approve and publish them into the live environment. We have the infrastructure on the backend to process your content with these dictionaries.
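As a rough illustration of "tag in place and extract as metadata," the Python sketch below does both in one pass. Everything in it is an assumption made for the example: the DICTIONARY entries, the tag_with_dictionary function, and the bracketed tag format are invented here and say nothing about how the Attivio platform stores or applies dictionaries.

```python
import re

# Hypothetical dictionary: lowercase surface form -> entity type.
DICTIONARY = {
    "acme corp": "vendor",
    "widgetpro": "product",
    "e1042": "error_code",
}


def tag_with_dictionary(text, dictionary):
    """Return (tagged_text, metadata) for every dictionary term found in text."""
    # Try longer terms first so multi-word entries win over shorter overlaps.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    metadata = []

    def mark(match):
        entity_type = dictionary[match.group(0).lower()]
        # Extract: record the hit as structured metadata.
        metadata.append(
            {"text": match.group(0), "type": entity_type, "start": match.start()}
        )
        # Tag in place: wrap the original text with its type.
        return f"[{match.group(0)}|{entity_type}]"

    return pattern.sub(mark, text), metadata


tagged, meta = tag_with_dictionary(
    "Acme Corp shipped WidgetPro with error E1042.", DICTIONARY
)
```

The tagged text keeps the original wording readable, while the metadata list is what you would feed into facets or filters.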
Patterns: Patterns are similar to dictionaries, but not quite as well defined. You use patterns when you don’t know or can’t enumerate all the possible values, but you know there’s a standard pattern. Good examples here include credit card numbers, social security numbers, phone numbers, and part numbers. Attivio’s pattern-based entity extraction can even understand complex patterns like US addresses.
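Pattern-based extraction usually comes down to regular expressions plus a validation step. The Python sketch below is a simplified illustration, not Attivio's pattern engine: the three patterns and the function names are assumptions made for this example, and real-world patterns need to handle many more formats. The credit card matcher adds a Luhn checksum (the standard check-digit test for card numbers) so that random 16-digit strings are not tagged.

```python
import re

# Illustrative patterns only; production patterns need far more care.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def extract_patterns(text):
    """Return a list of (label, matched_text) pairs found in text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group(0)
            # Drop digit runs that look like cards but fail the checksum.
            if label == "credit_card" and not luhn_valid(value):
                continue
            found.append((label, value))
    return found


results = extract_patterns(
    "Call (555) 123-4567 about card 4111 1111 1111 1111; SSN 123-45-6789."
)
```

Note the difference from a dictionary: none of these values were listed in advance, only their shape.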
Machine Learning: You apply machine learning (ML) to classify content. Using ML, you can train the system to tell different kinds of content apart, such as a 10-K versus a 10-Q filing, or a product brief versus a product requirements document.
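To show what "training the system" can look like, here is a small document classifier in Python. The post doesn't say which algorithm Attivio uses, so this sketch swaps in a multinomial naive Bayes model with add-one smoothing purely because it is short and easy to follow. The class name, the training sentences, and the labels are all made up for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Tiny multinomial naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior: how common this label is in the training data.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                # Add-one smoothing so unseen words do not zero out a label.
                count = self.word_counts[label][token] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training examples for the two filing types.
classifier = NaiveBayesClassifier()
classifier.train([
    ("annual report audited financial statements fiscal year", "10-K"),
    ("annual audited results full fiscal year risk factors", "10-K"),
    ("quarterly report unaudited financial statements three months", "10-Q"),
    ("unaudited quarterly results three months ended", "10-Q"),
])
label = classifier.predict("unaudited statements for the quarterly period")
```

With real content you would train on many labeled documents per class, but the workflow is the same: label examples, train, then let the model classify what it has not seen.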
Sentiment analysis: Sentiment helps you understand if your content is positive or negative about a topic. Entity sentiment goes deeper to help you understand sentiment down to the entity itself. Consider “I love my iPhone, but I hate my carrier”: two different entities, two different sentiments. Attivio can tell the difference between the two and perform analytics across them.
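The iPhone and carrier example can be sketched in a few lines of Python. This is a deliberately naive, lexicon-based toy, not how Attivio computes entity sentiment: the word lists, the entity list, and the entity_sentiment function are hypothetical, and it simply assumes each clause carries the sentiment for the entities mentioned in it.

```python
import re

# Hypothetical sentiment lexicon and entity list for this example.
POSITIVE = {"love", "like", "great"}
NEGATIVE = {"hate", "dislike", "terrible"}
ENTITIES = {"iphone", "carrier"}


def entity_sentiment(text):
    """Return {entity: score}, scoring each entity by the clause it appears in."""
    results = {}
    # Split into clauses on punctuation and on the contrast word "but".
    clauses = re.split(r"[,;.]|\bbut\b", text.lower())
    for clause in clauses:
        tokens = re.findall(r"[a-z0-9']+", clause)
        score = sum(t in POSITIVE for t in tokens) - sum(
            t in NEGATIVE for t in tokens
        )
        for token in tokens:
            if token in ENTITIES:
                results[token] = results.get(token, 0) + score
    return results


scores = entity_sentiment("I love my iPhone, but I hate my carrier")
```

A document-level score for that sentence would net out to neutral; scoring per entity keeps the positive and the negative opinion separate.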
Statistical Entity Extraction: Heading deeper into discovery on the text analytics spectrum, statistical entity extraction (SEE) helps you pull out entities you don’t have a pattern for. Maybe you want to find the next Facebook that was just started in a garage. SEE looks at the context around terms inside text and marks them up as being a company, a person, or a location.
Language Model-based Key Phrase Analysis: The last one, at the far right of the spectrum, helps you look for statistically improbable phrases. In this case, we’re looking for things that stand out as unique or interesting in the text. Use it for trend analysis or spotting outliers.
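One simple way to picture "statistically improbable phrases" is to compare how often a phrase appears in a document with how often you would expect it based on a background corpus. The Python sketch below does that with two-word phrases and a smoothed log ratio. It is an assumption-heavy stand-in for a real language model: the function names, the scoring formula, and the sample texts are all invented for illustration.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the list of adjacent word pairs in text."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def improbable_phrases(document, background, top_n=3):
    """Rank document bigrams by how surprising they are versus the background."""
    doc_counts = Counter(bigrams(document))
    bg_counts = Counter(bigrams(background))
    doc_total = sum(doc_counts.values())
    bg_total = sum(bg_counts.values())
    vocab_size = len(set(doc_counts) | set(bg_counts))
    scored = []
    for phrase, count in doc_counts.items():
        p_doc = count / doc_total
        # Add-one smoothing gives unseen phrases a small background probability.
        p_bg = (bg_counts[phrase] + 1) / (bg_total + vocab_size)
        scored.append((math.log(p_doc / p_bg), " ".join(phrase)))
    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:top_n]]


background = "the market was flat today and the market was quiet today"
document = (
    "the market was flat but quantum batteries surged "
    "and quantum batteries dominated"
)
top = improbable_phrases(document, background, top_n=1)
```

Phrases that are common everywhere, like "the market," score low, while a phrase that is frequent in this document but absent from the background rises to the top.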
Which Capability Should You Use?
There is no single text analytics capability that covers every need. All kinds of companies use all of the text analytics capabilities listed above. Most will start with three or four types and add more over time. The good thing about the Attivio Platform is that you can extend and customize it to meet your needs.