Entity extraction is a core capability of text analytics, so I thought I’d step through an example of how it works.
Note: Learn about all the different text analytics capabilities in my last post, Text Analytics: A Spectrum of Enrichment Techniques.
Consider the following text that talks about Apple and Berkshire Hathaway. Entity extraction lets you pick out the entities in the text (companies, persons, locations) and then helps you understand how they are related to each other.
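To make the idea concrete, here is a minimal Python sketch of dictionary-based entity extraction. It is only an illustration, not how any particular product does it: the sample sentences, the `GAZETTEER` lookup table, and the function names are all made up for this example, and "related" is approximated crudely as "mentioned in the same sentence." Real systems typically rely on trained statistical models rather than a fixed list of names.

```python
import re
from itertools import combinations

# Hypothetical lookup table (gazetteer): known name -> entity type.
GAZETTEER = {
    "Apple": "COMPANY",
    "Berkshire Hathaway": "COMPANY",
    "Warren Buffett": "PERSON",
    "Omaha": "LOCATION",
}


def extract_entities(text):
    """Return (name, type, start offset) for each gazetteer match, in text order."""
    found = []
    for name, kind in GAZETTEER.items():
        # Word boundaries keep "Apple" from matching inside a longer word.
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, kind, match.start()))
    return sorted(found, key=lambda entity: entity[2])


def related_pairs(text):
    """Assumption: two entities are 'related' if they share a sentence."""
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        names = sorted({name for name, _, _ in extract_entities(sentence)})
        pairs.update(combinations(names, 2))
    return pairs


# Made-up sample text, used only to exercise the functions above.
SAMPLE = (
    "Berkshire Hathaway holds shares of Apple. "
    "Warren Buffett discussed the investment in Omaha."
)

if __name__ == "__main__":
    for name, kind, offset in extract_entities(SAMPLE):
        print(f"{offset:3d}  {kind:<9} {name}")
    for left, right in sorted(related_pairs(SAMPLE)):
        print(f"{left} <-> {right}")
```

The first function answers "which entities are here, and of what type?" and the second answers "which ones appear together?", which are the two steps described above. Swapping the gazetteer for a trained model would change the first function but leave the overall shape the same.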
For cognitive search to work, you need text analytics, which is why it’s a key component of the Attivio Cognitive Search Platform. There is a range of capabilities within text analytics to understand, so I thought I’d take you through them and explain how they work.
From Directed to Discovery
Text analytics is the process of extracting valuable information from text-based content - or unstructured content - for business purposes. It seems simple, but it’s far from it.
These days, there’s pretty much a trade show for everything, and predictive analytics is no exception. The most recent stateside edition of Predictive Analytics World™ took place in Chicago in June of this year, and predictive maintenance occupied a prominent place on the agenda. One presentation concerned failure and fault detection for Rolls-Royce.
There was a time when predictive maintenance encompassed only the analysis of structured data. In 2013, a blogger on the data science website Simafore suggested four ways predictive analytics could improve equipment maintenance: trend analysis, pattern recognition, critical range and limits, and statistical process analysis. Conspicuously absent in the discussion of the tools, techniques, and data types that supported predictive analytics was unstructured information and text analytics.
A recent study by IDC, Data Age 2025: The Evolution of Data to Life-Critical, projects that the amount of data subject to analysis will grow by a factor of 50 between now and 2025. Further, the amount of analyzed data affected by cognitive systems will grow by a factor of 100 to 1.4ZB (zettabytes).
Google has announced that it’s sunsetting the Google Search Appliance. Are you using it? Microsoft is sunsetting FAST. Are you using it?
Here’s the thing. The search market is changing, and the vendors know it. That’s why you are seeing major changes in the traditional search market. More importantly, though, you are noticing that traditional search simply isn’t giving you the information you need.
Traditional search was good: it indexed content and allowed you to perform quick searches. But it’s not that straightforward today. The amount of information you collect and create in your company is growing. It’s stored in multiple repositories and business systems. Some of it is secure and can only be seen by certain people.
“Knowledge workers are workers whose main capital is knowledge.” (Wikipedia)
It seems like a simple definition of a knowledge worker: someone who works with information (knowledge) as a primary part of their job. But many are only now starting to realize that there are far more “knowledge workers” in their companies than they ever thought.
Almost every employee needs information to do their job effectively. There are few exceptions. There’s the airline pilot who needs to know flight plans and runway maps; the caseworker who works with disadvantaged youth and is looking for support groups and programs they can attend; the line worker in a car factory who’s trying to improve a key process; and the customer service agent who helps customers figure out how to use their products.
If you Google “OEM: build versus buy,” you’ll get nearly a million results. It almost doesn’t matter what area of technology you pick; at some point there’s going to be a build versus buy bridge to cross.
For example, let’s take embedded analytics. That’s a hot topic right now among bloggers and industry analysts. ISVs that build line-of-business applications see embedded analytics as a way to differentiate their products. And, since they’re software companies, they could develop and embed the analytics themselves, right? Yes, they could.
But should they? Probably not. There are a lot of reasons, not the least of which is time to market. While ISV “A” is developing and embedding analytics in its app, competitors “B” through whatever have already embedded someone else’s best-of-breed analytics. First-mover advantage is not to be underestimated.
One of the smartest ways to grow your business is to acquire companies with complementary (and sometimes competing) products and services. You get a ready-made customer base and established products that fit nicely into your long-term business strategy.
With Acquisitions Come Many Challenges
One of these is the need to integrate multiple disparate IT systems from the acquired company. In some cases, you may be able to move everyone into a single system. But often, you will need to keep all the different systems up and running for a long period, if not permanently. This is particularly true with companies in the life sciences industry where the products are very complex and specific to business problems.
Attivio 5.5, the latest release of our market-leading Cognitive Search and Insight Platform, has a lot going for it. If you just want a quick summary of all its new features, the launch press release is a good place to start.
But I’m going to focus on one—the platform’s use of machine learning to improve relevancy. After all, relevancy is the heart of cognitive search.