Unified Information Access Blog

Welcome to Attivio's Unified Information Access Blog. Join us for discussions on topics ranging from enterprise search solutions and information access insights to Agile software development methodology and programming with Java. We hope you'll find the articles informative and participate in the discussions by leaving a comment.



The results of a recent survey on the "findability" of information within the enterprise are not encouraging. Roughly half of the responding knowledge workers stated that finding important information was "difficult and time consuming" and that the internal search capabilities provided by their company were "worse" to "much worse" than the equivalent functionality offered to end consumers.

Neither of these is truly surprising. Corporate internet sites tend to be directly involved in important and easily measured activities: selling products to new customers (generating revenue) or servicing existing customers (reducing cost). Consequently, internet search is usually well funded and well staffed, and tends to be more successful. Internal search, in contrast, is concerned with productivity, a fuzzier concept that is much harder to measure. Significant investment, and thus success, is therefore harder to achieve.

What is surprising is that roughly half of survey respondents stated that their company had "no formal goal" for internal findability. In my view, this is a direct cause of the poor overall results. Companies that don't measure search won't be able to invest appropriately, let alone tune and improve a complicated system with which they likely have little internal experience.

If your organization doesn't measure search, or, more precisely, findability, you should start right away. No massive investment is required up-front. One person working a few hours a week can make a difference.

For starters, find out if your organization is saving query logs. Hopefully they are; if not, this is the first challenge to overcome. Talk to your system administration team and see if they can help. You don't need to save all logs for all time; just try to get your hands on a day or two of data. That's quite enough to get started.
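Once you have a day or two of logs, a few lines of script are enough to see what people actually search for. The sketch below assumes a hypothetical tab-separated format of `timestamp`, `user_id`, `query` per line; real search engines log in their own formats, so the parsing step will almost certainly need adjusting.

```python
from collections import Counter
from collections.abc import Iterable

# ASSUMPTION: each log line is "timestamp<TAB>user_id<TAB>query text".
# This format is hypothetical; adapt parse_queries() to your engine's log layout.

def parse_queries(lines: Iterable[str]) -> list[str]:
    """Extract normalized query strings from raw log lines, skipping malformed ones."""
    queries = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed or truncated lines
        query = " ".join(parts[2].lower().split())  # lowercase and collapse whitespace
        if query:
            queries.append(query)
    return queries

def most_frequent(queries: list[str], n: int = 20) -> list[tuple[str, int]]:
    """Return the n most common queries together with their counts."""
    return Counter(queries).most_common(n)

# Made-up sample lines, for illustration only.
sample_log = [
    "2010-03-01T09:00:01\tu17\tHoliday Schedule",
    "2010-03-01T09:02:45\tu04\tholiday  schedule",
    "2010-03-01T09:05:10\tu17\texpense report form",
    "garbled line",
]
print(most_frequent(parse_queries(sample_log), n=2))
# [('holiday schedule', 2), ('expense report form', 1)]
```

Normalizing case and whitespace before counting keeps trivially different spellings of the same query from splitting its count.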

Assuming you have some data to look at ... identify a handful (fewer than 50) of interesting queries. Ideally they should fall into a few different categories: one word, two words, multiple words, questions (if any), a few different business domains, etc. Run the queries yourself and see what comes up. Look at the first few results and score them on a simple scale, e.g. 1 = incoherent, 5 = perfect. If you rate a result poorly, spend a few minutes trying to find out what the better answer might be, and then see if you can infer what’s wrong. (For example: does the document that answers your question, but doesn’t appear in the results list, contain the terms you put in your query? If it does, you have a relevancy problem; if not, you have some sort of linguistic problem.)
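The sampling and the rule of thumb above can be sketched as follows. The category rules, the list of question words, and the per-category limit are illustrative assumptions, and `diagnose()` simplifies "contains the terms" to "contains every query term" using naive tokenization; business domains are left out because they depend entirely on your content.

```python
import random
import re
from collections import defaultdict

# ASSUMPTION: a simple, English-only notion of what makes a query a "question".
QUESTION_WORDS = {"what", "who", "where", "when", "why", "how", "which"}

def categorize(query: str) -> str:
    """Bucket a query by shape: question, one word, two words, or multiple words."""
    words = query.split()
    if query.endswith("?") or (words and words[0].lower() in QUESTION_WORDS):
        return "question"
    if len(words) == 1:
        return "one word"
    if len(words) == 2:
        return "two words"
    return "multiple words"

def stratified_sample(queries, per_category=10, seed=0):
    """Pick up to per_category distinct queries per bucket (4 buckets, so at most 40)."""
    buckets = defaultdict(set)
    for q in queries:
        buckets[categorize(q)].add(q)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = {}
    for category, qs in buckets.items():
        ordered = sorted(qs)  # sets are unordered; sort before sampling for reproducibility
        sample[category] = rng.sample(ordered, min(per_category, len(ordered)))
    return sample

def terms(text: str) -> set[str]:
    """Naive tokenizer: lowercase alphanumeric runs."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def diagnose(query: str, better_doc_text: str) -> str:
    """If the better answer contains all query terms, suspect relevancy; otherwise linguistics."""
    return "relevancy" if terms(query) <= terms(better_doc_text) else "linguistic"

print(diagnose("vacation calendar", "The 2010 Holiday Schedule is attached."))
```

Here the better answer talks about a "holiday schedule" while the user typed "vacation calendar", so the sketch points at a linguistic (synonym) problem rather than a relevancy one.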

Next, write down some typical internal questions like "what is the company holiday schedule?" or "which business unit is responsible for product X?". Then see if you can identify a document that best answers each question. (It may help to pick a domain you are knowledgeable about, at least initially.) Finally, see if you can find the document by querying. It may take you several tries; keep track of how many times you have to revise the query to get the document, and score your results as above, e.g. 1 = impossible to find, 5 = easy to find.
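A spreadsheet works fine for tracking these known-item tests, but if you prefer a script, a record like the one below keeps the question, the target document, and every query attempt together. The `suggested_score()` mapping from revisions to the 1-5 scale is a hypothetical convention for consistency between testers, not part of the method described above; the document path in the example is made up.

```python
from dataclasses import dataclass, field

@dataclass
class KnownItemTest:
    question: str
    target_doc: str  # identifier of the document judged to best answer the question
    queries_tried: list[str] = field(default_factory=list)
    found: bool = False

    @property
    def revisions(self) -> int:
        """How many times the query was revised (the first attempt is not a revision)."""
        return max(len(self.queries_tried) - 1, 0)

    def suggested_score(self) -> int:
        """HYPOTHETICAL mapping: 5 = found on first try, 2 = found after 3+ revisions, 1 = never found."""
        if not self.found:
            return 1
        return max(5 - self.revisions, 2)

# Hypothetical example: found on the second attempt.
t = KnownItemTest("What is the company holiday schedule?", "hr/holidays-2010.pdf")
t.queries_tried += ["holiday schedule", "holidays 2010"]
t.found = True
print(t.revisions, t.suggested_score())  # 1 4
```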

Now compile the results and take a look. Are there any trends in the ratings? The odds are that you will observe one of the following:

• Results are simply incoherent

This typically indicates that relevancy needs tuning, or is otherwise not configured correctly. For example, you may see articles that have nothing to do with your query terms, new articles that aren't relevant, or relevant articles that don't reflect the latest information.

• Too many results

• One source of data dominates the results

These indicate that the search solution needs to expose facets (or ‘dimensions’) that users can use to slice into the result set.  It may also need to add entity or concept extraction capabilities.

• Misspelling or non-recognition of company terminology, jargon, acronyms, etc.

This issue indicates that query and/or content processing, especially linguistic processing such as tokenization and spell checking, is not configured correctly, or that some work on acronym and synonym handling is required.

One of the most likely outcomes, regardless of the overall health of your internal search solution, will be:

• For many queries there is simply no appropriate content to find

One-third of survey respondents noted this, claiming that less than half of the information they need is searchable. Most organizations limit internal search to text: office documents, spreadsheets, PDFs, brochures, and of course web pages, both internet and intranet. This unfortunately ignores three of the most important corporate silos: email/messaging; custom or departmental applications, i.e. databases; and complex enterprise applications built on top of databases, like BI, ERP and CRM systems. Not surprisingly, these are the most challenging silos to work with, let alone link and correlate with the other, fuzzier, unstructured data. Legacy enterprise search engines may simply not be up to it. One low-cost, quick and easy fix is to federate user queries against these sources and present the results side by side. Even if this doesn't represent a perfect solution, it will at least show users that improvement is possible!
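A minimal sketch of federation: send the same query to several sources in parallel and keep each source's hits in its own column instead of trying to merge rankings. The three connector functions are hypothetical stand-ins; real ones would call your mail archive, departmental databases, or CRM through whatever APIs those systems expose, and a production version would also need per-source timeouts.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor

# HYPOTHETICAL connectors standing in for real systems.
def search_documents(query: str) -> list[str]:
    return [f"Intranet page mentioning '{query}'"]

def search_crm(query: str) -> list[str]:
    return [f"CRM account note: {query}"]

def search_email(query: str) -> list[str]:
    raise ConnectionError("mail archive unavailable")  # simulate an outage

def federate(query: str, sources: dict[str, Callable[[str], list[str]]]) -> dict[str, list[str]]:
    """Query every source in parallel; results stay grouped by source, in the order given."""
    results: dict[str, list[str]] = {}
    with ThreadPoolExecutor(max_workers=max(len(sources), 1)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing source should not break the whole page
                results[name] = [f"(unavailable: {exc})"]
    return results

SOURCES = {"documents": search_documents, "crm": search_crm, "email": search_email}
for name, hits in federate("acme renewal", SOURCES).items():
    print(f"[{name}]", *hits, sep="\n  ")
```

Keeping results per source sidesteps the hard problem of comparing relevance scores across unrelated systems, which is what makes this approach cheap.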

Getting back to the analysis: once you have scored a bunch of queries, looked into the bad ones, and tried to understand how hard (or easy) it is to find answers to particular questions ... repeat the process a week later, and again the week after. Now, armed with a month’s worth of analysis, you should be ready to take the next step and build an ROI case for greater investment, grounded in the actual information access challenges your organization faces.
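To spot trends over the month, it helps to average the scores per query category per week. The sketch below assumes each rating is recorded as a `(week, category, score)` tuple on the 1-5 scale; the sample ratings are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def weekly_averages(ratings):
    """Average score per (week, category), sorted so week-over-week changes line up."""
    grouped = defaultdict(list)
    for week, category, score in ratings:
        grouped[(week, category)].append(score)
    return {key: round(mean(scores), 2) for key, scores in sorted(grouped.items())}

# Made-up ratings: (week number, query category, score on the 1-5 scale).
ratings = [
    (1, "one word", 2), (1, "one word", 4), (1, "question", 1),
    (2, "one word", 4), (2, "question", 2),
]
for (week, category), avg in weekly_averages(ratings).items():
    print(f"week {week}  {category:<10} {avg}")
```

A category that stays low week after week is a better basis for the ROI case than a single bad query.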

What if your company already measures search? Ask a deeper question: are you measuring the search engine, or findability? Query logs can only tell you what people are searching for; you can't necessarily infer what they couldn’t find. If you conclude that you may be looking more at the search engine than at the user, one concrete step you can take is to interview the top users: they can often tell you what the search solution does well, and where improvements are needed.
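If "top users" is taken to mean the heaviest searchers, the same query logs can produce a shortlist of interview candidates. This sketch reuses the hypothetical `timestamp<TAB>user_id<TAB>query` line format and assumes your logs record a user identifier at all, which many do not.

```python
from collections import Counter

def top_users(lines, n=5):
    """Count queries per user ID and return the n most active searchers."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3 and parts[1]:  # ASSUMED format: timestamp, user_id, query
            counts[parts[1]] += 1
    return [user for user, _ in counts.most_common(n)]

# Made-up log lines for illustration.
log = ["t1\tu1\tholiday schedule", "t2\tu2\tbenefits", "t3\tu1\texpense form", "bad line"]
print(top_users(log, n=1))  # ['u1']
```

Query volume is only a proxy; people who search often but rarely click through may be the most informative to talk to.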

In my next post I'll go deeper into the survey and look at the features that respondents felt contributed most to findability, in various contexts. In the meantime, if you would like to learn more about building an ROI case for your company's internal search or information access systems, please contact us by e-mail.
