Blog: Do More With Search

Understanding the Difference Between Precision and Recall

Written by Martin Muldoon. Posted in BA Insight News, FAST Search, SharePoint 2010, Understanding Search

Information retrieval experts define relevance as having two components: precision and recall. In an enterprise setting, it is difficult to achieve acceptable measures of either. Here's why:

Precision

When a user runs a two- or three-word query, many documents will contain those terms. Naturally, only a few of those documents will be relevant to the user. The result is a very lengthy list of hits that the user must trawl through. Precision is the measure of how many of these hits are actually relevant. It is typically measured on the first page of search results, since most people won't even bother to go past the first page.

So if you have 10 hits on the first page and 2 of these are actually relevant, you have a precision of 20%. Obviously the goal is to get this number as high as possible. So what's the problem? The problem is that the amount of data being indexed by the search engine is huge and getting larger with each passing day. The more data that's indexed, the more hits you'll have, and the less likely you are to find what you are looking for on page one.
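The first-page arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `precision_at_k` and the list of relevance judgments are assumptions made up for the example, not part of any SharePoint or FAST API.

```python
def precision_at_k(relevant_flags, k=10):
    """Return the fraction of the first k hits that were judged relevant.

    relevant_flags is a list of booleans, one per hit, in ranked order.
    (Hypothetical input: in practice someone has to judge each hit.)
    """
    top = relevant_flags[:k]
    if not top:
        return 0.0
    return sum(top) / len(top)


# Made-up first page: 10 hits, of which the 1st and 4th are relevant.
first_page = [True, False, False, True, False,
              False, False, False, False, False]

print(f"Precision on page one: {precision_at_k(first_page):.0%}")
```

With 2 relevant hits out of 10, this prints a precision of 20%, matching the example above.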

Knowing this, most companies resort to tagging content before it is indexed. The additional metadata gives the search engine more information to work with, and it gives users a way to filter content both before and after the query. The downside is that tagging requires a significant investment in time and money. BA Insight approaches this via a patented technology we call AptivRank.

Recall

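Recall is a really simple concept. Strictly speaking, it is the fraction of all the relevant content that a search actually returns. In the enterprise, the practical question is: how much of the content in your domain of interest is actually indexed and searchable? If content that is useful to you sits in a system that is not indexed, it will never show up in a search result.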

So why not index everything? On the Internet, it's because content is published so quickly that the search engines can't keep up with it. In the enterprise, though, it's largely due to security. Different line-of-business systems have different security models. Microsoft Search technologies, whether you are using FAST or SharePoint, only recognize Active Directory users and groups. Non-Microsoft systems use proprietary security models. The answer to the problem is obvious conceptually but challenging to implement: one must create a map between the Active Directory users and groups and those in the non-AD-based system.

In terms of measuring relevance, the calculation that combines precision and recall can be quite complex. This article on Wikipedia provides a good overview, but will leave you feeling you should have paid more attention in math class.
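For a rough feel of how the two numbers can be combined, one common choice is the F1 score, the harmonic mean of precision and recall. The sketch below is an assumption about which calculation to show, since there are several; the function names and the counts are invented for illustration.

```python
def recall(relevant_retrieved, relevant_total):
    """Fraction of all relevant documents that the search returned."""
    if relevant_total == 0:
        return 0.0
    return relevant_retrieved / relevant_total


def f1_score(precision, recall_value):
    """Harmonic mean of precision and recall (one common combined measure)."""
    if precision + recall_value == 0:
        return 0.0
    return 2 * precision * recall_value / (precision + recall_value)


# Made-up numbers: 10 hits returned, 2 of them relevant,
# and 8 relevant documents exist across all systems.
p = 2 / 10        # precision
r = recall(2, 8)  # recall

print(f"precision={p:.2f} recall={r:.2f} F1={f1_score(p, r):.2f}")
```

Because it is a harmonic mean, the F1 score stays low unless both precision and recall are reasonably high, so unindexed content drags the combined score down even when page one looks good.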

The truth is, you don't have to calculate relevance to determine how your SharePoint or FAST search implementation is performing. You can look at a much more telling KPI: are users actually finding what they are looking for? BA Insight provides users with a search interface that gives them the ability to act on relevant content when it's found. This information is collected for reporting purposes and can provide insight into how well search is serving their needs.

 


Martin Muldoon

Martin is a co-founder of BA Insight. He has extensive experience in the enterprise search space, with a particular focus on Microsoft Search technologies. Martin is a recipient of the Suez Innovation Award for the search-based application he developed for the Environmental division. Martin is a frequent speaker in the community and shares his expertise in enterprise search technologies at key conferences including TechEd, KM World, AIIM and others.
