As data in the Enterprise doubles every 12 to 18 months, the problem is only getting worse. Consider a query against 10,000 documents in which 5% are returned as hits in the search result. That’s 500 hits, or 50 pages of results at a typical 10 hits per page. What is the likelihood that a relevant hit is found on the first page?
What’s interesting is that Google faced this same challenge back in the late 90s, when the amount of content on the Internet began to explode. Relevance started to drop like a stone. Google’s claim to fame was to take a somewhat obvious (in hindsight) approach to improving things. On the Internet, users have a tendency to link to quality content. People are more than happy to subscribe to a useful blog, for example. The fact that a user took time out of his or her busy day to do so must mean that he or she values that content. Google began to boost the relevance of this content and, as if overnight, people began to find what they were looking for.
So why not apply this approach to Enterprise Search? The answer is that Enterprise content is different in structure and in how it’s treated by users. Google’s approach doesn’t work well in the majority of organizations, because their content is not web based. Office documents, PDFs, and the like are not linked. When a valuable document is placed in a file system, or even in SharePoint, it’s not rated, reviewed, or linked to (although SharePoint 2010 has just implemented a rating/review capability).
A second difference between web and Enterprise is that people who publish on the web want their content to be found. They go to great lengths to make it as easy as possible for users to find it. This is done by making the content search engine friendly. In most cases they’ll submit their content to search engines, asking for it to be indexed. Not so in the Enterprise, where most people are just happy to hit their deadline.
SharePoint Search 2010 includes a new feature called “Social behavior improves relevance”. This feature monitors which links users click on in the search result and boosts the relevance of documents that are frequently clicked. Like Google’s approach on the Internet, it attempts to boost relevance by having users identify relevant content, without being so obtrusive that users are turned off. The technique was first introduced on the Internet and is often referred to as click-through analysis. Many studies indicate that click-through can indeed improve relevance. The question is whether this naturally translates to the Enterprise and Enterprise content.
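To make the idea of click-through boosting concrete, here is a minimal sketch in Python. It is not how SharePoint computes relevance; the function name `rerank_with_clicks`, the `weight` parameter, the logarithmic damping, and the sample scores are all assumptions chosen for illustration.

```python
import math

def rerank_with_clicks(base_scores, click_counts, weight=0.1):
    """Order document ids by base score plus a damped click boost.

    Hypothetical scoring rule: the boost grows with log(1 + clicks), so a
    handful of clicks helps but very popular documents do not dominate.
    """
    boosted = {
        doc: score + weight * math.log1p(click_counts.get(doc, 0))
        for doc, score in base_scores.items()
    }
    return sorted(boosted, key=boosted.get, reverse=True)

# Illustrative data only: base scores from the ranker, clicks from search logs.
base_scores = {"spec.docx": 0.62, "notes.pdf": 0.60, "draft.docx": 0.58}
click_counts = {"draft.docx": 40, "spec.docx": 2}

print(rerank_with_clicks(base_scores, click_counts))
```

In this made-up example the document with the lowest base score moves to the top because users clicked it far more often than the others.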
On the surface, this appears to be a great feature. In practice, the type of data and metadata one has to work with will determine whether this feature actually helps or hurts relevance. Click-through relies on the quality of the document surrogates returned in the search result. A search result contains a representation of each document that was relevant to the search. This surrogate often contains the name of the file, an auto-generated summary or abstract of the document, and other metadata. If the surrogate is good, the user can quickly surmise whether the document is relevant or not. If the surrogate is poor, users disregard the information, click on the file to download it, and then manually search through the file to assess relevance.
So poor document summaries will force users to click on links in a search result from top to bottom. Because the top-ranked documents then collect the most clicks regardless of their relevance, click-through will reinforce the original rank, setting it in stone.
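The reinforcement effect can be illustrated with a toy simulation. This is a sketch under strong assumptions, not a model of real user behavior: `simulate_clicks`, the `patience` limit, and the five-document ranking are all hypothetical. With good surrogates the simulated user clicks only the result that looks relevant; with poor surrogates the user opens results from the top down and gives up after a few clicks.

```python
def simulate_clicks(ranking, relevant, good_surrogates, searches=100, patience=3):
    """Count clicks per document over repeated identical searches.

    Assumed behavior: with good surrogates the user clicks only relevant
    results. With poor surrogates the user opens results in rank order and
    stops at the first relevant one or after `patience` clicks.
    """
    clicks = {doc: 0 for doc in ranking}
    for _ in range(searches):
        if good_surrogates:
            for doc in ranking:
                if doc in relevant:
                    clicks[doc] += 1
        else:
            for doc in ranking[:patience]:
                clicks[doc] += 1
                if doc in relevant:
                    break
    return clicks

ranking = ["a", "b", "c", "d", "e"]  # original rank order, best first
relevant = {"e"}                      # the truly relevant document is ranked last

print(simulate_clicks(ranking, relevant, good_surrogates=True))
print(simulate_clicks(ranking, relevant, good_surrogates=False))
```

Under these assumptions, good surrogates send every click to the relevant document, while poor surrogates send every click to the top three results and none to the relevant one. A click-based boost would then push the irrelevant top results even higher.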
Administrators should do whatever they can to improve the quality of the information returned in the search result so that it is truly representative of the documents being summarized. This will ensure that click-through works as it was intended.