One mistake I’ve often seen developers make when deploying and optimizing SharePoint Search is trying to make it do what it simply was not architected to do. One recent example was an attempt to extend SharePoint 2010’s faceted search to support deep refiners, a feature found in FAST Search for SharePoint. In this post I’ll explain why, given the architecture of its property store, SharePoint can’t deliver this functionality.
MOSS 2007 Search is designed like most full-text search engines. It uses an inverted index to store the full text of documents, and a separate property store that the search engine uses to query and present the meta-data describing those documents. The inverted index lets the search engine efficiently find the documents containing the query keywords and rank them. The speed at which the search engine can generate results depends on a number of factors, but the number of hits the user requests per page can have a significant impact. Most organizations present ten results at a time.
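To make the split between the two stores concrete, here is a toy Python model, assuming a simple in-memory inverted index (term to document to term frequency) and a plain dictionary standing in for the property store. The names `build_inverted_index` and `search`, and the frequency-based ranking, are hypothetical illustrations rather than SharePoint's implementation. What it shows is that the property store is consulted only for the page of hits actually displayed.

```python
from collections import Counter, defaultdict

# Hypothetical toy model; not SharePoint's actual data structures or ranking.
def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index

def search(index, property_store, query, page_size=10):
    """Find docs containing every query term, rank them, fetch meta-data for one page only."""
    terms = query.lower().split()
    postings = [index.get(t, {}) for t in terms]
    if not postings:
        return [], 0
    # Documents that appear in the posting list of every query term.
    matches = set(postings[0]).intersection(*postings[1:])
    # Rank by summed term frequency (a crude stand-in for real relevance ranking).
    ranked = sorted(matches, key=lambda d: (-sum(p[d] for p in postings), d))
    top = ranked[:page_size]
    # The property store is read only for the hits on the page being shown.
    results = [(d, property_store[d]) for d in top]
    return results, len(matches)

docs = {
    1: "sharepoint search refiners",
    2: "fast search for sharepoint",
    3: "search search tips",
}
store = {1: {"author": "Ann"}, 2: {"author": "Bob"}, 3: {"author": "Ann"}}
idx = build_inverted_index(docs)
results, total = search(idx, store, "search", page_size=2)
print(results, total)  # [(3, {'author': 'Ann'}), (1, {'author': 'Ann'})] 3
```

All three documents match, but only two property-store reads are made, because only two hits fit on the page.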
SharePoint 2010, like MOSS 2007, uses an inverted index and a property store. The drawback of this architecture is that the faceted search capabilities in SharePoint 2010 cannot provide exact hit counts. The property store in SharePoint Search is designed for quick meta-data lookups on the top ten documents returned in a search result. Exact facet counts would require the search engine to go beyond the top ten hits requested by the user, retrieve every matching document, query the property store for each one's meta-data, and aggregate the values. This is far more demanding than producing a simple search result. Search systems such as FAST ESP can provide exact counts because the property store and inverted index are combined into a high-performance OLAP cube, and facet values are pre-aggregated at index time, before the user ever runs a query. Without this architecture, SharePoint 2010 can produce exact counts only for small document sets; for larger ones, server performance degrades significantly.
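The cost gap described above can be illustrated with a small Python sketch, assuming a property store that serves one meta-data lookup per document. `CountingStore`, `fetch_page`, and `exact_facet_counts` are hypothetical names, not SharePoint APIs, and the 100,000-document result set with a `filetype` field is invented for illustration.

```python
from collections import Counter

class CountingStore:
    """Hypothetical property store that counts how many per-document lookups it serves."""
    def __init__(self, data):
        self._data = data
        self.lookups = 0

    def get(self, doc_id):
        self.lookups += 1
        return self._data[doc_id]

def fetch_page(store, ranked_ids, page_size=10):
    """A results page needs meta-data for the top hits only."""
    return [store.get(d) for d in ranked_ids[:page_size]]

def exact_facet_counts(store, ranked_ids, field):
    """Exact refiner counts: every matching document's meta-data is read and aggregated."""
    return Counter(store.get(d)[field] for d in ranked_ids)

# 100,000 matching documents spread over three hypothetical file types.
data = {i: {"filetype": ("docx", "pdf", "xlsx")[i % 3]} for i in range(100_000)}
ranked = list(range(100_000))

store = CountingStore(data)
fetch_page(store, ranked)
page_lookups = store.lookups   # 10

store = CountingStore(data)
counts = exact_facet_counts(store, ranked, "filetype")
facet_lookups = store.lookups  # 100000
```

The page costs ten lookups regardless of how many documents match, while exact counts cost one lookup per match. An engine that pre-aggregates facet values at index time, as described above for FAST ESP, avoids this per-query scan; that design is not modeled here.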
There have been earlier attempts to provide exact counts on top of SharePoint, for example from the open-source community on CodePlex, but reports indicate that performance dropped to unacceptable levels. Some administrators have reported query times increasing from sub-second to 10-14 seconds.
Several studies show that users expect sub-second search results, and that longer response times lead to poor adoption of the search system. For example, in 2005, before web search results pages had become standardized, as many as 30 hits could be shown on a page (Reiterer et al. 2005). A Google VP reported that although users said they wanted more hits per page, an experiment showing 30 hits per page resulted in a 20% drop in site traffic (Linden 2006). The cause turned out to be latency: a page with 10 hits took 0.4 seconds to generate, while a page with 30 took 0.9 seconds on average. Linden (2006) found similar user sensitivity to half-second delays at Amazon.com.
While refiners with exact counts are a desirable feature, the added query latency of deploying them could significantly reduce user adoption and the ROI of a search initiative.