Datawocky: How Google Measures Search Quality

This link has been bookmarked by 12 people . It was first bookmarked on 11 Jun 2008, by Roger Chen.

14 Nov 13

tiger dragon
30 May 10

dolors reig
This post continues my prior post Are Machine-Learned Models Prone to Catastrophic Errors. You can think of these as a two-post series based on my conversation with Peter Norvig. As that post describes, Google has not cut over to the...

twine
09 Jul 08

Gosia Stergios
search Google
07 Jul 08

Andy Breeding
google search measurement analytics searchanalytics searchengine for:klsscan
19 Jun 08

Takuya Homma
it's difficult to set a model..

Google Search 8　△
- The heart of the matter is this: how do you measure the quality of search results? One of the essential requirements to train any machine learning model is a a set of observations (in this case, queries and results) that are tagged with "scores" that measure the goodness of the results. (Technically this requirement applies only to so-called "supervised learning" approaches, but those are the ones we are discussing here.) Where to get this data?
- For example, how does a new ranking model affect the fraction of users who click on the first result? The second? How many users click to page 2 of results? Once a user clicks out to result page, how long before they click the back button to come back to the search results page?
- It came as a great surprise to me that Google relies on a small panel of raters rather than harness their massive usage data. But in retrospect, perhaps it is not so surprising. Two forces appear to be at work. The first is that we have all been trained to trust Google and click on the first result no matter what. So ranking models that make slight changes in ranking may not produce significant swings in the measured usage data. The second, more interesting, factor is that users don't know what they're missing.
- Navigational queries, where the user is looking for a specific uber-authoritative website. e.g., "stanford university". In such cases, the user can very quickly tell the best result from the others -- and it's usually the first result on major search engines.
- Informational queries, where the user has a broader topic. e.g., "diabetes pregnancy". In this case, there is no single right answer. Suppose there's a really fantastic result on page 4, that provides better information any of the results on the first three pages. Most users will not even know this result exists! Therefore, their usage behavior does not actually provide the best feedback on the rankings.
- I'll close with an interesting vignette. A couple of years ago, Yahoo was making great strides in search relevance, while Google apparently was not improving as fast. Recall then that Yahoo trumpeted data showing their results were better than Google's. Well, the Google team was quite amazed, because their data showed just the opposite: their results were better than Yahoo's. They couldn't both be right -- or could they? It turns out that Yahoo's benchmark contained queries drawn from Yahoo search logs, and Google's benchmark likewise contained queries drawn from Google search logs. The Yahoo ranking algorithm performed better on the Yahoo benchmark and the Google algorithm performed better on the Google benchmark.
4 more annotations...
16 Jun 08

africangirl
google delicious
12 Jun 08

Chris Douglas
algorithms blogs

Datawocky: How Google Measures Search Quality - The Diigo Meta page

Would you like to comment?

Top Tags

Check out another URL