Word sense disambiguation (WSD) is an open problem in the information retrieval domain. Word sense ambiguity leads to the retrieval of multiple meanings for a single searched keyword. The quality of the results returned by an automated search engine depends to a large extent on how well it handles WSD. This in turn depends on its ability to precisely recall the results required by a user query. The weighted harmonic mean of precision and recall, the F-measure, also known as the F1 score, is a measure of test accuracy for an input dataset. This paper evaluates the WSD handling capacity of three major search engines, Google, Bing and Yahoo, based on their F1 scores. The F1 score for each search engine is based on its precision and query classification performance [1]. Ten queries are constructed from single keywords that take on different meanings in different contexts. With varying index sizes, numbers of results per page and search ranking strategies, these search engines responded differently to our set of single-keyword queries.
Finally, the relevance of the search results returned by Google, Bing and Yahoo for the different senses of the ambiguous single-keyword queries is compared and discussed.
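
For reference, the balanced form of the F-measure (F1) can be computed from raw counts as in the minimal sketch below. The function name, its parameters and the example counts are illustrative assumptions, not values or code from this study; a result is counted as relevant here when it matches the sense the user intended.

```python
def f1_score(relevant_retrieved: int, retrieved: int, relevant: int) -> float:
    """Balanced F-measure (F1) from raw counts.

    relevant_retrieved: results returned that match the intended sense
    retrieved:          total results returned for the query
    relevant:           total results that match the intended sense
    All three names are hypothetical, chosen for this illustration.
    """
    # Precision or recall is undefined when a denominator is zero;
    # this sketch returns 0.0 in that case by convention.
    if retrieved == 0 or relevant == 0:
        return 0.0
    precision = relevant_retrieved / retrieved
    recall = relevant_retrieved / relevant
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical query: 10 results returned, 6 match the intended sense,
# and 12 matching results exist in total.
example = f1_score(relevant_retrieved=6, retrieved=10, relevant=12)
```

With these made-up counts, precision is 0.6 and recall is 0.5, so the harmonic mean falls between the two and closer to the smaller value, which is why F1 penalizes an engine that scores well on only one of the two measures.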