Web query classification

A Web query topic classification/categorization is a problem in information science. The task is to assign a Web search query to one or more predefined categories, based on its topics. The importance of query classification is underscored by many services provided by Web search. A direct application is to provide better search result pages for users with interests of different categories. For example, the users issuing a Web query “apple” might expect to see Web pages related to the fruit apple, or they may prefer to see products or news related to the computer company. Online advertisement services can rely on the query classification results to promote different products more accurately. Search result pages can be grouped according to the categories predicted by a query classification algorithm. However, the computation of query classification is non-trivial. Different from the document classification tasks, queries submitted by Web search users are usually short and ambiguous; also the meanings of the queries are evolving over time. Therefore, query topic classification is much more difficult than traditional document classification tasks. A Web query topic classification/categorization is a problem in information science. The task is to assign a Web search query to one or more predefined categories, based on its topics. The importance of query classification is underscored by many services provided by Web search. A direct application is to provide better search result pages for users with interests of different categories. For example, the users issuing a Web query “apple” might expect to see Web pages related to the fruit apple, or they may prefer to see products or news related to the computer company. Online advertisement services can rely on the query classification results to promote different products more accurately. Search result pages can be grouped according to the categories predicted by a query classification algorithm. However, the computation of query classification is non-trivial. Different from the document classification tasks, queries submitted by Web search users are usually short and ambiguous; also the meanings of the queries are evolving over time. Therefore, query topic classification is much more difficult than traditional document classification tasks. KDDCUP 2005 competition highlighted the interests in query classification. The objective of this competition is to classify 800,000 real user queries into 67 target categories. Each query can belong to more than one target category. As an example of a QC task, given the query “apple”, it should be classified into ranked categories: “Computers Hardware; Living Food & Cooking”. Web query topic classification is to automatically assign a query to some predefined categories. Different from the traditional document classification tasks, there are several major difficulties which hinder the progress of Web query understanding: Many queries are short and query terms are noisy. As an example, in the KDDCUP 2005 dataset, queries containing 3 words are most frequent (22%). Furthermore, 79% queries have no more than 4 words. A user query often has multiple meanings. For example, 'apple' can mean a kind of fruit or a computer company. 'Java' can mean a programming language or an island in Indonesia. In the KDDCUP 2005 dataset, most of the queries contain more than one meaning. Therefore, only using the keywords of the query to set up a vector space model for classification is not appropriate. The meanings of queries may also evolve over time. Therefore, the old labeled training queries may be out-of-data and useless soon. How to make the classifier adaptive over time becomes a big issue. For example, the word 'Barcelona' has a new meaning of the new micro-processor of AMD, while it refers to a city or football club before 2007. The distribution of the meanings of this term is therefore a function of time on the Web.

Parent Topic

Child Topic

No Parent Topic