Abstract
In this paper, we describe our ensemble-search-based approach, Q2C@UST (http://webprojectl.cs.ust.hk/q2c/), to the query classification task of KDDCUP 2005. The problem poses two key difficulties. First, the meanings of the queries and the semantics of the predefined categories are hard to determine. Second, no training data are available for this classification problem. We apply a two-phase framework to tackle these difficulties. Phase I corresponds to the training phase in machine learning, and phase II corresponds to the testing phase. In phase I, two kinds of base classifiers are developed: one synonym-based and the other statistics-based. Phase II consists of two stages. In the first stage, each query is enriched by using search engines to collect its related Web pages together with their category information. In the second stage, the enriched queries are classified by the base classifiers trained in phase I. Based on the results of the base classifiers, we propose two ensemble classifiers that follow two different combination strategies. Experimental results on the validation dataset support our conjectures about the performance of the Q2C@UST system, and the evaluation results released by the KDDCUP 2005 organizers confirm the effectiveness of the proposed approaches. The higher F1 value of our two solutions is 9.6% above the best of all other participants' solutions, and the average F1 value of our two submitted solutions is 94.4% higher than the average F1 value of all other submitted solutions.
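The abstract does not spell out how the base classifiers score an enriched query or how their outputs are combined. As a rough illustration only, the Python sketch below assumes two things. First, a synonym-based scorer matches words in the directory-category paths of the retrieved Web pages against a hand-built synonym table. Second, the ensemble is a weighted linear combination of per-category scores followed by top-k selection. The function names, synonym table, category labels, weights, scores, and the default k = 5 are all hypothetical. This is not claimed to be the Q2C@UST implementation or either of its two ensemble strategies.

```python
from collections import Counter
from typing import Dict, List, Mapping


def synonym_scores(page_categories: List[str],
                   synonyms: Mapping[str, List[str]]) -> Dict[str, float]:
    """Hypothetical synonym-based scorer.

    Each retrieved page contributes one vote to every target category whose
    keyword list shares a word with the page's directory path; votes are then
    normalized to sum to 1.
    """
    votes: Counter = Counter()
    for path in page_categories:
        words = {w.lower() for w in path.replace("/", " ").split()}
        for target, keywords in synonyms.items():
            if words & {k.lower() for k in keywords}:
                votes[target] += 1
    total = sum(votes.values())
    return {t: v / total for t, v in votes.items()} if total else {}


def ensemble(score_lists: List[Dict[str, float]],
             weights: List[float], k: int = 5) -> List[str]:
    """Weighted linear combination of per-category scores; returns the top k."""
    combined: Dict[str, float] = {}
    for scores, w in zip(score_lists, weights):
        for cat, s in scores.items():
            combined[cat] = combined.get(cat, 0.0) + w * s
    # Sort by descending score; ties broken alphabetically for determinism.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cat for cat, _ in ranked[:k]]


if __name__ == "__main__":
    # Directory paths of pages retrieved for one query (made-up example data).
    pages = ["Computers/Software/Operating_Systems",
             "Shopping/Computers",
             "Computers/Hardware"]
    table = {"Computers": ["computers", "software"], "Shopping": ["shopping"]}
    syn = synonym_scores(pages, table)           # Computers 0.75, Shopping 0.25
    stat = {"Computers": 0.5, "Shopping": 0.1, "Sports": 0.4}  # stand-in scores
    print(ensemble([syn, stat], weights=[0.5, 0.5], k=2))  # ['Computers', 'Sports']
```

In this example, equal weights give combined scores of 0.625 for Computers, 0.2 for Sports, and 0.175 for Shopping. Sports therefore makes the top two even though the synonym scorer never matched it. A linear combination like this is only one possible fusion rule. Voting or validation-tuned weights are equally plausible choices.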
Index Terms
- Q2C@UST: our winning solution to query classification in KDDCUP 2005