Q²C@UST: our winning solution to query classification in KDDCUP 2005

Authors:
Dou Shen, Rong Pan, Jian-Tao Sun, Jeffrey Junfeng Pan, Kangheng Wu, Jie Yin, Qiang Yang

Hong Kong University of Science and Technology, Kowloon, Hong Kong, China

ACM SIGKDD Explorations Newsletter, Volume 7, Issue 2, December 2005, pp. 100–110. https://doi.org/10.1145/1117454.1117467

Published: 01 December 2005

Abstract

In this paper, we describe Q²C@UST (http://webprojectl.cs.ust.hk/q2c/), our ensemble-search based approach to the query classification task of KDDCUP 2005. The task poses two key difficulties: the meaning of the queries and the semantics of the predefined categories are hard to determine, and no training data are provided. We tackle these difficulties with a two-phase framework, in which phase I corresponds to the training phase of machine learning research and phase II to the testing phase. In phase I, we develop two kinds of base classifiers: one synonym-based and the other statistics-based. Phase II consists of two stages. In the first stage, each query is enriched by collecting its related Web pages, together with their category information, through search engines. In the second stage, the enriched queries are classified by the base classifiers trained in phase I. From the classification results of the base classifiers, we build two ensemble classifiers that follow two different strategies. The experimental results on the validation dataset help confirm our conjectures about the performance of the Q²C@UST system, and the evaluation results given by the KDDCUP 2005 organizer confirm the effectiveness of the proposed approaches. The best F1 value of our two solutions is 9.6% higher than the best of all other participants' solutions, and the average F1 value of our two submitted solutions is 94.4% higher than the average F1 value of all other submitted solutions.
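
The phase II flow described in the abstract (enrich a query, score it with the base classifiers, combine the scores) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the two ensemble strategies, so the weighted-sum rule below is an assumption, and every name here (`enrich`, `ensemble`, `classify`, the toy classifiers, the category labels, and the `top_k` parameter) is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# A base classifier maps enriched query text to per-category scores.
Scores = Dict[str, float]
BaseClassifier = Callable[[str], Scores]
SearchFn = Callable[[str], List[str]]


def enrich(query: str, search: SearchFn) -> str:
    """Stage 1: append text returned by a search function to the query."""
    return " ".join([query, *search(query)])


def ensemble(scores_list: Sequence[Scores], weights: Sequence[float]) -> Scores:
    """Combine base-classifier scores by weighted sum (assumed rule)."""
    combined: Dict[str, float] = defaultdict(float)
    for scores, weight in zip(scores_list, weights):
        for category, score in scores.items():
            combined[category] += weight * score
    return dict(combined)


def classify(
    query: str,
    search: SearchFn,
    classifiers: Sequence[BaseClassifier],
    weights: Sequence[float],
    top_k: int = 2,
) -> List[str]:
    """Stage 2: score the enriched query and return the top_k categories."""
    enriched = enrich(query, search)
    combined = ensemble([clf(enriched) for clf in classifiers], weights)
    # Sort by descending score, breaking ties by category name.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, _ in ranked[:top_k]]


# Toy stand-ins for a search engine and the two kinds of base classifier.
def toy_search(query: str) -> List[str]:
    return ["apple iphone review", "apple store hardware"]


def toy_synonym_clf(text: str) -> Scores:
    # Counts tokens that match hand-picked keywords per category.
    keywords = {
        "Computers/Hardware": {"iphone", "hardware"},
        "Living/Food": {"fruit", "recipe"},
    }
    tokens = text.lower().split()
    return {
        category: float(sum(token in words for token in tokens))
        for category, words in keywords.items()
    }


def toy_statistical_clf(text: str) -> Scores:
    # Placeholder for a trained statistical model; returns fixed scores.
    return {"Computers/Hardware": 0.7, "Living/Food": 0.3}


print(classify("apple", toy_search,
               [toy_synonym_clf, toy_statistical_clf], [0.5, 0.5], top_k=1))
```

Equal weights treat the two base classifiers as equally reliable; choosing the weights from per-classifier performance on validation data would be one plausible alternative, again as an assumption rather than something the abstract states.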

Index Terms

Q²C@UST: our winning solution to query classification in KDDCUP 2005
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees
2. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in

ACM SIGKDD Explorations Newsletter Volume 7, Issue 2
December 2005
152 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/1117454

Copyright © 2005 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
KDDCUP 2005
ensemble learning
query classification
synonym-based classifier