research-article

Concept-based Short Text Classification and Ranking

Authors:
Fang Wang, Beihang University, Beijing, China
Zhongyuan Wang, Microsoft Research Asia, Beijing, China
Zhoujun Li, Beihang University, Beijing, China
Ji-Rong Wen, Renmin University of China, Beijing, China

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, November 2014, Pages 1069–1078. https://doi.org/10.1145/2661829.2662067

Published: 3 November 2014

ABSTRACT

Most existing approaches to text classification represent texts as vectors of words, the so-called "Bag-of-Words" representation. This representation yields a very high-dimensional feature space and frequently suffers from surface mismatching. Short texts make these issues more serious because they are brief and sparse. In this paper, we propose using "Bag-of-Concepts" for short text representation, aiming to avoid surface mismatching and to handle synonymy and polysemy. Based on "Bag-of-Concepts," we propose a novel framework for lightweight short text classification applications. By leveraging a large taxonomy knowledgebase, the framework learns a concept model for each category and conceptualizes a short text into a set of relevant concepts. A concept-based similarity mechanism then classifies the given short text into the most similar category. One advantage of this mechanism is that it facilitates short text ranking after classification, which many applications need, such as query or ad recommendation. We demonstrate the framework through a real online application: Channel-based Query Recommendation. Experiments show that our framework can map queries to channels with high precision (average precision = 90.3%), which is critical for recommendation applications.
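
The pipeline outlined in the abstract (conceptualize a short text, learn a concept model per category, classify by concept similarity, then rank) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy taxonomy `TOY_TAXONOMY`, its weights, the function names, the additive aggregation of concept weights, and the use of cosine similarity are all hypothetical choices that the abstract does not specify.

```python
import math
from collections import Counter

# Hypothetical toy is-a taxonomy: term -> {concept: weight}.
# A real system would query a large taxonomy knowledgebase instead.
TOY_TAXONOMY = {
    "python": {"programming language": 0.7, "snake": 0.3},
    "java": {"programming language": 0.8, "island": 0.2},
    "cobra": {"snake": 1.0},
    "bali": {"island": 1.0},
}


def conceptualize(text, taxonomy=TOY_TAXONOMY):
    """Map a short text to a bag of weighted concepts (assumed: sum of term weights)."""
    bag = Counter()
    for term in text.lower().split():
        for concept, weight in taxonomy.get(term, {}).items():
            bag[concept] += weight
    return bag


def learn_concept_models(labeled_texts, taxonomy=TOY_TAXONOMY):
    """Build one concept model per category by accumulating concept bags."""
    models = {}
    for text, category in labeled_texts:
        models.setdefault(category, Counter()).update(conceptualize(text, taxonomy))
    return models


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors (an assumed metric)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(text, models, taxonomy=TOY_TAXONOMY):
    """Return (most similar category, similarity score) for a short text."""
    bag = conceptualize(text, taxonomy)
    scores = {category: cosine(bag, model) for category, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def rank_within_category(texts, category, models, taxonomy=TOY_TAXONOMY):
    """Keep texts classified into `category`, sorted by descending similarity."""
    scored = []
    for text in texts:
        label, score = classify(text, models, taxonomy)
        if label == category:
            scored.append((text, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


training = [("python java", "tech"), ("cobra", "animals"), ("bali", "travel")]
models = learn_concept_models(training)
ranked = rank_within_category(["python", "java python", "cobra"], "tech", models)
```

Because classification already produces a similarity score per text, the same score can order the texts assigned to a category, which is the sense in which ranking follows from classification in this sketch. A text with no known terms gets a score of zero for every category, so a production system would need an explicit fallback for that case.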

Index Terms

Computing methodologies → Machine learning → Learning paradigms → Supervised learning → Supervised learning by classification
Computing methodologies → Machine learning → Machine learning approaches → Classification and regression trees

Published in
CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
November 2014
2152 pages
ISBN: 9781450325981
DOI: 10.1145/2661829
General Chairs: Jianzhong Li (Harbin Inst. of Technology), X. Sean Wang (Fudan University)
Program Chairs: Minos Garofalakis (Technical University of Crete, Greece), Ian Soboroff (National Institute of Standards, USA), Torsten Suel (New York University, USA), Min Wang (Google Research, USA)
Copyright © 2014 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
msn channel
query recommendation
short text classification
taxonomy knowledge
Qualifiers
- research-article
Conference

Acceptance Rates
CIKM '14 Paper Acceptance Rate: 175 of 838 submissions, 21%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%