DOI: 10.1145/2661829.2662067
research-article

Concept-based Short Text Classification and Ranking

Published: 03 November 2014

ABSTRACT

Most existing approaches for text classification represent texts as vectors of words, namely "Bag-of-Words." This text representation results in a very high dimensionality of feature space and frequently suffers from surface mismatching. Short texts make these issues even more serious, due to their shortness and sparsity. In this paper, we propose using "Bag-of-Concepts" in short text representation, aiming to avoid the surface mismatching and handle the synonym and polysemy problem. Based on "Bag-of-Concepts," a novel framework is proposed for lightweight short text classification applications. By leveraging a large taxonomy knowledgebase, it learns a concept model for each category, and conceptualizes a short text to a set of relevant concepts. A concept-based similarity mechanism is presented to classify the given short text to the most similar category. One advantage of this mechanism is that it facilitates short text ranking after classification, which is needed in many applications, such as query or ad recommendation. We demonstrate the usage of our proposed framework through a real online application: Channel-based Query Recommendation. Experiments show that our framework can map queries to channels with a high degree of precision (avg. precision=90.3%), which is critical for recommendation applications.
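
The pipeline described in the abstract (conceptualize a short text, learn one concept model per category, classify by concept similarity, then rank) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy `TAXONOMY` and its weights, the category names, the helper names, and the use of cosine similarity as the similarity measure. The abstract does not specify the paper's actual knowledgebase scoring or similarity mechanism.

```python
import math
from collections import Counter

# Hypothetical toy is-a taxonomy: term -> {concept: weight}.
# A real system would query a large taxonomy knowledgebase instead.
TAXONOMY = {
    "python": {"programming language": 0.7, "snake": 0.3},
    "java": {"programming language": 0.8, "island": 0.2},
    "cobra": {"snake": 0.9, "car": 0.1},
    "tutorial": {"learning resource": 1.0},
    "zoo": {"attraction": 0.6, "animal venue": 0.4},
}


def conceptualize(text):
    """Map a short text to a bag of concepts (concept -> summed weight)."""
    bag = Counter()
    for term in text.lower().split():
        for concept, weight in TAXONOMY.get(term, {}).items():
            bag[concept] += weight
    return bag


def learn_concept_model(texts):
    """Aggregate the concept bags of a category's training texts."""
    model = Counter()
    for text in texts:
        model.update(conceptualize(text))  # Counter.update adds weights
    return model


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors (assumed measure)."""
    dot = sum(weight * b[concept] for concept, weight in a.items() if concept in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, models):
    """Return (category, score) for the most similar category concept model."""
    bag = conceptualize(text)
    scores = {category: cosine(bag, model) for category, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def rank(texts, category, models):
    """Order texts by similarity to one category, most similar first."""
    scored = [(t, cosine(conceptualize(t), models[category])) for t in texts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical training data for two illustrative categories.
models = {
    "tech": learn_concept_model(["python tutorial", "java tutorial"]),
    "animals": learn_concept_model(["cobra zoo", "python zoo"]),
}

if __name__ == "__main__":
    print(classify("java", models))
    print(rank(["zoo", "java tutorial", "cobra"], "tech", models))
```

Because classification and ranking share the same similarity score, ordering the texts assigned to a category needs no second model, which matches the abstract's point that the mechanism facilitates ranking after classification.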


Published in

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
November 2014
2152 pages
ISBN: 9781450325981
DOI: 10.1145/2661829

Copyright © 2014 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CIKM '14 paper acceptance rate: 175 of 838 submissions (21%). Overall acceptance rate: 1,861 of 8,427 submissions (22%).
