DOI: 10.1145/1135777.1135834
Article

A web-based kernel function for measuring the similarity of short text snippets

Published: 23 May 2006

ABSTRACT

Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if any, terms in common between two short text snippets. We address this problem by introducing a novel method for measuring the similarity between short text snippets (even those without any overlapping terms) by leveraging web search results to provide greater context for the short texts. In this paper, we define such a similarity kernel function, mathematically analyze some of its properties, and provide examples of its efficacy. We also show the use of this kernel function in a large-scale system for suggesting related queries to search engine users.
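The idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the kernel's exact form, so everything here is an assumption made for illustration: the hypothetical `TOY_RESULTS` table stands in for web search results, plain term frequencies stand in for whatever term weighting the paper uses, and the names `term_vector`, `expand`, and `context_similarity` are invented. Each snippet is represented by the normalized centroid of the vectors of its retrieved texts, and similarity is the inner product of two such representations.

```python
import math
from collections import Counter

# Hypothetical stand-in for web search: snippet -> texts "retrieved" for it.
# A real system would query a search engine; this table is illustrative only.
TOY_RESULTS = {
    "ai": ["artificial intelligence research machine learning",
           "artificial intelligence systems and learning"],
    "artificial intelligence": ["artificial intelligence research machine learning",
                                "intelligence in machines artificial agents"],
    "pizza": ["pizza dough recipe cheese tomato",
              "best pizza restaurant cheese"],
}

def term_vector(text):
    """Return an L2-normalized term-frequency vector for a text (assumed weighting)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}

def dot(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def expand(snippet, search=TOY_RESULTS.get):
    """Represent a snippet by the normalized centroid of its result vectors."""
    docs = search(snippet) or []
    centroid = Counter()
    for doc in docs:
        for t, w in term_vector(doc).items():
            centroid[t] += w
    norm = math.sqrt(sum(w * w for w in centroid.values()))
    return {t: w / norm for t, w in centroid.items()} if norm else {}

def context_similarity(x, y, search=TOY_RESULTS.get):
    """Similarity of two snippets via their search-expanded representations."""
    return dot(expand(x, search), expand(y, search))

# Plain cosine on the raw snippets: no shared terms, so the score is zero.
raw_cosine = dot(term_vector("ai"), term_vector("artificial intelligence"))
# Context-based score: positive, because the retrieved texts share terms.
expanded = context_similarity("ai", "artificial intelligence")
```

In this toy setting the raw cosine between "ai" and "artificial intelligence" is zero because they share no terms, while the context-based score is positive; an unrelated snippet such as "pizza" still scores zero against "ai".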


Published in

WWW '06: Proceedings of the 15th international conference on World Wide Web
May 2006
1102 pages
ISBN: 1595933239
DOI: 10.1145/1135777

Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
