DOI: 10.3115/1119282.1119287

A language model approach to keyphrase extraction

Published: 12 July 2003

ABSTRACT

We present a new approach to extracting keyphrases based on statistical language models. Our approach is to use pointwise KL-divergence between multiple language models for scoring both phraseness and informativeness, which can be unified into a single score to rank extracted phrases.
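
The scoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which language models are compared, so the pairing used here is an assumption: phraseness compares a foreground bigram model against a foreground unigram (independence) model, informativeness compares the foreground bigram model against a background bigram model, and the two are unified by simple addition. The add-one smoothing and the names `pointwise_kl`, `ngram_counts`, `smoothed_prob`, and `score_phrases` are likewise illustrative choices, not taken from the paper.

```python
import math
from collections import Counter


def pointwise_kl(p, q):
    """Contribution of a single event to KL(p || q): p * log(p / q)."""
    return p * math.log(p / q)


def ngram_counts(tokens, n):
    """Count all contiguous n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_prob(gram, counts, total, vocab_size):
    """Add-one smoothed n-gram probability (an assumed, illustrative smoother)."""
    return (counts[gram] + 1) / (total + vocab_size ** len(gram))


def score_phrases(fg_tokens, bg_tokens):
    """Score each foreground bigram by phraseness + informativeness."""
    vocab_size = len(set(fg_tokens) | set(bg_tokens))
    fg_uni = ngram_counts(fg_tokens, 1)
    fg_bi = ngram_counts(fg_tokens, 2)
    bg_bi = ngram_counts(bg_tokens, 2)
    fg_uni_total = sum(fg_uni.values())
    fg_bi_total = sum(fg_bi.values())
    bg_bi_total = sum(bg_bi.values())

    scores = {}
    for gram in fg_bi:
        p_fg = smoothed_prob(gram, fg_bi, fg_bi_total, vocab_size)
        # Independence assumption: product of the two unigram probabilities.
        p_indep = (smoothed_prob(gram[:1], fg_uni, fg_uni_total, vocab_size)
                   * smoothed_prob(gram[1:], fg_uni, fg_uni_total, vocab_size))
        p_bg = smoothed_prob(gram, bg_bi, bg_bi_total, vocab_size)

        phraseness = pointwise_kl(p_fg, p_indep)
        informativeness = pointwise_kl(p_fg, p_bg)
        # Assumed unification: add the two scores.
        scores[gram] = phraseness + informativeness
    return scores


if __name__ == "__main__":
    fg = "language model language model language model the cat".split()
    bg = "the cat sat on the mat the cat ran".split()
    ranked = sorted(score_phrases(fg, bg).items(), key=lambda kv: -kv[1])
    for gram, score in ranked:
        print(" ".join(gram), round(score, 4))
```

On a toy foreground and background like the ones in the usage example, a bigram that is frequent in the foreground but absent from the background ranks above one that is common in both. With realistic corpora the choice of smoothing matters considerably more than it does here.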


Published in

MWE '03: Proceedings of the ACL 2003 workshop on Multiword expressions: analysis, acquisition and treatment - Volume 18
July 2003
105 pages

Publisher: Association for Computational Linguistics, United States

Overall Acceptance Rate: 31 of 69 submissions, 45%
