ABSTRACT
We present a new approach to keyphrase extraction based on statistical language models. The approach uses pointwise KL-divergence between multiple language models to score both phraseness and informativeness; the two scores can then be unified into a single score for ranking extracted phrases.
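The abstract does not give the scoring formulas, so the following is only a minimal sketch of how such a score might be computed, under several assumptions that are not taken from the text above: pointwise KL-divergence is taken as p * log(p / q); phraseness compares a foreground bigram model against the product of foreground unigram probabilities; informativeness compares the foreground bigram model against a background bigram model; the unified score is their sum. The additive smoothing constant `alpha`, the helper names, and the toy corpora are all hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prob(counts, gram, n_types, alpha=0.01):
    """Additively smoothed relative frequency (alpha is an assumed constant)."""
    total = sum(counts.values())
    return (counts[gram] + alpha) / (total + alpha * n_types)


def pointwise_kl(p, q):
    """Contribution of a single event to KL(p || q)."""
    return p * math.log(p / q)


def score_bigrams(fg_tokens, bg_tokens):
    """Return {bigram: (phraseness, informativeness, combined)} for foreground bigrams."""
    fg_uni = ngram_counts(fg_tokens, 1)
    fg_bi = ngram_counts(fg_tokens, 2)
    bg_bi = ngram_counts(bg_tokens, 2)
    n_words = len(set(fg_tokens) | set(bg_tokens))
    n_bigrams = n_words * n_words  # number of possible bigram types

    scores = {}
    for gram in fg_bi:
        p_fg = prob(fg_bi, gram, n_bigrams)
        # Probability of the two words under an independence (unigram) assumption.
        p_indep = (prob(fg_uni, (gram[0],), n_words)
                   * prob(fg_uni, (gram[1],), n_words))
        p_bg = prob(bg_bi, gram, n_bigrams)
        phraseness = pointwise_kl(p_fg, p_indep)
        informativeness = pointwise_kl(p_fg, p_bg)
        scores[gram] = (phraseness, informativeness, phraseness + informativeness)
    return scores


if __name__ == "__main__":
    # Toy corpora, invented for illustration only.
    foreground = "language model language model language model of the text".split()
    background = "the cat sat on the mat of the house the dog".split()
    ranked = sorted(score_bigrams(foreground, background).items(),
                    key=lambda item: item[1][2], reverse=True)
    for gram, (phr, inf, total) in ranked:
        print(" ".join(gram), round(phr, 3), round(inf, 3), round(total, 3))
```

In this sketch a bigram ranks highly only when it is both more cohesive than its parts would suggest and more frequent in the foreground than in the background, which is the intuition behind combining the two scores. A real system would likely use a stronger smoothing method and longer n-grams.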