DOI: 10.1145/1806338.1806390
research-article

Key phrase extraction: a hybrid assignment and extraction approach

Published: 14 December 2009

ABSTRACT

Automatic key phrase extraction is fundamental to many recent digital library applications and semantic information retrieval techniques, and it is a difficult but essential problem in Vietnamese natural language processing (NLP). In this work, we propose a novel method for extracting key phrases from Vietnamese text that combines the assignment and extraction approaches. We also explore the NLP techniques that we propose for analyzing Vietnamese texts, focusing on the advanced candidate phrase recognition phase and on part-of-speech (POS) tagging. We then propose a method that exploits specific characteristics of the Vietnamese language and uses the Vietnamese Wikipedia as an ontology for key phrase ambiguity resolution. Finally, we report the results of several experiments that examine the impact of the strategies chosen for Vietnamese key phrase extraction.


Published in

iiWAS '09: Proceedings of the 11th International Conference on Information Integration and Web-based Applications & Services
December 2009
763 pages
ISBN: 9781605586601
DOI: 10.1145/1806338

Copyright © 2009 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States
