DOI: 10.3115/1072228.1072370

Automatic glossary extraction: beyond terminology identification

Published: 24 August 2002

ABSTRACT

This paper describes a method for automatically extracting domain-specific glossaries from large document collections. We show that, compared with current text analysis methods for extracting technical terminology from text, our extracted glossaries more successfully support applications requiring knowledge of domain concepts. After presenting our methods, we illustrate the output of GlossEx, our glossary extraction tool, and present an informal evaluation of its performance.


  • Published in

    COLING '02: Proceedings of the 19th international conference on Computational linguistics - Volume 1
    August 2002
    1184 pages

    Publisher

    Association for Computational Linguistics

    United States

    Acceptance Rates

    Overall Acceptance Rate: 1,537 of 1,537 submissions, 100%
