ABSTRACT
This paper describes a method for automatically extracting domain-specific glossaries from large document collections. We show that, compared with current text analysis methods for extracting technical terminology from text, our extracted glossaries more successfully support applications requiring knowledge of domain concepts. After presenting our methods, we illustrate the output of GlossEx, our glossary extraction tool, and present an informal evaluation of its performance.