
Automatic glossary extraction: beyond terminology identification

Authors: Youngja Park, Roy J Byrd, Branimir K Boguraev
IBM Thomas J. Watson Research Center, Yorktown Heights, NY

COLING '02: Proceedings of the 19th international conference on Computational linguistics - Volume 1, August 2002, pages 1–7. https://doi.org/10.3115/1072228.1072370

Published: 24 August 2002


ABSTRACT

This paper describes a method for automatically extracting domain-specific glossaries from large document collections. We show that, compared with current text analysis methods for extracting technical terminology from text, our extracted glossaries more successfully support applications requiring knowledge of domain concepts. After presenting our methods, we illustrate the output of GlossEx, our glossary extraction tool, and present an informal evaluation of its performance.
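The abstract does not say how GlossEx ranks candidate glossary items. Purely as an illustration of what "domain-specific" can mean operationally, the Python sketch below scores a candidate term by comparing each of its words' relative frequency in a domain collection with its relative frequency in a general reference corpus (an average log-ratio with add-one smoothing). This scoring is an assumption chosen for illustration, not the GlossEx method; the function name `domain_specificity` and the toy counts are hypothetical.

```python
import math
from collections import Counter


def domain_specificity(term, domain_counts, general_counts, smoothing=1.0):
    """Hypothetical score: average per-word log ratio of a word's relative
    frequency in the domain collection to its relative frequency in a
    general reference corpus. Illustrative only; not the GlossEx formula.
    Expects Counter objects so that unseen words count as 0."""
    words = term.split()
    # Shared vocabulary so both smoothed distributions cover the same words.
    vocab = set(domain_counts) | set(general_counts) | set(words)
    v = len(vocab)
    domain_total = sum(domain_counts.values())
    general_total = sum(general_counts.values())
    score = 0.0
    for w in words:
        # Add-`smoothing` estimates keep unseen words from causing log(0).
        p_dom = (domain_counts[w] + smoothing) / (domain_total + smoothing * v)
        p_gen = (general_counts[w] + smoothing) / (general_total + smoothing * v)
        score += math.log(p_dom / p_gen)
    return score / len(words)


# Toy word counts (hypothetical) for a domain collection and a general corpus.
domain = Counter({"disk": 50, "drive": 40, "controller": 30, "the": 300})
general = Counter({"disk": 2, "drive": 10, "controller": 1, "the": 5000})

candidates = ["disk drive", "controller", "the"]
ranked = sorted(candidates,
                key=lambda t: domain_specificity(t, domain, general),
                reverse=True)
print(ranked)  # ['controller', 'disk drive', 'the']
```

With these toy counts, a general word such as "the" scores below zero, while words concentrated in the domain collection score well above it. A real glossary extractor would also need candidate identification and, for multiword items, some measure of how strongly the words belong together; the sketch covers neither.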

References

R. Baeza-Yates and B. Ribeiro-Neto. 1999. Modern Information Retrieval. Addison Wesley.
B. Boguraev. 2000. Towards Finite-State Analysis of Lexical Cohesion. In Proceedings of the 3rd International Conference on Finite-State Methods for NLP, INTEX-3, Liege, Belgium.
B. Boguraev and M. Neff. 2000. Lexical Cohesion, Discourse Segmentation and Document Summarization. In Proceedings of RIAO-2000.
K. Church and P. Hanks. 1990. Word Association Norms, Mutual Information and Lexicography. Computational Linguistics 16(1), pp. 22--29.
J. Cooper and R. Byrd. 1997. Lexical Navigation - Visually Prompted Query Expansion and Refinement. In Proceedings of DIGLIB'97.
G. Dias, S. Guillore, J. Bassano, and J. Lopes. 2000. Combining Linguistics with Statistics for Multiword Term Extraction: A Fruitful Association? In Proceedings of RIAO-2000.
F. Damerau. 1993. Generating and evaluating domain-oriented multi-word terms from texts. Information Processing & Management 29:433--447.
L. Dice. 1945. Measures of the amount of ecologic association between species. Ecology 26(3), pp. 297--302.
T. Dunning. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19:61--74.
The EAGLES Lexicon Interest Group. 1996. Preliminary Recommendations on Semantic Encoding: Interim Report. http://www.ilc.pi.cnr.it/EAGLES96/rep2/.
J. Hobbs, D. Appelt, J. Bear, D. Israel, M. Kameyama, M. Stickel, and M. Tyson. 1997. FASTUS: A Cascaded Finite-State Transducer for Extracting Information from Natural-Language Text. In E. Roche and Y. Schabes, eds., Finite-State Language Processing, pp. 383--406. The MIT Press, Cambridge, MA.
IBM. 2001. IBM Dictionary and Linguistic Tools. http://booksrvl.raleigh.ibm.com/lingtool.
IBM T. J. Watson Research. 2001. The Talent (Text Analysis and Language Engineering) project. http://www.research.ibm.com/talent/.
C. Jacquemin. 1995. A Symbolic and Surgical Acquisition of Terms through Variation. In Proceedings of the Workshop on New Approaches to Learning for NLP at the 14th IJCAI (IJCAI'95).
J. Justeson and S. Katz. 1995. Technical terminology: some linguistic properties and an algorithm for identification in text. Natural Language Engineering 1(1), pp. 9--27.
L. Karttunen, J. Chanod, G. Grefenstette, and A. Schiller. 1996. Regular Expressions for Language Engineering. Natural Language Engineering 2(4), pp. 305--328.
A. Kinyon. 2001. A Language-Independent Shallow-Parser Compiler. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France.
A. Kornai. 1999. Extended Finite State Models of Language. Cambridge University Press, Cambridge, UK.
M. Lapata, S. McDonald, and F. Keller. 1999. Determinants of Adjective-Noun Plausibility. In Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics, pp. 30--36.
R. Mack, Y. Ravin, and R. Byrd. 2001. Knowledge portals and the emerging digital knowledge workplace. IBM Systems Journal 40(4).
D. Maynard and S. Ananiadou. 1999. Term Extraction using a Similarity-based Approach. John Benjamins.
M. Marcus, B. Santorini, and M. Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19(2), pp. 313--330.
Y. Park and R. Byrd. 2001. Hybrid Text Mining for Matching Abbreviations and their Definitions. In Proceedings of Empirical Methods in Natural Language Processing, pp. 126--133.
Y. Ravin, N. Wacholder, and M. Choi. 1997. Disambiguation of proper names in text. 17th Annual ACM-SIGIR Conference.
T. Rindflesch, L. Tanabe, J. Weinstein, and L. Hunter. 2000. EDGAR: Extraction of Drugs, Genes and Relations from the Biomedical Literature. In Proceedings of the Pacific Symposium on Biocomputing.
P. Schone and D. Jurafsky. 2001. Is Knowledge-Free Induction of Multiword Unit Dictionary Headwords a Solved Problem? In Proceedings of Empirical Methods in Natural Language Processing, pp. 100--108.


Published in: COLING '02: Proceedings of the 19th international conference on Computational linguistics - Volume 1, August 2002, 1184 pages
Publisher: Association for Computational Linguistics, United States
Qualifiers: Article, Conference
