Collocation map for overcoming data sparseness

Authors:
Moonjoo Kim, Korea Advanced Institute of Science and Technology, Taejon, Korea
Young S. Han, Korea Advanced Institute of Science and Technology, Taejon, Korea
Key-Sun Choi, Korea Advanced Institute of Science and Technology, Taejon, Korea
EACL '95: Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics, March 1995, Pages 53–59
https://doi.org/10.3115/976973.976982

Published: 27 March 1995

ABSTRACT

Statistical language models are useful because they supply probabilistic information for decisions made under uncertainty. The most common statistic is the n-gram, which measures word co-occurrences in texts. The method suffers from the data shortage problem, however. In this paper, we suggest that Bayesian networks be used to approximate, with graceful degradation, the statistics of pairs that occur too rarely in the sample texts and of those that do not occur at all. A Collocation map is a sigmoid belief network that can be constructed from bigrams. We compared the conditional probabilities and mutual information computed from bigrams with those computed from the Collocation map. The results show that, for infrequent pairs, the variance of the values from the Collocation map is 48% smaller than that of the frequency measure. The predictive power of the Collocation map for arbitrary associations not observed in the sample texts is also demonstrated.
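
To make the data shortage problem concrete, the following minimal sketch computes the two bigram-based quantities the abstract mentions directly from raw counts. It is not the authors' implementation and does not build a Collocation map. The function names, the toy corpus, and the reading of "mutual information" as pointwise mutual information are all assumptions made for illustration.

```python
import math
from collections import Counter

# Illustrative sketch only: relative-frequency bigram estimates.
# Names and the toy corpus are hypothetical, not taken from the paper.

def bigram_stats(tokens):
    """Count unigrams and adjacent-word bigrams in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def cond_prob(w1, w2, unigrams, bigrams):
    """Relative-frequency estimate of P(w2 | w1); 0.0 for unseen pairs."""
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]

def pmi(w1, w2, unigrams, bigrams):
    """Pointwise mutual information in bits; None when the pair is unseen."""
    joint = bigrams[(w1, w2)]
    if joint == 0:
        return None  # log of zero is undefined: the sparseness problem
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_joint = joint / n_bi
    p1 = unigrams[w1] / n_uni
    p2 = unigrams[w2] / n_uni
    return math.log2(p_joint / (p1 * p2))

tokens = "strong tea and strong coffee and weak tea".split()
unigrams, bigrams = bigram_stats(tokens)

print(cond_prob("strong", "tea", unigrams, bigrams))   # observed pair
print(cond_prob("weak", "coffee", unigrams, bigrams))  # unseen pair
print(pmi("weak", "coffee", unigrams, bigrams))        # unseen pair
```

The pair ("weak", "coffee") never occurs in the toy corpus, so the count-based estimate assigns it probability zero and its mutual information is undefined, even though the association is plausible. This is the gap that the abstract proposes to fill with a belief network.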


Published in: EACL '95 proceedings, March 1995, 322 pages
Conference Chairs: Steven P. Abney, Erhard W. Hinrichs
Publisher: Morgan Kaufmann Publishers Inc., San Francisco, CA, United States