DOI: 10.3115/976973.976982

Collocation map for overcoming data sparseness

Published: 27 March 1995

ABSTRACT

Statistical language models are useful because they provide probabilistic information for decision making under uncertainty. The most common statistics are n-grams, which measure word co-occurrences in texts. The method, however, suffers from the data sparseness problem. In this paper, we suggest that Bayesian networks be used to approximate, with graceful degradation, the statistics of co-occurrences that are too infrequent in the sample texts or absent from them altogether. A Collocation map is a sigmoid belief network that can be constructed from bigrams. We compared the conditional probabilities and mutual information computed directly from bigrams with those computed from the Collocation map. The results show that, for infrequent pairs, the variance of the values from the Collocation map is 48% smaller than that of the frequency measure. The predictive power of the Collocation map for arbitrary associations not observed in the sample texts is also demonstrated.
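The frequency-based quantities that the abstract compares against the Collocation map can be illustrated with a small sketch. It computes the maximum-likelihood bigram conditional probability P(w2 | w1) = C(w1 w2) / C(w1) and, assuming the pointwise form of mutual information, log2 P(w1 w2) / (P(w1) P(w2)), from a toy corpus; unseen pairs come out as zero probability or undefined mutual information, which is the sparseness problem. It also shows the conditional probability of one node in a generic sigmoid belief network, P(x_i = 1 | parents) = sigmoid(b_i + sum_j w_ij x_j). The corpus, the function names (`bigram_stats`, `cond_prob`, `pmi`, `sigmoid_node_prob`), and the weights are hypothetical; the sketch does not reproduce how the paper builds the Collocation map from bigrams or how it performs inference.

```python
import math
from collections import Counter


def bigram_stats(tokens):
    """Return unigram and bigram counts for a token sequence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def cond_prob(w1, w2, unigrams, bigrams):
    """Maximum-likelihood estimate of P(w2 | w1) = C(w1 w2) / C(w1).

    Returns None if w1 was never observed (the estimate is undefined).
    An unseen pair with a seen w1 gets probability 0.0.
    """
    if unigrams[w1] == 0:
        return None
    return bigrams[(w1, w2)] / unigrams[w1]


def pmi(w1, w2, unigrams, bigrams):
    """Pointwise mutual information log2(P(w1 w2) / (P(w1) P(w2))).

    Returns None when the pair (or either word) is unseen, since the
    log of a zero probability is undefined.
    """
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    c12 = bigrams[(w1, w2)]
    if c12 == 0 or unigrams[w1] == 0 or unigrams[w2] == 0:
        return None
    p12 = c12 / n_bi
    p1 = unigrams[w1] / n_uni
    p2 = unigrams[w2] / n_uni
    return math.log2(p12 / (p1 * p2))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_node_prob(parent_states, weights, bias=0.0):
    """P(x_i = 1 | parents) in a generic sigmoid belief network:
    sigmoid(bias + sum_j w_ij * x_j), with binary parent states x_j in {0, 1}."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, parent_states)))


if __name__ == "__main__":
    tokens = "the data is sparse and the data is noisy".split()
    uni, bi = bigram_stats(tokens)
    print(cond_prob("data", "is", uni, bi))      # 1.0
    print(cond_prob("data", "sparse", uni, bi))  # 0.0 (pair never observed)
    print(pmi("data", "sparse", uni, bi))        # None (undefined for unseen pair)
    print(sigmoid_node_prob([1, 0], [2.0, -1.0], bias=-1.0))  # sigmoid(1), about 0.731
```

The pair ("data", "sparse") is plausible but never observed, so the frequency measure assigns it nothing useful; this is the kind of gap a model such as the Collocation map is meant to fill, and the sketch makes no attempt to reproduce that smoothing.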

Published in

EACL '95: Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
March 1995
322 pages

Publisher

Morgan Kaufmann Publishers Inc.
San Francisco, CA, United States

Qualifiers

Article

Acceptance Rates

Overall acceptance rate: 100 of 360 submissions (28%)
