DOI: 10.3115/1220355.1220477

Named entity discovery using comparable news articles

Published: 23 August 2004

ABSTRACT

In this paper we describe a method for discovering Named Entities from the distribution of words in news articles. Named Entity recognition is an important task for today's natural language applications, but it still suffers from data sparseness. We exploit the observation that a Named Entity is likely to appear synchronously in several news articles, whereas a common noun is less likely to do so. Using this characteristic, we obtained rare Named Entities with 90% accuracy simply by comparing the time-series distributions of a word in two newspapers. Although recall is not yet sufficient, we believe this method can be used to strengthen the lexical knowledge of a Named Entity tagger.
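The core idea, that a Named Entity tends to rise and fall on the same days in two newspapers, can be illustrated with a small sketch. The abstract does not say how two time series are compared, so this example assumes Pearson correlation between per-day word counts with a hypothetical cutoff of 0.8. The function names (`daily_frequencies`, `pearson`, `is_candidate_ne`) and the input format (a list of `(day, tokens)` pairs per newspaper) are likewise illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Assumed input format: one newspaper = list of (day_index, tokens) pairs.
Article = Tuple[int, List[str]]


def daily_frequencies(articles: Sequence[Article], word: str, days: Sequence[int]) -> List[int]:
    """Count how often `word` occurs on each day, in the order given by `days`."""
    counts: Counter = Counter()
    for day, tokens in articles:
        counts[day] += tokens.count(word)
    # Days with no matching tokens get a count of 0.
    return [counts[d] for d in days]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is empty or constant."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    if n == 0:
        return 0.0
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def is_candidate_ne(word: str, paper_a: Sequence[Article], paper_b: Sequence[Article],
                    days: Sequence[int], threshold: float = 0.8) -> bool:
    """Flag `word` if its daily counts in two papers rise and fall together.

    Pearson correlation and the 0.8 threshold are assumptions for illustration.
    """
    series_a = daily_frequencies(paper_a, word, days)
    series_b = daily_frequencies(paper_b, word, days)
    return pearson(series_a, series_b) >= threshold


if __name__ == "__main__":
    days = [0, 1, 2, 3]
    paper_a = [(0, ["the", "minister", "Tanaka", "said"]), (1, ["the", "market"]),
               (2, ["Tanaka", "resigned", "the"]), (3, ["the", "weather"])]
    paper_b = [(0, ["Tanaka", "spoke", "the"]), (1, ["the", "rain"]),
               (2, ["the", "Tanaka", "scandal"]), (3, ["the", "economy", "the"])]
    for w in ("Tanaka", "the"):
        print(w, is_candidate_ne(w, paper_a, paper_b, days))
    # prints "Tanaka True" then "the False"
```

In this toy data the name spikes on the same days in both papers and scores 1.0, while the function word "the" has flat counts in one paper, scores 0.0, and is rejected. A practical system would also need tokenization, a frequency filter to focus on rare words, and a threshold tuned on held-out data.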

Published in

COLING '04: Proceedings of the 20th International Conference on Computational Linguistics
August 2004
1411 pages

Publisher

Association for Computational Linguistics, United States


Acceptance Rates

COLING '04 paper acceptance rate: 1,411 of 1,411 submissions (100%). Overall acceptance rate: 1,537 of 1,537 submissions (100%).
