DOI: 10.1145/1645953.1646218

poster

Combining labeled and unlabeled data with word-class distribution learning

Published: 02 November 2009

ABSTRACT

We describe a novel, simple, and highly scalable semi-supervised method called Word-Class Distribution Learning (WCDL), and apply it to the task of information extraction (IE) by utilizing unlabeled sentences to improve supervised classification methods. WCDL iteratively builds a class-label distribution for each word in the dictionary by averaging predicted labels over all occurrences in the unlabeled corpus, then re-trains a base classifier with these distributions added as word features. In contrast, traditional self-training or co-training methods self-label examples (rather than features), which can degrade performance due to an incestuous learning bias. WCDL exhibits robust behavior and has no difficult parameters to tune. We applied our method to German and English named entity recognition (NER) tasks. WCDL shows improvements over self-training, multi-task semi-supervision, or supervision alone, in particular yielding a state-of-the-art 75.72 F1 score on the German NER task.
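The WCDL loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the base classifier, feature set, or label scheme, so the simple gradient-descent logistic regression, the binary capitalization feature, and the toy vocabulary below are all illustrative assumptions.

```python
import numpy as np

def train_logreg(X, y, n_classes, epochs=300, lr=0.5):
    """Minimal multinomial logistic regression (stand-in base classifier)."""
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        Z = X @ W
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        W += lr * X.T @ (Y - P) / len(X)  # gradient ascent on log-likelihood
    return W

def predict_proba(W, X):
    Z = X @ W
    P = np.exp(Z - Z.max(axis=1, keepdims=True))
    return P / P.sum(axis=1, keepdims=True)

def wcdl(X_lab, y_lab, words_lab, X_unl, words_unl, vocab, n_classes, iters=3):
    """WCDL sketch: average predicted label distributions per word over the
    unlabeled corpus, append them as extra word features, and re-train."""
    D = np.full((len(vocab), n_classes), 1.0 / n_classes)  # start uniform
    word_idx = {w: i for i, w in enumerate(vocab)}
    idx_lab = np.array([word_idx[w] for w in words_lab])
    idx_unl = np.array([word_idx[w] for w in words_unl])
    W = None
    for _ in range(iters):
        # augment each token's features with its word's current distribution
        Xl = np.hstack([X_lab, D[idx_lab]])
        Xu = np.hstack([X_unl, D[idx_unl]])
        W = train_logreg(Xl, y_lab, n_classes)
        P = predict_proba(W, Xu)
        # average predicted distributions over all occurrences of each word
        for i in range(len(vocab)):
            mask = idx_unl == i
            if mask.any():
                D[i] = P[mask].mean(axis=0)
    return W, D

# Toy NER-like data (hypothetical): feature vector is [bias, capitalized],
# class 1 = entity, class 0 = other.
vocab = ["Paris", "London", "the", "a"]
X_lab = np.array([[1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0]])
y_lab = np.array([1, 0, 1, 0])
words_lab = ["Paris", "the", "London", "a"]
X_unl = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
words_unl = ["Paris", "Paris", "the", "the"]

W, D = wcdl(X_lab, y_lab, words_lab, X_unl, words_unl, vocab, n_classes=2)
```

Note that, unlike self-training, no unlabeled example is ever added to the training set with a hard label; only the per-word averaged distributions feed back into the features, which is the abstract's stated safeguard against the incestuous learning bias.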


Published in

CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
November 2009, 2162 pages
ISBN: 9781605585123
DOI: 10.1145/1645953
Copyright © 2009 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 1,861 of 8,427 submissions, 22%
