ABSTRACT
We describe a novel, simple, and highly scalable semi-supervised method called Word-Class Distribution Learning (WCDL), and apply it to the task of information extraction (IE) by utilizing unlabeled sentences to improve supervised classification methods. WCDL iteratively builds a class label distribution for each word in the dictionary by averaging predicted labels over all occurrences in the unlabeled corpus, and re-trains a base classifier with these distributions added as word features. In contrast, traditional self-training and co-training methods add self-labeled examples (rather than features), which can degrade performance due to an incestuous learning bias. WCDL exhibits robust behavior and has no difficult parameters to tune. We applied our method to German and English named entity recognition (NER) tasks. WCDL shows improvements over self-training, multi-task semi-supervision, or supervision alone, in particular yielding a state-of-the-art 75.72 F1 score on the German NER task.
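The WCDL loop described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the authors' implementation): the base classifier is stood in for by a toy per-word majority-label lookup, and all function names (`train_base`, `wcdl`) are invented for this sketch. The essential structure is what the abstract describes: in each iteration, predict labels over every word occurrence in the unlabeled corpus, average those predictions into per-word class distributions, and re-train the base classifier with the distributions available as features.

```python
from collections import Counter, defaultdict

def train_base(labeled, word_dists, classes):
    """Toy stand-in for the base classifier (hypothetical).

    Memorizes the majority label per word from the labeled data and
    falls back to the word-class distribution feature for unseen words.
    A real system would use a feature-based sequence classifier.
    """
    lookup = defaultdict(Counter)
    for word, label in labeled:
        lookup[word][label] += 1

    def predict(word):
        if word in lookup:
            return lookup[word].most_common(1)[0][0]
        if word in word_dists:
            # Use the distribution built from the unlabeled corpus.
            return max(word_dists[word], key=word_dists[word].get)
        return classes[0]  # default class, e.g. "O"
    return predict

def wcdl(labeled, unlabeled_sentences, classes, iterations=3):
    """Iteratively build word-class distributions and re-train."""
    word_dists = {}
    for _ in range(iterations):
        predict = train_base(labeled, word_dists, classes)
        # Average predicted labels over all occurrences in the
        # unlabeled corpus to form a distribution per word.
        counts = defaultdict(Counter)
        for sentence in unlabeled_sentences:
            for word in sentence:
                counts[word][predict(word)] += 1
        word_dists = {
            w: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for w, cnt in counts.items()
        }
    return train_base(labeled, word_dists, classes), word_dists
```

Note that, unlike self-training, no self-labeled *examples* are ever added to the training set; only the aggregated label distributions re-enter as features, which is what limits the feedback of the classifier's own errors.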