ABSTRACT
We present a simple and scalable algorithm for clustering tens of millions of phrases and use the resulting clusters as features in discriminative classifiers. To demonstrate the power and generality of this approach, we apply the method in two very different applications: named entity recognition and query classification. Our results show that phrase clusters offer significant improvements over word clusters. Our NER system achieves the best current result on the widely used CoNLL benchmark. Our query classifier is on par with the best system in KDDCUP 2005 without resorting to labor intensive knowledge engineering efforts.
- R. Ando and T. Zhang A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research, Vol 6:1817--1853, 2005. Google ScholarDigital Library
- B. H. Bloom. 1970, Space/time trade-offs in hash coding with allowable errors, Communications of the ACM 13 (7): 422--426 Google ScholarDigital Library
- A. Blum and T. Mitchell. 1998. Combining labeled and unlabeled data with co-training. Proceedings of the Eleventh Annual Conference on Computational Learning Theory pp. 92--100. Google ScholarDigital Library
- P. F. Brown, V. J. Della Pietra, P. V. de Souza, J. C. Lai, and R. L. Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467--479. Google ScholarDigital Library
- H. L. Chieu and H. T. Ng. Named entity recognition with a maximum entropy approach. In Proceedings CoNLL-2003, pages 160--163, 2003. Google ScholarDigital Library
- J. Dean and S. Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI-04), San Francisco, CA, USA Google ScholarDigital Library
- S Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. A. Harshman. 1990. Indexing by latent semantic analysis, Journal of the American Society for Information Science, 1990, 41(6), 391--407Google ScholarCross Ref
- R. Florian, A. Ittycheriah, H. Jing, and T. Zhang. Named entity recognition through classifier combination. In Proceedings CoNLL-2003, pages 168--171, 2003. Google ScholarDigital Library
- D. Klein, J. Smarr, H. Nguyen, and C. D. Manning. Named entity recognition with character-level models. In Proceedings CoNLL-2003, pages 188--191, 2003. Google ScholarDigital Library
- P. Koehn, F. J. Och, and D. Marcu. 2003. Statistical phrase-based translation. In Proceedings of HLT-NAACL 2003, pp. 127--133. Google ScholarDigital Library
- T. Koo, X. Carreras, and M. Collins. Simple Semi-supervised Dependency Parsing. Proceedings of ACL, 2008.Google Scholar
- J. Lafferty, A. McCallum, F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proc. 18th International Conf. on Machine Learning, Morgan Kaufmann, San Francisco, CA (2001) 282--289 Google ScholarDigital Library
- Y. Li, Z. Zheng, and H. K. Dai, KDD Cup-2005 Report: Facing a Great Challenge. SIGKDD Explorations, 7 (2), 2005, 91--99. Google ScholarDigital Library
- D. Lin, S. Zhao, and B. Van Durme, and M. Pasca. 2008. Mining Parenthetical Translations from the Web by Word Alignment. Proc. of ACL-08. Columbus, OH.Google Scholar
- J. Lin. Scalable Language Processing Algorithms for the Masses: A Case Study in Computing Word Cooccurrence Matrices with MapReduce. Proceedings of EMNLP 2008, pp. 419--428, Honolulu, Hawaii. Google ScholarDigital Library
- J. B. MacQueen (1967): Some Methods for classification and Analysis of Multivariate Observations, Proc. of 5-th Berkeley Symposium on Mathematical Statistics and Probability", Berkeley, University of California Press, 1:281--297Google Scholar
- S. Miller, J. Guinness, and A. Zamanian. 2004. Name Tagging with Word Clusters and Discriminative Training. In Proceedings of HLT-NAACL, pages 337--342.Google Scholar
- M. Sahami and T. D. Heilman. 2006. A web-based kernel function for measuring the similarity of short text snippets. Proceedings of the 15th international conference on World Wide Web, pp. 377--386. Google ScholarDigital Library
- D. Shen, R. Pan, J. T. Sun, J. J. Pan, K. Wu, J. Yin, Q. Yang. Q2C@UST: our winning solution to query classification in KDDCUP 2005. SIGKDD Explorations, 2005: 100--110. Google ScholarDigital Library
- J. Suzuki, and H. Isozaki. 2008. Semi-Supervised Sequential Labeling and Segmentation using Giga-word Scale Unlabeled Data. In Proc. of ACL/HLT-08. Columbus, Ohio. pp. 665--673.Google Scholar
- E. T. Tjong Kim Sang and F. De Meulder. 2003. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In Proc. of CoNLL-2003, pages 142--147. Google ScholarDigital Library
- Y. Wong and H. T. Ng, 2007. One Class per Named Entity: Exploiting Unlabeled Text for Named Entity Recognition. In Proc. of IJCAI-07, Hyderabad, India. Google ScholarDigital Library
- J. Uszkoreit and T. Brants. 2008. Distributed Word Clustering for Large Scale Class-Based Language Modeling in Machine Translation. Proceedings of ACL-08: HLT, pp. 755--762.Google Scholar
- V. Vapnik, 1999. The Nature of Statistical Learning Theory, 2nd edition. Springer Verlag. Google ScholarDigital Library
- D. Vogel, S. Bickel, P. Haider, R. Schimpfky, P. Siemen, S. Bridges, T. Scheffer. Classifying Search Engine Queries Using the Web as Background Knowledge. SIGKDD Explorations 7(2): 117--122. 2005. Google ScholarDigital Library
Index Terms
- Phrase clustering for discriminative learning
Recommendations
Discriminative clustering via extreme learning machine
Discriminative clustering is an unsupervised learning framework which introduces the discriminative learning rule of supervised classification into clustering. The underlying assumption is that a good partition (clustering) of the data should yield high ...
Incorporating Word Clustering into Complex Noun Phrase Identification
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big DataAbstractSince the professional technical literature include amounts of complex noun phrases, identifying those phrases has an important practical value for such tasks as machine translation. Through analysis of those phrases in Chinese-English bilingual ...
Key phrase extraction: a hybrid assignment and extraction approach
iiWAS '09: Proceedings of the 11th International Conference on Information Integration and Web-based Applications & ServicesAutomatic key phrase extraction is fundamental to the success of many recent digital library applications and semantic information retrieval techniques and a difficult and essential problem in Vietnamese natural language processing (NLP). In this work, ...
Comments