DOI: 10.5555/1690219.1690290
research-article

Phrase clustering for discriminative learning

Published: 02 August 2009

ABSTRACT

We present a simple and scalable algorithm for clustering tens of millions of phrases and use the resulting clusters as features in discriminative classifiers. To demonstrate the power and generality of this approach, we apply the method to two very different applications: named entity recognition (NER) and query classification. Our results show that phrase clusters offer significant improvements over word clusters. Our NER system achieves the best current result on the widely used CoNLL benchmark. Our query classifier is on par with the best system in KDDCUP 2005 without resorting to labor-intensive knowledge engineering efforts.

Published in

ACL '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
August 2009
595 pages
ISBN: 9781932432466
General Chair: Keh-Yih Su

Publisher

Association for Computational Linguistics, United States

Acceptance Rates

Overall acceptance rate: 85 of 443 submissions, 19%
