DOI: 10.3115/1075096.1075100

Fast methods for kernel-based text analysis

Published: 7 July 2003

ABSTRACT

Kernel-based learning (e.g., Support Vector Machines) has been successfully applied to many hard problems in Natural Language Processing (NLP). In NLP, although feature combinations are crucial to improving performance, they are heuristically selected. Kernel methods change this situation. The merit of kernel methods is that effective feature combinations are implicitly expanded without loss of generality and without increasing the computational costs. Kernel-based text analysis shows excellent performance in terms of accuracy; however, these methods are usually too slow to apply to large-scale text analysis. In this paper, we extend a Basket Mining algorithm to convert a kernel-based classifier into a simple and fast linear classifier. Experimental results on English BaseNP Chunking, Japanese Word Segmentation and Japanese Dependency Parsing show that our new classifiers are about 30 to 300 times faster than the standard kernel-based classifiers.
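
To illustrate the general idea of turning a kernel-based classifier into a linear one, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes binary (set-valued) features and a degree-2 polynomial kernel (1 + |s ∩ x|)^2, neither of which is stated in the abstract, and it performs an exact expansion with no mining or pruning step. The function names and the toy support vectors are hypothetical. The sketch relies on the identity (1 + n)^2 = 1 + 3n + 2·C(n, 2), which lets the kernel sum be rewritten as a weighted sum over single features and feature pairs.

```python
from collections import defaultdict
from itertools import combinations


def kernel_score(support_vectors, x, bias=0.0):
    """Kernel form: sum_j a_j * (1 + |s_j & x|)^2 + bias.

    support_vectors is a list of (a_j, s_j) pairs, where a_j is the signed
    weight (alpha_j * y_j) and s_j is a frozenset of active binary features.
    Cost grows with the number of support vectors.
    """
    return sum(a * (1 + len(s & x)) ** 2 for a, s in support_vectors) + bias


def expand_to_linear(support_vectors):
    """Precompute one weight per feature subset of size 0, 1, or 2."""
    weights = defaultdict(float)
    for a, s in support_vectors:
        weights[()] += a                    # constant term, coefficient 1
        for f in s:
            weights[(f,)] += 3 * a          # single features, coefficient 3
        for pair in combinations(sorted(s), 2):
            weights[pair] += 2 * a          # feature pairs, coefficient 2
    return dict(weights)


def linear_score(weights, x, bias=0.0):
    """Linear form: sum the weights of all subsets of x up to size 2.

    Cost depends only on the number of active features in x, not on the
    number of support vectors.
    """
    feats = sorted(x)
    score = weights.get((), 0.0) + bias
    for f in feats:
        score += weights.get((f,), 0.0)
    for pair in combinations(feats, 2):
        score += weights.get(pair, 0.0)
    return score


# Toy data (hypothetical): three support vectors over features a..e.
support_vectors = [
    (0.5, frozenset({"a", "b", "c"})),
    (-0.25, frozenset({"b", "d"})),
    (1.0, frozenset({"c", "d", "e"})),
]
x = frozenset({"b", "c", "d"})

weights = expand_to_linear(support_vectors)
assert abs(kernel_score(support_vectors, x) - linear_score(weights, x)) < 1e-9
```

In this sketch the expanded weight table can grow quadratically with the number of distinct features, so a practical system would need some way to keep only the feature combinations that matter.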

Published in: ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, July 2003, 571 pages

Publisher: Association for Computational Linguistics, United States


Overall acceptance rate: 85 of 443 submissions, 19%
