ABSTRACT
Kernel-based learning methods (e.g., Support Vector Machines) have been successfully applied to many hard problems in Natural Language Processing (NLP). In NLP, although feature combinations are crucial to improving performance, they are usually selected heuristically. Kernel methods change this situation: their merit is that effective feature combinations are implicitly expanded without loss of generality and without increasing computational cost. Kernel-based text analysis shows excellent performance in terms of accuracy; however, such methods are usually too slow to apply to large-scale text analysis. In this paper, we extend a Basket Mining algorithm to convert a kernel-based classifier into a simple and fast linear classifier. Experimental results on English BaseNP Chunking, Japanese Word Segmentation, and Japanese Dependency Parsing show that our new classifiers are about 30 to 300 times faster than standard kernel-based classifiers.
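The core idea of converting a kernel-based classifier into a linear one can be illustrated with a degree-2 polynomial kernel over binary features. For binary vectors, (1 + x·z)^2 expands exactly into a weighted sum over the empty set, single features, and feature pairs, so the kernelized decision function can be precomputed into a table of weights over feature conjunctions and then evaluated linearly. The sketch below is an illustration of this equivalence, not the paper's actual algorithm (which additionally uses basket mining to prune the expansion); the function names and the toy support vectors are hypothetical.

```python
from itertools import combinations

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (1 + x.z)^2 for binary features given as sets."""
    s = len(x & z)
    return (1 + s) ** 2

def expand_weights(svs):
    """Expand support vectors into linear weights over feature conjunctions.

    For binary features, (1 + x.z)^2 = 1 + 3*sum_i x_i z_i + 2*sum_{i<j} x_i x_j z_i z_j,
    so each support vector contributes 1 to the bias, 3 per active feature,
    and 2 per active feature pair, scaled by alpha_i * y_i.
    """
    w = {}
    for ay, x in svs:  # ay = alpha_i * y_i, x = set of active feature ids
        w[()] = w.get((), 0.0) + ay * 1.0
        for f in x:
            w[(f,)] = w.get((f,), 0.0) + ay * 3.0
        for pair in combinations(sorted(x), 2):
            w[pair] = w.get(pair, 0.0) + ay * 2.0
    return w

def linear_score(w, x):
    """Score an example by looking up the conjunctions active in x."""
    score = w.get((), 0.0)
    for f in x:
        score += w.get((f,), 0.0)
    for pair in combinations(sorted(x), 2):
        score += w.get(pair, 0.0)
    return score

def kernel_score(svs, x):
    """Standard kernelized decision value: sum_i alpha_i y_i K(x_i, x)."""
    return sum(ay * poly2_kernel(sx, x) for ay, sx in svs)
```

The linear classifier touches only the conjunctions present in the input, so its cost is independent of the number of support vectors, which is the source of the reported speedup; in practice the full expansion is large, which is why the paper prunes it with a basket-mining step.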
Fast methods for kernel-based text analysis