2016 | OriginalPaper | Chapter

Large-Scale Kernel-Based Language Learning Through the Ensemble Nyström Methods

Authors: Danilo Croce, Roberto Basili

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing

Abstract

Kernel methods have been used by many Machine Learning paradigms, achieving state-of-the-art performance in many Language Learning tasks. One drawback of expressive kernel functions, such as Sequence or Tree kernels, is the time and space complexity they require in both learning and classification. In this paper, the Nyström methodology is studied as a viable solution to these scalability issues. By mapping data into low-dimensional spaces that approximate the kernel space, the proposed methodology improves scalability through compact linear representations of highly structured data. Computation can also be distributed across several machines by adopting the so-called Ensemble Nyström Method. Experimental results show that accuracy comparable with state-of-the-art kernel-based methods can be obtained while reducing the required operations by orders of magnitude, enabling the adoption of datasets containing more than one million examples.
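
The chapter's own implementation is not shown on this page. As a minimal NumPy sketch of the idea the abstract describes: sample m landmark examples, evaluate the kernel against them, and project through the pseudo-inverse square root of the landmark Gram matrix, so that inner products of the resulting dense vectors approximate the kernel; several such projections, one per landmark sample, can be combined with uniform weights in the spirit of the Ensemble Nyström Method [17]. The RBF kernel, the random data, and all names here are illustrative assumptions, not the authors' code.

import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2); a stand-in for any
    # valid kernel (e.g. a precomputed Tree-kernel Gram block).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def nystrom_features(X, landmarks, kernel=rbf_kernel, eps=1e-12):
    W = kernel(landmarks, landmarks)       # m x m Gram matrix of the landmarks
    C = kernel(X, landmarks)               # n x m kernel evaluations
    # Project through W^{-1/2} so that Phi @ Phi.T approximates the full Gram
    # matrix: the compact linear representation mentioned in the abstract.
    vals, vecs = np.linalg.eigh(W)
    keep = vals > eps                      # drop near-zero eigenvalues
    W_inv_sqrt = (vecs[:, keep] / np.sqrt(vals[keep])) @ vecs[:, keep].T
    return C @ W_inv_sqrt                  # n x m dense features (rank <= m)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))            # placeholder data
# A uniform-weight ensemble in the spirit of [17]: p landmark samples, each
# projection computable on a separate machine, then combined. Scaling the
# concatenation by 1/sqrt(p) makes Phi @ Phi.T the average of the p
# individual Nystrom approximations.
p, m = 5, 100
views = [nystrom_features(X, X[rng.choice(len(X), m, replace=False)])
         for _ in range(p)]
Phi = np.hstack(views) / np.sqrt(p)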

Footnotes
1
We are referring to the PA-II version in [5].
 
2
In this work we will consider the hinge loss \(H(\boldsymbol{w}; (\tilde{\boldsymbol{x}}_t, y_t)) = \max(0, 1 - y_t \boldsymbol{w}^\top \tilde{\boldsymbol{x}}_t)\).
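For context, the PA-II version of [5] referred to in footnote 1 admits a closed-form update under this loss. A minimal sketch, with illustrative NumPy names rather than the paper's code:

import numpy as np

def pa2_step(w, x, y, C=1.0):
    # Hinge loss H(w; (x, y)) = max(0, 1 - y * w.x), as defined above.
    loss = max(0.0, 1.0 - y * (w @ x))
    # PA-II closed-form step size from [5]; C bounds update aggressiveness.
    tau = loss / (x @ x + 1.0 / (2.0 * C))
    return w + tau * y * x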
 
5
C-SVM [3] adopts a caching policy, which is ignored here for comparative purposes: in large-scale applications, such a cache may impose prohibitive space requirements.
 
6
Only sentences whose lexical unit corresponds to a verb are used in our tests.
 
Literature
1. Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet project. In: Proceedings of COLING-ACL, Montreal, Canada (1998)
2. Cancedda, N., Gaussier, É., Goutte, C., Renders, J.M.: Word-sequence kernels. J. Mach. Learn. Res. 3, 1059–1082 (2003)
3. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)
4. Collins, M., Duffy, N.: Convolution kernels for natural language. In: Proceedings of Neural Information Processing Systems (NIPS 2001), pp. 625–632 (2001)
5. Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., Singer, Y.: Online passive-aggressive algorithms. J. Mach. Learn. Res. 7, 551–585 (2006)
6. Croce, D., Moschitti, A., Basili, R.: Structured lexical similarity via convolution kernels on dependency trees. In: Proceedings of EMNLP (2011)
7. Culotta, A., Sorensen, J.: Dependency tree kernels for relation extraction. In: Proceedings of ACL 2004, Stroudsburg, PA, USA (2004)
8. Dredze, M., Crammer, K., Pereira, F.: Confidence-weighted linear classification. In: Proceedings of ICML 2008. ACM, New York (2008)
9. Drineas, P., Mahoney, M.W.: On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J. Mach. Learn. Res. 6, 2153–2175 (2005)
10. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
11. Filice, S., Castellucci, G., Croce, D., Basili, R.: Effective kernelized online learning in language processing tasks. In: de Rijke, M., Kenter, T., de Vries, A.P., Zhai, C.X., de Jong, F., Radinsky, K., Hofmann, K. (eds.) ECIR 2014. LNCS, vol. 8416, pp. 347–358. Springer, Heidelberg (2014)
12. Filice, S., Castellucci, G., Croce, D., Basili, R.: KeLP: a kernel-based learning platform for natural language processing. In: Proceedings of ACL: System Demonstrations, Beijing, China, July 2015
13. Filice, S., Croce, D., Basili, R.: A stratified strategy for efficient kernel-based learning. In: AAAI Conference on Artificial Intelligence (2015)
14. Filice, S., Croce, D., Basili, R., Zanzotto, F.M.: Linear online learning over structured data with distributed tree kernels. In: Proceedings of ICMLA 2013 (2013)
15. Hsieh, C.J., Chang, K.W., Lin, C.J., Keerthi, S.S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: Proceedings of ICML 2008, pp. 408–415. ACM, New York (2008)
16. Joachims, T., Finley, T., Yu, C.N.: Cutting-plane training of structural SVMs. Mach. Learn. 77(1), 27–59 (2009)
17. Kumar, S., Mohri, M., Talwalkar, A.: Ensemble Nyström method. In: NIPS, pp. 1060–1068. Curran Associates, Inc. (2009)
18. Kumar, S., Mohri, M., Talwalkar, A.: Sampling methods for the Nyström method. J. Mach. Learn. Res. 13, 981–1006 (2012)
19. Li, X., Roth, D.: Learning question classifiers: the role of semantic information. Nat. Lang. Eng. 12(3), 229–249 (2006)
20. Moschitti, A., Pighin, D., Basili, R.: Tree kernels for semantic role labeling. Comput. Linguist. 34, 193–224 (2008)
21. Rifkin, R., Klautau, A.: In defense of one-vs-all classification. J. Mach. Learn. Res. 5, 101–141 (2004)
22. Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of ICML. ACM, New York (2007)
23. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)
24. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, New York (2004)
25. Vapnik, V.N.: Statistical Learning Theory. Wiley-Interscience, New York (1998)
26. Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 480–492 (2012)
27. Wang, J., Zhao, P., Hoi, S.C.: Exact soft confidence-weighted learning. In: Proceedings of ICML 2012. ACM, New York (2012)
28. Wang, Z., Vucetic, S.: Online passive-aggressive algorithms on a budget. J. Mach. Learn. Res. Proc. Track 9, 908–915 (2010)
29. Williams, C.K.I., Seeger, M.: Using the Nyström method to speed up kernel machines. In: Proceedings of NIPS 2000 (2001)
30. Zanzotto, F.M., Dell'Arciprete, L.: Distributed tree kernels. In: Proceedings of ICML 2012 (2012)
31. Zhang, D., Lee, W.S.: Question classification using support vector machines. In: Proceedings of SIGIR 2003, pp. 26–32. ACM, New York (2003)
Metadata
Title
Large-Scale Kernel-Based Language Learning Through the Ensemble Nyström Methods
Authors
Danilo Croce
Roberto Basili
Copyright Year
2016
Publisher
Springer International Publishing
DOI
https://doi.org/10.1007/978-3-319-30671-1_8