Published in: Neural Processing Letters 2/2016

01.04.2016

Global Rademacher Complexity Bounds: From Slow to Fast Convergence Rates

Authors: Luca Oneto, Alessandro Ghio, Sandro Ridella, Davide Anguita

Abstract

Previous work in the literature has shown that performance estimates of learning procedures can be characterized by a convergence rate ranging between \(O(n^{-1})\) (fast rate) and \(O(n^{-{1}/{2}})\) (slow rate). Deriving such results requires some assumptions on the problem; moreover, even when convergence is fast, the constants characterizing the bounds are often quite loose. In this work, we prove new Rademacher complexity (RC) based bounds, which do not require any additional assumptions to achieve a fast convergence rate \(O(n^{-1})\) in the optimistic case and a slow rate \(O(n^{-{1}/{2}})\) in the general case. At the same time, they are characterized by smaller constants with respect to other state-of-the-art fast-converging RC alternatives in the literature. The results proposed in this work are obtained by exploiting the fundamental work of Talagrand on concentration inequalities for product measures and empirical processes. As a further contribution, we also extend the results to the semi-supervised learning framework, showing how additional unlabeled samples improve the tightness of the derived bound.
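The central quantity in these bounds is the empirical Rademacher complexity \(\hat{R}_n(\mathcal{F}) = \mathbb{E}_{\sigma} \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(z_i)\), where \(\sigma_1, \dots, \sigma_n\) are independent uniform \(\pm 1\) signs. As a minimal illustrative sketch (not code from the paper; the function name, the Monte Carlo estimator, and the toy finite class are our own), the following Python snippet estimates \(\hat{R}_n\) for a matrix of bounded losses:

import numpy as np

def empirical_rademacher(loss_matrix, n_mc=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity
    E_sigma[ sup_f (1/n) * sum_i sigma_i * loss_matrix[f, i] ],
    where loss_matrix[f, i] is the loss of hypothesis f on sample i."""
    rng = np.random.default_rng(seed)
    n_hyp, n = loss_matrix.shape
    sigma = rng.choice([-1.0, 1.0], size=(n_mc, n))  # Rademacher signs
    # Correlation of every hypothesis with every sign vector, then sup over f:
    sups = (sigma @ loss_matrix.T / n).max(axis=1)
    return float(sups.mean())

# Toy finite class: 50 hypotheses with random {0,1} losses on n samples.
rng = np.random.default_rng(42)
for n in (100, 1000, 10000):
    losses = rng.integers(0, 2, size=(50, n)).astype(float)
    print(n, round(empirical_rademacher(losses), 4))

For a finite class of \(k\) hypotheses with losses in \([0,1]\), the printed estimates shrink roughly like \(\sqrt{\log k / n}\) (Massart's finite-class lemma), i.e., at the slow rate \(O(n^{-{1}/{2}})\) that the paper's bounds improve to \(O(n^{-1})\) in the optimistic (small empirical error) regime.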


Appendices
Accessible only with authorization
Footnotes
1
Note that we exploit a bounded loss function; exploiting unbounded loss functions would require additional hypotheses in order to derive bounds ranging between slow and fast convergence rates [2, 21, 45].
 
2
Note that \(\sqrt{a + b} \le \sqrt{a} + \sqrt{b}\), \(\sqrt{2ab} \le \frac{a}{2} + b\), \(\sqrt{a \sqrt{4ab}} \le \left( \frac{1}{\sqrt{2}} + \frac{1}{2} \right) a + b\) and \(\sqrt{a \sqrt{a \sqrt{8ab}}} \le \left( \frac{1}{\sqrt[4]{2}} + \frac{1}{\sqrt{2}} + \frac{1}{2} \right) a + b\) for \(a,b \ge 0\).
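For readability (this derivation is ours, not part of the original footnote): the first inequality follows by squaring both sides, and the second is a one-line consequence of the AM-GM inequality \(\sqrt{xy} \le \frac{x+y}{2}\) applied with \(x = a\) and \(y = 2b\):
\[
\sqrt{2ab} \;=\; \sqrt{a \cdot 2b} \;\le\; \frac{a + 2b}{2} \;=\; \frac{a}{2} + b, \qquad a, b \ge 0.
\]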
 
3
Note that \({\mathbb {P}}\left\{ A \wedge B \right\} = {\mathbb {P}}\left\{ A \right\} {\mathbb {P}}\left\{ B \right\} \) for independent events \(A\) and \(B\).
 
4
Note that \(\phi (a) \ge \frac{a^2}{2 + \frac{2a}{3}}\) for \(a \ge 0\) and \(\phi (-a) \ge \frac{a^2}{2}\) for \(a \in [0,1]\) [23].
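The excerpt does not restate the definition of \(\phi\); a plausible reading, which we flag as an assumption, is the standard function appearing in Bennett-type concentration inequalities such as those of [23, 24]:
\[
\phi(a) \;=\; (1+a)\log(1+a) - a ,
\]
under which both stated lower bounds are the classical ones used to turn such inequalities into Bernstein-type tail estimates.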
 
References
1.
Abney SP (2008) Semisupervised learning in computational linguistics. CRC Press, Boca Raton
2.
Adamczak R (2007) A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electron J Probab 13:1000–1034
3.
Anava O, Hazan E, Mannor S, Shamir O (2013) Online learning for time series prediction. In: The 26th annual conference on learning theory, Princeton University, NJ, USA
4.
Anguita D, Ghio A, Oneto L, Ridella S (2011) The impact of unlabeled patterns in Rademacher complexity theory for kernel classifiers. In: Proceedings of neural information processing systems
5.
Anguita D, Ghio A, Oneto L, Ridella S (2012) In-sample and out-of-sample model selection and error estimation for support vector machines. IEEE Trans Neural Netw Learn Syst 23(9):1390–1406
6.
Anguita D, Ghio A, Oneto L, Ridella S (2013) A learning machine with a bit-based hypothesis space. In: European symposium on artificial neural networks
7.
Anguita D, Ghio A, Oneto L, Ridella S (2014) Unlabeled patterns to tighten Rademacher complexity error bounds for kernel classifiers. Pattern Recognit Lett 37:210–219
8.
Anguita D, Ghio A, Oneto L, Xavier P, Reyes-Ortiz JL (2013) Energy efficient smartphone-based activity recognition using fixed-point arithmetic. J Univers Comput Sci 19:1295–1314
9.
Anguita D, Ghio A, Ridella S (2011) Maximal discrepancy for support vector machines. Neurocomputing 74(9):1436–1443
10.
Anguita S, Ghio A, Oneto L, Ridella S (2011) Maximal discrepancy vs. Rademacher complexity for error estimation. In: European symposium on artificial neural networks
13.
Bartlett PL, Boucheron S, Lugosi G (2002) Model selection and error estimation. Mach Learn 48(1–3):85–113
14.
Bartlett PL, Bousquet O, Mendelson S (2002) Localized Rademacher complexities. In: Proceedings of the 15th annual conference on computational learning theory
16.
Bartlett PL, Mendelson S (2003) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3:463–482
17.
Belkin M, Matveeva I, Niyogi P (2004) Regularization and semi-supervised learning on large graphs. In: Conference on learning theory
18.
Belkin M, Niyogi P (2004) Semi-supervised learning on Riemannian manifolds. Mach Learn 56(1–3):209–239
19.
Ben-David S, Lu T, Pál D (2008) Does unlabeled data provably help? Worst-case analysis of the sample complexity of semi-supervised learning. In: Conference on computational learning theory
20.
Bennett K, Demiriz A (1999) Semi-supervised support vector machines. In: Advances in neural information processing systems 11: proceedings of the 1998 conference, p 368. The MIT Press, Cambridge
21.
23.
Boucheron S, Lugosi G, Massart P (2000) A sharp concentration inequality with applications. Random Struct Algorithms 16(3):277–292
24.
Bousquet O (2002) A Bennett concentration inequality and its application to suprema of empirical processes. Comptes Rendus Math 334(6):495–500
25.
26.
Chapelle O, Schölkopf B, Zien A (2006) Semi-supervised learning. MIT Press, Cambridge
28.
Cherkassky V, Mulier FM (2007) Learning from data: concepts, theory, and methods. Wiley, New York
29.
Clopper CJ, Pearson ES (1934) The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26(4):404
30.
Cohen I, Cozman FG, Sebe N, Cirelo MC, Huang TS (2004) Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction. IEEE Trans Pattern Anal Mach Intell 26(12):1553–1566
31.
32.
Cortes C, Kloft M, Mohri M (2013) Learning kernels using local Rademacher complexity. In: Conference on neural information processing systems
33.
El-Yaniv R, Pechyony D (2009) Transductive Rademacher complexity and its applications. J Artif Intell Res 35(1):193
34.
Floyd S, Warmuth M (1995) Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Mach Learn 21(3):269–304
35.
Friedland L, Jensen D, Lavine M (2013) Copy or coincidence? A model for detecting social influence and duplication events. In: International conference on machine learning
36.
Gopal S, Bai B, Yang Y, Niculescu-Mizil A (2013) Bayesian models for large-scale hierarchical classification. In: Neural information processing systems
37.
Kääriäinen M (2005) Generalization error bounds using unlabeled data. In: Conference on computational learning theory
38.
Katrenko S, Van Zaanen M (2010) Rademacher complexity and grammar induction algorithms: what it may (not) tell us. In: Grammatical inference: theoretical results and applications
39.
Klein T (2002) Une inégalité de concentration à gauche pour les processus empiriques. Comptes Rendus Math 334(6):501–504
40.
41.
Kloft M, Blanchard G (2011) The local Rademacher complexity of lp-norm multiple kernel learning. In: Neural information processing systems
42.
43.
44.
Kumar A, Saha A, Daume H (2010) Co-regularization based semi-supervised domain adaptation. In: Advances in neural information processing systems
45.
Kutin S (2002) Extensions to McDiarmid's inequality when differences are bounded with high probability. Technical report TR-2002-04, Department of Computer Science, University of Chicago, Chicago, IL
46.
Langford J (2006) Tutorial on practical prediction theory for classification. J Mach Learn Res 6(1):273
47.
Langford J, McAllester D (2000) Computable shell decomposition bounds. In: Conference on computational learning theory
48.
Ledoux M, Talagrand M (2011) Probability in Banach spaces: isoperimetry and processes. Springer, Berlin
49.
Li M, Tromp J, Vitányi P (2002) Sharpening Occam's razor. Springer, Berlin
50.
Li M, Vitanyi P (1993) An introduction to Kolmogorov complexity and its applications. Springer, New York
52.
Massart P (2000) About the constants in Talagrand's concentration inequalities for empirical processes. Ann Probab 28(2):863–884
53.
McAllester DA (1999) PAC-Bayesian model averaging. In: Conference on computational learning theory
54.
55.
Oneto L, Ghio A, Anguita D, Ridella S (2013) An improved analysis of the Rademacher data-dependent bound using its self bounding property. Neural Netw 44:107–111
57.
Parrado-Hernández E, Ambroladze A, Shawe-Taylor J, Sun S (2012) PAC-Bayes bounds with data dependent priors. J Mach Learn Res 13:3507–3531
59.
Paulo JL, Azzam FGT (2006) The use of artificial neural networks in decision support in cancer: a systematic review. Neural Netw 19(4):408–415
60.
Poggio T, Rifkin R, Mukherjee S, Niyogi P (2004) General conditions for predictivity in learning theory. Nature 428(6981):419–422
61.
Rosenberg DS, Bartlett PL (2007) The Rademacher complexity of co-regularized kernel classes. In: International conference on artificial intelligence and statistics, pp 396–403
62.
Shalev-Shwartz S, Shamir O, Srebro N, Sridharan K (2010) Learnability, stability and uniform convergence. J Mach Learn Res 11:2635–2670
63.
Shao X, Cherkassky V, Li W (2000) Measuring the VC-dimension using optimized experimental design. Neural Comput 12(8):1969–1986
64.
Shawe-Taylor J, Bartlett PL, Williamson RC, Anthony M (1998) Structural risk minimization over data-dependent hierarchies. IEEE Trans Inf Theory 44(5):1926–1940
65.
Srebro N, Sridharan K, Tewari A (2010) Smoothness, low-noise and fast rates. In: Neural information processing systems
66.
Statnikov A, Henaff M, Narendra V, Konganti K, Li Z, Yang L, Pei Z, Blaser MJ, Aliferis CF, Alekseyenko AV (2013) A comprehensive evaluation of multicategory classification methods for microbiomic data. Microbiome 1(1):11
67.
Sun S, Shawe-Taylor J (2010) Sparse semi-supervised learning using conjugate functions. J Mach Learn Res 11:2423–2455
69.
Vapnik VN (1998) Statistical learning theory. Wiley, New York
70.
Vapnik VN, Červonenkis AJ (1979) Theorie der Zeichenerkennung. Akademie-Verlag, Berlin
72.
Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Synth Lect Artif Intell Mach Learn 3(1):1–130
Metadata
Title
Global Rademacher Complexity Bounds: From Slow to Fast Convergence Rates
Authors
Luca Oneto
Alessandro Ghio
Sandro Ridella
Davide Anguita
Publication date
01.04.2016
Publisher
Springer US
Published in
Neural Processing Letters / Issue 2/2016
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-015-9429-2
