Published in: Neural Processing Letters 2/2016

01.04.2016

Global Rademacher Complexity Bounds: From Slow to Fast Convergence Rates

Authors: Luca Oneto, Alessandro Ghio, Sandro Ridella, Davide Anguita

Abstract

Previous work in the literature has shown that performance estimates of learning procedures can be characterized by a convergence rate ranging between \(O(n^{-1})\) (fast rate) and \(O(n^{-{1}/{2}})\) (slow rate). Deriving such results requires some assumptions on the problem; moreover, even when convergence is fast, the constants characterizing the bounds are often quite loose. In this work, we prove new Rademacher complexity (RC) based bounds, which do not require any additional assumptions to achieve a fast convergence rate \(O(n^{-1})\) in the optimistic case and a slow rate \(O(n^{-{1}/{2}})\) in the general case. At the same time, they are characterized by smaller constants with respect to other state-of-the-art fast-converging RC alternatives in the literature. The results proposed in this work are obtained by exploiting the fundamental work of Talagrand on concentration inequalities for product measures and empirical processes. As a further contribution, we also extend the results to the semi-supervised learning framework, showing how additional unlabeled samples improve the tightness of the derived bound.
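The central quantity in these bounds is the empirical Rademacher complexity \(\hat{R}_n(\mathcal{F}) = \mathbb{E}_{\sigma} \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(z_i)\), where \(\sigma_1, \dots, \sigma_n\) are independent uniform \(\pm 1\) signs. As a minimal illustrative sketch (not code from the paper; the function name, the Monte Carlo estimator, and the toy finite class are our own), the following Python snippet estimates \(\hat{R}_n\) for a matrix of bounded losses:

import numpy as np

def empirical_rademacher(loss_matrix, n_mc=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity
    E_sigma[ sup_f (1/n) * sum_i sigma_i * loss_matrix[f, i] ],
    where loss_matrix[f, i] is the loss of hypothesis f on sample i."""
    rng = np.random.default_rng(seed)
    n_hyp, n = loss_matrix.shape
    sigma = rng.choice([-1.0, 1.0], size=(n_mc, n))  # Rademacher signs
    # Correlation of every hypothesis with every sign vector, then sup over f:
    sups = (sigma @ loss_matrix.T / n).max(axis=1)
    return float(sups.mean())

# Toy finite class: 50 hypotheses with random {0,1} losses on n samples.
rng = np.random.default_rng(42)
for n in (100, 1000, 10000):
    losses = rng.integers(0, 2, size=(50, n)).astype(float)
    print(n, round(empirical_rademacher(losses), 4))

For a finite class of \(k\) hypotheses with losses in \([0,1]\), the printed estimates shrink roughly like \(\sqrt{\log k / n}\) (Massart's finite-class lemma), i.e., at the slow rate \(O(n^{-{1}/{2}})\) that the paper's bounds improve to \(O(n^{-1})\) in the optimistic (small empirical error) regime.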


Appendices
Accessible only with authorization
Footnotes
1
Note that we exploit a bounded loss function; exploiting unbounded loss functions would require additional hypotheses in order to derive bounds ranging between slow and fast convergence rates [2, 21, 45].
 
2
Note that \(\sqrt{a + b} \le \sqrt{a} + \sqrt{b}\), \(\sqrt{2ab} \le \frac{a}{2} + b\), \(\sqrt{a \sqrt{4ab}} \le \left( \frac{1}{\sqrt{2}} + \frac{1}{2} \right) a + b\) and \(\sqrt{a \sqrt{a \sqrt{8ab}}} \le \left( \frac{1}{\sqrt[4]{2}} + \frac{1}{\sqrt{2}} + \frac{1}{2} \right) a + b\) for \(a,b \ge 0\).
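For readability (this derivation is ours, not part of the original footnote): the first inequality follows by squaring both sides, and the second is a one-line consequence of the AM-GM inequality \(\sqrt{xy} \le \frac{x+y}{2}\) applied with \(x = a\) and \(y = 2b\):
\[
\sqrt{2ab} \;=\; \sqrt{a \cdot 2b} \;\le\; \frac{a + 2b}{2} \;=\; \frac{a}{2} + b, \qquad a, b \ge 0.
\]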
 
3
Note that \({\mathbb {P}}\left\{ A \wedge B \right\} = {\mathbb {P}}\left\{ A \right\} {\mathbb {P}}\left\{ B \right\} \) for independent events \(A\) and \(B\).
 
4
Note that \(\phi (a) \ge \frac{a^2}{2 + \frac{2a}{3}}\) for \(a \ge 0\) and \(\phi (-a) \ge \frac{a^2}{2}\) for \(a \in [0,1]\) [23].
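The excerpt does not restate the definition of \(\phi\); a plausible reading, which we flag as an assumption, is the standard function appearing in Bennett-type concentration inequalities such as those of [23, 24]:
\[
\phi(a) \;=\; (1+a)\log(1+a) - a ,
\]
under which both stated lower bounds are the classical ones used to turn such inequalities into Bernstein-type tail estimates.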
 
References
1.
Abney SP (2008) Semisupervised learning in computational linguistics. CRC Press, Boca Raton
2.
Adamczak R (2007) A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electron J Probab 13:1000–1034
3.
Anava O, Hazan E, Mannor S, Shamir O (2013) Online learning for time series prediction. In: The 26th annual conference on learning theory, Princeton University, NJ, USA
4.
Anguita D, Ghio A, Oneto L, Ridella S (2011) The impact of unlabeled patterns in Rademacher complexity theory for kernel classifiers. In: Proceedings of neural information processing systems
5.
Anguita D, Ghio A, Oneto L, Ridella S (2012) In-sample and out-of-sample model selection and error estimation for support vector machines. IEEE Trans Neural Netw Learn Syst 23(9):1390–1406
6.
Anguita D, Ghio A, Oneto L, Ridella S (2013) A learning machine with a bit-based hypothesis space. In: European symposium on artificial neural networks
7.
Anguita D, Ghio A, Oneto L, Ridella S (2014) Unlabeled patterns to tighten Rademacher complexity error bounds for kernel classifiers. Pattern Recognit Lett 37:210–219
8.
Anguita D, Ghio A, Oneto L, Xavier P, Reyes-Ortiz JL (2013) Energy efficient smartphone-based activity recognition using fixed-point arithmetic. J Univers Comput Sci 19:1295–1314
9.
Anguita D, Ghio A, Ridella S (2011) Maximal discrepancy for support vector machines. Neurocomputing 74(9):1436–1443
10.
Anguita S, Ghio A, Oneto L, Ridella S (2011) Maximal discrepancy vs. Rademacher complexity for error estimation. In: European symposium on artificial neural networks
13.
Bartlett PL, Boucheron S, Lugosi G (2002) Model selection and error estimation. Mach Learn 48(1–3):85–113
14.
Bartlett PL, Bousquet O, Mendelson S (2002) Localized Rademacher complexities. In: Proceedings of the 15th annual conference on computational learning theory
16.
Bartlett PL, Mendelson S (2003) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3:463–482
17.
Belkin M, Matveeva I, Niyogi P (2004) Regularization and semi-supervised learning on large graphs. In: Conference on learning theory
18.
Belkin M, Niyogi P (2004) Semi-supervised learning on Riemannian manifolds. Mach Learn 56(1–3):209–239
19.
Ben-David S, Lu T, Pál D (2008) Does unlabeled data provably help? Worst-case analysis of the sample complexity of semi-supervised learning. In: Conference on computational learning theory
20.
Bennett K, Demiriz A (1999) Semi-supervised support vector machines. In: Advances in neural information processing systems 11: proceedings of the 1998 conference, p 368. The MIT Press, Cambridge
21.
23.
Boucheron S, Lugosi G, Massart P (2000) A sharp concentration inequality with applications. Random Struct Algorithms 16(3):277–292
24.
Bousquet O (2002) A Bennett concentration inequality and its application to suprema of empirical processes. Comptes Rendus Math 334(6):495–500
25.
26.
Chapelle O, Schölkopf B, Zien A (2006) Semi-supervised learning. MIT Press, Cambridge
28.
Cherkassky V, Mulier FM (2007) Learning from data: concepts, theory, and methods. Wiley, New York
29.
Clopper CJ, Pearson ES (1934) The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26(4):404
30.
Cohen I, Cozman FG, Sebe N, Cirelo MC, Huang TS (2004) Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction. IEEE Trans Pattern Anal Mach Intell 26(12):1553–1566
31.
32.
Cortes C, Kloft M, Mohri M (2013) Learning kernels using local Rademacher complexity. In: Conference on neural information processing systems
33.
El-Yaniv R, Pechyony D (2009) Transductive Rademacher complexity and its applications. J Artif Intell Res 35(1):193
34.
Floyd S, Warmuth M (1995) Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Mach Learn 21(3):269–304
35.
Friedland L, Jensen D, Lavine M (2013) Copy or coincidence? A model for detecting social influence and duplication events. In: International conference on machine learning
36.
Gopal S, Bai B, Yang Y, Niculescu-Mizil A (2013) Bayesian models for large-scale hierarchical classification. In: Neural information processing systems
37.
Kääriäinen M (2005) Generalization error bounds using unlabeled data. In: Conference on computational learning theory
38.
Katrenko S, Van Zaanen M (2010) Rademacher complexity and grammar induction algorithms: what it may (not) tell us. In: Grammatical inference: theoretical results and applications
39.
Klein T (2002) Une inégalité de concentration à gauche pour les processus empiriques. Comptes Rendus Math 334(6):501–504
40.
41.
Kloft M, Blanchard G (2011) The local Rademacher complexity of lp-norm multiple kernel learning. In: Neural information processing systems
42.
43.
44.
Kumar A, Saha A, Daume H (2010) Co-regularization based semi-supervised domain adaptation. In: Advances in neural information processing systems
45.
Kutin S (2002) Extensions to McDiarmid's inequality when differences are bounded with high probability. Technical report TR-2002-04, Department of Computer Science, University of Chicago, Chicago, IL
46.
Langford J (2006) Tutorial on practical prediction theory for classification. J Mach Learn Res 6(1):273
47.
Langford J, McAllester D (2000) Computable shell decomposition bounds. In: Conference on computational learning theory
48.
Ledoux M, Talagrand M (2011) Probability in Banach spaces: isoperimetry and processes. Springer, Berlin
49.
Li M, Tromp J, Vitányi P (2002) Sharpening Occam's razor. Springer, Berlin
50.
Li M, Vitanyi P (1993) An introduction to Kolmogorov complexity and its applications. Springer, New York
52.
Massart P (2000) About the constants in Talagrand's concentration inequalities for empirical processes. Ann Probab 28(2):863–884
53.
McAllester DA (1999) PAC-Bayesian model averaging. In: Conference on computational learning theory
54.
55.
Oneto L, Ghio A, Anguita D, Ridella S (2013) An improved analysis of the Rademacher data-dependent bound using its self bounding property. Neural Netw 44:107–111
57.
Parrado-Hernández E, Ambroladze A, Shawe-Taylor J, Sun S (2012) PAC-Bayes bounds with data dependent priors. J Mach Learn Res 13:3507–3531
59.
Paulo JL, Azzam FGT (2006) The use of artificial neural networks in decision support in cancer: a systematic review. Neural Netw 19(4):408–415
60.
Poggio T, Rifkin R, Mukherjee S, Niyogi P (2004) General conditions for predictivity in learning theory. Nature 428(6981):419–422
61.
Rosenberg DS, Bartlett PL (2007) The Rademacher complexity of co-regularized kernel classes. In: International conference on artificial intelligence and statistics, pp 396–403
62.
Shalev-Shwartz S, Shamir O, Srebro N, Sridharan K (2010) Learnability, stability and uniform convergence. J Mach Learn Res 11:2635–2670
63.
Shao X, Cherkassky V, Li W (2000) Measuring the VC-dimension using optimized experimental design. Neural Comput 12(8):1969–1986
64.
Shawe-Taylor J, Bartlett PL, Williamson RC, Anthony M (1998) Structural risk minimization over data-dependent hierarchies. IEEE Trans Inf Theory 44(5):1926–1940
65.
Srebro N, Sridharan K, Tewari A (2010) Smoothness, low-noise and fast rates. In: Neural information processing systems
66.
Statnikov A, Henaff M, Narendra V, Konganti K, Li Z, Yang L, Pei Z, Blaser MJ, Aliferis CF, Alekseyenko AV (2013) A comprehensive evaluation of multicategory classification methods for microbiomic data. Microbiome 1(1):11
67.
Sun S, Shawe-Taylor J (2010) Sparse semi-supervised learning using conjugate functions. J Mach Learn Res 11:2423–2455
69.
Vapnik VN (1998) Statistical learning theory. Wiley, New York
70.
Vapnik VN, Červonenkis AJ (1979) Theorie der Zeichenerkennung. Akademie-Verlag, Berlin
72.
Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Synth Lect Artif Intell Mach Learn 3(1):1–130
Metadata
Title
Global Rademacher Complexity Bounds: From Slow to Fast Convergence Rates
Authors
Luca Oneto
Alessandro Ghio
Sandro Ridella
Davide Anguita
Publication date
01.04.2016
Publisher
Springer US
Published in
Neural Processing Letters / Issue 2/2016
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-015-9429-2
