Published in: Knowledge and Information Systems 8/2020

20-02-2020 | Regular Paper

Supervised learning as an inverse problem based on non-smooth loss function

Authors: Soufiane Lyaqini, Mohamed Quafafou, Mourad Nachaoui, Abdelkrim Chakib


Abstract

This paper is concerned with solving the supervised machine learning problem as an inverse problem. Recently, many works have focused on defining a relationship between supervised learning and well-known inverse problems. However, this connection between the learning problem and the inverse one has been made only in the particular case where the inverse problem is reformulated as a minimization problem with a quadratic (\(L^2\)) cost functional. Yet it is well known that the cost functional can be \(L^1\), \(L^2\) or any positive function that measures the gap between the predicted data and the observed data. Indeed, using an \(L^1\) loss function for the supervised learning problem gives more consistent results (see Rosasco et al. in Neural Comput 16:1063–1076, 2004). This strengthens the idea of reformulating the inverse problem associated with the machine learning problem as a minimization problem for an \(L^{1}\) functional. However, the \(L^{1}\) loss function is non-differentiable, which precludes the use of standard optimization tools. To overcome this difficulty, we propose a new approximation technique based on reformulating the associated inverse problem as the minimization of a slanting cost functional (Chen et al. in MIS Q Manag Inf Syst 36:1165–1188, 2012), which is solved using Tikhonov regularization and Newton's method. This approach leads to an efficient numerical algorithm that allows us to solve the supervised learning problem in a very general framework. We present numerical results showing the efficiency of the proposed approach, with experiments on both academic and real-life data. A comparison with existing methods and a study of the numerical stability of the algorithm show that our approach outperforms them in terms of convergence speed and quality of the predicted models.
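To make the idea concrete, here is a minimal numerical sketch. It is not the paper's exact formulation: it assumes a linear model, replaces \(|r|\) by the generic smooth surrogate \(\sqrt{r^2 + \varepsilon^2}\) (one possible slanting-type approximation; the paper's slanting functional may differ in detail), and adds a Tikhonov term \(\lambda \|w\|^2\). Since the surrogate is twice differentiable, Newton's method applies directly; all function and parameter names below are illustrative, not taken from the paper.

```python
import numpy as np

def smoothed_l1_newton(X, y, lam=1e-2, eps=1e-3, tol=1e-8, max_iter=50):
    """Minimize sum_i sqrt((Xw - y)_i^2 + eps^2) + lam * ||w||^2 by Newton's method.

    sqrt(r^2 + eps^2) is a generic smooth surrogate for |r|, used here only to
    illustrate the approach; it is not necessarily the paper's slanting function.
    """
    p = X.shape[1]
    w = np.zeros(p)
    for _ in range(max_iter):
        r = X @ w - y                         # residuals of the linear model
        s = np.sqrt(r**2 + eps**2)            # smoothed absolute residuals
        grad = X.T @ (r / s) + 2.0 * lam * w  # gradient of the regularized cost
        D = eps**2 / s**3                     # curvature of the surrogate at each residual
        H = X.T @ (D[:, None] * X) + 2.0 * lam * np.eye(p)  # Hessian; the Tikhonov term keeps it positive definite
        step = np.linalg.solve(H, grad)       # Newton direction
        w -= step
        if np.linalg.norm(step) < tol:
            break
    return w

# Toy usage with heavy-tailed noise, where an L1-type loss is more robust
# than least squares (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.standard_t(df=2.0, size=200)
print(smoothed_l1_newton(X, y))
```

As \(\varepsilon \to 0\) the surrogate approaches the \(L^1\) loss, while the \(2\lambda I\) contribution of the Tikhonov term keeps the Hessian invertible even where the smoothed loss is nearly flat, which is what makes the Newton step well defined.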


Literature
1. Rosasco L, De Vito E, Caponnetto A, Piana M, Verri A (2004) Are loss functions all the same? Neural Comput 16:1063–1076
2. Chen H, Chiang R, Storey V (2012) Business intelligence and analytics: from big data to big impact. MIS Q Manag Inf Syst 36:1165–1188
3. Vapnik VN (1995) The nature of statistical learning theory. Springer, Berlin, Heidelberg
4. Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition, vol 31. Applications of mathematics. Springer, New York
5. Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas R (2004) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496–2497
6. Rani P, Liu C, Sarkar N, Vanman E (2006) An empirical study of machine learning techniques for affect recognition in human–robot interaction. Pattern Anal Appl 9:58–69
7. Kononenko I (2001) Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med 23:89–109
8. Osuna E, Freund R, Girosi F (1997) Training support vector machines: an application to face detection. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 130–136
9. Farivar F, Ahmadabadi MN (2015) Continuous reinforcement learning to robust fault tolerant control for a class of unknown nonlinear systems. Appl Soft Comput 37:702–714
10. Peng H-W, Lee S-J, Lee C-H (2017) An oblique elliptical basis function network approach for supervised learning applications. Appl Soft Comput 60:552–563
11. Kumar YJ, Salim N, Raza B (2012) Cross-document structural relationship identification using supervised machine learning. Appl Soft Comput 12(10):3124–3131
12. Yang Y, Zhang H, Yuan D, Sun D, Li G, Ranjan R, Sun M (2019) Hierarchical extreme learning machine based image denoising network for visual internet of things. Appl Soft Comput 74:747–759
13. Maimon O, Rokach L (2005) Introduction to supervised methods. In: Data mining and knowledge discovery handbook. Springer, Boston, MA, pp 149–164
14. Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer, New York
15. Hilas C, Mastorocostas P (2008) An application of supervised and unsupervised learning approaches to telecommunications fraud detection. Knowl Based Syst 21:721–726
17. Slavakis K, Giannakis G, Mateos G (2014) Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge. IEEE Signal Process 31:18–31
18. Emrouznejad A (2016) Big data optimization: recent developments and challenges, vol 18. Springer, Berlin
19. Bertero M, De Mol C, Pike ER (1988) Linear inverse problems with discrete data. II. Stability and regularisation. Inverse Probl 4(3):573–594
20. Kirsch A (1996) An introduction to the mathematical theory of inverse problems. Springer, Berlin, Heidelberg
21. Kurkova V (2004) Supervised learning as an inverse problem. Technical report 960, Institute of Computer Science, Academy of Sciences of the Czech Republic
22. Mukherjee S, Niyogi P, Poggio T, Rifkin R (2006) Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv Comput Math 25:161–193
23. De Vito E, Rosasco L, Caponnetto A, Piana M, Verri A (2004) Some properties of regularized kernel methods. J Mach Learn Res 5:1363–1390
24. Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. W.H. Winston, New York
26. Hadamard J (1923) Lectures on Cauchy's problem in linear partial differential equations. Yale University Press, New Haven
27. Schmidt M, Fung G, Rosales R (2007) Fast optimization methods for L1 regularization: a comparative study and two new approaches. In: European conference on machine learning. Springer, Berlin, Heidelberg, pp 286–297
28. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York
30. da Silva HP, Carreiras C, Lourenço A, das Neves RC, Ferreira R (2015) Off-the-person electrocardiography: performance assessment and clinical correlation. Health Technol 4:309–318
31. Moody GB, Mark RG (2001) The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol 20:45–50
32. Kasai H (2017) A Matlab library for stochastic optimization algorithms. J Mach Learn Res 19:7942–7946
33. Bottou L (2012) Stochastic gradient descent tricks. In: Neural networks: tricks of the trade. Springer, Berlin, Heidelberg, pp 421–436
34. Mokhtari A, Eisen M, Ribeiro A (2018) IQN: an incremental quasi-Newton method with local superlinear convergence rate. SIAM J Optim 28(2):1670–1698
35. Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in neural information processing systems, pp 315–323
Metadata

Title: Supervised learning as an inverse problem based on non-smooth loss function
Authors: Soufiane Lyaqini, Mohamed Quafafou, Mourad Nachaoui, Abdelkrim Chakib
Publication date: 20-02-2020
Publisher: Springer London
Published in: Knowledge and Information Systems, Issue 8/2020
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-020-01439-2
