Published in: Knowledge and Information Systems 8/2020

20-02-2020 | Regular Paper

Supervised learning as an inverse problem based on non-smooth loss function

Authors: Soufiane Lyaqini, Mohamed Quafafou, Mourad Nachaoui, Abdelkrim Chakib


Abstract

This paper is concerned with solving the supervised machine learning problem as an inverse problem. Recently, many works have focused on defining a relationship between supervised learning and well-known inverse problems. However, this connection between the learning problem and the inverse one has been made only in the particular case where the inverse problem is reformulated as a minimization problem with a quadratic (\(L^2\)) cost functional. Yet it is well known that the cost functional can be \(L^1\), \(L^2\) or any positive function that measures the gap between the predicted data and the observed data. Indeed, using an \(L^1\) loss function for the supervised learning problem gives more consistent results (see Rosasco et al. in Neural Comput 16:1063–1076, 2004). This strengthens the idea of reformulating the inverse problem associated with the machine learning problem as a minimization problem for an \(L^{1}\) functional. However, the \(L^{1}\) loss function is non-differentiable, which precludes the use of standard optimization tools. To overcome this difficulty, we propose a new approximation technique based on reformulating the associated inverse problem as the minimization of a slanting cost functional (Chen et al. in MIS Q Manag Inf Syst 36:1165–1188, 2012), which is solved using Tikhonov regularization and Newton's method. This approach leads to an efficient numerical algorithm that allows us to solve the supervised learning problem in a very general framework. We present numerical results showing the efficiency of the proposed approach, with experiments on both academic and real-life data. A comparison with existing methods and a study of the numerical stability of the algorithm show that our approach outperforms them in terms of convergence speed and quality of the predicted models.
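To make the idea concrete, here is a minimal numerical sketch. It is not the paper's exact formulation: it assumes a linear model, replaces \(|r|\) by the generic smooth surrogate \(\sqrt{r^2 + \varepsilon^2}\) (one possible slanting-type approximation; the paper's slanting functional may differ in detail), and adds a Tikhonov term \(\lambda \|w\|^2\). Since the surrogate is twice differentiable, Newton's method applies directly; all function and parameter names below are illustrative, not taken from the paper.

```python
import numpy as np

def smoothed_l1_newton(X, y, lam=1e-2, eps=1e-3, tol=1e-8, max_iter=50):
    """Minimize sum_i sqrt((Xw - y)_i^2 + eps^2) + lam * ||w||^2 by Newton's method.

    sqrt(r^2 + eps^2) is a generic smooth surrogate for |r|, used here only to
    illustrate the approach; it is not necessarily the paper's slanting function.
    """
    p = X.shape[1]
    w = np.zeros(p)
    for _ in range(max_iter):
        r = X @ w - y                         # residuals of the linear model
        s = np.sqrt(r**2 + eps**2)            # smoothed absolute residuals
        grad = X.T @ (r / s) + 2.0 * lam * w  # gradient of the regularized cost
        D = eps**2 / s**3                     # curvature of the surrogate at each residual
        H = X.T @ (D[:, None] * X) + 2.0 * lam * np.eye(p)  # Hessian; the Tikhonov term keeps it positive definite
        step = np.linalg.solve(H, grad)       # Newton direction
        w -= step
        if np.linalg.norm(step) < tol:
            break
    return w

# Toy usage with heavy-tailed noise, where an L1-type loss is more robust
# than least squares (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.standard_t(df=2.0, size=200)
print(smoothed_l1_newton(X, y))
```

As \(\varepsilon \to 0\) the surrogate approaches the \(L^1\) loss, while the \(2\lambda I\) contribution of the Tikhonov term keeps the Hessian invertible even where the smoothed loss is nearly flat, which is what makes the Newton step well defined.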


Literature
1. Rosasco L, De Vito E, Caponnetto A, Piana M, Verri A (2004) Are loss functions all the same? Neural Comput 16:1063–1076
2. Chen H, Chiang R, Storey V (2012) Business intelligence and analytics: from big data to big impact. MIS Q Manag Inf Syst 36:1165–1188
3. Vapnik VN (1995) The nature of statistical learning theory. Springer, Berlin, Heidelberg
4. Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition, vol 31. Applications of mathematics. Springer, New York
5. Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas R (2004) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496–2497
6. Rani P, Liu C, Sarkar N, Vanman E (2006) An empirical study of machine learning techniques for affect recognition in human–robot interaction. Pattern Anal Appl 9:58–69
7. Kononenko I (2001) Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med 23:89–109
8. Osuna E, Freund R, Girosi F (1997) Training support vector machines: an application to face detection. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 130–136
9. Farivar F, Ahmadabadi MN (2015) Continuous reinforcement learning to robust fault tolerant control for a class of unknown nonlinear systems. Appl Soft Comput 37:702–714
10. Peng H-W, Lee S-J, Lee C-H (2017) An oblique elliptical basis function network approach for supervised learning applications. Appl Soft Comput 60:552–563
11. Kumar YJ, Salim N, Raza B (2012) Cross-document structural relationship identification using supervised machine learning. Appl Soft Comput 12(10):3124–3131
12. Yang Y, Zhang H, Yuan D, Sun D, Li G, Ranjan R, Sun M (2019) Hierarchical extreme learning machine based image denoising network for visual internet of things. Appl Soft Comput 74:747–759
13. Maimon O, Rokach L (2005) Introduction to supervised methods. In: Data mining and knowledge discovery handbook. Springer, Boston, MA, pp 149–164
14. Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer, New York
15. Hilas C, Mastorocostas P (2008) An application of supervised and unsupervised learning approaches to telecommunications fraud detection. Knowl Based Syst 21:721–726
17. Slavakis K, Giannakis G, Mateos G (2014) Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge. IEEE Signal Process 31:18–31
18. Emrouznejad A (2016) Big data optimization: recent developments and challenges, vol 18. Springer, Berlin
19. Bertero M, De Mol C, Pike ER (1988) Linear inverse problems with discrete data. II. Stability and regularisation. Inverse Probl 4(3):573–594
20. Kirsch A (1996) An introduction to the mathematical theory of inverse problems. Springer, Berlin, Heidelberg
21. Kurkova V (2004) Supervised learning as an inverse problem. Technical report 960, Institute of Computer Science, Academy of Sciences of the Czech Republic
22. Mukherjee S, Niyogi P, Poggio T, Rifkin R (2006) Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv Comput Math 25:161–193
23. De Vito E, Rosasco L, Caponnetto A, Piana M, Verri A (2004) Some properties of regularized kernel methods. J Mach Learn Res 5:1363–1390
24. Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. W.H. Winston, New York
26. Hadamard J (1923) Lectures on Cauchy's problem in linear partial differential equations. Yale University Press, New Haven
27. Schmidt M, Fung G, Rosales R (2007) Fast optimization methods for L1 regularization: a comparative study and two new approaches. In: European conference on machine learning. Springer, Berlin, Heidelberg, pp 286–297
28. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York
30. da Silva HP, Carreiras C, Lourenço A, das Neves RC, Ferreira R (2015) Off-the-person electrocardiography: performance assessment and clinical correlation. Health Technol 4:309–318
31. Moody GB, Mark RG (2001) The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol 20:45–50
32. Kasai H (2017) A Matlab library for stochastic optimization algorithms. J Mach Learn Res 19:7942–7946
33. Bottou L (2012) Stochastic gradient descent tricks. In: Neural networks: tricks of the trade. Springer, Berlin, Heidelberg, pp 421–436
34. Mokhtari A, Eisen M, Ribeiro A (2018) IQN: an incremental quasi-Newton method with local superlinear convergence rate. SIAM J Optim 28(2):1670–1698
35. Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in neural information processing systems, pp 315–323
Metadata

Title: Supervised learning as an inverse problem based on non-smooth loss function
Authors: Soufiane Lyaqini, Mohamed Quafafou, Mourad Nachaoui, Abdelkrim Chakib
Publication date: 20-02-2020
Publisher: Springer London
Published in: Knowledge and Information Systems, Issue 8/2020
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-020-01439-2
