Skip to main content
Top
Published in: Pattern Analysis and Applications 4/2018

13-04-2018 | Original Article

A regression model based on the nearest centroid neighborhood

Authors: V. García, J. S. Sánchez, A. I. Marqués, R. Martínez-Peláez

Published in: Pattern Analysis and Applications | Issue 4/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The renowned k-nearest neighbor decision rule is widely used for classification tasks, where the label of any new sample is estimated based on a similarity criterion defined by an appropriate distance function. It has also been used successfully for regression problems where the purpose is to predict a continuous numeric label. However, some alternative neighborhood definitions, such as the surrounding neighborhood, have considered that the neighbors should fulfill not only the proximity property, but also a spatial location criterion. In this paper, we explore the use of the k-nearest centroid neighbor rule, which is based on the concept of surrounding neighborhood, for regression problems. Two support vector regression models were executed as reference. Experimentation over a wide collection of real-world data sets and using fifteen odd different values of k demonstrates that the regression algorithm based on the surrounding neighborhood significantly outperforms the traditional k-nearest neighborhood method and also a support vector regression model with a RBF kernel.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Literature
1.
go back to reference Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult-Valued Log Soft Comput 17:255–287 Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult-Valued Log Soft Comput 17:255–287
2.
go back to reference Biau G, Devroye L, Dujimovič V, Krzyzak A (2012) An affine invariant -nearest neighbor regression estimate. J Multivar Anal 112:24–34MathSciNetCrossRef Biau G, Devroye L, Dujimovič V, Krzyzak A (2012) An affine invariant -nearest neighbor regression estimate. J Multivar Anal 112:24–34MathSciNetCrossRef
3.
go back to reference Buza K, Nanopoulos A, Nagy G (2015) Nearest neighbor regression in the presence of bad hubs. Knowl-Based Syst 86:250–260CrossRef Buza K, Nanopoulos A, Nagy G (2015) Nearest neighbor regression in the presence of bad hubs. Knowl-Based Syst 86:250–260CrossRef
4.
go back to reference Caruana R, Niculescu-Mizil A (2004) Data mining in metric space: an empirical analysis of supervised learning performance criteria. In: 10th ACM SIGKDD international conference on knowledge discovery and data mining. New York, pp 69–78 Caruana R, Niculescu-Mizil A (2004) Data mining in metric space: an empirical analysis of supervised learning performance criteria. In: 10th ACM SIGKDD international conference on knowledge discovery and data mining. New York, pp 69–78
5.
go back to reference Chaudhuri B (1996) A new definition of neighborhood of a point in multi-dimensional space. Pattern Recognit Lett 17(1):11–17CrossRef Chaudhuri B (1996) A new definition of neighborhood of a point in multi-dimensional space. Pattern Recognit Lett 17(1):11–17CrossRef
6.
go back to reference Cheng CB, Lee E (1999) Nonparametric fuzzy regression \(k\)-nn and kernel smoothing techniques. Comput Math Appl 38(3–4):239–251MathSciNetCrossRef Cheng CB, Lee E (1999) Nonparametric fuzzy regression \(k\)-nn and kernel smoothing techniques. Comput Math Appl 38(3–4):239–251MathSciNetCrossRef
7.
go back to reference Dasarathy B (1990) Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Press, Los Alomitos Dasarathy B (1990) Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Press, Los Alomitos
8.
go back to reference Dell’Acqua P, Belloti F, Berta R, Gloria AD (2015) Time-aware multivariate nearest neighbor regression methods for traffic flow prediction. IEEE Trans Intell Transp Syst 16(6):3393–3402CrossRef Dell’Acqua P, Belloti F, Berta R, Gloria AD (2015) Time-aware multivariate nearest neighbor regression methods for traffic flow prediction. IEEE Trans Intell Transp Syst 16(6):3393–3402CrossRef
9.
go back to reference Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30MathSciNetMATH Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30MathSciNetMATH
10.
go back to reference Eronen AJ, Klapuri AP (2010) Music tempo estimation with \(k\)-nn regression. IEEE Trans Audio Speech Lang Process 18(1):50–57CrossRef Eronen AJ, Klapuri AP (2010) Music tempo estimation with \(k\)-nn regression. IEEE Trans Audio Speech Lang Process 18(1):50–57CrossRef
11.
go back to reference García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf Sci 130:2044–2064CrossRef García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf Sci 130:2044–2064CrossRef
12.
go back to reference García V, Sánchez JS, Martín-Félez R, Mollineda RA (2012) Surrounding neighborhood-based SMOTE for learning from imbalanced data sets. Prog Artif Intell 1(4):347–362CrossRef García V, Sánchez JS, Martín-Félez R, Mollineda RA (2012) Surrounding neighborhood-based SMOTE for learning from imbalanced data sets. Prog Artif Intell 1(4):347–362CrossRef
13.
go back to reference Guyader A, Hengartner N (2013) On the mutual nearest neighbors estimate in regression. J Mach Learn Res 14:2361–2376MathSciNetMATH Guyader A, Hengartner N (2013) On the mutual nearest neighbors estimate in regression. J Mach Learn Res 14:2361–2376MathSciNetMATH
14.
go back to reference Hu C, Jain G, Zhang P, Schmidt C, Gomadam P, Gorka T (2014) Data-driven method based on particle swarm optimization and k-nearest neighbor regression for estimating capacity of lithium-ion battery. Appl Energy 129:49–55CrossRef Hu C, Jain G, Zhang P, Schmidt C, Gomadam P, Gorka T (2014) Data-driven method based on particle swarm optimization and k-nearest neighbor regression for estimating capacity of lithium-ion battery. Appl Energy 129:49–55CrossRef
15.
go back to reference Lee SY, Kang P, Cho S (2014) Probabilistic local reconstruction for \(k\)-nn regression and its application to virtual metrology in semiconductor manufacturing. Neurocomputing 131:427–439CrossRef Lee SY, Kang P, Cho S (2014) Probabilistic local reconstruction for \(k\)-nn regression and its application to virtual metrology in semiconductor manufacturing. Neurocomputing 131:427–439CrossRef
16.
go back to reference Leon F, Popescu E (2017) Using large margin nearest neighbor regression algorithm to predict student grades based on social media traces. In: International conference in methodologies and intelligent systems for technology enhanced learning. Porto, Portugal, pp 12–19 Leon F, Popescu E (2017) Using large margin nearest neighbor regression algorithm to predict student grades based on social media traces. In: International conference in methodologies and intelligent systems for technology enhanced learning. Porto, Portugal, pp 12–19
17.
18.
go back to reference Ounpraseuth S, Lensing SY, Spencer HJ, Kodell RL (2012) Estimating misclassification error: a closer look at cross-validation based methods. BMC Res Notes 5(656):1–12 Ounpraseuth S, Lensing SY, Spencer HJ, Kodell RL (2012) Estimating misclassification error: a closer look at cross-validation based methods. BMC Res Notes 5(656):1–12
19.
go back to reference Sánchez JS, Marqués AI (2002) Enhanced neighbourhood specifications for pattern classification. In: Pattern recognition and string matching, pp 673–702CrossRef Sánchez JS, Marqués AI (2002) Enhanced neighbourhood specifications for pattern classification. In: Pattern recognition and string matching, pp 673–702CrossRef
20.
go back to reference Sánchez JS, Pla F, Ferri FJ (1998) Improving the k-NCN classification rule through heuristic modifications. Pattern Recognit Lett 19(13):1165–1170CrossRef Sánchez JS, Pla F, Ferri FJ (1998) Improving the k-NCN classification rule through heuristic modifications. Pattern Recognit Lett 19(13):1165–1170CrossRef
21.
go back to reference Shevade SK, Keerthi SS, Bhattacharyya C, Murthy KRK (2000) Improvements to the SMO algorithm for SVM regression. IEEE Trans Neural Netw 11(5):1188–1193CrossRef Shevade SK, Keerthi SS, Bhattacharyya C, Murthy KRK (2000) Improvements to the SMO algorithm for SVM regression. IEEE Trans Neural Netw 11(5):1188–1193CrossRef
22.
go back to reference Song Y, Liang J, Lu J, Zhao X (2017) An efficient instance selection algorithm for \(k\) nearest neighbor regression. Neurocomputing 251(16):26–34CrossRef Song Y, Liang J, Lu J, Zhao X (2017) An efficient instance selection algorithm for \(k\) nearest neighbor regression. Neurocomputing 251(16):26–34CrossRef
23.
go back to reference Treiber N, Kramer O (2015) Evolutionary feature weighting for wind power prediction with nearest neighbor regression. In: IEEE congress on evolutionary computation. Sendai, Japan, pp 332–337 Treiber N, Kramer O (2015) Evolutionary feature weighting for wind power prediction with nearest neighbor regression. In: IEEE congress on evolutionary computation. Sendai, Japan, pp 332–337
24.
go back to reference Xiao Y, Griffin MP, Lake DE, Moorman JR (2010) Nearest-neighbor and logistic regression analyses of clinical and heart rate characteristics in the early diagnosis of neonatal sepsis. Med Decis Making 30(2):258–266CrossRef Xiao Y, Griffin MP, Lake DE, Moorman JR (2010) Nearest-neighbor and logistic regression analyses of clinical and heart rate characteristics in the early diagnosis of neonatal sepsis. Med Decis Making 30(2):258–266CrossRef
25.
go back to reference Yang S, Zhao C (2006) Regression nearest neighbor in face recognition. In: 18th International conference on pattern recognition. Hong Kong, China, pp 515–518 Yang S, Zhao C (2006) Regression nearest neighbor in face recognition. In: 18th International conference on pattern recognition. Hong Kong, China, pp 515–518
26.
go back to reference Yao Z, Ruzo W (2006) A regression-based k nearest neighbor algorithm for gene function prediction from heterogeneous data. BMC Bioinform 7(1):1–11CrossRef Yao Z, Ruzo W (2006) A regression-based k nearest neighbor algorithm for gene function prediction from heterogeneous data. BMC Bioinform 7(1):1–11CrossRef
27.
go back to reference Yu J, Hong C (2017) Exemplar-based 3D human pose estimation with sparse spectral embedding. Neurocomputing 269:82–89CrossRef Yu J, Hong C (2017) Exemplar-based 3D human pose estimation with sparse spectral embedding. Neurocomputing 269:82–89CrossRef
28.
go back to reference Zhang J, Yim YS, Yang J (1997) Intelligent selection of instances for prediction functions in lazy learning algorithms. Artif Intell Rev 11(1–5):175–191CrossRef Zhang J, Yim YS, Yang J (1997) Intelligent selection of instances for prediction functions in lazy learning algorithms. Artif Intell Rev 11(1–5):175–191CrossRef
Metadata
Title
A regression model based on the nearest centroid neighborhood
Authors
V. García
J. S. Sánchez
A. I. Marqués
R. Martínez-Peláez
Publication date
13-04-2018
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 4/2018
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-018-0706-3

Other articles of this Issue 4/2018

Pattern Analysis and Applications 4/2018 Go to the issue

Premium Partner