
2016 | Original Paper | Book Chapter

9. k-Nearest Neighbor Prediction Functions

Authors: Brian Steele, John Chandler, Swarna Reddy

Published in: Algorithms for Data Science

Publisher: Springer International Publishing


Abstract

The purpose of the k-nearest neighbor prediction function is to predict a target variable from a predictor vector. Commonly, the target is a categorical variable, a label identifying the group from which the observation was drawn. The analyst has no knowledge of the membership label but does have the information coded in the attributes of the predictor vector. The predictor vector and the k-nearest neighbor prediction function together generate a prediction of membership. In addition to qualitative targets, the k-nearest neighbor prediction function may be used to predict quantitative target variables. k-Nearest neighbor prediction functions are conceptually and computationally simple and often rival far more sophisticated prediction functions with respect to accuracy. The functions are nonparametric in the sense that the mathematical basis supporting them is not a model. Instead, the k-nearest neighbor prediction function utilizes a set of training observations consisting of target and predictor vector pairs and, in essence, examines the target values of the training observations nearest to the predictor vector. If the target variable is a group membership label, the prediction is the most common label among the nearest neighbors. If the target is quantitative, then the prediction is an average of the target values associated with the nearest neighbors.
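The procedure described in the abstract can be sketched in a few lines. This is a minimal illustration, not the book's implementation (which builds on an fOrder function); the function name knn_predict and the toy data are invented for this example.

```python
# Minimal k-nearest neighbor sketch: Euclidean distance to rank the
# training observations, majority vote for a categorical target, and
# a plain average for a quantitative target.
import math
from collections import Counter

def knn_predict(train, x, k, quantitative=False):
    """train: list of (predictor_vector, target) pairs; x: new predictor vector."""
    # Sort training pairs by distance to x and keep the k nearest.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    targets = [t for _, t in neighbors]
    if quantitative:
        return sum(targets) / k                        # average of neighbor targets
    return Counter(targets).most_common(1)[0][0]       # most common neighbor label

train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn_predict(train, (1, 0), k=3))  # majority label among the 3 nearest
```

With k = 3, two of the three nearest training observations carry label "a", so "a" is predicted.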


Footnotes
1
Predictive analytics is used as a term primarily in data science. Statistics and computer science have their own names, predominantly, statistical learning and machine learning, respectively.
 
2
The boundary would be a plane.
 
3
We’re assuming that accuracy of the prediction function is estimated by applying the prediction function to the training observations and computing the proportion of correct predictions.
 
4
A point in \(\mathbb{R}^{2}\) is a 2-tuple, otherwise known as a pair.
 
5
Darkness is recorded on a scale of 0–255.
 
6
Sorting algorithm run-times are, at best, on the order of \(n\log (n)\) [56].
 
7
NASDAQ is the abbreviation for the National Association of Securities Dealers Automated Quotations system.
 
8
Centering is a type of detrending [27].
 
9
By chronologically ordered, we mean that \(i < j\) implies \(s_{i}\) was observed before \(s_{j}\).
 
10
Precisely, \(1 =\sum _{k=0}^{\infty }\alpha (1-\alpha )^{k}\). In practice, the sum is sufficiently close to 1 if the number of terms exceeds \(100/\alpha\).
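The identity is the geometric series \(\sum _{k=0}^{\infty }\alpha (1-\alpha )^{k} = 1\) for \(0 <\alpha < 1\). A quick numerical check of the \(100/\alpha\) rule of thumb (a sketch added for illustration, not part of the text) might look like:

```python
# Partial sum of the geometric series sum_{k=0}^{n} alpha*(1-alpha)^k.
# Once the number of terms exceeds 100/alpha, the truncation error
# (1-alpha)^(n+1) is negligible and the partial sum is essentially 1.
alpha = 0.05
n = int(100 / alpha)  # 2000 terms for alpha = 0.05
partial = sum(alpha * (1 - alpha) ** k for k in range(n + 1))
print(partial)        # very close to, but below, 1
```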
 
11
We will use the fOrder function programmed in the previous tutorial; it requires that the training set be stored as a dictionary.
 
12
Recall that the sample variance is, essentially, the mean squared difference between the observations and the sample mean.
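In symbols, the footnote's description corresponds to the usual estimator (with the conventional \(n-1\) divisor rather than \(n\)):

\[
s^{2} = \frac{1}{n-1}\sum _{i=1}^{n}\left (x_{i} -\bar{ x}\right )^{2},
\]

which for large \(n\) is essentially the mean squared difference between the observations \(x_{1},\ldots,x_{n}\) and the sample mean \(\bar{x}\).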
 
13
A lazy alternative is to draw independent random samples \(E_{1},\ldots,E_{k}\). A test observation may appear in more than one test set. This presents no risk of bias, but there's less information gained on the second and subsequent predictions of a test observation.
 
14
The Wisconsin breast cancer data set is widely used as a machine learning benchmark data set.
 
References
9.
N.A. Campbell, R.J. Mahon, A multivariate study of variation in two species of rock crab of genus Leptograpsus. Aust. J. Zool. 22, 417–425 (1974)
27.
A.C. Harvey, Forecasting, Structural Time Series Models and the Kalman Filter (Cambridge University Press, Cambridge, 1989)
56.
58.
B.M. Steele, Exact bagging of k-nearest neighbor learners. Mach. Learn. 74, 235–255 (2009)
63.
W.G. Van Panhuis, J. Grefenstette, S. Jung, N.S. Chok, A. Cross, H. Eng, B.Y. Lee, V. Zadorozhny, S. Brown, D. Cummings, D.S. Burke, Contagious diseases in the United States from 1888 to the present. N. Engl. J. Med. 369(22), 2152–2158 (2013)
64.
W.N. Venables, B.D. Ripley, Modern Applied Statistics with S, 4th edn. (Springer, New York, 2002)
Metadata
Title
k-Nearest Neighbor Prediction Functions
Authors
Brian Steele
John Chandler
Swarna Reddy
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-45797-0_9