
2016 | Original Paper | Book Chapter

9. k-Nearest Neighbor Prediction Functions

Authors: Brian Steele, John Chandler, Swarna Reddy

Published in: Algorithms for Data Science

Publisher: Springer International Publishing


Abstract

The purpose of the k-nearest neighbor prediction function is to predict a target variable from a predictor vector. Commonly, the target is a categorical variable, a label identifying the group from which the observation was drawn. The analyst has no knowledge of the membership label but does have the information coded in the attributes of the predictor vector. The predictor vector and the k-nearest neighbor prediction function together generate a prediction of membership. In addition to qualitative targets, the k-nearest neighbor prediction function may be used to predict quantitative target variables. k-Nearest neighbor prediction functions are conceptually and computationally simple and often rival far more sophisticated prediction functions with respect to accuracy. The functions are nonparametric in the sense that the mathematical basis supporting them is not a model. Instead, the k-nearest neighbor prediction function utilizes a set of training observations consisting of target and predictor vector pairs and, in essence, examines the target values of the training observations nearest to the predictor vector. If the target variable is a group membership label, the prediction is the most common label among the nearest neighbors. If the target is quantitative, then the prediction is an average of the target values associated with the nearest neighbors.
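The procedure described in the abstract can be sketched in a few lines. This is a minimal illustration, not the book's implementation (which builds on an fOrder function); the function name knn_predict and the toy data are invented for this example.

```python
# Minimal k-nearest neighbor sketch: Euclidean distance to rank the
# training observations, majority vote for a categorical target, and
# a plain average for a quantitative target.
import math
from collections import Counter

def knn_predict(train, x, k, quantitative=False):
    """train: list of (predictor_vector, target) pairs; x: new predictor vector."""
    # Sort training pairs by distance to x and keep the k nearest.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    targets = [t for _, t in neighbors]
    if quantitative:
        return sum(targets) / k                        # average of neighbor targets
    return Counter(targets).most_common(1)[0][0]       # most common neighbor label

train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn_predict(train, (1, 0), k=3))  # majority label among the 3 nearest
```

With k = 3, two of the three nearest training observations carry label "a", so "a" is predicted.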


Footnotes
1
Predictive analytics is used as a term primarily in data science. Statistics and computer science have their own names, predominantly, statistical learning and machine learning, respectively.
 
2
The boundary would be a plane.
 
3
We’re assuming that accuracy of the prediction function is estimated by applying the prediction function to the training observations and computing the proportion of correct predictions.
 
4
A point in \(\mathbb{R}^{2}\) is a 2-tuple, otherwise known as a pair.
 
5
Darkness is recorded on a scale of 0–255.
 
6
Sorting algorithm run-times are, at best, on the order of \(n\log (n)\) [56].
 
7
NASDAQ is the abbreviation for the National Association of Securities Dealers Automated Quotations system.
 
8
Centering is a type of detrending [27].
 
9
By chronologically ordered, we mean that \(i < j\) implies \(s_{i}\) was observed before \(s_{j}\).
 
10
Precisely, \(1 =\sum _{k=0}^{\infty }\alpha (1-\alpha )^{k}\). In practice, the sum is sufficiently close to 1 if the number of terms exceeds \(100/\alpha\).
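The identity is the geometric series \(\sum _{k=0}^{\infty }\alpha (1-\alpha )^{k} = 1\) for \(0 <\alpha < 1\). A quick numerical check of the \(100/\alpha\) rule of thumb (a sketch added for illustration, not part of the text) might look like:

```python
# Partial sum of the geometric series sum_{k=0}^{n} alpha*(1-alpha)^k.
# Once the number of terms exceeds 100/alpha, the truncation error
# (1-alpha)^(n+1) is negligible and the partial sum is essentially 1.
alpha = 0.05
n = int(100 / alpha)  # 2000 terms for alpha = 0.05
partial = sum(alpha * (1 - alpha) ** k for k in range(n + 1))
print(partial)        # very close to, but below, 1
```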
 
11
We will use the fOrder function programmed in the previous tutorial; it requires that the training set be stored as a dictionary.
 
12
Recall that the sample variance is, essentially, the mean squared difference between the observations and the sample mean.
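In symbols, the footnote's description corresponds to the usual estimator (with the conventional \(n-1\) divisor rather than \(n\)):

\[
s^{2} = \frac{1}{n-1}\sum _{i=1}^{n}\left (x_{i} -\bar{ x}\right )^{2},
\]

which for large \(n\) is essentially the mean squared difference between the observations \(x_{1},\ldots,x_{n}\) and the sample mean \(\bar{x}\).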
 
13
A lazy alternative is to draw independent random samples \(E_{1},\ldots,E_{k}\). A test observation may appear in more than one test set. This presents no risk of bias, but there's less information gained on the second and subsequent predictions of a test observation.
 
14
The Wisconsin breast cancer data set is widely used as a machine learning benchmark data set.
 
References
9.
N.A. Campbell, R.J. Mahon, A multivariate study of variation in two species of rock crab of genus Leptograpsus. Aust. J. Zool. 22, 417–425 (1974)
27.
A.C. Harvey, Forecasting, Structural Time Series Models and the Kalman Filter (Cambridge University Press, Cambridge, 1989)
56.
58.
B.M. Steele, Exact bagging of k-nearest neighbor learners. Mach. Learn. 74, 235–255 (2009)
63.
W.G. Van Panhuis, J. Grefenstette, S. Jung, N.S. Chok, A. Cross, H. Eng, B.Y. Lee, V. Zadorozhny, S. Brown, D. Cummings, D.S. Burke, Contagious diseases in the United States from 1888 to the present. N. Engl. J. Med. 369(22), 2152–2158 (2013)
64.
W.N. Venables, B.D. Ripley, Modern Applied Statistics with S, 4th edn. (Springer, New York, 2002)
Metadata
Title
k-Nearest Neighbor Prediction Functions
Authors
Brian Steele
John Chandler
Swarna Reddy
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-45797-0_9