2021 | Original Paper | Book Chapter

Estimation of Classification Rules From Partially Classified Data

Authors: Geoffrey McLachlan, Daniel Ahfock

Published in: Data Analysis and Rationality in a Complex World

Publisher: Springer International Publishing

Abstract

We consider the situation where the observed sample contains some observations whose class of origin is known (that is, they are classified with respect to the g underlying classes of interest) and where the remaining observations in the sample are unclassified (that is, their class labels are unknown). With the class-conditional distributions taken to be known up to a vector of unknown parameters, the aim is to estimate the Bayes’ rule of allocation for assigning subsequent unclassified observations. Estimation on the basis of both the classified and unclassified data can be undertaken in a straightforward manner by fitting a g-component mixture model by maximum likelihood (ML) via the EM algorithm, provided the observed data can be assumed to be a random sample from the adopted mixture distribution. This assumption holds if the missing-data mechanism is ignorable, in the terminology pioneered by Rubin (1976). An initial likelihood approach was the so-called classification ML approach, whereby the missing labels are treated as parameters to be estimated along with the parameters of the class-conditional distributions. However, as it can lead to inconsistent estimates, attention switched to the mixture ML approach after the appearance of the EM algorithm (Dempster et al. 1977). Particular attention is given here to the asymptotic relative efficiency (ARE) of the Bayes’ rule estimated from a partially classified sample. Lastly, we briefly consider some recent results for situations where the missing-label pattern is non-ignorable for the purposes of ML estimation of the mixture model.
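The mixture ML approach described in the abstract can be sketched concretely. The following is a minimal illustration, not the authors' implementation: it fits a two-component univariate Gaussian mixture by EM, where the known labels enter the E-step as fixed 0/1 indicator responsibilities and the unclassified observations contribute posterior probabilities. The function name, the two-class univariate setting, and the label convention (−1 for unclassified) are assumptions made for the example.

```python
import numpy as np

def fit_partially_classified_mixture(x, labels, n_iter=100):
    """EM for a two-component univariate Gaussian mixture.

    x      : 1-D array of observations.
    labels : 0 or 1 for classified observations, -1 for unclassified.
    """
    # Initialise the parameters from the classified observations
    # (assumed available for both classes).
    pi = np.array([np.mean(labels[labels >= 0] == k) for k in (0, 1)])
    mu = np.array([x[labels == k].mean() for k in (0, 1)])
    sigma = np.array([x[labels == k].std(ddof=1) for k in (0, 1)])

    for _ in range(n_iter):
        # E-step: posterior probabilities of class membership.
        dens = np.stack([
            pi[k] / (sigma[k] * np.sqrt(2 * np.pi))
            * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
            for k in (0, 1)
        ], axis=1)
        tau = dens / dens.sum(axis=1, keepdims=True)
        # Classified observations keep indicator responsibilities.
        for k in (0, 1):
            tau[labels == k] = np.eye(2)[k]

        # M-step: weighted ML updates of the mixture parameters.
        nk = tau.sum(axis=0)
        pi = nk / len(x)
        mu = (tau * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((tau * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

    # Estimated Bayes rule: allocate to the class of higher posterior.
    def allocate(x_new):
        d = [pi[k] / sigma[k] * np.exp(-0.5 * ((x_new - mu[k]) / sigma[k]) ** 2)
             for k in (0, 1)]
        return int(d[1] > d[0])

    return pi, mu, sigma, allocate
```

Treating the known labels as fixed indicators in the E-step is what makes the fit use both the classified and unclassified data; it corresponds to the ignorable missing-label mechanism assumed in the abstract.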


References
Ahfock, D., McLachlan, G.J.: An apparent paradox: a classifier trained from a partially classified sample may have smaller expected error rate than that if the sample were completely classified. Preprint arXiv:1910.09189v2 (2019b)
Cannings, T.I., Fan, Y., Samworth, R.J.: Classification with imperfect training labels. Biometrika 107, 311–330 (2020)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Statist. Soc. B 39, 1–22 (1977)
Efron, B.: The efficiency of logistic regression compared to normal discriminant analysis. J. Am. Stat. Assoc. 70, 892–898 (1975)
Gallaugher, M., McNicholas, P.D.: On fractionally-supervised classification: weight selection and extension to the multivariate \(t\)-distribution. J. Classif. 36, 232–265 (2019)
Ganesalingam, S., McLachlan, G.J.: The efficiency of a linear discriminant function based on unclassified initial samples. Biometrika 65, 658–665 (1978)
Hartley, H.O., Rao, J.N.K.: Classification and estimation in analysis of variance problems. Int. Stat. Rev. 36, 141–147 (1968)
Hills, M.: Allocation rules and their error rates (with discussion). J. R. Statist. Soc. B 28, 1–31 (1966)
McLachlan, G.J.: Asymptotic results for discriminant analysis when the initial samples are misclassified. Technometrics 14, 415–422 (1972)
McLachlan, G.J.: The classification and mixture maximum likelihood approaches to cluster analysis. In: Krishnaiah, P.R., Kanal, L.N. (eds.) Handbook of Statistics, Vol. 2, pp. 199–208. North-Holland, Amsterdam (1982)
McLachlan, G.J.: Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York (1992)
McLachlan, G.J.: An iterative reclassification procedure for constructing an asymptotically optimal rule of allocation in discriminant analysis. J. Am. Stat. Assoc. 70, 365–369 (1975)
McLachlan, G.J., Basford, K.E.: Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York (1988)
McLachlan, G.J., Gordon, R.D.: Mixture models for partially unclassified data: a case study of renal venous renin levels in essential hypertension. Stat. Med. 8, 1291–1300 (1989)
McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, New York (2000)
McLachlan, G.J., Scot, D.: On the asymptotic relative efficiency of the linear discriminant function under partial nonrandom classification of the training data. Stat. Comput. Simul. 52, 452–456 (1995)
O’Neill, T.J.: Normal discrimination with unclassified observations. J. Am. Stat. Assoc. 73, 821–826 (1978)
Rubin, D.B.: Inference and missing data. Biometrika 63, 581–592 (1976)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Smith, C.A.B.: Contribution to the discussion of paper by M. Hills. J. R. Statist. Soc. B 28, 21 (1966)
Vrbik, I., McNicholas, P.D.: Fractionally-supervised classification. J. Classif. 32, 359–381 (2015)
Metadata
Title
Estimation of Classification Rules From Partially Classified Data
Authors
Geoffrey McLachlan
Daniel Ahfock
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-60104-1_17