Published in: Pattern Analysis and Applications 1/2021

23 June 2020 | Theoretical advances

Nonparametric “anti-Bayesian” quantile-based pattern classification

Authors: Fatemeh Mahmoudi, Mostafa Razmkhah, B. John Oommen

Abstract

Parametric and nonparametric pattern recognition have been studied for almost a century within a Bayesian paradigm, which is, in turn, founded on the principles of Bayes' theorem. It is well known that the accuracy of the Bayes classifier cannot be exceeded. Typically, Bayesian classification reduces to comparing the testing sample to the mean or median of the respective distributions. Recently, Oommen and his co-authors presented a pioneering and non-intuitive paradigm, namely that of achieving the classification by comparing the testing sample with another descriptor, which could be quite distant from the mean. This paradigm has been termed “anti-Bayesian,” and it essentially uses the quantiles of the distributions to achieve the pattern recognition. Such classifiers attain the optimal Bayesian accuracy for symmetric distributions even though they operate with a non-intuitive philosophy. While this paradigm has been applied in a number of domains (briefly explained in the body of this paper), its application to nonparametric domains has been limited. This paper explains, in detail, how such quantile-based classification can be extended to the nonparametric world, using both traditional and kernel-based strategies. The paper analyzes the methodology of such nonparametric schemes and their robustness. From a fundamental perspective, the paper uses large sample theory to derive strong asymptotic results on the equivalence between the parametric and nonparametric schemes for large samples. Apart from the new theoretical results, the paper also presents experimental results demonstrating their power. These results pertain to artificial data sets and to a real-life breast cancer data set obtained from the University Hospital Centre of Coimbra. The experimental results confirm the power of the proposed “anti-Bayesian” procedure, especially when approached from a nonparametric perspective.
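
The quantile-based idea in the abstract can be illustrated with a minimal one-dimensional, two-class sketch. It is not the authors' algorithm: the function names `fit_threshold` and `classify`, the quantile level `p = 2/3`, and the synthetic Gaussian data are all assumptions made for illustration. Each class is summarized by an empirical quantile on the side facing the other class rather than by its mean, and a test sample is assigned according to the midpoint between the two quantile points. For two symmetric distributions of equal shape, that midpoint coincides with the midpoint of the means, which is why the quantile-based rule can match the mean-based one.

```python
import numpy as np

def fit_threshold(left_train, right_train, p=2/3):
    """Summarize each class by an empirical quantile instead of its mean.

    Assumes the class in `left_train` lies to the left of `right_train`.
    The level p = 2/3 is an illustrative choice, not taken from the paper.
    """
    left = np.asarray(left_train, dtype=float)
    right = np.asarray(right_train, dtype=float)
    left_point = np.quantile(left, p)        # upper quantile of the left class
    right_point = np.quantile(right, 1 - p)  # lower quantile of the right class
    # The decision boundary is placed midway between the two quantile points.
    threshold = 0.5 * (left_point + right_point)
    return left_point, right_point, threshold

def classify(x, threshold):
    """Return 0 for the left class and 1 for the right class."""
    return np.where(np.asarray(x, dtype=float) <= threshold, 0, 1)

# Synthetic, symmetric training data (hypothetical; for demonstration only).
rng = np.random.default_rng(0)
left_samples = rng.normal(loc=0.0, scale=1.0, size=2000)
right_samples = rng.normal(loc=3.0, scale=1.0, size=2000)

left_point, right_point, threshold = fit_threshold(left_samples, right_samples)
mean_midpoint = 0.5 * (left_samples.mean() + right_samples.mean())

print("quantile points:", left_point, right_point)
print("quantile-based threshold:", threshold)
print("mean-based threshold:", mean_midpoint)
print("labels for [0.5, 2.5]:", classify([0.5, 2.5], threshold))
```

Because only empirical quantiles of the training samples are used, the sketch makes no assumption about the form of the underlying distributions. A kernel-smoothed quantile estimate could be substituted for `np.quantile` without changing the rest of the code.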

Footnotes
1
Over the last century, there have, indeed, been tens of thousands of papers describing the art and science of Bayesian classification, for a myriad of distributions and applications. In this paper, we do not attempt a survey of the field.
 
2
We are very grateful to the anonymous referee of the previous version of the paper, who requested this.
 
3
Initially, the authors of [31] stated that the classification was based on the order statistics of the distribution, and this was later rectified [33].
 
4
Without going into too many details, we refer the reader to [5], which is a key reference in this field.
 
5
To be fair to the authors of [12, 22, 34, 35], one must grant them the credit that they were able to achieve their nonparametric results by using the “anti-Bayesian” paradigm in multidimensions, as opposed to unidimensions, as we have done here!
 
6
The proof of the theorem is omitted, since it is found in the literature. Also, we refer the interested reader to [6] for more information about the various types of convergence.
 
7
It is pertinent to mention that the accuracy of any classifier can never exceed that of a Bayesian classifier. The amazing thing is that we have been able to attain an accuracy quite close to the optimal, even though we have worked in a counterintuitive manner, and also made no assumption about the underlying distribution!
 
8
For more details about outliers in statistical analysis, we refer the reader to [8, 16, 25, 27].
 
9
The data may be obtained from the UCI Repository of Machine Learning databases at archive.ics.uci.edu/ml.
 
References
1. Ahsanullah M, Nevzorov VB (2005) Order statistics: examples and exercises. Nova Publishers, Hauppauge
2. Aitkin M, Wilson GT (1980) Mixture models, outliers, and the EM algorithm. Technometrics 22(3):325–331
3.
4. Altman N, Léger C (1995) Bandwidth selection for kernel distribution function estimation. J Stat Plan Inference 46(2):195–214
5. Arnold BC, Balakrishnan N, Nagaraja HN (2008) A first course in order statistics. SIAM, Philadelphia
6. Athreya KB, Lahiri SN (2006) Measure theory and probability theory. Springer, Berlin
7. Azzalini A (1981) A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika 68(1):326–328
8. Barnett V, Lewis T (1978) Outliers in statistical data. Wiley, Hoboken
10. Bishop CM (2006) Pattern recognition and machine learning (information science and statistics). Springer, Berlin, Heidelberg
11. David HA, Nagaraja HN (2004) Order statistics. Wiley, Hoboken
12. Hammer H, Yazidi A, Oommen BJ (2017) “Anti-Bayesian” flat and hierarchical clustering using symmetric quantiloids. Inf Sci 418–419:495–512
13. Hawkins DM (1980) Identification of outliers. Chapman and Hall, London
14. Hollander M, Wolfe DA, Chicken E (2013) Nonparametric statistical methods. Wiley, Hoboken
15. Hu L (2015) A note on order statistics-based parametric pattern classification. Pattern Recognit 48(1):43–49
16. Huber PJ (2011) Robust statistics. In: International encyclopedia of statistical science. pp 1248–1251
17. Kothari CR (2004) Research methodology: methods and techniques. New Age International, New Delhi
18. Leech NL, Onwuegbuzie AJ (2002) A call for greater use of nonparametric statistics. In: Mid-south educational research association annual meeting
20. Nguyen-Trang T, Vo-Van T (2017) A new approach for determining the prior probabilities in the classification problem by Bayesian method. Adv Data Anal Classif 11:629–643
21. Oommen BJ, Thomas A (2014) Optimal order statistics-based “anti-Bayesian” parametric pattern classification for the exponential family. Pattern Recognit 47:40–55
22. Oommen BJ, Khoury R, Schmidt A (2015) Text classification using novel “anti-Bayesian” techniques. In: Nunez M, Nguyen N, Camacho D, Trawinski B (eds) Computational collective intelligence. Lecture notes in computer science, vol 9329. pp 1–15
23. Patrício M, Pereira J, Crisóstomo J, Matafome P, Gomes M, Seiça R, Caramelo F (2018) Using resistin, glucose, age and BMI to predict the presence of breast cancer. BMC Cancer 18:29
24. Rosenblatt M (1956) Remarks on some nonparametric estimates of a density function. Ann Math Stat 27(3):832–837
25. Rousseeuw PJ, Leroy AM (2005) Robust regression and outlier detection. Wiley, Hoboken
26. Santhanam V, Morariu VI, Harwood D, Davis LS (2016) A non-parametric approach to extending generic binary classifiers for multi-classification. Pattern Recognit 58:149–158
27. Scott DW (2004) Partial mixture estimation and outlier detection in data and regression. In: Hubert M, Pison G, Struyf A, Van Aelst S (eds) Theory and applications of recent robust methods. Statistics for industry and technology. pp 297–306
28. Serfling RJ (2009) Approximation theorems of mathematical statistics. Wiley, Hoboken
29.
30. Thomas A, Oommen BJ (2012) Optimal “anti-Bayesian” parametric pattern classification for the exponential family using order statistics criteria. In: Alvarez L, Mejail M, Gomez L, Jacobo J (eds) Progress in pattern recognition, image analysis, computer vision, and applications. CIARP 2012. Lecture notes in computer science, vol 7441. pp 1–13
31. Thomas A, Oommen BJ (2013) The fundamental theory of optimal “anti-Bayesian” parametric pattern classification using order statistics criteria. Pattern Recognit 46(1):376–388
32. Thomas A, Oommen BJ (2013) Order statistics-based parametric classification for multi-dimensional distributions. Pattern Recognit 46(12):3472–3482
33. Thomas A, Oommen BJ (2014) Corrigendum to three papers that deal with “anti-Bayesian” pattern recognition. Pattern Recognit 47(6):2301–2302
34. Thomas A, Oommen BJ (2013) Ultimate order statistics-based prototype reduction schemes. In: Cranefield S, Nayak A (eds) AI 2013: Advances in artificial intelligence. AI 2013. Lecture notes in computer science, vol 8272. pp 421–433
35. Thomas A, Oommen BJ (2013) A novel Border Identification algorithm based on an “anti-Bayesian” paradigm. In: Wilson R, Hancock E, Bors A, Smith W (eds) Computer analysis of images and patterns. CAIP 2013. Lecture notes in computer science, vol 8047. pp 196–203
Metadata
Title
Nonparametric “anti-Bayesian” quantile-based pattern classification
Authors
Fatemeh Mahmoudi
Mostafa Razmkhah
B. John Oommen
Publication date
23 June 2020
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 1/2021
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-020-00903-7
