Published in: Soft Computing 10/2018

01.04.2017 | Methodologies and Application

A fuzzy K-nearest neighbor classifier to deal with imperfect data

Authors: Jose M. Cadenas, M. Carmen Garrido, Raquel Martínez, Enrique Muñoz, Piero P. Bonissone


Abstract

The k-nearest neighbors method (kNN) is a nonparametric, instance-based method used for regression and classification. To classify a new instance, the kNN method computes its k nearest neighbors and generates a class value from them. Usually, this method requires that the information available in the datasets be precise and accurate, except for the existence of missing values. However, data imperfection is inevitable when dealing with real-world scenarios. In this paper, we present the kNN\(_{imp}\) classifier, a k-nearest neighbors method to perform classification from datasets with imperfect values. The importance of each neighbor in the output decision is based on its relative distance and its degree of imperfection. Furthermore, by using external parameters, the classifier enables us to define the maximum allowed imperfection, and to decide whether the final output is derived solely from the greatest weight class (the best class) or from the best class and a weighted combination of the classes closest to the best one. To test the proposed method, we performed several experiments with both synthetic and real-world datasets with imperfect data. The results, validated through statistical tests, show that the kNN\(_{imp}\) classifier is robust when working with imperfect data and maintains good performance when compared with other methods in the literature, applied to datasets with or without imperfection.
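As a rough illustration of the basic kNN step described in the abstract (find the k nearest neighbors, then derive a class from them), the following minimal sketch uses crisp numeric features and an inverse-distance vote. The function name `knn_classify`, the toy data, and the inverse-distance weighting are assumptions made for this example; the sketch does not model imperfect values and is not the kNN\(_{imp}\) weighting scheme, which the abstract does not specify in detail.

```python
import math
from collections import defaultdict


def knn_classify(train, query, k=3):
    """Classify `query` by a distance-weighted vote of its k nearest neighbors.

    `train` is a list of (feature_vector, label) pairs with crisp numeric
    features. The inverse-distance weight is an illustrative assumption,
    not the weighting used by kNN_imp.
    """
    # Keep the k training instances closest to the query (Euclidean distance).
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]

    # Accumulate one weight per class; closer neighbors contribute more.
    weights = defaultdict(float)
    for features, label in neighbors:
        weights[label] += 1.0 / (math.dist(features, query) + 1e-9)

    # The class with the greatest accumulated weight is the "best class".
    best = max(weights, key=weights.get)
    return best, dict(weights)


# Hypothetical toy dataset with two well-separated classes.
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((4.0, 4.0), "b"),
    ((4.2, 3.9), "b"),
]
label, weights = knn_classify(train, (1.1, 1.0), k=3)
```

Returning the per-class weights alongside the winning label mirrors the idea that an output could be built either from the best class alone or from several highly weighted classes.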


Footnotes
1
For example, the fuzzy entropy (Ent(\(\cdot \))) and the power of fuzzy sets (Pw(\(\cdot \))) defined by DeLuca and Termini (1972) are the following:
$$\begin{aligned} \mathrm{Ent}(A)= & {} \sum _{a\in A} \bigl (\mu _A(a)\log (\mu _A(a)) + (1-\mu _A(a))\log (1-\mu _A(a))\bigr ); \\ \mathrm{Pw}(A)= & {} \sum _{a\in A}\mu _A(a) \end{aligned}$$
where A is a fuzzy set with membership function \(\mu _A\); in the case of continuous fuzzy sets, the sum is understood as an integral.
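The two measures in this footnote can be evaluated directly for a discrete fuzzy set given as a list of membership degrees. The sketch below is only an illustration: the function names and sample sets are assumptions. It applies the leading minus sign of De Luca and Termini's original definition, so the entropy is non-negative; the expression as printed in the footnote omits that sign. Natural logarithms are used, and terms with membership 0 or 1 are taken to contribute 0.

```python
import math


def fuzzy_power(memberships):
    """Pw(A): sum of the membership degrees of a discrete fuzzy set."""
    return sum(memberships)


def fuzzy_entropy(memberships):
    """De Luca-Termini fuzzy entropy with a leading minus sign (natural log).

    Memberships equal to 0 or 1 contribute 0, the limit of x*log(x) as x -> 0.
    """
    def h(x):
        return 0.0 if x in (0.0, 1.0) else x * math.log(x)

    return -sum(h(mu) + h(1.0 - mu) for mu in memberships)


# Hypothetical sets: a crisp set has zero entropy, while memberships of 0.5
# are maximally fuzzy and give log(2) per element.
crisp = [0.0, 1.0, 1.0]
fuzzy = [0.5, 0.5, 0.5]
```

With these inputs, `fuzzy_entropy(crisp)` is zero, while `fuzzy_entropy(fuzzy)` equals three times `math.log(2)` and `fuzzy_power(fuzzy)` equals 1.5.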
 
References
Aha DW (1992) Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. Int J Man-Mach Stud 36(2):267–287
Aha DW, Kibler D, Albert KM (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66
Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) Keel data-mining software tool: data set repository, integration of algorithm and experimental analysis framework. J Mult-Valued Logic Soft Comput 17(2–3):255–287
Barua A, Mudunuri LS, Kosheleva O (2014) Why trapezoidal and triangular membership functions work so well: towards a theoretical explanation. J Uncertain Syst 8(3):164–168
Berlanga F, Rivas AR, del Jesús M, Herrera F (2010) Gp-coach genetic programming-based learning of compact and accurate fuzzy rule-based classification systems for high-dimensional problems. Inf Sci 180(8):1183–1200
Bezdek J (1981) Pattern recognition with fuzzy objective function algorithms. Plenum, New York
Cadenas JM, Garrido MC, Martínez R (2013) Nip - an imperfection processor to data mining datasets. Int J Comput Intell Syst 6(1):3–17
Cadenas JM, Garrido MC, Martínez R, Bonissone PP (2012) Extending information processing in a fuzzy random forest. Soft Comput 16(6):845–861
Cano A, Zafra A, Ventura S (2013) Weighted data gravitation classification for standard and imbalanced data. IEEE Trans Cybern 43(6):1672–1687
Clare A, King R (2001) Knowledge discovery in multi-label phenotype data. In: Proceedings of the 5th European conference on principles of data mining and knowledge discovery, Freiburg, pp 42–53
Cover T, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27
Crockett K, Bandar Z, Mclean D (2001) Growing a fuzzy decision forest. In: Proceedings of the 10th IEEE international conference on fuzzy systems, Melbourne, pp 614–617
DeLuca A, Termini S (1972) A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory. Inf Control 20(4):301–312
Derrac J, García S, Herrera F (2014) Fuzzy nearest neighbor algorithms: taxonomy, experimental analysis and prospects. Inf Sci 260:98–119
Diamond P, Kloeden P (1994) Metric spaces of fuzzy sets: theory and application. World Scientific Publishing, London
Dubois D, Prade H (1980) Fuzzy sets and system: theory and applications. Academic Press, New York
Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley, New York
Fernández A, del Jesús M, Herrera F (2009) Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets. Int J Approx Reason 50(3):561–577
Fix E, Hodges J (1989) Discriminatory analysis, nonparametric discrimination: consistency properties. Int Stat Rev 57(3):238–247
García S, Fernández A, Luengo J, Herrera F (2009) A study statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Comput 13(10):959–977
García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf Sci 180(10):2044–2064
Garrido MC, Cadenas JM, Bonissone PP (2010) A classification and regression technique to handle heterogeneous and imperfect information. Soft Comput 14(11):1165–1185
Huang Z (2002) A fuzzy k-modes algorithm for clustering categorical data. IEEE Trans Fuzzy Syst 7(4):446–452
Inoue T, Abe S (2001) Fuzzy support vector machines for pattern classification. In: Proceedings of international joint conference on neural networks, Washington, pp 1449–1454
Ishibuchi H, Yamamoto T (2005) Rule weight specification in fuzzy rule-based classification systems. IEEE Trans Fuzzy Syst 13(4):428–436
Jahromi MZ, Parvinnia E, John R (2009) A method of learning weighted similarity function to improve the performance of nearest neighbor. Inf Sci 179(17):2964–2973
Janikow CZ (1998) Fuzzy decision trees: issues and methods. IEEE Trans Syst Man Cybern Part B 28(1):1–14
Janikow CZ (2003) Fuzzy decision forest. In: Proceedings of the 22nd international conference of the North American fuzzy information processing society, Chicago, pp 480–483
Johanyák ZC, Kovács S (2005) Distance based similarity measures of fuzzy sets. In: Proceedings of the 3rd Slovakian-Hungarian joint symposium on applied machine intelligence, Herlany, pp 265–276
Kaufmann A (1975) Introduction to the theory of fuzzy subsets: fundamental theoretical elements. Academic Press, New York
Lee K, Lee K, Lee J (1999) A fuzzy decision tree induction method for fuzzy data. In: Proceedings of IEEE international fuzzy systems conference, Seoul, pp 16–21
Li D, Gu H, Zhang L (2010) A fuzzy c-means clustering algorithm based on nearest-neighbor intervals for incomplete data. Exp Syst Appl 37(10):6942–6947
Lin C, Wang S (2002) Fuzzy support vector machines. IEEE Trans Neural Netw 13(2):464–471
Madjarov G, Kocev D, Gjorgjevikj D, Džeroski S (2012) An extensive experimental comparison of methods for multi-label learning. Pattern Recognit 45(9):3084–3104
Marsala C (2009) Data mining with ensembles of fuzzy decision trees. In: Proceedings of IEEE symposium on computational intelligence and data mining, Nashville, pp 348–354
Michie D, Spiegelhalter D, Taylor C (1994) Machine learning, neural and statistical classification. Ellis Horwood, Upper Saddle River
Mitra S, Pal SK (1995) Fuzzy multi-layer perceptron, inferencing and rule generation. IEEE Trans Neural Netw 6(1):51–63
Moore RE (1979) Methods and applications of interval analysis. (SIAM) Studies in Applied Mathematics 2, Soc for Industrial and Applied Math, Philadelphia
Nauck D, Kruse R (1997) A neuro-fuzzy method to learn fuzzy classification rules from data. Fuzzy Sets Syst 89(3):277–288
Otero A, Otero J, Sánchez L, Villar JR (2006) Longest path estimation from inherently fuzzy data acquired with GPS using genetic algorithms. In: Proceedings of the international symposium on evolving fuzzy systems, Lancaster, pp 300–305
Palacios AM, Sánchez L, Couso I (2009) Extending a simple genetic cooperative-competitive learning fuzzy classifier to low quality datasets. Evolut Intell 2(1):73–84
Palacios AM, Sánchez L, Couso I (2010) Diagnosis of dyslexia with low quality data with genetic fuzzy systems. Int J Approx Reason 51(8):993–1009
Palacios AM, Sánchez L, Couso I (2011) Future performance modeling in athletism with low quality data-based genetic fuzzy systems. J Mult-Valued Logic Soft Comput 17:207–228
Palacios AM, Sánchez L, Couso I (2012) Boosting of fuzzy rules with low quality data. J Mult-Valued Logic Soft Comput 19:591–619
Palacios AM, Sánchez L, Couso I (2013) An extension of the furia classification algorithm to low quality data. Hybrid artificial intelligent systems (LNCS 8073). Springer, Berlin, pp 679–688
Palacios AM, Palacios JL, Sánchez L, Alcalá-Fdez J (2015) Genetic learning of the membership functions for mining fuzzy association rules from low quality data. Inf Sci 295:358–378
Paredes R, Vidal E (2006) Learning prototypes and distances: a prototype reduction technique based on nearest neighbor error minimization. Pattern Recognit 39(2):180–188
Paredes R, Vidal E (2006) Learning weighted metrics to minimize nearest-neighbor classification error. IEEE Trans Pattern Anal Mach Intell 28(7):1100–1110
Rumelhart DE, Mcclelland JL (1986) Parallel distributed processing. MIT Press, Cambridge
Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B (Methodological) 36(2):111–147
Torra V (2005) Fuzzy c-means for fuzzy hierarchical clustering. In: Proceedings of the 14th IEEE international conference on fuzzy systems, Reno, pp 646–651
Villar J, Otero A, Otero J, Sánchez L (2009) Taximeter verification using imprecise data from GPS. Eng Appl Artif Intell 22(2):250–260
Wang J, Neskovic P, Cooper LN (2007) Improving nearest neighbor rule with a simple adaptive distance measure. Pattern Recognit Lett 28(2):207–213
Witten IH, Frank E, Hall MA (2011) Data mining, 3rd edn. Morgan Kaufmann Publishers, San Francisco
Younes Z, Abdallah F, Denoeux T (2010) Fuzzy multi-label learning under veristic variables. In: Proceedings of the IEEE international conference on fuzzy systems, Yantai, pp 1–8
Metadata
Title
A fuzzy K-nearest neighbor classifier to deal with imperfect data
Authors
Jose M. Cadenas
M. Carmen Garrido
Raquel Martínez
Enrique Muñoz
Piero P. Bonissone
Publication date
01.04.2017
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 10/2018
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-017-2567-x
