Skip to main content

27.02.2024

A heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering

verfasst von: Junnan Li, Qing Zhao, Shuang Liu

Erschienen in: The Journal of Supercomputing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The k nearest neighbor (KNN) classifier is one of the well-known instance-based classifiers. Nevertheless, the low efficiency in both running time and memory usage is a great challenge in the KNN classifier and its improvements due to noise and redundant samples. Although hybrid instance reduction approaches have been postulated as a good solution, they still suffer from the following issues: (a) adopted edition methods in existing hybrid instance reduction approaches are susceptible to harmful samples around the tested sample; (b) existing hybrid instance reduction approaches retain many internal samples, which contributes little to the classification accuracy and (or) leading to the low reduction rate; (c) existing hybrid instance reduction approaches rely on more than one parameter. The chief contributions of this article are that (a) a novel heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering (HIRRDKM) is proposed against the above issues; (b) a novel concept, i.e., the adaptive relative distance, is first proposed and calculated for each sample; (c) a novel edition method based on adaptive relative distance in HIRRDKM is second proposed to filter out harmful samples; (d) a novel condensing method based on adaptive relative distance and k-means clustering in HIRRDKM is third proposed to obtain condensed borderline samples from the training set without harmful samples. Experiments have proved that (a) HIRRDKM outperforms 6 state-of-the-art hybrid instance reduction methods on real data sets from various fields in weighing reduction rate and classification accuracy of KNN-based classifiers; (b) the running time of HIRRDKM is competitive.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Zhu H, Wang X, Wang R (2022) Fuzzy monotonic K-nearest neighbor versus monotonic fuzzy K-nearest neighbor. IEEE Trans Fuzzy Syst 30(9):3501–3513CrossRef Zhu H, Wang X, Wang R (2022) Fuzzy monotonic K-nearest neighbor versus monotonic fuzzy K-nearest neighbor. IEEE Trans Fuzzy Syst 30(9):3501–3513CrossRef
2.
Zurück zum Zitat Ma Y, Huang R, Yan M, Li G, Wang T (2022) Attention-based local mean K-nearest centroid neighbor classifier. Expert Syst Appl 201:117159CrossRef Ma Y, Huang R, Yan M, Li G, Wang T (2022) Attention-based local mean K-nearest centroid neighbor classifier. Expert Syst Appl 201:117159CrossRef
3.
Zurück zum Zitat Pan Z, Wang Y, Ku W (2017) A new k-harmonic nearest neighbor classifier based on the multi-local means. Expert Syst Appl 67:115–125CrossRef Pan Z, Wang Y, Ku W (2017) A new k-harmonic nearest neighbor classifier based on the multi-local means. Expert Syst Appl 67:115–125CrossRef
4.
Zurück zum Zitat Kumbure MM, Luukka P, Collan M (2020) A new fuzzy k-nearest neighbor classifier based on the Bonferroni mean. Pattern Recognit Lett 140:172–178CrossRef Kumbure MM, Luukka P, Collan M (2020) A new fuzzy k-nearest neighbor classifier based on the Bonferroni mean. Pattern Recognit Lett 140:172–178CrossRef
5.
Zurück zum Zitat Heo JP, Lin Z, Yoon SE (2019) Distance encoded product quantization for approximate K-nearest neighbor search in high-dimensional space. IEEE Trans Pattern Anal Mach Intell 41(9):2084–2097CrossRef Heo JP, Lin Z, Yoon SE (2019) Distance encoded product quantization for approximate K-nearest neighbor search in high-dimensional space. IEEE Trans Pattern Anal Mach Intell 41(9):2084–2097CrossRef
6.
Zurück zum Zitat Nikolaidis K, Rodriguez-Martinez E, Goulermas JY, Wu QH (2012) Spectral graph optimization for instance reduction. IEEE Trans Neural Netw Learn Syst 23(7):1169–1175CrossRef Nikolaidis K, Rodriguez-Martinez E, Goulermas JY, Wu QH (2012) Spectral graph optimization for instance reduction. IEEE Trans Neural Netw Learn Syst 23(7):1169–1175CrossRef
7.
Zurück zum Zitat Xuan J et al (2015) Towards effective bug triage with software data reduction techniques. IEEE Trans Knowl Data Eng 27(1):264–280CrossRef Xuan J et al (2015) Towards effective bug triage with software data reduction techniques. IEEE Trans Knowl Data Eng 27(1):264–280CrossRef
8.
Zurück zum Zitat Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern SMC-2 3:408–421MathSciNetCrossRef Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern SMC-2 3:408–421MathSciNetCrossRef
9.
Zurück zum Zitat Hart P (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14(3):515–516CrossRef Hart P (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14(3):515–516CrossRef
10.
Zurück zum Zitat Li J, Zhu Q, Wu Q (2020) A parameter-free hybrid instance selection algorithm based on local sets with natural neighbors. Appl Intell 50:1527–1541CrossRef Li J, Zhu Q, Wu Q (2020) A parameter-free hybrid instance selection algorithm based on local sets with natural neighbors. Appl Intell 50:1527–1541CrossRef
11.
Zurück zum Zitat Sánchez J, Barandela R, Marques A, Alejo R, Badenas J (2003) Analysis of new techniques to obtain quality training sets. Pattern Recognit Lett 24(7):1015–1022CrossRef Sánchez J, Barandela R, Marques A, Alejo R, Badenas J (2003) Analysis of new techniques to obtain quality training sets. Pattern Recognit Lett 24(7):1015–1022CrossRef
12.
Zurück zum Zitat Yang L, Zhu Q, Huang J, Cheng D (2017) Adaptive edited natural neighbor algorithm. Neurocomputing 230(22):427–433CrossRef Yang L, Zhu Q, Huang J, Cheng D (2017) Adaptive edited natural neighbor algorithm. Neurocomputing 230(22):427–433CrossRef
13.
Zurück zum Zitat Marchiori E (2008) Hit miss networks with applications to instance selection. J Mach Learn Res 9:997–1017MathSciNet Marchiori E (2008) Hit miss networks with applications to instance selection. J Mach Learn Res 9:997–1017MathSciNet
14.
Zurück zum Zitat Nikolaidis K, Goulermas JY, Wu QH (2011) A class boundary preserving algorithm for data condensation. Pattern Recognit 44(3):704–715CrossRef Nikolaidis K, Goulermas JY, Wu QH (2011) A class boundary preserving algorithm for data condensation. Pattern Recognit 44(3):704–715CrossRef
15.
Zurück zum Zitat Rico-Juan JR, Iñesta JM (2012) New rank methods for reducing the size of the training set using the nearest neighbor rule. Pattern Recognit Lett 33(5):654–660CrossRef Rico-Juan JR, Iñesta JM (2012) New rank methods for reducing the size of the training set using the nearest neighbor rule. Pattern Recognit Lett 33(5):654–660CrossRef
16.
Zurück zum Zitat Vallejo CG, Troyano JA, Ortega FJ (2010) InstanceRank: bringing order to datasets. Pattern Recognit Lett 31(2):131–142CrossRef Vallejo CG, Troyano JA, Ortega FJ (2010) InstanceRank: bringing order to datasets. Pattern Recognit Lett 31(2):131–142CrossRef
17.
Zurück zum Zitat Hernandezleal P, Carrascoochoa JA, MartínezTrinidad JF, Olveralopez JA (2013) Instancerank based on borders for instance selection. Pattern Recognit 46(1):365–375CrossRef Hernandezleal P, Carrascoochoa JA, MartínezTrinidad JF, Olveralopez JA (2013) Instancerank based on borders for instance selection. Pattern Recognit 46(1):365–375CrossRef
18.
Zurück zum Zitat Li J, Wang Y (2015) A new fast reduction technique based on binary nearest neighbor tree. Neurocomputing 149(3):647–1657 Li J, Wang Y (2015) A new fast reduction technique based on binary nearest neighbor tree. Neurocomputing 149(3):647–1657
19.
Zurück zum Zitat Cavalcanti GDC, Ren TI, Pereira CL (2013) ATISA: adaptive threshold-based instance selection algorithm. Expert Syst Appl 40(17):6894–6900CrossRef Cavalcanti GDC, Ren TI, Pereira CL (2013) ATISA: adaptive threshold-based instance selection algorithm. Expert Syst Appl 40(17):6894–6900CrossRef
20.
Zurück zum Zitat Leyva E, Antonio G, Raúl P (2015) Three new instance selection methods based on local sets: a comparative study with several approaches from a bi-objective perspective. Pattern Recognit 48(4):1523–1537CrossRef Leyva E, Antonio G, Raúl P (2015) Three new instance selection methods based on local sets: a comparative study with several approaches from a bi-objective perspective. Pattern Recognit 48(4):1523–1537CrossRef
21.
Zurück zum Zitat Yang L, Zhu Q, Huang J, Cheng D, Wu Q, Hong X (2018) Natural neighborhood graph-based instance reduction algorithm without parameters. Appl Soft Comput 70:279–287CrossRef Yang L, Zhu Q, Huang J, Cheng D, Wu Q, Hong X (2018) Natural neighborhood graph-based instance reduction algorithm without parameters. Appl Soft Comput 70:279–287CrossRef
22.
Zurück zum Zitat Yang L, Zhu Q, Huang J, Cheng D, Wu Q, Hong X (2019) Constraint nearest neighbor for instance reduction. Soft Comput 23:13235–13245CrossRef Yang L, Zhu Q, Huang J, Cheng D, Wu Q, Hong X (2019) Constraint nearest neighbor for instance reduction. Soft Comput 23:13235–13245CrossRef
23.
Zurück zum Zitat Khan I, Luo Z, Huang JZ, Shahzad W (2020) Variable weighting in fuzzy k-means clustering to determine the number of clusters. IEEE Trans Knowl Data Eng 23(9):1838–1853CrossRef Khan I, Luo Z, Huang JZ, Shahzad W (2020) Variable weighting in fuzzy k-means clustering to determine the number of clusters. IEEE Trans Knowl Data Eng 23(9):1838–1853CrossRef
24.
Zurück zum Zitat Zhu Q, Feng J, Huang J (2016) Natural neighbor: a self-adaptive neighborhood method without parameter k. Pattern Recognit Lett 80(1):30–36CrossRef Zhu Q, Feng J, Huang J (2016) Natural neighbor: a self-adaptive neighborhood method without parameter k. Pattern Recognit Lett 80(1):30–36CrossRef
25.
Zurück zum Zitat Zhu Y, Jia C, Li G, Song J (2020) Inspector: a lysine succinylation predictor based on edited nearest-neighbor undersampling and adaptive synthetic oversampling. Anal Biochem 593:113592CrossRef Zhu Y, Jia C, Li G, Song J (2020) Inspector: a lysine succinylation predictor based on edited nearest-neighbor undersampling and adaptive synthetic oversampling. Anal Biochem 593:113592CrossRef
26.
Zurück zum Zitat Aziz Y, Memon KH (2023) Fast geometrical extraction of nearest neighbors from multi-dimensional data. Pattern Recognit 136:109183CrossRef Aziz Y, Memon KH (2023) Fast geometrical extraction of nearest neighbors from multi-dimensional data. Pattern Recognit 136:109183CrossRef
27.
Zurück zum Zitat Zhao Y, Wang Y, Zhang J, Fu CW, Xu M, Moritz D (2022) KD-Box: line-segment-based KD-tree for interactive exploration of large-scale time-series data. IEEE Trans Vis Comput Graph 28(1):890–900CrossRef Zhao Y, Wang Y, Zhang J, Fu CW, Xu M, Moritz D (2022) KD-Box: line-segment-based KD-tree for interactive exploration of large-scale time-series data. IEEE Trans Vis Comput Graph 28(1):890–900CrossRef
28.
Zurück zum Zitat Mohammadi M, Hofman W, Tan YH (2019) A comparative study of ontology matching systems via inferential statistics. IEEE Trans Knowl Data Eng 31(4):615–628CrossRef Mohammadi M, Hofman W, Tan YH (2019) A comparative study of ontology matching systems via inferential statistics. IEEE Trans Knowl Data Eng 31(4):615–628CrossRef
29.
Zurück zum Zitat Trabelsi A, Elouedi Z, Lefevre E (2023) An ensemble classifier through rough set reducts for handling data with evidential attributes. Inf Sci 635:414–429CrossRef Trabelsi A, Elouedi Z, Lefevre E (2023) An ensemble classifier through rough set reducts for handling data with evidential attributes. Inf Sci 635:414–429CrossRef
Metadaten
Titel
A heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering
verfasst von
Junnan Li
Qing Zhao
Shuang Liu
Publikationsdatum
27.02.2024
Verlag
Springer US
Erschienen in
The Journal of Supercomputing
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05885-x

Premium Partner