2018 | OriginalPaper | Chapter

Statistical Discretization of Continuous Attributes Using Kolmogorov-Smirnov Test

Authors: Hadi Mohammadzadeh Abachi, Saeid Hosseini, Mojtaba Amiri Maskouni, Mohammadreza Kangavari, Ngai-Man Cheung

Published in: Databases Theory and Applications

Publisher: Springer International Publishing


Abstract

Unsupervised discretization methods apply simple rules to continuous attributes and have a low time complexity that depends mostly on the sorting procedure. Supervised discretization algorithms, by contrast, take the class labels into consideration to achieve high accuracy. Supervised discretization of continuous features faces two significant challenges. First, noisy class labels degrade the effectiveness of discretization. Second, because supervised algorithms are computationally expensive on large-scale datasets, their time complexity is dominated by the discretization stage rather than the sorting procedure. To address these challenges, we devise a statistical unsupervised method named SUFDA. SUFDA aims to produce discrete intervals by decreasing the differential entropy of the normal distribution, with low time complexity and high accuracy. The results show that our unsupervised system is more effective than other discretization baselines on large-scale datasets.
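
The two ideas the abstract relies on can be illustrated with a minimal sketch. It is not the SUFDA algorithm, whose details the abstract does not give. It assumes plain equal-frequency binning as a stand-in for a sort-dominated unsupervised discretizer, and uses the standard closed form for the differential entropy of a normal distribution, h = 0.5 ln(2πeσ²). The function names `equal_frequency_cut_points` and `normal_differential_entropy` are hypothetical and chosen only for this illustration.

```python
import math
import statistics


def normal_differential_entropy(sigma):
    """Differential entropy (in nats) of N(mu, sigma^2): 0.5 * ln(2*pi*e*sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def equal_frequency_cut_points(values, n_bins):
    """Equal-frequency cut points (illustrative stand-in, not SUFDA).

    The sort costs O(n log n); picking the cut points afterwards is O(n_bins),
    so the sorting procedure dominates the running time.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n_bins < 1 or n < n_bins:
        raise ValueError("need 1 <= n_bins <= len(values)")
    cuts = []
    for k in range(1, n_bins):
        i = k * n // n_bins
        # Place each cut halfway between the two neighbouring sorted values.
        cuts.append((ordered[i - 1] + ordered[i]) / 2)
    return cuts


# Hypothetical toy data: the integers 0..99.
data = list(range(100))
print(equal_frequency_cut_points(data, 4))  # -> [24.5, 49.5, 74.5]

# A narrower interval has a smaller standard deviation, so a normal
# distribution fitted to it has a lower differential entropy.
h_all = normal_differential_entropy(statistics.pstdev(data))
h_half = normal_differential_entropy(statistics.pstdev(data[:50]))
print(h_half < h_all)  # -> True
```

In this toy setting, splitting the range into narrower intervals lowers the entropy of the normal distribution fitted to each interval. That is one plausible reading of "decreasing differential entropy", stated here as an assumption rather than as the authors' procedure.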


Metadata
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-92013-9_25
