Data preprocessing in predictive data mining

Published online by Cambridge University Press:  09 January 2019

Stamatios-Aggelos N. Alexandropoulos, Sotiris B. Kotsiantis and Michael N. Vrahatis
Affiliation:
Computational Intelligence Laboratory (CILab), Department of Mathematics, University of Patras, GR-26110 Patras, Greece e-mail: alekst@math.upatras.gr, sotos@math.upatras.gr, vrahatis@math.upatras.gr

Abstract

Many factors influence the success of data mining on a given problem. Two of the most important are the representation and the quality of the dataset. Specifically, if the data contain much redundant, irrelevant, noisy, or unreliable information, knowledge discovery becomes very difficult. It is well known that data preparation steps require significant processing time in machine learning tasks. It would be very useful to have preprocessing algorithms that perform reliably and effectively across all datasets, but this is impossible. We therefore present the most well-known and widely used up-to-date algorithms for each step of data preprocessing in the framework of predictive data mining.

Type
Survey Article
Copyright
© Cambridge University Press, 2019 
