Skip to main content

2017 | OriginalPaper | Buchkapitel

Bagging and Feature Selection for Classification with Incomplete Data

verfasst von : Cao Truong Tran, Mengjie Zhang, Peter Andreae, Bing Xue

Erschienen in: Applications of Evolutionary Computation

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Missing values are an unavoidable issue of many real-world datasets. Dealing with missing values is an essential requirement in classification problem, because inadequate treatment with missing values often leads to large classification errors. Some classifiers can directly work with incomplete data, but they often result in big classification errors and generate complex models. Feature selection and bagging have been successfully used to improve classification, but they are mainly applied to complete data. This paper proposes a combination of bagging and feature selection to improve classification with incomplete data. To achieve this purpose, a wrapper-based feature selection which can directly work with incomplete data is used to select suitable feature subsets for bagging. The experiments on eight incomplete datasets were designed to compare the proposed method with three other popular methods that are able to deal with incomplete data using C4.5/REPTree as classifiers and using Particle Swam Optimisation as a search technique in feature selection. Results show that the combination of bagging and feature selection can not only achieve better classification accuracy than the other methods but also generate less complex models compared to the bagging method.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electr. Eng. 40(1), 16–28 (2014)CrossRef Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electr. Eng. 40(1), 16–28 (2014)CrossRef
2.
Zurück zum Zitat Chen, H., Du, Y., Jiang, K.: Classification of incomplete data using classifier ensembles. In: 2012 International Conference on Systems and Informatics (ICSAI), pp. 2229–2232 (2012) Chen, H., Du, Y., Jiang, K.: Classification of incomplete data using classifier ensembles. In: 2012 International Conference on Systems and Informatics (ICSAI), pp. 2229–2232 (2012)
3.
Zurück zum Zitat Clerc, M., Kennedy, J.: The particle swarm-explosion, stability, and convergence in a multidimensional complex space. IEEE Trans. Evol. Comput. 6, 58–73 (2002)CrossRef Clerc, M., Kennedy, J.: The particle swarm-explosion, stability, and convergence in a multidimensional complex space. IEEE Trans. Evol. Comput. 6, 58–73 (2002)CrossRef
4.
Zurück zum Zitat Dietterich, T.G.: Ensemble methods in machine learning. In: International Workshop on Multiple Classifier Systems, pp. 1–15 (2000) Dietterich, T.G.: Ensemble methods in machine learning. In: International Workshop on Multiple Classifier Systems, pp. 1–15 (2000)
5.
Zurück zum Zitat Doquire, G., Verleysen, M.: Feature selection with missing data using mutual information estimators. Neurocomputing 90, 3–11 (2012)CrossRef Doquire, G., Verleysen, M.: Feature selection with missing data using mutual information estimators. Neurocomputing 90, 3–11 (2012)CrossRef
6.
Zurück zum Zitat García-Laencina, P.J., Sancho-Gómez, J.L., Figueiras-Vidal, A.R.: Pattern classification with missing data: a review. Neural Comput. Appl. 19, 263–282 (2010)CrossRef García-Laencina, P.J., Sancho-Gómez, J.L., Figueiras-Vidal, A.R.: Pattern classification with missing data: a review. Neural Comput. Appl. 19, 263–282 (2010)CrossRef
7.
Zurück zum Zitat Guerra-Salcedo, C., Whitley, D.: Feature selection mechanisms for ensemble creation: a genetic search perspective. In: Data Mining with Evolutionary Algorithms: Research Directions. Papers from the AAAI Workshop (1999) Guerra-Salcedo, C., Whitley, D.: Feature selection mechanisms for ensemble creation: a genetic search perspective. In: Data Mining with Evolutionary Algorithms: Research Directions. Papers from the AAAI Workshop (1999)
8.
Zurück zum Zitat Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: An update. SIGKDD Explor. Newsl. 11, 10–18 (2009)CrossRef Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: An update. SIGKDD Explor. Newsl. 11, 10–18 (2009)CrossRef
9.
Zurück zum Zitat Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Waltham (2011)MATH Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Waltham (2011)MATH
10.
Zurück zum Zitat Kennedy, J.: Particle swarm optimization. In: Encyclopedia of Machine Learning, pp. 760–766 (2011) Kennedy, J.: Particle swarm optimization. In: Encyclopedia of Machine Learning, pp. 760–766 (2011)
11.
Zurück zum Zitat Krause, S., Polikar, R.: An ensemble of classifiers approach for the missing feature problem. In: 2003 Proceedings of the International Joint Conference on Neural Networks, vol. 1, pp. 553–558 (2003) Krause, S., Polikar, R.: An ensemble of classifiers approach for the missing feature problem. In: 2003 Proceedings of the International Joint Conference on Neural Networks, vol. 1, pp. 553–558 (2003)
13.
Zurück zum Zitat Little, R.J., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, New York (2014)MATH Little, R.J., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, New York (2014)MATH
14.
Zurück zum Zitat Liu, H., Motoda, H.: Feature Selection for Knowledge Discovery and Data Mining, vol. 454. Springer, Heidelberg (2012)MATH Liu, H., Motoda, H.: Feature Selection for Knowledge Discovery and Data Mining, vol. 454. Springer, Heidelberg (2012)MATH
15.
Zurück zum Zitat Oliveira, L.S., Morita, M., Sabourin, R.: Feature selection for ensembles applied to handwriting recognition. Int. J. Doc. Anal. Recogn. (IJDAR) 8, 262–279 (2006)CrossRef Oliveira, L.S., Morita, M., Sabourin, R.: Feature selection for ensembles applied to handwriting recognition. Int. J. Doc. Anal. Recogn. (IJDAR) 8, 262–279 (2006)CrossRef
16.
Zurück zum Zitat Opitz, D., Maclin, R.: Popular ensemble methods: An empirical study. J. Artif. Intell. Res. 11, 169–198 (1999)MATH Opitz, D., Maclin, R.: Popular ensemble methods: An empirical study. J. Artif. Intell. Res. 11, 169–198 (1999)MATH
17.
Zurück zum Zitat Opitz, D.W.: Feature selection for ensembles. In: AAAI/IAAI 379–384 (1999) Opitz, D.W.: Feature selection for ensembles. In: AAAI/IAAI 379–384 (1999)
18.
Zurück zum Zitat Qian, W., Shu, W.: Mutual information criterion for feature selection from incomplete data. Neurocomputing 168, 210–220 (2015)CrossRef Qian, W., Shu, W.: Mutual information criterion for feature selection from incomplete data. Neurocomputing 168, 210–220 (2015)CrossRef
19.
Zurück zum Zitat Quinlan, J.R.: C4.5: Programs for Machine Learning. Elsevier, New York (2014) Quinlan, J.R.: C4.5: Programs for Machine Learning. Elsevier, New York (2014)
20.
Zurück zum Zitat Saar-Tsechansky, M., Provost, F.: Handling missing values when applying classification models. J. Mach. Learn. Res. 8, 1623–1657 (2007)MATH Saar-Tsechansky, M., Provost, F.: Handling missing values when applying classification models. J. Mach. Learn. Res. 8, 1623–1657 (2007)MATH
21.
Zurück zum Zitat Su, J., Zhang, H.: A fast decision tree learning algorithm. In: Proceedings of the 21st National Conference on Artificial Intelligence, vol. 1, pp. 500–505 (2006) Su, J., Zhang, H.: A fast decision tree learning algorithm. In: Proceedings of the 21st National Conference on Artificial Intelligence, vol. 1, pp. 500–505 (2006)
22.
Zurück zum Zitat Tran, C.T., Zhang, M., Andreae, P., Xue, B.: Improving performance for classification with incomplete data using wrapper-based feature selection. Evol. Intell. 9, 81–94 (2016)CrossRef Tran, C.T., Zhang, M., Andreae, P., Xue, B.: Improving performance for classification with incomplete data using wrapper-based feature selection. Evol. Intell. 9, 81–94 (2016)CrossRef
23.
Zurück zum Zitat Tran, C.T., Zhang, M., Andreae, P., Xue, B.: A wrapper feature selection approach to classification with missing data. In: Squillero, G., Burelli, P. (eds.) EvoApplications 2016. LNCS, vol. 9597, pp. 685–700. Springer, Cham (2016). doi:10.1007/978-3-319-31204-0_44CrossRef Tran, C.T., Zhang, M., Andreae, P., Xue, B.: A wrapper feature selection approach to classification with missing data. In: Squillero, G., Burelli, P. (eds.) EvoApplications 2016. LNCS, vol. 9597, pp. 685–700. Springer, Cham (2016). doi:10.​1007/​978-3-319-31204-0_​44CrossRef
24.
Zurück zum Zitat Xue, B., Zhang, M., Browne, W., Yao, X.: A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20, 606–626 (2016)CrossRef Xue, B., Zhang, M., Browne, W., Yao, X.: A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20, 606–626 (2016)CrossRef
Metadaten
Titel
Bagging and Feature Selection for Classification with Incomplete Data
verfasst von
Cao Truong Tran
Mengjie Zhang
Peter Andreae
Bing Xue
Copyright-Jahr
2017
DOI
https://doi.org/10.1007/978-3-319-55849-3_31

Premium Partner