2019 | OriginalPaper | Book chapter

19. Improving the Stability of Variable Selection for Industrial Datasets

Authors: Silvia Cateni, Valentina Colla, Vincenzo Iannino

Published in: Neural Advances in Processing Nonlinear Dynamic Signals

Publisher: Springer International Publishing

Abstract

Variable reduction is an essential step in data mining: by removing redundant and irrelevant input variables, it can improve both the performance of machine learning models and the understanding of the underlying process. The paper presents a variable selection approach that combines the dominating set procedure for redundancy analysis with a wrapper approach, in order to obtain an informative and non-redundant subset of variables while improving the stability of the selection and reducing computational complexity. The proposed approach is tested on several datasets from the UCI repository and from industrial contexts, and is compared with the exhaustive variable selection approach, which is often considered optimal in terms of system performance. Moreover, the novel method is applied to both classification and regression tasks.
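The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea (redundancy analysis through a dominating set, followed by a wrapper), not the authors' implementation. Everything specific here is an assumption made for illustration: the redundancy graph links two variables when their absolute Pearson correlation reaches a hypothetical threshold of 0.9, the dominating set is found with a simple greedy heuristic, the wrapper is a sequential forward search, and the scoring model is a leave-one-out 1-nearest-neighbour classifier on a toy dataset.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def redundancy_graph(columns, threshold=0.9):
    """Assumed redundancy criterion: edge i-j when |corr(i, j)| >= threshold."""
    adj = {i: set() for i in range(len(columns))}
    for i, j in combinations(range(len(columns)), 2):
        if abs(pearson(columns[i], columns[j])) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def greedy_dominating_set(adj):
    """Greedy heuristic: repeatedly keep the variable covering most uncovered ones."""
    uncovered, chosen = set(adj), []
    while uncovered:
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & uncovered))
        chosen.append(best)
        uncovered -= {best} | adj[best]
    return sorted(chosen)

def forward_wrapper(candidates, score):
    """Sequential forward selection: add a variable only if the score improves."""
    selected, best, remaining = [], float("-inf"), list(candidates)
    while remaining:
        top_score, top_var = max(
            ((score(selected + [v]), v) for v in remaining), key=lambda t: t[0]
        )
        if top_score <= best:
            break
        selected.append(top_var)
        remaining.remove(top_var)
        best = top_score
    return selected, best

def loo_1nn_accuracy(columns, labels, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `subset`."""
    n, hits = len(labels), 0
    for i in range(n):
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: sum((columns[k][i] - columns[k][j]) ** 2 for k in subset),
        )
        hits += labels[nearest] == labels[i]
    return hits / n

# Toy data (hypothetical): x0 is informative, x1 is a linear copy of x0, x2 is noise.
x0 = [0.0, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 1.0]
x1 = [2 * v + 1 for v in x0]
x2 = [5, 1, 4, 2, 3, 6, 0, 7]
columns = [x0, x1, x2]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

kept = greedy_dominating_set(redundancy_graph(columns))  # redundancy analysis
selected, best = forward_wrapper(
    kept, lambda subset: loo_1nn_accuracy(columns, labels, subset)
)
print(kept, selected, best)
```

In this sketch the wrapper only searches the variables that survive the redundancy step, which is where any saving over an exhaustive search of all subsets would come from; how stability is measured in the paper is not stated in the abstract and is therefore not modelled here.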

Metadata
Title
Improving the Stability of Variable Selection for Industrial Datasets
Authors
Silvia Cateni
Valentina Colla
Vincenzo Iannino
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-319-95098-3_19