Published in: International Journal of Parallel Programming 2/2018

09.02.2017

Parallel Asynchronous Strategies for the Execution of Feature Selection Algorithms

By: Jorge Silva, Ana Aguiar, Fernando Silva


Abstract

Reducing the dimensionality of a dataset is a fundamental step in building a classification model. Feature selection is the process of selecting a smaller subset of features from the original set in order to improve the performance of the classification model. The problem is known to be NP-hard, and despite the existence of several algorithms, none outperforms the others in all scenarios. Because of the problem's complexity, feature selection algorithms usually have to compromise the quality of their solutions in order to execute in a practicable amount of time. Parallel computing techniques emerge as a potential way to tackle this problem. Several approaches already execute feature selection in parallel using synchronous models, which are preferred for their simplicity and because they can be combined with any feature selection algorithm. However, synchronous models introduce pausing points into the execution flow, which degrade parallel performance. In this paper, we discuss the challenges of executing feature selection algorithms in parallel using asynchronous models, and present a feature selection algorithm that favours these models. Furthermore, we present two strategies for the asynchronous parallel execution not only of our algorithm but of any other feature selection approach. The first strategy solves the problem using the distributed-memory paradigm, while the second exploits shared memory. We evaluate the parallel performance of our strategies using up to 32 cores. The results show near-linear speedups for both strategies, with the shared-memory strategy outperforming the distributed one. Additionally, we provide an example of adapting our strategies to execute Sequential Forward Search asynchronously, and compare this version against a synchronous one. Our results show that, by using an asynchronous strategy, we save an average of 7.5% of the execution time.
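The shared-memory strategy described in the abstract can be sketched in miniature: workers pull candidate feature subsets from a shared task queue, evaluate them, cache scores in a hash table guarded by a lock, and push successor subsets back without any global synchronization barrier. This is a minimal illustration, not the paper's implementation; the `evaluate` function below is a stand-in assumption, where a real run would train a classifier on the selected feature columns.

```python
import queue
import threading

N_FEATURES = 6
MAX_SIZE = 3          # stop expanding subsets larger than this
scores = {}           # shared hash table: subset -> score
scores_lock = threading.Lock()
tasks = queue.Queue()

def evaluate(subset):
    # Stand-in evaluation: reward subsets containing features 1 and 3,
    # penalising subset size. A real evaluator would train a classifier.
    return len({1, 3} & set(subset)) / (1 + len(subset))

def worker():
    while True:
        subset = tasks.get()
        if subset is None:           # poison pill: shut this worker down
            tasks.task_done()
            return
        with scores_lock:
            seen = subset in scores
        if not seen:
            s = evaluate(subset)     # expensive step, done outside the lock
            with scores_lock:
                scores[subset] = s
            if len(subset) < MAX_SIZE:
                for f in range(N_FEATURES):   # generate successor subsets
                    if f not in subset:
                        tasks.put(tuple(sorted(subset + (f,))))
        tasks.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for f in range(N_FEATURES):          # seed the search with singletons
    tasks.put((f,))
tasks.join()                         # wait until every queued task is done
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()

best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # → (1, 3) 0.667
```

Note that no worker ever waits for the others to finish a search level: each one proceeds to whatever task is next in the queue, which is precisely the property that removes the pausing points of a synchronous level-by-level search.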


Footnotes
1
A task is the process of evaluating a subset after it has been generated.
 
2
In order to avoid pseudo-code redundancy, the ProcessTask function refers to the steps of testing a subset. In the specific case of our FS algorithm, this is represented by lines 5 to 15 of the code in Algorithm 1. However, since we are on the slave side, the Local Hash Table is used instead of the Global Hash Table.
 
3
The pseudo-code was abbreviated in some parts for the sake of readability.
 
4
We replaced the pseudo-code for a task execution with a function call, since this part depends on the feature selection algorithm. In our case, it corresponds to lines 12 to 20 of the code in Algorithm 1.
 
5
In our worst scenario, each worker required access to the Hash Table on average every 0.1 s.
 
6
We measured the time lost to locks during our parallel tests; in the worst scenario, 10 s out of more than 6000 s were spent waiting for locks.
 
7
The successor generation function is the same as the one previously described in Sect. 4.3.
 
8
By task we mean evaluating a subset, since evaluation is by far the most time-consuming part of a task.
 
9
Even for the synchronous strategy, the number of expanded subsets is not always the same, because of random components in the classification model; in different searches, the top subsets at each level can therefore differ.
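Footnote 6 reports the total time workers spent waiting for locks. One simple way to collect that figure, sketched here under the assumption of a Python threading setup (the `TimedLock` class and `bump` workload are illustrative, not from the paper), is to wrap the shared lock and accumulate how long each acquisition blocks:

```python
import threading
import time

class TimedLock:
    """A context-manager lock that records total time spent waiting on it."""
    def __init__(self):
        self._lock = threading.Lock()
        self.wait_total = 0.0
        self._meta = threading.Lock()   # protects wait_total itself

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()            # may block under contention
        waited = time.perf_counter() - start
        with self._meta:
            self.wait_total += waited
        return self

    def __exit__(self, *exc):
        self._lock.release()

lock = TimedLock()
counter = 0

def bump(n):
    # Illustrative workload: each worker increments a shared counter.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter, f"waited {lock.wait_total:.6f}s in total")
```

Comparing the accumulated wait time against total execution time gives the lock-contention ratio reported in the footnote (10 s out of more than 6000 s in the worst case).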
 
Metadata
Title
Parallel Asynchronous Strategies for the Execution of Feature Selection Algorithms
Authors
Jorge Silva
Ana Aguiar
Fernando Silva
Publication date
09.02.2017
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 2/2018
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-017-0493-2
