2017 | OriginalPaper | Chapter

A Functional Approach to Parallelizing Data Mining Algorithms in Java

Authors: Ivan Kholod, Andrey Shorov, Sergei Gorlatch

Published in: Parallel Computing Technologies

Publisher: Springer International Publishing

Abstract

We describe a new approach to parallelizing data mining algorithms. We represent an algorithm as a sequence of functions and use higher-order functions to express their parallel execution. Our approach generalizes the popular MapReduce programming model by enabling not only data-parallel but also task-parallel implementations, as well as combinations of both. We implement our approach as an extension of the industrial-strength library Xelopes, and we illustrate it by developing a multi-threaded Java program for the 1R classification algorithm, with experiments on a multi-core processor.
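
The idea of expressing parallel execution through a higher-order function can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the Xelopes API: the class `OneRSketch`, the `Rule` type, and the functions `parallel`, `ruleFor`, and `oneR` are hypothetical names introduced only for this illustration. The sketch assumes that 1R builds one candidate rule per attribute (predicting the majority class for each attribute value) and keeps the rule with the fewest training errors. It shows only a task-parallel decomposition, with one task per attribute.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BinaryOperator;
import java.util.function.Function;

class OneRSketch {

    /** Hypothetical rule type: majority class per value of one attribute. */
    static final class Rule {
        final int attribute;
        final Map<String, String> prediction;
        final int errors;

        Rule(int attribute, Map<String, String> prediction, int errors) {
            this.attribute = attribute;
            this.prediction = prediction;
            this.errors = errors;
        }
    }

    /**
     * Hypothetical higher-order function: applies every function in fs to the
     * same input concurrently and folds the results left to right with join.
     * Assumes fs is non-empty.
     */
    static <I, R> Function<I, R> parallel(List<Function<I, R>> fs, BinaryOperator<R> join) {
        return input -> {
            ExecutorService pool = Executors.newFixedThreadPool(fs.size());
            try {
                List<Future<R>> futures = new ArrayList<>();
                for (Function<I, R> f : fs) {
                    Callable<R> task = () -> f.apply(input);
                    futures.add(pool.submit(task));
                }
                R result = futures.get(0).get();
                for (int i = 1; i < futures.size(); i++) {
                    result = join.apply(result, futures.get(i).get());
                }
                return result;
            } catch (Exception e) {
                throw new RuntimeException(e);
            } finally {
                pool.shutdown();
            }
        };
    }

    /** Builds the candidate rule for attribute a; the last column holds the class. */
    static Function<String[][], Rule> ruleFor(int a) {
        return rows -> {
            int classCol = rows[0].length - 1;
            // counts: attribute value -> (class label -> number of rows)
            Map<String, Map<String, Integer>> counts = new HashMap<>();
            for (String[] row : rows) {
                counts.computeIfAbsent(row[a], k -> new HashMap<>())
                      .merge(row[classCol], 1, Integer::sum);
            }
            Map<String, String> prediction = new HashMap<>();
            int correct = 0;
            for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
                Map.Entry<String, Integer> best = null;
                for (Map.Entry<String, Integer> c : e.getValue().entrySet()) {
                    if (best == null || c.getValue() > best.getValue()) {
                        best = c;
                    }
                }
                prediction.put(e.getKey(), best.getKey());
                correct += best.getValue();
            }
            return new Rule(a, prediction, rows.length - correct);
        };
    }

    /** 1R as a composition: one task per attribute, joined by "fewest errors". */
    static Rule oneR(String[][] rows) {
        List<Function<String[][], Rule>> tasks = new ArrayList<>();
        for (int a = 0; a < rows[0].length - 1; a++) {
            tasks.add(ruleFor(a));
        }
        return parallel(tasks, (x, y) -> y.errors < x.errors ? y : x).apply(rows);
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"sunny", "false", "no"}, {"sunny", "true", "no"},
            {"overcast", "false", "yes"}, {"overcast", "true", "yes"},
            {"rainy", "false", "yes"}, {"rainy", "true", "no"}
        };
        Rule r = oneR(rows);
        System.out.println("Best attribute: " + r.attribute + ", errors: " + r.errors);
    }
}
```

Because `parallel` only receives functions and a join operator, the same function could in principle be reused for a data-parallel split (one task per chunk of rows, joined by merging counts); that variant is not shown here.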

Metadata
Title
A Functional Approach to Parallelizing Data Mining Algorithms in Java
Authors
Ivan Kholod
Andrey Shorov
Sergei Gorlatch
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-62932-2_44