Skip to main content
Erschienen in: Progress in Artificial Intelligence 1/2013

01.03.2013 | Review

A survey of methods for distributed machine learning

verfasst von: Diego Peteiro-Barral, Bertha Guijarro-Berdiñas

Erschienen in: Progress in Artificial Intelligence | Ausgabe 1/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Traditionally, a bottleneck preventing the development of more intelligent systems was the limited amount of data available. Nowadays, the total amount of information is almost incalculable and automatic data analyzers are even more needed. However, the limiting factor is the inability of learning algorithms to use all the data to learn within a reasonable time. In order to handle this problem, a new field in machine learning has emerged: large-scale learning. In this context, distributed learning seems to be a promising line of research since allocating the learning process among several workstations is a natural way of scaling up learning algorithms. Moreover, it allows to deal with data sets that are naturally distributed, a frequent situation in many real applications. This study provides some background regarding the advantages of distributed environments as well as an overview of distributed learning for dealing with “very large” data sets.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
3.
Zurück zum Zitat Catlett, J.: Megainduction: machine learning on very large databases. PhD thesis, School of Computer Science, University of Technology, Sydney, Australia (1991) Catlett, J.: Megainduction: machine learning on very large databases. PhD thesis, School of Computer Science, University of Technology, Sydney, Australia (1991)
4.
Zurück zum Zitat Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. Adv. Neural Inf. Process. Syst. 20, 161–168 (2008) Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. Adv. Neural Inf. Process. Syst. 20, 161–168 (2008)
5.
Zurück zum Zitat Sonnenburg, S., Ratsch, G., Rieck, K.: Large scale learning with string kernels. In: Bottou, L., Chapelle, O., DeCoste, D., Weston, J. (eds.) Large Scale Kernel Machines, pp. 73–104. MIT Press, Cambridge (2007) Sonnenburg, S., Ratsch, G., Rieck, K.: Large scale learning with string kernels. In: Bottou, L., Chapelle, O., DeCoste, D., Weston, J. (eds.) Large Scale Kernel Machines, pp. 73–104. MIT Press, Cambridge (2007)
6.
Zurück zum Zitat Moretti, C., Steinhaeuser, K., Thain, D., Chawla, N.V.: Scaling up classifiers to cloud computers. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), pp. 472–481 (2008) Moretti, C., Steinhaeuser, K., Thain, D., Chawla, N.V.: Scaling up classifiers to cloud computers. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), pp. 472–481 (2008)
7.
Zurück zum Zitat Krishnan, S., Bhattacharyya, C., Hariharan, R.: A randomized algorithm for large scale support vector learning. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), pp. 793–800 (2008) Krishnan, S., Bhattacharyya, C., Hariharan, R.: A randomized algorithm for large scale support vector learning. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), pp. 793–800 (2008)
8.
Zurück zum Zitat Raina, R., Madhavan, A., Ng., A.Y.: Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pp. 873–880 (2009) Raina, R., Madhavan, A., Ng., A.Y.: Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pp. 873–880 (2009)
9.
Zurück zum Zitat Provost, F., Kolluri, V.: A survey of methods for scaling up inductive algorithms. Data Min. Knowl. Discov. 3(2), 131–169 (1999)CrossRef Provost, F., Kolluri, V.: A survey of methods for scaling up inductive algorithms. Data Min. Knowl. Discov. 3(2), 131–169 (1999)CrossRef
10.
Zurück zum Zitat Giordana, A., Neri, F.: Search-intensive concept induction. Evol. Comput. 3(4), 375–416 (1995)CrossRef Giordana, A., Neri, F.: Search-intensive concept induction. Evol. Comput. 3(4), 375–416 (1995)CrossRef
11.
Zurück zum Zitat Anglano, C., Botta, M.: Now g-net: learning classification programs on networks of workstations. IEEE Trans. Evol. Comput. 6(5), 463–480 (2002)CrossRef Anglano, C., Botta, M.: Now g-net: learning classification programs on networks of workstations. IEEE Trans. Evol. Comput. 6(5), 463–480 (2002)CrossRef
12.
Zurück zum Zitat Rodríguez, M., Escalante, D.M., Peregrín, A.: Efficient distributed genetic algorithm for rule extraction. Appl. Soft Comput. 11(1), 733–743 (2011)CrossRef Rodríguez, M., Escalante, D.M., Peregrín, A.: Efficient distributed genetic algorithm for rule extraction. Appl. Soft Comput. 11(1), 733–743 (2011)CrossRef
13.
Zurück zum Zitat Lopez, L.I., Bardallo, J.M., De Vega, M.A., Peregrin, A.: Regal-tc: a distributed genetic algorithm for concept learning based on regal and the treatment of counterexamples. Soft Comput. 15(7), 1389–1403 (2011)CrossRef Lopez, L.I., Bardallo, J.M., De Vega, M.A., Peregrin, A.: Regal-tc: a distributed genetic algorithm for concept learning based on regal and the treatment of counterexamples. Soft Comput. 15(7), 1389–1403 (2011)CrossRef
14.
Zurück zum Zitat Trelles, O., Prins, P., Snir, M., Jansen, R.C.: Big data, but are we ready? Nat. Rev. Genetics 12(3), 224–224 (2011)CrossRef Trelles, O., Prins, P., Snir, M., Jansen, R.C.: Big data, but are we ready? Nat. Rev. Genetics 12(3), 224–224 (2011)CrossRef
15.
Zurück zum Zitat Stoica, I.: A berkeley view of big data: algorithms, machines and people. In: UC Berkeley EECS Annual Research, Symposium (2011) Stoica, I.: A berkeley view of big data: algorithms, machines and people. In: UC Berkeley EECS Annual Research, Symposium (2011)
16.
Zurück zum Zitat LaValle, S., Lesser, E., Shockley, R., Hopkins, M.S., Kruschwitz, N.: Big data, analytics and the path from insights to value. MIT Sloan Manag. Rev. 52(2), 21–32 (2011) LaValle, S., Lesser, E., Shockley, R., Hopkins, M.S., Kruschwitz,  N.: Big data, analytics and the path from insights to value. MIT Sloan Manag. Rev. 52(2), 21–32 (2011)
17.
Zurück zum Zitat Borthakur, D.: The hadoop distributed file system: architecture and design. Hadoop Project Website 11, 21 (2007) Borthakur, D.: The hadoop distributed file system: architecture and design. Hadoop Project Website 11, 21 (2007)
18.
Zurück zum Zitat Caragea, D., Silvescu, A., Honavar, V.: Analysis and synthesis of agents that learn from distributed dynamic data sources. In: Wermter, S., Austin, J., Willshaw D.J. (eds.) Emergent Neural Computational Architectures Based on Neuroscience, pp. 547–559. Springer-Verlag, Berlin (2001) Caragea, D., Silvescu, A., Honavar, V.: Analysis and synthesis of agents that learn from distributed dynamic data sources. In: Wermter, S., Austin, J., Willshaw D.J. (eds.) Emergent Neural Computational Architectures Based on Neuroscience, pp. 547–559. Springer-Verlag, Berlin (2001)
19.
Zurück zum Zitat Tsoumakas, G., Vlahavas, I.: Distributed data mining. In: Erickson J. (ed.) Database Technologies: Concepts, Methodologies, Tools, and Applications, pp. 157–171. IGI Global, Hershey (2009) Tsoumakas, G., Vlahavas, I.: Distributed data mining. In: Erickson J. (ed.) Database Technologies: Concepts, Methodologies, Tools, and Applications, pp. 157–171. IGI Global, Hershey (2009)
20.
Zurück zum Zitat Kargupta, H., Byung-Hoon, D.H., Johnson, E.: Collective Data Mining: A New Perspective Toward Distributed Data Analysis. In: Kargupta, H., Chan, P. (eds.) Advances in Distributed and Parallel Knowledge Discovery, AAAI Press/The MIT Press, Menlo Park (1999) Kargupta, H., Byung-Hoon, D.H., Johnson, E.: Collective Data Mining: A New Perspective Toward Distributed Data Analysis. In: Kargupta, H., Chan, P. (eds.) Advances in Distributed and Parallel Knowledge Discovery, AAAI Press/The MIT Press, Menlo Park (1999)
21.
Zurück zum Zitat Dietterich, T.: Ensemble methods in machine learning. In: Gayar, N.E., Kittler, J., Roli, F. (eds.) Multiple classifier systems, pp. 1–15. Springer, New York (2000) Dietterich, T.: Ensemble methods in machine learning. In: Gayar, N.E., Kittler, J., Roli, F. (eds.) Multiple classifier systems, pp. 1–15. Springer, New York (2000)
22.
Zurück zum Zitat Guo, Y., Sutiwaraphun, J.: Probing knowledge in distributed data mining. In: Zhong, N., Zhou, L. (eds.) Methodologies for Knowledge Discovery and Data Mining, pp. 443–452. Springer, Berlin (1999) Guo, Y., Sutiwaraphun, J.: Probing knowledge in distributed data mining. In: Zhong, N., Zhou, L. (eds.) Methodologies for Knowledge Discovery and Data Mining, pp. 443–452. Springer, Berlin (1999)
23.
Zurück zum Zitat Hansen, L.K., Salamon, P.: Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12(10), 993–1001 (1990)CrossRef Hansen, L.K., Salamon, P.: Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12(10), 993–1001 (1990)CrossRef
24.
Zurück zum Zitat Chan, P.K., Stolfo, S.J.: Toward parallel and distributed learning by meta-learning. In: AAAI Workshop in Knowledge Discovery in Databases, pp. 227–240 (1993) Chan, P.K., Stolfo, S.J.: Toward parallel and distributed learning by meta-learning. In: AAAI Workshop in Knowledge Discovery in Databases, pp. 227–240 (1993)
25.
26.
Zurück zum Zitat Ho, T.K., Hull, J.J., Srihari, S.N.: Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell. 16(1), 66–75 (1994)CrossRef Ho, T.K., Hull, J.J., Srihari, S.N.: Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell. 16(1), 66–75 (1994)CrossRef
27.
Zurück zum Zitat Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)CrossRef Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)CrossRef
29.
Zurück zum Zitat Breiman, L.: Pasting small votes for classification in large databases and on-line. Mach. Learn. 36(1), 85–103 (1999)CrossRef Breiman, L.: Pasting small votes for classification in large databases and on-line. Mach. Learn. 36(1), 85–103 (1999)CrossRef
31.
Zurück zum Zitat Chawla, N., Hall, L., Bowyer, K., Moore, T., Kegelmeyer, W.: Distributed pasting of small votes. In: Gayar, N.E., Kittler, J., Roli, F. (eds.) Multiple Classifier Systems, pp. 52–61. Springer, New York (2002) Chawla, N., Hall, L., Bowyer, K., Moore, T., Kegelmeyer, W.: Distributed pasting of small votes. In: Gayar, N.E., Kittler, J., Roli, F. (eds.) Multiple Classifier Systems, pp. 52–61. Springer, New York (2002)
32.
Zurück zum Zitat Tsoumakas G., Vlahavas, I.: Effective stacking of distributed classifiers. In: ECAI, pp. 340–344 (2002) Tsoumakas G., Vlahavas, I.: Effective stacking of distributed classifiers. In: ECAI, pp. 340–344 (2002)
33.
Zurück zum Zitat Lazarevic, A., Obradovic, Z.: Boosting algorithms for parallel and distributed learning. Distrib. Parallel Databases 11(2), 203–229 (2002)MATHCrossRef Lazarevic, A., Obradovic, Z.: Boosting algorithms for parallel and distributed learning. Distrib. Parallel Databases 11(2), 203–229 (2002)MATHCrossRef
34.
Zurück zum Zitat Freund,Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: International Conference on Machine Learning, pp. 148–156. Morgan Kaufmann Publishers, Inc., San Francisco (1996) Freund,Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: International Conference on Machine Learning, pp. 148–156. Morgan Kaufmann Publishers, Inc., San Francisco (1996)
35.
Zurück zum Zitat Hand, D.J., Mannila, H., Smyth, P.: Principles of data mining. The MIT press, Cambridge (2001) Hand, D.J., Mannila, H., Smyth, P.: Principles of data mining. The MIT press, Cambridge (2001)
36.
Zurück zum Zitat Tsoumakas, G., Katakis, I., Vlahavas, I.: Effective voting of heterogeneous classifiers. In: Machine Learning: ECML, pp. 465–476 (2004) Tsoumakas, G., Katakis, I., Vlahavas, I.: Effective voting of heterogeneous classifiers. In: Machine Learning: ECML, pp. 465–476 (2004)
37.
Zurück zum Zitat Woods, K., Kegelmeyer, W.P. Jr., Bowyer, K.: Combination of multiple classifiers using local accuracy estimates. IEEE Trans. Pattern Anal. Mach. Intell. 19(4), 405–410 (1997) Woods, K., Kegelmeyer, W.P. Jr., Bowyer, K.: Combination of multiple classifiers using local accuracy estimates. IEEE Trans. Pattern Anal. Mach. Intell. 19(4), 405–410 (1997)
38.
Zurück zum Zitat Gama, J., Rodrigues, P.P., Sebastião, R.: Evaluating algorithms that learn from data streams. In: Proceedings of the 2009 ACM symposium on Applied Computing (ACM), pp. 1496–1500 (2009) Gama, J., Rodrigues, P.P., Sebastião, R.: Evaluating algorithms that learn from data streams. In: Proceedings of the 2009 ACM symposium on Applied Computing (ACM), pp. 1496–1500 (2009)
39.
Zurück zum Zitat Urban, P., Défago, X., Schiper, A.: Neko: a single environment to simulate and prototype distributed algorithms. In: 15th International Conference on Information Networking, pp. 503–511. IEEE (2001) Urban, P., Défago, X., Schiper, A.: Neko: a single environment to simulate and prototype distributed algorithms. In: 15th International Conference on Information Networking, pp. 503–511. IEEE (2001)
40.
Zurück zum Zitat Tsoumakas, G., Angelis, L., Vlahavas, I.: Clustering classifiers for knowledge discovery from physically distributed databases. Data Knowl. Eng. 49(3), 223–242 (2004)CrossRef Tsoumakas, G., Angelis, L., Vlahavas, I.: Clustering classifiers for knowledge discovery from physically distributed databases. Data Knowl. Eng. 49(3), 223–242 (2004)CrossRef
41.
Zurück zum Zitat Sonnenburg, S., Franc, V., Yom-Tov, E., Sebag, M.: PASCAL large scale Learning challenge. In: 25th International Conference on Machine Learning (ICML2008) Workshop. http://largescale.first.fraunhofer.de. J. Mach. Learn. Res. 10, 1937–1953 (2008) Sonnenburg, S., Franc, V., Yom-Tov, E., Sebag, M.: PASCAL large scale Learning challenge. In: 25th International Conference on Machine Learning (ICML2008) Workshop. http://​largescale.​first.​fraunhofer.​de. J. Mach. Learn. Res. 10, 1937–1953 (2008)
42.
Zurück zum Zitat Peteiro-Barral, D., Bolon-Canedo, V., Alonso-Betanzos, A., Guijarro-Berdinas, B., Sanchez-Marono, N.: Scalability analysis of filter-based methods for feature selection. In: Howlett R. (ed.) Advances in Smart Systems Research, vol. 2, no. 1, pp. 21–26. Future Technology Publications, Shoreham-by-sea, UK (2012) Peteiro-Barral, D., Bolon-Canedo, V., Alonso-Betanzos, A., Guijarro-Berdinas, B., Sanchez-Marono, N.: Scalability analysis of filter-based methods for feature selection. In: Howlett R. (ed.) Advances in Smart Systems Research, vol. 2, no. 1, pp. 21–26. Future Technology Publications, Shoreham-by-sea, UK (2012)
Metadaten
Titel
A survey of methods for distributed machine learning
verfasst von
Diego Peteiro-Barral
Bertha Guijarro-Berdiñas
Publikationsdatum
01.03.2013
Verlag
Springer-Verlag
Erschienen in
Progress in Artificial Intelligence / Ausgabe 1/2013
Print ISSN: 2192-6352
Elektronische ISSN: 2192-6360
DOI
https://doi.org/10.1007/s13748-012-0035-5

Weitere Artikel der Ausgabe 1/2013

Progress in Artificial Intelligence 1/2013 Zur Ausgabe