nach oben

Progress in Artificial Intelligence

Erschienen in:

01.03.2013 | Review

A survey of methods for distributed machine learning

verfasst von: Diego Peteiro-Barral, Bertha Guijarro-Berdiñas

Erschienen in: Progress in Artificial Intelligence | Ausgabe 1/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Traditionally, a bottleneck preventing the development of more intelligent systems was the limited amount of data available. Nowadays, the total amount of information is almost incalculable and automatic data analyzers are even more needed. However, the limiting factor is the inability of learning algorithms to use all the data to learn within a reasonable time. In order to handle this problem, a new field in machine learning has emerged: large-scale learning. In this context, distributed learning seems to be a promising line of research since allocating the learning process among several workstations is a natural way of scaling up learning algorithms. Moreover, it allows to deal with data sets that are naturally distributed, a frequent situation in many real applications. This study provides some background regarding the advantages of distributed environments as well as an overview of distributed learning for dealing with “very large” data sets.

Nächster Artikel Learning domain structure through probabilistic policy reuse in reinforcement learning

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

School of Information and Management and Systems. How much information? http://www2.sims.berkeley.edu/research/projects/how-much-info/internet.html (2000). Accessed 27 Sept 2010

D-Lib Magazine. A research library based on the historical collections of the Internet Archive. http://www.dlib.org/dlib/february06/arms/02arms.html (2006). Accessed 27 Oct 2010

Catlett, J.: Megainduction: machine learning on very large databases. PhD thesis, School of Computer Science, University of Technology, Sydney, Australia (1991)

Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. Adv. Neural Inf. Process. Syst. 20, 161–168 (2008)

Sonnenburg, S., Ratsch, G., Rieck, K.: Large scale learning with string kernels. In: Bottou, L., Chapelle, O., DeCoste, D., Weston, J. (eds.) Large Scale Kernel Machines, pp. 73–104. MIT Press, Cambridge (2007)

Moretti, C., Steinhaeuser, K., Thain, D., Chawla, N.V.: Scaling up classifiers to cloud computers. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), pp. 472–481 (2008)

Krishnan, S., Bhattacharyya, C., Hariharan, R.: A randomized algorithm for large scale support vector learning. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), pp. 793–800 (2008)

Raina, R., Madhavan, A., Ng., A.Y.: Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pp. 873–880 (2009)

Provost, F., Kolluri, V.: A survey of methods for scaling up inductive algorithms. Data Min. Knowl. Discov. 3(2), 131–169 (1999)CrossRef

10.

Giordana, A., Neri, F.: Search-intensive concept induction. Evol. Comput. 3(4), 375–416 (1995)CrossRef

11.

Anglano, C., Botta, M.: Now g-net: learning classification programs on networks of workstations. IEEE Trans. Evol. Comput. 6(5), 463–480 (2002)CrossRef

12.

Rodríguez, M., Escalante, D.M., Peregrín, A.: Efficient distributed genetic algorithm for rule extraction. Appl. Soft Comput. 11(1), 733–743 (2011)CrossRef

13.

Lopez, L.I., Bardallo, J.M., De Vega, M.A., Peregrin, A.: Regal-tc: a distributed genetic algorithm for concept learning based on regal and the treatment of counterexamples. Soft Comput. 15(7), 1389–1403 (2011)CrossRef

14.

Trelles, O., Prins, P., Snir, M., Jansen, R.C.: Big data, but are we ready? Nat. Rev. Genetics 12(3), 224–224 (2011)CrossRef

15.

Stoica, I.: A berkeley view of big data: algorithms, machines and people. In: UC Berkeley EECS Annual Research, Symposium (2011)

16.

LaValle, S., Lesser, E., Shockley, R., Hopkins, M.S., Kruschwitz, N.: Big data, analytics and the path from insights to value. MIT Sloan Manag. Rev. 52(2), 21–32 (2011)

17.

Borthakur, D.: The hadoop distributed file system: architecture and design. Hadoop Project Website 11, 21 (2007)

18.

Caragea, D., Silvescu, A., Honavar, V.: Analysis and synthesis of agents that learn from distributed dynamic data sources. In: Wermter, S., Austin, J., Willshaw D.J. (eds.) Emergent Neural Computational Architectures Based on Neuroscience, pp. 547–559. Springer-Verlag, Berlin (2001)

19.

Tsoumakas, G., Vlahavas, I.: Distributed data mining. In: Erickson J. (ed.) Database Technologies: Concepts, Methodologies, Tools, and Applications, pp. 157–171. IGI Global, Hershey (2009)

20.

Kargupta, H., Byung-Hoon, D.H., Johnson, E.: Collective Data Mining: A New Perspective Toward Distributed Data Analysis. In: Kargupta, H., Chan, P. (eds.) Advances in Distributed and Parallel Knowledge Discovery, AAAI Press/The MIT Press, Menlo Park (1999)

21.

Dietterich, T.: Ensemble methods in machine learning. In: Gayar, N.E., Kittler, J., Roli, F. (eds.) Multiple classifier systems, pp. 1–15. Springer, New York (2000)

22.

Guo, Y., Sutiwaraphun, J.: Probing knowledge in distributed data mining. In: Zhong, N., Zhou, L. (eds.) Methodologies for Knowledge Discovery and Data Mining, pp. 443–452. Springer, Berlin (1999)

23.

Hansen, L.K., Salamon, P.: Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12(10), 993–1001 (1990)CrossRef

24.

Chan, P.K., Stolfo, S.J.: Toward parallel and distributed learning by meta-learning. In: AAAI Workshop in Knowledge Discovery in Databases, pp. 227–240 (1993)

25.

Kittler, J.: Combining classifiers: a theoretical framework. Pattern Anal. Appl. 1(1), 18–27 (1998)MathSciNetCrossRef

26.

Ho, T.K., Hull, J.J., Srihari, S.N.: Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell. 16(1), 66–75 (1994)CrossRef

27.

Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)CrossRef

28.

Wolpert, D.H.: Stacked generalization. Neural Netw. 5(2), 241–259 (1992)MathSciNetCrossRef

29.

Breiman, L.: Pasting small votes for classification in large databases and on-line. Mach. Learn. 36(1), 85–103 (1999)CrossRef

30.

Breiman. L.: Out-of-bag estimation. Technical report. Available at ftp://ftp.stat.berkeley.edu/pub/users/breiman/OOBestimation.ps (1996)

31.

Chawla, N., Hall, L., Bowyer, K., Moore, T., Kegelmeyer, W.: Distributed pasting of small votes. In: Gayar, N.E., Kittler, J., Roli, F. (eds.) Multiple Classifier Systems, pp. 52–61. Springer, New York (2002)

32.

Tsoumakas G., Vlahavas, I.: Effective stacking of distributed classifiers. In: ECAI, pp. 340–344 (2002)

33.

Lazarevic, A., Obradovic, Z.: Boosting algorithms for parallel and distributed learning. Distrib. Parallel Databases 11(2), 203–229 (2002)MATHCrossRef

34.

Freund,Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: International Conference on Machine Learning, pp. 148–156. Morgan Kaufmann Publishers, Inc., San Francisco (1996)

35.

Hand, D.J., Mannila, H., Smyth, P.: Principles of data mining. The MIT press, Cambridge (2001)

36.

Tsoumakas, G., Katakis, I., Vlahavas, I.: Effective voting of heterogeneous classifiers. In: Machine Learning: ECML, pp. 465–476 (2004)

37.

Woods, K., Kegelmeyer, W.P. Jr., Bowyer, K.: Combination of multiple classifiers using local accuracy estimates. IEEE Trans. Pattern Anal. Mach. Intell. 19(4), 405–410 (1997)

38.

Gama, J., Rodrigues, P.P., Sebastião, R.: Evaluating algorithms that learn from data streams. In: Proceedings of the 2009 ACM symposium on Applied Computing (ACM), pp. 1496–1500 (2009)

39.

Urban, P., Défago, X., Schiper, A.: Neko: a single environment to simulate and prototype distributed algorithms. In: 15th International Conference on Information Networking, pp. 503–511. IEEE (2001)

40.

Tsoumakas, G., Angelis, L., Vlahavas, I.: Clustering classifiers for knowledge discovery from physically distributed databases. Data Knowl. Eng. 49(3), 223–242 (2004)CrossRef

41.

Sonnenburg, S., Franc, V., Yom-Tov, E., Sebag, M.: PASCAL large scale Learning challenge. In: 25th International Conference on Machine Learning (ICML2008) Workshop. http://largescale.first.fraunhofer.de. J. Mach. Learn. Res. 10, 1937–1953 (2008)

42.

Peteiro-Barral, D., Bolon-Canedo, V., Alonso-Betanzos, A., Guijarro-Berdinas, B., Sanchez-Marono, N.: Scalability analysis of filter-based methods for feature selection. In: Howlett R. (ed.) Advances in Smart Systems Research, vol. 2, no. 1, pp. 21–26. Future Technology Publications, Shoreham-by-sea, UK (2012)

Titel: A survey of methods for distributed machine learning
verfasst von: Diego Peteiro-Barral
Bertha Guijarro-Berdiñas
Publikationsdatum: 01.03.2013
Verlag: Springer-Verlag
Erschienen in: Progress in Artificial Intelligence / Ausgabe 1/2013
Print ISSN: 2192-6352
Elektronische ISSN: 2192-6360
DOI: https://doi.org/10.1007/s13748-012-0035-5

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 1/2013

Learning domain structure through probabilistic policy reuse in reinforcement learning

Interval feature extraction for classification of event-related potentials (ERP) in EEG data analysis

Boosting for class-imbalanced datasets using genetically evolved supervised non-linear projections

Passive–aggressive online distance metric learning and extensions

A novel framework for class imbalance learning using intelligent under-sampling

The quest for the optimal class distribution: an approach for enhancing the effectiveness of learning via resampling methods for imbalanced data sets