Published in: Knowledge and Information Systems 1/2019

20 June 2018 | Regular Paper

Modeling and implementing distributed data mining strategies in JaCa-DDM

Authors: Xavier Limón, Alejandro Guerra-Hernández, Nicandro Cruz-Ramírez, Francisco Grimaldo

Abstract

This work introduces JaCa-DDM, a novel distributed data mining system founded on the agents and artifacts paradigm and conceived to design, implement, deploy, and evaluate learning strategies. Jason rational agents carry out such strategies to cope with distributed computing environments, where CArtAgO artifacts encapsulate learning algorithms, data sources, evaluation tools, and other services implemented in Weka for data mining tasks. The set of strategies presented in this paper aims to encourage the use of JaCa-DDM to develop new ones, suited to different needs. To this end, our system provides tools to evaluate the resulting models in terms of accuracy, number of instances employed to learn, convergence time, and volume of communications. Although the emphasis is on decision trees, JaCa-DDM can be easily extended by adopting new artifacts, e.g., for meta-learning. The main contributions of the paper are as follows: (i) from the multi-agent systems perspective, our approach illustrates how to exploit the so-called “agentification” of Weka for the sake of code reusability, while preserving the benefits of reasoning at the Belief–Desire–Intention level with Jason; (ii) from the data mining perspective, JaCa-DDM is promoted as an extensible tool to define and test distributed strategies; and (iii) a set of strategies, including centralizing, meta-learning, and Windowing-based approaches, is carefully analyzed to provide comparisons among them.
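To make the Windowing idea mentioned in the abstract more concrete, the following is a minimal, single-process sketch of the general technique as it is commonly described: learn from a small window of examples, test on the remaining ones, add the misclassified examples to the window, and repeat. It is not JaCa-DDM code and does not use Jason, CArtAgO, or Weka. The names `train_stump` and `windowing`, the one-feature threshold learner that stands in for a decision tree, and the toy data are all assumptions made for illustration. The sketch also reports two of the evaluation measures named in the abstract: accuracy and the number of instances employed to learn.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]  # (feature value, class label 0 or 1)


def train_stump(window: Sequence[Example]) -> Callable[[float], int]:
    """Fit a one-feature threshold classifier (a stand-in for a decision tree)."""
    xs = sorted({x for x, _ in window})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None  # (training errors, threshold, label predicted on the left side)
    for t in candidates:
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= t else 1 - left_label) != y for x, y in window
            )
            if best is None or errors < best[0]:
                best = (errors, t, left_label)
    _, threshold, left = best
    return lambda x: left if x <= threshold else 1 - left


def windowing(data: Sequence[Example], initial_size: int, max_rounds: int = 20):
    """Grow the training window with misclassified examples until none remain."""
    window: List[Example] = list(data[:initial_size])
    rest: List[Example] = list(data[initial_size:])
    model = train_stump(window)
    rounds = 1
    while rounds < max_rounds:
        wrong = [e for e in rest if model(e[0]) != e[1]]
        if not wrong:
            break  # the model is consistent with every example outside the window
        window.extend(wrong)
        rest = [e for e in rest if model(e[0]) == e[1]]
        model = train_stump(window)
        rounds += 1
    return model, len(window), rounds


# Hypothetical toy data: label 0 for small values, label 1 for large ones.
data = [(0.0, 0), (9.0, 1), (5.0, 0), (7.0, 1),
        (3.0, 0), (8.0, 1), (1.0, 0), (6.5, 1)]

model, instances_used, rounds = windowing(data, initial_size=2)
accuracy = sum(model(x) == y for x, y in data) / len(data)
print(f"accuracy={accuracy:.2f} instances_used={instances_used} rounds={rounds}")
```

In a distributed setting, the examples outside the window would typically live on other nodes, so only misclassified examples would need to be communicated. How JaCa-DDM actually organizes this among agents and artifacts is described in the paper itself, not in this sketch.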

Metadata
Title
Modeling and implementing distributed data mining strategies in JaCa-DDM
Authors
Xavier Limón
Alejandro Guerra-Hernández
Nicandro Cruz-Ramírez
Francisco Grimaldo
Publication date
20 June 2018
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 1/2019
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-018-1222-x
