2019 | Original Paper | Book Chapter

Parallelizing Convergent Cross Mapping Using Apache Spark

Authors: Bo Pu, Lujie Duan, Nathaniel D. Osgood

Published in: Social, Cultural, and Behavioral Modeling

Publisher: Springer International Publishing

Abstract

Identifying the causal relationships between subjects or variables remains an important problem across various scientific fields. It is particularly important, and particularly challenging, in complex systems such as those involving human behavior, sociotechnical contexts, and natural ecosystems. By exploiting state space reconstruction via lagged embedding of time series, convergent cross mapping (CCM) serves as an important method for addressing this problem. While powerful, CCM is computationally costly; moreover, CCM results are highly sensitive to several parameter values. Best practice entails exploring a range of parameter settings when assessing causal relationships, but the resulting computational burden can raise barriers to practical use, especially for long time series exhibiting weak causal linkages. We demonstrate here several means of accelerating CCM by harnessing the distributed Apache Spark platform. We characterize and report on results of several experiments with parallelized solutions that demonstrate high scalability and a capacity for over an order of magnitude performance improvement for the baseline configuration. Such economies in computation time can speed learning and robust identification of causal drivers in complex systems.
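To make the computational burden concrete, the following is a minimal single-machine sketch of the general CCM idea in plain Python. It is not the authors' Spark implementation. The function names, the coupled logistic map used as test data, and the parameter grid are all assumptions chosen for illustration. The sketch builds a lagged embedding of one series, uses the nearest neighbours of each embedded point to estimate the other series, and scores the estimate by Pearson correlation. Each (E, tau, L) combination in the sweep is independent of the others, which is the kind of work a distributed platform can spread across workers.

```python
import math

def lagged_embedding(series, E, tau):
    """Return (start, vectors); the vector for time t is (x[t], x[t-tau], ..., x[t-(E-1)*tau])."""
    start = (E - 1) * tau
    vectors = [[series[t - j * tau] for j in range(E)] for t in range(start, len(series))]
    return start, vectors

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def cross_map_skill(x, y, E=2, tau=1, L=None):
    """Estimate y from the shadow manifold of x and return the correlation with the true y."""
    L = len(x) if L is None else L
    start, manifold = lagged_embedding(x[:L], E, tau)
    targets = y[start:L]
    estimates = []
    for i, point in enumerate(manifold):
        # E + 1 nearest neighbours of this point, excluding the point itself.
        nearest = sorted(
            (math.dist(point, other), k) for k, other in enumerate(manifold) if k != i
        )[: E + 1]
        d0 = nearest[0][0]
        if d0 > 0:
            weights = [math.exp(-d / d0) for d, _ in nearest]
        else:
            # Degenerate case: exact duplicates get all the weight.
            weights = [1.0 if d == 0 else 0.0 for d, _ in nearest]
        total = sum(weights)
        estimates.append(sum(w * targets[k] for w, (_, k) in zip(weights, nearest)) / total)
    return pearson(targets, estimates)

def coupled_logistic(n, x0=0.4, y0=0.2):
    """Hypothetical test data: two weakly coupled logistic maps."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(x * (3.8 - 3.8 * x - 0.02 * y))
        ys.append(y * (3.5 - 3.5 * y - 0.1 * x))
    return xs, ys

x, y = coupled_logistic(200)

# Illustrative parameter sweep: every tuple is an independent task.
grid = [(E, tau, L) for E in (2, 3) for tau in (1, 2) for L in (50, 100, 200)]
results = {params: cross_map_skill(x, y, *params) for params in grid}

for params in grid:
    print(params, round(results[params], 3))
```

The neighbour search is quadratic in the library size L and is repeated for every parameter combination, so the cost grows quickly for long series. Because the entries of `grid` share no state, the dictionary comprehension could in principle be replaced by a distributed map over the parameter tuples; how the chapter actually partitions the work is not described in the abstract.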

Metadata
Title
Parallelizing Convergent Cross Mapping Using Apache Spark
Authors
Bo Pu
Lujie Duan
Nathaniel D. Osgood
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-21741-9_14
