2019 | OriginalPaper | Chapter

Parallelizing Convergent Cross Mapping Using Apache Spark

Authors: Bo Pu, Lujie Duan, Nathaniel D. Osgood

Published in: Social, Cultural, and Behavioral Modeling

Publisher: Springer International Publishing

Abstract

Identifying the causal relationships between subjects or variables remains an important problem across many scientific fields. The problem is particularly important, and particularly challenging, in complex systems such as those involving human behavior, sociotechnical contexts, and natural ecosystems. By exploiting state space reconstruction via lagged embedding of time series, convergent cross mapping (CCM) serves as an important method for addressing this problem. While powerful, CCM is computationally costly; moreover, CCM results are highly sensitive to several parameter values. Best practice entails exploring a range of parameter settings when assessing causal relationships, but the resulting computational burden can raise barriers to practical use, especially for long time series exhibiting weak causal linkages. We demonstrate here several means of accelerating CCM by harnessing the distributed Apache Spark platform. We characterize and report on the results of several experiments with parallelized solutions that demonstrate high scalability and a capacity for more than an order of magnitude of performance improvement for the baseline configuration. Such economies in computation time can speed learning and the robust identification of causal drivers in complex systems.
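
To make the computational burden described in the abstract concrete, the following is a minimal single-machine sketch of CCM in plain Python; it is not the authors' Spark implementation. The function names (`lagged_embedding`, `cross_map_skill`, `pearson`), the parameter names `E` (embedding dimension), `tau` (lag), and `L` (library size), the coupled logistic maps used as test data, and the grid values are all assumptions chosen for illustration.

```python
import math
from itertools import product

def lagged_embedding(series, E, tau):
    """Delay vectors (x_t, x_{t-tau}, ..., x_{t-(E-1)tau}) for every valid t."""
    start = (E - 1) * tau
    return [tuple(series[t - j * tau] for j in range(E))
            for t in range(start, len(series))]

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def cross_map_skill(x, y, E, tau, L):
    """Estimate y from the shadow manifold of x, using the first L samples.

    Returns the correlation between estimated and observed y. In CCM,
    high skill that grows with L is read as evidence that y influences x.
    """
    start = (E - 1) * tau
    manifold = lagged_embedding(x[:L], E, tau)
    targets = y[start:L]
    estimates = []
    for i, point in enumerate(manifold):
        # E + 1 nearest neighbours on the manifold, excluding the point itself.
        nearest = sorted((math.dist(point, other), j)
                         for j, other in enumerate(manifold) if j != i)[:E + 1]
        d_min = max(nearest[0][0], 1e-12)  # guard against division by zero
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        total = sum(weights)
        estimates.append(
            sum(w * targets[j] for w, (_, j) in zip(weights, nearest)) / total)
    return pearson(estimates, targets)

# Illustrative test data: two coupled logistic maps (coefficients are assumed).
x, y = [0.4], [0.2]
for _ in range(199):
    xt, yt = x[-1], y[-1]
    x.append(xt * (3.8 - 3.8 * xt - 0.02 * yt))
    y.append(yt * (3.5 - 3.5 * yt - 0.1 * xt))

# Every (E, tau, L) setting is an independent job; the nested neighbour
# search inside each job is what makes a wide parameter sweep expensive.
grid = list(product((2, 3), (1, 2), (50, 100, 200)))
results = {params: cross_map_skill(x, y, *params) for params in grid}
```

Because no entry of `results` depends on any other, the sweep is a natural candidate for distribution. One plausible approach, which may differ from the strategies the chapter evaluates, would be to hand the grid to Spark with `sc.parallelize(grid).map(...)`.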

Metadata
Title
Parallelizing Convergent Cross Mapping Using Apache Spark
Authors
Bo Pu
Lujie Duan
Nathaniel D. Osgood
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-21741-9_14
