DOI: 10.1145/3589806.3600032
research-article

KheOps: Cost-effective Repeatability, Reproducibility, and Replicability of Edge-to-Cloud Experiments

Published: 28 June 2023

ABSTRACT

Distributed infrastructures for computation and analytics are evolving towards an interconnected ecosystem in which complex scientific workflows execute across hybrid systems spanning from IoT Edge devices to Clouds, and sometimes to supercomputers (the Computing Continuum). Understanding the performance trade-offs of large-scale workflows deployed on such a complex Edge-to-Cloud Continuum is challenging. It requires systematic experimentation that enables reproducibility and allows other researchers to replicate the study and its conclusions on different infrastructures. In practice, this means the tedious work of reconciling the numerous experimental requirements and constraints with low-level infrastructure design choices.

To address the limitations of the main state-of-the-art approaches for distributed, collaborative experimentation, such as Google Colab, Kaggle, and Code Ocean, we propose KheOps, a collaborative environment specifically designed to enable cost-effective reproducibility and replicability of Edge-to-Cloud experiments. KheOps is composed of three core elements: (1) an experiment repository; (2) a notebook environment; and (3) a multi-platform experiment methodology.
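
To make these three elements concrete, the sketch below shows how they might fit together in a notebook cell. It is a minimal illustration under stated assumptions: every class, function, URL, and parameter in it is hypothetical, not KheOps's actual API or schema.

    from dataclasses import dataclass, field

    @dataclass
    class ExperimentArtifact:
        """(1) Experiment repository entry: code, data, and environment pinned for reuse."""
        name: str
        code_url: str        # Git repository holding the workflow code
        environment: str     # container image pinning the software stack
        datasets: list = field(default_factory=list)

    @dataclass
    class PlatformMapping:
        """(3) Multi-platform methodology: map abstract Edge/Cloud roles onto concrete testbeds."""
        edge_site: str       # e.g. "FIT IoT-LAB" (authors) or "CHI@Edge" (readers)
        cloud_site: str      # e.g. "Grid'5000" (authors) or "Chameleon" (readers)
        edge_nodes: int = 4
        cloud_nodes: int = 2

    def deploy(artifact, mapping):
        """(2) Notebook environment: one call re-runs the same artifact on another platform."""
        print(f"Deploying {artifact.name} with environment {artifact.environment}")
        print(f"  edge : {mapping.edge_nodes} node(s) on {mapping.edge_site}")
        print(f"  cloud: {mapping.cloud_nodes} node(s) on {mapping.cloud_site}")

    # Authors reproduce on their original testbeds; readers replicate by
    # changing only the platform mapping, not the artifact itself.
    artifact = ExperimentArtifact(
        name="edge-to-cloud-workflow",
        code_url="https://example.org/experiment.git",   # placeholder URL
        environment="example/workflow:1.0",              # placeholder image tag
    )
    deploy(artifact, PlatformMapping(edge_site="FIT IoT-LAB", cloud_site="Grid'5000"))
    deploy(artifact, PlatformMapping(edge_site="CHI@Edge", cloud_site="Chameleon"))

The design point this illustrates is separation of concerns: the artifact stays fixed while the platform mapping varies, which is what allows an experiment originally run on one pair of testbeds to be replicated on another.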

We illustrate KheOps with a real-life Edge-to-Cloud application. The evaluations explore the point of view of the authors of an experiment described in an article (who aim to make their experiments reproducible) as well as the perspective of their readers (who aim to replicate the experiment). The results show how KheOps helps authors systematically perform repeatable and reproducible experiments on the Grid'5000 + FIT IoT-LAB testbeds. Furthermore, KheOps helps readers cost-effectively replicate the authors' experiments on different infrastructures, such as the Chameleon Cloud + CHI@Edge testbeds, and reach the same conclusions with high accuracy (above 88% for all performance metrics).
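
The paper defines how these accuracies are computed; as one plausible reading (an assumption for illustration, not the paper's definition), agreement between an original and a replicated measurement of the same performance metric can be scored as the complement of their relative difference:

    def replication_accuracy(original, replicated):
        """Score agreement between two measurements of the same metric as
        1 - |relative difference|, clamped to [0, 1]. Assumed definition
        for illustration only; the paper specifies its own computation."""
        if original == 0:
            return 1.0 if replicated == 0 else 0.0
        return max(0.0, 1.0 - abs(replicated - original) / abs(original))

    # Hypothetical numbers: a latency metric measured on each pair of testbeds.
    print(replication_accuracy(original=2.50, replicated=2.71))  # 0.916, i.e. > 88%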

Published in

ACM REP '23: Proceedings of the 2023 ACM Conference on Reproducibility and Replicability
June 2023, 127 pages
ISBN: 9798400701764
DOI: 10.1145/3589806
Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
