DOI: 10.1145/3290605.3300493
research-article
Public Access
Honorable Mention

Practitioners Teaching Data Science in Industry and Academia: Expectations, Workflows, and Challenges

Published: 02 May 2019

ABSTRACT

Data science has been growing in prominence across both academia and industry, but there is still little formal consensus about how to teach it. Many people who currently teach data science are practitioners such as computational researchers in academia or data scientists in industry. To understand how these practitioner-instructors pass their knowledge on to novices and how that contrasts with teaching more traditional forms of programming, we interviewed 20 data scientists who teach in settings ranging from small-group workshops to large online courses. We found that 1) they must empathize with a diverse array of student backgrounds and expectations, 2) they teach technical workflows that integrate authentic practices surrounding code, data, and communication, and 3) they face challenges involving authenticity versus abstraction in software setup, finding and curating pedagogically relevant datasets, and acclimating students to living with uncertainty in data analysis. These findings can point the way toward better tools for data science education and help bring data literacy to more people around the world.

Published in: CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems
May 2019, 9077 pages
ISBN: 9781450359702
DOI: 10.1145/3290605

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CHI '19 Paper Acceptance Rate: 703 of 2,958 submissions, 24%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
