ABSTRACT
Data science has been growing in prominence across both academia and industry, but there is still little formal consensus about how to teach it. Many people who currently teach data science are practitioners such as computational researchers in academia or data scientists in industry. To understand how these practitioner-instructors pass their knowledge on to novices and how that contrasts with teaching more traditional forms of programming, we interviewed 20 data scientists who teach in settings ranging from small-group workshops to large online courses. We found that: 1) they must empathize with a diverse array of student backgrounds and expectations, 2) they teach technical workflows that integrate authentic practices surrounding code, data, and communication, and 3) they face challenges involving authenticity versus abstraction in software setup, finding and curating pedagogically relevant datasets, and acclimating students to living with uncertainty in data analysis. These findings can point the way toward better tools for data science education and help bring data literacy to more people around the world.
Index Terms
- Practitioners Teaching Data Science in Industry and Academia: Expectations, Workflows, and Challenges