2023 | Book

Guide to Teaching Data Science

An Interdisciplinary Approach

About this book

Data science is a new field that touches on almost every domain of our lives, and thus it is taught in a variety of environments. Accordingly, the book is suitable for teachers and lecturers in all educational frameworks: K-12, academia and industry.

This book aims to close a significant gap in the literature on the pedagogy of data science. While there are many articles and white papers dealing with the curriculum of data science (i.e., what to teach?), the pedagogical aspect of the field (i.e., how to teach?) is largely neglected. At the same time, the importance of the pedagogical aspects of data science increases as more and more programs open up to a variety of learners.

This book provides a variety of pedagogical discussions, specific teaching methods and frameworks, and exercises and guidelines related to many data science concepts (e.g., data thinking and the data science workflow), main machine learning algorithms and concepts (e.g., KNN, SVM, neural networks, performance metrics, the confusion matrix, and biases), and data science professional topics (e.g., ethics, skills, and the research approach).

Professor Orit Hazzan has been a faculty member at the Technion’s Department of Education in Science and Technology since October 2000. Her research focuses on computer science, software engineering, and data science education. Within this framework, she studies cognitive and social processes at the individual, team, and organization levels, in all kinds of organizations.

Dr. Koby Mike is a Ph.D. graduate of the Technion's Department of Education in Science and Technology, where he studied under the supervision of Professor Orit Hazzan. He continued with postdoctoral research on data science education at Bar-Ilan University, and holds a B.Sc. and an M.Sc. in Electrical Engineering from Tel Aviv University.

Table of Contents

Frontmatter
Chapter 1. Introduction—What is This Guide About?
Abstract
Data science is a new discipline of research that is attracting growing interest in both industry and academia. As a result, demand is increasing for data science programs for a variety of learners from a variety of disciplines (data science, computer science, statistics, engineering, life science, social science and humanities) and at a variety of levels (from school children to academia and industry). While significant efforts are being invested in the development of data science curricula, or in other words, in what to teach, only sporadic discussions today focus on data science pedagogy, that is, on how to teach. This is the focus of this guide. In the following introduction, we present the motivation for writing this guide (Sect. 1.2), followed by the pedagogical principles we applied in it (Sect. 1.3), its structure (Sect. 1.4), and how it can be used by educators who teach data science in different educational frameworks (Sect. 1.5). Finally, we present several main kinds of learning environments that are appropriate for teaching and learning data science (Sect. 1.6).
Orit Hazzan, Koby Mike

Overview of Data Science and Data Science Education

Frontmatter
Chapter 2. What is Data Science?
Abstract
Although many attempts have been made to define data science, no agreed definition has yet been reached. One reason for the difficulty of reaching a single consensus definition of data science is its multifaceted nature: it can be described as a science, as a research method, as a discipline, as a workflow, or as a profession. No single definition can capture this diverse essence of data science. In this chapter, we first take an interdisciplinary perspective and review the background for the development of data science (Sect. 2.1). Then we present data science from several perspectives: data science as a science (Sect. 2.2), data science as a research method (Sect. 2.3), data science as a discipline (Sect. 2.4), data science as a workflow (Sect. 2.5), and data science as a profession (Sect. 2.6). We conclude by highlighting three main characteristics of data science: interdisciplinarity, learner diversity, and its research-oriented nature (Sect. 2.7).
Orit Hazzan, Koby Mike
Chapter 3. Data Science Thinking
Abstract
This chapter highlights the cognitive aspect of data science. It presents a variety of modes of thinking, which are associated with the different components of data science, and describes the contribution of each one to data thinking—the mode of thinking required of data scientists (not only professional ones). Indeed, data science thinking integrates the thinking modes associated with the various disciplines that make up data science. Specifically, computer science contributes computational thinking (Sect. 3.2.1), statistics contributes statistical thinking (Sect. 3.2.2), mathematics adds different ways in which data science concepts can be conceived (Sect. 3.2.3), and each application domain brings with it its thinking skills, core principles, and ethical considerations (Sect. 3.2.4). Finally, based on these thinking modes, which are associated with the components of data science, we present data thinking (Sect. 3.2.5). The definition of data science inspires the message that processes of solving real-life problems using data science methods should not be based only on algorithms and data, but also on the application domain knowledge. In Sect. 3.3 we present a set of exercises that analyze the thinking skills associated with data science.
Orit Hazzan, Koby Mike
Chapter 4. The Birth of a New Discipline: Data Science Education
Abstract
Data science is a young field of research, and its associated educational knowledge, data science education, is even younger. As of the time of writing this book, data science education has not yet gained recognition as a distinct field and is mainly discussed in the context of the education of the disciplines that make up data science, i.e., computer science education, statistics education, mathematics education, and the educational fields of the application domains, such as medical education, business analytics education, or learning analytics. There are, however, voices that call for integrating the relevant knowledge from these educational disciplines to form a coherent and integrative data science education body of knowledge, based on which data science programs can be designed. In this chapter, we present the story of the birth of the field of data science education by describing its short history. We focus on the main efforts invested in the design of an undergraduate data science curriculum (Sect. 4.2), and on the main initiatives aimed at tailoring a data science curriculum for school pupils (Sect. 4.3). We also suggest several meta-analysis exercises that examine these efforts (Sect. 4.4).
Orit Hazzan, Koby Mike

Opportunities and Challenges of Data Science Education

Frontmatter
Chapter 5. Opportunities in Data Science Education
Abstract
Data science education opens up multiple new educational opportunities. In this chapter, we elaborate on six such opportunities: teaching STEM in a real-world context (Sect. 5.2), teaching STEM with real-world data (Sect. 5.3), bridging gender gaps in STEM education (Sect. 5.4), teaching twenty-first century skills (Sect. 5.5), interdisciplinary pedagogy (Sect. 5.6), and professional development for teachers (Sect. 5.7). We conclude with an interdisciplinary perspective on the opportunities of data science education (Sect. 5.8).
Orit Hazzan, Koby Mike
Chapter 6. The Interdisciplinarity Challenge
Abstract
In Sect. 2.4, Data Science as a Discipline, we present the interdisciplinary nature of data science. This interdisciplinary structure is, as can be seen, challenging from an educational perspective (that is, in terms of curricula and pedagogy). In Chap. 4, we discuss the challenge of integrating the application domain into data science education, and in this chapter, we elaborate on the challenges posed by the interdisciplinary structure of data science. First, we describe the unique and complex interdisciplinary structure of data science (Sect. 6.2). Then, we present the challenge of balancing computer science and statistics in data science education (Sect. 6.3), and the challenge of actually integrating the application domain knowledge into data science study programs, courses, and student projects (Sect. 6.4). Although this chapter focuses on the challenges that emerge from the interdisciplinarity of data science, we note that it also presents an opportunity, expressed for example in the closing of gender gaps in STEM education (see Sect. 5.4 and Chap. 19).
Orit Hazzan, Koby Mike
Chapter 7. The Variety of Data Science Learners
Abstract
Since data science is considered to be an important twenty-first century skill, it should be acquired by everyone, children as well as adults, at a suitable level, breadth, and depth. And so, after reviewing the importance of data science knowledge for everyone (Sect. 7.1), this chapter reviews the teaching of data science to different populations: K-12 pupils in general (Sect. 7.2) and high school computer science pupils in particular (Sect. 7.3), undergraduate students (Sect. 7.4), graduate students (Sect. 7.5), researchers (Sect. 7.6), data science educators (Sect. 7.7), practitioners in the industry (Sect. 7.8), policy makers (Sect. 7.9), users (Sect. 7.10), and the general public (Sect. 7.11). For each population, we discuss the main purpose of teaching it data science, the main concepts that the population should learn, and (in some cases) learning environments and exercises that fit it. In Sect. 7.12, we present several activities about the fitness of different learning environments for teaching data science to the different populations we discuss in this chapter. In the conclusion (Sect. 7.13) we highlight the concept of diversity in the context of data science.
Orit Hazzan, Koby Mike
Chapter 8. Data Science as a Research Method
Abstract
In this chapter, we focus on the challenges that emerge from the fact that data science is also a research method. First, we describe the essence of the research process that data science inspires (Sect. 8.2). Then, Sect. 8.3 presents examples of cognitive, organizational, and technological skills which are important for coping with the challenge of data science as a research method, and Sect. 8.4 highlights pedagogical methods for coping with it. In the conclusion of this chapter (Sect. 8.5), we review, from an interdisciplinary perspective, the skills required to perform data science research. The discussions about data science skills in this chapter and in Chap. 11 are especially important today due to the increasing awareness that scientists and engineers, in general, and data scientists, in particular, should acquire professional skills, in addition to disciplinary and technical knowledge.
Orit Hazzan, Koby Mike
Chapter 9. The Pedagogical Chasm in Data Science Education
Abstract
As an interdisciplinary discipline, data science poses many challenges for teachers. This chapter presents the story of one of them, specifically of the adoption of a new data science curriculum developed in Israel for high school computer science pupils, by high school computer science teachers. We analyze the adoption process using the diffusion of innovation and the crossing the chasm theories. Accordingly, we first present the diffusion of innovation theory (Sect. 9.1) and the crossing the chasm theory (Sect. 9.2). Then, we present the data science for high school curriculum case study (Sect. 9.3). Data collected from teachers who learned to teach the program reveals that when a new curriculum is adopted, a pedagogical chasm might exist (i.e., a pedagogical challenge that reduces the motivation of the majority of teachers to adopt the curriculum) that slows down the adoption process of the innovation (Sect. 9.4). Finally, we discuss the implications of the pedagogical chasm for data science education (Sect. 9.5).
Orit Hazzan, Koby Mike

Teaching Professional Aspects of Data Science

Frontmatter
Chapter 10. The Data Science Workflow
Abstract
The examination of data science as a workflow is yet another facet of data science. In this chapter we elaborate on the data science workflow from an educational perspective. First, we present several approaches to the data science workflow (Sect. 10.1), following which we elaborate on the pedagogical aspects of the different phases of the workflow: data collection (Sect. 10.2), data preparation (Sect. 10.3), exploratory data analysis (Sect. 10.4), modeling (Sect. 10.5), and communication and action (Sect. 10.6). We conclude with an interdisciplinary perspective on the data science workflow (Sect. 10.7).
Orit Hazzan, Koby Mike
Chapter 11. Professional Skills and Soft Skills in Data Science
Abstract
In this chapter, we highlight skills that are required to deal with data science in a meaningful manner. The chapter describes two kinds of skills: professional skills (Sect. 11.2) and soft skills (Sect. 11.3). Professional skills are specific skills that are needed in order to engage in data science, while soft skills are more general skills that acquire unique importance in the context of data science. In each section, we address cognitive, organizational, and technological skills. The chapter also offers exercises to practice the skills discussed, and it ends with several teaching notes (Sect. 11.4). The discussion about data science skills is especially important today due to the increasing awareness of the fact that scientists and engineers in general, and data scientists in particular, should acquire professional and soft skills, in addition to disciplinary and technical knowledge.
Orit Hazzan, Koby Mike
Chapter 12. Social and Ethical Issues of Data Science
Abstract
The teaching of social issues related to data science should be given special attention regardless of the framework or level at which data science is taught. This assertion is derived from the fact that data science (a) is relevant for many aspects of our lives (such as health, education, social life, and transportation); (b) can be applied in harmful ways (even without explicit intention); and (c) involves ethical considerations derived from the application domain. Of the many possible social topics whose teaching might have been discussed in this chapter, we focus on data science ethics (Sect. 12.2). We also present teaching methods that are especially appropriate for the teaching of social issues of data science (Sect. 12.3). Throughout the chapter, we highlight the social perspective, which in turn further emphasizes the interdisciplinarity of data science.
Orit Hazzan, Koby Mike

Machine Learning Education

Frontmatter
Chapter 13. The Pedagogical Challenge of Machine Learning Education
Abstract
Machine learning (ML) is the essence of the modeling phase of the data science workflow. In this chapter, we focus on the pedagogical challenges of teaching ML to various populations. We first describe the terms white box and black box in the context of ML education (Sect. 13.2). Next, we describe the pedagogical challenge with respect to different learner populations including data science major students as well as non-major students (Sect. 13.3). Then, we present three framework remarks for teaching ML (regarding statistical thinking, interdisciplinary projects, and the application domain knowledge), which, despite not being mentioned frequently in this part of the book, are important to be kept in mind in ML teaching processes (Sect. 13.4). We conclude this chapter by highlighting the importance of ML education in the context of the application domain (Sect. 13.5).
Orit Hazzan, Koby Mike
Chapter 14. Core Concepts of Machine Learning
Abstract
In this chapter, we focus on the teaching of several core concepts that are common to many machine learning (ML) algorithms (such as hyperparameter tuning) and, as such, are essential learning goals in themselves, regardless of the ML algorithms. Specifically, we discuss types of ML (Sect. 14.2), ML parameters and hyperparameters (Sect. 14.3), model training, validation, and testing (Sect. 14.4), ML performance indicators (Sect. 14.5), bias and variance (Sect. 14.6), model complexity (Sect. 14.7), overfitting and underfitting (Sect. 14.8), loss function optimization and the gradient descent algorithm (Sect. 14.9), and regularization (Sect. 14.10). We conclude this chapter by emphasizing what ML core concepts should be discussed in the context of the application domain (Sect. 14.11).
Orit Hazzan, Koby Mike
Chapter 15. Machine Learning Algorithms
Abstract
In this chapter, we describe the teaching of several machine learning (ML) algorithms that are commonly taught in introductory ML courses, and analyze them from a pedagogical perspective. The algorithms we discuss are K-nearest neighbors (KNN) (Sect. 15.2), decision trees (Sect. 15.3), the Perceptron (Sect. 15.4), linear regression (Sect. 15.5), logistic regression (Sect. 15.6), and neural networks (Sect. 15.7). Finally, we discuss interrelations between the interdisciplinarity of data science and the teaching of ML algorithms (Sect. 15.8).
Orit Hazzan, Koby Mike
Chapter 16. Teaching Methods for Machine Learning
Abstract
In this chapter, we review four teaching methods for machine learning: visualization (Sect. 16.2), hands-on tasks (Sect. 16.3), programming tasks (Sect. 16.4), and project-based learning (Sect. 16.5). When relevant, as part of the presentation of these pedagogical tools, we analyze them from the perspective of the process-object duality theory and the reduction of abstraction theory.
Orit Hazzan, Koby Mike

Frameworks for Teaching Data Science

Frontmatter
Chapter 17. Data Science for Managers and Policymakers
Abstract
In this chapter, we focus on the first component of the MERge model: management. In line with the MERge model as a professional development framework, we show how managers and policymakers (on all levels) can use data science in their decision-making processes. We describe a workshop for policymakers that focuses on the integration of data science into education systems for policy, governance, and operational purposes (Sect. 17.2). The messages conveyed in this chapter can be applied in other systems and organizations in all sectors: governmental (the first sector), for-profit organizations (the second sector), and non-profit organizations (the third sector). We conclude with an interdisciplinary perspective on data science for managers and policymakers (Sect. 17.3).
Orit Hazzan, Koby Mike
Chapter 18. Data Science Teacher Preparation: The “Method for Teaching Data Science” Course
Abstract
In this chapter, we focus on the second component of the MERge model, namely education. We present a detailed description of the Method for Teaching Data Science (MTDS) course that we designed and taught to prospective computer science teachers at our institution, the Technion—Israel Institute of Technology. Since our goal in this chapter is to encourage the implementation and teaching of the MTDS course in different frameworks, we provide the readership with as many details as possible about the course, including the course environment (Sect. 18.2), the course design (Sect. 18.3), the learning targets and structure of the course (Sect. 18.4), the grading policy and assignments (Sect. 18.5), teaching principles we employed in the course (Sect. 18.6), and a detailed description of two of the course lessons (Sect. 18.7). Full, detailed descriptions of all 13 course lessons are available on our Data Science Education website. We hope that this detailed presentation partially closes the pedagogical chasm presented in Chap. 9.
Orit Hazzan, Koby Mike
Chapter 19. Data Science for Social Science and Digital Humanities Research
Abstract
In this chapter and in Chap. 20, we focus on the third component of the MERge model—research, and describe two data science teaching frameworks for researchers: this chapter addresses researchers in social science and digital humanities; Chap. 20 addresses researchers in science and engineering. Following a discussion of the relevancy of data science for social science and digital humanities researchers (Sect. 19.2), we describe a data science bootcamp designed for researchers in those areas (Sect. 19.3). Then, we present the curriculum of a year-long specialization program in data science for graduate psychology students that was developed based on this bootcamp (Sect. 19.4). Finally, we discuss the data science teaching frameworks for researchers in social science and digital humanities from motivational perspectives (Sect. 19.5) and conclude by illuminating the importance of an interdisciplinary approach in designing data science curricula for application domain specialists (Sect. 19.6).
Orit Hazzan, Koby Mike
Chapter 20. Data Science for Research on Human Aspects of Science and Engineering
Abstract
In this chapter and in Chap. 19, we focus on the third component of the MERge model—research, and describe two data science teaching frameworks for researchers: Chap. 19 addresses researchers in social science and digital humanities; this chapter addresses science and engineering researchers and discusses how to teach data science methods to science and engineering graduate students to assist them in conducting research on human aspects of science and engineering. In most cases, these target populations, unlike the community of social scientists (discussed in Chap. 19), have the required background in computer science, mathematics, and statistics, and need to be exposed to the human aspects of science and engineering which, in many cases, are not included in scientific and engineering study programs. We start with the presentation of possible human-related science and engineering topics for investigation (Sect. 20.2). Then, we describe a workshop for science and engineering graduate students that can be facilitated in a hybrid format, combining synchronous (online or face to face) and asynchronous meetings (Sect. 20.3). We conclude with an interdisciplinary perspective of data science for research on human aspects of science and engineering (Sect. 20.4).
Orit Hazzan, Koby Mike
Backmatter
Metadata
Title
Guide to Teaching Data Science
Authors
Orit Hazzan
Koby Mike
Copyright Year
2023
Electronic ISBN
978-3-031-24758-3
Print ISBN
978-3-031-24757-6
DOI
https://doi.org/10.1007/978-3-031-24758-3