1 Introduction
The gold standard for helping learners acquire and comprehend knowledge is a personal human educator. Human educators can recognize learners’ gaps, anticipate needs, suggest suitable learning materials, and provide individualized feedback. However, such individualized learning is common only in small classes, as learner-centered teaching approaches are hardly scalable to large classrooms (Gupta and Bostrom 2013; Huang et al. 2021). Individual support and personal recommendations for students in their learning processes therefore remain an unsolved challenge in many contemporary learning scenarios (e.g., Kulik and Fletcher 2016). By learning scenario, we mean a course, class, or lecture that an educator or a machine gives to students to help them reach certain learning outcomes (e.g., understanding and applying a programming language). High schools and universities struggle to offer this kind of individual support due to financial and organizational constraints (Seaman et al. 2018). As Winkler et al. (2021) state, “the growing number of [large] classroom sizes in high schools and vocational schools, mass lectures at universities with more than 100 students per lecturer, and massive open online courses (MOOCs) with more than 1000 participants make individual interaction with a teacher or tutor even more difficult” (Winkler et al. 2021, p. 1). Several studies have shown that this lack of individualized support leads to procrastination, low learning outcomes, high dropout rates, and dissatisfaction with the overall learning experience, which can ultimately widen the gap among students (Eom et al. 2006; Brinton et al. 2015; Hone and El Said 2016; Huang et al. 2021).
In response to the lack of individual student guidance, there is a steady growth of technology-mediated learning (TML) information systems in education, such as the learning management systems Canvas or Moodle. These systems both facilitate the traditional teacher route to personalized learning and enable the more learner-centered route, in which learners take a more active role in their educational journey (Betts and Rosemann 2022). Extending this concept, the integration of machine learning and artificial intelligence into these platforms opens a machine or augmented route that offers unprecedented opportunities for personalized learning experiences at scale (Betts and Rosemann 2022). TML thus represents a solution for providing more individualized learning support, and TML systems offer the potential to analyze students’ learning processes and identify gaps that can be addressed through individualized learning recommendations (Gupta and Bostrom 2009, 2013). According to Dahlstrom et al. (2014), 99% of U.S. colleges and universities organize their learning scenarios in a standard learning management system. Taking a TML perspective, a learning process is a sequence of single learning activities students perform to reach a certain learning goal (e.g., analyzing a historical text and discussing its meaning with peers to understand the complexity of a certain historical event). Both the organization of courses in a system and the embedding of exercises in intelligent tutoring systems (i.e., Kulik and Fletcher 2016) or computer-supported collaborative learning tools (i.e., Dillenbourg et al. 2009) are expected to grow continuously to a market size of $336.98 billion by 2026 (Bogarín et al. 2018; Syngene Research LLP 2019; Romero and Ventura 2020). However, a learning management system that only organizes courses does not necessarily provide individualized learning experiences to learners (Huang et al. 2021). It is important to capture learner behavior and preferences while deploying and managing learning materials and activities (Nguyen et al. 2020b; Huang et al. 2021). Systems such as Canvas or Moodle offer promising opportunities to capture data about learning activities. For example, rich traces are captured in event logs – combinations of data points consisting of a timestamp, an event ID, and an activity (e.g., the start or end event of a particular exercise at a certain time) – as well as textual data (e.g., written essays) or measured learning outcomes (e.g., a student’s skill level after taking a quiz) (Nguyen et al. 2020b; Cerezo et al. 2020; Han et al. 2021). This opens up promising potential to discover individual learning activities at different granularity levels, anticipate learner needs, and create individualized learning experiences (Nguyen et al. 2020b; Han et al. 2021).
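To make the event-log notion concrete, the following minimal sketch shows how such traces could be represented and grouped into per-learner activity sequences. The column names, activities, and values are hypothetical illustrations rather than the export format of a specific system such as Canvas or Moodle.

```python
# Minimal sketch of an event log as described above (hypothetical column names
# and events; real LMS exports differ in structure and naming).
import pandas as pd

events = pd.DataFrame(
    [
        ("stud_01", "open_exercise_3", "2024-03-01 10:02:11"),
        ("stud_01", "submit_exercise_3", "2024-03-01 10:17:45"),
        ("stud_01", "take_quiz_1", "2024-03-01 10:20:02"),
        ("stud_02", "open_exercise_3", "2024-03-01 11:05:09"),
        ("stud_02", "watch_video_2", "2024-03-01 11:12:40"),
        ("stud_02", "take_quiz_1", "2024-03-01 11:30:55"),
    ],
    columns=["case_id", "activity", "timestamp"],  # case = learner (or learner-session)
)
events["timestamp"] = pd.to_datetime(events["timestamp"])

# A trace is the time-ordered sequence of activities per case.
traces = (
    events.sort_values("timestamp")
          .groupby("case_id")["activity"]
          .apply(list)
)
print(traces)
# stud_01: [open_exercise_3, submit_exercise_3, take_quiz_1]
# stud_02: [open_exercise_3, watch_video_2, take_quiz_1]
```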
Even though TML systems in education are omnipresent, the potential of leveraging the created learner data for monitoring, discovering, and enhancing students’ learning remains largely untapped (e.g., Nguyen et al. 2020a). Process mining techniques provide an effective way to use learner data to improve education by combining the benefits of learning analytics with process modeling and analysis. For example, process mining can be employed to improve the identification of students at risk of dropping out or underperforming by analyzing their interaction patterns as learning processes in online learning platforms such as Moodle or Canvas. By examining sequences of activities and their frequencies, educators can pinpoint deviations from successful learning paths and thus obtain a more transparent analysis than contemporary non-process-focused monitoring techniques provide. Originating as a sub-discipline of data mining, process mining adds a process-oriented perspective to the analysis of data in business processes (van der Aalst 2012; vom Brocke et al. 2021). While conventional learning analytics primarily focuses on data dependencies and pattern predictions of single-learner activities (Romero and Ventura 2020), process mining can consider existing event logs beyond a single activity (Bogarín et al. 2018; Söllner et al. 2018; Juhaňák et al. 2019; Cerezo et al. 2020). We refer to this learner-centered process viewpoint for analyzing and improving digital learning activities as educational process mining. Based on the previously mentioned event logs, educational process mining can provide additional benefits for designing and analyzing learning processes based on data by crossing the boundaries of single events or tasks and enriching the analysis from a learner’s process perspective (Söllner et al. 2018; Roth 1970).
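As a simple illustration of this process view, the sketch below counts directly-follows relations across learner traces and flags traces that deviate from a reference path. The traces, the reference path, and the assumption that a single path represents successful learning are illustrative and would need to be grounded in the actual learning design.

```python
# Sketch: directly-follows frequencies and deviation from a reference path.
# Traces and the reference path are hypothetical examples.
from collections import Counter

traces = {
    "stud_01": ["open_exercise", "submit_exercise", "take_quiz"],
    "stud_02": ["open_exercise", "watch_video", "take_quiz"],
    "stud_03": ["open_exercise", "submit_exercise", "take_quiz"],
}
reference = ["open_exercise", "submit_exercise", "take_quiz"]

def directly_follows(trace):
    """Return the (activity, next activity) pairs of a trace."""
    return list(zip(trace, trace[1:]))

pair_counts = Counter(p for t in traces.values() for p in directly_follows(t))
print(pair_counts.most_common(3))                        # most frequent transitions

deviating = [case for case, t in traces.items() if t != reference]
print("Deviating from the reference path:", deviating)   # -> ['stud_02']
```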
While process mining is a well-established field of research and the theoretical benefits of educational process mining appear to be clear, the empirical exploration of applications and their practical implementation is not as straightforward (Ghazal et al. 2018; Rogiers et al. 2020). Thus, our understanding of educational process mining lacks a conceptualization of relevant and varied learning scenarios, including process mining’s implications for the relevant stakeholders, such as educators, institutions, and learners. This gap highlights the need for a detailed exploration of educational process mining with respect to the different stakeholders (e.g., educator, institution, or learner), the technology (e.g., the applied discovery algorithm), the pedagogical context (e.g., in which educational domain or for which learning objective and task educational process mining is applied), and its effect on individual learning outcomes (e.g., Okoye 2019; Cerezo et al. 2020). An interdisciplinary information systems viewpoint is important to systematically design, analyze, and compare the various configurations of process mining that extend to the particular application domain of education (Sidorova et al. 2008; O’Neill et al. 2011; Matook and Brown 2017; vom Brocke et al. 2021). This research aims to advance the field by aggregating knowledge about the dimensions and characteristics of educational process mining (Nickerson et al. 2013), enabling more effective discovery, monitoring, and enhancement of student learning processes. Our study seeks to address this need by answering the following research question (RQ) on the design and analysis of digital learning processes through process mining:
In a subsequent step, we are interested in equipping information systems researchers and educational practitioners with useful information and tools to translate knowledge on educational process mining into tangible design steps for leveraging digital learning data. According to Schoonderwoerd et al. (2022), design patterns are proven solutions for recurring problems that make complex domain knowledge accessible and applicable for non-domain experts (e.g., providing educators or educational designers with support to design and analyze digital learning processes). Design patterns have been proven to be a feasible way to communicate design knowledge on IT artifacts in general (e.g., Dickhaut et al. 2022), as well as on digital learning processes specifically (e.g., Weinert et al. 2021). We build upon the design pattern paradigm to address the challenges of designing and analyzing learning processes with educational process mining and pose the following subsequent question:
Consistent aggregation of the characteristics and dimensions of the design and analysis of learning processes based on educational process mining will help researchers and practitioners to systematically use data to discover, monitor, and enhance students’ learning paths. The design patterns, derived from our taxonomy and informed by expert interviews, enable educational designers to enhance learning processes within TML environments. Our research emphasizes the need for stringent data regulation to ensure the ethical use of digital educational data. Overall, our findings contribute to improved design, analysis, and evaluation practices in digital learning.
4 Taxonomy of Digital Learning Process Design and Analyses
In this section, we present the results of our taxonomy development process. We derived a taxonomy represented as a matrix that provides clear classifications of related characteristics and dimensions for designing and analyzing digital learning processes with process mining (see Fig. 6). An important element of our taxonomy and theorizing is that it captures the elements associated with the design and analysis of student learning processes. TML informs the taxonomy and the patterns in the next section, adding an important perspective on how a digital learning scenario can be designed and orchestrated.
4.1 Dimension 1: Purpose of Designing and Analyzing Digital Learning Processes
The first set of dimensions refers to design options of digital learning processes and is concerned with the purpose that a digital learning process should achieve. In this light, we distinguish between application focus, learning mode, learning outcome, and learning task.
The application focus dimension outlines the learning behaviors and preferences that an instructor hopes to reveal and analyze using an educational process mining application. Learner monitoring refers to situations in which an instructor examines a learning process to acquire insights. An example is Uzir et al.’s (2020) use of educational process mining to monitor whether learners comprehend the offered learning techniques. In that sense, educational process mining can offer valuable feedback on learner development, oftentimes on an aggregated group level.
Learner evaluation refers to situations in which educational process mining provides information on and assesses individual learning processes (e.g., poor task performance, clarification issues). In that sense, the evaluation looks back on an individual’s past performance or issues identified in the past process to create options for process-based feedback. Lira et al. (2019) provide an example by investigating process-based feedback during medical training. An application of process mining for learner recommendations is presently not addressed in the educational literature but would define a system that provides learners with actionable suggestions on how to proceed with their learning process (e.g., based on scaffolding, see Winkler et al. 2021). Hence, different from a learner evaluation purpose, a learner recommendation looks forward into the digital learning process before it is completed. The purpose is to provide learning-process-integrated recommendations.
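As a sketch of such a forward-looking recommendation, the example below suggests the next activity for a partial trace by looking at the most frequent continuation among previously successful learners. The traces and the notion of a successful learner are hypothetical assumptions; a real recommendation would additionally require pedagogical grounding (e.g., scaffolding).

```python
# Illustrative sketch of a learning-process-integrated recommendation:
# recommend the next activity most often chosen at this point by learners
# who previously reached the learning goal (hypothetical data).
from collections import Counter

successful_traces = [
    ["read_text", "watch_video", "take_quiz", "write_summary"],
    ["read_text", "take_quiz", "watch_video", "write_summary"],
    ["read_text", "watch_video", "take_quiz", "peer_review"],
]

def recommend_next(partial_trace, history):
    """Most common next step after `partial_trace` among traces in `history`."""
    n = len(partial_trace)
    candidates = Counter(
        trace[n] for trace in history
        if trace[:n] == partial_trace and len(trace) > n
    )
    return candidates.most_common(1)[0][0] if candidates else None

print(recommend_next(["read_text", "watch_video"], successful_traces))
# -> take_quiz (the most frequent continuation among matching traces)
```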
The learning mode dimension outlines the (scope of the) unit of analysis of digital learning processes in more detail. We discovered that application scenarios of digital learning processes that are suitable for educational process mining focus on either individual learning, in which learners study on their own and engage with learning resources by themselves (Saint et al. 2020), or collaborative learning, which occurs when learners interact, collaborate, and coordinate learning activities with other learners (Schoor and Bannert 2012). The learning outcome dimension refers to the knowledge type that should be trained or created while completing the various activities of a digital learning process. This dimension recognizes that educational process mining is usable for digital learning processes that focus on factual, conceptual, procedural, and metacognitive knowledge. The characteristics are based on Krathwohl (2002), who defined factual knowledge as knowledge of terminology and precise information (e.g., Cerezo et al. 2020), conceptual knowledge as an understanding of theories and models, and procedural knowledge as subject-specific skills and techniques (e.g., Lira et al. 2019). Metacognitive knowledge refers to both strategic and cognitive knowledge, such as debating abilities (Wambsganss et al. 2021). The learning task dimension refers to the learning objectives that learners should achieve when following a digital learning process. We employed characteristics influenced by Krathwohl’s (2002) classifications to differentiate among the difficulty levels of a learning assignment in the form of learning objectives. Following Krathwohl’s concept, we differentiated between remembering, understanding, applying, analyzing, evaluating, and creating as learning objectives that a learner can attain during a learning process (Krathwohl 2002). We assume that it is easier to capture data for simpler and objectifiable objectives, such as remembering and understanding.
4.2 Dimension 2: Users When Designing and Analyzing Digital Learning Processes
The second set of dimensions refers to design options of digital learning processes and is concerned with the user who will benefit from analyzing a digital learning process. Despite the field’s infancy and the difficulty in precisely distinguishing and identifying the end user for each case at hand, three primary end users may be established. In this light, we consider the intended main end user and the learning context in which the learning process is embedded.
We opted to separate scenarios by the intended main end user, who benefits most from the information that educational process mining provides. A learner would, in most circumstances, be the one to (1) generate the data of the learning process to be analyzed, (2) receive individualized reports and insights, and (3) use the output of the data analysis to adjust their learning strategy. Having a learner as the intended primary end user implies that the learner may utilize the insights to enhance their learning or course-taking process by receiving insights on their study input quality, learning performance, or study patterns over time (e.g., Cameranesi et al. 2017). The instructional designer (educator) characteristic refers to an end user who primarily benefits from the insights of analyzed learner data. Instructional designers may use such reports to enhance the instructional design or to intervene in real time to provide students with guided feedback and support them in adapting their learning strategy (e.g., Lira et al. 2019). The primary beneficiary of the organization (learning institution) characteristic is the provider of the educational environment. Learning institutions include universities, MOOC providers, vocational training schools, and continuous training platforms, among others. At the university level, this category of main end users refers to course coordinators, who consider learning behavior across courses, years, and individual learners. At the open educational level, it refers to platform providers of MOOC courses; process mining can be used to analyze MOOCs to reduce dropout rates, for example (e.g., Rizvi et al. 2018). Analyzing learner data in terms of course completion, fulfillment, and dropout behavior has the potential to provide such users with valuable insights into learners’ course-taking behavior. For this type of main user, analyzing data on an aggregated level (as opposed to the individual learner or course level) is most suitable and insightful.
Finally, the learning context describes the learning environment in which digital learning processes will be deployed. We distinguished between kindergarten–high school (e.g., Gomez et al. 2021), higher education (Engelmann and Bannert 2019), and continuous education (Ariouat et al. 2016), which includes workforce training, programs for personal improvement, and vocational training. The learning context can have important implications regarding the type of data available, as well as the legal and ethical treatment of data (i.e., using minors’ data).
4.3 Dimension 3: Data for Analyzing Digital Learning Processes
The third set of dimensions refers to analysis options of digital learning processes and is concerned with the data dimension (i.e., data input and data collection interface). It covers the origin and characteristics of the data to be analyzed.
The data input characteristics refer to data prerequisites that need to be created as the basis for further educational process mining-based analysis. Although most applications that analyze digital learning processes use pre-existing event data, such as data collected automatically by a system, we discovered other data, such as audio (Nguyen et al. 2021), video (Lira et al. 2019), or text (Mittal and Sureka 2014), from which event data can be manually or automatically derived or supplemented. Image data might also be used, although no studies presently use it.
The data collection interface specifies the tools used to collect the event data. Most applications that aim to analyze digital learning processes collect event data through an internal system (Juhaňák et al. 2019) or a MOOC platform (Rizvi et al. 2018). Further characteristics are other web-enabled learning tools (i.e., tools accessible and distributed through a web browser from any kind of device), non-web-enabled tools (i.e., software accessed as an application on specific devices, e.g., Doleck et al. 2016), and automatic and manual data coding (for example, the coding of video data), which describe the remaining cases.
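To illustrate the step from a raw export of an internal system to an event log that process mining tools can work with, the sketch below uses pm4py’s simplified interface. The file and column names are hypothetical stand-ins for an LMS export, and the function names reflect recent pm4py releases and should be checked against the installed version.

```python
# Sketch: converting a raw system export into an event log (assumptions:
# hypothetical file and columns; pm4py's simplified interface as in recent
# releases - verify against the installed version).
import pandas as pd
import pm4py

raw = pd.read_csv("lms_export.csv")                              # hypothetical export
raw["timecreated"] = pd.to_datetime(raw["timecreated"], unit="s")

df = pm4py.format_dataframe(
    raw,
    case_id="userid",          # one case per learner (or learner-session)
    activity_key="eventname",
    timestamp_key="timecreated",
)
event_log = pm4py.convert_to_event_log(df)
print(len(event_log), "cases in the event log")
```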
4.4 Dimension 4: Techniques for Analyzing Digital Learning Processes
The fourth set of dimensions refers to analysis options of digital learning processes and is concerned with analysis techniques and the analysis output format. This dimension differentiates between process mining type, analysis beyond process mining, and output presentation. It covers the methods used to examine digital learning processes.
Although various techniques are possible in this area, we opted to focus on distinguishing between the fundamental process mining types employed and the type of analysis that goes beyond standard process mining functions. In terms of process mining types, we found only applications that expressly indicated the use of discovery and conformance. Discovery employs process mining to build a process model by analyzing event log data; an example would be creating a process model of a learning process from a system’s event log (e.g., Rogiers et al. 2020). We discovered conformance in scenarios in which process mining is used to compare the model mined from the event log to an existing process model; one example is checking adherence to course order suggestions (e.g., Cameranesi et al. 2017). Process enhancement was not expressly employed in any of the evaluated articles, yet its applications are conceivable, such as extending a learner’s learning process model using information extracted from specific event logs.
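A minimal sketch of the discovery and conformance types is shown below, assuming an event log in XES format and using pm4py’s high-level functions (names as in recent releases; the log file is hypothetical). In practice, the conformance step would typically replay the log against a normative model, such as a suggested course order, rather than against the freshly mined model.

```python
# Sketch of discovery and conformance checking with pm4py (the XES file is a
# hypothetical example; function names follow recent pm4py releases).
import pm4py

log = pm4py.read_xes("course_events.xes")

# Discovery: mine a process model from the event log.
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# Conformance: replay the log on a model and report fitness. Here the mined
# model is reused for illustration; a normative model (e.g., the suggested
# course order) would be used in a real conformance check.
fitness = pm4py.fitness_token_based_replay(log, net, initial_marking, final_marking)
print(fitness)  # dict with fitness measures, e.g., the share of fitting traces
```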
Analysis beyond process mining refers to the various kinds of analyses that either precede or extend the typical process mining analysis. This excludes supplemental analyses that are, in essence, independent of the process mining application. Clustering may be used with process mining to separate learners based on characteristics such as grades; rule-based approaches can be utilized similarly. Furthermore, classification may be used to forecast learner success based on previously mined processes. Other approaches combine unsupervised and supervised learning or apply time-series analysis. The characteristic none captures applications that did not employ any analysis beyond process mining techniques.
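The sketch below illustrates one such combination: deriving simple per-learner features from traces and clustering learners with k-means. The traces, the feature set, and the number of clusters are illustrative assumptions; in practice, features would be grounded in the mined process (e.g., rework loops or deviations) and in learner attributes such as grades.

```python
# Sketch of analysis beyond process mining: cluster learners based on simple
# trace-derived features (hypothetical traces and illustrative features).
import numpy as np
from sklearn.cluster import KMeans

traces = {
    "stud_01": ["read_text", "take_quiz", "take_quiz", "write_summary"],
    "stud_02": ["read_text", "watch_video", "take_quiz"],
    "stud_03": ["take_quiz", "take_quiz", "take_quiz"],
}

def features(trace):
    return [
        len(trace),                # total number of activities
        len(set(trace)),           # number of distinct activities
        trace.count("take_quiz"),  # quiz repetitions as a simple rework signal
    ]

X = np.array([features(t) for t in traces.values()])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(traces.keys(), labels)))  # cluster label per learner
```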
Finally, the output presentation dimension specifies how the process mining findings are displayed to the intended main end user. The fundamental issue here is that non-expert users require a greater level of abstraction to derive useful insights from the data. The raw model characteristic implies no form of presentation beyond the model identified through process mining. A graphical presentation displays findings, for example, as graphs. In a numerical presentation, results are displayed in the form of numbers or tables, such as the fitness scores of identified models. Finally, a textual presentation outlines instances in which the system converts the knowledge obtained from process mining into readable information, such as suggestions or automatically created reports.
6 Discussion
Educational process mining offers the opportunity to reap the benefits that data in educational scenarios provide (Gupta and Bostrom 2009; Bogarín et al. 2018; Juhaňák et al. 2019; Cerezo et al. 2020). Its potential to enhance the personalization of learning through adaptive feedback and scaffolding underlines the importance of a solid theoretical and practical framework. Our taxonomy, our design patterns, and the research agenda outlined in the next section collectively address this need by expanding the knowledge base of design characteristics crucial for embedding process mining in educational contexts. Based on the findings of this study, we discuss theoretical and practical implications and suggest future research avenues for educational process mining.
6.1 Theoretical Contributions
In this study, we conceptualized educational process mining as a new perspective and set of techniques to leverage the potential of learner data in order to improve the individualization of education at scale. This can be done by creating synergies between learning analytics and process mining.
First, our study offers a nuanced understanding of what must be considered when designing and analyzing digital learning processes based on student data in order to discover, monitor, and enhance individual learning through process mining. We synthesized existing research, including work on TML, the individualization of digital learning, and prior literature reviews, by creating a taxonomy that structures and groups the design characteristics of educational process mining applications (Gupta and Bostrom 2009; Kundisch et al. 2021). Past literature has mostly approached the use of process mining for educational purposes from a technical perspective (e.g., Mouchel et al. 2023; Ludwig et al. 2024; He et al. 2021). Thus, our study contributes to and extends the theoretical foundation of educational process mining by establishing a taxonomy that captures the design characteristics of process mining applications for the specific context of educational scenarios from both a technological and a socio-technological viewpoint. Informed by theoretical frameworks in the information systems field (Bostrom and Heinen 1977; Gupta and Bostrom 2009), the taxonomy offers a comprehensive framework to evaluate, design, and compare educational process mining applications in various educational scenarios that become distinguishable through the learning mode or the intended main user, for instance. We uncovered and categorized new dimensions and characteristics that are part of a TML scenario and play a critical role in student learning success beyond the technical perspective of process mining. Specifically, this includes embedding and evaluating algorithmic approaches, as well as incorporating the learner and his or her activities from a pedagogical perspective. These elements are crucial within a pedagogical learning scenario and are instrumental in fostering student learning success. Integrating the TML perspective (Alavi and Leidner 2001), our work offers a holistic view of the role of process mining in education, thereby bridging the gap between technical potential and pedagogical application (Bostrom and Heinen 1977; Gupta and Bostrom 2009).
Second, our study aims to contribute to the understanding of learning analytics and TML in information systems research that could incorporate process mining and improve individual digital learning. Existing literature in the field of TML has largely focused on using digital learning processes for process discovery and conformance approaches (e.g., Rogiers et al. 2020; Cameranesi et al. 2017). Our interview results suggest that current learning scenarios may explain this pattern; gaps in data traces, for instance, can complicate a conformance analysis. Such data traces also provide promising potential for tailor-made individualized course recommendations, well-balanced course allocation for high-quality teaching, or the extraction of new students (Cerezo et al. 2020; Han et al. 2021; Nguyen et al. 2020b).
Third, we used the concept of design patterns to illustrate three use cases of educational process mining. We view patterns as both guidance for data usage and analysis and an illustration of relevant contextual factors to consider (Dickhaut et al. 2023). The three design patterns in this paper suggest data requirements that can enable the realization of respective educational process mining solutions. At the same time, the design patterns also illustrate intended goals and potential challenges to consider in relation to the respective application. We thereby emphasize and challenge the underlying assumptions of effective educational process mining. Understanding and expressing the necessary requirements and potential challenges as well as offering space for solutions is important to build a cumulative body of design knowledge, to show the limits of generalizability, and to prepare the ground for realizing educational process mining applications.
6.2 Practical Contributions
From a practical perspective, our findings on educational process mining enable a more targeted implementation as well as the analysis and effective use of technology in education. Thanks to this systematic classification of learning scenarios, researchers and practitioners can more effectively design, evaluate, compare, and theorize how different technological embeddings of the young field of process mining impact student learning outcomes in a specific learning scenario and task.
Our taxonomy not only categorizes educational process mining applications but also serves as a practical tool for enhancing the design, delivery, and evaluation of education. It enables educators, administrators, and designers to make informed decisions that directly improve student learning experiences and outcomes. More specifically, the taxonomy aids the identification of patterns and anomalies in student learning behaviors. For instance, by analyzing the process data categorized under different educational scenarios, practitioners can pinpoint areas where students repeatedly encounter difficulties. This identification process is crucial for adapting instructional materials and interventions that directly address these learning gaps. In a similar vein, the taxonomy facilitates the ongoing monitoring of student engagement and progression through their educational activities. By providing a framework to compare students’ actual learning paths against optimal process models, educators can intervene in real time to offer support or adjustments to the course trajectory. This real-time monitoring is particularly beneficial in large-scale learning environments like MOOCs, where individual attention is challenging yet critical for student retention and success.
By enabling the detailed analysis of how different educational processes affect learning outcomes, our taxonomy also guides the design of more effective educational interventions. For example, the taxonomy can help institutions experiment with and refine various teaching methodologies, such as flipped classrooms or blended learning, by providing a structured way to assess their impact on student performance and engagement. The taxonomy thereby supports educational administrators and policy-makers in making data-driven decisions about curriculum development, resource allocation, and student support services. By understanding the types of process mining applications that are most effective in various educational contexts, decision-makers can allocate resources more efficiently and develop policies that promote optimal learning environments.
The design patterns of this study are relevant to the educational technology area and associated applications from a practical perspective. The patterns, in combination with the taxonomy for the design and analysis of digital learning processes, serve as a personal guide to studying, designing, and evaluating the individualization of digital learning at scale. We argue that design patterns can provide an actionable space for practitioners to imagine potential use cases of educational process mining for their own scenarios. Particular emphasis is hereby placed on making explicit important underlying assumptions, such as associated data challenges that are fundamental to the effectiveness of educational process mining.
In particular, our three design patterns demonstrate the variety of potential use cases educational process mining offers to TML by improving the individualization of digital learning. Different stakeholders, including instructional designers (design pattern 3), educational organizations (design pattern 1), and individual learners (design pattern 2), can benefit from different interventions. Depending on the user, different types of analyses and output presentations must be considered when deploying process mining for learning analytics. Across the three design patterns, educational process mining offers great opportunities to the higher education domain and individual learning mode, with other levels and modes (e.g., collaborative learning) of education to be further explored as part of future research.
6.3 Limitations and Future Research
This study and its findings should be interpreted in the light of certain limitations. First, the taxonomy and proposed design patterns and usage implications depend on the literature and data we reviewed. Both qualitative and quantitative empirical data about process mining systems in educational contexts are lacking, and obtaining those data is an overall research need. While much of the present research focuses on the theoretical aspects of process mining in education, few actual field assessments of process mining with deployed systems and users exist (e.g., Mouchel et al. 2023). More specifically, contemporary empirical research looks at process mining systems that may not be tied to real deployment environments (Bogarín et al. 2018; Juhaňák et al. 2019; Cerezo et al. 2020).
In this vein, we are aware that learning data alone may be insufficient to provide a comprehensive overview of learning activities (e.g., Baker and Hawn 2021). We acknowledge that other sources of data, such as tasks and assignments that occur outside of the TML through Moodle, for instance, should be considered to paint a more holistic and complete picture of data in a particular learning context. Also, not all learning activities are captured within digital platforms, which poses a limitation to the completeness and representativeness of the data (Dahlstrom et al. 2014). Data quality issues such as incompleteness, inaccuracy, or inconsistency can severely undermine the insights derived from process mining techniques (Baker and Hawn 2021). In educational contexts, the variability in data entry, the reliance on digital platforms for capturing student interactions, and the absence of standardized data collection protocols across different learning management systems further exacerbate these challenges. Consequently, the effectiveness of process mining in unveiling meaningful patterns and supporting pedagogical decisions may be compromised, necessitating robust data preprocessing and validation methods to ensure reliability and validity of the findings.
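As an illustration of such preprocessing and validation, the sketch below runs a few basic quality checks on an event-log table with case, activity, and timestamp columns. The column names follow the earlier examples, and the checks are illustrative rather than an exhaustive validation protocol.

```python
# Sketch of basic event-log quality checks (illustrative, not exhaustive).
import pandas as pd

def basic_quality_report(events: pd.DataFrame) -> dict:
    events = events.copy()
    events["timestamp"] = pd.to_datetime(events["timestamp"], errors="coerce")
    return {
        # Missing case IDs, activities, or unparseable timestamps.
        "missing_values": int(events[["case_id", "activity", "timestamp"]].isna().sum().sum()),
        # Exact duplicate events, e.g., from double logging.
        "duplicate_events": int(events.duplicated().sum()),
        # Cases whose events are not recorded in chronological order.
        "cases_with_unordered_events": int(
            (~events.groupby("case_id")["timestamp"]
                    .apply(lambda s: s.is_monotonic_increasing)).sum()
        ),
        # Very short cases often indicate incomplete traces.
        "cases_with_fewer_than_3_events": int((events.groupby("case_id").size() < 3).sum()),
    }

# Usage: print(basic_quality_report(events)), with `events` as in the earlier sketch.
```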
In a similar fashion, the comparison with other data sources, such as enrollment or systems data, highlights additional future research avenues (e.g., Dahlstrom et al. 2014). Incorporating such external, contextual data – often referred to as digital traces – can significantly enrich the process mining analysis by providing additional context to student behaviors. The integration of these external data sources requires careful consideration of prerequisites, such as data accessibility and privacy concerns, as well as a thorough understanding of the benefits they can bring to enhance the comprehensiveness of the analysis. Also, our study’s applicability is limited by regional differences and variations in digital maturity among educational providers. The digital transformation of education is unevenly distributed, with significant disparities in the adoption of digital technologies and process mining capabilities across regions and institutions. These variations affect not only the availability and quality of data but also the relevance and applicability of process mining solutions. Consequently, our findings may not be universally applicable, necessitating further research that considers these regional differences and seeks to understand how process mining can be adapted to diverse educational contexts and levels of digital maturity.
While process mining offers unique insights into the flow of learning activities, it is but one of several tools available for analyzing educational data. As mentioned earlier, digital course directories and existing learning systems (e.g., for course registrations or student profiles) exist in practice and offer rich data traces in theory. Such systems provide complementary perspectives, capturing different facets of student engagement and academic progress. The reliance solely on process mining might overlook insights that could be derived from these other sources, suggesting a more integrated approach to data analysis that leverages the strengths of each system to provide a holistic understanding of student learning. Hence, with our study, we aim to establish process mining in education as an additional technique that leverages the experience and research on business process management for educational learning processes. Nevertheless, we believe this should be done in combination with other proven learning analytics methods rather than replacing them. While a large body of reviewed literature focuses on generic learning processes, practical training processes and course-taking sequences provide opportunities for additional empirical investigation, especially when coupled with process mining techniques. An interdisciplinary lens that allows for a multiplicity of data analysis tools offers novel research avenues. These include but are not limited to (1) investigating learner knowledge and skill levels, (2) personalizing systematic learning recommendations for students, and (3) considering learning scenarios in continuous and vocational education, such as MOOCs and practical training.
Exploring these research avenues also requires addressing more fundamental questions. While we have considered the potential of educational process mining as a point of departure, it is important to evaluate its appropriateness compared to other analysis methods and tools. These alternatives might be more suitable for addressing TML scenarios, for example, in terms of cost-efficiency considerations or questions of data availability. While process mining can offer deep insights into student learning processes, the financial and resource implications of adopting such technologies are not trivial. The costs associated with procuring, customizing, and maintaining sophisticated process mining software, alongside the need to train staff to effectively utilize these tools, can be prohibitive, especially for smaller or resource-constrained educational providers. This limitation underscores the importance of future research conducting comprehensive cost–benefit analyses to ascertain the feasibility and potential return on investment of process mining initiatives in educational settings. It is also necessary to highlight the ethical challenges that arise when using digital trace data, particularly in the context of education (Hakimi et al. 2021). An important underlying assumption of our reasoning is that students have consented to the use of their data for TML. Informing and obtaining consent for digital trace data is an extensively discussed ethical challenge. According to Johnson (2019), new personal information might be inferred from already-gathered data. The aggregation and combination of data sources risk de-anonymizing individuals. Similarly, in the context of educational process mining, data might be reused and decontextualized to answer certain questions, thereby potentially developing proxies for certain variables (i.e., gender, race, age) even though those are not explicitly collected. Regulatory considerations also pose a significant limitation to the application of process mining in education. The management and analysis of student data are subject to a complex landscape of privacy laws and regulations, which vary significantly across jurisdictions. Compliance with the General Data Protection Regulation (GDPR) in the European Union, for instance, requires meticulous attention to how student data is collected, processed, and stored. Educational institutions must navigate these regulations carefully, ensuring that process mining activities are conducted in a manner that respects students’ privacy rights and complies with all applicable legal requirements, adding another layer of complexity to the adoption of these technologies. The question of data ownership and consent must be discussed in consideration of the unintended impact educational process mining may have on student learning and social development. Future studies, as well as organizations deploying educational process mining, are required to define privacy notions, such as the extent to which the use of personal data should be accepted by individual learners and society at large. When turning towards the commercialization of student data, sharing such data with technology vendors also introduces novel issues of inadequate security controls (i.e., Russel et al. 2018). While extant research has explored ethical issues and related solutions for digital trace data in education, research bodies in this realm are fragmented and lack consideration of the education of younger children and informal learning scenarios outside the traditional classroom (Hakimi et al. 2021). This becomes particularly important considering the variety of online learning formats and the increasing availability of various target groups. We urge future research to look further into the ethical challenges of educational process mining and to consider the unique affordances of the educational setting when discussing ethical and societal implications.
Table 4 summarizes avenues for future research along our discussions around (1) general questions about the usefulness and use of educational process mining, (2) the analysis of digital learning based on our taxonomy, (3) the use of design patterns for educational process mining, and (4) ethical challenges that can arise in the context of educational process mining.
Table 4
Avenues for future research and related questions
Application of Educational Process Mining | How can we assess the usefulness of educational process mining compared to other tools and methods (e.g., simple queries against existing databases)? How can we ensure data quality and completeness, e.g., when using data sources outside of TML?
Analysis of Digital Learning | How can learning institutions best identify and leverage existing and potential data traces? How can process mining be used to enable individualized course recommendations, course allocation, and the extraction of new students in real-world applications? How can unique affordances of educational providers, for example, regarding digital maturity and regional requirements, be considered in educational process mining?
Design Patterns | How can we empirically validate the usefulness of a specific design pattern for the methodological approach of educational process mining? How can we embed educational process mining in contemporary learning scenarios through the application of the three design patterns? How can we empirically validate the usefulness of a specific design pattern for the practical impact on learning scenarios?
Ethical Challenges | How can user anonymity be assured, especially regarding the aggregation and combination of data sources? How can user data be protected against the commercialization of learning data? How do we protect younger learners’ data and data inferred from learning scenarios outside the traditional classroom?
7 Conclusion
Educational process mining is a crucial perspective for the design and analysis of digital learning processes as educational institutions strive to provide increasingly personalized educational settings for their students and to improve educational and administrative processes around course-taking and learning-related behaviors. In this paper, we set out the design and analysis characteristics of digital learning processes through the development of a taxonomy and design patterns for educational process mining applications. This research not only advances the academic discourse surrounding educational process mining but also serves as a practical guide for its application in large-scale TML environments.
The implications of our findings are twofold. First, the refined taxonomy and the identified design patterns provide a structured approach that can guide educational designers and practitioners in enhancing digital learning scenarios. This framework helps in systematically leveraging process mining to discover, monitor, and enhance the learning processes of students, thereby promoting more effective and personalized educational experiences. Second, our research contributes to a more comprehensive understanding of the challenges and opportunities associated with educational process mining, which includes the need for strict data regulation to ensure ethical practices in handling and analyzing educational data. Overall, our study emphasizes the potential of educational process mining to facilitate significant advancements in digital education by enabling a deeper and more actionable understanding of student learning behaviors and educational processes. As educational technologies and methodologies continue to evolve (e.g., through generative artificial intelligence), these insights might play a crucial role in shaping the future of education, ensuring that learning environments are both effective and adaptable to the needs of all students.