Software and Systems Modeling

Open Access 13-10-2022 | Regular Paper

Modeling difficulties in creating conceptual data models

Multimodal studies on individual modeling processes

Authors: Kristina Rosenthal, Stefan Strecker, Monique Snoeck

Published in: Software and Systems Modeling | Issue 3/2023

Abstract

Conceptual modeling is a learning task essential to students of computer science, software engineering and related programs. Construed as a complex task, surprisingly little is known about the actual act of conceptual modeling, and about modeling difficulties learners experience. Combining complementary modes of observation of learners’ modeling processes, we study modeling difficulties encountered while performing a data modeling task. Using the concept of cognitive breakdown, we analyze audiovisual protocols of the individual modelers’ modeling processes, recordings of their interactions with the employed modeling software tool and survey data of modelers about their perception of encountered modeling difficulties. In an exploratory study and a follow-up study, we identify eight types of modeling difficulties related to modeling entity types, generalization hierarchies, relationship types, attributes and cardinalities. The identified types of modeling difficulties contribute to a better and more complete understanding of data modeling processes intended to inform design science research on modeling assistance for data modelers at different stages of their learning and mastering of conceptual data modeling.

Communicated by Timothy Lethbridge.

This article extends and revises a previously published conference article on the exploratory study [57].

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

Conceptual modeling is a recurring activity during software engineering pursuing purposeful reconstructions of statements about a domain of discourse using a modeling language, e. g., for data or process modeling [18, 29, 33, 34, 37, 80]. Thence, conceptual modeling is a learning task faced by most students of computer science, software engineering and related programs as it is mandated by curricula standards, e. g., by the joint standard curriculum for Information Systems of the Association for Computing Machinery (ACM) and the Association for Information Systems (AIS) [2]. As an activity, conceptual modeling—e. g., when constructing a data model as entity–relationship diagram [23]—involves an intricate array of cognitive processes and performed actions including goal setting, abstracting, conceptualizing, associating and contextualizing, interpreting and sense-making, evaluating and judging, anticipating and envisioning and thinking ahead, drawing and visualizing and, in group settings, communicating, discussing and agreeing [57, 58]—marking the actual act and process of conceptual modeling. Creating a conceptual model is, hence, construed as a complex task involving codified and tacit knowledge [54, 74], a task that requires mastering theoretical foundations, modeling languages, modeling methods and modeling tools, applying them to practical problems and, along the way, critically thinking and reflecting upon the application domain in terms of its technical languages [35, 62].

Despite its relevance, the process of conceptual modeling has for long received limited attention in conceptual modeling research, with human factors and cognitive aspects having received only little attention [73]. Only recently, processes of conceptual modeling have seen increasing interest from researchers (e. g., [13, 25, 39, 52, 64, 84]). How conceptual data modeling is performed by modelers, how modeling processes proceed, which modeling difficulties modelers encounter and why and how to overcome these difficulties has been subject to studies on the cognitive processes and performed actions constituting conceptual modeling (e. g., [6, 11, 21, 68, 74]). However, at large, surprisingly little is known about the actual act of conceptual data modeling, about the reasoning of modelers and their deliberations (e. g., about modeling decisions), and whether different (idealized) types of modelers can be identified, e. g., by identifying patterns of modelers’ modeling difficulties, and whether these modeler types benefit from modeling assistance tailored to overcome their specific difficulties.

In the presented research, we integrate complementary modes of observation to study modeling difficulties in sixteen modeling processes of learning modelers following a mixed methods research design [26, 75]. We pursue the research objective of identifying and classifying modeling difficulties these modelers face while constructing a conceptual data model based on a natural language task description using a modeling software tool. The present research focuses on a learning context in which learners of data modeling are expected to model a universe of discourse as described in a textual description of a modeling task, a common learning scenario, e.g., in higher education courses on data modeling. Hence, the main focus is on modelers’ capability of conceptualizing a domain; the modeling dialogue between modelers and domain experts is excluded from the present study (cf. e.g., [38, 39]). In our pursuit of the research objective, we operate on the basic assumption that modelers’ individual modeling processes demand study from multiple complementary perspectives to account for the richness and complexity of the task and individual process of conceptual modeling, following Berger and Luckmann’s inspiring insight that “the object of thought becomes progressively clearer with this accumulation of different perspectives on it” [14, p. 10]. We use the concept of cognitive breakdown [11, 47] to identify modeling difficulties in verbal protocols (think aloud protocols, see [30]) and complement difficulty identification by visually inspecting recordings of modeler–tool interactions as well as video recordings of individuals’ modeling processes. We then further complement difficulty identification by surveying the individuals about performing the modeling task.
Here, we approach difficulty identification in two studies: An exploratory study with eight non-experienced modelers identifying five types of modeling difficulties relating to different aspects of constructing conceptual data models, i.e., entity types, relationship types, attributes and cardinalities (reported in [57]), and a follow-up study with eight medium-experienced modelers extending the findings by three further types of modeling difficulties.

Viewed as a process, conceptual modeling is a cognitively challenging undertaking bound to natural language and to (an) artificial modeling language. It is for this reason that we follow a mixed methods research design with multimodal observations of individual modeling processes to account for the intricacies of the process of conceptual modeling: Specifically, analyzing think aloud protocols has shown promising results for understanding cognitive processes of subjects working on problem-solving tasks in general and on modeling tasks in particular [6, 68], and we consider verbalization of thoughts as the best available means of expression for achieving insights into modelers’ reasoning as our spoken language provides a rich and flexible tool to express our thinking. However, to ask subjects to think aloud is a second-best approach, warranted only because it is not possible to directly access and capture cognitive processes and, thus, modeler reasoning while modeling. Modelers may have difficulties verbalizing their reasoning while modeling [15] on principle accounts (because verbalizing own thoughts can be difficult) or on modeling-related accounts (e. g., because of the difficulty of finding the right words to express oneself). Nonetheless, among all possible alternative modes of observation, think aloud verbalization promises the richest insight into non-directly observable cognitive processes of individual modeling processes. However, modeling difficulties will not always be observable from verbal protocols alone but from interactions of modelers with the software tool, with pen and paper or simply from modeler movements, e. g., erratic changes between looking at the graphical editor on screen and the modeling task provided on paper. 
Hence, multimodality of observations is assumed to provide a more complete picture of the phenomenon under investigation [75, 76]: Complementing different modes of observation is a research strategy common to mixed methods research designs [26]. We purposefully complement these different modes of observation on modeling processes to allow us to identify a wide range of modeling difficulties by enabling us to recognize and identify cognitive breakdowns more precisely than through a single mode of observation, and to allow us to obtain insights into the causes for cognitive breakdown.

The (meta) objective above the primary research objective of identifying and classifying modeling difficulties modelers face is to inform design science research on developing (tool) assistance for conceptual modelers at different stages of their learning and mastering of conceptual modeling aimed at mitigating modeling difficulties: By identifying modeling difficulties and by developing a taxonomy of such difficulties [36, 48] over the course of multiple studies, the present research aims to contribute to establishing a theoretical foundation for developing assistance for conceptual modelers tailored to help them to overcome their specific modeling difficulties.

After introducing the theoretical background (Sect. 2), related work is described (Sect. 3). Next, the mixed methods research design with the multimodal observation setup and the study conduct of the follow-up study—in comparison with the exploratory study reported in [57]—is explained in Sect. 4. Section 5 presents the findings, followed by a discussion of the findings and future research directions (Sect. 6). The paper closes with a discussion of limitations (Sect. 7) and a conclusion (Sect. 8).

2 Conceptual modeling as cognitively complex task

Prior research has conceptualized conceptual modeling as ill-structured problem solving [6, 52, 68]: A modeling task (e. g., a data or business process modeling task) does not imply a clear path to a conceptual model (e. g., a data or process model)—similar to ill-structured problems where a problem representation does not imply a clear path to a solution of the problem [47, 55]. Rather, a modeling task starts from a problem representation in textual form (using natural language) and/or graphical and other visual forms and requires the application of modeling concepts of a modeling language to create a conceptual model by purposefully reconstructing the problem representation by means of modeling concepts of the chosen modeling language. The aspired artifact as result of this problem-solving process is the conceptual model.

A cognitive view on conceptual modeling processes as problem-solving processes allows us to better understand modeling processes in terms of modeling difficulties and underlying cognitive processes: Cognitive Load Theory (CLT) models cognitive resources of humans and how these resources are used in problem solving and learning [69]. At its core, CLT differentiates between a limited working memory and a comparatively unlimited long-term memory [50]. Following CLT, humans are assumed to possess a limited cognitive capacity in performing complex tasks as the capacity of the working memory, i.e., humans’ cognitive resources, is limited at a given time. Three types of cognitive load can be distinguished: (1) intrinsic load, i.e., the inherent difficulty determined by the complexity of the problem solving task; (2) extraneous load, i.e., the extra load originating from the problem representation; and (3) germane load, i.e., the load needed to relate information with long-term memory [13, 50, 52]. Hence, for conceptual modeling, a modeling process is influenced by modeling task characteristics referring to intrinsic load, characteristics of the task’s representation including tool and modeling language characteristics referring to extraneous load as well as the modeler’s individual cognition that refers to germane load [19, 52, 66].

If the overall cognitive load on a subject performing a problem-solving task overstrains the subject’s cognitive resources, difficulties are likely to occur—potentially leading to a cognitive overload [69]. Problem solving research stipulates three general “processes” or “components” involved in problem solving [42]: search, recognition and inference with search and recognition aimed at handling information of rather low complexity. As conceptual modeling usually requires several complex cognitive processes referring to making inferences—beyond search and recognition which imply handling information of rather low complexity [52]—creating conceptual models constitutes a complex problem-solving task that is assumed to lead to cognitive difficulties [3, 11, 20, 52].

To identify the difficulties which modelers experience while constructing a conceptual data model, we rely on the concept of cognitive breakdown [11, 47]. Following problem-solving research [47] and prior work on cognitive difficulties in problem-solving processes [11, 78], we conceptualize a cognitive breakdown as a difficulty a modeler experiences when constructing a conceptual model based on a natural language description [11], i.e., a situation in which the overall cognitive load overstrains a subject’s cognitive resources: “when a line of thought fails” [20]. In their foundational work [47], Newell and Simon suggest that a cognitive breakdown during problem solving leads a subject to return to the problem presentation or, if unable to overcome the difficulty, to abandon the problem. Hence, a cognitive breakdown manifests itself in a modeler explicitly verbalizing a difficulty while modeling or in interrupting or terminating a modeling activity, e. g., a modeling activity which is not completed because the modeler switches to another activity [11].

3 Related work

This section provides an overview of main strands of related research, i.e., related prior work investigating individual (data) modeling processes as well as prior research focusing on difficulties and errors in conceptual (data) modeling.

Early contributions investigate data modeling processes: Batra et al. [7, 8] report a laboratory study that compares conceptual data models constructed by students using the relational model and the extended entity–relationship (EER) model. The data models are evaluated in terms of correctness with regard to modeling entity types and relationship types, compared to a solution developed by the researchers. Regarding modeling difficulties, the study suggests that difficulties do not primarily occur in modeling entity types but in modeling relationship types, leading to the conclusion that the complexity in data modeling is mainly related to relationship types. Specifically, the study indicates that the number of modeling difficulties in creating a relationship type increases with the degree of a relationship type. In their study on similarities and differences between non-experienced and experienced modelers [6], Batra and Davis derive a process model of conceptual data modeling from analyzing verbal protocols which distinguishes three distinct levels of abstraction, i.e., the enterprise level, the recognition level and the representation level, as well as the iterations between the levels. The process model is then used to identify similarities and differences between expert and novice modelers. It is concluded that experts focus on developing a holistic comprehension of the problem, whereas novices are largely unable to integrate parts of the problem description, resulting in more errors in their models. A further study by Srinivasan and Te’eni targets the behavior of modelers while data modeling [68]. Considering conceptual data modeling processes as problem-solving processes, the study reports on two laboratory studies using think aloud protocols. The research design focuses on the problem representation and problem-solving heuristics, i.e., strategies for controlling cognitive activities, applied by the modelers to overcome cognitive limitations.
First, a cognitive model of data modeling is developed including problem representation, cognitive activities, heuristics and constraints on the effectiveness and efficiency of the cognitive activities as well as their interdependencies. Analyzing the verbal protocols based on the cognitive model leads to insights into how individuals use heuristics to control complexity in their modeling processes suggesting that modelers would benefit from support in moving across levels of abstraction. Studying resulting data models, Shanks investigates differences between expert and novice modelers along several dimensions [65]. Evaluating the conceptual data models involving experienced reviewers in terms of correctness, completeness, innovation, flexibility, understandability and overall quality leads to the insight that the data models constructed by expert modelers are more correct, complete, innovative, flexible and better understood in comparison with data models constructed by novice modelers. Research by Hoppenbrouwers et al. builds on communication theory and identifies modeling strategies exhibited by individuals based on linguistics analyses—viewing conceptual modeling as a dialogue and coining the term “modeling dialogue” [38, 39]. Further work investigates cognitive mechanisms of conceptual business process modeling with a focus on collaborative modeling [82‐84], proposing relational reasoning and abstraction as key cognitive processes in modeling and suggesting a method for analyzing collaborative process modeling behavior, aimed at generating insights into psychological mechanisms of modeling skills and related cognitive processes [84]. 
Another stream of related research investigating the process of process modeling identifies distinct modeling styles [52] as well as cognitive process modeling techniques applied by modelers [24]—resulting in the so-called Structured Process Modeling Theory (SPMT) aimed at explaining how the probability of an occurrence of cognitive overload in process modeling processes can be reduced [24].

Complementing this prior research on individual modeling processes, research has investigated the frequency of certain error types in conceptual modeling for a long time. In an early work, Batra and Antony investigate errors of novice data modelers in two laboratory experiments [5]. The study evaluates errors in novices’ data models complemented with analyzing the modeling processes to achieve insights into why the errors have been committed. The errors committed by the novice modelers are classified in three categories referring to the cognitive dimension of the errors: (i) literal translation, i. e., a modeler “mechanically translating” a natural language sentence into a relationship type, (ii) anchoring, i. e., a modeler staying with an initial (but incorrect) starting point (anchored to an initial assumption) and (iii) incomplete knowledge, i. e., a modeler failing in the attempt to improve a model that is based on an initial incorrect assumption. In the conducted experiment, the three categories of errors are observed with approximately equal frequency. Please note that the study by Batra and Antony only considers errors relating to relationship types. As main causes of the errors, the complexity of the modeling task in terms of the number of possible relationship types increasing at a combinatorial rate with the number of entity types, misapplication of modeling heuristics and a lack of knowledge about database design are identified, leading to suggestions for supporting novices in data modeling with immediate feedback in supportive tools. A further related study by Leung and Bolloju investigates typical errors committed by novice systems analysts in creating domain models using the UML [43]. Based on Lindland et al.’s framework relating to quality in conceptual models and modeling processes [44], the errors are distinguished into different categories related to syntactic, semantic and pragmatic quality.
In analyzing the class diagrams, the most common errors are observed in determining cardinalities for associations (relating to semantic quality) and in the presence of unexpected features, i. e., creating unnecessary entities or attributes (relating to pragmatic quality). In [41], typical error types in creating class diagrams in a specific notation based on the UML made by novice learners are identified by Kayama et al. Four categories of errors are detected, i. e., syntactic errors, errors related to attributes, errors related to associations and errors related to classes. As a result, it is observed that errors related to associations constitute the most common type of errors made by novice learners, with these errors referring to the associations’ labels and cardinalities. A recent study by Bogdanova and Snoeck investigates frequent errors in students creating domain models using UML class diagrams [16]. Consistent with the findings of previous work [41, 43], the study finds that errors relating to modeling classes are less numerous than errors in modeling associations. Regarding classes, especially wrongly named classes and missing classes are observed, while the most common error regarding associations is observed in determining multiplicities. A further study by Bogdanova and Snoeck presents insights into novices’ errors in creating class diagrams in the MERODE modeling notation [17]. Again, in line with previous research, errors relating to modeling associations are identified as the most frequently occurring errors besides missing classes. Moreover, it is concluded that the most common errors relate to the novice modelers misinterpreting requirements during requirements analysis.

Different from earlier work, the objective of the present research is to identify and classify modeling difficulties in data modeling processes following a cognitive view on conceptual modeling using the notion of cognitive breakdown. Furthermore, the present research design stands out from extant studies by following a mixed methods research approach that combines multiple perspectives on modeling processes.

4 Research design

4.1 Mixed methods research design

We approach our objective in an exploratory study with eight non-experienced modelers that is complemented with a follow-up study with eight medium-experienced modelers confirming and extending the findings of the exploratory study reported in [57]. Both studies follow a mixed methods research design [26] characterized by “mix[ing] or combin[ing] quantitative and qualitative research techniques, methods, approaches, concepts or language into a single study” [40, p. 17]. The chosen mixed methods research design (see Fig. 1 for an overview) is intended to compensate for the respective weaknesses of individual modes of observation (e. g., restricting the observations to modeler–tool interactions neglects the reasoning of modelers) and holds the prospect of insights going beyond results from either type of data separately [26, 40]. The present work builds on a mixed methods research design in the light of two considerations: First, due to the complexity of conceptual modeling, modeling processes deserve study from multiple complementary perspectives, and a mixed methods design allows integrating these perspectives. Second, the investigated phenomenon, i.e., modeling difficulties and underlying cognitive processes in individual data modeling processes, has received only limited attention so far in research on conceptual modeling [59, 73]. We apply an original multimodal observation setup and an accompanying data analysis strategy combining multiple perspectives on modeling processes, in which open (narratives, verbal protocols, video recordings) and closed, (more) standardized (tracking data, survey data) modes of observation are combined to obtain a more complete picture of the phenomenon under investigation [26] (described in detail in [60]). Opting for a mixed methods research design in the present study pursues the objective of diversity of views (following, e. g., [76, p. 442]).
In line with this purpose, a convergent research design with concurrent data collection with all data provided by the same data sources (subjects) in a data-transformation variant allowing a merge of the data bases to analyze the data together is applied [26, pp. 65-73]. Thereby, the research design has a clear focus on qualitative data analysis standing in the tradition of hermeneutic research aiming at an in-depth understanding (e. g., [1, 79]). The research design includes two points of integration, i.e., the integration of quantitative and qualitative data, one during the observations and one during data analysis [61, pp. 115-117] (see Fig. 1).

In both the exploratory and the follow-up study, subjects are observed constructing a conceptual data model using a variant of the entity–relationship model (ER model) [23]. The ER model specifies a modeling language for data modeling widely accepted as de facto standard for conceptual data modeling [28]. The used variant of the ER model aims at simplifying the learning process for modelers. For example, it does not allow for attributes of relationship types. Starting from a data modeling task described in natural language, the subjects are instructed to construct a conceptual data model (as ER diagram) reconstructing the statements of the problem representation using a browser-based modeling tool [70, 72]. The modeling tool has a standard interface including a palette of notation symbols representing the implemented variant of the ER model. Moving symbols from the palette to the modeling canvas follows the drag and drop interaction mode. The modeling tool is designed to be easy to learn for beginning modelers as well as easy to explain in a few minutes. In addition, the modeling tool offers an ad-hoc syntax validation to assist learning modelers. Furthermore, the modeling tool is integrated with a modeling research observatory supporting multimodal observations and analysis of the collected data [70, 72].
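To make the restriction of the ER variant and the idea of an ad-hoc syntax validation more tangible, the following minimal Python sketch may help. It is an illustration under stated assumptions only and does not reproduce the modeling tool described in [70, 72]: all class names, the function `validate` and the two rules it checks (no attributes on relationship types, participants must be known entity types) are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class RelationshipType:
    name: str
    # (entity type name, cardinality label) pairs; a list rather than a dict
    # so that a recursive relationship type can name the same entity type twice.
    participants: list[tuple[str, str]]
    # Hypothetical field: it exists only so the validator has something to reject.
    attributes: list[str] = field(default_factory=list)


@dataclass
class ERModel:
    entity_types: dict[str, EntityType] = field(default_factory=dict)
    relationship_types: list[RelationshipType] = field(default_factory=list)


def validate(model: ERModel) -> list[str]:
    """Return human-readable syntax problems; an empty list means none were found."""
    problems = []
    for rel in model.relationship_types:
        # Assumed rule 1: the simplified variant forbids relationship attributes.
        if rel.attributes:
            problems.append(f"{rel.name}: relationship types must not have attributes")
        # Assumed rule 2: every participant must be a declared entity type.
        for entity_name, _cardinality in rel.participants:
            if entity_name not in model.entity_types:
                problems.append(f"{rel.name}: unknown entity type {entity_name}")
    return problems


model = ERModel()
model.entity_types["Customer"] = EntityType("Customer", ["name"])
model.entity_types["Car"] = EntityType("Car", ["license_plate"])
model.relationship_types.append(
    RelationshipType("rents", [("Customer", "1"), ("Car", "N")])
)
print(validate(model))
```

Returning a list of messages instead of raising on the first problem mirrors the idea of assisting learners: all detected issues can be shown at once while the model remains editable.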

4.2 Multimodal observations

The multimodal observations constitute the first point of integration, i.e., all types of data are collected from the same sources (i.e., from the same subjects) concurrently [61, pp. 114f]. With the aim to go beyond approaches solely considering modeler–tool interactions, complementary modes of observation are combined to take different perspectives on the modeling processes (see Fig. 1 and Fig. 2)—complementing and tying in with prior approaches to investigating individual modeling processes [53, 63, 84]:

(a) Recording verbal protocols: Recording think aloud (verbal) protocols during conceptual modeling by a subject, i. e., while working on a modeling task, aims to obtain insights into the subject’s modeling difficulties and underlying cognitive processes during modeling, e. g., into the modeler’s reasoning and deliberations toward modeling decisions. This mode of observation is chosen because its application in problem-solving research has shown promising results [30, 67]. Subjects are instructed to verbalize all their thoughts while modeling. The subjects’ utterances while modeling are audiotaped.

(b) Videotaping modelers: This mode of observation targets the modeler’s overall interaction with written material and the software tool for modeling by videotaping the modeler from an ‘over-the-shoulder’ perspective. The aim is to capture non-verbal cues on the individual modeling process, in particular on modeling difficulties, conveyed by body language and movement, e. g., when switching between media from computer screen to paper and back or when interrupting the modeling flow as indicated by gestures. Modelers may peruse the written material to draw an initial model before interacting with the software tool; the modeler’s behavior outside of the modeling tool can thus support resolving ambiguous situations in think aloud protocols (e. g., [85]).

(c) Tracking modeler–tool interactions: Recording modeler interactions with the modeling canvas is aimed at observing the modeler’s modeling decisions, in particular, decisions with respect to placing a new model element on the modeling canvas (e. g., a rectangular graphical symbol representing an entity type, say “Customer,” in an entity–relationship diagram), changing an existing model element, repositioning an element, deleting model elements and renaming a model element (giving it a new label). Every modeler interaction with the graphical editor during the construction of the conceptual model is recorded as a time-discrete event (see [70, 72]). This mode of observation is supported by the modeling observatory integrated with the modeling tool with which the subjects construct the conceptual data model (see [70, 72]).

(d) Surveying modelers pre- and post-modeling: Subjects fill in a survey comprising closed-ended and open-ended questions before and after modeling. The aim is to collect information on modeler demographics and to obtain self-disclosed information on modeling experience, perceived modeling difficulties, the perceived familiarity and difficulties with the domain of the modeling task and a self-assessment regarding think aloud. This information is aimed at achieving an overview of the sample of subjects and to identify peculiarities and outliers.
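The recording of modeler–tool interactions as time-discrete events described in (c) can be pictured with a small Python sketch. It is a hypothetical illustration, not the data format of the modeling observatory [70, 72]: the names `Action`, `InteractionEvent` and `long_pauses`, the field layout and the 30-second threshold are all assumptions made for this example. The pause detection is likewise only one conceivable way to flag moments that might be cross-checked against the verbal and video protocols; it is not the analysis procedure of the studies.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # The five kinds of canvas interaction named in (c).
    ADD = "add"
    CHANGE = "change"
    MOVE = "move"
    DELETE = "delete"
    RENAME = "rename"


@dataclass(frozen=True)
class InteractionEvent:
    timestamp_s: float  # seconds since the start of the session (assumed unit)
    action: Action
    element_id: str     # hypothetical identifier of the affected model element
    detail: str = ""    # e.g. the new label after a rename


def long_pauses(events, threshold_s=30.0):
    """Return (start, end) timestamps of gaps between consecutive events
    that exceed threshold_s. The threshold is an arbitrary assumption."""
    ordered = sorted(events, key=lambda e: e.timestamp_s)
    return [
        (earlier.timestamp_s, later.timestamp_s)
        for earlier, later in zip(ordered, ordered[1:])
        if later.timestamp_s - earlier.timestamp_s > threshold_s
    ]


log = [
    InteractionEvent(0.0, Action.ADD, "e1", "Customer"),
    InteractionEvent(5.0, Action.RENAME, "e1", "Client"),
    InteractionEvent(50.0, Action.DELETE, "e1"),
]
print(long_pauses(log))
```

Keeping events immutable and timestamped relative to the session start would make it straightforward to align them with audio and video recordings of the same session, which is the point of collecting all modes concurrently.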

4.3 Exploratory study

The exploratory study was conducted in January 2019 with eight bachelor and master students of business informatics or business administration at the University of Hagen, Germany with little to no experience in conceptual data modeling. The subjects participated individually and can be characterized as non-experienced modelers. Choosing students as subjects is in line with the educational context of the present research. Students are learners who can provide in-depth information about modeling difficulties that are experienced during data modeling. Participation was voluntary and the subjects were offered no other incentives than the opportunity to participate in the study. Study conduct including the study material and data analysis in the exploratory study are reported in detail in [57]. In the following, the follow-up study is reported and elements in the study conduct and data analysis that are different from the exploratory study are made explicit.

4.4 Follow-up study

4.4.1 Study conduct

The follow-up study was conducted in February 2020 with eight subjects participating individually. We recruited eight bachelor and master students who took courses on data modeling or data(base) management at KU Leuven, Belgium, i.e., subjects we expect to have some experience in conceptual data modeling and whom we would characterize as medium-experienced modelers. As in the exploratory study, choosing students as subjects is a deliberate choice in line with the context and objectives of the present research. Participation was voluntary and the subjects received 10 Euros as monetary incentive for participating in the study. As in the exploratory study, the sample size of eight is selected in the light of applying think aloud protocol analysis aimed at in-depth insights [49]. The data modeling task employed for the follow-up study (see Appendix) refers to a car rental case (Car, Rental, Customer etc.). We assume participating subjects have sufficient knowledge about renting cars and chose this domain for the modeling task in order to reduce effects of varying prior domain knowledge (cf. [12, 55]). A reason for choosing the car rental task instead of the modeling task from the library domain used in the exploratory study is a prior exposure of participating subjects to a library modeling exercise in a completed course during their studies. Hence, participants in the follow-up study have some prior experience in working on a modeling task that refers to lending items. The task performed by the modelers is deliberately designed to balance demand on the participating subjects, time to perform the task and modeling complexity. Compared to the exploratory study, the modeling task additionally requires applying the modeling concepts of recursive relationship types and generalization hierarchies. No particular domain knowledge is required to perform the task and a medium-experienced modeler should be able to complete the modeling in no more than 30 minutes.

Based on experiences in the exploratory study, adjustments were made to the data collection procedure in terms of the sequence of steps performed, i. e., the data collection in the follow-up study only begins after an introductory part. In addition, adjustments were made to the used study material, e. g., the description of the semantics of the modeling concepts and the graphical notation of the ER model was complemented by introducing the modeling concepts recursive relationship type, integrity constraint and generalization hierarchy. The standardized data collection procedure for the follow-up study is displayed in Fig. 3. Each modeling session took place in a quiet environment, with the first author present as observer who is familiar with the modeling task and tool as well as the multimodal observations.

To foster comparability, we ran a standardized data collection procedure for all participants: After completing a consent form, we started with an introductory part: Each individual received a short description of the semantics of the modeling concepts and the graphical notation of the variant of the ER model implemented in the used modeling tool (1). This was followed by watching a short video introduction into the used modeling tool of ca. 3 min that explains how to create a conceptual data model with the tool (2). Please note that this was the only introduction that the participants received for operating the modeling tool. As next step, the subjects were provided think aloud instructions as in the exploratory study (3) and were asked to construct a conceptual data model comprising two entity types and one relationship type from the university domain in a warm-up modeling task (4)—to become familiar with the modeling tool and to practice verbalizing their thoughts while modeling. This was followed by the data collection part: As in the exploratory study, each individual was required to (5) fill in a pre-modeling survey asking closed-ended and open-ended questions on prior conceptual (data) modeling experience as well as closed-ended questions on domain knowledge. Again, the questionnaire included a test with six yes/no-type questions on theoretical knowledge of conceptual data modeling with the ER model. As next step, the subjects were given the natural language main modeling task (6) referring to a car rental (see Appendix). The instructions for performing the modeling task and the data collection during modeling were similar to the exploratory study. The participants were requested to let the observer know when they had finished the task. The observer terminated the modeling at a convenient moment after about 30 minutes. 
As last step, each participant was required to (7) fill in a post-modeling survey on encountered modeling difficulties, domain knowledge and difficulties with think aloud. In contrast to the exploratory study, the post-modeling questionnaire asked for difficulties with think aloud in an open-ended question instead of a self-assessment in closed-ended questions—with the aim to achieve deeper insights into the subjects’ encountered difficulties with verbalizing their thoughts while modeling. In the follow-up study, all material was in English as well as the verbalizations of the modelers. The questionnaires can be found in Appendix. The supplementary material used in the study is available upon request from the authors.

4.4.2 Data analysis

Data analysis in the follow-up study largely follows the data analysis strategy employed in the exploratory study (explained in [57]), which aims at integrating the data collected in the multimodal observations. As a first step, information from open- and closed-ended questions in the pre- and post-modeling surveys is integrated into a description characterizing the sample of subjects (discussed as “qualitizing” in literature on mixed methods research [9, 76]). As a second step, the audiovisual protocols of the individual modeling processes are analyzed using MAXQDA [77], which allows for coding of integrated audio and video segments (see Fig. 4 for a screenshot from a pre-test). As coding strategy for the follow-up study, we take the coding scheme developed in the exploratory study (see [57]) as starting point (deductive coding) in order to build on the preliminary classification of modeling difficulties achieved in the exploratory study and to refine and extend the intermediary findings. We mark segments in which the subject encounters a difficulty or an obstacle, i.e., when the subject explicitly verbalizes a difficulty experienced during modeling or when the subject interrupts or terminates a modeling activity. Each such segment is coded as one of the existing types of modeling difficulties in the coding scheme, or a further type of difficulties is added if needed (inductive coding). In this way, we iteratively develop and refine the modeling difficulties inducing the observed breakdowns as sub codes for the code “Cognitive breakdown.” The same applies to codes and sub codes generally anticipated in think aloud protocols [67], which are also open to refinement and extension during coding in the follow-up study. Table 1 presents the entire coding scheme for analyzing the follow-up study.

Table 1

Final coding scheme for coding the audiovisual protocols after the follow-up study. Codes marked with one asterisk (*) are codes which emerged during the analysis of the exploratory study (cf. [57]) while codes marked with two asterisks (**) are codes which were complemented during the analysis of the follow-up study

Category	Cognitive breakdowns	General codes
(Sub) Codes	Cognitive breakdown	Actions outside of the modeling tool
	–Differentiate between entity types*	–Reading the modeling task*
	–Choose data type of attribute*	–Marking the modeling task*
	–Decide between entity type and relationship type*	–Paper-based modeling*
	–Develop label for relationship type*	Non-task-related issues
	–Determine cardinalities*	–Modeling tool*
	–Decide between attribute and entity type**	–Think aloud*
	–Specify generalization hierarchy**	Evaluation of the task at meta-level
	–Establish relationship type**	Silent periods

As in the exploratory study, evaluating and interpreting the combined audiovisual protocols is complemented with an analysis of the recorded modeler–tool interactions: In case of an unclear or deviant situation, the segment is submitted to closer inspection by analyzing the recorded modeler–tool interactions in the respective time period to better understand the observed situation, and to decide on assigning a code. Vice versa, anomalous data in the recorded modeler–tool interactions are further investigated through analyzing the audiovisual protocols.

For analyzing modeler–tool interactions, the interactions in the specific time frame are visually replayed step by step as performed by the modeler, in a single or multiple replay, and, hence, visually analyzed (Fig. 5 and Fig. 6). Each modeler–tool interaction is also plotted in a dot diagram as a time-discrete event on a timeline as horizontal axis (see Fig. 7) to allow for quick inspection of the type of change of the data model. The dot diagrams visualizing modeling processes have been further developed since the exploratory study: For the consecutively numbered model elements, the dot diagrams now visually differentiate five types of specific modeling decisions, i.e., whether a model element is added (green circle), changed (blue circle), moved (gray), deleted (red circle) or relabeled (orange circle). Extending the analysis of modeler–tool interactions in the exploratory study and thus refining the data analysis strategy, we further complement analyzing modeler–tool interactions with heatmaps of mouse pointer position dwell time and mouse clicks (Fig. 8). Analyzing data on mouse pointer position and clicks is based on the assumption that the modeler’s gaze follows mouse activities [22, 56]. Hence, heatmaps allow us to identify spatial areas of the modeling canvas in focus of the modeler (e. g., indicated by uncontrolled mouse clicking), suggesting peculiar situations requiring further inspection. The diagrams visualizing the modeling processes are used for further exploring situations identified as deviant or unclear in the audiovisual protocols and for identifying anomalous modeler–tool interactions by manual inspection of the diagrams (e. g., searching for a noticeable number of deleted model elements or for one model element that is changed strikingly frequently).
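The two inspection aids described above can be approximated programmatically. The following is a minimal sketch, assuming a hypothetical log format of (timestamp, element id, change type) events and (timestamp, x, y) mouse pointer samples; the function names, the grid cell size and both thresholds are illustrative assumptions and not part of the modeling observatory used in the study.

```python
from collections import Counter

# Hypothetical event record: (timestamp in seconds, element id, change type).
# The five change types mirror those distinguished in the dot diagrams.
CHANGE_TYPES = {"added", "changed", "moved", "deleted", "relabeled"}


def flag_anomalies(events, max_deletions=3, max_changes_per_element=4):
    """Return (too_many_deletions, ids of frequently changed elements).

    Both thresholds are illustrative assumptions, not values from the study.
    """
    for _, _, change in events:
        if change not in CHANGE_TYPES:
            raise ValueError(f"unknown change type: {change}")
    deletions = sum(1 for _, _, change in events if change == "deleted")
    changes = Counter(elem for _, elem, change in events if change == "changed")
    frequent = sorted(e for e, n in changes.items() if n > max_changes_per_element)
    return deletions > max_deletions, frequent


def dwell_time_grid(samples, width, height, cell=100):
    """Accumulate mouse pointer dwell time (seconds) per canvas grid cell.

    samples: list of (timestamp, x, y), sorted by timestamp. The time until
    the next sample is credited to the cell of the current sample.
    """
    cols = -(-width // cell)   # ceiling division
    rows = -(-height // cell)
    grid = [[0.0] * cols for _ in range(rows)]
    for (t0, x, y), (t1, _, _) in zip(samples, samples[1:]):
        col = max(0, min(int(x) // cell, cols - 1))
        row = max(0, min(int(y) // cell, rows - 1))
        grid[row][col] += t1 - t0
    return grid
```

Cells with a high accumulated dwell time, or processes flagged by `flag_anomalies`, would only be candidates for closer manual inspection of the audiovisual protocols, in line with the procedure described above.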

Supplementing the analysis of audiovisual protocols with timed modeler–tool interactions allowed us to identify unclear situations, e. g., when a modeling process strongly deviates from the other displayed modeling processes. In addition, reviewing the answers to the post-modeling survey about perceived modeling difficulties supplements difficulty identification. This coding step proved valuable especially as the perceived difficulties served as indication for closer inspecting and deciding on assigning a code in the audiovisual protocols.

4.4.3 Participant characteristics

Of the eight participants in the follow-up study (P9–P16), four subjects were female and four male, aged between 20 and 32 years (with a median of 22.5 years and a mean of 23.9 years). As first language, three subjects stated Dutch and one subject each stated Arabic, Czech, Spanish, Russian and Yoruba. Seven subjects were master students, two of Business and Information Systems Engineering and one each of Bioinformatics, Biology, Business, Data Science and Information Management, and one subject was a bachelor student of Business and Information Engineering. Regarding professional background, five subjects stated work experience ranging from one year and six months to 10 years with a median of 3 years. The professional experience was acquired in administration, consulting, IT, the pharmaceutical industry and testing engineering and, hence, in a wide range of application areas.
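The reported age statistics can be reproduced from the individual ages listed for P9–P16 in Table 2. The following short check uses only the Python standard library; the variable names are illustrative.

```python
import statistics

# Ages of P9-P16 as listed in Table 2 (follow-up study).
ages = [32, 23, 27, 25, 22, 21, 21, 20]

median_age = statistics.median(ages)
mean_age = statistics.mean(ages)

print(f"median = {median_age}, mean = {mean_age:.1f}")
# prints "median = 22.5, mean = 23.9"
```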

All eight subjects had completed one to three courses during their studies that teach conceptual modeling (with a median of 2), with seven subjects having attended an introductory course on “Principles of Database Management” that includes data modeling with the ER model. Further completed courses relate to, for example, information systems modeling (i.a., including process modeling with the Business Process Model and Notation, BPMN) or business information systems (i.a., including modeling with the UML). In the test on theoretical knowledge of conceptual data modeling with the ER model, the number of correct answers to the six yes/no-type questions ranged from four to five with a median of five. All subjects reported prior experience in conceptual data modeling, ranging from four months with constructing five conceptual data models and reading ten models to four years and six months of experience with constructing 50 conceptual data models and reading 200. The median experience in conceptual data modeling is one year and six months, with a median of 17.5 constructed conceptual data models and 42.5 models read. In this sample of subjects, P10 constitutes an exceptional case with more than four years of experience with constructing 50 conceptual data models and reading 200. This outlier is included in further analyses but demands special consideration. Altogether, the data indicate that the subjects other than the outlier have some experience in conceptual data modeling, with substantial differences in knowledge and, particularly, experience in conceptual modeling between the subjects. This suits the intention to study subjects who can be characterized as medium-experienced modelers.

With regard to domain knowledge, five subjects stated that they had rented a car at least once, with answers ranging from not at all to often and a median of rarely (on a scale not at all–rarely–sometimes–often–very often). However, all subjects stated that they basically understand the service that a car rental company offers.

Table 2 gives an overview of the participants of both the exploratory and the follow-up study as basis for the presentation of findings in Sect. 5. Please note that the university where we recruited the students for the exploratory study is characterized by a heterogeneous student body where the reported ages are not exceptional.

Table 2

Overview of participants in the exploratory and the follow-up study (Gender: male/female/diverse, age presented in years)

Participant	Gender	Age	First language
Exploratory Study
P1	f	46	German
P2	m	36	German
P3	f	52	English
P4	f	42	French and German
P5	m	41	German
P6	m	46	German
P7	m	27	German
P8	m	34	German
Follow-up Study
P9	m	32	Yoruba
P10	m	23	Czech
P11	m	27	Spanish
P12	m	25	Arabic
P13	f	22	Dutch
P14	f	21	Dutch
P15	f	21	Russian
P16	f	20	Dutch

5 Findings

In this section, we first report on the modeling processes in the follow-up study in terms of peculiarities, domain knowledge of the participants and observations regarding verbalization skills. Subsequently, the types of modeling difficulties inducing cognitive breakdowns in the modeling processes of both studies are introduced and exemplified, refining and extending the results presented in [57].

5.1 Modeling processes in the follow-up study

Regarding the length of the modeling processes, we observe a range from 21 minutes to 35 minutes with a median of 32 minutes in the follow-up study. Note that P12 terminated the modeling session after about 21 minutes accidentally in the modeling tool, and that due to a software error of the modeling observatory the modeling process of P16 was terminated after 22 minutes.

In the post-modeling questionnaire, participants were asked to self-assess the statements “I understood what the modeling task was about” and “I am familiar with the domain of the modeling task” on a scale from 1 to 7 where 1 corresponds to “I do not agree at all” and 7 to “I agree entirely.” Regarding the first statement, the answers ranged from 6 to 7 with a median of 6.5. For the second statement, the answers ranged from 4 to 7 with a median of 5.5. Hence, the answers indicate that the participants understood the chosen modeling domain well enough to perform the task.

Regarding verbalization skills, our observations reinforce that there is wide variation in how well subjects are able to verbalize their thoughts while working on the modeling task. The post-modeling questionnaire asked for difficulties with think aloud in an open-ended question, instead of the closed-ended question used in the exploratory study. Only two of the eight participants reported difficulties in verbalizing thoughts, with one subject explicitly referring to difficulties “to come up with the right vocabulary” (P16) because English is not the subject’s first language and the other stating a preference for putting their own “thoughts on paper before saying them out loud” (P12). Although six participants reported no problems and all eight subjects received think aloud instructions and a think aloud training (see Sect. 4), we observe two participants (P9, P11) exhibiting problems in verbalizing their thoughts: In parts, these participants describe what they are doing rather than verbalizing thoughts. In addition, the modeling processes of these same participants include two silent periods of more than 30 seconds each. However, as in the exploratory study, we conclude that, altogether, the provided think aloud instructions and training initiated the intended behavior.

5.2 Modeling difficulties

We observe cognitive breakdowns as indication for modeling difficulties in 15 of 16 modeling processes. In the exploratory study, modeling difficulties occur in seven of eight modeling processes, with the number of observed breakdowns ranging widely from zero to six. However, only three of the eight participants report in the post-modeling questionnaire that they encountered difficulties. Note that Participant 8 in the exploratory study marks an exceptional case, exhibiting no breakdowns during the modeling process and constructing a straightforward solution to the modeling task in only nine minutes; P8 is an outlier regarding prior modeling experience in the exploratory study with several years of experience (see [57] for more details). This observation confirms the deliberate design of the modeling task as demanding for modelers with little to no experience, but solvable in a straightforward manner for more experienced modelers. In the follow-up study, we observe cognitive breakdowns in all eight modeling processes, with numbers of breakdowns ranging from two to eight. Participant 10, an outlier in terms of prior modeling experience in the follow-up study, does not exhibit a modeling process that is recognizably different from those of the other participants, which confirms the decision to include the outlier in the analysis. Five of the eight participants also report in the post-modeling questionnaire that they encountered difficulties.

The observed breakdowns in both studies are categorized into eight types of modeling difficulties inducing the breakdowns—the types were developed during the coding of the audiovisual protocols as emerging and refined sub codes in the coding scheme (see Sect. 4). The types of difficulties relate to different aspects of constructing conceptual data models, i.e., entity types, generalization hierarchies, relationship types, attributes and cardinalities (e. g., [28]). In the exploratory study, we identify five types of modeling difficulties. These are complemented with three additional types of modeling difficulties that have been observed only in the follow-up study. Re-watching and re-analyzing the data of the exploratory study confirmed that these three new types of difficulties do not occur in the modeling processes of the non-experienced modelers. An overview of the lengths of the modeling processes, the overall numbers of breakdowns and the numbers specified by type of modeling difficulties inducing the breakdown in each modeling process in the exploratory study is presented in Table 3 and for the follow-up study in Table 4. Table 5 displays the total numbers of occurrences of all types of modeling difficulties in both studies combined. In the following, each type of modeling difficulties is explained and exemplified by providing transcribed examples from the think aloud protocols of the follow-up study. For examples from the exploratory study, please see [57].

Table 3

Completion times (in minutes), types and numbers of breakdowns in the exploratory study

Participant	P1	P2	P3	P4	P5	P6	P7	P8	#
Completion time	23	20	35	18	15	17	35	9	172
Breakdowns	3	2	6	3	2	3	6	0	25
Differentiate between entity types			1						1
Choose data type of attribute					1		1		2
Decide between entity type and relationship type	2		3	2					7
Develop label for relationship type	1	2	2			1	2		8
Determine cardinalities				1	1	2	3		7

Table 4

Completion times (in minutes), types and numbers of breakdowns in the follow-up study. Types of difficulties marked with an asterisk (*) emerged only in the follow-up study

Participant	P9	P10	P11	P12	P13	P14	P15	P16	#
Completion time	32	22	32	21	33	32	35	22	229
Breakdowns	3	3	3	2	5	4	8	2	30
Differentiate between entity types									0
Choose data type of attribute			1		1		2	1	5
Decide between entity type and relationship type				1	1	1			3
Develop label for relationship type		1				1	1	1	4
Determine cardinalities	1	2		1	2	1	3		10
Decide between attribute and entity type*	2						2		4
Specify generalization hierarchy*			1		1	1			3
Establish relationship type*			1						1

Table 5

Overview of types of modeling difficulties in both studies combined with total numbers of occurrences

Type of difficulties	#
Differentiate between entity types	1
Choose data type of attribute	7
Decide between entity type and relationship type	10
Develop label for relationship type	12
Determine cardinalities	17
Decide between attribute and entity type	4
Specify generalization hierarchy	3
Establish relationship type	1
Overall number of observed modeling difficulties	55
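The combined totals in Table 5 follow directly from the per-type totals (column #) of Tables 3 and 4. The following sketch recomputes them; the dictionaries simply transcribe the two tables, and the variable names are illustrative.

```python
from collections import Counter

# Per-type totals from Table 3 (exploratory study).
exploratory = Counter({
    "Differentiate between entity types": 1,
    "Choose data type of attribute": 2,
    "Decide between entity type and relationship type": 7,
    "Develop label for relationship type": 8,
    "Determine cardinalities": 7,
})

# Per-type totals from Table 4 (follow-up study); types with zero
# occurrences are omitted.
follow_up = Counter({
    "Choose data type of attribute": 5,
    "Decide between entity type and relationship type": 3,
    "Develop label for relationship type": 4,
    "Determine cardinalities": 10,
    "Decide between attribute and entity type": 4,
    "Specify generalization hierarchy": 3,
    "Establish relationship type": 1,
})

combined = exploratory + follow_up
overall = sum(combined.values())

for difficulty_type, count in combined.most_common():
    print(f"{count:>3}  {difficulty_type}")
print(f"{overall:>3}  Overall number of observed modeling difficulties")
```

The sums per study (25 and 30) match the Breakdowns rows of Tables 3 and 4, and the two most frequent types in the combined count are Determine cardinalities (17) and Develop label for relationship type (12).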

Differentiate between entity types: This type of difficulties occurs only in the exploratory study, where one participant (P3) encounters a difficulty related to creating entity types. The participant terminates the modeling activity by switching to another one. This type of difficulties is observed only once across both studies.

Choose data type of attribute: Encountered by two participants in the exploratory study (P5, P7) and four participants in the follow-up study (P11, P13, P15, P16) with a total of seven occurrences, this type of difficulties relates to attributes of entity types (note that the chosen variant of the ER model and its notation do not allow for attributes of relationship types, to simplify the learning process for modeling beginners). The participants face the difficulty of choosing a data type for an attribute that is adequate in the context of the modeling task. A full list of predefined data types was included in the instructions and available to subjects throughout the entire modeling process. An example of this type of difficulties is exhibited in the modeling process of P13 regarding the attribute IBAN of the entity type IndividualCustomer: “IBAN number... That is a... integer? Yes, that’s not a string it’s an integer... No, it’s not a number, it’s more a string... than an integer I guess... An integer is just more like numbers that you can count, and string is just more general” (P13). Participant 16 also faces a difficulty of this type in choosing a suitable data type for the attribute Range of the entity type Car: “And the range... um... um... I am not sure... um... I will just make it an integer?” (P16).
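The reasoning P13 arrives at can be made concrete with a small check of whether a value survives being stored as an integer. This is a sketch only: the helper `fits_integer` and the sample values are illustrative assumptions, including the assumption that a range is recorded as a whole number; none of them are taken from the modeling task or the modeling tool.

```python
def fits_integer(value: str) -> bool:
    """Return True if the text can be stored as an integer without loss."""
    try:
        # Round-tripping exposes lost information such as leading zeros.
        return str(int(value)) == value
    except ValueError:
        # Letters or other non-digit characters cannot be parsed at all.
        return False


# An IBAN-shaped value starts with a country code, so it is not a number.
print(fits_integer("BE68539007547034"))  # prints "False"

# Purely numeric identifiers can still lose leading zeros as integers.
print(fits_integer("0042"))              # prints "False"

# A quantity that is counted or measured in whole units fits an integer.
print(fits_integer("450"))               # prints "True"
```

Under these assumptions, a string is the adequate choice for an identifier such as IBAN, whereas an integer is adequate only for values on which arithmetic is meaningful.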

Decide between entity type and relationship type: We observe three participants (P1, P3, P4) in the exploratory study and three participants (P12, P13, P14) in the follow-up study facing difficulties related to modeling decisions as to whether to model an entity type or a relationship type to reconstruct a given statement of the problem representation (with 10 occurrences). In each study, all difficulties of this type refer to one and the same entity type. In the follow-up study, this is the entity type Rental. For example, P14 encounters this difficulty: “Um.. a rental... wait... hmm... a rental always refers to exactly one car... um... Yes, i don’t think I have to do something with this... no... I think it’s already embedded in the relationship rents. And then for a rental the date of the rental... oh... and the due date are recorded in order to be able... oh... ok... so... um...” (P14). The subject interrupts the modeling activity and switches to modeling another relationship type, before returning to the modeling activity and solving the problem: “I think that I should create another entity type called rental with attribute type... date of rental which is a date data type and then... hmm... due date which is also a date... and then I... delete the previous relationship type I made called rents because it is not longer relevant, yes” (P14). Difficulties of this type cause long and severe periods of uncertainty in the modeling processes (of up to more than 4 minutes) and, in this sense, are particularly remarkable—especially because this type of difficulties occurs ten times in six modeling processes and because two of the respective participants were unable to find a solution to this difficulty.
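The modeling decision P14 works through can be sketched as two alternative models. Because the ER variant used in the studies does not allow attributes of relationship types, the date of the rental and the due date can only be recorded once Rental is modeled as an entity type. The classes below are a hypothetical, simplified representation; the relationship labels `involves` and `refers_to` are assumptions for illustration and not taken from a reference solution.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> data type


@dataclass
class RelationshipType:
    name: str
    connects: tuple  # names of the linked entity types; carries no attributes


def attribute_owner(entity_types, attribute):
    """Return the name of the entity type that records the attribute, if any."""
    for entity_type in entity_types:
        if attribute in entity_type.attributes:
            return entity_type.name
    return None


customer = EntityType("Customer")
car = EntityType("Car")

# First attempt: a rental is only a relationship type between Customer and Car.
first_entities = [customer, car]
first_relationships = [RelationshipType("rents", ("Customer", "Car"))]

# Revised model: Rental becomes an entity type holding both dates.
rental = EntityType("Rental", {"date_of_rental": "Date", "due_date": "Date"})
revised_entities = [customer, car, rental]
revised_relationships = [
    RelationshipType("involves", ("Rental", "Customer")),  # hypothetical label
    RelationshipType("refers_to", ("Rental", "Car")),      # hypothetical label
]

print(attribute_owner(first_entities, "due_date"))    # prints "None"
print(attribute_owner(revised_entities, "due_date"))  # prints "Rental"
```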

Develop label for relationship type: Difficulties of this type occur in five modeling processes in the exploratory study (P1, P2, P3, P6, P7) and four in the follow-up study (P10, P14, P15, P16), referring to a modeler who creates a relationship type and encounters a problem with finding a descriptive and sensible label for the model element. This type also includes difficulties relating to problems in developing suitable role designators when modeling a recursive relationship type. Participant 14 encounters such a difficulty in labeling the relationship type between the entity types RentalLocation and Employee: “So i put a relationship type between rental location and employee... and I call it... um... assigned... um... how do I call it... hmm... um... employees because... um.. a rental location... employees but no... not employees... um... so rental location... so an employee works... yes works at a certain rental location” (P14). Participant 15 as further example faces a difficulty in developing a sensible label for the relationship type between the entity types Rental and RentalLocation: “So, I create the relationship type and call it... um... hmm... I am not sure how to call it... so... hmm.... has like has... but it’s not nice... [...] ok, maybe car rental is assigned to location, let it be like that” (P15). Furthermore, we observe a difficulty in modeling the recursive relationship type supervises in one modeling process when a participant (P16) encounters a difficulty in developing suitable role designators: “And as role designation... no, role designator... I will name this... um... supervisee... which means that you have a supervisor... I think... or hold on... let me think... um... so this is not really clear for me... so I will make this supervisor... because in the notation sheet.. um.. it states like this... so I will just take this...” (P16).
This type of difficulties related to labeling relationship types is remarkable as it constitutes the second most frequent type of difficulties in terms of the total number of occurrences (12) and number of participants concerned (9).

Determine cardinalities: We identify difficulties with regard to determining cardinalities for relationship types in four modeling processes (P4, P5, P6, P7) in the exploratory study. This difficulty reappears in six processes in the follow-up study (P9, P10, P12, P13, P14, P15) with a total of 17 occurrences, constituting the most frequent type of difficulties in both studies combined, regarding the total number of occurrences and number of participants concerned (10). In the exploratory study, it is remarkable that five of the seven occurrences of this type of difficulties pertain to a relationship type with a one-to-many cardinality [57]. In the follow-up study, it is noticeable that this type of difficulties is primarily encountered for the two relationship types specified between the entity types Car, Rental and Customer, both relationship types with a one-to-many cardinality. For P13, determining the cardinalities for the relationship type between the entity types Rental and Car causes a severe difficulty and period of uncertainty (over 3 minutes): “And a... car rental... any number of cars... hmm... a car belongs to... zero... minimum zero... because not always a car has to belong to a rental or maximum... many... because a car can be rent multiple times... but not at the same time... hmm... a customer... a rental always refers to exactly one car... a rental consists of one car but a car... no.. it is the other way around... a car belongs to one rental... rental... hmm... customers of the car rental can rent any number of cars... or maybe... no... um.. think, think, think... a rental consists of one car... um... no, you can... customers of the car rental can rent any number of cars... so, it is the other way around...” (P13). Participant 15 also encounters a difficulty of this type, again for the relationship type between the entity types Rental and Car: “But cars... hmm... can have... I think... um... now I am a little bit confused because I don’t know whether we are talking about... no, I think yes.. it is ok... so cars itself, they can have.. hmm... so, if the company just purchases this car, then this car may participate in zero rentals because it was just bought, but we should already insert information about this car in our database. So it’s from zero to infinitely many.... For now I leave it like that, I am not sure whether it is correct, but I hope so” (P15). As further example, for P14, determining the cardinalities for the relationship type between the entity types Rental and Customer results in a difficulty: “A rental... involves exactly one customer, I have to check.. um... oh no... customers of the rental can rent any number of cars... so, I assume that it’s also at one point in time that a customer can rent several cars, so a rental involves one to many customers... and a customer is involved in... no, no, no, no .. a rental involves exactly one customer, because a rental is about one car” (P14).
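The cardinalities the participants reason about for the relationship type between Rental and Car can be written as (min, max) pairs and checked against sample data. The following sketch assumes the reading quoted above, i.e., that a rental always refers to exactly one car while a car may take part in zero or many rentals; the constant and function names are illustrative, and the pairs are not taken from a reference solution.

```python
from collections import Counter

# (min, max) cardinalities; None stands for "many" (no upper bound).
RENTAL_TO_CAR = (1, 1)     # a rental always refers to exactly one car
CAR_TO_RENTAL = (0, None)  # a car may take part in zero or many rentals


def within(count, cardinality):
    """Check whether a number of links satisfies a (min, max) cardinality."""
    low, high = cardinality
    return count >= low and (high is None or count <= high)


def check_rentals(cars, rental_to_cars):
    """rental_to_cars maps a rental id to the list of car ids it refers to."""
    rental_ok = all(within(len(linked), RENTAL_TO_CAR)
                    for linked in rental_to_cars.values())
    per_car = Counter(car for linked in rental_to_cars.values() for car in linked)
    car_ok = all(within(per_car[car], CAR_TO_RENTAL) for car in cars)
    return rental_ok and car_ok


# A newly purchased car (c2) without rentals and a car rented twice are valid.
print(check_rentals(["c1", "c2"], {"r1": ["c1"], "r2": ["c1"]}))  # prints "True"

# A rental that refers to two cars violates the (1, 1) cardinality.
print(check_rentals(["c1", "c2"], {"r1": ["c1", "c2"]}))          # prints "False"
```

Read this way, a customer who rents several cars is represented by several rentals rather than by one rental with many cars, which is the point at which P13 and P14 hesitate.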

Decide between attribute and entity type: Relating to modeling decisions as to whether to model an attribute or an entity type to reconstruct a statement in the modeling task description, this type of difficulties is faced by two participants in the follow-up study (P9, P15) with four occurrences. Participant 15, for example, faces a difficulty of this type relating to the attribute TypeDesignation of the entity type Car: “Yes, each car is described by a car type... I think... yes... hmm.... from my experience I remember that car types, they were like different entities. But for now I think it would be just an attribute of car. So, car type. I don’t know...” (P15). A further difficulty of this type is encountered regarding the entity type RentalLocation: “So, the next one is car rental and car rental has an attribute of location... maybe... um, for now I leave it, um, is an... oh, no... I think that... um, location, location, cars can be rented, a name to each location, employees... oh, no, I think that car rental does not have a location attribute, location will be the next entity type” (P15).

Specify generalization hierarchy: This type of difficulties relates to specifying generalization relationships between entity types and, hence, constitutes a type that only occurs in the follow-up study, faced by three participants (P11, P13, P14) with three occurrences. Participant 11 faces difficulties relating to the generalized entity type Customer with the sub types IndividualCustomer and CorporateCustomer: “What I am trying to figure out is... hmm... how to model this generalization because the customer can be individual or corporate ... but... corporate do not have a last name... hmm... I am trying to be more efficient, but do not want to restrict it or repeat attributes...”. Remarkably, the difficulty causes a long period of uncertainty in the modeling process of P11 (about 4 minutes) including a silent period of about 30 seconds with the participant finally solving the problem: “Ok, now I know how” (P11). Participant 14 also faces a difficulty relating to this generalization relationship: “Um.. this generalization is.... um... hmm... I think that it is... hmm... total because... um.. every customer is or an individual customer or a corporate customer. It cannot be something else” (P14).
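The two considerations voiced by P11 and P14 (no repeated attributes, and a total generalization) can be illustrated with a class hierarchy. This is a loose analogy under stated assumptions, not the ER notation used in the study: the attribute `customer_id` and the attribute `company_name` are hypothetical, while the last name and IBAN of individual customers are mentioned in the quoted protocols.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Customer(ABC):
    """Generalized entity type; abstract because the generalization is total."""
    customer_id: int  # hypothetical shared attribute, defined only once

    @abstractmethod
    def display_name(self) -> str:
        """Each sub type decides how it is named."""


@dataclass
class IndividualCustomer(Customer):
    last_name: str
    iban: str

    def display_name(self) -> str:
        return self.last_name


@dataclass
class CorporateCustomer(Customer):
    company_name: str  # hypothetical attribute; no last name is needed here

    def display_name(self) -> str:
        return self.company_name


individual = IndividualCustomer(1, "Peeters", "BE68539007547034")
corporate = CorporateCustomer(2, "Example NV")
print(individual.display_name(), "/", corporate.display_name())

try:
    Customer(3)  # total generalization: a customer must be one of the sub types
except TypeError as error:
    print("not allowed:", type(error).__name__)
```

Shared attributes are declared once on the super type, sub type specific attributes such as the last name stay on the sub type, and the abstract super type mirrors the statement that a customer “cannot be something else”.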

Establish relationship type: This type of difficulties, encountered by one participant (P11) in the follow-up study, refers to establishing relationship types between the entity types Rental, Customer and Car and results in uncertainty about how to model the given statements: “Hmm... I am thinking how to put the relationships between these entities and the rental agreement” (P11). However, the participant finally solves the problem and this type of difficulties is observed only once.

6 Discussion

Integrating complementary modes of observation of individual data modeling processes of 16 learning modelers and an analysis using the concept of cognitive breakdowns leads us to identify a wide variety of types of modeling difficulties these subjects faced while modeling. In an exploratory study with eight modelers with little to no modeling experience, we identify five types of modeling difficulties, complemented with three further types of difficulties encountered in a follow-up study with eight medium-experienced modelers. In the following, we discuss our findings in both studies and fruitful paths for future research, particularly implications for design science research on developing modeling assistance for data modelers at different stages of their learning and mastering of conceptual modeling.

Our findings suggest that the majority of difficulties encountered by the participants in the modeling processes relate to modeling relationship types: In the exploratory study, difficulties of the types Develop label for relationship type, Determine cardinalities and Decide between entity type and relationship type constitute the most frequent types of difficulties by far. Findings of the follow-up study confirm that difficulties are mainly experienced in modeling relationship types (difficulties of the types Develop label for relationship type, Determine cardinalities, Decide between entity type and relationship type, Establish relationship types) with the type Determine cardinalities being the most frequent one. Combining the findings of both studies (see Table 5), it appears that cognitive load overstraining a modeler’s cognitive resources can be observed most frequently in modeling relationship types: More specifically, cognitive overload leading to the most frequent and severe difficulties for the observed modelers occurs in deciding whether to model a relationship type, how to label a relationship type and which cardinalities are sensible for a relationship type.

These observations are in line with prior work on cognitive complexity in data modeling [4, 7, 8] suggesting that modeling problems are not experienced mainly in modeling entity types and attributes, but in modeling relationship types, originating in an extensive load on the modeler’s working memory due to the combinatorial rate with which the number of possible relationship types increases with the number of entity types in a model. Moreover, our findings are consistent with results from prior research investigating the frequency of certain error types in domain modeling: Kayama et al. also identify errors related to associations as the most common type of errors made by novice learners, specifically referring to the associations’ labels and cardinalities [41]. The present observations, in addition, tie in with the recent work by Bogdanova and Snoeck, which finds that errors relating to modeling associations are the most common types of errors [16, 17], with determining multiplicities being a particularly frequent error [16]. Furthermore, while our observations confirm Leung and Bolloju’s insight that common errors relate to modeling cardinalities for associations, their study does not identify developing labels to be a frequent error [43]. However, the prior work on error types mainly focuses on domain modeling with the UML [16, 41, 43] and MERODE [17], while the present work targets conceptual data modeling with a variant of the ER model.
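To give a rough sense of the combinatorial growth mentioned above, the following sketch counts candidate binary relationship types for a given number of entity types. The counting scheme is an illustrative assumption rather than a measure taken from the cited work: it considers one candidate per unordered pair of distinct entity types and, optionally, one recursive candidate per entity type, and the function name is hypothetical.

```python
from math import comb


def candidate_binary_relationships(n_entity_types: int,
                                   include_recursive: bool = False) -> int:
    """Count candidate binary relationship types among n entity types.

    Assumption for illustration: one candidate per unordered pair of distinct
    entity types, plus one recursive candidate per entity type if requested.
    """
    pairs = comb(n_entity_types, 2)  # n * (n - 1) / 2 unordered pairs
    return pairs + n_entity_types if include_recursive else pairs


# The number of candidates grows quadratically with the number of entity types.
for n in (3, 5, 8):
    print(n, candidate_binary_relationships(n),
          candidate_binary_relationships(n, include_recursive=True))
```

Even under this simplified counting, each added entity type increases the number of pairs a modeler may have to consider by the number of entity types already in the model.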

Altogether, the present research reinforces the finding that beginning and medium-experienced modelers especially face problems in modeling relationship types when creating a conceptual data model. Our results further specify this finding to the extent that the majority of relationship type-related difficulties revolve around determining cardinalities and developing sensible labels for relationship types. Specifically, the difficulty of devising sensible labels for relationship types refers to the core of conceptual (data) modeling and, hence, requires careful attention: Meaningful and purposeful labels for model elements are a prerequisite for comprehensible and usable conceptual models, as they convey semantics important to sensible interpretation of a conceptual model by human viewers [10, 27, 45]. Different labels bring up different cognitive associations, and thus, it makes a significant difference how a model element (in this case, a relationship type) is labeled. Targeting this type of modeling difficulties requires assisting modelers in developing labels that are sensible with respect to a natural language description of a universe of discourse and its technical terminology, to the chosen modeling objectives and to the model’s intended application context, with the aim to support “identifier comprehension” [8]. While the medium-experienced modelers have more experience in conceptual data modeling and specific prior experience in working on a modeling task on lending items (cf. Sect. 4), the difficulties relating to cardinalities and labels for relationship types remain important in the follow-up study. This observation indicates that the difficulties are not easy to solve and, furthermore, cannot be attributed to modeling experience as a modeler characteristic referring to germane load for the observed modelers.
Hence, a fruitful path for future research lies in exploring further modeler characteristics besides modeling experience as well as modeling task characteristics referring to intrinsic load and characteristics of the modeling task’s representation referring to extraneous load on the modelers. For example, besides varying modeling tasks and task representations, further studies could expand the surveying of participants, including more detailed questions on the modeler’s studies of conceptual modeling and domain knowledge, in order to shed light on varying modeler characteristics besides modeling experience.

In the follow-up study, the second and third most frequent types of difficulties encountered by the medium-experienced modelers relate to modeling entity types and attributes, i.e., difficulties of the types Choose data type of attribute and Decide between attribute and entity type. Hence, we conclude that, although the faced difficulties primarily relate to modeling relationship types, the cognitive load induced by modeling entity types and attributes also has the potential to overstrain the cognitive resources of medium-experienced modelers. This is surprising as it is contrary to prior work stating that difficulties in data modeling are not mainly experienced in modeling entity types [4, 73]. In addition, it is remarkable that only a small number of difficulties encountered in the follow-up study relates to the additional modeling concepts of generalization hierarchies (Specify generalization hierarchy with three occurrences) and recursive relationship types (with one occurrence related to developing role designators as part of Develop label for relationship type). Although these modeling concepts are generally acknowledged as more advanced [28], the majority of difficulties still relate to the basic modeling concepts of entity types, attributes and (non-recursive) relationship types.

For the meta-objective of informing future design science research on developing (tool) assistance for conceptual modelers at different stages of their learning and mastering of conceptual modeling aimed at mitigating modeling difficulties, the present findings constitute a first step: First, current insights into modeling difficulties faced by beginning and medium-experienced modelers during model creation provide a starting point for the design, implementation and evaluation of tailored tool assistance for modelers that deliberately targets modelers’ difficulties during conceptual data modeling. The majority of modeling difficulties the observed modelers encounter during conceptual data modeling relate to modeling relationship types, particularly, (1) determining cardinalities, (2) developing sensible labels for relationship types and (3) deciding whether to reify a relationship type or not. Hence, design science research on modeling tool assistance that targets devising meaningful labels for model elements and deciding on sensible cardinalities for relationship types promises to result in assistance that deliberately benefits modelers, opening a path for innovative, original contributions to research on modeling tools for end users [35, 51]. A first step in this direction is the Natural Language Processing-based Automated Assistant proposed in [71] that provides beginning data modelers with modeling-time assistance: Based on the insight that devising meaningful and purposeful labels has proven a particular challenge for data modelers, the Automated Assistant provides suggestions on identifying and signifying entity types, relationship types and attributes with meaningful and expedient labels and encourages modelers to rethink their choices of model element labels and, hence, purposefully supports modelers in mitigating their modeling difficulties. Second, categorizing modeling difficulties contributes to the theoretical foundation of conceptual modeling.
The identified types of modeling difficulties serve as a basis for developing a taxonomy of modeling difficulties over the course of multiple studies, in the sense of a classification or taxonomic theory (following, e. g., [36, 48]). Starting from the present results, further studies into modeling difficulties encountered by participants are suggested: Studies of modeling processes of modelers with varying characteristics regarding modeling experience and knowledge have to build on, refine and extend the classification system by adding emerging types of difficulties on the basis of characteristics of the actual difficulties observed in modeling processes. In addition, a fruitful path for future research lies in complementing data collection and analysis: To contribute to obtaining a more complete understanding of modeling processes, recording and analyzing eye movement data offers a further promising mode of observation [13, 81]. Further, findings from analyzing the constructed data models as results of the modeling processes offer the opportunity to extend current insights into modeling difficulties. For extending the data analysis strategy to analyze the data models, the conceptual modeling quality framework suggested in [46] provides a starting point. Further, an additional in-depth analysis of cognitive breakdowns in modeling processes with regard to the different types of cognitive load (see Sect. 2) promises to result in rich insights into cognitive processes during modeling.

The observed differences among subjects in the length of the modeling process in both studies are in line with earlier work on prior modeling knowledge and modeling experience of conceptual data modeling (e. g., [6, p. 94]). It is remarkable that the outlier regarding prior modeling experience in the follow-up study (P10) does not exhibit a modeling process that is recognizably different from those of the other participants in terms of length or number of modeling difficulties encountered, in contrast to the outlier P8 who constructs a straightforward solution to the modeling task in a remarkably short modeling process without facing any difficulties in the exploratory study. However, these observations are consistent with practical experience being discussed as only one aspect of being an experienced modeler besides theoretical knowledge and training (e. g., [6, p. 87]; [74, p. 50]). A potential path for future research, hence, lies in further exploring how practical experience in conceptual modeling, knowledge of theoretical foundations of conceptual modeling and modeling training interact in achieving a more advanced level of modeling expertise.

7 Limitations

The present research is subject to limitations with respect to the focus on modelers’ capability of conceptualizing a domain. As it excludes the modeling dialogue between modelers and domain experts, the present work does not claim to address the entire process of conceptual data modeling. In addition, the modeling processes analyzed in the present work assume an individual approach to conceptual data modeling. While this aligns with the focus on a learning context, a fruitful path for future research lies in extending the research to a setting in which modeling dialogues between modelers and domain experts are observed. Recording and analyzing think aloud protocols in such a setting promises to shed light on modeling difficulties experienced in a modeling dialogue.

Principal limitations relate to analyzing think aloud protocols. Modelers may have difficulties verbalizing their thoughts while modeling on general grounds [15] or on modeling-related grounds, resulting in some verbal protocols not being as complete as others [67]. Generally, it is assumed that thinking aloud does not interfere with thought processes, but as the modeling task includes a visual, non-verbal perceptual component, thinking aloud may slow down the thought processes and/or the modeling performance [30]. We observe that participants with good verbalization skills tend to explicitly verbalize difficulties they encounter, while cognitive breakdowns in modeling processes of participants exhibiting problems in verbalizing thoughts primarily manifest themselves in interrupting or terminating a modeling activity (cf. [11]). Differences in verbalization skills have long been discussed (e. g., [31]), and thus, we included a think aloud training in the data collection procedures of both the exploratory and the follow-up study. Furthermore, all participants in the follow-up study model and think aloud in English, not their first language. However, the participants study in a context characterized by the use of English.

We recruited bachelor and master students from two universities as participants. In line with the educational context in which the present research is embedded and following the objective to achieve insights into modeling difficulties beginning and medium-experienced modelers face while performing a data modeling task, we consider students to be participants that can provide in-depth information about the phenomenon under investigation (e.g., [26]). Furthermore, recruiting students as study participants has a long tradition in related research (starting with, e.g., [6, 8] and continuing with, e.g., [16, 41]) and is acknowledged as a viable choice in conducting studies in our field [32]. Beginning practitioners of conceptual data modeling are not explicitly included in the study sample, although several participants reported work experience outside their studies. To achieve further insights into the possible impact of germane load on modeling processes and modeling difficulties, we intend to complement the present studies with further follow-up studies observing subjects including practitioners with various backgrounds, e.g., regarding prior experience and knowledge of conceptual data modeling as well as their first language.

It is important to note that the present work does not investigate the impact of the problem representation (i.e., characteristics of the modeling task’s representation including tool and modeling language referring to the extraneous load on the modelers) on the modeling processes and modeling difficulties encountered by the observed modelers. With regard to the modeling tool, the data analysis differentiated between cognitive breakdowns induced by modeling difficulties and difficulties with the modeling tool. In coding the audiovisual protocols on cognitive breakdowns, we coded for non-task related issues that include difficulties in operating the modeling tool (see Table 1). We carefully examined whether certain breakdowns can be attributed to hesitations on how to use the modeling tool. Furthermore, the modeling task and its representation as natural language description are kept constant in the exploratory study, i.e., the natural language description from the library domain, as well as in the follow-up study, i.e., the natural language car rental task. Hence, comparing modeling processes using different modeling tools and modeling on paper as well as varying modeling task representations opens a fruitful path for future research into the impact of extraneous load on tool-supported modeling processes.

8 Conclusion

Taking a cognitive view on conceptual modeling processes, we observe data modeling processes of eight learning data modelers with little to no modeling experience in an exploratory study, complemented with observing data modeling processes of eight participants with some experience in conceptual data modeling in a follow-up study. Analyzing over six hours of audiovisual protocols of the modeling processes combined with analyzing modeler–tool interactions and surveying modelers leads us to identify eight types of modeling difficulties these subjects encountered while modeling. Combining complementary observation modes with a corresponding integrated data analysis, we classify these difficulties as relating to modeling entity types, generalization hierarchies, relationship types, attributes with their data types and cardinalities.

Our findings indicate that modeling difficulties of beginning and medium-experienced data modelers primarily refer to modeling relationship types, with a majority of difficulties specifically relating to determining cardinalities and devising sensible labels for relationship types. These insights confirm and extend results from prior research performed by different researchers, with different groups of subjects and varying experimental setups (e. g., [4, 16, 17, 41, 43, 73]). As to the meta-objective of informing design science research on developing (tool) assistance for conceptual modelers, our findings (1) provide a starting point for designing, implementing and evaluating tailored tool assistance for modelers that deliberately targets modelers’ difficulties during conceptual data modeling, e. g., supporting modelers in deciding on adequate cardinalities or in devising sensible labels for relationship types (e. g., [71]), and (2) constitute a first step toward developing a taxonomy of modeling difficulties over the course of multiple studies (e. g., [36, 48]). In addition, (3) the present insights suggest further research efforts into how practical experience in conceptual modeling, knowledge of theoretical foundations of conceptual modeling and modeling training interact with respect to modeling expertise.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix

Main modeling task of follow-up study:

As part of a project for the introduction of a new information system in a car rental company specialized in electric cars, you are asked to create a conceptual data model that reconstructs the following facts representing a simplified description of the car rental:

The current fleet of the car rental consists of cars, with a unique vehicle number recorded for each car. To determine the usage period of a car, the acquisition date is recorded for each car. Moreover, the acquisition price is also recorded for each car as well as the color. In addition, each car is described by a car type designation and its range.
The car rental has several different locations (e. g., in different towns). Cars can be rented at exactly one of the rental locations. A name is assigned to each location. Employees of the car rental are assigned to exactly one rental location. At least one employee works at each location.
All employees of the car rental company are assigned an alphanumeric personnel number, and their first name and last name, date of birth and hire date are recorded. Employees can have one supervisor. Thereby, employees can be supervisors for any number of other employees, but not for themselves.
Car rentals can be made by individual customers and corporate customers. Customers of the car rental must be natural persons. When registering as customer, the first name, last name, date of birth as well as the date of registration are entered and a unique alphanumeric customer name is assigned. For individual customers, the International Bank Account Number (IBAN) for delayed returns of cars is recorded, which is not considered necessary for corporate customers. For corporate customers, the name of their organization is recorded.
Customers of the car rental can rent any number of cars. A rental always refers to exactly one car. For a rental, the date of the rental and the due date are recorded in order to be able to determine if a car is overdue.
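Some of the cardinality statements in this task can be read as checkable constraints. The following minimal sketch illustrates one possible reading of the statements on employees and locations: each employee is assigned to exactly one location, each location has at least one employee, and an employee has at most one supervisor who is not the employee themselves. It is not a reference solution used in the study; the class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Employee:
    personnel_number: str
    location: str                     # exactly one rental location (mandatory)
    supervisor: Optional[str] = None  # personnel number of at most one supervisor


def check_constraints(employees: List[Employee], locations: Set[str]) -> List[str]:
    """Return human-readable violations of the cardinality statements."""
    violations = []
    for e in employees:
        if e.location not in locations:
            violations.append(f"{e.personnel_number}: unknown location {e.location}")
        if e.supervisor == e.personnel_number:
            violations.append(f"{e.personnel_number}: supervises themselves")
    # "At least one employee works at each location."
    staffed = {e.location for e in employees}
    for loc in sorted(locations - staffed):
        violations.append(f"{loc}: no employee assigned")
    return violations
```

Modeling the supervisor as an optional reference to another employee corresponds to a recursive relationship type with cardinality zero or one on the supervisor side; the exclusion of self-supervision has to be stated as an additional constraint, as it is here.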

1.

Ågerfalk, P.J.: Embracing diversity through mixed methods research. Eur. J. Inf. Syst. 22(3), 251–256 (2013). https://doi.org/10.1057/ejis.2013.6

2.

Association for computing machinery: curricula recommendations. https://www.acm.org/education/curricula-recommendations/ (visited on 12/20/2021) (2021)

3.

Batra, D.: Cognitive complexity in data modeling: causes and recommendations. Requir. Eng. 12(4), 231–244 (2007)

4.

Batra, D.: Cognitive complexity in data modeling: causes and recommendations. Requir. Eng. 12(4), 231–244 (2007). https://doi.org/10.1007/s00766-006-0040-y

5.

Batra, D., Antony, S.R.: Novice errors in conceptual database design. Eur. J. Inf. Syst. 3(1), 57–69 (1994)

6.

Batra, D., Davis, J.G.: Conceptual data modelling in database design: similarities and differences between expert and novice designers. Int. J. Man Mach. Stud. 37(1), 83–101 (1992)

7.

Batra, D., Hoffer, J.A., Bostrom, R.P.: A comparison of user performance between the relational and the extended entity relationship models in the discovery phase of database design. In: Proceedings of the 9th International Conference on Information Systems, ICIS 1988, Minneapolis, Minnesota, USA, 1988

8.

Batra, D., Hoffer, J.A., Bostrom, R.P.: Comparing representations with relational and EER models. Commun. ACM 33(2), 126–139 (1990)

9.

Bazeley, P.: Integrative analysis strategies for mixed data sources. Am. Behav. Sci. 56(6), 814–828 (2012)

10.

Becker, J., Delfmann, P., Herwig, S., Lis, L., Stein, A.: Formalizing linguistic conventions for conceptual models. In: 28th International Conference on Conceptual Modeling, ER 2009, Lecture Notes in Computer Science, vol. 5829, pp. 70–83. Springer, Gramado, Brazil (2009)

11.

Bera, P.: Situations that affect modelers’ cognitive difficulties: an empirical assessment. In: 17th Americas Conference on Information Systems (AMCIS). Research paper 254. Detroit, MI (2011)

12.

Bera, P., Burton-Jones, A., Wand, Y.: How Semantics and Pragmatics Interact in Understanding Conceptual Models. Inf. Syst. Res. 25(2), 401–419 (2014)

13.

Bera, P., Soffer, P., Parsons, J.: Using Eye Tracking to Expose Cognitive Processes in Understanding Conceptual Models. MIS Q. 43(4) (2019)

14.

Berger, P.L., Luckmann, T.: The Social Construction of Reality. Anchor Books, New York, NY (1967)

15.

Blech, C., Gaschler, R., Bilalić, M.: Why do people fail to see simple solutions? Using think-aloud protocols to uncover the mechanism behind the Einstellung (mental set) effect. Thinking & Reasoning 26(4) (2020)

16.

Bogdanova, D., Snoeck, M.: Learning from Errors: Error-based Exercises in Domain Modelling Pedagogy. In: R.A. Buchmann, D. Karagiannis, M. Kirikova (eds.) The Practice of Enterprise Modeling, Lecture Notes in Business Information Processing, vol. 335, pp. 321–334. Springer International Publishing, Cham (2018)

17.

Bogdanova, D., Snoeck, M.: Use of Personalized Feedback Reports in a Blended Conceptual Modelling Course. In: Proceedings of the ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C), pp. 672–679. IEEE, Munich, Germany (2019)

18.

Brodie, M., Mylopoulos, J., Schmidt, J.W.: On Conceptual Modelling: Perspectives from Artificial Intelligence, Databases, and Programming Languages. Springer, New York et al. (1984)

19.

Burattin, A., Soffer, P., Fahland, D., Mendling, J., Reijers, H.A., Vanderfeesten, I., Weidlich, M., Weber, B.: Who is behind the model? Classifying modelers based on pragmatic model features. In: M. Weske, M. Montali, I. Weber, J. vom Brocke (eds.) 16th International Conference on Business Process Management (BPM), Lecture Notes in Computer Science, vol. 11080, pp. 322–338. Springer (2018)

20.

Burton-Jones, A., Meso, P.: The Effects of Decomposition Quality and Multiple Forms of Information on Novices’ Understanding of a Domain from a Conceptual Model. J. Assoc. Inf. Syst. 9(12), 748–802 (2008)

21.

Chaiyasut, P., Shanks, G.G.: Conceptual data modeling process: A study of novice and expert data modellers. In: T.A. Halpin, R. Meersman (eds.) Proceedings of the 1st International Conference on Object-Role Modeling, ORM-1, Magnetic Island, Australia, 4–6 July 1994, pp. 310–323 (1994)

22.

Chen, M.C., Anderson, J.R., Sohn, M.H.: What can a mouse cursor tell us more? Correlation of eye/mouse movements on web browsing. In: CHI ’01 Extended Abstracts on Human Factors in Computing Systems, pp. 281–282 (2001)

23.

Chen, P.P.S.: The entity-relationship model–toward a unified view of data. ACM Trans. Database Syst. 1(1), 9–36 (1976). https://doi.org/10.1145/320434.320440

24.

Claes, J., Vanderfeesten, I., Gailly, F., Grefen, P., Poels, G.: The Structured Process Modeling Theory (SPMT) a cognitive view on why and how modelers benefit from structuring the process of process modeling. Inf. Syst. Front. 17(6), 1401–1425 (2015)

25.

Claes, J., Vanderfeesten, I., Pinggera, J., Reijers, H.A., Weber, B., Poels, G.: A visual analysis of the process of process modeling. IseB 13(1), 147–190 (2015)

26.

Creswell, J.W., Plano Clark, V.L.: Designing and Conducting Mixed Methods Research, 3rd edn. Sage, Los Angeles, CA (2018)

27.

Delfmann, P., Herwig, S., Lis, L.: Unified Enterprise Knowledge Representation with Conceptual Models - Capturing Corporate Language in Naming Conventions. In: Proceedings of the 30th International Conference on Information Systems (ICIS 2009). No. 45. Phoenix, Arizona, USA (2009)

28.

Elmasri, R., Navathe, S.: Fundamentals of database systems, 7th edn. Pearson, Boston (2017)

29.

Embley, D.W., Thalheim, B. (eds.): Handbook of Conceptual Modeling: Theory, Practice, and Research Challenges. Springer, Berlin, Heidelberg (2011)

30.

Ericsson, K.A., Simon, H.A.: Verbal reports as data. Psychol. Rev. 87(3), 215–251 (1980)

31.

Ericsson, K.A., Simon, H.A.: Protocol analysis: Verbal reports as data, 2nd edn. MIT Press, Cambridge, MA (1993)

32.

Falessi, D., Juristo, N., Wohlin, C., Turhan, B., Münch, J., Jedlitschka, A., Oivo, M.: Empirical software engineering experts on the use of students and professionals in experiments. Empir. Softw. Eng. 23(1), 452–489 (2018)

33.

Frank, U.: Conceptual Modelling as the Core of the Information Systems Discipline - Perspectives and Epistemological Challenges. In: Proceedings of the Fifth America’s Conference on Information Systems (AMCIS 99), pp. 695–697. Milwaukee, WI, United States (1999)

34.

Frank, U.: Multi-perspective enterprise modeling: foundational concepts, prospects and future research challenges. Softw. Syst. Model. 13(3), 941–962 (2014)

35.

Frank, U., Strecker, S., Fettke, P., vom Brocke, J., Becker, J., Sinz, E.: The Research Field “Modeling Business Information Systems”: Current Challenges and Elements of a Future Research Agenda. Business & Information Systems Engineering 6(1), 39–43 (2014). https://doi.org/10.1007/s12599-013-0301-5

36.

Gregor, S.: The Nature of Theory in Information Systems. MIS Q. 30(3), 611–642 (2006)

37.

Hirschheim, R., Klein, H.K., Lyytinen, K.: Information Systems Development and Data Modeling Conceptual and Philosophical Foundations. Cambridge University Press, Cambridge, UK (2008)

38.

Hoppenbrouwers, S.J.B.A., Lindeman, L., Proper, H.A.: Capturing modeling processes - towards the MoDial modeling laboratory. In: On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops. LNCS, vol. 4278, pp. 1242–1252. Springer, Berlin, Heidelberg (2006)

39.

Hoppenbrouwers, S.J.B.A., Proper, H.A., van der Weide, T.P.: A fundamental view on the process of conceptual modeling. In: 24th International Conference on Conceptual Modeling (ER), pp. 128–143. Springer, Berlin, Heidelberg (2005)

40.

Johnson, R.B., Onwuegbuzie, A.J.: Mixed Methods Research: A Research Paradigm Whose Time Has Come. Educ. Res. 33(7), 14–26 (2004)

41.

Kayama, M., Ogata, S., Masymoto, K., Hashimoto, M., Otani, M.: A Practical Conceptual Modeling Teaching Method Based on Quantitative Error Analyses for Novices Learning to Create Error-Free Simple Class Diagrams. In: 2014 IIAI 3rd International Conference on Advanced Applied Informatics, pp. 616–622. IEEE, Kitakyushu, Japan (2014). https://doi.org/10.1109/IIAI-AAI.2014.131

42.

Larkin, J.H., Simon, H.A.: Why a Diagram is (Sometimes) Worth Ten Thousand Words. Cogn. Sci. 11(1), 65–100 (1987)

43.

Leung, F., Bolloju, N.: Analyzing the Quality of Domain Models Developed by Novice Systems Analysts. In: Proceedings of the 38th Hawaii International Conference on System Sciences, pp. 188b–188b. IEEE, Big Island, HI, USA (2005). https://doi.org/10.1109/HICSS.2005.98

44.

Lindland, I., Sindre, G., Sølvberg, A.: Understanding Quality in Conceptual Modeling. IEEE Softw. 11(2), 42–49 (1994)

45.

Mendling, J., Reijers, H.A., Recker, J.: Activity labeling in process modeling: Empirical insights and recommendations. Inf. Syst. 35(4), 467–482 (2010)

46.

Nelson, H.J., Poels, G., Genero, M., Piattini, M.: A conceptual modeling quality framework. Software Qual. J. 20(1), 201–228 (2012). https://doi.org/10.1007/s11219-011-9136-9

47.

Newell, A., Simon, H.A.: Human problem solving. Prentice-Hall, Englewood Cliffs, NJ (1972)

48.

Nickerson, R.C., Varshney, U., Muntermann, J.: A method for taxonomy development and its application in information systems. Eur. J. Inf. Syst. 22(3), 336–359 (2013). https://doi.org/10.1057/ejis.2012.26

49.

Nielsen, J.: Estimating the number of subjects needed for a thinking aloud test. Int. J. Hum Comput Stud. 41(3), 385–397 (1994)

50.

Paas, F., Tuovinen, J.E., Tabbers, H., Gerven, P.W.M.V.: Cognitive Load Measurement as a Means to Advance Cognitive Load Theory. Educational Psychologist 38(1), 63–71 (2003)

51.

Peffers, K., Tuunanen, T., Rothenberger, M.A., Chatterjee, S.: A Design Science Research Methodology for Information Systems Research. J. Manag. Inf. Syst. 24(3), 45–77 (2007). https://doi.org/10.2753/MIS0742-1222240302

52.

Pinggera, J., Soffer, P., Fahland, D., Weidlich, M., Zugal, S., Weber, B., Reijers, H.A., Mendling, J.: Styles in business process modeling: an exploration and a model. Softw. Syst. Model. 14(3), 1055–1080 (2015)

53.

Pinggera, J., Zugal, S., Furtner, M., Sachse, P., Martini, M., Weber, B.: The modeling mind: Behavior patterns in process modeling. In: BPMDS 2014 and EMMSAD 2014. LNBIP, vol. 175, pp. 1–16. Springer, Berlin, Heidelberg (2014)

54.

Polanyi, M., Sen, A.: The tacit dimension. University of Chicago Press, Chicago; London (2009)

55.

Pretz, J.E., Naples, A.J., Sternberg, R.J.: Recognizing, defining, and representing problems. In: The psychology of problem solving, pp. 3–30. Cambridge University Press, New York, NY, US (2003). https://doi.org/10.1017/CBO9780511615771.002

56.

Rodden, K., Fu, X.: Exploring how mouse movements relate to eye movements on web search results pages. In: Proceedings of ACM SIGIR 2007 Workshop on Web Information Seeking and Interaction, pp. 29–32 (2007)

57.

Rosenthal, K., Strecker, S.: Toward a taxonomy of modeling difficulties: A multi-modal study on individual modeling processes. In: 40th International Conference on Information Systems (ICIS). Munich, Germany (2019)

58.

Rosenthal, K., Strecker, S., Pastor, O.: Modeling difficulties in data modeling: Similarities and differences between experienced and non-experienced modelers. In: 39th International Conference on Conceptual Modeling, ER 2020, pp. 501–511. Vienna, Austria (2020). https://doi.org/10.1007/978-3-030-62522-1_37

59.

Rosenthal, K., Ternes, B., Strecker, S.: Learning Conceptual Modeling: Structuring Overview, Research Themes and Paths for Future Research. In: 29th European Conference on Information Systems (ECIS). Research Paper 137. Stockholm, Sweden (2019)

60.

Rosenthal, K., Ternes, B., Strecker, S.: Understanding individual processes of conceptual modeling: A multi-modal observation and data generation approach. In: Modellierung 2020, pp. 77–92. Vienna, Austria (2020)

61.

Schoonenboom, J., Johnson, R.B.: How to Construct a Mixed Methods Research Design. KZfSS Kölner Zeitschrift für Soziologie und Sozialpsychologie 69(S2), 107–131 (2017)

62.

Sedrakyan, G., Snoeck, M.: Cognitive Feedback and Behavioral Feedforward Automation Perspectives for Modeling and Validation in a Learning Context. In: Hammoudi, S., Pires, L., Selic, B., Desfray, P. (eds.) Model-Driven Engineering and Software Development. 4th International Conference, MODELSWARD 2016, Rome, Italy, vol. 692, pp. 70–92. Springer, Cham (2017)

63.

Sedrakyan, G., Snoeck, M., De Weerdt, J.: Process mining analysis of conceptual modeling behavior of novices - Empirical study using JMermaid modeling and experimental logging environment. Comput. Hum. Behav. 41, 486–503 (2014)

64.

Serral, E., De Weerdt, J., Sedrakyan, G., Snoeck, M.: Automating Immediate and Personalized Feedback: Taking Conceptual Modelling Education to a Next Level. In: 10th International Conference on Research Challenges in Information Science (RCIS), pp. 1–6. IEEE, Grenoble, France (2016)

65.

Shanks, G.: Conceptual data modelling: an empirical study of expert and novice data modellers. Australasian Journal of Information Systems 4(2) (1997)

66.

Siau, K., Tan, X.: Improving the quality of conceptual modeling using cognitive mapping techniques. Data Knowl. Eng. 55(3), 343–365 (2005). https://doi.org/10.1016/j.datak.2004.12.006

67.

van Someren, M.W., Barnard, Y.F., Sandberg, J.A.C.: The Think Aloud Method: A Practical Guide to Modelling Cognitive Processes. Academic Press, London (1994)

68.

Srinivasan, A., Te’eni, D.: Modeling as Constrained Problem Solving: An Empirical Study of the Data Modeling Process. Manage. Sci. 41(3), 419–434 (1995)

69.

Sweller, J.: Cognitive load during problem solving: Effects on learning. Cogn. Sci. 12(2), 257–285 (1988)

70.

Ternes, B., Rosenthal, K., Barth, H., Strecker, S.: TOOL - Modeling Observatory & Tool: An Update. In: Short, Workshop and Tools & Demo Papers Modellierung 2020, Vienna, Austria. CEUR-WS, vol. 2542, pp. 198–202 (2020)

71.

Ternes, B., Rosenthal, K., Strecker, S.: Automated assistance for data modelers: A heuristics-based natural language processing approach. In: Proceedings of the 31st European Conference on Information Systems (ECIS 2021), pp. 1–16. Marrakech, Morocco (2021)

72.

Ternes, B., Rosenthal, K., Strecker, S., Bartels, J.: Tool–a modeling observatory & tool for studying individual modeling processes. In: Demo Track at the 39th International Conference on Conceptual Modeling, ER 2020, pp. 178–182. Vienna, Austria (2020). ceur-ws.org/Vol-2716/paper18.pdf

73.

Topi, H., Ramesh, V.: Human Factors Research on Data Modeling. Journal of Database Management 13(2), 3–19 (2002). https://doi.org/10.4018/jdm.2002040101

74.

Venable, J.R.: Teaching novice conceptual data modellers to become experts. In: International Conference Software Engineering: Education and Practice, pp. 50–56. IEEE, Dunedin, New Zealand (1996)

75.

Venkatesh, V., Brown, S.A., Bala, H.: Bridging the qualitative-quantitative divide: Guidelines for conducting mixed methods research in information systems. MIS Q. 37(1), 21–54 (2013)

76.

Venkatesh, V., Brown, S.A., Sullivan, Y.W.: Guidelines for conducting mixed-methods research: An extension and illustration. J. Assoc. Inf. Syst. 17(7), 435–494 (2016)

77.

VERBI Software: Maxqda standard 12. https://www.maxqda.com (2018). Accessed: 2019-10-09

78.

Vessey, I., Conger, S.A.: Requirements specification: learning object, process, and data methodologies. Commun. ACM 37(5), 102–113 (1994)

79.

Walsham, G.: The Emergence of Interpretivism in IS Research. Inf. Syst. Res. 6(4), 376–394 (1995). https://doi.org/10.1287/isre.6.4.376

80.

Wand, Y., Weber, R.: Research commentary: Information systems and conceptual modeling-a research agenda. Inf. Syst. Res. 13(4), 363–376 (2002)

81.

Weber, B., Pinggera, J., Neurauter, M., Zugal, S., Martini, M., Furtner, M., Sachse, P., Schnitzer, D.: Fixation Patterns During Process Model Creation: Initial Steps Toward Neuro-adaptive Process Modeling Environments. In: 49th Hawaii International Conference on System Sciences (HICSS), pp. 600–609. IEEE, Piscataway, NJ (2016)

82.

Wilmont, I., Brinkkemper, S., van de Weerd, I., Hoppenbrouwers, S.: Exploring Intuitive Modelling Behaviour. In: Enterprise, Business-Process and Information Systems Modeling, vol. 50, pp. 301–313. Springer, Berlin, Heidelberg (2010)

83.

Wilmont, I., Hengeveld, S., Barendsen, E., Hoppenbrouwers, S.: Cognitive mechanisms of conceptual modelling. In: International Conference on Conceptual Modeling (ER), pp. 74–87. Springer, Hong Kong, China (2013)

84.

Wilmont, I., Hoppenbrouwers, S., Barendsen, E.: An Observation Method for Behavioral Analysis of Collaborative Modeling Skills. In: A. Metzger, A. Persson (eds.) Advanced Information Systems Engineering Workshops. CAiSE 2017, pp. 59–71. Springer, Cham (2017)

85.

Zugal, S., Haisjackl, C., Pinggera, J., Weber, B.: Empirical evaluation of test driven modeling. International Journal of Information System Modeling and Design 4(2), 23–43 (2013)

Title: Modeling difficulties in creating conceptual data models
Multimodal studies on individual modeling processes
Authors: Kristina Rosenthal, Stefan Strecker, Monique Snoeck
Publication date: 13-10-2022
Publisher: Springer Berlin Heidelberg
Published in: Software and Systems Modeling / Issue 3/2023
Print ISSN: 1619-1366
Electronic ISSN: 1619-1374
DOI: https://doi.org/10.1007/s10270-022-01051-8
