5.1 Modeling processes in the follow-up study
In the follow-up study, the modeling processes lasted between 21 and 35 minutes, with a median of 32 minutes. Note that P12 accidentally terminated the modeling session in the modeling tool after about 21 minutes, and that the modeling process of P16 was terminated after 22 minutes due to a software error in the modeling observatory.
In the post-modeling questionnaire, participants were asked to self-assess the statements “I understood what the modeling task was about” and “I am familiar with the domain of the modeling task” on a scale from 1 to 7, where 1 corresponds to “I do not agree at all” and 7 to “I agree entirely.” Regarding the first statement, the answers ranged from 6 to 7 with a median of 6.5. For the second statement, the answers ranged from 4 to 7 with a median of 5.5. These self-assessments indicate that the participants understood the chosen modeling domain well enough to perform the task.
Regarding verbalization skills, our observations reinforce that there is wide variation in how well subjects are able to verbalize their thoughts while working on the modeling task. The post-modeling questionnaire asked about difficulties with think aloud using an open-ended question, rather than the closed-ended question used in the exploratory study. Only two of the eight participants reported difficulties in verbalizing thoughts: one subject explicitly referred to difficulties “to come up with the right vocabulary” (P16) because English is not the subject’s first language, and the other stated a preference for putting his or her “thoughts on paper before saying them out loud” (P12). Although six participants reported no problems and all eight subjects received think aloud instructions and think aloud training (see Sect. 4), we observe two participants (P9, P11) exhibiting problems in verbalizing their thoughts: In parts, the participants describe what they are doing rather than verbalizing thoughts. In addition, the modeling processes of these same participants include two silent periods of more than 30 seconds each. However, as in the exploratory study, we conclude that, altogether, the provided think aloud instructions and training initiated the intended behavior.
5.2 Modeling difficulties
We observe cognitive breakdowns as an indication of modeling difficulties in 15 of 16 modeling processes. In the exploratory study, modeling difficulties occur in seven of eight modeling processes, with the number of observed breakdowns ranging widely from zero to six. However, only three of the eight participants report in the post-modeling questionnaire that they encountered difficulties. Note that Participant 8 in the exploratory study marks an exceptional case, exhibiting no breakdowns during the modeling process and constructing a straightforward solution to the modeling task in only nine minutes. P8 is an outlier regarding prior modeling experience in the exploratory study, with several years of experience (see [57] for more details). This observation confirms the deliberate design of the modeling task as demanding for modelers with little to no experience, but solvable in a straightforward manner for more experienced modelers. In the follow-up study, we observe cognitive breakdowns in all eight modeling processes, with numbers of breakdowns ranging from two to eight. Participant 10, an outlier in terms of prior modeling experience in the follow-up study, does not exhibit a modeling process that is recognizably different from those of the other participants, which confirms the decision to include the outlier in the analysis. Five of the eight participants also report in the post-modeling questionnaire that they encountered difficulties.
The observed breakdowns in both studies are categorized into eight types of modeling difficulties inducing the breakdowns. The types were developed during the coding of the audiovisual protocols as emerging and refined sub codes in the coding scheme (see Sect. 4). The types of difficulties relate to different aspects of constructing conceptual data models, i.e., entity types, generalization hierarchies, relationship types, attributes and cardinalities (e.g., [28]). In the exploratory study, we identify five types of modeling difficulties. These are complemented with three additional types of modeling difficulties that have been observed only in the follow-up study. Re-watching and re-analyzing the data of the exploratory study confirmed that these three new types of difficulties do not occur in the modeling processes of the non-experienced modelers. Table 3 presents, for each modeling process in the exploratory study, the length of the process, the overall number of breakdowns, and the number of breakdowns per type of modeling difficulty; Table 4 presents the same for the follow-up study. Table 5 displays the total numbers of occurrences of all types of modeling difficulties in both studies combined. In the following, each type of modeling difficulty is explained and exemplified by providing transcribed examples from the think aloud protocols of the follow-up study. For examples from the exploratory study, please see [57].
Table 3
Completion times (in minutes), types and numbers of breakdowns in the exploratory study
Participant | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | Total |
Completion time | 23 | 20 | 35 | 18 | 15 | 17 | 35 | 9 | 172 |
Breakdowns | 3 | 2 | 6 | 3 | 2 | 3 | 6 | 0 | 25 |
Differentiate between entity types | | | 1 | | | | | | 1 |
Choose data type of attribute | | | | | 1 | | 1 | | 2 |
Decide between entity type and relationship type | 2 | | 3 | 2 | | | | | 7 |
Develop label for relationship type | 1 | 2 | 2 | | | 1 | 2 | | 8 |
Determine cardinalities | | | | 1 | 1 | 2 | 3 | | 7 |
Table 4
Completion times (in minutes), types and numbers of breakdowns in the follow-up study. Types of difficulties marked with an asterisk (*) emerged only in the follow-up study
Participant | P9 | P10 | P11 | P12 | P13 | P14 | P15 | P16 | Total |
Completion time | 32 | 22 | 32 | 21 | 33 | 32 | 35 | 22 | 229 |
Breakdowns | 3 | 3 | 3 | 2 | 5 | 4 | 8 | 2 | 30 |
Differentiate between entity types | | | | | | | | | 0 |
Choose data type of attribute | | | 1 | | 1 | | 2 | 1 | 5 |
Decide between entity type and relationship type | | | | 1 | 1 | 1 | | | 3 |
Develop label for relationship type | | 1 | | | | 1 | 1 | 1 | 4 |
Determine cardinalities | 1 | 2 | | 1 | 2 | 1 | 3 | | 10 |
Decide between attribute and entity type* | 2 | | | | | | 2 | | 4 |
Specify generalization hierarchy* | | | 1 | | 1 | 1 | | | 3 |
Establish relationship type* | | | 1 | | | | | | 1 |
Table 5
Overview of types of modeling difficulties in both studies combined with total numbers of occurrences
Type of modeling difficulty | Occurrences |
Differentiate between entity types | 1 |
Choose data type of attribute | 7 |
Decide between entity type and relationship type | 10 |
Develop label for relationship type | 12 |
Determine cardinalities | 17 |
Decide between attribute and entity type | 4 |
Specify generalization hierarchy | 3 |
Establish relationship type | 1 |
Overall number of observed modeling difficulties | 55 |
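The counts in Tables 3 to 5 can be cross-checked mechanically. The following is a minimal sketch, not part of the original analysis: it transcribes the per-participant counts from Tables 3 and 4 (assuming the columns correspond to P1 to P8 and P9 to P16 in order, with empty cells read as zero) and recomputes the combined totals of Table 5, the breakdowns per participant, and the median completion time reported in Sect. 5.1. The names `TABLE_3`, `TABLE_4`, `combined_totals` and `breakdowns_per_participant` are introduced here for illustration only.

```python
from statistics import median

# Counts transcribed from Table 3 (assumed column order P1..P8); empty cells are 0.
TABLE_3 = {
    "Differentiate between entity types": [0, 0, 1, 0, 0, 0, 0, 0],
    "Choose data type of attribute": [0, 0, 0, 0, 1, 0, 1, 0],
    "Decide between entity type and relationship type": [2, 0, 3, 2, 0, 0, 0, 0],
    "Develop label for relationship type": [1, 2, 2, 0, 0, 1, 2, 0],
    "Determine cardinalities": [0, 0, 0, 1, 1, 2, 3, 0],
}

# Counts transcribed from Table 4 (assumed column order P9..P16); empty cells are 0.
TABLE_4 = {
    "Differentiate between entity types": [0, 0, 0, 0, 0, 0, 0, 0],
    "Choose data type of attribute": [0, 0, 1, 0, 1, 0, 2, 1],
    "Decide between entity type and relationship type": [0, 0, 0, 1, 1, 1, 0, 0],
    "Develop label for relationship type": [0, 1, 0, 0, 0, 1, 1, 1],
    "Determine cardinalities": [1, 2, 0, 1, 2, 1, 3, 0],
    "Decide between attribute and entity type": [2, 0, 0, 0, 0, 0, 2, 0],
    "Specify generalization hierarchy": [0, 0, 1, 0, 1, 1, 0, 0],
    "Establish relationship type": [0, 0, 1, 0, 0, 0, 0, 0],
}

# Completion times in minutes from Table 4 (follow-up study).
FOLLOW_UP_MINUTES = [32, 22, 32, 21, 33, 32, 35, 22]


def combined_totals(*tables):
    """Sum the occurrences of each difficulty type across all given tables."""
    totals = {}
    for table in tables:
        for difficulty, counts in table.items():
            totals[difficulty] = totals.get(difficulty, 0) + sum(counts)
    return totals


def breakdowns_per_participant(table):
    """Sum each participant's column over all difficulty types."""
    return [sum(column) for column in zip(*table.values())]


totals = combined_totals(TABLE_3, TABLE_4)
for difficulty, count in totals.items():
    print(f"{difficulty}: {count}")
print("Overall:", sum(totals.values()))
print("Breakdowns per participant (exploratory):", breakdowns_per_participant(TABLE_3))
print("Breakdowns per participant (follow-up):", breakdowns_per_participant(TABLE_4))
print("Median completion time (follow-up):", median(FOLLOW_UP_MINUTES))
```

Under the stated column assumption, the recomputed values can be compared directly against the "Breakdowns" rows of Tables 3 and 4 and against each row of Table 5.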
Differentiate between entity types: This type of difficulty is observed only once, in the exploratory study, where one participant (P3) encounters a difficulty related to creating entity types. The participant breaks off the modeling activity and switches to another one.
Choose data type of attribute: Encountered by two participants in the exploratory study (P5, P7) and four participants in the follow-up study (P11, P13, P15, P16) with a total of seven occurrences, this type of difficulty relates to attributes of entity types (note that the chosen variant of the ER model and its notation does not allow for attributes of relationship types, to simplify the learning process for modeling beginners). The participants face the difficulty of choosing a data type for an attribute that is adequate in the context of the modeling task. A full list of predefined data types was included in the instructions and available to subjects throughout the entire modeling process. An example of this type of difficulty is exhibited in the modeling process of P13 regarding the attribute IBAN of the entity type IndividualCustomer: “IBAN number... That is a... integer? Yes, that’s not a string it’s an integer... No, it’s not a number, it’s more a string... than an integer I guess... An integer is just more like numbers that you can count, and string is just more general” (P13). Participant 16 also faces a difficulty of this type in choosing a suitable data type for the attribute Range of the entity type Car: “And the range... um... um... I am not sure... um... I will just make it an integer?” (P16).
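The decision P13 struggles with can be illustrated with a small check. The sketch below is an illustration under stated assumptions, not part of the study: the helper `fits_integer` is hypothetical, and the IBAN shown is a commonly used sample value, not data from the modeling task. It tests whether a value survives a round trip through an integer, which fails for identifiers that contain letters or leading zeros.

```python
# Illustrative value only: a commonly used sample IBAN, not data from the study.
SAMPLE_IBAN = "DE89370400440532013000"


def fits_integer(value: str) -> bool:
    """Return True if the text can be stored as an integer and read back unchanged."""
    try:
        return str(int(value)) == value
    except ValueError:
        # The text contains characters that are not digits (e.g. a country code).
        return False


print(fits_integer(SAMPLE_IBAN))  # False: the country code contains letters
print(fits_integer("0042"))       # False: leading zeros would be lost
print(fits_integer("350"))        # True: a plain whole number is unaffected
```

Read this way, P13's final choice of a string for IBAN is plausible, while a whole-number quantity such as the one P16 assigns to Range passes the check.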
Decide between entity type and relationship type: We observe three participants (P1, P3, P4) in the exploratory study and three participants (P12, P13, P14) in the follow-up study facing difficulties in deciding whether to model an entity type or a relationship type to reconstruct a given statement of the problem representation (10 occurrences in total). In each study, all difficulties of this type refer to one and the same entity type. In the follow-up study, this is the entity type Rental. For example, P14 encounters this difficulty: “Um.. a rental... wait... hmm... a rental always refers to exactly one car... um... Yes, i don’t think I have to do something with this... no... I think it’s already embedded in the relationship rents. And then for a rental the date of the rental... oh... and the due date are recorded in order to be able... oh... ok... so... um...” (P14). The subject interrupts the modeling activity and switches to modeling another relationship type, before returning to the modeling activity and solving the problem: “I think that I should create another entity type called rental with attribute type... date of rental which is a date data type and then... hmm... due date which is also a date... and then I... delete the previous relationship type I made called rents because it is not longer relevant, yes” (P14). Difficulties of this type cause long and severe periods of uncertainty in the modeling processes (lasting more than 4 minutes in the longest cases) and, in this sense, are particularly remarkable, especially because this type of difficulty occurs ten times in six modeling processes and because two of the respective participants were unable to find a solution to this difficulty.
Develop label for relationship type: Difficulties of this type occur in five modeling processes in the exploratory study (P1, P2, P3, P6, P7) and four in the follow-up study (P10, P14, P15, P16), referring to a modeler who creates a relationship type and encounters a problem with finding a descriptive and sensible label for the model element. This type also includes difficulties relating to problems in developing suitable role designators when modeling a recursive relationship type. Participant 14 encounters such a difficulty in labeling the relationship type between the entity types RentalLocation and Employee: “So i put a relationship type between rental location and employee... and I call it... um... assigned... um... how do I call it... hmm... um... employees because... um.. a rental location... employees but no... not employees... um... so rental location... so an employee works... yes works at a certain rental location” (P14). As a further example, Participant 15 faces a difficulty in developing a sensible label for the relationship type between the entity types Rental and RentalLocation: “So, I create the relationship type and call it... um... hmm... I am not sure how to call it... so... hmm.... has like has... but it’s not nice... [...] ok, maybe car rental is assigned to location, let it be like that” (P15). Furthermore, we observe a difficulty in modeling the recursive relationship type supervises in one modeling process when a participant (P16) encounters a difficulty in developing suitable role designators: “And as role designation... no, role designator... I will name this... um... supervisee... which means that you have a supervisor... I think... or hold on... let me think... um... so this is not really clear for me... so I will make this supervisor... because in the notation sheet.. um.. it states like this... so I will just take this...” (P16).
This type of difficulty related to labeling relationship types is remarkable as it constitutes the second most frequent type of difficulty in terms of the total number of occurrences (12) and number of participants concerned (9).
Determine cardinalities: We identify difficulties with regard to determining cardinalities for relationship types in four modeling processes (P4, P5, P6, P7) in the exploratory study. This difficulty reappears in six processes in the follow-up study (P9, P10, P12, P13, P14, P15), for a total of 17 occurrences. This makes it the most frequent type of difficulty in both studies combined, regarding both the total number of occurrences and the number of participants concerned (10). In the exploratory study, it is remarkable that five of the seven occurrences of this type of difficulty pertain to a relationship type with a one-to-many cardinality [57]. In the follow-up study, it is noticeable that this type of difficulty is primarily encountered for the two relationship types specified between the entity types Car, Rental and Customer, both relationship types with a one-to-many cardinality. For P13, determining the cardinalities for the relationship type between the entity types Rental and Car causes a severe difficulty and period of uncertainty (over 3 minutes): “And a... car rental... any number of cars... hmm... a car belongs to... zero... minimum zero... because not always a car has to belong to a rental or maximum... many... because a car can be rent multiple times... but not at the same time... hmm... a customer... a rental always refers to exactly one car... a rental consists of one car but a car... no.. it is the other way around... a car belongs to one rental... rental... hmm... customers of the car rental can rent any number of cars... or maybe... no... um.. think, think, think... a rental consists of one car... um... no, you can... customers of the car rental can rent any number of cars... so, it is the other way around...” (P13). Participant 15 also encounters a difficulty of this type, again for the relationship type between the entity types Rental and Car: “But cars... hmm... can have... I think... um... now I am a little bit confused because I don’t know whether we are talking about... no, I think yes.. it is ok... so cars itself, they can have.. hmm... so, if the company just purchases this car, then this car may participate in zero rentals because it was just bought, but we should already insert information about this car in our database. So it’s from zero to infinitely many.... For now I leave it like that, I am not sure whether it is correct, but I hope so” (P15). As a further example, for P14, determining the cardinalities for the relationship type between the entity types Rental and Customer results in a difficulty: “A rental... involves exactly one customer, I have to check.. um... oh no... customers of the rental can rent any number of cars... so, I assume that it’s also at one point in time that a customer can rent several cars, so a rental involves one to many customers... and a customer is involved in... no, no, no, no .. a rental involves exactly one customer, because a rental is about one car” (P14).
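To make the cardinalities under discussion concrete, the following minimal Python sketch encodes one possible reading of the statements the participants quote ("a rental always refers to exactly one car", "customers of the car rental can rent any number of cars"). It is an illustration only and not the reference solution of the study: the class layout, the attribute `customer_id`, the example dates, and the minimum cardinality of zero on the Car and Customer sides are assumptions made for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(eq=False)
class Car:
    type_designation: str
    # A car participates in zero to many rentals (0..*), e.g. zero right after purchase.
    rentals: list[Rental] = field(default_factory=list)


@dataclass(eq=False)
class Customer:
    customer_id: str  # hypothetical attribute, used only to tell customers apart
    # A customer is involved in zero to many rentals (0..*); the minimum is assumed.
    rentals: list[Rental] = field(default_factory=list)


@dataclass(eq=False)
class Rental:
    rental_date: date
    due_date: date
    car: Car            # exactly one car per rental (1..1)
    customer: Customer  # exactly one customer per rental (1..1)

    def __post_init__(self) -> None:
        # Register the rental on the "many" side of both relationship types.
        self.car.rentals.append(self)
        self.customer.rentals.append(self)


car = Car("compact")
customer = Customer("C-1")
print(len(car.rentals))  # a newly added car has no rentals yet

first = Rental(date(2024, 1, 1), date(2024, 1, 8), car, customer)
second = Rental(date(2024, 2, 1), date(2024, 2, 3), car, customer)
print(len(car.rentals), len(customer.rentals))
```

In this reading, the "one" side of each relationship type is a mandatory single reference held by Rental, while the "many" side is a list that may be empty, which mirrors the conclusion P15 reaches for a newly purchased car.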
Decide between attribute and entity type: This type of difficulty relates to deciding whether to model an attribute or an entity type to reconstruct a statement in the modeling task description. It is faced by two participants in the follow-up study (P9, P15) with four occurrences. Participant 15, for example, faces a difficulty of this type relating to the attribute TypeDesignation of the entity type Car: “Yes, each car is described by a car type... I think... yes... hmm.... from my experience I remember that car types, they were like different entities. But for now I think it would be just an attribute of car. So, car type. I don’t know...” (P15). A further difficulty of this type is encountered regarding the entity type RentalLocation: “So, the next one is car rental and car rental has an attribute of location... maybe... um, for now I leave it, um, is an... oh, no... I think that... um, location, location, cars can be rented, a name to each location, employees... oh, no, I think that car rental does not have a location attribute, location will be the next entity type” (P15).
Specify generalization hierarchy: This type of difficulty relates to specifying generalization relationships between entity types. It occurs only in the follow-up study, where it is faced by three participants (P11, P13, P14) with three occurrences. Participant 11 faces difficulties relating to the generalized entity type Customer with the subtypes IndividualCustomer and CorporateCustomer: “What I am trying to figure out is... hmm... how to model this generalization because the customer can be individual or corporate ... but... corporate do not have a last name... hmm... I am trying to be more efficient, but do not want to restrict it or repeat attributes...” (P11). Remarkably, the difficulty causes a long period of uncertainty in the modeling process of P11 (about 4 minutes), including a silent period of about 30 seconds, before the participant finally solves the problem: “Ok, now I know how” (P11). Participant 14 also faces a difficulty relating to this generalization relationship: “Um.. this generalization is.... um... hmm... I think that it is... hmm... total because... um.. every customer is or an individual customer or a corporate customer. It cannot be something else” (P14).
Establish relationship type: This type of difficulty is observed only once, encountered by one participant (P11) in the follow-up study. It refers to establishing relationship types between the entity types Rental, Customer and Car and results in uncertainty about how to model the given statements: “Hmm... I am thinking how to put the relationships between these entities and the rental agreement” (P11). The participant finally solves the problem.