1 Introduction

Service Composition (SC), particularly when aimed at end-users, is a relatively young area, and this is reflected in the small amount of work that has been done on deriving requirements for tools to support end-users in performing SC. End-user Service Composition (EUSC) is defined as SC where the user who composes a composite service is the same person who ultimately uses it. In most cases, we may assume that such a user has a relatively low level of technical knowledge compared with, for example, a professional service developer.

This work focuses on SC by end-users, with no explicit restriction on the technologies that underpin the overall composition process. Traditional SC approaches tend to be driven by the technology underpinning the composition or the developer, rather than by the end-user of the composition tool or composite service [1]. We argue that EUSC and the technologies that underpin it can benefit from active investigation of what end-users require from composition.

Prior work has derived a limited number of requirements for EUSC by using focus groups to gather user opinions and perceptions of various elements of SC on the Web [2, 3]. The requirements derived across this prior work cover the area sporadically, providing few general requirements and few requirements for specific areas of EUSC.

The field would benefit from an approach to deriving a comprehensive set of requirements for EUSC tools. While prior work has focused on Web-based SC, which is arguably one of the more compelling areas in the field, the range of available EUSC tools is now broader than this, notably on mobile platforms. Hence our work targets EUSC tools in general, and the wide range of services available across diverse domains, and is not limited to the Web.

The process and requirements presented here focus on requirements gathering from end-users—so-called ordinary people—rather than business customers in a B2B relationship. The process of SC is an inherently technical one, and potential users are likely to need exposure to concepts such as apps, Web 2.0 services, etc., in order to properly understand it. Furthermore, it is unlikely that a user who has not previously been exposed to such mobile applications (apps) and services would seek to use SC to solve the problems that she might have. We therefore restricted the target end-users in this work to owners of a smartphone (any platform) who were familiar with these concepts and who had experience of using apps and services on these devices.

The main contributions of this paper are (1) to describe a generalizable and repeatable process for deriving a robust and coherent set of requirements for EUSC tools; and (2) to present the comprehensive set of 139 requirements that we gathered. We demonstrate that our methodology is generic by outlining how it could be applied in different domains in addition to our chosen domain of EUSC. Our methodology also significantly extends and adapts the one on which it is based: we show how the data from the sessions is analysed and requirements elicited, and we adapt the methodology from targeting business customers to targeting end-users.

Our requirements cover a broad range of topics—relating both to composition in general and to a number of specific problems in composition. We structure the functional requirements based on an adapted model of the life cycle for EUSC, and the non-functional requirements are categorized against the quality standards presented in ISO/IEC 25010 [4]. Some of the requirements found in our work validate those found in prior work while others are unique to our findings. Thus, our findings can inform the design of future EUSC tools.

The work reported here is aimed at deriving requirements for EUSC tools, i.e., the tools that end-users use in the process of performing SC. To clarify, it is not aimed at deriving requirements for the composite services produced through that process. Another clarification required is in terminology. There are various definitions for the entities involved in EUSC but no current standardization, so we now provide definitions of terms that will be used in the rest of this paper. We define “components” or “component services” as the input services to the composition process; “composites” or “composite services” as the output composed services, and both components and composites generically as “services”.

The rest of this paper is structured as follows. Relevant background is discussed in the next section. In Sect. 3, we present the method that was used in our study. Section 4 presents the results, and Sect. 5 presents the analysis process and derivation of requirements. Section 6 presents a small illustrative subset of the requirements as well as an evaluation of the set as a whole. Finally, we present limitations, future work, and our conclusions.

2 Background

In this section, we present related work on EUSC, including prior EUSC requirements gathering approaches, and give a brief overview of the SCRAM method.

2.1 End-user Service Composition

SC is defined as the process of creating services at runtime from a series of component services [5]. SC has been used widely within business but has only relatively recently been aimed at consumers as end-users. This section will discuss the SC life cycle, the different layers at which SC can operate, and available EUSC tools.

There are two main views on the life cycle for EUSC, the first of which splits it into four stages [6]. These are as follows:

1. Service request—the user requests the composition that she wants to create [6] (N.B. this is relevant only in automated SC);

2. Service discovery—the user or the system (or both) discovers the component services to be used in the composition process [5, 6];

3. Composition—the user or the system (or both) combines the services, coordinating them in order to create a new, composite service [5, 6];

4. Service execution—the output of the composition stage is executed [6].

However, to be truly dynamic, this life cycle needs to make some provision for adapting the composite service once it has been created.

The other view on this life cycle identifies 6 stages in service-oriented development, which are further broken down into 16 activities [7]. These stages and activities are outlined in Table 1.

Table 1 Stages of the EUSC life cycle [7]

Drawing on these views of the EUSC life cycle, for the purposes of this paper, we consider the following main stages of the EUSC life cycle:

1. Request and Discovery—The user makes a request for the service(s) that she wants and discovers components or composites that meet her needs.

2. Composition—The discovered components are coordinated together to form a composite service.

3. Verification and Validation—The user verifies that the composite service executes correctly and meets her needs.

4. Annotation and Deployment—The user annotates the service with relevant information and optionally deploys it to some service repository to be discovered by other users.

5. Execution and Management—The user executes the service and can subsequently adapt it to meet changes in specification or the execution environment.

There are three layers at which SC can operate: the application layer, the service layer, and the presentation layer [8]. Our work is not specifically aimed at any one of these layers. Furthermore, the differences between application- and service-layer composition are often imperceptible to the user of the tool.

A standard “benchmark” against which to compare SC tools is Yahoo! Pipes [7], which has been available for several years. More recently, Web-based SC tools such as “if this then that” (IFTTT) and Zapier have been released. SC tools in other domains include mobile (Android-based) applications such as Tasker, and hybrid Web and mobile approaches such as Microsoft’s On{X}. There are also examples of desktop SC applications available for Mac OS X: Automator, a workflow automation tool, and Quartz Composer, a multimedia composition tool.

2.2 EUSC requirements

In this section, we describe requirements that have already been gathered for EUSC tools, or simply used to specify them. We split this prior work into two groups: (1) those which explicitly sought to gather requirements for EUSC tools, and (2) those where the requirements have simply been stated before the creation of an EUSC tool.

2.2.1 EUSC requirements gathering

To our knowledge, only one other research group has sought to gather requirements from end-users for EUSC, in that case in the area of Web-based EUSC [2, 3].

To gather requirements for EUSC tools on the Web, Namoun et al. [3] carried out a focus group study to identify users’ perceptions of services and SC in order to design a future EUSC tool. Their focus group sessions were made up of six steps. First, participants were asked to define common SC terms, before being provided with definitions of these terms. They were shown an initial mock-up of the design of an SC tool and asked to comment on its design. Next, participants were guided through an example composition from a script. Participants were then invited to provide their views on SC and evaluate various other design mock-ups for the tool. The penultimate stage involved the researcher demonstrating the SC process using a prototype based on the design presented earlier. Finally, participants were asked to give their views on SC and the approach taken in the design of the prototypical tool.

Following analysis of the responses given by participants across each of these sections, Namoun et al. [3] presented the following set of requirements:

R1. Display services by their user interface

R2. Use a semi-automatic approach to composition

R3. Avoid technical jargon

R4. The composition “canvas” should be large and easy to interact with

R5. Services should be secure

R6. Feedback to users should be continuous and proactive

Other work in deriving requirements for SC followed a similar process. Mehandjiev et al. [2] also used focus groups to assess potential users’ opinions of SC and the different representations of the flow of composition—control flow and data flow.

Their sessions included five stages. As with [3], the session began with participants being asked to define terms relating to SC, in this case service and software service. They were then given a 20-min introduction to SC concepts and asked to complete a form to assess their technical knowledge and prior experience with software services. Next, participants were asked to perform three tasks to assess how they would compose “atomic” services into composite services. A final presentation and associated questionnaire were then given to assess users’ opinions on the design alternatives for composition—control flow, data flow, or their assisted composition approach. Participants’ responses to an exit questionnaire yielded the following requirements:

R7. The data being passed between services should not be presented

R8. Sets of data should be treated as a single item

R9. Users should be assisted in solving control flow dependencies

R10. Users should be assisted in solving data flow problems

R11. The UI of the composed service should be represented

The requirements presented in [3] (R1–R6) are a small set of general requirements for EUSC on the Web, whereas those from [2] (R7–R11) are more focused on a single aspect of composition: the flow between components. The overall set of requirements is deficient in two ways: (1) there are few requirements that are general, applying to EUSC as a whole, and (2) only one specific area of the EUSC process has been investigated. Thus, we feel that it is necessary to gather a more comprehensive set of requirements to address both of these deficiencies.

2.2.2 Other EUSC requirements

Several other authors have listed requirements for EUSC in work where gathering such requirements was not an explicit aim. Instead, requirements are often simply stated before the creation of an EUSC tool, without any discussion of their source. The first of these was an investigation of EUSC at the presentation layer, i.e., composition of service interfaces, by Nestler et al. [9], who list 5 general requirements for their tool, the ServFace Builder. Mehandjiev et al. [10] present a user-first method for composition; after presenting their composition approach, they provide 6 recommendations gathered from focus groups to which the approach was presented for evaluation. Albinola et al. [11] detail the creation and structure of a mashup framework called Mashlight, and list requirements that Mashlight should adhere to throughout the work. Finally, Bottaro et al. [12] introduce an architecture for composition of pervasive services in the home, for which they first identify 5 requirements.

A summary of the requirements found in these works is discussed in the preliminary requirements capture section, and requirements are presented in “Appendix 1”. The above sets of requirements (both gathered and stated) give a sporadic coverage of the EUSC domain, which motivates a more formal approach to gathering a comprehensive set of requirements. Our work sets out to derive this comprehensive set of requirements that cover a range of design areas within EUSC and associated tools. We derived these requirements using a method that was influenced by the scenario-based requirements analysis method (SCRAM).

2.3 Requirements gathering techniques

Most requirements gathering techniques focus on a business customer rather than being targeted at end-users. Our goal is to gather requirements from a group of potential end-users, so any such technique is likely to require some modification.

2.3.1 Interviews

Interviews with stakeholders are often used as part of the requirements gathering process [13]. Interviews normally involve a stakeholder who already uses a similar system (in the B2B context) and hence knows the “right” and “wrong” answers as to what can be classed as a requirement for a system [13]. These interviews are either closed—questions to be posed to stakeholders are defined beforehand—or open—where the agenda of the interview is not defined beforehand—or, more commonly, a blend of the two [13].

Normally, interviews would also be used by requirements engineers to gather more information about the domain in which the stakeholder operates [13]. In this case, however, the requirements engineer is the domain expert (i.e. expert in EUSC) rather than the interviewee, meaning that some methodological adaptation is needed. This role reversal mitigates issues that can arise from stakeholders using terminology unfamiliar to the requirements engineer or inadequately describing the requirements due to their greater knowledge in the domain. Furthermore, these interviews would normally take place within a business context and have to take into account political and organizational issues—another concern that we do not have to address.

2.3.2 Scenarios

Scenarios are normally used in requirements gathering as a means of identifying tasks to be completed using the tool that is the output of the system development process as a whole [13]. In our case, the stakeholders in our requirements gathering process are potential end-users of EUSC tools who may not have any knowledge of EUSC prior to the interview session. Therefore, in addition to the use of scenarios to help identify tasks, they are a means of conveying this knowledge to the stakeholder.

Potts’ Inquiry Cycle [14] is an example of a scenario-based requirements analysis method that utilizes scenarios as a mechanism for identifying problems when performing requirements analysis [14, 15]. However, it provides little detail on how these problems are identified or how requirements are extracted [14, 15].

2.3.3 Scenario-based requirements analysis method (SCRAM)

SCRAM is a requirements analysis method that utilizes introductory scenarios, a concept demonstrator application, and examples of other potential designs to present a concept to potential users of a system and gather requirements for it. SCRAM consists of 4 stages [16]:

1. Preliminary requirements capture and domain familiarization: preliminary research that is required to gather requirements and design rationale to facilitate creating a prototype.

2. Storyboarding and design visioning: a prototype of the required system is designed and created. Scripts are also created to outline the process that the demonstrator would undertake if it were a fully fledged application.

3. Requirements exploration: users are presented with the concept demonstrator (along with other designs) and scenarios to demonstrate the problem area. Probe questions are asked at key points in the demonstration script, and design decisions are illustrated with design rationale documents.

4. Session analysis: the data are analysed to derive requirements that can be reported back to the user.

3 Methodology

This section gives an overview of the method that was applied in our requirements gathering sessions. We also describe the tasks that were undertaken before and after the sessions, before any analysis was carried out.

Our requirements gathering method is influenced by SCRAM, outlined briefly above. However, published works that describe SCRAM provide guidance only for the earlier stages of the requirements gathering process, giving little detailed guidance for the latter stages [16, 17]. We used SCRAM as the basis of our method because we felt that it would be a relatively simple process to adapt to focus on end-users rather than business customers. In particular, we felt that the prototypical demonstrator would be a very effective way of presenting EUSC to our participants. Scenarios to which participants can relate are also of particular value in developing shared understandings, given that most end-users of EUSC tools are likely to be unfamiliar with EUSC.

We divide the method into three sections:

1. Pre-study: the activities performed before the sessions were carried out.

2. Study: the method carried out within the study sessions themselves.

3. Post-study: the process of transforming the data gathered in the sessions into a set of requirements.

3.1 Pre-study method

Our pre-study method was based on SCRAM since this part of the method sets up the prototype demonstrator to be used in the study sessions.

3.1.1 Preliminary requirements capture

Prior to the creation of the prototypical demonstrator application, an initial set of requirements must first be gathered to which the demonstrator must adhere. Sutcliffe and Ryan present this as the first stage of SCRAM, but provide little instruction as to how requirements should be gathered [16, 17]. The instruction that is provided assumes that the requirements are being gathered in a business context rather than from end-users, necessitating a different approach in our case. We collated requirements from the work described in the Background section, yielding a short list of 19 requirements, which can be found in “Appendix 1”. The topics covered in the preliminary requirements range from general EUSC [6] and EUSC on the Web [9] to mashups [11, 18] and pervasive EUSC [12].

Following the preliminary collation of requirements, we performed reviews of the EUSC literature and available tools to identify the main functions of EUSC tools that had not been identified in this initial set of requirements. The main outcome of these reviews was the identification of functions that the prototype must perform that were missing from the preliminary requirements capture. The functions identified were based on aspects of the EUSC life cycle [6, 7]:

1. Specification and Discovery—The user specifies what they want and discovers components that they could use in composition to meet their specification.

2. Composition—The user coordinates the components discovered in the previous step to create a composite service.

3. Verification and Validation—The user verifies that the composite they have created meets their initial specification.

4. Annotation and Deployment—The user records information about the composite they have created, as well as potentially being able to share the composite with other users.

5. Execution—The user executes the composite service that they have created.

We then sought to group the requirements in a way that could be used to motivate the design of the prototype. Geyer suggests that a design space model is a useful method for presenting requirements for a system [19], but since we had already collected the requirements, we required only the design space structure. One suggested structure for a design space model groups requirements or design decisions into three groups: functional, non-functional, and structural [20].

Functional requirements identified the functions that the prototype needed to perform, which were based on the stages identified in the composition process from the prior literature and a tool review. Non-functional requirements mostly related to the representation of and possible interactions with the composition process, for example semi-automation [3], the use of metaphor [9] or abstraction, e.g. hiding technical details such as code [9, 21]. Other requirements included avoiding complex terminology and technical jargon, as well as providing proactive user feedback [21]. None of the preliminary requirements identified fit into the set of structural requirements. A complete list of the preliminary requirements can be found in “Appendix 1”.

3.1.2 Prototype specification and implementation

The second stage presented in SCRAM is the specification and development of the prototypical demonstrator, treated as a “script” with limited functionality and interactivity [17]. Sutcliffe states that better quality feedback is received from users when presenting an interactive prototype [22]. Furthermore, we felt that for SC it is important for participants to be able to experience the process of composing services and using the output. Hence, we decided that our prototype should, at the very minimum, provide the ability to compose services and execute the composite. The requirements gathered for the prototype then needed to be translated into a design. The functional requirements suggested three main areas in which the prototype must provide functionality:

1. Viewing, discovering and interacting with components.

2. Creating composites by using components in composition.

3. Viewing, discovering and interacting with composites (including iteration).

Within each of the main sections of the tool, various design decisions were made based on requirements derived earlier, as well as being influenced by the designs of currently available EUSC tools.

For example, one requirement recommended the use of templates in composition to simplify the process [10]. In a review of available tools, we saw that examples of templates could restrict composition to “if [Trigger] then [Action]” (e.g. IFTTT or Zapier), which is a particularly restrictive template. Another option (which we chose to follow) was to restrict composition to being linear, meaning that composition would allow “[Component 1] then [Component 2] then …” (e.g. Automator or Tasker), which provides more freedom to the user. Other design decisions made at this stage were recorded and later presented to participants as part of the SCRAM sessions.
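To make the difference between these two template shapes concrete, the sketch below contrasts them in Kotlin (the platform of our prototype). The type and method names are ours for illustration and are not taken from Composer or from any of the tools named above.

```kotlin
// Hypothetical sketch: not the design of Composer or of any tool named above.

// A component consumes and produces a simple key-value payload.
fun interface Component {
    fun execute(input: Map<String, String>): Map<String, String>
}

// "if [Trigger] then [Action]": exactly one trigger feeding exactly one action.
class TriggerActionComposite(private val trigger: Component, private val action: Component) {
    fun run(): Map<String, String> = action.execute(trigger.execute(emptyMap()))
}

// "[Component 1] then [Component 2] then ...": any number of steps executed in order.
class LinearComposite(private val steps: List<Component>) {
    fun run(): Map<String, String> =
        steps.fold(emptyMap<String, String>()) { payload, step -> step.execute(payload) }
}
```

Under this sketch, the trigger-action form is simply a linear composition fixed at two steps, which is why we regarded the linear template as the less restrictive of the two.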

An important initial design decision was the platform on which the prototype should operate; in our case, the scenario prompted a mobile platform, and Android was selected. We named the Android app Composer, and a generic icon was selected from the provided images in the Android SDK.

To ensure that the prototype was presented at the same level as the other tools in the session, it was important to present an application that was as fully featured as possible and that did not necessarily look like a prototype.

Figure 1 shows the composition process in Composer for the composite in the main scenario for the sessions—a composite that notifies the user about delays on the tube. (The relevant scenario is presented later.) Figure 2 shows the list of components that are available to be composed in the prototype.

Fig. 1 The composition process in Composer

Fig. 2 The list of components provided by Composer

3.1.3 Scenario

The main scenario that participants were presented with as part of the introductory materials is shown below:

“Ben has a London Underground tube line status service for his smartphone that allows him to check the status of any tube line. He feels that it is too much hassle to check each of these manually every time he needs to get the tube and wants his phone to notify him when there is a problem using the in-built notification service in the OS. There is no option in the service itself to do this, so he decides to use EUSC to fix his problem. Using the Composer tool, he is able to compose the phone’s notification service onto the tube service, so that when a problem is reported on a particular line (which he can choose), he will be notified via a new item in the notification tray.

After using this service for a few days and being notified at strange times of day, he decides that he wants to receive these notifications around the times he would normally be getting the tube. He chooses to edit the service and adds the device’s clock service in between the tube service and the notification service. He sets the clock to only let the notifications through between 6–8 a.m., and 4–7 p.m.”

This scenario was created in an earlier technical meeting with a number of experts in services and SC.
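For illustration only, the composite that Ben builds in this scenario can be read as a three-step linear composition: the tube status service, the clock service acting as a time-window filter, and the notification service. The following Kotlin sketch assumes hypothetical component APIs; none of the names are taken from Composer.

```kotlin
import java.time.LocalTime

// Hypothetical sketch of Ben's composite; these APIs are illustrative only.

data class LineStatus(val line: String, val disrupted: Boolean, val detail: String)

class TubeStatusService(private val line: String) {
    // A real component would query the London Underground line status feed.
    fun check(): LineStatus = LineStatus(line, disrupted = true, detail = "Severe delays")
}

class ClockService(private val windows: List<ClosedRange<LocalTime>>) {
    // Only lets notifications through inside the configured time windows.
    fun allows(now: LocalTime = LocalTime.now()): Boolean = windows.any { now in it }
}

class NotificationService {
    fun post(message: String) = println("Notification: $message") // stand-in for the OS notification tray
}

// The composite: tube status -> clock filter -> notification.
fun runComposite(tube: TubeStatusService, clock: ClockService, notifier: NotificationService) {
    val status = tube.check()
    if (status.disrupted && clock.allows()) notifier.post("${status.line}: ${status.detail}")
}

fun main() {
    val morningAndEvening = listOf(
        LocalTime.of(6, 0)..LocalTime.of(8, 0),   // 6-8 a.m.
        LocalTime.of(16, 0)..LocalTime.of(19, 0)  // 4-7 p.m.
    )
    runComposite(TubeStatusService("Victoria"), ClockService(morningAndEvening), NotificationService())
}
```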

3.2 Study sessions

We made a number of changes to the session method suggested in SCRAM, based on adaptations required by the different context in which we were carrying out our study, and drawing on comments made by Sutcliffe et al. [16, 17] regarding problems with the method in its original form.

Our sessions were split into three main parts, shown in Fig. 3. First, participants were presented with a set of introductory materials to describe SC, a glossary of terms, and a set of five introductory scenarios (as in SCRAM). In the second stage, participants watched a composition task being performed with the demonstrator application following a scripted sequence, then watched a demonstration of a similar task on other EUSC tools, and were interviewed about the relevant section of the tool while having design decisions explained to them. This process was then repeated for different general functions within the tools. Finally, participants were invited to make any further comments on SC, the concepts involved or any of the EUSC tools that they encountered during the session.

Fig. 3 The sub-stages of our study sessions

The main part of the session was the run-through of the script with the prototypical demonstrator and other examples of EUSC tools, and the subsequent interview questions (Task demonstration and interview in Fig. 3). The other EUSC tools were selected to present a diverse range of design options to participants across three domains: mobile, Web, and desktop. The tools selected were as follows: Tasker (mobile), On{X} (Web/mobile hybrid), IFTTT (Web), Yahoo! Pipes (Web mashup), Automator (desktop) and Quartz Composer (desktop). This part of the session was split further into three sections, which were chosen based on the three activities identified in the prototype specification: interacting with components, composition, and interaction with composites. For each of these activities, we performed the following tasks:

1. Demonstrate the use of the corresponding section of the tool in the Composer prototype. This demonstration followed a scripted sequence, and the demonstrated task was the same across all participants.

2. Perform a similar scripted task in the same section of each of the other EUSC tools selected: Tasker, On{X}, IFTTT, Yahoo! Pipes, Automator, and Quartz Composer. Note that we were unable to perform precisely the same task in each tool due to the diverse nature of the chosen EUSC tools. However, the tasks were the same across all participants.

3. Interview the participant on the section of the tool using the probe questions and prompts derived from our initial categories. Questions were left open-ended but were followed up with probes if participants seemed unsure how to respond.

Each session lasted approximately 90 min and was video recorded.

Our method deviates from SCRAM in this part as we presented alternative fully fledged EUSC tools to participants rather than storyboard sketches of other designs. The aim of this change was to remove the bias towards the prototypical demonstrator over the other designs, which has been identified as a weakness of the original method [16, 17]. We feel that our change is beneficial as it presents the other designs on an equal footing with the prototypical demonstrator—especially given that our prototype is more fully featured than the prototype suggested in the original method. Furthermore, the original work on SCRAM presented design alternatives to participants using questions, options, and criteria (QOC). This was found to be ineffective as users did not understand the representation, and Sutcliffe and Ryan [17] recommended using tables instead. We incorporated this discussion of design decisions into the presentation of the tools, demonstrating the different design decisions made in each tool while it was being used.

To assess the effectiveness of our methodological changes, we carried out three pilot studies. When performing these pilot sessions, we found that they were running for upwards of 2 h, which became fatiguing for the participants and consequently detrimental to the effectiveness of the discussions. Sutcliffe and Ryan [17] indicate that session overloading is a problem that can occur with SCRAM and should be minimized wherever possible. Since we could not remove any of the earlier sections without compromising participants’ knowledge, and hence the requirements we would be able to gather, we instead removed the explicit requirements elicitation from the sessions themselves and performed it post hoc, following data analysis. Our discussions with participants of their answers to the probe questions still provided enough detail from which to generate a comprehensive set of requirements.

Within the sessions, we decided to have only a single requirements engineer. A pilot session with multiple requirements engineers showed that given the lightweight nature of compositions using the EUSC tools we were demonstrating, the second requirements engineer was redundant for most of the session. This also meant that we could reduce the number of participants per session, since removing the second requirements engineer meant that the session could remain balanced with one participant, who could still maintain ownership of the session [17]. Across three pilot sessions, we found that multiple participants led to fewer topics from which requirements could be derived compared with having a single participant discussing topics with the requirements engineer.

3.3 Post-study method

This part of our method is not related to SCRAM: in the original work, Sutcliffe and Ryan’s requirements were elicited within the sessions, whereas ours were not, and they provide little guidance as to how that elicitation is achieved [16, 17]. The output of our study sessions is a transcript of the discussion that the participant had with the requirements engineer, which needs to be analysed to elicit requirements. Our approach was first to codify the transcriptions in order to identify the topics of the discussions and then to elicit requirements from them. The approach we used for this analysis and elicitation is directed content analysis (DCA). DCA uses a more structured approach than other content analysis approaches [23]. This structure is achieved by identifying aspects from prior research in the area as initial coding categories [23]. DCA involves going through the transcript and assigning codes to topics/concepts relating to the initial categories. Any topics that do not relate to the initial categories are put to one side to be analysed later. These latter topics are then coded and grouped together into a set of subsequent categories. DCA is useful in an interview situation where participants are asked a series of questions that follow the topics of the predetermined categories: participants are primed to respond about those topics, and DCA is robust to such priming. Data coded in this manner should not be compared using statistical tests of difference, so ranking and frequency comparisons may be used instead [24]. However, when performing these ranking and frequency analyses, it is important to take into account the bias/priming incurred by the initial categories compared with those of the codes and categories derived from the data.

In contrast, conventional content analysis is used when there is relatively little existing theory or research in a particular area, and codes and categories are derived directly from the data itself [23]. This approach relies on open-ended questions with high-level probes [23], which makes it unsuitable for use with SCRAM.

We identified seven initial categories for DCA from domain research. Composition flow [2] is the type of flow in the composition—control, data, etc. (reported in Sect. 2.2). Composition—connections and compatibility [25] are the connections between the components in the composition process. Metaphor [26] is the metaphor used to abstract the representation of the composition. Templates/examples [2, 10] refer to the use of templates/examples to assist the user. Component type is the “type” of components that are supported, e.g. triggers in IFTTT. Discovery/acquisition of components refers to how users discover and acquire new components [6]. Attributes are the attributes that the component or composite presents to the user [10].

There were obvious omissions from the set of categories above with respect to SC, so we reviewed the design of currently available SC tools to gather others. Components and composites are general references to components and composites, respectively. Inputs/outputs are the display and use of inputs and outputs of components (if data passing is supported). Testing is the ability of the user to test a composition as she is making it. Grouping refers to how users might want to group collections of components or composites together. Aesthetic refers to how visual the SC tool is. The categories identified in the preliminary work for DCA were used as the topics for the probe questions in the sessions.
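A minimal sketch of the directed coding bookkeeping described above is given below. The category names are those listed in this section; the data structures themselves are ours and are not prescribed by DCA.

```kotlin
// Hypothetical sketch of the directed content analysis bookkeeping described above.

data class CodedSegment(val participant: String, val code: String, val topic: String)

val initialCategories = setOf(
    "Composition flow", "Connections and compatibility", "Metaphor", "Templates/examples",
    "Component type", "Discovery/acquisition of components", "Attributes",
    "Components", "Composites", "Inputs/outputs", "Testing", "Grouping", "Aesthetic"
)

// Segments whose topic matches an initial category are coded against it (directed);
// everything else is set aside to be grouped bottom-up into subsequent categories later.
fun directedSplit(segments: List<CodedSegment>): Pair<List<CodedSegment>, List<CodedSegment>> =
    segments.partition { it.topic in initialCategories }
```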

4 Results

This section describes the stage at which the data from the sessions were transcribed and broken down into codes that could subsequently be analysed in order to derive requirements.

We carried out 10 sessions with 1 participant per session: 5 male, 5 female, with a mean age of 27.8 (SD = 10.18). Five participants were students, 4 were employed, and 1 was self-employed; 3 had a background in Computer Science, 2 in Physics, 2 in Beauty, 1 in Engineering, 1 in Psychology, and 1 in Geography. All were owners of smartphones: 5 iPhone, 4 Android, and 1 Blackberry. Transcriptions of the sessions were coded using directed content analysis based on the initial categories listed in Sect. 3.3. Each category had a number of codes associated with it, and each code had a total number of occurrences across all participants (O) and a number of participants, out of 10, who used the code (P). For example, within the category “Attributes – components”, three of the identified codes were as follows: [Description, 24O, 5P], [Number of uses, 8O, 5P], and [Cost (free), 8O, 5P].

According to the directed-coding process, codes that did not fit into initial categories were categorized in a bottom-up manner, yielding a set of 6 subsequent categories. Tool feature referred to general features of the tool. Social indicated any connections with social media/friends. Assistance reflected any extra assistance that the tool provides to the user. Specific tool/function indicated that there was a reference to a specific tool or feature that a tool has. Accessibility referred to accessibility features provided by the tool. Comparison with non-SC tool indicated that the participant compared an aspect of composition with something outside the domain of SC.

Given a set of codes and occurrences derived using directed content analysis, we are able to use quantitative comparisons to apply a rank ordering to the codes that were identified [23]. When considering the two types of category separately, this gives us a rough idea of how popular the different codes within the categories were. However, this is a very simplistic approach and as such cannot be used to make strong claims about which features are most popular within particular areas.

For instance, Table 2 shows the five most popular codes in the initial categories, and Table 3 shows the five most popular codes in the subsequent categories. In this case, popularity was judged based first on the number of different participants with whom that code was identified, followed by the total number of occurrences across all participants. Occurrences of codes were only identified if they were explicit references to the code made by the participant; just because a participant was asked about a topic did not mean that a code was recorded at that point.

Table 2 Most popular initial codes
Table 3 Most popular subsequent codes
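The popularity ranking used for these tables (participants first, then total occurrences) can be sketched as follows, assuming a simple record per coded transcript segment (a hypothetical structure, not the actual analysis scripts used in this work).

```kotlin
// Hypothetical sketch of the code-ranking behind Tables 2 and 3.

data class CodedSegment(val participant: String, val code: String)

data class RankedCode(val code: String, val participants: Int, val occurrences: Int)

// P = number of distinct participants using the code; O = total occurrences across all participants.
// Codes are ranked by P first, then by O, as described in the text.
fun rank(segments: List<CodedSegment>): List<RankedCode> =
    segments.groupBy { it.code }
        .map { (code, uses) ->
            RankedCode(code, uses.distinctBy { it.participant }.size, uses.size)
        }
        .sortedWith(
            compareByDescending<RankedCode> { it.participants }.thenByDescending { it.occurrences }
        )
```

For example, under this ordering [Description, 24O, 5P] ranks above [Number of uses, 8O, 5P]: the two codes tie on P, so the higher O breaks the tie.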

While this method of ranking codes by their frequency of use shows which codes were popular across participants (with or without priming/bias), it does not reflect all of the interesting findings of the study, given that the most interesting topics were usually identified by a single participant. These “interesting” codes were identified across both sets of categories—both initial and subsequent. A selection of the interesting codes is listed below:

  • OS integration—integration of SC with the home screen of the phone instead of an app.

  • Composing pervasive services—services in the environment can be discovered.

  • Multiple icons to show components/function—the composite should be represented by a collection of icons showing the components within it.

  • Making a description for a composition means you can make a composition from a description—if the process of composition can create a single plain text description of the output, then the process of composition could be powered by a single text description and the process reversed.

  • Automatic composition identification—the tool should “watch” the user’s actions and automatically identify compositions.

  • Infinite composition—the output of composition could also be used as the input.

5 Analysis and requirements derivation

This section describes the process by which the codes that were gathered from the requirements gathering sessions were analysed in order to derive a set of requirements.

To outline how requirements were derived from the codes generated in content analysis, we will select a small set of codes to illustrate how the corresponding requirements were identified. The codes that we will present fall into the following categories:

1. Those found in prior work that our work has validated, e.g. control flow, data flow.

2. Concepts that are present in current SC tools but have not been identified as requirements in prior work, e.g. metaphor, composition templates.

3. Concepts not identified in prior work on SC but visible in related domains, e.g. user ratings (considered within the domain of mobile apps).

4. Concepts found neither in prior SC work nor in other related domains, e.g. automatic composition identification, infinite composition, and pervasive composition.

Control flow and Data flow were codes assigned to participants’ responses when they were asked to describe what a given composition representation showed, and were identified based on how participants phrased their responses (in a similar way to [2]): “The tube is looking up stuff and then it’s notifying you and then it’s the end”—P1. “The tube component passes something to the notification”—P6; “You also need to understand how the data moves in this one”—P2. Specifically, we were interested in the participants’ focus on either the ordering of the composition (represented by control flow) or the data being passed between the components (represented by data flow). This analysis yielded two requirements, both of which have been identified in prior work by Mehandjiev and De Angeli [2]:

R1. The flow of control between components should be represented in composition.

The tool should present the order in which the components execute.

Rationale: users need to be able to identify the order in which components are executed in the composition.
Category: Non-functional—Operability—Appropriateness recognizability
Criticality: high
Risk: low
Associated requirements: R44, R44.2, and R44.3
Source: [Control flow]
Prior identification: [2]
COTS: Atooma, Tasker, AutomateIt, IFTTT, Zapier, Automator

R2. The flow of data between components should be represented in composition, if it is present.

The tool should show how the data are passed between the components in the composition.

Rationale: users should be able to identify what data are being passed between components in the composition.
Category: Non-functional—Operability—Appropriateness recognizability
Criticality: medium
Risk: low
Associated requirements: R44, R44.1, and R44.3
Source: [Data flow]
Prior identification: [2]
COTS: Yahoo! Pipes and Quartz Composer

There are other concepts that are present in the design space of EUSC tools but have not yet been stated as requirements for EUSC. Two examples we identify are Testing and Sharing. Testing of composites is the mechanism provided by most EUSC tools for the user to verify that the composite they have created will execute correctly and performs the task that they intended. Sharing allows users to share what they have created with other users, as well as discovering composites that other users have created. Neither testing nor sharing was identified in the preliminary requirements capture.

R3. The tool should allow users to share services.

The tool should allow users to share services that are created using the tool with other users of the tool.

Rationale: once a user has created and used a composite, they might want to share it with others.
Category: Functional—Management—Monitoring and Adaptation
Criticality: high
Risk: low
Trigger: the user indicates they want to share the composite.
Preconditions: there is a composite to share.
Post-conditions: the composite is shared to a shared composite repository.
Failure effects: the composite is not shared.
Associated requirements: R18 and R23
Source: [Sharing/publishing of composites]
COTS: Atooma, IFTTT, Yahoo! Pipes, AutomateIt, and On{X}

R4. The tool should allow users to test composites.

The tool should allow users to test their composites-in-progress while they are creating them.

Rationale: users need to ensure that composites they create function as intended.
Category: Functional—Verification
Criticality: critical
Risk: low
Trigger: the user indicates that they want to test the composite.
Preconditions: the composite contains some components.
Post-conditions: the composite executes in its current state.
Failure effects: the composite does not execute as intended.
Associated requirements: R15–R17
Source: [Testing]
COTS: Yahoo! Pipes, Quartz Composer, Automator, Tasker, and On{X}

The third type of requirement identified from our study relates to concepts not found in current EUSC tools but present in other, linked domains. User ratings were suggested as being useful for users to determine the quality of either components or composites that they can discover: “But obviously having ratings for them all would be quite cool too”—P1. “But then ratings would be more important for composites”—P5.

R5. The tool should allow users to rate services.

Users should be able to rate services to convey their opinion on the quality of the service to other users of the tool.

Rationale: users should be able to provide feedback to the creators of components and composites.
Category: Functional—Management—Monitoring and Adaptation
Criticality: medium
Risk: low
Trigger: the user indicates they want to rate the service.
Preconditions: there is a service to be rated.
Post-conditions: the service’s current rating is aggregated with the new rating.
Failure effects: the rating is not applied.
Associated requirements: R25
Source: [User ratings]
COTS: AutomateIt, IFTTT, Yahoo! Pipes, and On{X}

The final type of requirement that we will consider is new concepts that the participant would be unlikely to have encountered in this context. The first of these is Automatic composition identification: “It would be quite cool for it to be able to identify things for you that you might not think about automating. Like for examples if it watched things you do and suggested compositions for you”—P5. The second is Infinite composition: “You might want to use them again”—P3; “And then build them up as well if you can”—P8. This yielded two further potential requirements for such a system:

R6. Potential compositions could be identified automatically.

The tool should be able to monitor the activities of the user and identify tasks that they perform regularly that could be adapted to form a composite.

Rationale: if the tool were able to automatically identify potential compositions, it would reduce the user’s burden in deciding what compositions to create.
Category: Functional—Specification
Criticality: low
Risk: high
Trigger: the user performing a manual task repeatedly
Preconditions: the tool is installed on the user’s device
Post-conditions: a composite is created that performs the repetitive task
Failure effects: the composite is not created
Associated requirements: none
Source: [Automatic composition identification]

R7. Composition should be “infinite”.

The tool should allow users to take composites that have been created using the tool and use them as components to be composed in a new composite.

Rationale: composition is a considerably more powerful concept if the user can reuse composites they have created in new compositions.
Category: Structural—Composition structure
Criticality: medium
Risk: low
Associated requirements: none
Source: [Infinite composition]

The categorization we use in this section also highlights the inverse of the first kind of requirement we identified: requirements identified in prior work that were not validated by our work. The only example of this type of requirement relates to the security of services. This represents a deficiency in our requirements and is discussed in the Requirements Evaluation section.

Following the analysis, there were 7 codes from which we were unable to generate requirements. These codes were left unused for one of two reasons: they were a generic reference to an aspect of composition, such as the user liking or disliking something; or they were too specific, e.g. the participant commented on the function of a particular component.

6 Requirements

In this section, we describe a subset of the requirements that we gathered, based on the codes generated from the requirements gathering sessions. Following this, we present an evaluation of the set of requirements as a whole, based on completeness, consistency, and correctness.

We gathered a total of 139 requirements, which, like the preliminary requirements, needed to be organized into groups. First, we separated the requirements into functional and non-functional requirements [13]. Further categorization was performed based on the model-based evaluation process discussed in the Requirements Evaluation section.

A number of the requirements gathered could be used in the design of an EUSC tool in their current state. There are also a number that require further investigation before they could be used in the specification of an EUSC tool. We will discuss three of the codes on which these requirements were based, all of which relate to some aspect of automation in the tool: Automatic composition identification, Automatic description generation, and Automatic composition generation from description.

  • Automatic composition identification—Participants suggested this code as a feature whereby the tool would “watch” the tasks that the user performs and create a composite service to perform a task that the tool “saw”. While this would be an interesting requirement for an EUSC tool, high levels of automation in composition are generally not recommended [27].

  • Automatic description generation means that the tool is able to automatically generate a description for a composite based on the components that are in the composition. A number of current tools allow the user to enter their own description for composites that they create, but none are able to automate this process.

  • Automatic composition generation from description is effectively the inverse of Automatic description generation in that the tool should be able to automatically create a composite based on a description of the required composite. Hence, it would first be necessary to enable automatic description generation before tools would be able to utilize this method of creating a composite.

6.1 Requirements evaluation

To demonstrate their applicability and usefulness, our requirements must first be evaluated. There are three main properties of requirements against which we can evaluate them: completeness, correctness, and consistency [28].

6.1.1 Completeness

Completeness is an important property of the requirements for a system given that incompleteness of requirements has been identified as one of the most common causes of system failures and accidents [28]. It is also one of the more difficult problems to detect in a requirements specification [28].

Ensuring requirements completeness can be done in a number of ways including: model-based evaluation, individual evaluation, requirements metadata, comparison with repositories, and creation of a specification document [29]. Given the resources available to us, we chose to individually evaluate each requirement, as well as evaluating both our functional and non-functional requirements against a model. Model-based evaluation also provided a useful categorization for our requirements.

There are a number of types of model against which functional requirements can be evaluated, including formal models, event models, process models, data models, etc. [29]. In the Background section of this paper, we identified a process model based on a combination of two suggested EUSC life cycles [6, 7] that could be used to evaluate our functional requirements for completeness. We categorized our functional requirements against the stages of this model, identifying the number of requirements present at each stage. Figure 4 shows the breakdown of requirements within each of these stages.

Fig. 4 Distribution of requirements across the stages of the life cycle model
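The categorization step behind Fig. 4 amounts to tallying functional requirements per life-cycle stage; a stage left empty would indicate a coverage gap. The sketch below assumes each functional requirement carries the stage it was categorized under (the stage names are from Sect. 2.1; the data structure is hypothetical).

```kotlin
// Hypothetical sketch of tallying functional requirements per life-cycle stage (cf. Fig. 4).

enum class Stage {
    REQUEST_AND_DISCOVERY, COMPOSITION, VERIFICATION_AND_VALIDATION,
    ANNOTATION_AND_DEPLOYMENT, EXECUTION_AND_MANAGEMENT
}

// A stage with zero requirements would signal a potential gap in coverage.
fun coverage(categorized: List<Pair<String, Stage>>): Map<Stage, Int> =
    Stage.values().toList().associateWith { stage -> categorized.count { (_, s) -> s == stage } }
```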

All stages of the model contain requirements, although the distribution is far from uniform. We suggest that this reflects the user’s perception of and involvement in each of the stages. That is, the specification/request stage contained few requirements because of the difficulty of separating specification/request from discovery—it is hard to imagine how the user would specify to the EUSC tool what she wanted without discovering components to do so. Annotation and deployment was another stage with few requirements, and we believe that this is because (1) it is a stage that would involve little input from the user, and (2) it was not the subject of any of the probe questions in our sessions; the two requirements in this stage were both elicited from unprompted comments made by participants.

The model against which we evaluated our non-functional requirements is the quality model presented in ISO/IEC 25010 [4]. This model contains 12 categories, each with a number of sub-categories: functional suitability, reliability, performance efficiency, operability, security, compatibility, maintainability, transferability, usability, flexibility, and safety. When we applied this model to our non-functional requirements, we found that the vast majority (76) were classified within the category of operability, 6 within maintainability, and none in any of the other categories. A further 12 did not fit within any of these categories. The breakdown of requirements in the sub-categories of operability is shown in Fig. 5. Within the maintainability category, 5 requirements were associated with changeability and one with modularity.

Fig. 5 Distribution of requirements within the operability category

We believe that our participants identified a large number of requirements within operability because this is a category that is associated with the use of the tool, whereas a number of the quality aspects, such as transferability or safety, are ones that the user may not associate with their use of a piece of software. Furthermore, none of the probe questions in our sessions referred to quality aspects, which explains the very uneven distribution of requirements across these categories. The requirements within appropriateness recognizability related mainly to the attributes of services that the user could use to recognize whether a service was appropriate to her needs.

The remaining non-functional requirements that covered topics not relating to quality were grouped into a further three categories: representation/UI (4), interaction (1), and architectural requirements (7).

Following the assessment of our requirements against the models, we evaluated each requirement individually against a number of properties as well as metadata. For each requirement, we identified the following: rationale, source, category, criticality, risk, prior identification of that requirement, associated requirements, software that implements that requirement, as well as any clashes or conflicts with other requirements. Further, for each of the functional requirements, we identified the following: trigger event, preconditions, post-conditions, and failure effects. A full list of the requirements with these properties included can be found in “Appendix 2”.
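As an indication of how this per-requirement metadata could be recorded, the sketch below mirrors the properties listed above in a single record type. The field names follow the text; the structure itself is hypothetical and is not the format used in “Appendix 2”.

```kotlin
// Hypothetical record mirroring the per-requirement metadata described above.

enum class Criticality { LOW, MEDIUM, HIGH, CRITICAL }
enum class Risk { LOW, MEDIUM, HIGH }

data class Requirement(
    val id: String,                        // e.g. "R44.2"
    val statement: String,
    val rationale: String,
    val source: String,                    // the code from which the requirement was derived
    val category: String,                  // life-cycle stage or ISO/IEC 25010 category
    val criticality: Criticality,
    val risk: Risk,
    val priorIdentification: List<String> = emptyList(),  // citations to prior work
    val associated: List<String> = emptyList(),           // ids of related requirements
    val implementedIn: List<String> = emptyList(),         // COTS tools implementing it
    val conflictsWith: List<String> = emptyList(),
    // Additional fields recorded only for functional requirements:
    val trigger: String? = null,
    val preconditions: String? = null,
    val postconditions: String? = null,
    val failureEffects: String? = null
)
```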

6.1.2 Consistency

Consistency of requirements relates to conflicts between requirements. In particular, they should not contradict one another [28]. It can also refer to consistent use of terminology.

One conflict we identified was in the requirements relating to the grouping of services within the tool. The potential conflict was between one requirement stating that services should not be grouped (R5.8) and several others saying that they should, and specifying how they should be grouped (R5, R5.1–R5.7). To address this potential conflict, we took a different view on the requirement that stated that grouping was not required. A number of the requirements relating to grouping of services or components were focused on metrics that could be used to group the services. Thus, the tool could provide a view in which components are not organized into groups but are instead presented as a single list, resolving the potential conflict. This solution also supports one of our other requirements, which states that grouping should be customizable (R60).

The amount of freedom or restriction within the composition process was another area in which we identified conflict. Specifically, one requirement said that composition should be free and unrestricted (R35), and another said that the process should be restricted (R59). The solution to this conflict was identified in another requirement relating to providing templates for composition (R34): multiple templates could be provided, some of which restrict the possibilities of composition and others that allow it to be completely “free”. This also addresses another potential conflict, between composition being simple (R50) and at the same time comprehensive, complex and hence more powerful (R11), which can likewise be resolved by providing different templates.

6.1.3 Correctness

Correctness of requirements has been defined as the interplay between completeness and consistency [28], although in practice it relates to how well the requirements map to the actual needs of the users of the final system [28]. Since we do not have a single business customer, and the preferences and backgrounds of our potential users are likely to be wide and varied, it is difficult to say definitively that all requirements are “correct” for all potential users.

Given that our requirements were elicited after the sessions, we were unable to evaluate their correctness with the participants themselves. We also felt that, given the varying opinions of these participants, we would be unlikely to get a definitive answer. Instead, we first checked where requirements had been identified previously: either in other sets of requirements for EUSC tools, or as features implemented in EUSC tools themselves. Roughly half of our requirements were corroborated as correct in this way.

We used peer review to evaluate the correctness of the remaining requirements [30]. Thus, we invited researchers knowledgeable in the domain of SC and EUSC to review the requirements individually and as a set, and they found that none of the requirements was incorrect. We have also shown that our requirements are functionally complete and consistent, providing further corroboration of the overall correctness of the requirements.

6.1.4 Requirements validity

We have been able to demonstrate that our requirements are consistent, functionally complete, and correct. The overall completeness of the non-functional requirements was reduced by our use of end-users as their source, as end-users did not consider aspects that a developer or business customer might identify. This is, of course, unavoidable when end-users provide the requirements.

Our requirements have not yet been validated directly by, for example, being used to influence the design of an EUSC tool. However, we can argue for the validity of our requirements set based on comparisons with prior work and on the method used to gather them. The two examples of prior work aimed at gathering requirements for EUSC tools yielded two distinct sets of requirements: one high level, seeking requirements for SC in general [3], and one very specific, focusing on a particular aspect of composition, i.e. flow [2]. Our requirements contain both high-level and specific requirements, including an overlap with the flow-based requirements from prior work. Requirements were more abundant in some areas than in others because of the use of probe questions in our sessions, and the limitations of this are discussed in the next section. Nonetheless, the abundance of general, high-level requirements shows that the probe questions did not inhibit their generation. Furthermore, the topics of the probe questions were derived from prior work across a wide range of EUSC literature and tools.

7 Limitations and future work

The main limitation of our set of requirements is the lack of coverage of non-functional requirements. A number of categories are not covered by our requirements, and the requirements are not distributed uniformly across those categories that are covered. This limitation is caused by a combination of factors. First, our decision to have requirements motivated by potential end-users meant that they were less likely to identify the various types of non-functional, quality-based requirements. Second, our probe questions and prompts focused on aspects that the user would interact with directly, rather than aspects such as maintainability, security, and safety.

To address this limitation in future work, we suggest that a section of each session be explicitly dedicated to the discussion of quality-based non-functional requirements of the application being developed. This would help ensure that participants are prompted about the topics for which we were not able to gather any requirements and should produce a more even distribution of requirements across these categories.

Another limitation is the disparity in the relative maturity of the requirements. Some are usable immediately and are applicable across all types of EUSC tools. Others present various options, from which a designer could choose one or all. There are also a number of requirements that require further investigation before they could be used to motivate the design of an EUSC tool. Thus, further validation of our requirements is an important next step. Validating these requirements could take a number of different avenues: EUSC tool creation, investigation of particular concepts (e.g. automatic description generation for composites), or further investigation of the design space for EUSC tools. Given the apparent popularity of design space research in EUSC and mashup tools [25, 31–33], this seems to be a promising avenue for requirements validation.

Determining the relative priority of the requirements we gathered is a potential next step. This is particularly important given the size of the set of requirements derived, as well as the disparity in how the requirements could be used. We do not believe that the relative popularity of the codes used to derive the requirements is a sufficiently robust basis for establishing priority. However, determining relative priorities could be part of our future work on the design spaces of EUSC tools.

After relative priorities have been determined, our future work could move on to using these requirements to generate potential designs for EUSC tools, and to provide the basis upon which an EUSC tool can be built.

When detailing the analysis and requirements gathering process earlier in this paper, we identified several groups into which requirements could fit, based on our prior requirements (“Appendix 1”) and the design of current EUSC tools:

  1. Those found in prior EUSC requirements gathering studies that our work has validated.

  2. Those found in the design space for EUSC tools but not in prior EUSC requirements gathering studies.

  3. Those not found within the design space for EUSC tools but found in related domains.

  4. Those not within the design space for current EUSC tools or related domains.

There are two groups that we have not yet considered:

  • Those present in the design space for EUSC tools that were not identified in our study.

  • Those present in prior EUSC requirements gathering studies that were not identified in ours.

The former category is unsurprising, as the design space for EUSC tools could contain a theoretically infinite number of design decisions and options, and hence a theoretically infinite number of potential requirements [34]. We can also identify one example of the latter: security. Security of component services was identified by Namoun et al. [3], but it was not identified as a category in its own right, and hence was not the subject of a probe question in our study. Furthermore, none of the composition tasks within our study sessions required the user to consider their personal information. It is likely that if one of the tasks had required their personal information (or some other sensitive information), then participants would have been more aware of security as a requirement.

The omission of security as a requirement highlights a limitation in the method used to gather requirements: the priming of participants on selected topics. Participants were much more likely to identify requirements for topics that had already been identified by the requirements engineer and subsequently used as probe questions. This problem could be mitigated by generating new probe questions from the subsequently identified categories and using these in later sessions. However, this approach would also run the risk of overloading participants in the later sessions.

Within the SCRAM sessions, we were not able to perform precisely the same composition task with each of the EUSC tools, due to their different domains and contexts of use, although the same set of tasks was used in every session. This meant that, as well as the design of the tool varying, the task being performed also varied somewhat, although the nature of the task, SC, remained constant across all tools within a session. We felt it more important to introduce participants to the diversity of SC, and hence mitigate the bias towards our prototypical demonstrator, than to perform precisely the same task with each tool.

Our method was designed to prime participants on certain topics within EUSC to ensure that they were familiar with them and could provide requirements in those areas. This priming was reflected in the analysis by grouping the categories into two: initial (primed topics) and subsequent (un-primed). On average, 85.1 % of identified codes represented initial codes (range 78.9–88.3 %, σ = 2.8 %). Given the priming deliberately built into the method, it is very likely that using different participants would not greatly affect this, and the majority of identified codes would still fall within the initial categories.

Our method was very demanding, both in terms of the work required in the sessions and in the analysis of the data to gather the requirements. After 10 participants, we had enough data from which to derive a large, robust set of requirements. Restricting the sessions to 10 participants minimized the overlap of content between participants, and also the work required to analyse the data and produce requirements. In our study, participant selection required only that participants were familiar with the platforms on which the EUSC tools run, as the sessions provided an introduction to EUSC itself. This approach could work similarly in other domains, as outlined in the following section.

7.1 Method generalizability

The primary focus of our work is to gather requirements for EUSC tools, and to do this we extended and made a number of changes to an existing requirements gathering method: SCRAM. Our proposed method can easily be adapted for use in other domains by making a number of small changes. An overview of the stages within the method is given in Fig. 6.

Fig. 6 Method overview

To adapt the method to a new domain, the preliminary work would need to be carried out in that domain in order to gather initial requirements and to design and implement a prototypical demonstrator. Our initial requirements capture was based on prior requirements gathering studies and a review of applications that have already been created within the domain. This part of the method could therefore be carried out similarly in any domain for which prior requirements gathering has been done or in which existing tools can be profiled. For domains in which such resources are not available to the requirements engineers, we suggest falling back on the recommendations made by Sutcliffe and Ryan in SCRAM: performing conventional interviews and fact-finding [16]. If end-users are the interviewees in the requirements gathering sessions, then this fact-finding could take the form of questionnaires, which can be distributed widely, for example on the Web, rather than one-on-one interviews.

Given an initial set of requirements derived through the process above, the next stage, i.e. design and implementation of the prototypical demonstrator, could be carried out using standard techniques and tools.

Once the demonstrator has been implemented, the next step is to research topics in the domain that can be used as probe questions and hence as initial categories for the content analysis of the sessions’ outputs. Concurrently, other tools in the domain should be identified. From this research and tool identification, a scripted task can be created to present the various features of the demonstrator and of the other available tools, along with the probe questions. The sessions themselves would then run in the same way as reported here, while being recorded.

After the sessions, transcription and content analysis could proceed in exactly the same way as described here by performing directed content analysis using the initial categories identified in the background research. Following the transcription and analysis, requirements elicitation could then proceed based on the codes identified.
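As a rough illustration of this analysis step, and not a description of our actual tooling, the sketch below tallies coded transcript segments into initial (primed) and subsequent (emergent) categories and computes the proportion of initial codes per participant, the statistic reported earlier. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def tally_codes(coded_segments: Iterable[Tuple[str, str]],
                initial_categories: List[str]) -> Tuple[Counter, Counter]:
    """Split coded transcript segments into initial (primed) and subsequent
    (emergent) categories, as in directed content analysis.

    coded_segments: (category, code) pairs produced during coding.
    """
    initial, subsequent = Counter(), Counter()
    for category, code in coded_segments:
        bucket = initial if category in initial_categories else subsequent
        bucket[(category, code)] += 1
    return initial, subsequent

def initial_proportion(initial: Counter, subsequent: Counter) -> float:
    """Proportion of a participant's codes that fall within primed categories."""
    total = sum(initial.values()) + sum(subsequent.values())
    return sum(initial.values()) / total if total else 0.0
```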

8 Conclusion

The work reported here has two main contributions:

  1. The creation of a comprehensive set of 139 requirements for EUSC tools, which we have demonstrated to be valid.

  2. A thorough, generalizable, and repeatable method used to generate these requirements.

The set of requirements gathered here is larger and more comprehensive than any gathered in prior work, covering a broad range of topics within EUSC. We evaluated our requirements for completeness, consistency, and correctness. Previous approaches covered only one topic within EUSC in depth: flow. Our requirements cover a number of topics, necessitating further categorization of the functional requirements based on aspects of the EUSC life cycle. The non-functional requirements covered quality aspects such as operability and maintainability but lacked discussion of aspects such as security and performance. A full, categorized list of the requirements gathered can be found in the Appendices.

One contribution of these requirements is to validate some of those gathered in prior EUSC requirements gathering approaches. The validity of our requirements is evidenced by the breadth of topics they cover and by the number of requirements we were able to gather within these topics, which is particularly notable given the small number of requirements that had been gathered previously. Further evidence of their validity is the robust method used to gather them, which is based on an already-established requirements gathering method.

Reflecting further on the methodology, we have suggested how it could be applied in other domains with relatively little modification. We believe the adaptations and extensions made to SCRAM were positive, in that we were able to adapt the basic method to cater for gathering requirements from end-users, as well as incorporating a mechanism for eliciting requirements from the output of each session without imposing a burden on participants. We made modifications to every stage of SCRAM as expressed in its initial specification [16, 17] as well as providing insight into how requirements can be elicited from the output of the interview sessions, something which is not part of the original method.

A number of the requirements we present are directly applicable to EUSC immediately: in their current state, they can be applied to the design process for future tools, as evidenced by several of them already being implemented in currently available EUSC tools. No single current EUSC tool implements more than half of the requirements generated (the majority implement considerably fewer), suggesting that tools implementing these requirements could significantly improve on current designs. The requirements themselves are demonstrably consistent, functionally complete, and correct, and can be used by designers to inform and inspire the design of future EUSC tools and to refine those that already exist.