
2024 | Book

Product-Focused Software Process Improvement

24th International Conference, PROFES 2023, Dornbirn, Austria, December 10–13, 2023, Proceedings, Part I

Editors: Regine Kadgien, Andreas Jedlitschka, Andrea Janes, Valentina Lenarduzzi, Xiaozhou Li

Publisher: Springer Nature Switzerland

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 24th International Conference on Product-Focused Software Process Improvement, PROFES 2023, which took place in Dornbirn, Austria, in December 2023.
The 21 full technical papers, 8 short papers, and 1 poster paper presented in this volume were carefully reviewed and selected from 82 submissions. The book also contains one tutorial paper, 12 workshop papers, and 3 doctoral symposium papers.
The contributions were organized in topical sections as follows:
Part I: Software development and project management; machine learning and data science; software analysis and tools; software testing and quality assurance; security, vulnerabilities, and human factors;
Part II: Posters; Tutorials; 2nd Workshop on Computational Intelligence and Software Engineering (CISE 2023); 2nd Workshop on Engineering Processes and Practices for Quantum Software (PPQS’ 23); doctoral symposium.

Table of Contents

Frontmatter

Software Development and Project Management

Frontmatter
Virtual Reality Collaboration Platform for Agile Software Development

Nowadays, most software teams use Scrum as their software development process framework. Scrum highly values collaboration and communication inside the team and aims to make them more flexible to spontaneous changes. However, due to the rise of working from home, many developers experienced a significant decrease in communication, social interactions, and a general feeling of social connectedness with their colleagues, which might impede the effectiveness of Scrum teams. To overcome these issues, we present a VR collaboration platform for Scrum meetings, called Virtual Reality-based Agile Collaboration Platform (VRACP). VRACP provides the visualization of and interaction with Scrum artifacts inside a realistic virtual office, the integration and synchronization of external data sources, remote collaboration, and human-like user representations. To evaluate whether VR can increase social connectedness in agile software development teams, we conducted a user study in which our solution was compared to common web/desktop applications for Scrum meetings. The results suggest that although efficiency and effectiveness were reduced, VR could indeed increase the feeling of being together, collaborating more naturally, and having more fun and motivation.

Enes Yigitbas, Iwo Witalinski, Sebastian Gottschalk, Gregor Engels
Effects of Ways of Working on Changes to Understanding of Benefits – Comparing Projects and Continuous Product Development

The practices of benefits management are designed to help development initiatives to identify and realize the benefits of a system under development. Although several benefits management frameworks and guidelines exist, practitioners experience challenges in applying the practices. In particular, practitioners experience challenges in that their understanding of what benefits the system should enable and how the benefits should be realized, changes during the course of a development effort. Since such benefits understanding is affected by experiences with the system in use, we conducted a survey to investigate if such changed understanding is affected by whether development is organized in projects (whose organization terminates after main deployment) or as continuous product development (whose organization persists throughout the lifecycle of the system). We find that (1) there is no difference in the occurrence of changes in understanding between the two, but that (2) practitioners in projects think that changed understanding could have been obtained earlier. There is (3) no difference in how one takes advantage of changes in benefits understanding, but (4) practitioners in continuous product development think that the use of changes in benefits understanding is more appropriate than do practitioners in projects. We also look at process models, where we do not find that agile facilitates early changes to understanding. We conclude that continuous product development seems to cater for changed benefits understanding better, but since the way one organizes work will vary depending on a host of factors, specific practices for handling changes to benefits understanding appropriately should be developed that span different ways of organizing work.

Sinan Sigurd Tanilkan, Jo Erskine Hannay
To Memorize or to Document: A Survey of Developers’ Views on Knowledge Availability

When developing, maintaining, or evolving a system, developers need different types of knowledge (e.g., domain, processes, architecture). They may have memorized (but potentially not documented) the knowledge they perceive important, while they need to recover knowledge that they could not memorize. Previous research has focused on knowledge recovery, but not on what knowledge developers consider important to memorize or document. We address this gap by reporting a survey among 37 participants in which we investigated developers’ perspectives on different types of knowledge. Our results indicate that the developers consider certain types of knowledge more important than others, particularly with respect to memorizing them—while all of them should be documented, using specific means. Such insights help researchers and practitioners understand developers’ knowledge and documentation needs within processes, thereby guiding practices and new techniques.

Jacob Krüger, Regina Hebig
Facilitating Security Champions in Software Projects - An Experience Report from Visma

The contribution of security practices to overall software security is increasingly recognized in fast-paced software development paradigms. Security champions have emerged as a promising role in addressing the shortage of explicit security activities within software teams. Despite the growing awareness of general security practices, there remains limited knowledge regarding security champions, including their establishment, effectiveness, challenges, and best practices. This paper aims to bridge this gap by presenting insights from a survey of 73 security champions and 11 interviews conducted within a large Norwegian software house. Through this study, we explore the diverse activities undertaken by security champions, highlighting notable differences in motivations and task descriptions between voluntary and assigned champions. We also report challenges in onboarding, communicating with, and training security champions, and discuss how they can be better supported in the organization. Our insights can be relevant for similar software houses in establishing, implementing, and improving their strategic security programs.

Anh Nguyen-Duc, Daniela Soares Cruzes, Hege Aalvik, Monica Iovan
Benefits and Challenges of an Internal Corporate Accelerator in a Software Company: An Action-Research Study

To maintain a competitive advantage and adapt to the rapid changes in both the market and technology, organizations need to continuously innovate. As a result, internal corporate accelerators have been implemented as a means to internalize external innovation and promote corporate innovation. However, research on internal corporate accelerators is still limited, and there is a need for a more detailed analysis of both the positive and negative consequences of their implementation. In this paper, we employ the action research methodology to define and implement an internal corporate accelerator within a Brazilian software development company. We describe the accelerator phases, selection criteria, and provided services. We also present the advantages and challenges of implementing this accelerator. The benefits encompass the stimulation of creativity and the evolution of knowledge, while the challenges are tied to participants’ time constraints and the limitation of virtual communication.

Vanessa Lopes Abreu, Anderson Jorge Serra Costa, André Coelho Pinheiro, Cleidson R. B. de Souza
A Process for Scenario Prioritization and Selection in Simulation-Based Safety Testing of Automated Driving Systems

Simulation-based safety testing of Automated Driving Systems (ADS) is a cost-effective and safe alternative to field tests. However, it is practically impossible to test every scenario using a simulator. We propose a process for prioritizing and selecting scenarios from an existing list of scenarios. The aim is to refine the scope of tested scenarios and focus on the most representative and critical ones for evaluating ADS safety. As a proof-of-concept, we apply our process to two pre-existing scenario catalogs provided by the Land Transport Authority of Singapore and the Department of Transportation. After applying our process, we prioritized and selected six scenario groups containing 51 scenarios for testing ADS in the CARLA simulator.

Fauzia Khan, Hina Anwar, Dietmar Pfahl
The Journey to Serverless Migration: An Empirical Analysis of Intentions, Strategies, and Challenges

Serverless is an emerging cloud computing paradigm that allows developers to focus solely on the application logic rather than provisioning and managing the underlying infrastructure. The inherent characteristics of serverless computing, such as scalability, flexibility, and cost efficiency, have attracted many companies to migrate their legacy applications toward this paradigm. However, the stateless nature of serverless requires careful migration planning, consideration of its subsequent implications, and attention to potential challenges. To this end, this study investigates the intentions, strategies, and technical and organizational challenges involved in migrating to a serverless architecture. We investigated the migration processes of 11 systems across diverse domains by conducting 15 in-depth interviews with professionals from 11 organizations, and we present a detailed discussion of each migration case. Our findings reveal that large enterprises primarily migrate to enhance scalability and operational efficiency, while smaller organizations intend to reduce cost. Furthermore, organizations use a domain-driven design approach to identify the use case and gradually migrate to serverless using a strangler pattern. However, migration encounters technical challenges, such as testing event-driven architectures, integrating with legacy systems, and a lack of standardization, as well as organizational challenges, most prominently mindset change and hiring skilled serverless developers. The findings of this study provide a comprehensive understanding that can guide future implementations and advancements in the context of serverless migration.

Muhammad Hamza, Muhammad Azeem Akbar, Kari Smolander
On the Role of Font Formats in Building Efficient Web Applications

The success of a web application is closely linked to its performance, which positively impacts user satisfaction and contributes to energy-saving efforts. Among the various optimization techniques, one specific subject focuses on improving the utilization of web fonts. This study investigates the impact of different font formats on client-side resource consumption, such as CPU, memory, load time, and energy. In a controlled experiment, we evaluate performance metrics using the four font formats: OTF, TTF, WOFF, and WOFF2. The results of the study show that there are significant differences between all pair-wise format comparisons regarding all performance metrics. Overall, WOFF2 performs best, except in terms of memory allocation. Through the study and examination of literature, this research contributes (1) an overview of methodologies to enhance web performance through font utilization, (2) a specific exploration of the four prevalent font formats in an experimental setup, and (3) practical recommendations for scientific professionals and practitioners.

Benedikt Dornauer, Wolfgang Vigl, Michael Felderer
Web Image Formats: Assessment of Their Real-World-Usage and Performance Across Popular Web Browsers

In 2023, images on the web make up 41% of transmitted data, significantly impacting the performance of web apps. Fortunately, image formats like WEBP and AVIF could offer advanced compression and faster page loading but may face performance disparities across browsers. Therefore, we conducted performance evaluations on five major browsers - Chrome, Edge, Safari, Opera, and Firefox - while comparing four image formats. The results indicate that the newer formats exhibited notable performance enhancements across all browsers, leading to shorter loading times. Compared to the compressed JPEG format, WEBP and AVIF improved the Page Load Time by 21% and 15%, respectively. However, web scraping revealed that JPEG and PNG still dominate web image choices, with WEBP at 4% as the most used new format. Through the web scraping and web performance evaluation, this research serves to (1) explore image format preferences in web applications and analyze distribution and characteristics across frequently-visited sites in 2023 and (2) assess the performance impact of distinct web image formats on application load times across popular web browsers.

Benedikt Dornauer, Michael Felderer

Machine Learning and Data Science

Frontmatter
Operationalizing Assurance Cases for Data Scientists: A Showcase of Concepts and Tooling in the Context of Test Data Quality for Machine Learning

Assurance Cases (ACs) are an established approach in safety engineering to argue quality claims in a structured way. In the context of quality assurance for Machine Learning (ML)-based software components, ACs are also being discussed and appear promising. Tools for operationalizing ACs do exist, yet mainly focus on supporting safety engineers on the system level. However, assuring the quality of an ML component within the system is commonly the responsibility of data scientists, who are usually less familiar with these tools. To address this gap, we propose a framework to support the operationalization of ACs for ML components based on technologies that data scientists use on a daily basis: Python and Jupyter Notebook. Our aim is to make the process of creating ML-related evidence in ACs more effective. Results from the application of the framework, documented through notebooks, can be integrated into existing AC tools. We illustrate the application of the framework on an example excerpt concerned with the quality of the test data.
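
The framework itself is only sketched in this abstract, so the following is merely a loose illustration (not the authors’ implementation) of the underlying idea: a notebook cell, written in plain Python, that turns a test-data quality check into a machine-readable evidence record for an AC claim. The claim text, threshold, and record fields are invented for illustration.

import json
from collections import Counter

def class_balance_evidence(labels, max_ratio=3.0):
    """Produce an evidence record for the (hypothetical) AC claim
    'the test data is sufficiently balanced across classes'."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return {
        "claim": "Test data is sufficiently balanced across classes",
        "metric": "majority/minority class ratio",
        "value": round(ratio, 2),
        "threshold": max_ratio,
        "passed": ratio <= max_ratio,
    }

test_labels = ["ok"] * 120 + ["defect"] * 45   # invented test set labels
print(json.dumps(class_balance_evidence(test_labels), indent=2))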

Lisa Jöckel, Michael Kläs, Janek Groß, Pascal Gerber, Markus Scholz, Jonathan Eberle, Marc Teschner, Daniel Seifert, Richard Hawkins, John Molloy, Jens Ottnad
Status Quo and Problems of Requirements Engineering for Machine Learning: Results from an International Survey

Systems that use Machine Learning (ML) have become commonplace for companies that want to improve their products and processes. Literature suggests that Requirements Engineering (RE) can help address many problems when engineering ML-enabled systems. However, the state of empirical evidence on how RE is applied in practice in the context of ML-enabled systems is mainly dominated by isolated case studies with limited generalizability. We conducted an international survey to gather practitioner insights into the status quo and problems of RE in ML-enabled systems. We gathered 188 complete responses from 25 countries. We conducted quantitative statistical analyses on contemporary practices using bootstrapping with confidence intervals and qualitative analyses on the reported problems involving open and axial coding procedures. We found significant differences in RE practices within ML projects. For instance, (i) RE-related activities are mostly conducted by project leaders and data scientists, (ii) the prevalent requirements documentation format concerns interactive Notebooks, (iii) the main focus of non-functional requirements includes data quality, model reliability, and model explainability, and (iv) main challenges include managing customer expectations and aligning requirements with data. The qualitative analyses revealed that practitioners face problems related to lack of business domain understanding, unclear goals and requirements, low customer engagement, and communication issues. These results help to provide a better understanding of the adopted practices and of which problems exist in practical environments. We put forward the need to adapt further and disseminate RE-related practices for engineering ML-enabled systems.
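
The statistical analysis mentioned above uses bootstrapping with confidence intervals; a minimal sketch of a percentile bootstrap in Python is shown below. The survey answers are simulated here, not taken from the study.

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical binary survey answers (1 = "uses interactive notebooks for requirements").
answers = rng.binomial(1, 0.55, size=188)

def bootstrap_ci(data, stat=np.mean, n_resamples=10_000, alpha=0.05):
    """Percentile bootstrap confidence interval for a statistic."""
    stats = np.array([
        stat(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_resamples)
    ])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

low, high = bootstrap_ci(answers)
print(f"mean = {answers.mean():.2f}, 95% CI = [{low:.2f}, {high:.2f}]")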

Antonio Pedro Santos Alves, Marcos Kalinowski, Görkem Giray, Daniel Mendez, Niklas Lavesson, Kelly Azevedo, Hugo Villamizar, Tatiana Escovedo, Helio Lopes, Stefan Biffl, Jürgen Musil, Michael Felderer, Stefan Wagner, Teresa Baldassarre, Tony Gorschek
A Stochastic Approach Based on Rational Decision-Making for Analyzing Software Engineering Project Status

This study presents a novel approach to project status prediction in software engineering, based on unobservable states of decision-making processes, utilizing Hidden Markov Models (HMMs). By establishing HMM structures and leveraging the Rational Decision Making model (RDM), we encoded underlying project conditions; observed project data from a software engineering organization were utilized to estimate model parameters via the Baum-Welch algorithm. The developed HMMs, four project-specific models, were subsequently tested with empirical data, demonstrating their predictive potential. However, a generalized, aggregated model did not show sufficient accuracy. Model development and experiments were carried out in Python. Our approach presents preliminary work and a pathway for understanding and forecasting project dynamics in software development environments.
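
The abstract does not detail the model structure, so the following Python sketch only illustrates the general mechanism of an HMM for project status: hidden states (the names here are hypothetical) emit observable project data, and the forward algorithm scores an observed sequence. Baum-Welch, as used in the paper, would additionally re-estimate the start, transition, and emission matrices from data instead of assuming them.

import numpy as np

# Hypothetical hidden project states and observable weekly signals.
states = ["on_track", "at_risk", "in_trouble"]
observations = ["green_report", "yellow_report", "red_report"]

# Assumed parameters; Baum-Welch (EM) would estimate these from real data.
start = np.array([0.7, 0.2, 0.1])             # initial state distribution
trans = np.array([[0.80, 0.15, 0.05],         # state transition probabilities
                  [0.20, 0.60, 0.20],
                  [0.05, 0.25, 0.70]])
emit = np.array([[0.70, 0.25, 0.05],          # P(observation | state)
                 [0.30, 0.50, 0.20],
                 [0.05, 0.25, 0.70]])

def forward_likelihood(obs_indices):
    """Forward algorithm: likelihood of an observation sequence under the HMM."""
    alpha = start * emit[:, obs_indices[0]]
    for o in obs_indices[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return alpha.sum()

# Observed sequence of status reports: green, green, yellow, red.
sequence = [0, 0, 1, 2]
print(forward_likelihood(sequence))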

Hannes Salin
CAIS-DMA: A Decision-Making Assistant for Collaborative AI Systems

A Collaborative Artificial Intelligence System (CAIS) is a cyber-physical system that learns actions in collaboration with humans in a shared environment to achieve a common goal. In particular, a CAIS is equipped with an AI model to support the decision-making process of this collaboration. When an event degrades the performance of CAIS (i.e., a disruptive event), this decision-making process may be hampered or even stopped. Thus, it is of paramount importance to monitor the learning of the AI model, and eventually support its decision-making process in such circumstances. This paper introduces a new methodology to automatically support the decision-making process in CAIS when the system experiences performance degradation after a disruptive event. To this aim, we develop a framework that consists of three components: one manages or simulates CAIS’s environment and disruptive events, the second automates the decision-making process, and the third provides a visual analysis of CAIS behavior. Overall, our framework automatically monitors the decision-making process, intervenes whenever a performance degradation occurs, and recommends the next action. We demonstrate our framework by implementing an example with a real-world collaborative robot, where the framework recommends the next action that balances between minimizing the recovery time (i.e., resilience), and minimizing the energy adverse effects (i.e., greenness).

Diaeddin Rimawi, Antonio Liotta, Marco Todescato, Barbara Russo
Comparing Machine Learning Algorithms for Medical Time-Series Data

Medical software is becoming increasingly advanced and mission-critical. Machine learning is one of the methods used in medical software to tackle the diversity of patient data, problems with data quality, and the need to process increasingly large amounts of data from medical procedures. However, one of the challenges is the lack of comparisons of algorithms in-situ, during medical procedures. This paper explores the potential of performing real-time comparisons of algorithms for early stroke detection during carotid endarterectomy. SimSAX, DTW (dynamic time warping), and Pearson correlation were compared based on the real-time data against medical specialists in clinical evaluations. The analysis confirmed the general feasibility of the approach, though the algorithms were inadequate in extracting significant information from specific signals. Interviews with physicians revealed a positive outlook toward the system’s potential, advocating for further investigation. Despite their limitations, the algorithms and the prototype application provide a promising foundation for the future development of new methods for detecting stroke.
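
For readers unfamiliar with the compared techniques, the sketch below shows a plain dynamic-programming implementation of DTW next to a Pearson correlation on synthetic signals; it is purely illustrative and unrelated to the clinical data or the SimSAX implementation used in the study.

import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Synthetic signals: the second is a time-shifted copy of the first.
t = np.linspace(0, 2 * np.pi, 100)
signal_a = np.sin(t)
signal_b = np.sin(t - 0.5)

print("DTW distance:", dtw_distance(signal_a, signal_b))
print("Pearson r:", np.corrcoef(signal_a, signal_b)[0, 1])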

Alex Helmersson, Faton Hoti, Sebastian Levander, Aliasgar Shereef, Emil Svensson, Ali El-Merhi, Richard Vithal, Jaquette Liljencrantz, Linda Block, Helena Odenstedt Hergès, Miroslaw Staron
What Data Scientists (Care To) Recall

To maintain and evolve a software system, developers need to gain new or recover lost knowledge about that system. Thus, program comprehension is a crucial activity in software development and maintenance processes. We know from previous work that developers prioritize what information they want to remember about a system based on the perceived importance of that information. However, AI-based software systems as a special case are not developed by software developers alone, but also by data scientists who deal with other concepts and have a different educational background than most developers. In this paper, we study what information data scientists (aim to) recall about their systems. For this purpose, we replicated our previous work by interviewing 11 data scientists, investigating the knowledge they consider important to remember, and whether they can remember parts of their systems correctly. Our results suggest that data scientists consider knowledge about the AI-project settings to be the most important to remember and that they perform best when remembering knowledge they consider important. Contrary to software developers, data scientists’ self-assessments increase when reflecting on their systems. Our findings indicate similarities and differences between developers and data scientists that are important for managing the processes surrounding a system.

Samar Saeed, Shahrzad Sheikholeslami, Jacob Krüger, Regina Hebig

Software Analysis and Tools

Frontmatter
Using AI-Based Code Completion for Domain-Specific Languages

Code completion is a very important feature of modern integrated development environments. Research has been done for years to improve code completion systems for general-purpose languages. However, little literature can be found on (AI-based) code completion for domain-specific languages (DSLs). A DSL is a special-purpose programming language tailored for a specific application domain. In this paper, we investigate whether AI-based state-of-the-art code completion approaches can also be applied to DSLs. This is demonstrated using the domain-specific language TTI (Thermal Text Input). TTI is used for power transformer design specification in an industrial context, where an existing code completion system is to be replaced by an advanced machine learning approach. For this purpose, implementations of two code completion systems are adapted to our needs. One of them shows very promising results and achieves a top-5 accuracy of 97%. To evaluate the practical applicability, the approach is integrated into an existing editor of a power transformer manufacturer.
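
Top-5 accuracy, the metric reported above, counts how often the expected token appears among a model’s five highest-ranked suggestions. A small illustrative helper, with made-up TTI-like tokens rather than the paper’s evaluation code:

def top_k_accuracy(predictions, ground_truth, k=5):
    """predictions: list of ranked suggestion lists; ground_truth: expected tokens."""
    hits = sum(truth in ranked[:k] for ranked, truth in zip(predictions, ground_truth))
    return hits / len(ground_truth)

# Hypothetical completions for two editing positions.
ranked_suggestions = [["core", "coil", "winding", "tap", "yoke"],
                      ["winding", "core", "limb", "tap", "oil"]]
expected = ["winding", "limb"]
print(top_k_accuracy(ranked_suggestions, expected))  # 1.0 (both expected tokens are in the top 5)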

Christina Piereder, Günter Fleck, Verena Geist, Michael Moser, Josef Pichler
Assessing IDEA Diagrams for Supporting Analysis of Capabilities and Issues in Technical Debt Management

Context. Technical debt management (TDM) comprises activities such as prevention, monitoring, and repayment. Current technical literature has identified, for each of these TDM activities, several applicable practices as well as practice avoidance reasons (PARs). This body of knowledge (practices and PARs) is available in the literature only in widely spread text and tables, and is not organized into artifacts, hindering the use of current knowledge on TDM. Previously, we organized these practices and PARs into IDEA (Impediments, Decision factors, Enabling practices, and Actions) diagrams. However, an empirical evaluation of these diagrams is still missing. Aims. To empirically assess the IDEA diagrams with respect to their ease of use, usefulness, potential future use, and support for TDM activities. Method. We conducted two complementary empirical studies. First, we applied the technology acceptance model (TAM) with 72 participants in academic contexts. Afterwards, we interviewed 11 experienced software practitioners. Results. In the TAM study, 92% of the participants indicated that they could use the diagrams. Also, the diagrams were considered easy to learn and use. Through the interviews, participants indicated that the diagrams are easy to read and follow, can influence decisions on how to manage debt items, and could be used to support their daily activities. Conclusion. Both studies provided positive evidence that IDEA diagrams can be useful for supporting TDM activities.

Sávio Freire, Verusca Rocha, Manoel Mendonça, Clemente Izurieta, Carolyn Seaman, Rodrigo Spínola
Automatic Fixation of Decompilation Quirks Using Pre-trained Language Model

A decompiler is a system for recovering the original code from bytecode. A critical challenge in decompilers is that the decompiled code contains differences from the original code. These differences not only reduce the readability of the source code but may also change the program’s behavior. In this study, we propose a deep learning-based quirk fixation method that adopts grammatical error correction. One advantage of the proposed method is that it can be applied to any decompiler and programming language. Our experimental results show that the proposed method removes 55% of identifier quirks and 91% of structural quirks. In some cases, however, the proposed method injected a small number of new quirks.

Ryunosuke Kaichi, Shinsuke Matsumoto, Shinji Kusumoto
Log Drift Impact on Online Anomaly Detection Workflows

Traditional rule-based approaches to system monitoring have many areas for improvement. Rules are time-consuming to maintain, and their ability to detect unforeseen future incidents is limited. Online log anomaly detection workflows have the potential to improve upon rule-based methods by providing fine-grained, automated detection of abnormal behavior. However, system and process logs are not static. Code and configuration changes may alter the sequences of log entries produced by these processes, impacting the models trained on their previous behavior. These changes result in false positive signals that can overwhelm production services engineers and drown out alerts for real issues. For this reason, log drift is a significant obstacle to utilizing online log anomaly detection approaches for monitoring in industrial settings. This study explores the different types of log drift and classifies them using a newly introduced taxonomy. It then evaluates the impact these types of drift have on online anomaly detection workflows. Several potential mitigation methods are presented and evaluated based on synthetic and real-world log data. Finally, possible directions for future research are provided and discussed.

Scott Lupton, Hironori Washizaki, Nobukazu Yoshioka, Yoshiaki Fukazawa
Leveraging Historical Data to Support User Story Estimation

Accurate and reliable effort and cost estimation are still challenging for agile teams in the industry. It is argued that leveraging historical data regarding the actual time spent on similar past projects could be very helpful to support such an activity before companies embark upon a new project. In this paper, we investigate to what extent user story information retrieved from past projects can help developers estimate the effort needed to develop new similar projects. In close collaboration with a software development company, we applied design science and action research principles to develop and evaluate a tool that employs Natural Language Processing (NLP) algorithms to find past similar user stories and retrieve the actual time spent on them. The tool was then used to estimate a real project that was about to start in the company. A focus group with a team of six developers was conducted to evaluate the tool’s efficacy in estimating similar projects. The results of the focus group with the developers revealed that the tool has the potential to complement the existing estimation process and help different interested parties in the company. Our results contribute both towards a new tool-supported approach to help user story estimation based on historical data and with our lessons learned on why, when, and where such a tool and the estimations provided may play a role in agile projects in the industry.
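
The abstract does not name the specific NLP algorithms used in the tool; as a rough sketch of the retrieval idea (find past user stories similar to a new one and reuse the time recorded for them), one could use TF-IDF with cosine similarity as below. The stories and hour values are invented for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical historical user stories with the actual hours spent on them.
history = [
    ("As a user, I want to reset my password via email", 6),
    ("As an admin, I want to export monthly sales reports as CSV", 10),
    ("As a user, I want to log in with my Google account", 8),
]
new_story = "As a user, I want to reset my password using SMS"

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([s for s, _ in history] + [new_story])

# Similarity of the new story to each historical one; reuse the hours of the best match.
similarities = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
best = similarities.argmax()
print(f"closest match: {history[best][0]!r} "
      f"(similarity {similarities[best]:.2f}, {history[best][1]}h spent)")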

Aleksander G. Duszkiewicz, Jacob G. Sørensen, Niclas Johansen, Henry Edison, Thiago Rocha Silva
Design Patterns Understanding and Use in the Automotive Industry: An Interview Study

Automotive software is increasing in complexity, leading to new challenges for designers and developers. Design patterns, which offer reusable solutions to common design problems, are a potential way to deal with this complexity. Although design patterns have received much focus in academic publications, it is not clear how they are used in practice. This paper presents an interview-based study that explores the use of design patterns in the automotive industry. The study findings reveal how automotive practitioners view and use design patterns in their software designs. Our study revealed that industry experts have a view of design patterns which often differs from the academic views. They use design patterns in combination with architecture guidelines, principles, and frameworks. Instead of the academic focus on the design patterns, industry professionals focus on the design, architectural tactics, and standards. Such findings highlight the need for a more nuanced understanding of the concept and practical applications of design patterns within the context of industrial software engineering practices.

Sushant Kumar Pandey, Sivajeet Chand, Jennifer Horkoff, Miroslaw Staron

Software Testing and Quality Assurance

Frontmatter
An Experience in the Evaluation of Fault Prediction

Background. ROC (Receiver Operating Characteristic) curves are widely used to represent the performance (i.e., degree of correctness) of fault proneness models. AUC, the Area Under the ROC Curve, is a quite popular performance metric, which summarizes into a single number the goodness of the predictions represented by the ROC curve. Alternative techniques have been proposed for evaluating the performance represented by a ROC curve: among these are RRA (Ratio of Relevant Areas) and ϕ (alias Matthews Correlation Coefficient). Objectives. In this paper, we aim at evaluating AUC as a performance metric, also with respect to alternative proposals. Method. We carry out an empirical study by replicating a previously published fault prediction study and measuring the performance of the obtained faultiness models using AUC, RRA, and a recently proposed way of relating a specific kind of ROC curves to ϕ, based on iso-ϕ ROC curves, i.e., ROC curves with constant ϕ. We take into account prevalence, i.e., the proportion of faulty modules in the dataset that is the object of predictions. Results. AUC appears to provide indications that are concordant with ϕ for fairly balanced datasets, while it is much more optimistic than ϕ for quite imbalanced datasets. RRA’s indications appear to be moderately affected by the degree of balance in a dataset. In addition, RRA appears to agree with ϕ. Conclusions. Based on the collected evidence, AUC does not seem to be suitable for evaluating the performance of fault proneness models when used with imbalanced datasets. In these cases, using RRA can be a better choice. At any rate, more research is needed to generalize these conclusions.
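
For reference, the two metrics contrasted in this abstract have the following standard definitions (not specific to the paper): AUC is the area under the ROC curve, and ϕ (the Matthews Correlation Coefficient) is computed from the confusion matrix counts TP, TN, FP, FN.

$$\mathrm{AUC} = \int_0^1 \mathrm{TPR}\,\mathrm{d}(\mathrm{FPR}), \qquad \phi = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)\,(TP+FN)\,(TN+FP)\,(TN+FN)}}$$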

Luigi Lavazza, Sandro Morasca, Gabriele Rotoloni
Is It the Best Solution? Testing an Optimisation Algorithm with Metamorphic Testing

Optimisation algorithms play a vital role in solving complex real-world problems by iteratively comparing various solutions to find the optimal or the best solution. However, testing them poses challenges due to their “non-testable” nature, where a reliable test oracle is lacking. Traditional testing techniques may not directly address whether these algorithms yield the best solution. In this context, Metamorphic Testing (MT) emerges as a promising approach. MT leverages Metamorphic Relations (MRs) to indirectly test the System Under Test (SUT) by examining input-output pairs and revealing inconsistencies based on MRs. In this paper, we apply the MT approach to a black-box industrial optimisation algorithm and present our observations and findings. We identify successful aspects, challenges, and opportunities for further research. The findings from our study are expected to shed light on the practical feasibility of MT for testing optimisation algorithms. The paper provides a formal definition of MT, an overview of related work in optimisation algorithms, and a description of the industrial context, methodology, and results.
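
The industrial optimiser studied in the paper is a black box and its MRs are not reproduced here; the Python sketch below only illustrates the general MT idea with one hypothetical metamorphic relation for a generic minimiser: adding a constant to the objective must not move the location of the optimum, so the relation can be checked without knowing the true optimum.

import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Hypothetical objective; the industrial SUT in the paper is a black box.
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

def optimise(f, x0=np.zeros(2)):
    return minimize(f, x0).x

# Metamorphic relation: shifting the objective by a constant must not change
# the location of the optimum (no oracle for the "true" optimum is needed).
source_output = optimise(objective)
followup_output = optimise(lambda x: objective(x) + 100.0)

assert np.allclose(source_output, followup_output, atol=1e-4), "MR violated"
print("MR satisfied:", source_output, followup_output)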

Alejandra Duque-Torres, Claus Klammer, Stefan Fischer, Dietmar Pfahl
Impacts of Program Structures on Code Coverage of Generated Test Suites

Unit testing is a part of the process of developing software. In unit testing, developers verify that programs work properly as intended. Creating a test suite for a unit test is very time-consuming. For this reason, research is being conducted to generate test suites for unit testing automatically, and several test generation tools have already been released. However, test generation tools may not be able to generate a test suite that fully covers a test target. In our research, we investigate the causes of this problem by focusing on the structures of test targets in order to improve test generation tools. As a result, we found four patterns that cause this problem and propose research directions for addressing each pattern.

Ryoga Watanabe, Yoshiki Higo, Shinji Kusumoto
Anomaly Detection Through Container Testing: A Survey of Company Practices

Background: Containers are a commonly used solution for deploying software applications. Therefore, container functionality and security are a concern of practitioners and researchers. Testing is essential to ensure the quality of the container environment component and the software product, and it plays a crucial role in using containers. Objective: In light of the increasing role of software containers and the lack of research on testing them, we study container testing practices. In this paper, we investigate the current approaches for testing containers. Moreover, we aim to identify areas for improvement and emphasize the importance of testing in securing the container environment and the final software product. Method: We conducted a survey to collect primary data on companies’ container testing practices and the tools commonly used in container testing. There were 14 respondents from a total of 10 different companies with experience using containers and varying work responsibilities. Findings: The survey findings illustrate the significance of testing, the growing interest in and utilization of containers, and the emerging security and vulnerability concerns. The research reveals variations in testing approaches between companies and a lack of consensus on how testing should be carried out, with advancements primarily driven by industry practices rather than academic research. Conclusion: In this study, we show the importance of testing software containers, lay out the current testing approaches and challenges, and highlight the need for standardized container testing practices. We also provide recommendations on how to develop these practices further.

Salla Timonen, Maha Sroor, Rahul Mohanani, Tommi Mikkonen
The Effects of Soft Assertion on Spectrum-Based Fault Localization

This paper investigates the negative effects of soft assertion on the accuracy of Spectrum-based Fault Localization (SBFL). A soft assertion is a kind of test assertion that continues test case execution even after an assertion failure occurs. In general, the execution path becomes longer if the test case fails due to a soft assertion. Hence, soft assertions will decrease the accuracy of SBFL, which leverages the execution paths of failed tests. In this study, we refer to the change of execution path due to soft assertion as path pollution. Our experimental results show that soft assertion reduces the accuracy of SBFL in 35% of faults.
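
For context, SBFL ranks program elements by a suspiciousness score computed from the test spectra; one widely used formula (Ochiai, given here as a standard reference rather than the exact formula used in the paper) is

$$\mathrm{Ochiai}(s) = \frac{e_f(s)}{\sqrt{\big(e_f(s) + n_f(s)\big)\,\big(e_f(s) + e_p(s)\big)}}$$

where e_f(s) and e_p(s) are the numbers of failing and passing tests that execute element s, and n_f(s) is the number of failing tests that do not. A soft assertion that keeps a failing test running past its first failure adds many extra elements to that test’s execution path, which is the path pollution described above.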

Kouhei Mihara, Shinsuke Matsumoto, Shinji Kusumoto
Characterizing Requirements Smells

Context: Software specifications are usually written in natural language and may suffer from imprecision, ambiguity, and other quality issues, hereafter called requirement smells. Requirement smells can hinder the development of a project in many aspects, such as delays, reworks, and low customer satisfaction. From an industrial perspective, we want to focus our time and effort on identifying and preventing the requirement smells that are of high interest. Aim: This paper aims to characterise 12 requirement smells in terms of frequency, severity, and effects. Method: We interviewed ten experienced practitioners from different divisions of a large international company in the safety-critical domain, MBDA Italy Spa. Results: Our interviews show that the smell types perceived as most severe are Ambiguity and Verifiability, while the most frequent are Ambiguity and Complexity. We also provide a set of six lessons learnt about requirement smells, such as that the effects of smells are expected to differ across smell types. Conclusions: Our results help to increase awareness of the importance of requirement smells. They pave the way for future empirical investigations, ranging from a survey confirming our findings to controlled experiments measuring the effect size of specific requirement smells.

Emanuele Gentili, Davide Falessi
Do Exceptional Behavior Tests Matter on Spectrum-Based Fault Localization?

Debugging is a costly task in software development, and computer-assisted debugging is expected to reduce these costs. Spectrum-based Fault Localization (SBFL) is one of the most actively studied computer-assisted debugging techniques. SBFL aims to identify the location of faulty code elements based on the execution paths of tests. Previous research reports that the accuracy of SBFL is affected by test types, such as flaky tests. Our research focuses on exceptional behavior tests to reveal the impact of such tests on SBFL. Since separating exception handling from normal control flow enables developers to increase program robustness, we expect the execution paths of exceptional behavior tests to differ from those of normal control flow tests, and that these differences significantly affect the accuracy of SBFL. In this study, we investigated the accuracy of SBFL on two types of faults: faults that occurred in the real software development process and artificially generated faults. As a result, our study reveals that SBFL tends to be more accurate when all failing tests are exceptional behavior tests than when failing tests include no exceptional behavior tests.

Haruka Yoshioka, Yoshiki Higo, Shinsuke Matsumoto, Shinji Kusumoto, Shinji Itoh, Phan Thi Thanh Huyen
On Deprecated API Usages: An Exploratory Study of Top-Starred Projects on GitHub

A deprecated Application Programming Interface (API) is one that is no longer recommended for use by its original developers. While deprecated APIs (i.e., deprecated fields, methods, and classes) are still implemented, they can be removed in future implementations. Therefore, developers should not use deprecated APIs in newly written code and should update existing code so that it no longer uses deprecated APIs. In this paper, we present the results of an exploratory Mining-Software-Repository study to gather preliminary empirical evidence on deprecated API usages in open-source Java applications. To that end, we quantitatively analyzed the commit histories of 14 applications whose software projects were top-starred on GitHub. We found that deprecated API usages are quite widespread in the studied software applications, and in only half of these applications do developers remove deprecated API usages within a few commits and days. Also, half of the studied applications mostly use deprecated APIs declared in their own source code, rather than deprecated APIs that lie in third-party software. Finally, we noted that the introductions and removals of deprecated API usages are mostly the result of changes made by senior developers, rather than newcomers.

Pietro Cassieri, Simone Romano, Giuseppe Scanniello

Security, Vulnerabilities, and Human Factors

Frontmatter
Evaluating Microservice Organizational Coupling Based on Cross-Service Contribution

For traditional modular software systems, “high cohesion, low coupling” is a recommended design principle, and it remains so for microservice architectures. However, coupling phenomena caused by cross-service calls and dependencies commonly exist in such systems. In addition, it is noticeable that teams for microservice projects can also suffer from high coupling issues in terms of their cross-service contribution, which can inevitably result in technical debt and high managerial costs. Such organizational coupling needs to be detected and mitigated in time to prevent future losses. Therefore, this paper proposes an automatable approach to evaluate organizational coupling by investigating microservice ownership and cross-service contribution. Furthermore, we validate the feasibility of the approach using a case study of a popular microservice project. The results show that, with sufficient software repository data, we can not only evaluate the organizational coupling in microservice system projects but also continuously monitor its evolution.

Xiaozhou Li, Dario Amoroso d’Aragona, Davide Taibi
On Fixing Bugs: Do Personality Traits Matter?

We present the results of a prospective observational study aimed at understanding whether there is a relationship between personality traits (i.e., agreeableness, conscientiousness, extroversion, neuroticism, and openness) and the performance of undergraduates in Computer Science while accomplishing bug fixing. We involved 62 undergraduates, who took part in eight laboratory sessions. The experimental sessions took place over a period of seven weeks. In each session, the participants were asked to fix bugs either in a C or in a Java program. We collected a relevant number of observations (496 in total), making our study the largest (quantitative) one on the impact of personality on individual performance while executing an SE task. We observed that the lower the neuroticism level of a student, the better their performance in fixing bugs.

Simone Romano, Giuseppe Scanniello, Maria Teresa Baldassarre, Danilo Caivano, Genoveffa Tortora
A Rapid Review on Software Vulnerabilities and Embedded, Cyber-Physical, and IoT Systems

This paper presents a Rapid Review (RR) conducted to identify and characterize existing approaches and methods that discover, fix, and manage vulnerabilities in Embedded, Cyber-Physical, and Internet-of-Things systems and software (ESs hereafter). In recent years, there has been growing interest in the adoption of ESs in different domains (e.g., automotive, healthcare) and for different purposes. Modern ESs are heterogeneous, computationally powerful, connected, and intelligent systems characterized by many technologies, devices, and an extensive use of embedded software (SW). Adopting software that can emulate or substitute hardware (HW) components makes ESs flexible, tunable, and less costly, but demands attention to security aspects such as SW vulnerabilities. Vulnerabilities can be exploited by attackers and compromise entire systems. The findings of our RR emerge from 61 papers and can be summarized as follows: (i) complex and connected ESs are studied especially for autonomous vehicles and robots; (ii) new methods and approaches are proposed mainly to discover software vulnerabilities related to memory management in ES firmware; and (iii) most of the proposed methods apply fuzzing-based dynamic analysis to binary and executable files of ES software.

Alessandro Marchetto, Giuseppe Scanniello
Social Sustainability Approaches for Software Development: A Systematic Literature Review

Social aspects in software sustainability refer to the impact of the software on the broader social and societal context. These aspects involve considerations such as accessibility, equity, inclusion, diversity, ethical and human values. While achieving software sustainability requires developers to embrace approaches that support the three dimensions of sustainability, there remains a lack of concrete approaches to address social aspects during software development. This literature review aims to facilitate the integration of social aspects into the software development process by identifying approaches related to social sustainability in software engineering. We extracted and analyzed data from 19 studies through thematic syntheses. The results of our analysis provide a list of recommended tools and practices to support social aspects and attain software sustainability goals. By incorporating these approaches into software development, we ensure that the software is not only technically sustainable but also socially responsible from a human perspective.

Ana Carolina Moises de Souza, Daniela Soares Cruzes, Letizia Jaccheri, John Krogstie
The Testing Hopscotch Model – Six Complementary Profiles Replacing the Perfect All-Round Tester

Contrasting the idea of a team with all-round testers, the Testing Hopscotch model includes six complementary profiles, tailored for different types of testing. The model is based on 60 interviews and three focus groups with 22 participants. The validation of the Testing Hopscotch model included ten validation workshops with 58 participants from six companies developing large-scale and complex software systems. The validation showed how the model provided valuable insights and promoted good discussions, helping companies identify what they need to do in order to improve testing in each individual case. The results from the validation workshops were confirmed at a cross-company workshop with 33 participants from seven companies and six universities. Based on the diverse nature of the seven companies involved in the study, it is reasonable to expect that the Testing Hopscotch model is relevant to a large segment of the software industry. Overall, the validation showed that the model is novel, actionable, and useful in practice, helping companies identify what they need to do to improve testing in their organization.

Torvald Mårtensson, Kristian Sandahl
Continuous Experimentation and Human Factors: An Exploratory Study

In today’s rapidly evolving technological landscape, the success of tools and systems relies heavily on their ability to meet the needs and expectations of users. User-centered design approaches, with a focus on human factors, have gained increasing attention as they prioritize the human element in the development process. With the increasing complexity of software-based systems, companies are adopting agile development methodologies and emphasizing continuous software experimentation. However, there is limited knowledge on how to effectively execute continuous experimentation with respect to human factors within this context. This research paper presents an exploratory qualitative study for integrating human factors in continuous experimentation, aiming to uncover distinctive characteristics of human factors and continuous software experiments, practical challenges for integrating human factors in continuous software experiments, and best practices associated with the management of continuous human factors experimentation.

Amna Pir Muhammad, Eric Knauss, Jonas Bärgman, Alessia Knauss
Backmatter
Metadata
Title
Product-Focused Software Process Improvement
Editors
Regine Kadgien
Andreas Jedlitschka
Andrea Janes
Valentina Lenarduzzi
Xiaozhou Li
Copyright Year
2024
Electronic ISBN
978-3-031-49266-2
Print ISBN
978-3-031-49265-5
DOI
https://doi.org/10.1007/978-3-031-49266-2
