2020 | Book

Mining Software Engineering Data for Software Reuse

About this book

This monograph discusses software reuse and how it can be applied at different stages of the software development process, on different types of data and at different levels of granularity. Several challenging hypotheses are analyzed and confronted using novel data-driven methodologies, in order to solve problems in requirements elicitation and specification extraction, software design and implementation, as well as software quality assurance.

The book is accompanied by a number of tools, libraries and working prototypes in order to practically illustrate how the phases of the software engineering life cycle can benefit from unlocking the potential of data.

Software engineering researchers, experts, and practitioners can benefit from the various methodologies presented and can better understand how knowledge extracted from software data residing in various repositories can be combined and used to enable effective decision making and save considerable time and effort through software reuse. Mining Software Engineering Data for Software Reuse can also prove handy for graduate-level students in software engineering.

Table of Contents

Frontmatter

Introduction and Background

Frontmatter
Chapter 1. Introduction
Abstract
Software engineering has grown to be one of the most important disciplines, with noticeable impact on business and everyday life. However, several challenges still arise when developing and maintaining software, and they often result in lost time and effort. In this chapter, we discuss how current challenges can be confronted using mining techniques toward applying software reuse. We initially focus on defining the scope and purpose of this book given the current state of the practice in software engineering. After that, the underlying context of software reuse is discussed with respect to the areas of requirements mining, source code mining, and quality assessment. For each of these areas, we outline the contributions of the book, and finally we provide an overview of the different chapters.
Themistoklis Diamantopoulos, Andreas L. Symeonidis
Chapter 2. Theoretical Background and State-of-the-Art
Abstract
This chapter provides an overview of the background knowledge that is relevant to the main areas of application of this book. The areas of software engineering, software reuse, and software quality are discussed in the context of taking advantage of useful data in order to improve the software development process. Upon providing the relevant definitions and outlining the data and metrics provided as part of software development, we discuss how data mining techniques can be applied to software engineering data and illustrate the reuse potential that is provided in an integrated manner.
Themistoklis Diamantopoulos, Andreas L. Symeonidis

Requirements Mining

Frontmatter
Chapter 3. Modeling Software Requirements
Abstract
Enhancing requirements elicitation and specification extraction has always been of added value to software engineering, as it expedites the software development life cycle. In this context, the main challenge is to construct formal models that are capable of storing requirements from multimodal formats and can facilitate requirements reuse. In this chapter, we present an approach that captures the static and dynamic view of software projects and generates traceable system specifications. Our ontology-based approach can receive input in the form of functional requirements, UML use case and activity diagrams, and storyboards and allows for reasoning over the stored requirements for validation and reuse purposes.
Themistoklis Diamantopoulos, Andreas L. Symeonidis
Chapter 4. Mining Software Requirements
Abstract
Requirements identification is one of the most important phases in software engineering, as incomplete or badly specified requirements are the most common cause of project failure. In this chapter, we design a methodology to facilitate requirements identification based on software reuse. Our methodology employs our ontology-based model and is applied to functional requirements and UML diagrams. Concerning functional requirements, we apply association rule mining and heuristics to detect incomplete or missing requirements, while for UML use case and activity diagrams, we employ model matching techniques to find similar diagrams and thus allow the engineer to improve the description of the functionality and the data flow of the project.
Themistoklis Diamantopoulos, Andreas L. Symeonidis
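
The Chapter 4 abstract mentions association rule mining for detecting incomplete or missing functional requirements. As a rough illustration of that general idea only (not the authors' actual methodology), the sketch below mines pairwise rules from a small, invented set of projects and suggests requirements that usually co-occur with the ones already present. The function names, the requirement labels, and the support and confidence thresholds are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def mine_rules(projects, min_support=0.5, min_confidence=0.8):
    """Mine pairwise rules (lhs -> rhs) from sets of requirement labels.

    Returns a sorted list of (lhs, rhs, support, confidence) tuples.
    Thresholds are illustrative defaults, not values from the book.
    """
    n = len(projects)
    item_counts = Counter()
    pair_counts = Counter()
    for items in projects:
        unique = set(items)
        item_counts.update(unique)
        pair_counts.update(frozenset(p) for p in combinations(sorted(unique), 2))

    rules = []
    for pair, count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        a, b = sorted(pair)
        # Check the rule in both directions, since confidence is asymmetric.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


def suggest_missing(requirements, rules):
    """Return requirements implied by the rules but absent from the project."""
    present = set(requirements)
    return sorted({rhs for lhs, rhs, _, _ in rules
                   if lhs in present and rhs not in present})


if __name__ == "__main__":
    # Hypothetical requirement sets of four existing projects.
    projects = [
        {"create account", "login", "logout"},
        {"create account", "login", "reset password"},
        {"login", "logout"},
        {"create account", "login", "logout"},
    ]
    rules = mine_rules(projects)
    print(suggest_missing({"create account"}, rules))
```

In this toy data set every project that lets users create an account also offers login, so a new project listing only "create account" is flagged as possibly missing "login".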

Source Code Mining

Frontmatter
Chapter 5. Source Code Indexing for Component Reuse
Abstract
The momentum of the open-source community has been constantly increasing, thus leading to numerous tools for writing, maintaining, and sharing source code. Several code search engines have been developed to support development tasks and facilitate reuse either directly or by functioning as information sources for code recommenders. In this chapter, we present AGORA, a code search engine that facilitates reuse at the component, snippet, and project levels. Through its Elasticsearch index, AGORA supports advanced queries (syntax-aware, regular expressions), while the engine also integrates with popular code hosting repositories and offers a well-designed API. We provide representative examples and a usage scenario to illustrate the functionality of AGORA, and perform a comparative analysis in a code reuse context, which indicates that AGORA provides an efficient alternative to current solutions.
Themistoklis Diamantopoulos, Andreas L. Symeonidis
Chapter 6. Mining Source Code for Component Reuse
Abstract
Although the development of code search engines has brought forth syntax-aware capabilities when searching for reusable components, these engines do not fully exploit the given context and do not assess the retrieved source code. As a result, several test-driven reuse systems have been developed to offer context-aware component search and further assess the retrieved components using test cases. However, most of these systems employ strict matching criteria and do not offer information concerning the flow and the dependencies of the retrieved components. In this chapter, we present Mantissa, a system designed to overcome the aforementioned limitations. Mantissa allows code searching in growing repositories, such as GitHub. The user provides the input query as a code snippet and Mantissa employs a mechanism that uses Information Retrieval techniques to return functional software components. Finally, we provide an example usage scenario for Mantissa and evaluate our system against popular search engines and test-driven reuse systems to illustrate its effectiveness.
Themistoklis Diamantopoulos, Andreas L. Symeonidis
Chapter 7. Mining Source Code for Snippet Reuse
Abstract
As developers rely more and more on reusing components from online sources, an important challenge is that of finding snippets in order to integrate these components and/or to address common programming problems. Several snippet mining systems have thus been developed; however, they have important limitations. API usage mining systems require the developer to know which library to use beforehand, while more generic snippet mining systems usually output a list of examples, without distinguishing among different implementations and without assessing the quality and the reusability of the proposed snippets. In this chapter, we present CodeCatch, a system that receives queries in natural language and assesses the retrieved snippets both for their quality and for their preference by the developers. Furthermore, our system clusters the snippets according to their API calls, thus allowing the developer to select among the different implementations. We provide an example usage scenario for CodeCatch and evaluate it on a set of programming queries to illustrate how it can be useful for the developer.
Themistoklis Diamantopoulos, Andreas L. Symeonidis
Chapter 8. Mining Solutions for Extended Snippet Reuse
Abstract
The introduction of question-answering services, such as Stack Overflow, has given rise to a new problem-solving paradigm in software development. Using these services, developers can post their programming questions online and get useful solutions from the community. In this chapter, we propose a methodology that allows searching for solutions in Stack Overflow, using the main elements of a question post, including its title, tags, body, and source code snippets. We design a similarity scheme for these elements that can be used for finding similar question posts. Text elements are compared using Information Retrieval metrics, while snippet similarity is computed by first converting snippets into sequences using a representation that extracts structural information. The results of the evaluation of our methodology indicate that it can be effective for recommending similar question posts, and thus can be used to search for solutions without fully forming a question.
Themistoklis Diamantopoulos, Andreas L. Symeonidis
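
The Chapter 8 abstract describes a similarity scheme over the title, tags, body, and code snippets of a question post. The sketch below shows one way such a scheme could be put together; it is not the book's implementation. It assumes term-frequency cosine similarity for text, Jaccard similarity for tags, and, as a crude stand-in for the structural snippet representation the abstract mentions, a `difflib.SequenceMatcher` ratio over token sequences. The equal weights and the dictionary field names are hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity between two collections of tags."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def snippet_similarity(a, b):
    """Similarity of two snippets viewed as token sequences (a stand-in)."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


def post_similarity(p, q, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the title, tags, body, and snippet similarities."""
    scores = (
        cosine(p["title"], q["title"]),
        jaccard(p["tags"], q["tags"]),
        cosine(p["body"], q["body"]),
        snippet_similarity(p["code"], q["code"]),
    )
    return sum(w * s for w, s in zip(weights, scores))


if __name__ == "__main__":
    # Invented question posts, used only to exercise the functions.
    query = {"title": "read file into string", "tags": ["java", "file"],
             "body": "how do I read a text file", "code": "new FileReader(path)"}
    other = {"title": "read text file", "tags": ["java", "io"],
             "body": "reading a file line by line", "code": "new FileReader(name)"}
    print(round(post_similarity(query, other), 3))
```

Keeping the four element scores separate before weighting makes it easy to tune or drop one element, for instance when a query has no code snippet yet.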

Quality Assessment

Frontmatter
Chapter 9. Providing Reusability-Aware Recommendations
Abstract
As contemporary software development relies more on software reuse, several systems have been designed to automate the process of finding reusable software components from online sources and integrating them into one’s source code. However, these systems focus on whether the proposed components cover the desired functionality, without also assessing their reusability. In this chapter, we present a recommendation system for source code components that covers both the functional and the quality aspects of component reuse. Our system, which is named QualBoa, retrieves components from online repositories and reports their functional matching to the query as well as their reusability score, which is based on configurable thresholds of source code metrics. Upon providing an example usage scenario and evaluating QualBoa, we conclude that it is effective for recommending reusable source code.
Themistoklis Diamantopoulos, Andreas L. Symeonidis
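
The Chapter 9 abstract says the reusability score is based on configurable thresholds of source code metrics. A minimal sketch of a threshold-based score is given below, assuming the score is simply the fraction of metrics that stay within their thresholds. The metric names, the threshold values, and the scoring formula are invented for illustration and should not be read as QualBoa's actual model.

```python
# Hypothetical upper bounds per metric; a real configuration would differ.
DEFAULT_THRESHOLDS = {
    "cyclomatic_complexity": 10,
    "lines_of_code": 200,
    "coupling": 5,
}


def reusability_score(metrics, thresholds=DEFAULT_THRESHOLDS):
    """Return the fraction of configured metrics that do not exceed their threshold.

    Metrics missing from `metrics` are skipped; the score is 0.0 when
    none of the configured metrics is available.
    """
    checked = [name for name in thresholds if name in metrics]
    if not checked:
        return 0.0
    passed = sum(1 for name in checked if metrics[name] <= thresholds[name])
    return passed / len(checked)


def rank_components(components, thresholds=DEFAULT_THRESHOLDS):
    """Sort (name, metrics) pairs by descending reusability score."""
    return sorted(components,
                  key=lambda item: reusability_score(item[1], thresholds),
                  reverse=True)


if __name__ == "__main__":
    # Invented components and metric values.
    components = [
        ("StackA", {"cyclomatic_complexity": 4, "lines_of_code": 350, "coupling": 2}),
        ("StackB", {"cyclomatic_complexity": 3, "lines_of_code": 120, "coupling": 1}),
    ]
    for name, metrics in rank_components(components):
        print(name, round(reusability_score(metrics), 2))
```

Passing a different `thresholds` dictionary changes the scoring without touching the code, which is the sense in which the thresholds are configurable here.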
Chapter 10. Assessing the Reusability of Source Code Components
Abstract
In the context of reusing components from online repositories, assessing the quality and specifically the reusability of source code before reusing it poses a major challenge for the research community. Although several quality assessment systems have been proposed, most of them do not focus on reusability. In this chapter, we design a reusability score using as ground truth information from GitHub stars and forks, which indicate the extent to which software components are adopted/preferred by developers. Our methodology includes applying different machine learning algorithms in order to produce reusability estimation models at both class and package levels. Finally, evaluating our methodology indicates that it can be effective for assessing reusability as perceived by developers.
Themistoklis Diamantopoulos, Andreas L. Symeonidis

Conclusion and Future Work

Frontmatter
Chapter 11. Conclusion
Abstract
This chapter concludes the book and summarizes the main contributions produced by applying mining techniques to software engineering data. These contributions lie in three different areas of application: requirements mining, source code mining, and quality assessment. We initially review each area individually, and then we discuss how our proposed techniques facilitate the software development process as a whole.
Themistoklis Diamantopoulos, Andreas L. Symeonidis
Chapter 12. Future Work
Abstract
In this chapter, we discuss ideas for future work in the area of applying mining techniques to software engineering data. We initially focus on potential improvements in each of the three main areas of application (requirements mining, source code mining, and quality assessment), and then we discuss the future work that can be identified for the field as a whole.
Themistoklis Diamantopoulos, Andreas L. Symeonidis
Metadata
Title
Mining Software Engineering Data for Software Reuse
Written by
Themistoklis Diamantopoulos
Andreas L. Symeonidis
Copyright year
2020
Electronic ISBN
978-3-030-30106-4
Print ISBN
978-3-030-30105-7
DOI
https://doi.org/10.1007/978-3-030-30106-4
