2019 | Book

Software Engineering and Methodology for Emerging Domains

16th National Conference, NASAC 2017, Harbin, China, November 4–5, 2017, and 17th National Conference, NASAC 2018, Shenzhen, China, November 23–25, 2018, Revised Selected Papers

Edited by: Zheng Li, He Jiang, Ge Li, Minghui Zhou, Prof. Ming Li

Publisher: Springer Singapore

Book series: Communications in Computer and Information Science

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

This book constitutes the thoroughly refereed proceedings of the 16th National Conference, NASAC 2017, held in Harbin, China, in November 2017, and the 17th National Conference, NASAC 2018, held in Shenzhen, China, in November 2018.
The 6 revised papers from NASAC 2017 were selected from 17 submissions, and the 5 revised papers from NASAC 2018 were selected from 20 submissions. The papers focus on all aspects of software engineering, e.g., requirements engineering, software methodologies, software analytics, software testing and evolution, and empirical studies.

Table of contents

Frontmatter

Intelligent Software Engineering (NASAC 2017 English Track/CSBSE 2017)

Frontmatter

Learning to Generate Comments for API-Based Code Snippets

Abstract

Comments play an important role in software development. They not only improve the readability and maintainability of source code but also provide a significant resource for software reuse. However, much of the code in software projects lacks comments. Automatic comment generation has been proposed to address this issue. In this paper, we present an end-to-end approach that automatically generates comments for API-based code snippets. It takes API sequences as the core semantic representation of method-level API-based code snippets and generates comments from these sequences with sequence-to-sequence neural models. For evaluation, we extract 217K pairs of code snippets and comments from Java projects to construct the dataset. Our approach achieves a BLEU-4 score of 36.48% and an accuracy of 9.90% on the test set. Case studies on the generated comments further show that our approach produces reasonable and effective comments for API-based code snippets.

Yangyang Lu, Zelong Zhao, Ge Li, Zhi Jin
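
The reported BLEU-4 score can be made concrete with a small sketch. The Python below computes an unsmoothed, sentence-level BLEU-4 (clipped 1- to 4-gram precisions, uniform weights, brevity penalty) for one generated comment against one reference comment. It is an illustrative assumption, not the paper's evaluation script: the abstract does not state the exact variant (corpus-level or sentence-level, smoothing, tokenization), and the names bleu4 and ngrams are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Unsmoothed sentence-level BLEU-4 with uniform weights (0.25 per order)."""
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram counts at most as often as in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += 0.25 * math.log(matched / total)
    c, r = len(candidate), len(reference)
    # Brevity penalty: candidates shorter than (or equal to) the reference are penalized.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)
```

For example, the candidate "returns the number of lines" against the reference "returns the number of lines in file" has perfect n-gram precision but is shorter, so its score is exp(1 - 7/5), about 0.67. Because no smoothing is applied, any n-gram order with zero matches drives the score to 0, which is one reason smoothed variants exist for sentence-level scoring.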

Test Oracle Prediction for Mutation Based Fault Localization

Abstract

In software debugging, identifying the locations of faults effectively and accurately is both critical and difficult. Mutation-based fault localization (MBFL) is one of the most effective automated fault localization techniques proposed recently, and it requires the execution results (passed or failed) of test cases to locate faults. One problem preventing MBFL from becoming a practical testing technique is the large amount of human effort involved, i.e., the test oracle problem: checking the original program’s output for each test case. To mitigate this problem, we use mutant coverage information and learning algorithms to predict the oracles of the test cases. Empirical results show that the proposed method reduces the human cost of checking test oracles by 80% while achieving almost the same fault localization accuracy as the original MBFL.

Zheng Li, Yonghao Wu, Haifeng Wang, Yong Liu
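
To show where predicted oracles plug into MBFL, here is a minimal Python sketch. It assumes a Metallaxis-style scoring with the Ochiai formula: each mutant is scored from how many failing and passing tests kill it, and a statement takes the maximum score of its mutants. The pass/fail labels in outcomes could come from a human or, as in the paper, from a learned oracle predictor. The function names and data layout are hypothetical, and the paper's exact suspiciousness formula is not given in the abstract.

```python
import math


def ochiai(killed_failed, killed_passed, total_failed):
    """Ochiai suspiciousness of one mutant."""
    denom = math.sqrt(total_failed * (killed_failed + killed_passed))
    return killed_failed / denom if denom else 0.0


def mbfl_scores(mutants, kill_matrix, outcomes):
    """Rank statements by the suspiciousness of their mutants.

    mutants:     list of (mutant_id, statement_id) pairs
    kill_matrix: dict mutant_id -> set of test ids that kill the mutant
    outcomes:    dict test_id -> True if the test fails on the original program
                 (the test oracle; may be human-checked or predicted)
    Returns a dict statement_id -> max suspiciousness over its mutants.
    """
    total_failed = sum(outcomes.values())
    scores = {}
    for mutant_id, stmt in mutants:
        killed = kill_matrix.get(mutant_id, set())
        kf = sum(1 for t in killed if outcomes[t])  # killed by failing tests
        kp = len(killed) - kf                       # killed by passing tests
        scores[stmt] = max(scores.get(stmt, 0.0), ochiai(kf, kp, total_failed))
    return scores
```

If an oracle predictor mislabels a test, only the outcomes dictionary changes; the scoring step itself is untouched, which is why reducing oracle-checking effort need not change the localization pipeline.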

Parallel Evolutionary Algorithm in Scheduling Work Packages to Minimize Duration of Software Project Management

Abstract

The software project management problem mainly involves resource allocation and work package scheduling. This paper presents an approach to Search Based Software Project Management based on a parallel implementation of an evolutionary algorithm on the GPU. We redesigned the evolutionary algorithm for parallel programming. Our approach parallelizes the genetic operators, including crossover, mutation, and evaluation, in the evolution process to achieve faster execution. To evaluate our approach, we conducted a “proof of concept” empirical study using data from three real-world software projects. Both sequential and parallel versions of a conventional single-objective evolutionary algorithm are implemented: the sequential version is written in C++, and the parallel version uses GPGPU programming with CUDA. Results indicate that even a relatively cheap graphics card (GeForce GTX 970) can speed up the optimization process significantly. We believe that deploying a GPU-based parallel evolutionary algorithm may suit many other software project management problems, since software projects often have complex, inter-related work packages and resources and typically give rise to large-scale problems whose optimization ought to be accelerated by parallelism.

Jinghui Hu, Xu Wang, Jian Ren, Chao Liu
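
The evaluation step that the paper parallelizes can be illustrated with a hypothetical fitness function. The sketch assumes a common simplified formulation, not necessarily the paper's: a genome is a permutation of work packages, each package has an effort in days, packages are handed out in genome order to whichever of n_teams teams becomes free first, and fitness is the resulting project duration. Dependencies between work packages are ignored. The names project_duration and evaluate_population are assumptions.

```python
import heapq


def project_duration(order, effort, n_teams):
    """Project duration when packages in `order` go to the first team that becomes free.

    order:   a permutation of work-package indices (the genome)
    effort:  effort[wp] = days needed for work package wp (hypothetical unit)
    n_teams: number of teams working in parallel
    """
    free_at = [0.0] * n_teams  # min-heap: time at which each team becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for wp in order:
        start = heapq.heappop(free_at)  # earliest available team
        end = start + effort[wp]
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


def evaluate_population(population, effort, n_teams):
    # Each individual is scored independently; this is the step a CUDA version
    # could map to one GPU thread per individual. Here it runs sequentially.
    return [project_duration(ind, effort, n_teams) for ind in population]
```

Because evaluations share no state, the list comprehension can be replaced by any parallel map without changing results, which is the property the GPU approach relies on.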

Multi-gene Genetic Programming Based Defect-Ranking Software Modules

Abstract

Most software defect prediction models aim to predict the number of defects in a given software system. However, it is very difficult to predict the precise number of defects in a module because of noisy data. Another frequently used approach ranks software modules by their relative number of defects, so that defect prediction can guide testers to allocate limited resources preferentially to modules with more defects. Because software defect data sets contain redundant metrics, researchers typically need to reduce the dimensionality of the metrics before constructing defect prediction models. However, dimensionality reduction may discard useful information too early and consequently degrade the performance of the prediction model. In this paper, we propose an approach that uses multi-gene genetic programming (MGGP) to build a defect rank model. We compared the MGGP-based model with other optimized methods on 11 publicly available defect data sets covering several software systems. The fault-percentile-average (FPA) is used to evaluate the performance of MGGP and the other methods. The results show that, when constructing the defect rank, the models built with the MGGP approach for different test objects perform better than those based on other nonlinear prediction approaches. In addition, correlation between the software metrics does not affect the prediction performance. This means that, with the MGGP method, the original features can be used to construct a prediction model without considering the influence of correlations between software module features.

Junxia Guo, Yingying Duan, Ying Shang
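
FPA itself has a compact definition: rank the K modules by predicted defect count, compute for every m = 1..K the fraction of all actual defects contained in the top m modules, and average those fractions. A short Python sketch of that definition follows. The function name is hypothetical, and breaking ties in the predicted counts by Python's stable sort is an assumption (tie handling can shift FPA slightly).

```python
def fault_percentile_average(predicted, actual):
    """Fault-percentile-average of a defect ranking.

    predicted: predicted defect counts (or scores) per module
    actual:    actual defect counts per module, same order
    Returns the mean, over m = 1..K, of the share of all actual defects
    found in the m modules ranked highest by `predicted`.
    """
    k = len(actual)
    total = sum(actual)
    if k == 0 or total == 0:
        raise ValueError("need at least one module and at least one actual defect")
    # Highest predicted first; ties keep input order (stable sort).
    ranked = sorted(range(k), key=lambda i: predicted[i], reverse=True)
    cumulative, fpa_sum = 0, 0.0
    for i in ranked:
        cumulative += actual[i]
        fpa_sum += cumulative / total
    return fpa_sum / k
```

For example, with actual defects [0, 1, 3], a perfect ranking scores 11/12 while the reversed ranking scores 5/12; higher is better.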

Call Graph Based Android Malware Detection with CNN

Abstract

With the growing amount of Android malware, detecting malicious APKs is becoming more important. Based on static analysis, we propose a new perspective for detecting malicious behaviors. In particular, we extract patterns of suspicious APIs that are invoked together; we call these patterns local features. We propose a convolutional neural network (CNN) model based on an APK’s call graph to extract local features. Comparative detection experiments demonstrate that local features indeed help to detect malicious APKs and that our model is effective at extracting them.

Yuxuan Liu, Ge Li, Zhi Jin
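
One way to read "suspicious APIs which are invoked together" is as co-invocation within a single caller in the call graph. The Python below is a hypothetical sketch of that reading: given caller-to-callee edges and a set of suspicious API names, it counts pairs of suspicious APIs that the same method calls directly. How the paper actually encodes local features for its CNN is not described in the abstract, and the API names in the test data are only illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_invoked_pairs(call_edges, suspicious):
    """Count pairs of suspicious APIs called directly by the same method.

    call_edges: iterable of (caller, callee) edges from a static call graph
    suspicious: set of API names considered suspicious (hypothetical list)
    Returns a Counter mapping sorted (api_a, api_b) pairs to occurrence counts.
    """
    callees = defaultdict(set)
    for caller, callee in call_edges:
        if callee in suspicious:
            callees[caller].add(callee)
    pairs = Counter()
    for apis in callees.values():
        # Sort so that (a, b) and (b, a) are counted as the same pattern.
        pairs.update(combinations(sorted(apis), 2))
    return pairs
```

A pair count like this is a flat, order-free summary; the paper's CNN presumably learns richer local structure from the call graph itself.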

Software Requirements Elicitation Based on Ontology Learning

Abstract

User demand is the key to software development. A domain ontology built with artificial intelligence techniques can describe the relationships between concepts in a specific domain, which helps users and developers reach a shared understanding of those concepts. This paper uses ontology learning to extract concepts, so that the ontology can be constructed semi-automatically or automatically. Because traditional weight calculation methods ignore the distribution of feature items, we introduce information entropy and further integrate the CCM method. This method can improve the degree of automation of ontology construction and make users’ software requirements more accurate and complete.

Jie Zhang, Min Yuan, Zhiqiu Huang
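
The abstract does not give the weighting formula, so the following is only a hypothetical sketch of how information entropy can capture the distribution of a feature item. The normalized Shannon entropy of a term's counts across documents is 1 when the term is spread evenly and 0 when it occurs in a single document, and the raw term frequency is scaled by one minus that entropy so that concentrated terms keep more weight. The CCM method is not modeled, and both function names are assumptions.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of a term's per-document counts, scaled to [0, 1]."""
    total = sum(counts)
    n = len(counts)
    if total == 0 or n < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(n)  # divide by the maximum possible entropy, log(n)


def entropy_weight(tf, counts):
    # Hypothetical combination: evenly spread terms (entropy near 1) are down-weighted.
    return tf * (1.0 - normalized_entropy(counts))
```

Other combinations (for example multiplying TF-IDF rather than raw frequency) are equally plausible; the point is only that the distribution of a term, not just its frequency, enters the weight.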

Software Mining (NASAC 2018 English Track)

Frontmatter

An Empirical Study of Link Sharing in Review Comments

Abstract

In pull-based development, developers sometimes exchange review comments and share links, namely Uniform Resource Locators (URLs). Links refer to related information on different websites, which may benefit pull request evaluation. Nevertheless, little effort has been devoted to analyzing how links are shared and whether sharing links has any impact on code review in GitHub. In this paper, we conduct a study of link sharing in review comments. We collect 114,810 pull requests and 251,487 review comments from 10 popular projects in GitHub. We find that, on average, 5.25% of pull requests have links in review comments. We divide links into two types: internal links, which point to content in the same project, and external links, which point to content outside the project. We observe that 51.49% of links are internal, while 48.51% are external. The majority of internal links point to pull requests or blobs inside the projects. We further study the impact of links. Results show that pull requests with links in review comments have more comments, more commenters, and longer evaluation time than pull requests without links. These findings show that developers indeed share links and refer to related information in review comments. These results can inspire future studies that enable more effective information sharing in the open source community and improve information accessibility and navigability for software developers.

Jing Jiang, Jin Cao, Li Zhang
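
The internal/external split can be sketched as a URL check. This Python example assumes a link is internal when it points to github.com/<owner>/<repo>/... for the project under study and external otherwise; the paper's exact rule, its URL extraction pattern, and the project names in the test data are assumptions. The third path segment (for example pull or blob) could be inspected the same way to reproduce finer categories such as links to pull requests or blobs.

```python
import re
from urllib.parse import urlparse

# Hypothetical URL extractor: stops at whitespace and common closing delimiters.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def classify_links(comment, owner, repo):
    """Split URLs in a review comment into internal (same GitHub project) and external."""
    internal, external = [], []
    for url in URL_PATTERN.findall(comment):
        parsed = urlparse(url)
        parts = [p for p in parsed.path.split("/") if p]
        same_project = (
            parsed.netloc.lower() in ("github.com", "www.github.com")
            and len(parts) >= 2
            and parts[0].lower() == owner.lower()
            and parts[1].lower() == repo.lower()
        )
        (internal if same_project else external).append(url)
    return internal, external
```

Under this rule, a link to another repository on github.com counts as external, which matches the abstract's "outside of the project" wording.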

Changes Are Similar: Measuring Similarity of Pull Requests That Change the Same Code in GitHub

Abstract

Pull-based development is widely used on globally collaborative platforms such as GitHub and BitBucket. A pull request is a set of changes to existing source code in a project; a developer submits a pull request to propose updates to the source code. Because development proceeds in parallel, several developers may submit multiple pull requests that change the same lines of code. This leads to conflicts between changes and makes it difficult for the project manager to decide which pull request should be merged. In this paper, we conducted a preliminary study on measuring the similarity of pull requests that aim to change the same code in GitHub. We proposed two methods, i.e., cosine similarity and doc2vec, to quantify the structural and semantic similarity between pull requests, and evaluated the similarity on four widely studied open source Java projects. Our study shows that there is indeed high similarity between competing pull requests and that the similarity varies across projects. This complicates merging decisions by project managers.

Ping Ma, Danni Xu, Xin Zhang, Jifeng Xuan
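
Cosine similarity over token counts is one way to measure the structural similarity of two pull requests. The Python sketch below assumes each pull request is represented by the word tokens of its changed lines (a hypothetical choice; the paper's exact representation is not stated) and computes the cosine of the angle between the two count vectors. The doc2vec-based semantic similarity is not shown, and the function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(changed_lines):
    """Split changed source lines into word tokens (hypothetical representation)."""
    return [tok for line in changed_lines for tok in re.findall(r"\w+", line)]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-tokens count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

For instance, the changed lines "if (x == null) return;" and "if (y == null) return;" share 3 of their 4 tokens each and score 0.75, even though they guard different variables.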

Contiguous Sequence Mining Approach for Program Procedure Pattern

Abstract

Program procedure patterns arise in the software development process. This paper presents a program procedure pattern mining approach based on MCSPAN (Maximal Contiguous Sequential pattern mining). First, program structure features are mined: structure feature mining is transformed into a frequent sequence mining problem with contiguity and maximality constraints, and the MCSPAN algorithm is given to obtain candidate program structure patterns. Then, a filtering algorithm constrained by data flow features filters the structure candidate patterns, yielding program structure relationship candidate patterns that have the form of program procedure patterns. To clarify function semantics, heuristic rules are applied when filtering the structure relationship candidate patterns, and the program procedure patterns are finally obtained. By mining more than 100,100 lines of Java code, about 180 kinds of program procedure patterns are discovered. An analysis of recall and precision in the experiment shows that the mining approach is effective, and the availability of the mined program procedure patterns is also analyzed.

Jianbin Liu, Jingjing Zhao, Liwei Zheng
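
The contiguity and maximality constraints can be illustrated with a brute-force Python sketch. It is not the MCSPAN algorithm, which the abstract does not detail. It enumerates every contiguous subsequence, counts in how many input sequences each occurs (its support), keeps those meeting min_support, and returns only the maximal ones, i.e., those not contained contiguously in a longer frequent pattern. The input sequences (for example, statement kinds per method) and all names are hypothetical.

```python
from collections import Counter


def contiguous_subsequences(seq):
    """All distinct contiguous subsequences of `seq`, as tuples."""
    return {tuple(seq[i:j]) for i in range(len(seq)) for j in range(i + 1, len(seq) + 1)}


def is_contiguous_in(small, big):
    """True if tuple `small` occurs as a contiguous run inside tuple `big`."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


def maximal_contiguous_patterns(sequences, min_support, min_length=1):
    support = Counter()
    for seq in sequences:
        # A set per sequence, so each sequence adds at most 1 to a pattern's support.
        support.update(contiguous_subsequences(seq))
    frequent = [p for p, s in support.items() if s >= min_support and len(p) >= min_length]
    # Maximality: drop patterns contained contiguously in a longer frequent pattern.
    return {p for p in frequent
            if not any(len(q) > len(p) and is_contiguous_in(p, q) for q in frequent)}
```

With the sequences open-read-close, open-read-log-close, and open-read-close-exit, a support threshold of 2 yields only (open, read, close), while a threshold of 3 yields (open, read) and (close,). The enumeration is quadratic in sequence length, so a dedicated algorithm is needed at the scale of the paper's experiment.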

Mining the Contributions Along the Lifecycles of Open-Source Projects

Abstract

Recently, the impact of developers’ behavior on the evolution of open-source software (OSS) has become a hot topic. When do developers commit their code? Is there any regularity in the time distribution of commits along the lifecycles of open-source projects? Does a change of core members in a development team have an impact on the software evolution process? To answer these questions, we conducted an empirical study. We collect more than 50,000 commits from 6 open-source projects on GitHub and design a formula to measure each contributor’s contribution value. We then conduct four major experiments to analyze issues concerning inert intervals and the impact of changes of main contributors on software evolution. To make the results visible, we also design a tool that automatically mines metadata from a specified repository and presents it graphically. The experiments yield some interesting findings; for example, there is no necessary statistical connection between a contributor’s inert interval and his or her contribution value, and changes of main contributors have a huge impact on software evolution. We believe these findings will be significant for deeper research in the future.

Hang Zhou, Lei Xu, Yanhui Li

Issue Workflow Explorer

Abstract

Resolving issues is an essential part of Free/Libre and Open Source Software (FLOSS) development. For large and active projects, hundreds of new issues of mixed quality may be reported every month. To deal with this complexity, projects have developed different protocols for resolving issues (i.e., issue workflows). To help understand existing practice and develop best practice, it is important to explore how workflows evolve over time, e.g., under what circumstances a particular workflow emerges, how efficient and effective it is, and whether it can be improved. We built Issue Workflow Explorer (IWE) to help practitioners answer these questions. Based on the ubiquitous records in issue tracking systems, IWE provides functionality for discovering workflows and for quantifying, visualizing, and comparing their efficiency and effectiveness. We demonstrate IWE’s effectiveness on two large OSS projects, Mozilla and GNOME, exploring the workflows used for issue triaging and for handling incomplete issues. We obtain helpful insights for future development, e.g., triage conducted by reporters themselves should be restricted, and it is not cost-effective to keep incomplete issue reports open.

The source code of IWE is available at https://github.com/johnarseal/IWE.

Jiaxin Zhu, Zhen Zhong, Minghui Zhou
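
Discovering workflows from issue-tracker records can be approximated as extracting each issue's sequence of status changes and counting the distinct sequences. The Python sketch below is a hypothetical illustration, not IWE's implementation: events are assumed to be (issue_id, timestamp, status) tuples, repeated consecutive statuses are collapsed, and the status names in the test data merely resemble Bugzilla values.

```python
from collections import Counter, defaultdict


def discover_workflows(events):
    """Count distinct status paths (workflow variants) across issues.

    events: iterable of (issue_id, timestamp, status) records
    Returns a Counter mapping each status path (a tuple) to the number of issues following it.
    """
    history = defaultdict(list)
    for issue_id, timestamp, status in events:
        history[issue_id].append((timestamp, status))
    variants = Counter()
    for changes in history.values():
        path = []
        # Order by timestamp only; equal timestamps keep their input order.
        for _, status in sorted(changes, key=lambda e: e[0]):
            if not path or path[-1] != status:  # collapse repeated identical statuses
                path.append(status)
        variants[tuple(path)] += 1
    return variants
```

Comparing the frequencies of such paths across time windows is one simple way to see a workflow emerge or fade; measuring efficiency would additionally require the time spent between status changes.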

Backmatter

Title: Software Engineering and Methodology for Emerging Domains
Edited by: Zheng Li, He Jiang, Ge Li, Minghui Zhou, Prof. Ming Li
Publisher: Springer Singapore
Electronic ISBN: 978-981-15-0310-8
Print ISBN: 978-981-15-0309-2
DOI: https://doi.org/10.1007/978-981-15-0310-8