A large-scale study on the usage of Java’s concurrent programming constructs
Introduction
Multicore systems offer the potential for cheap, scalable, high-performance computing and also for significant reductions in power consumption. To achieve this potential, it is essential to take advantage of new heterogeneous architectures comprising collections of multiple processing elements. To leverage multicore technology, applications must be concurrent, which poses a challenge, since it is well known that concurrent programming is hard (Sutter, 2005). A number of programming languages provide constructs for concurrent programming. These constructs vary greatly in terms of abstraction, error-proneness, and performance. The Java programming language is particularly rich when it comes to concurrent programming constructs. For example, it includes the concept of a monitor, a low-level mechanism supporting both mutual exclusion and condition-based synchronization, as well as a high-level library (Lea, 2005), java.util.concurrent, also known as j.u.c., introduced in version 1.5 of the language.
In both academia and industry, there is a strong belief that multicore technology will radically change the way software is built. However, to the best of our knowledge, there is a lack of reliable information about the current state of the practice of the development of concurrent software in terms of the constructs that developers employ. In this work, we aim to partially fill this gap.
Specifically, we present an empirical study aimed at establishing the current state of the practical usage of concurrent programming constructs in Java applications. We have analyzed 2227 stable and mature Java projects comprising more than 600 million lines of code (LoC, excluding blank lines and comments) from SourceForge, one of the most popular open source code repositories. Our analysis encompasses several versions of these applications and is based on more than 50 source code metrics that we have automatically collected. We have also studied correlations among some of these metrics in an attempt to find trends in the use of concurrent programming constructs. We have chosen Java because it is a widely used object-oriented programming language. Moreover, as noted above, it includes support for multithreading with both low-level and high-level mechanisms. Additionally, it is the language with the highest number of projects in SourceForge.
Evidence on how concurrent programs are written can raise developer awareness about available mechanisms. It can also indicate how well-accepted some of these mechanisms are in practice. Moreover, it can inform researchers designing new mechanisms about the kinds of constructs that developers may be more willing to use. Tool vendors can also benefit by supporting developers in the use of lesser-known, more efficient mechanisms, for example, by implementing novel refactorings (Dig, Marrero, Ernst, 2009; Ishizaki, Daijavad, Nakatani, 2011; Schäfer, Sridharan, Dolby, Tip, 2011a). Furthermore, results such as those uncovered by this study can help lecturers convince students of the importance of concurrent programming, not only for the future of software development, but also for the present.
Mining data from the SourceForge repository poses several challenges. Some of them are inherent to the process of obtaining reliable data. These derive mainly from two factors: scale and lack of a standard organization for source code repositories. Others pertain to transforming the data into useful information. Grechanik et al. (2010) discussed a few challenges that make it difficult to obtain evidence from source code. For example, getting the source code of all software versions is difficult because there is no naming pattern to define if a compressed file contains source code, binary code or something else. Furthermore, it is difficult to be sure that an error has not occurred during measurement, due to the number of projects and project versions. We address these challenges by creating an infrastructure for obtaining and processing large code bases, specifically targeting SourceForge. In addition, we have conducted a survey with the committers of some of these projects as an attempt to verify whether their beliefs are supported by our data.
Based on the data we have obtained, we propose to answer a number of research questions (RQ).
We found that more than 75% of the most recent versions of the examined projects include some form of concurrent programming, e.g., at least one occurrence of the synchronized keyword. In medium projects (20,001–100,000 LoC) this percentage grows to more than 90% and reaches 100% for large projects (over 100,000 LoC). In addition, the mean numbers (per 100,000 LoC) of synchronized methods, classes extending Thread, and classes implementing Runnable are, respectively, 66.75, 13, and 13.85. These results indicate that projects often use concurrent programming constructs and a considerable number do so intensively. On the other hand, perhaps counterintuitively, the overall percentage of concurrent projects has not seen significant change throughout the years, despite the pervasiveness of multicore machines.
Our data shows that only 23.21% of the analyzed concurrent projects employ classes of the java.util.concurrent library. On the other hand, there has been a growth in the adoption of this library. However, this growth does not in general seem to be related to a decrease in the use of Java’s traditional concurrent programming constructs, with a few exceptions. Furthermore, projects that have been in active development more recently, i.e., had at least one version released since 2009, employ the java.util.concurrent library more intensively than the mean. Therefore, the percentage of active, mature projects that use that library is actually higher than 23.21%.
Most of the projects use synchronized blocks and methods. The volatile modifier, explicit locks (including variations such as read-write locks), and atomic variables are less common, albeit some of them seem to be growing in popularity. We also noticed a tendency of growth in the use of synchronized blocks. In particular, the growth in their use correlates positively with the growth in the use of atomic data types, explicit locks, and the volatile modifier.
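To make the constructs named above concrete, the following is a minimal sketch, not code taken from any analyzed project. The class CounterStyles and all of its members are hypothetical names, and the lambda syntax assumes Java 8 or later, which is newer than many of the project versions in the study. It protects three counters with a synchronized block, an explicit ReentrantLock, and an AtomicInteger, and uses a volatile field only as a visibility flag.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class; none of these names come from the study.
class CounterStyles {
    private final Object monitor = new Object();
    private int syncCount = 0;                       // guarded by 'monitor'

    private final ReentrantLock lock = new ReentrantLock();
    private int lockCount = 0;                       // guarded by 'lock'

    private final AtomicInteger atomicCount = new AtomicInteger();

    // volatile guarantees visibility of the latest write,
    // but it does not make read-modify-write sequences atomic.
    private volatile boolean finished = false;

    boolean isFinished() {
        return finished;
    }

    void incrementAll() {
        synchronized (monitor) {                     // synchronized block
            syncCount++;
        }
        lock.lock();                                 // explicit lock
        try {
            lockCount++;
        } finally {
            lock.unlock();                           // always release in finally
        }
        atomicCount.incrementAndGet();               // atomic variable
    }

    // Starts 'threads' workers, each calling incrementAll() 'perThread' times,
    // and returns {syncCount, lockCount, atomicCount} once all have finished.
    static int[] run(int threads, int perThread) {
        CounterStyles c = new CounterStyles();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    c.incrementAll();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();                            // join makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        c.finished = true;
        return new int[] { c.syncCount, c.lockCount, c.atomicCount.get() };
    }
}
```

Calling CounterStyles.run(4, 1000) should yield 4000 for each of the three counters, since every increment is protected by one of the mechanisms. The sketch says nothing about their relative performance.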
We found that implementing the Runnable interface is the most common approach to defining new threads. Moreover, a considerable number of projects employ Executors to manage thread execution (11.14% of the concurrent projects). We also observed that projects that employ executors exhibit a weak tendency to reduce the number of classes that explicitly extend the Thread class.
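The three approaches mentioned above can be contrasted in a short sketch. It is illustrative only: ThreadCreationStyles, Worker, and Task are hypothetical names, and the pool size of 2 and the 10-second timeout are arbitrary assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; names are illustrative only.
class ThreadCreationStyles {

    // Style 1: a class that extends Thread.
    static class Worker extends Thread {
        private final AtomicInteger done;
        Worker(AtomicInteger done) { this.done = done; }
        @Override public void run() { done.incrementAndGet(); }
    }

    // Style 2: a class that implements Runnable.
    static class Task implements Runnable {
        private final AtomicInteger done;
        Task(AtomicInteger done) { this.done = done; }
        @Override public void run() { done.incrementAndGet(); }
    }

    // Runs one Worker, one Task on a plain Thread, and two Tasks on an
    // executor; returns how many of them completed.
    static int runAll() {
        AtomicInteger done = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(2);  // arbitrary pool size
        try {
            Thread subclassed = new Worker(done);
            Thread wrapped = new Thread(new Task(done));
            subclassed.start();
            wrapped.start();
            subclassed.join();
            wrapped.join();

            // Style 3: the executor owns the threads; callers only submit work.
            pool.execute(new Task(done));
            pool.execute(new Task(done));
            pool.shutdown();
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate in time");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done.get();
    }
}
```

The same Task class works both on a raw Thread and on an executor, which is one plausible reason a Runnable-based design makes a later move to executors easier than a design based on Thread subclasses.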
We observed that developers still mostly use Hashtable and HashMap, even though the former is thread-safe but inefficient and the latter is not thread-safe. Nevertheless, there is a tendency towards the use of ConcurrentHashMap as a replacement for other associative data structures in a number of projects.
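As a rough illustration of why ConcurrentHashMap can replace the other two maps, the sketch below counts events from several threads without external locking. SharedMapSketch and the key names are hypothetical, and the merge method assumes Java 8 or later.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class; names are illustrative only.
class SharedMapSketch {

    // Each of 'threads' workers records 'perThread' hits spread over four keys.
    static Map<String, Integer> countConcurrently(int threads, int perThread) {
        ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // merge performs the read-modify-write atomically per key;
                    // the equivalent get-then-put on a plain HashMap would race.
                    hits.merge("key" + (j % 4), 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits;
    }
}
```

Hashtable synchronizes each individual method call, so a compound get-then-put sequence on it still needs an enclosing synchronized block to be atomic. The atomic per-key update shown here is one design reason to prefer ConcurrentHashMap for shared counters.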
A large number of concurrent projects include invocations of the notify(), notifyAll(), or wait() methods. At the same time, we noticed that a small number of projects have eliminated many uses of these methods, employing the CountDownLatch class, part of the java.util.concurrent library, instead. This number is not large enough for statistical analysis. Nevertheless, it indicates that mechanisms with simple semantics like CountDownLatch have potential to, in some contexts, replace lower-level, more traditional ones.
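The kind of replacement described above can be sketched as follows. This is an assumed, simplified scenario (waiting for a group of workers to finish), not a refactoring observed in a specific project; WaitForWorkers and MonitorLatch are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical example class; names are illustrative only.
class WaitForWorkers {

    // Low-level version: a hand-written latch built on wait()/notifyAll().
    static class MonitorLatch {
        private int remaining;
        MonitorLatch(int count) { remaining = count; }

        synchronized void countDown() {
            if (remaining > 0 && --remaining == 0) {
                notifyAll();                 // wake every thread blocked in await()
            }
        }

        synchronized void await() throws InterruptedException {
            while (remaining > 0) {          // loop guards against spurious wakeups
                wait();
            }
        }

        synchronized int remaining() { return remaining; }
    }

    // Waits for 'workers' threads using the monitor-based latch.
    static int withMonitor(int workers) {
        MonitorLatch latch = new MonitorLatch(workers);
        for (int i = 0; i < workers; i++) {
            new Thread(latch::countDown).start();
        }
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return latch.remaining();
    }

    // Same behavior with java.util.concurrent.CountDownLatch.
    static long withCountDownLatch(int workers) {
        CountDownLatch latch = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            new Thread(latch::countDown).start();
        }
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return latch.getCount();
    }
}
```

Both methods return 0 once all workers have counted down. The library version removes the hand-written condition loop and the explicit synchronized methods, which is where wait()/notifyAll() code is easy to get wrong.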
Our data indicates that fewer than 3% of the concurrent projects implement the Thread.UncaughtExceptionHandler interface, which means that, in more than 97% of the concurrent projects, an exception stemming from a programming error might cause threads to die silently, potentially affecting the behavior of threads that interact with them. Moreover, analyzing these implementations, we discovered that developers often do not know what to do with uncaught exceptions in threads, even when they do implement a handler. This provides some indications that new exception handling mechanisms that explicitly address the needs of concurrent applications are called for.
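For readers unfamiliar with the interface, a minimal sketch of a handler follows. HandlerSketch, RecordingHandler, the thread name, and the exception message are all hypothetical; the handler merely records the failure, which is only one of many possible policies and not one the study recommends.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical example class; names are illustrative only.
class HandlerSketch {

    // A handler that records which thread failed and why.
    static class RecordingHandler implements Thread.UncaughtExceptionHandler {
        final AtomicReference<String> lastFailure = new AtomicReference<>();

        @Override
        public void uncaughtException(Thread thread, Throwable error) {
            lastFailure.set(thread.getName() + ": " + error.getMessage());
        }
    }

    // Starts a thread that fails with an unchecked exception and returns
    // what the handler recorded.
    static String runFailingThread() {
        RecordingHandler handler = new RecordingHandler();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");   // simulated programming error
        }, "worker-1");
        worker.setUncaughtExceptionHandler(handler);
        worker.start();
        try {
            worker.join();                             // the handler runs on 'worker' before it terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handler.lastFailure.get();
    }
}
```

Here the handler is installed per thread with setUncaughtExceptionHandler. Thread.setDefaultUncaughtExceptionHandler is the process-wide alternative when the same policy should apply to every thread.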
To provide a basic intuition as to what developers believe to be true about the usage of concurrent programming constructs, we have also conducted a survey with more than 160 software developers. These developers are all committers of projects whose source code we have analyzed. This survey presented respondents with various questions, such as “What do you believe to be the most often used concurrent/parallel programming construct of the Java language?”. Throughout the paper, we contrast the results of this survey with data obtained by analyzing the Java source code.
This work makes the following contributions:
- It is the first large-scale study on the usage of concurrent programming constructs in the Java language, including an analysis of how the usage of these constructs has evolved over time.
- It presents a considerable amount of data pertaining to the current state-of-the-practice of real concurrent projects and the evolution of these projects over time.
- It presents results from a survey conducted with committers of some of the analyzed projects. This survey provides an overview of the perception of developers about the use of concurrent programming constructs.
The rest of the paper is organized as follows: Section 2 presents some background on concurrent programming in Java. Section 3 describes our survey setup and some initial results. Next, in Section 4, we describe the infrastructure we employed to download and extract the analyzed data. In Section 5 we present the results of our study organized in terms of the research questions. We then present the threats to the validity of this work in Section 6 and some implications in Section 7. Section 8 is dedicated to related work. Finally, in Section 9, we present our conclusions and discuss future directions.
Background
Before presenting our study, we provide a brief background on concurrent programming. A detailed presentation about concurrent programming concepts is available elsewhere (Tanenbaum, 2008).
Generally speaking, processes and threads are the main abstractions of concurrent programming. A process is a container that keeps all the information needed to run a program, for instance, the memory location where the process can read and write data. A thread, on the other hand, can be seen as a lightweight
Survey
We have conducted a survey with programmers in order to gather information about the perception of developers about the usage of concurrent programming constructs in Java. Using this information we can check whether the intuition of these developers is reflected by the source code of real systems. The questionnaire was designed according to the recommendations of Kitchenham and Pfleeger (2008), following the phases prescribed by the authors: planning, creating the questionnaire, defining the target
Study setting
This section describes the configuration of our study: our basic assumptions, our mining infrastructure, and the metrics suite that we employed.
We have built a set of tools to download projects from SourceForge, analyze the source code, and collect metrics from these projects. It comprises a crawler, a metrics collection tool, and some auxiliary shell scripts. We call this infrastructure Groundhog. Fig. 1 depicts the infrastructure we employed. Initially, the crawler populates the project
Study results
This section presents the results of our study. We organized the results in terms of the research questions.
Threats to validity
In a study such as this, there are always many limitations and threats to validity. First, to download the source code of the projects, we assumed that the source files were packaged in a file with the keywords “src” or “source” in its name. This is common practice in open source repositories. Nonetheless, it is not a rule and some projects are bound to adopt different naming conventions. We have ignored such projects. Moreover, obtaining the release date of some project versions was not
Study implications
This research has implications for different kinds of stakeholders. Five of these possible groups are discussed below.
Developers: Developers are now facing the problem of developing concurrent applications with more frequency, while keeping cost as low as possible and quality as high as possible. The results of our study provide some assistance to these developers. First, by showing that concurrent programming is already in widespread use and that they cannot ignore it (RQ1). Second, by
Related work
This section discusses related research.
Conclusion
This paper presents an empirical study of a large-scale Java open source repository. We found that developers employ mainly simple mutual exclusion constructs. These constructs are easy to understand (though difficult to reason about) and have been available in Java since its initial version, released more than 15 years ago. Almost 80% of the concurrent projects include at least one synchronized method. Still, less than 25% of the projects employ the abstractions implemented by the
Acknowledgments
We would like to thank the anonymous reviewers for their helpful comments. Fernando is partially supported by CNPq/Brazil (304755/2014-1, 487549/2012-0 and 477139/2013-2), FACEPE/Brazil (APQ-0839-1.03/14) and INES (CNPq 573964/2008-4, FACEPE APQ-1037-1.03/08, and FACEPE APQ-0388-1.03/14). Any opinions expressed here are those of the authors and do not necessarily reflect the views of the sponsors.
Gustavo Pinto is a Postdoctoral Researcher at Federal University of Pernambuco (UFPE). He received his M.Sc. degree in Computer Science from Federal University of Paraná (UFPR), and his Ph.D. from Federal University of Pernambuco (UFPE). His research interests include performance and energy consumption, concurrent programming, social aspects of software engineering, big data analytics, and refactoring.
References
The java.util.concurrent synchronizer framework. Sci. Comput. Program. (2005)
Understanding the shape of Java software. SIGPLAN Not. (2006)
Concurrent programming with revisions and isolation types. Proceedings of OOPSLA’2010, Reno, USA (2010)
Javelin: Internet-based parallel computing using Java. Concurr. Pract. Exp. (1997)
An empirical study of Java bytecode programs. Softw. Pract. Exp. (2007)
How do programs become more concurrent? A story of program transformations. Technical Report (2008)
Refactoring sequential Java code for concurrency via concurrent libraries. Proceedings of the 31st International Conference on Software Engineering, Vancouver, Canada (2009)
Boa: a language and infrastructure for analyzing ultra-large-scale software repositories. ICSE’13: 35th International Conference on Software Engineering (2013)
Jpvm: network parallel computing in Java. ACM 1998 Workshop on Java for High-Performance Network Computing (1997)
Micro patterns in Java code. SIGPLAN Not. (2005)
Java Concurrency in Practice
An empirical investigation into a large-scale Java open source code repository. Proceedings of the 4th International Symposium on Empirical Software Engineering and Measurement, Bolzano-Bozen, Italy
Encapsulating objects with confined types. ACM Trans. Program. Lang. Syst.
The Art of Multiprocessor Programming
R: A language for data analysis and graphics. J. Comput. Graph. Stat.
Refactoring Java programs using concurrent libraries. Proceedings of the Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging
Personal opinion surveys
Have things changed now? An empirical study of bug characteristics in modern open source software. Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability
Check-then-act misuse of Java concurrent collections. Proceedings of the 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation
Retrofitting concurrency for Android applications through refactoring. Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering
Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. SIGOPS Oper. Syst. Rev.
Weslley Torres received his M.Sc. degree in Computer Science from Federal University of Pernambuco (UFPE) and is now a Ph.D. student in computer science, also at UFPE. His research interests cover concurrent programming and software evolution.
Benito Fernandes is an M.Sc. student in Computer Science at Universidade Federal de Pernambuco (UFPE). His research interests cover concurrent programming, energy efficiency and software evolution.
Fernando Castor is an assistant professor at the Universidade Federal de Pernambuco (UFPE), Brazil. His research aims to support developers in the construction of large-scale, dependable software systems, with a particular emphasis on error handling, concurrent programming, energy efficiency, and software evolution.
Roberto S. M. Barros received his B.Sc. and M.Sc. degrees in Computer Science from Universidade Federal de Pernambuco (UFPE), Brazil, in 1985 and 1988, respectively, and his Ph.D. degree in Computing Science from The University of Glasgow, Scotland (UK) in 1994. From 1985 to 1995 he worked as a systems analyst at UFPE, and since 1995 he has been a full-time Professor and Researcher, also at UFPE. His main research areas are software engineering, programming languages, XML, and machine learning from data streams with concept drift.