research-article

Mining billions of AST nodes to study actual and potential usage of Java language features

Authors:
Robert Dyer, Iowa State University, USA
Hridesh Rajan, Iowa State University, USA
Hoan Anh Nguyen, Iowa State University, USA
Tien N. Nguyen, Iowa State University, USA

ICSE 2014: Proceedings of the 36th International Conference on Software Engineering, May 2014, Pages 779–790, https://doi.org/10.1145/2568225.2568295

Published: 31 May 2014

ABSTRACT

Programming languages evolve over time, adding language features to simplify common tasks and make the language easier to use. For example, the Java Language Specification has four editions, and a fifth is currently being drafted. While the addition of language features is driven by an assumed need in the community (often with direct requests for such features), there is little empirical evidence demonstrating how developers adopt these new features once they are released. In this paper, we analyze over 31k open-source Java projects representing over 9 million Java files, which when parsed contain over 18 billion AST nodes. We analyze this corpus to find uses of new Java language features over time. Our study yields several insights: there are millions of places where features could have been used but were not; developers convert existing code to use new features; and we found thousands of instances of potential resource handling bugs.
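To make "potential usage" concrete, the following is a minimal sketch. The abstract does not name the features studied, so try-with-resources (added in Java 7) is used here as an illustrative assumption, not a feature confirmed by the abstract. The class and method names are hypothetical. The first method shows manual resource handling of the kind that could be converted; the second shows the same logic written with the newer feature.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical example class; not taken from the paper.
class ResourceHandlingSketch {

    // Older style: the reader is closed by hand in a finally block.
    // Omitting the finally block or the null check is the kind of
    // mistake that can leave a resource unclosed.
    static String firstLineLegacy(String text) {
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new StringReader(text));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException ignored) {
                    // Nothing sensible to do if close() itself fails here.
                }
            }
        }
    }

    // Java 7 style: try-with-resources closes the reader automatically,
    // whether readLine() returns normally or throws.
    static String firstLineTryWithResources(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLineLegacy("alpha\nbeta"));
        System.out.println(firstLineTryWithResources("alpha\nbeta"));
    }
}
```

Both methods return the same result for the same input. The second is shorter and leaves no close call to forget, which is why a method shaped like the first can be counted as a place where the newer feature could have been used.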

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features

Published in
ICSE 2014: Proceedings of the 36th International Conference on Software Engineering
May 2014
1139 pages
ISBN: 9781450327565
DOI: 10.1145/2568225
General Chair: Pankaj Jalote, IIIT-Delhi, India
Program Chairs: Lionel Briand, University of Luxembourg, Luxembourg; André van der Hoek, University of California, Irvine, USA
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Java
empirical study
language feature use
software mining
Acceptance Rates
Overall Acceptance Rate: 276 of 1,856 submissions, 15%
