DOI: 10.1145/2568225.2568295
research-article

Mining billions of AST nodes to study actual and potential usage of Java language features

Published: 31 May 2014

ABSTRACT

Programming languages evolve over time, adding language features to simplify common tasks and make the language easier to use. For example, the Java Language Specification has had four editions, and a fifth is currently being drafted. Although new language features are added in response to an assumed need in the community (often following direct requests for such features), there is little empirical evidence of how developers adopt these features once they are released. In this paper, we analyze over 31k open-source Java projects representing over 9 million Java files, which when parsed contain over 18 billion AST nodes. We analyze this corpus to find uses of new Java language features over time. Our study yields several insights: there are millions of places where features could have been used but were not; developers convert existing code to use new features; and we found thousands of instances of potential resource handling bugs.

    • Published in

      ICSE 2014: Proceedings of the 36th International Conference on Software Engineering
      May 2014
      1139 pages
      ISBN:9781450327565
      DOI:10.1145/2568225

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 276 of 1,856 submissions, 15%
