ABSTRACT
The systematic evaluation of program analyses and software-engineering tools requires benchmark suites that are representative of real-world projects in the domains for which the tools or analyses are designed. Such benchmarks currently exist for only a few research areas, and even where they exist they are often not effectively maintained, due to the manual effort required. As a consequence, it is often impossible to evaluate new analyses and tools on software that relies on current technologies. We describe ABM, a methodology for semi-automatically mining software repositories to extract up-to-date, representative sets of applications belonging to specific domains. The methodology facilitates the creation of such collections and makes it easier to release updated versions of a benchmark suite. As an instantiation of the methodology, we present a collection of current real-world Java business web applications. The collection and the methodology serve as a starting point for creating current, targeted benchmark suites, and thus help to better evaluate current program-analysis and software-engineering tools.
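The core mining step described above — filtering repository metadata down to a representative, domain-specific set of applications — could be sketched as follows. This is a minimal illustration under assumed criteria, not the paper's actual ABM implementation: the domain keywords, star threshold, and repository records are hypothetical stand-ins for what a real instantiation would obtain by querying, for example, the GitHub API or a GHTorrent snapshot.

```python
# Sketch of semi-automatic benchmark mining: select candidate Java web
# applications from repository metadata by domain indicators and popularity.

DOMAIN_KEYWORDS = {"servlet", "spring", "jsf", "web"}  # assumed domain indicators
MIN_STARS = 50  # assumed popularity threshold

def is_candidate(repo):
    """Return True if the repository looks like a Java web application."""
    if repo["language"] != "Java" or repo["stars"] < MIN_STARS:
        return False
    topics = {t.lower() for t in repo["topics"]}
    return bool(topics & DOMAIN_KEYWORDS)

def mine(repos):
    """Select candidate benchmark applications, most popular first.

    In practice this shortlist would still be vetted manually, which is
    what makes the overall methodology semi-automatic."""
    return sorted((r for r in repos if is_candidate(r)),
                  key=lambda r: r["stars"], reverse=True)

# Example with fabricated metadata records:
repos = [
    {"name": "shop-app", "language": "Java", "stars": 120, "topics": ["Spring", "web"]},
    {"name": "tiny-lib", "language": "Java", "stars": 10, "topics": ["web"]},
    {"name": "js-game", "language": "JavaScript", "stars": 500, "topics": ["game"]},
]
print([r["name"] for r in mine(repos)])  # → ['shop-app']
```

The automatic filter only produces a candidate list; releasing an updated benchmark version then amounts to re-running the same query against fresh repository data and re-vetting the (usually small) set of changed candidates.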
Toward an automated benchmark management system