ABSTRACT
Researchers have reported that static analysis tools rarely achieve a false-positive rate that would make them attractive to developers. We overcome this problem by a technique that leads to reporting fewer bugs but also much fewer false positives. Our technique prunes the static call graph that sits at the core of many static analyses. Specifically, static call-graph construction proceeds as usual, after which a call-graph pruner removes many false-positive edges but few true edges. The challenge is to strike a balance between being aggressive in removing false-positive edges but not so aggressive that no true edges remain. We achieve this goal by automatically producing a call-graph pruner through an automatic, ahead-of-time learning process. We added such a call-graph pruner to a software tool for null-pointer analysis and found that the false-positive rate decreased from 73% to 23%. This improvement makes the tool more useful to developers.
- Karim Ali and Ondřej Lhoták. 2012. Application-Only Call Graph Construction. In ECOOP 2012 - Object-Oriented Programming, James Noble (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 688--712.Google Scholar
- Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to Represent Programs with Graphs. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=BJOFETxR-Google Scholar
- Shay Artzi, Adam Kiezun, David Glasser, and Michael D. Ernst. 2007. Combined Static and Dynamic Mutability Analysis. In Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering (Atlanta, Georgia, USA) (ASE '07). Association for Computing Machinery, New York, NY, USA, 104--113. Google ScholarDigital Library
- Subarno Banerjee, Lazaro Clapp, and Manu Sridharan. 2019. NullAway: Practical Type-Based Null Safety for Java. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Tallinn, Estonia) (ESEC/FSE 2019). Association for Computing Machinery, New York, NY, USA, 740--750. Google ScholarDigital Library
- James Bergstra and Yoshua Bengio. 2012. Random Search for Hyper-parameter Optimization. J. Mach. Learn. Res. 13, 1 (Feb. 2012), 281--305. http://dl.acm.org/citation.cfm?id=2503308.2188395Google ScholarDigital Library
- Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak, and Dawson Engler. 2010. A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World. Commun. ACM 53, 2 (Feb. 2010), 66--75. Google ScholarDigital Library
- Sam Blackshear, Bor-Yuh Evan Chang, and Manu Sridharan. 2013. Thresher: Precise Refutations for Heap Reachability. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (Seattle, Washington, USA) (PLDI '13). Association for Computing Machinery, New York, NY, USA, 275--286. Google ScholarDigital Library
- Sam Blackshear, Bor-Yuh Evan Chang, and Manu Sridharan. 2015. Selective Control-Flow Abstraction via Jumping. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Pittsburgh, PA, USA) (OOPSLA 2015). Association for Computing Machinery, New York, NY, USA, 163--182. Google ScholarDigital Library
- Martin Bravenboer and Yannis Smaragdakis. 2009. Strictly Declarative Specification of Sophisticated Points-to Analyses. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications (Orlando, Florida, USA) (OOPSLA '09). ACM, New York, NY, USA, 243--262. Google ScholarDigital Library
- Leo Breiman. 1996. Bagging predictors. Machine Learning 24, 2 (01 Aug 1996), 123--140. Google ScholarCross Ref
- Raymond P. L. Buse and Westley Weimer. 2009. The Road Not Taken: Estimating Path Execution Frequency Statically. In Proceedings of the 31st International Conference on Software Engineering (ICSE '09). IEEE Computer Society, USA, 144--154. Google ScholarDigital Library
- Sooyoung Cha, Sehun Jeong, and Hakjoo Oh. 2018. A scalable learning algorithm for data-driven program analysis. Information and Software Technology 104 (2018), 1--13. Google ScholarCross Ref
- Tianyi Chen, Kihong Heo, and Mukund Raghothaman. 2021. Boosting Static Analysis Accuracy with Instrumented Test Executions. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Athens, Greece) (ESEC/FSE 2021). Association for Computing Machinery, New York, NY, USA, 1154--1165. Google ScholarDigital Library
- Maria Christakis and Christian Bird. 2016. What Developers Want and Need from Program Analysis: An Empirical Study. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (Singapore, Singapore) (ASE 2016). Association for Computing Machinery, New York, NY, USA, 332--343. Google ScholarDigital Library
- Lori Flynn, William Snavely, David Svoboda, Nathan VanHoudnos, Richard Qin, Jennifer Burns, David Zubrow, Robert Stoddard, and Guillermo Marce-Santurio. 2018. Prioritizing Alerts from Multiple Static Analysis Tools, Using Classification Models. In Proceedings of the 1st International Workshop on Software Qualities and Their Dependencies (Gothenburg, Sweden) (SQUADE '18). Association for Computing Machinery, New York, NY, USA, 13--20. Google ScholarDigital Library
- Neville Grech, George Fourtounis, Adrian Francalanza, and Yannis Smaragdakis. 2018. Shooting from the Heap: Ultra-Scalable Static Analysis with Heap Snapshots. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (Amsterdam, Netherlands) (ISSTA 2018). Association for Computing Machinery, New York, NY, USA, 198--208. Google ScholarDigital Library
- Aditya Grover and Jure Leskovec. 2016. Node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD '16). Association for Computing Machinery, New York, NY, USA, 855--864. Google ScholarDigital Library
- Sarah Heckman and Laurie Williams. 2009. A Model Building Process for Identifying Actionable Static Analysis Alerts. In Proceedings of the 2009 International Conference on Software Testing Verification and Validation (ICST '09). IEEE Computer Society, USA, 161--170. Google ScholarDigital Library
- Tin Kam Ho. 1995. Random Decision Forests. In Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1) - Volume 1 (ICDAR '95). IEEE Computer Society, USA, 278.Google ScholarDigital Library
- David Hovemeyer and William Pugh. 2004. Finding Bugs is Easy. SIGPLAN Not. 39, 12 (Dec. 2004), 92--106. Google ScholarDigital Library
- Laurent Hubert, Thomas Jensen, and David Pichardie. 2008. Semantic Foundations and Inference of Non-null Annotations. In Formal Methods for Open Object-Based Distributed Systems, Gilles Barthe and Frank S. de Boer (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 132--149.Google Scholar
- Kazuaki Ishizaki, Motohiro Kawahito, Toshiaki Yasue, Hideaki Komatsu, and Toshio Nakatani. 2000. A Study of Devirtualization Techniques for a Java Just-InTime Compiler. In Proceedings of the 15th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (Minneapolis, Minnesota, USA) (OOPSLA '00). Association for Computing Machinery, New York, NY, USA, 294--310. Google ScholarDigital Library
- Minseok Jeon, Myungho Lee, and Hakjoo Oh. 2020. Learning Graph-Based Heuristics for Pointer Analysis without Handcrafting Application-Specific Features. Proc. ACM Program. Lang. 4, OOPSLA, Article 179 (Nov. 2020), 30 pages. Google ScholarDigital Library
- Sehun Jeong, Minseok Jeon, Sungdeok Cha, and Hakjoo Oh. 2017. Data-Driven Context-Sensitivity for Points-to Analysis. Proc. ACM Program. Lang. 1, OOPSLA, Article 100 (oct 2017), 28 pages. Google ScholarDigital Library
- Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge. 2013. Why Don't Software Developers Use Static Analysis Tools to Find Bugs?. In Proceedings of the 2013 International Conference on Software Engineering (San Francisco, CA, USA) (ICSE '13). IEEE Press, 672--681.Google ScholarCross Ref
- Christian Gram Kalhauge and Jens Palsberg. 2018. Sound Deadlock Prediction. Proc. ACM Program. Lang. 2, OOPSLA, Article 146 (Oct. 2018), 29 pages. Google ScholarDigital Library
- Sarfraz Khurshid, Corina S. Păsăreanu, and Willem Visser. 2003. Generalized Symbolic Execution for Model Checking and Testing. In Proceedings of the 9th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (Warsaw, Poland) (TACAS'03). Springer-Verlag, Berlin, Heidelberg, 553--568.Google ScholarDigital Library
- S. B. Kotsiantis. 2007. Supervised Machine Learning: A Review of Classification Techniques. In Proceedings of the 2007 Conference on Emerging Artificial Intelligence Applications in Computer Engineering. IOS Press, NLD, 3--24.Google Scholar
- Ondrej Lhoták. 2007. Comparing Call Graphs. In Proceedings of the 7th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (San Diego, California, USA) (PASTE '07). Association for Computing Machinery, New York, NY, USA, 37--42. Google ScholarDigital Library
- Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2018. Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA, 129--140. Google ScholarDigital Library
- Benjamin Livshits, Manu Sridharan, Yannis Smaragdakis, Ondřej Lhoták, J. Nelson Amaral, Bor-Yuh Evan Chang, Samuel Z. Guyer, Uday P. Khedker, Anders Møller, and Dimitrios Vardoulakis. 2015. In Defense of Soundiness: A Manifesto. Commun. ACM 58, 2 (Jan. 2015), 44--46. Google ScholarDigital Library
- Yi Lu, Daniel Wainwright, and Michael Reif. 2020. Probabilistic call-graph construction. (Jul 2020). US Patent No. 10,719,314 B2.Google Scholar
- Ravi Mangal, Xin Zhang, Aditya V. Nori, and Mayur Naik. 2015. A User-Guided Approach to Program Analysis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (Bergamo, Italy) (ESEC/FSE 2015). Association for Computing Machinery, New York, NY, USA, 462--473. Google ScholarDigital Library
- Aravind Nair, Avijit Roy, and Karl Meinke. 2020. FuncGNN: A Graph Neural Network Approach to Program Similarity. In Proceedings of the 14th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (Bari, Italy) (ESEM '20). Association for Computing Machinery, New York, NY, USA, Article 10, 11 pages. Google ScholarDigital Library
- Jens Palsberg and Cristina V. Lopes. 2018. NJR: A Normalized Java Resource. In Companion Proceedings for the ISSTA/ECOOP 2018 Workshops (Amsterdam, Netherlands) (ISSTA '18). Association for Computing Machinery, New York, NY, USA, 100--106. Google ScholarDigital Library
- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825--2830.Google ScholarDigital Library
- Mukund Raghothaman, Sulekha Kulkarni, Kihong Heo, and Mayur Naik. 2018. User-Guided Program Reasoning Using Bayesian Inference. SIGPLAN Not. 53, 4 (June 2018), 722--735. Google ScholarDigital Library
- Atanas Rountev, Scott Kagan, and Michael Gibas. 2004. Static and Dynamic Analysis of Call Chains in Java. In Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis (Boston, Massachusetts, USA) (ISSTA '04). Association for Computing Machinery, New York, NY, USA, 1--11. Google ScholarDigital Library
- Joseph R. Ruthruff, John Penix, J. David Morgenthaler, Sebastian Elbaum, and Gregg Rothermel. 2008. Predicting Accurate and Actionable Static Analysis Warnings: An Experimental Approach. In Proceedings of the 30th International Conference on Software Engineering (Leipzig, Germany) (ICSE '08). Association for Computing Machinery, New York, NY, USA, 341--350. Google ScholarDigital Library
- Caitlin Sadowski, Edward Aftandilian, Alex Eagle, Liam Miller-Cushon, and Ciera Jaspan. 2018. Lessons from Building Static Analysis Tools at Google. Commun. ACM 61, 4 (March 2018), 58--66. Google ScholarDigital Library
- J. Sawin and A. Rountev. 2011. Assumption Hierarchy for a CHA Call Graph Construction Algorithm. In 2011 IEEE 11th International Working Conference on Source Code Analysis and Manipulation. 35--44.Google Scholar
- Scikit-learn. [n.d.]. Feature importances with a forest of trees. https://scikitlearn.org/stable/auto_examples/ensemble/plot_forest_importances.html.Google Scholar
- Li Sui, Jens Dietrich, Amjed Tahir, and George Fourtounis. 2020. On the Recall of Static Call Graph Construction in Practice (ICSE '20). Association for Computing Machinery, New York, NY, USA, 1049--1060. Google ScholarDigital Library
- Omer Tripp, Salvatore Guarnieri, Marco Pistoia, and Aleksandr Aravkin. 2014. ALETHEIA: Improving the Usability of Static Security Analysis. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (Scottsdale, Arizona, USA) (CCS '14). Association for Computing Machinery, New York, NY, USA, 762--774. Google ScholarDigital Library
- Akshay Utture, Christian Gram Kalhauge, Shuyang Liu, and Jens Palsberg. 2020. NJR-1 Dataset. Google ScholarCross Ref
- Akshay Utture, Christian Gram Kalhauge, Shuyang Liu, and Jens Palsberg. 2021. Artifact for ICSE-22 submission "Striking a Balance: Pruning False-Positives from Static Call Graphs". Google ScholarCross Ref
- WALA. 2015. IBM, "T.J. Watson Libraries for Analysis (WALA),". http://wala.sourceforge.net.Google Scholar
- Yu Wang, Ke Wang, Fengjuan Gao, and Linzhang Wang. 2020. Learning Semantic Program Embeddings with Graph Interval Neural Network. Proc. ACM Program. Lang. 4, OOPSLA, Article 137 (Nov. 2020), 27 pages. Google ScholarDigital Library
- U. Yüksel and H. Sözer. 2013. Automated Classification of Static Code Analysis Alerts: A Case Study. In 2013 IEEE International Conference on Software Maintenance. 532--535.Google Scholar
- Weilei Zhang and Barbara G. Ryder. 2007. Automatic Construction of Accurate Application Call Graph with Library Call Abstraction for Java: Research Articles. J. Softw. Maint. Evol. 19, 4 (July 2007), 231--252.Google Scholar
Index Terms
- Striking a balance: pruning false-positives from static call graphs
Recommendations
A new assembly variation analysis model based on the method of power balance for auto-body parts
Purpose - The purpose of this paper is to propose a new assembly variation analysis model to analyze assembly variation for sheet metal parts. The main focus is to analyze assembly processes based on the method of power balance. Design/methodology/...
Interprocedural pointer alias analysis
We present practical approximation methods for computing and representing interprocedural aliases for a program written in a language that includes pointers, reference parameters, and recursion. We present the following contributions: (1) a framework ...
Hawker Beechcraft Uses a New Solution Approach to Balance Assembly Lines
In 2002, Hawker Beechcraft set out to improve the efficiency of its Hawker 800XP assembly line. The line was paced such that an aircraft was moved to the next workstation on the line on a schedule---even if work in a previous workstation had not ...
Comments