Published in: Empirical Software Engineering | Issue 6/2023

Publication date: 01-11-2023

Evaluating seed selection for fuzzing JavaScript engines

Authors: Ming Wen, Yongcong Wang, Yifan Xia, Hai Jin

Abstract

JavaScript (JS), a platform-independent programming language, has remained the most popular language over the years. However, the popular JavaScript engines that web browsers rely on to interpret JS code have become the most common targets for attackers, so ensuring the security and reliability of JS engines is important. Fuzzing is a simple yet effective method for uncovering vulnerabilities. Existing JS fuzzers, however, focus mainly on designing effective mutation mechanisms that generate diverse and valid seeds, and they often ignore the importance of the initial seed corpus selected to drive the fuzzing process. In this paper, we perform extensive experiments to systematically evaluate the impact of seed selection on fuzzing JavaScript engines. In particular, we investigate seed selection along three main dimensions: the sources from which seeds are collected (e.g., CVE PoCs, regression tests, etc.), their number and size, and a set of code properties of interest. Our major findings reveal that seeds collected from different sources can have a significant impact on fuzzing effectiveness (i.e., CVE PoCs are significantly better than the other types of seeds), and that seed files containing the code structures of interest lead existing fuzzers to achieve superior results in terms of both code coverage and unique crashes identified. Inspired by these observations, we devise a simple heuristic to prioritize JavaScript files when selecting a seed corpus. Our experiments show that, when driven by our selected seed corpus, the existing state-of-the-art fuzzer achieves significantly higher code coverage and identifies more crashes.

A fuzzer usually performs a dry run on the seed corpus to obtain its initial information.

Title: Evaluating seed selection for fuzzing JavaScript engines
Authors: Ming Wen, Yongcong Wang, Yifan Xia, Hai Jin
Publication date: 01-11-2023
Publisher: Springer US
Published in: Empirical Software Engineering / Issue 6/2023
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-023-10340-9
