Published in: Empirical Software Engineering 6/2023

01-11-2023

Evaluating seed selection for fuzzing JavaScript engines

Authors: Ming Wen, Yongcong Wang, Yifan Xia, Hai Jin



Abstract

JavaScript (JS), a platform-independent programming language, has remained the most popular language over the years. However, popular JavaScript engines, which web browsers widely use to interpret JS code, have become the most common targets for attackers. Ensuring the security and reliability of JS engines is therefore important. Fuzzing is a simple yet effective method for uncovering vulnerabilities. However, existing JS fuzzers focus on designing effective mutation mechanisms to generate diverse and valid seeds, and they often ignore the importance of the initial seed corpus selected to drive the fuzzing process. In this paper, we perform extensive experiments to systematically evaluate the impact of seed selection on fuzzing JavaScript engines. In particular, we investigate seed selection along three main dimensions: the sources from which seeds are collected (e.g., CVE PoCs, regression tests), the number and size of the seeds, and a set of code properties of interest. Our major findings reveal that seeds collected from different sources have a significant impact on fuzzing effectiveness (i.e., CVE PoCs are significantly better than the other types of seeds), and that seed files containing the code structures of interest lead existing fuzzers to achieve superior results in terms of both code coverage and unique crashes identified. Inspired by these observations, we devise a simple heuristic to prioritize JavaScript files when selecting a seed corpus. Our experiments show that, when driven by our selected seed corpus, the existing state-of-the-art fuzzer achieves significantly higher code coverage and identifies more crashes.
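
To make the idea of prioritizing seed files concrete, the following is a minimal Python sketch of one way such a ranking could look. It is not the authors' heuristic, which the abstract does not specify. The source ranking, the list of code structures, and every name in the block (`Seed`, `SOURCE_PRIORITY`, `ASSUMED_STRUCTURES`, `score`, `select_seeds`) are assumptions made for illustration only. The sketch reflects only the two observations reported above: the collection source matters, and so does the presence of certain code structures.

```python
import re
from dataclasses import dataclass

# Hypothetical source ranking. The abstract reports only that CVE PoCs
# outperform the other seed types; the remaining order is an assumption.
SOURCE_PRIORITY = {"cve_poc": 2, "regression_test": 1}

# Hypothetical stand-ins for the code properties of interest.
# The abstract does not list them, so these keywords are placeholders.
ASSUMED_STRUCTURES = ("try", "for", "function")


@dataclass(frozen=True)
class Seed:
    path: str    # file name of the JavaScript seed
    source: str  # where the seed was collected from
    text: str    # JavaScript source code of the seed


def score(seed: Seed) -> tuple[int, int]:
    """Rank by source first, then by how many assumed structures appear."""
    source_rank = SOURCE_PRIORITY.get(seed.source, 0)
    structure_hits = sum(
        1 for kw in ASSUMED_STRUCTURES
        if re.search(rf"\b{kw}\b", seed.text)
    )
    return (source_rank, structure_hits)


def select_seeds(seeds: list[Seed], budget: int) -> list[Seed]:
    """Return the `budget` highest-scoring seeds, best first."""
    return sorted(seeds, key=score, reverse=True)[:budget]


corpus = [
    Seed("a.js", "other", "var x = 1;"),
    Seed("b.js", "cve_poc", "function f() { for (;;) {} }"),
    Seed("c.js", "regression_test", "try { f(); } catch (e) {}"),
]
selected = [s.path for s in select_seeds(corpus, budget=2)]
```

With this toy corpus, the seed from the assumed highest-priority source is ranked first and the seed from an unknown source is dropped. A real selection would replace the keyword matching with a proper parser and use properties validated by experiment.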


Footnotes
1
A fuzzer usually performs a dry run on the seed corpus to obtain initial information.
 
Metadata
Title
Evaluating seed selection for fuzzing JavaScript engines
Authors
Ming Wen
Yongcong Wang
Yifan Xia
Hai Jin
Publication date
01-11-2023
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 6/2023
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-023-10340-9
