Published in: Empirical Software Engineering 1/2022

01.01.2022

Demystifying regular expression bugs

A comprehensive study on regular expression bug causes, fixes, and testing

Authors: Peipei Wang, Chris Brown, Jamie A. Jennings, Kathryn T. Stolee

Abstract

Regular expressions cause string-related bugs and open security vulnerabilities to denial-of-service (DoS) attacks. However, beyond ReDoS (Regular expression Denial of Service), little is known about the extent to which regular expression issues affect software development and how these issues are addressed in practice. We conduct an empirical study of 356 regex-related bugs from merged pull requests in Apache, Mozilla, Facebook, and Google GitHub repositories. We identify and classify the nature of the regular expression problems, the fixes, and the related changes in the test code. The most important findings in this paper are as follows: 1) incorrect regular expression semantics is the dominant root cause of regular expression bugs (165/356, 46.3%); the remaining root causes are incorrect API usage (9.3%) and other code issues that require regular expression changes in the fix (29.5%); 2) fixing regular expression bugs is nontrivial: it takes more time and more lines of code than general pull requests; 3) most (51%) of the regex-related pull requests do not contain test code changes, and certain regex bug types (e.g., compile error, performance issues, regex representation) are less likely to include test code changes than others; and 4) the dominant type of test code change in regex-related pull requests is test case addition (75%). The results of this study contribute to a broader understanding of the practical problems faced by developers when using, fixing, and testing regular expressions.

Footnotes
1
While we avoid many perils of mining GitHub (Kalliamvakou et al. 2014) through our selection of organizations and projects (i.e., Perils II, III, IV, V, and VI), evaluating only merged pull requests is Peril VIII and thus a threat to validity, as discussed in Section 8.
 
2
The specific error message is “ValueError: cannot use LOCALE flag with a str pattern”. Since Python version 3.6, re.LOCALE can be used only with bytes patterns.
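The behavior described in this footnote can be reproduced with a minimal sketch, assuming Python 3.6 or later. The helper name `compile_with_locale` and the pattern `\w+` are hypothetical and chosen only for illustration; they do not come from the studied pull requests.

```python
import re


def compile_with_locale(pattern):
    """Try to compile `pattern` with re.LOCALE.

    Returns (compiled_pattern, None) on success and
    (None, error_message) when the re module rejects the flag.
    """
    try:
        return re.compile(pattern, re.LOCALE), None
    except ValueError as exc:
        return None, str(exc)


# A str pattern combined with re.LOCALE is rejected (Python 3.6+).
str_result, str_error = compile_with_locale(r"\w+")

# The same pattern written as bytes is accepted.
bytes_result, bytes_error = compile_with_locale(rb"\w+")

print(str_error)
print(bytes_result is not None)
```

A fix of this kind typically either drops the flag or switches the pattern and its inputs to bytes, depending on what the surrounding code expects.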
 
3
Production code is the part of the source code that contains the logic of the software and runs in the production environment. Test code is the other part of the source code; it contains the tests that verify whether the production code exercises the expected logic.
 
12
In this paper, we correct the naming of one of the manifestations from the original paper (Wang et al. 2020). In the original paper, we used the term incorrect behavior, but it should be incorrect semantics, per the definitions in Section 3.2.
 
13
Since ReDoS cares about the time complexity of running the regular expression, we regard it as a performance issue.
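To illustrate why ReDoS is a matter of running time, the following sketch times a failing match on a backtracking engine (CPython's `re` module is assumed). The pattern `(a+)+$` is a textbook example of nested quantifiers and is not taken from the studied bugs; the helper name `time_failed_match` is hypothetical.

```python
import re
import time

# Nested quantifiers: a textbook pattern prone to catastrophic backtracking.
EVIL = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Return the seconds spent failing to match 'a' * n followed by 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = EVIL.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing 'b' makes a full match impossible
    return elapsed


# The running time grows rapidly with the length of the input.
for n in (10, 14, 18, 20):
    print(n, f"{time_failed_match(n):.4f}s")
```

The input lengths are kept small on purpose; on a backtracking engine, each additional character roughly doubles the work for this pattern.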
 
15
In Section 4.1.2 the regexes having compile errors are literal strings, hard-coded in the source code. In contrast, this section describes the regexes which are composed using variables that are passed into the regex API.
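The distinction can be sketched as follows: a regex assembled from a variable fails to compile only when the variable happens to contain metacharacters at run time. The input `*.txt` and the helper `find_literal` are hypothetical; escaping with `re.escape` is one common remedy when the text is meant literally, not necessarily the fix applied in the studied pull requests.

```python
import re


def find_literal(user_text, haystack):
    """Search for `user_text` literally by escaping regex metacharacters first."""
    return re.search(re.escape(user_text), haystack)


user_text = "*.txt"  # hypothetical value that arrives in a variable at run time

# Passing the raw variable to the regex API raises a compile error,
# because '*' has nothing to repeat at the start of the pattern.
try:
    re.compile(user_text)
    compile_error = None
except re.error as exc:
    compile_error = str(exc)

print(compile_error)
print(find_literal(user_text, "rm *.txt") is not None)
```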
 
18
The skewness score for normal distributions is zero.
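As a small illustration, the sketch below computes a moment-based sample skewness, g1 = m3 / m2^1.5. This footnote does not state which skewness estimator the paper uses, so the formula and the function name `skewness` are assumptions. A perfectly symmetric sample yields zero, which mirrors the zero skewness of the (symmetric) normal distribution.

```python
def skewness(values):
    """Moment-based sample skewness: g1 = m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]      # mirrored around its mean
right_skewed = [1, 1, 1, 2, 10]  # long tail toward large values

print(skewness(symmetric))
print(skewness(right_skewed) > 0)
```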
 
19
We lost the data for three bugs between the original paper (Wang et al. 2020) and this analysis of the test code. We know there were test code changes, but the details are no longer available.
 
20
The former reports that 94% of the bug-fixing patches found contain test cases, and the latter reports only four patches with no tests changed.
 
References
Amann S, Nadi S, Nguyen HA, Nguyen TN, Mezini M (2016) Mubench: A benchmark for api-misuse detectors. In: Proceedings of the 13th international conference on mining software repositories, pp 464–467
Amann S, Nguyen HA, Nadi S, Nguyen TN, Mezini M (2018) A systematic evaluation of static api-misuse detectors. IEEE Trans Softw Eng 45(12):1170–1188
Bae S, Cho H, Lim I, Ryu S (2014) Safewapi: Web api misuse detector for web applications. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering, pp 507–517
Bai GR, Clee B, Shrestha N, Chapman C, Wright C, Stolee KT (2019) Exploring tools and strategies used during regular expression composition tasks. In: Proceedings of the 27th international conference on program comprehension. IEEE Press, pp 197–208
Beller M, Gousios G, Zaidman A (2017) Oops, my tests broke the build: An explorative analysis of travis ci with github. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR). IEEE, pp 356–367
Brown WH, Malveau RC, McCormick HW, Mowbray TJ (1998) AntiPatterns: refactoring software, architectures, and projects in crisis
Chapman C, Stolee KT (2016) Exploring regular expression usage and context in python. In: Proceedings of the 25th international symposium on software testing and analysis. ACM, pp 282–293
Chapman C, Wang P, Stolee KT (2017) Exploring regular expression comprehension. In: Proceedings of the 32nd IEEE/ACM international conference on automated software engineering. IEEE Press, pp 405–416
Chen Q, Wang X, Ye X, Durrett G, Dillig I (2020) Multi-modal synthesis of regular expressions. In: Proceedings of the 41st ACM SIGPLAN conference on programming language design and implementation, pp 487–502
Cody-Kenny B, Fenton M, Ronayne A, Considine E, McGuire T, O’Neill M (2017) A search for improved performance in regular expressions. In: Proceedings of the genetic and evolutionary computation conference. ACM, pp 1280–1287
Coelho R, Almeida L, Gousios G, van Deursen A (2015) Unveiling exception handling bug hazards in android based on github and google code issues. In: 2015 IEEE/ACM 12th working conference on mining software repositories. IEEE, pp 134–145
Davis JC, Coghlan CA, Servant F, Lee D (2018) The impact of regular expression denial of service (ReDoS) in practice: an empirical study at the ecosystem scale. In: The ACM joint European software engineering conference and symposium on the foundations of software engineering (ESEC/FSE)
Davis JC, Michael IV, Louis G, Coghlan CA, Servant F (2019) Why aren’t regular expressions a lingua franca? an empirical study on the re-use and portability of regular expressions. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering. ACM, pp 443–454
Davis JC, Moyer D, Kazerouni AM, Lee D (2019) Testing regex generalizability and its implications: A large-scale many-language measurement study. In: 2019 34th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 427–439
Davis JC, Servant F, Lee D (2021) Using selective memoization to defeat regular expression denial of service (redos). In: 2021 IEEE symposium on security and privacy (SP)
Di Franco A, Guo H, Rubio-González C (2017) A comprehensive study of real-world numerical bug characteristics. In: Proceedings of the 32nd IEEE/ACM international conference on automated software engineering. IEEE Press, pp 509–519
Dig D, Johnson R (2006) How do apis evolve? a story of refactoring. J Softw Maint Evol Res Pract 18(2):83–107
Eghbali A, Pradel M (2020) No strings attached: An empirical study of string-related software bugs. In: 2020 35th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 956–967
Fowler M (2018) Refactoring: improving the design of existing code. Addison-Wesley Professional, Boston
Gousios G, Pinzger M, Deursen AV (2014) An exploratory study of the pull-based software development model. In: Proceedings of the 36th international conference on software engineering. ACM, pp 345–355
Gousios G, Zaidman A (2014) A dataset for pull-based development research. In: Proceedings of the 11th working conference on mining software repositories. ACM, pp 368–371
Gu Z, Wu J, Liu J, Zhou M, Gu M (2019) An empirical study on api-misuse bugs in open-source c programs. In: 2019 IEEE 43rd annual computer software and applications conference (COMPSAC), vol 1. IEEE, pp 11–20
Gyimesi P, Vancsics B, Stocco A, Mazinanian D, Beszédes A, Ferenc R, Mesbah A (2019) Bugsjs: a benchmark of javascript bugs. In: 2019 12th IEEE conference on software testing, validation and verification (ICST). IEEE, pp 90–101
Herzig K, Just S, Zeller A (2013) It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, pp 392–401
Just R, Jalali D, Ernst MD (2014) Defects4j: A database of existing faults to enable controlled testing studies for java programs. In: Proceedings of the 2014 international symposium on software testing and analysis, pp 437–440
Kabinna S, Bezemer CP, Shang W, Hassan AE (2016) Logging library migrations: A case study for the apache software foundation projects. In: 2016 IEEE/ACM 13th working conference on mining software repositories (MSR). IEEE, pp 154–164
Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining github. In: Proceedings of the 11th working conference on mining software repositories, pp 92–101
Kapur P, Cossette B, Walker RJ (2010) Refactoring references for library migration. In: Proceedings of the ACM international conference on Object oriented programming systems languages and applications, pp 726–738
Kechagia M, Devroey X, Panichella A, Gousios G, van Deursen A (2019) Effective and efficient api misuse detection via exception propagation and search-based testing. In: Proceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis, pp 192–203
Khomh F, Di Penta M, Gueheneuc YG (2009) An exploratory study of the impact of code smells on software change-proneness. In: 2009 16th working conference on reverse engineering. IEEE, pp 75–84
Kim M, Cai D, Kim S (2011) An empirical investigation into the role of api-level refactorings during software evolution. In: Proceedings of the 33rd international conference on software engineering, pp 151–160
Ko D, Ma K, Park S, Kim S, Kim D, Le Traon Y (2014) Api document quality for resolving deprecated apis. In: 2014 21st Asia-Pacific software engineering conference, vol 2. IEEE, pp 27–30
Kochhar PS, Bissyandé TF, Lo D, Jiang L (2013) Adoption of software testing in open source projects–a preliminary study on 50,000 projects. In: 2013 17th european conference on software maintenance and reengineering. IEEE, pp 353–356
Larson E, Kirk A (2016) Generating evil test strings for regular expressions. In: 2016 IEEE international conference on software testing, verification and validation (ICST). IEEE, pp 309–319
Locascio N, Narasimhan K, DeLeon E, Kushman N, Barzilay R (2016) Neural generation of regular expressions from natural language with minimal domain knowledge. arXiv:1608.03000
Lou Y, Chen Z, Cao Y, Hao D, Zhang L (2020) Understanding build issue resolution in practice: Symptoms and fix patterns. In: Proceedings of the 28th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, ESEC/FSE 2020. Association for Computing Machinery, New York, pp 617–628. https://doi.org/10.1145/3368089.3409760
Lu J, Chen L, Li L, Feng X (2019) Understanding node change bugs for distributed systems. In: 2019 IEEE 26th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 399–410
Lu S, Park S, Seo E, Zhou Y (2008) Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. In: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, pp 329–339
Ma W, Chen L, Zhang X, Zhou Y, Xu B (2017) How do developers fix cross-project correlated bugs? a case study on the github scientific python ecosystem. In: 2017 IEEE/ACM 39th international conference on software engineering (ICSE). IEEE, pp 381–392
Maalej W, Nabil H (2015) Bug report, feature request, or simply praise? on automatically classifying app reviews. In: 2015 IEEE 23rd international requirements engineering conference (RE). IEEE, pp 116–125
Madeiral F, Urli S, Maia M, Monperrus M (2019) Bears: An extensible java bug benchmark for automatic program repair studies. In: 2019 IEEE 26th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 468–478
Majumder S, Chakraborty J, Agrawal A, Menzies T (2019) Why software projects need heroes (lessons learned from 1100+ projects). arXiv:1904.09954
Marsavina C, Romano D, Zaidman A (2014) Studying fine-grained co-evolution patterns of production and test code. In: 2014 IEEE 14th international working conference on source code analysis and manipulation. IEEE, pp 195–204
Michael LG, Donohue J, Davis JC, Lee D, Servant F (2019) Regexes are hard: Decision-making, difficulties, and risks in programming regular expressions. In: ACM international conference on automated software engineering (ASE). ACM
Mileva YM, Dallmeier V, Burger M, Zeller A (2009) Mining trends of library usage. In: Proceedings of the joint international and annual ERCIM workshops on Principles of software evolution (IWPSE) and software evolution (Evol) workshops, pp 57–62
Mileva YM, Dallmeier V, Zeller A (2010) Mining api popularity. In: International academic and industrial conference on practice and research techniques. Springer, pp 173–180
Moha N, Gueheneuc YG, Duchien L, Le Meur AF (2009) Decor: A method for the specification and detection of code and design smells. IEEE Trans Softw Eng 36(1):20–36
Munaiah N, Kroh S, Cabrey C, Nagappan M (2017) Curating github for engineered software projects. Empir Softw Eng 22(6):3219–3253
Ohira M, Yoshiyuki H, Yamatani Y (2016) A case study on the misclassification of software performance issues in an issue tracking system. In: 2016 IEEE/ACIS 15th international conference on computer and information science (ICIS). IEEE, pp 1–6
Palomba F, Bavota G, Di Penta M, Oliveto R, De Lucia A, Poshyvanyk D (2013) Detecting bad smells in source code using change history information. In: Proceedings of the 28th IEEE/ACM international conference on automated software engineering. IEEE Press, pp 268–278
Park JU, Ko SK, Cognetta M, Han YS (2019) Softregex: Generating regex from natural language descriptions using softened regex equivalence. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 6426–6432
Perkins JH (2005) Automatically generating refactorings to support api evolution. In: Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on program analysis for software tools and engineering, pp 111–114
Pham R, Singer L, Liskin O, Figueira Filho F, Schneider K (2013) Creating a shared understanding of testing culture on a social coding site. In: 2013 35th international conference on software engineering (ICSE). IEEE, pp 112–121
Pingclasai N, Hata H, Matsumoto KI (2013) Classifying bug reports to bugs and other requests using topic modeling. In: 2013 20th asia-pacific software engineering conference (APSEC), vol 2. IEEE, pp 13–18
Pinto LS, Sinha S, Orso A (2012) Understanding myths and realities of test-suite evolution. In: Proceedings of the ACM SIGSOFT 20th international symposium on the foundations of software engineering, pp 1–11
Rahman MM, Roy CK (2014) An insight into the pull requests of github. In: Proc. MSR, vol 14
Saha RK, Lyu Y, Lam W, Yoshida H, Prasad MR (2018) Bugs.jar: A large-scale, diverse dataset of real-world java bugs. In: Proceedings of the 15th international conference on mining software repositories, pp 10–13
Selakovic M, Pradel M (2016) Performance issues and optimizations in javascript: an empirical study. In: Proceedings of the 38th international conference on software engineering. ACM, pp 61–72
Sharma T, Fragkoulis M, Spinellis D (2016) Does your configuration code smell? In: 2016 IEEE/ACM 13th working conference on mining software repositories (MSR). IEEE, pp 189–200
Shen Y, Jiang Y, Xu C, Yu P, Ma X, Lu J (2018) Rescue: Crafting regular expression dos attacks. In: 2018 33rd IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 225–235
Shi L, Zhong H, Xie T, Li M (2011) An empirical study on evolution of api documentation. In: International conference on fundamental approaches to software engineering. Springer, pp 416–431
Spishak E, Dietl W, Ernst MD (2012) A type system for regular expressions. In: Proceedings of the 14th workshop on formal techniques for java-like programs. ACM, pp 20–26
Staicu CA, Pradel M (2018) Freezing the web: A study of redos vulnerabilities in javascript-based web servers. In: 27th USENIX Security Symposium (USENIX Security 18), pp 361–376
Tan L, Liu C, Li Z, Wang X, Zhou Y, Zhai C (2014) Bug characteristics in open source software. Empir Softw Eng 19(6):1665–1705
Teyton C, Falleri JR, Blanc X (2012) Mining library migration graphs. In: 2012 19th working conference on reverse engineering. IEEE, pp 289–298
Thung F, Haryono SA, Serrano L, Muller G, Lawall J, Lo D, Jiang L (2020) Automated deprecated-api usage update for android apps: How far are we? In: 2020 IEEE 27th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 602–611
Tufano M, Palomba F, Bavota G, Oliveto R, Di Penta M, De Lucia A, Poshyvanyk D (2015) When and why your code starts to smell bad. In: Proceedings of the 37th international conference on software engineering-volume 1. IEEE Press, pp 403–414
Zurück zum Zitat Vahabzadeh A, Fard AM, Mesbah A (2015) An empirical study of bugs in test code. In: 2015 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 101–110 Vahabzadeh A, Fard AM, Mesbah A (2015) An empirical study of bugs in test code. In: 2015 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 101–110
Zurück zum Zitat Veanes M, De Halleux P, Tillmann N (2010) Rex: Symbolic regular expression explorer. In: 2010 Third international conference on software testing, verification and validation. IEEE, pp 498–507 Veanes M, De Halleux P, Tillmann N (2010) Rex: Symbolic regular expression explorer. In: 2010 Third international conference on software testing, verification and validation. IEEE, pp 498–507
Zurück zum Zitat Wan Z, Lo D, Xia X, Cai L (2017) Bug characteristics in blockchain systems: a large-scale empirical study. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR). IEEE, pp 413–424 Wan Z, Lo D, Xia X, Cai L (2017) Bug characteristics in blockchain systems: a large-scale empirical study. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR). IEEE, pp 413–424
Zurück zum Zitat Wang P, Brown C, Jennings JA, Stolee KT (2020) An empirical study on regular expression bugs. In: Proceedings of the 17th international conference on mining software repositories. pp 103–113 Wang P, Brown C, Jennings JA, Stolee KT (2020) An empirical study on regular expression bugs. In: Proceedings of the 17th international conference on mining software repositories. pp 103–113
Zurück zum Zitat Wang P, Gina R, Stolee KT (2019) Exploring regular expression evolution. In: 2019 IEEE international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 502–513 Wang P, Gina R, Stolee KT (2019) Exploring regular expression evolution. In: 2019 IEEE international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 502–513
Zurück zum Zitat Wang P, Stolee KT (2018) How well are regular expressions tested in the wild?. In: Proceedings of the 2018 26th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering. ACM, pp 668–678 Wang P, Stolee KT (2018) How well are regular expressions tested in the wild?. In: Proceedings of the 2018 26th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering. ACM, pp 668–678
Zurück zum Zitat Wang X, Hong Y, Chang H, Park K, Langdale G, Hu J, Zhu H (2019) Hyperscan: a fast multi-pattern regex matcher for modern cpus. In: 16th {USENIX} symposium on networked systems design and implementation ({NSDI} 19). pp 631–648 Wang X, Hong Y, Chang H, Park K, Langdale G, Hu J, Zhu H (2019) Hyperscan: a fast multi-pattern regex matcher for modern cpus. In: 16th {USENIX} symposium on networked systems design and implementation ({NSDI} 19). pp 631–648
Zurück zum Zitat Widyasari R, Sim SQ, Lok C, Qi H, Phan J, Tay Q, Tan C, Wee F, Tan JE, Yieh Y et al (2020) Bugsinpy: a database of existing bugs in python programs to enable controlled testing and debugging studies. In: Proceedings of the 28th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering. pp 1556–1560 Widyasari R, Sim SQ, Lok C, Qi H, Phan J, Tay Q, Tan C, Wee F, Tan JE, Yieh Y et al (2020) Bugsinpy: a database of existing bugs in python programs to enable controlled testing and debugging studies. In: Proceedings of the 28th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering. pp 1556–1560
Zurück zum Zitat Wu L, Wu Q, Liang G, Wang Q, Jin Z (2015) Transforming code with compositional mappings for api-library switching. In: 2015 IEEE 39th annual computer software and applications conference, vol. 2. IEEE, pp 316–325 Wu L, Wu Q, Liang G, Wang Q, Jin Z (2015) Transforming code with compositional mappings for api-library switching. In: 2015 IEEE 39th annual computer software and applications conference, vol. 2. IEEE, pp 316–325
Zurück zum Zitat Wüstholz V., Olivo O, Heule MJ, Dillig I (2017) Static detection of dos vulnerabilities in programs that use regular expressions. In: International conference on tools and algorithms for the construction and analysis of systems. Springer, pp 3–20 Wüstholz V., Olivo O, Heule MJ, Dillig I (2017) Static detection of dos vulnerabilities in programs that use regular expressions. In: International conference on tools and algorithms for the construction and analysis of systems. Springer, pp 3–20
Zurück zum Zitat Ye X, Chen Q, Wang X, Dillig I, Durrett G (2019) Sketch-driven regular expression generation from natural language and examples. arXiv:1908.05848 Ye X, Chen Q, Wang X, Dillig I, Durrett G (2019) Sketch-driven regular expression generation from natural language and examples. arXiv:1908.​05848
Zurück zum Zitat Yin Z, Yuan D, Zhou Y, Pasupathy S, Bairavasundaram L (2011) How do fixes become bugs?. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on foundations of software engineering. ACM, pp 26–36 Yin Z, Yuan D, Zhou Y, Pasupathy S, Bairavasundaram L (2011) How do fixes become bugs?. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on foundations of software engineering. ACM, pp 26–36
Zurück zum Zitat Yu S, Xu L, Zhang Y, Wu J, Liao Z, Li Y (2018) Nbsl: A supervised classification model of pull request in github. In: 2018 IEEE international conference on communications (ICC). IEEE, pp 1–6 Yu S, Xu L, Zhang Y, Wu J, Liao Z, Li Y (2018) Nbsl: A supervised classification model of pull request in github. In: 2018 IEEE international conference on communications (ICC). IEEE, pp 1–6
Zurück zum Zitat Zaidman A, Van Rompaey B, Demeyer S, Van Deursen A (2008) Mining software repositories to study co-evolution of production & test code. In: 2008 1st international conference on software testing, verification, and validation. IEEE, pp 220–229 Zaidman A, Van Rompaey B, Demeyer S, Van Deursen A (2008) Mining software repositories to study co-evolution of production & test code. In: 2008 1st international conference on software testing, verification, and validation. IEEE, pp 220–229
Zurück zum Zitat Zhang Y, Chen Y, Cheung SC, Xiong Y, Zhang L (2018) An empirical study on tensorflow program bugs. In: Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis. ACM, pp 129–140 Zhang Y, Chen Y, Cheung SC, Xiong Y, Zhang L (2018) An empirical study on tensorflow program bugs. In: Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis. ACM, pp 129–140
Zurück zum Zitat Zhang Z, Yang Y, Xia X, Lo D, Ren X, Grundy J (2021) Unveiling the mystery of api evolution in deep learning frameworks a case study of tensorflow 2. In: 2021 IEEE/ACM 43rd international conference on software engineering: software engineering in practice (ICSE-SEIP). IEEE, pp. 238–247 Zhang Z, Yang Y, Xia X, Lo D, Ren X, Grundy J (2021) Unveiling the mystery of api evolution in deep learning frameworks a case study of tensorflow 2. In: 2021 IEEE/ACM 43rd international conference on software engineering: software engineering in practice (ICSE-SEIP). IEEE, pp. 238–247
Zurück zum Zitat Zhong H, Su Z (2015) An empirical study on real bug fixes. In: Proceedings of the 37th international conference on software engineering-volume 1. IEEE Press, pp 913–923 Zhong H, Su Z (2015) An empirical study on real bug fixes. In: Proceedings of the 37th international conference on software engineering-volume 1. IEEE Press, pp 913–923
Zurück zum Zitat Zhong H, Xie T, Zhang L, Pei J, Mei H (2009) Mapo: Mining and recommending api usage patterns. In: European conference on object-oriented programming. Springer, pp 318–343 Zhong H, Xie T, Zhang L, Pei J, Mei H (2009) Mapo: Mining and recommending api usage patterns. In: European conference on object-oriented programming. Springer, pp 318–343
Zurück zum Zitat Zhong Z, Guo J, Yang W, Peng J, Xie T, Lou JG, Liu T, Zhang D (2018) Semregex: A semantics-based approach for generating regular expressions from natural language specifications. In: Proceedings of the 2018 conference on empirical methods in natural language processing Zhong Z, Guo J, Yang W, Peng J, Xie T, Lou JG, Liu T, Zhang D (2018) Semregex: A semantics-based approach for generating regular expressions from natural language specifications. In: Proceedings of the 2018 conference on empirical methods in natural language processing
Metadata
Title
Demystifying regular expression bugs
A comprehensive study on regular expression bug causes, fixes, and testing
Authors
Peipei Wang
Chris Brown
Jamie A. Jennings
Kathryn T. Stolee
Publication date
01.01.2022
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 1/2022
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-021-10033-1
