skip to main content
research-article

The exact online string matching problem: A review of the most recent results

Published:12 March 2013Publication History
Skip Abstract Section

Abstract

This article addresses the online exact string matching problem which consists in finding all occurrences of a given pattern p in a text t. It is an extensively studied problem in computer science, mainly due to its direct applications to such diverse areas as text, image and signal processing, speech analysis and recognition, information retrieval, data compression, computational biology and chemistry.

In the last decade more than 50 new algorithms have been proposed for the problem, which add up to a wide set of (almost 40) algorithms presented before 2000. In this article we review the string matching algorithms presented in the last decade and present experimental results in order to bring order among the dozens of articles published in this area.

References

  1. Ahmed, M., Kaykobad, M., and Chowdhury, R. A. 2003. A new string matching algorithm. Int. J. Comput. Math. 80, 7, 825--834.Google ScholarGoogle ScholarCross RefCross Ref
  2. Aho, A. V., Hopcroft, J. E., and Ullman, J. D. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Allauzen, C., Crochemore, M., and Raffinot, M. 1999. Factor oracle: A new structure for pattern matching. In Proceedings of the 26th Conference on Current Trends in Theory and Practice of Informatics (SOFSEM'99). J. Pavelka, G. Tel, and M. Bartosek, Eds., Lecture Notes in Computer Science, vol. 1725, Springer, 291--306. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Allauzen, C. and Raffinot, M. 2000. Simple optimal string matching algorithm. J. Algor. 36, 1, 102--116. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Apostolico, A. and Giancarlo, R. 1986. The Boyer-Moore-Galil string searching strategies revisited. SIAM J. Comput. 15, 1, 98--105. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Arndt, J. 2009. Matters Computational. http://www.jjj.de/fxt/Google ScholarGoogle Scholar
  7. Baeza-Yates, R. and Gonnet, G. H. 1992. A new approach to text searching. Comm. ACM 35, 10, 74--8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Berry, T. and Ravindran, S. 1999. A fast string matching algorithm and experimental results. In Proceedings of the Prague Stringology Club Workshop '99. J. Holub and M. Šimánek, Eds., 16--28, Collaborative Report DC--99--05.Google ScholarGoogle Scholar
  9. Blumer, A., Blumer, J., Ehrenfeucht, A., Haussler, D., Chen, M. T., and Seiferas, J. 1985. The smallest automaton recognizing the subwords of a text. Theor. Comput. Sci. 40, 1, 31--55.Google ScholarGoogle ScholarCross RefCross Ref
  10. Blumer, A., Blumer, J., Ehrenfeucht, A., Haussler, D., and McConnel, R. 1983. Linear size finite automata for the set of all subwords of a word: An outline of results. Bull. Eur. Assoc. Theor. Comput. Sci. 21, 12--20.Google ScholarGoogle Scholar
  11. Boyer, R. S. and Moore, J. S. 1977. A fast string searching algorithm. Comm. ACM 20, 10, 762--772. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Breslauer, D. 1996. Saving comparisons in the Crochemore-Perrin string matching algorithm. Theor. Comput. Sci. 158, 1--2, 177--192. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Cantone, D. and Faro, S. 2003a. Fast-Search: A new efficient variant of the Boyer-Moore string matching algorithm. In Proceedings of the 2nd International Workshop on Experimental and Efficient Algorithms. Lecture Notes in Computer Science, vol. 2647, Springer, 247--267. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Cantone, D. and Faro, S. 2003b. Forward-Fast-Search: Another fast variant of the boyer-moore string matching algorithm. In Proceedings of the Prague Stringology Conference‘03, M. Šimánek, Ed., 10--24.Google ScholarGoogle Scholar
  15. Cantone, D. and Faro, S. 2004. Searching for a substring with constant extra-space complexity. In Proceedings of the 3rd International Conference on Fun with Algorithms P. Ferragina and R. Grossi, Eds., 118--131.Google ScholarGoogle Scholar
  16. Cantone, D. and Faro, S. 2005. Fast-Search algorithms: New efficient variants of the Boyer-Moore pattern-matching algorithm. J. Autom. Lang. Comb. 10, 5/6, 589--608.Google ScholarGoogle Scholar
  17. Cantone, D., Faro, S., and Giaquinta, E. 2010a. Bit-(parallelism)2: Getting to the next level of parallelism. In Proceedings of the 5th International Conference on Fun with Algorithms. P. Boldi and L. Gargano, Eds., Lecture Notes in Computer Science, vol. 6099, Springer, 166--177. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Cantone, D., Faro, S., and Giaquinta, E. 2010b. A compact representation of nondeterministic (suffix) automata for the bit-parallel approach. In Proceedings of the 21st Annual Conference on Combinatorial Pattern Matching. A. Amir and L. Parida, Eds., Lecture Notes in Computer Science, vol. 6129, Springer, 288--298. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Charras, C. and Lecroq, T. 2004. Handbook of Exact String Matching Algorithms. King's College Publications. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Colussi, L. 1994. Fastest pattern matching in strings. J. Algor. 16, 2, 163--189. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Crochemore, M. 1985. Optimal factor transducers. In Proceedings of the NATO Advanced Research Workshop on Combinatorial Algorithms on Words. A. Apostolico and Z. Galil, Eds., NATO Advanced Science Institutes, Series F, vol. 12, Springer, 31--44.Google ScholarGoogle Scholar
  22. Crochemore, M. 1986. Transducers and repetitions. Theor. Comput. Sci. 45, 1, 63--86. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Crochemore, M., Czumaj, A., Gąsieniec, L., Jarominek, S., Lecroq, T., Plandowski, W., and Rytter, W. 1994. Speeding up two string matching algorithms. Algorithmica 12, 4/5, 247--267.Google ScholarGoogle Scholar
  24. Crochemore, M. and Lecroq, T. 2008. A fast implementation of the Boyer-Moore string matching algorithm. http://www-igm.univ-mlv.fr/~lecroq/articles/cl2008.pdfGoogle ScholarGoogle Scholar
  25. Crochemore, M. and Perrin, D. 1991. Two-Way string-matching. J. Assoc. Comput. Mach. 38, 3, 651--675. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Crochemore, M. and Rytter, W. 1994. Text Algorithms. Oxford University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Deusdado, S. and Carvalho, P. 2009. GRASPm: An efficient algorithm for exact pattern-matching in genomic sequences. Int. J. Bioinf. Res. Appl. 5, 4, 385--401. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Dömölki, B. 1968. A universal compiler system based on production rules. BIT Numer. Math. 8, 262--275.Google ScholarGoogle ScholarCross RefCross Ref
  29. Durian, B., Holub, J., Peltola, H., and Tarhio, J. 2009. Tuning BNDM with q-grams. In Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX '09). I. Finocchi and J. Hershberger, Eds., SIAM, New York, 29--37.Google ScholarGoogle Scholar
  30. Fan, H., Yao, N., and Ma, H. 2009. Fast variants of the Backward-Oracle-Marching algorithm. In Proceedings of the 4th International Conference on Internet Computing for Science and Engineering (ICICSE '09). IEEE Computer Society, 56--59. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Faro, S. and Lecroq, T. 2008. Efficient variants of the Backward-Oracle-Matching algorithm. In Proceedings of the Prague Stringology Conference '08. J. Holub and J. Žďárek, Eds., 146--160.Google ScholarGoogle Scholar
  32. Faro, S. and Lecroq, T. 2010. The exact string matching problem: A comprehensive experimental evaluation. arXiv:1012.2547.Google ScholarGoogle Scholar
  33. Franek, F., Jennings, C. G., and Smyth, W. F. 2007. A simple fast hybrid pattern-matching algorithm. J. Discr. Algor. 5, 4, 682--695. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Fredriksson, K. and Grabowski, S. 2005. Practical and optimal string matching. In Proceedings of the 12th International Conference on String Processing and Information Retrieval (SPIRE '05), M. P. Consens and G. Navarro, Eds., Lecture Notes in Computer Science, vol. 3772, Springer, 376--387. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Galil, Z. 1981. String matching in real time. J. ACM 28, 1, 134--149. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Galil, Z. and Seiferas, J. 1983. Time-Space optimal string matching. J. Comput. Syst. Sci. 26, 3, 280--294.Google ScholarGoogle ScholarCross RefCross Ref
  37. Gasieniec, L. and Kolpakov, R. 2004. Real-Time string matching in sublinear space. In Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM). S. C. Sahinalp, S. Muthukrishnan, and U. Dogrus&ozuml;, Eds., Lecture Notes in Computer Science, vol. 3109, Springer, 117--129.Google ScholarGoogle Scholar
  38. Hancart, C. 1993. On Simon's string searching algorithm. Inf. Process. Lett. 47, 2, 95--99. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. He, L., Fang, B., and Sui, J. 2005. The wide window string matching algorithm. Theor. Comput. Sci. 332, 1-3, 391--404. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Holub, J. and Durian, B. 2005. Talk: Fast variants of bit parallel approach to suffix automata. In Proceedings of the 2nd Haifa Annual International Stringology Research Workshop of the Israeli Science Foundation. http://www.cri.haifa.ac.il/events/2005/string/presentations/Holub.pdfGoogle ScholarGoogle Scholar
  41. Horspool, R. N. 1980. Practical fast searching in strings. Softw. Pract. Exp. 10, 6, 501--506.Google ScholarGoogle ScholarCross RefCross Ref
  42. Hudaib, A., Al-Khalid, R., Suleiman, D., Itriq, M., and Al-Anani, A. 2008. A fast pattern matching algorithm with two sliding windows (TSW). J. Comput. Sci. 4, 5, 393--401.Google ScholarGoogle ScholarCross RefCross Ref
  43. Hume, A. and Sunday, D. M. 1991. Fast string searching. Softw. Pract. Exp. 21, 11, 1221--1248. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Kalsi, P., Peltola, H., and Tarhio, J. 2008. Comparison of exact string matching algorithms for biological sequences. In Proceedings of the 2nd International Conference on Bioinformatics Research and Development (BIRD'08). M. Elloumi, J. Küng, M. Linial, R. F. Murphy, K. Schneider, and C. Toma, Eds., 417--426.Google ScholarGoogle Scholar
  45. Karp, R. M. and Rabin, M. O. 1987. Efficient randomized pattern-matching algorithms. IBM J. Res.Dev. 31, 2, 249--260. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Knuth, D. E., Morris, Jr, J. H., and Pratt, V. R. 1977. Fast pattern matching in strings. SIAM J. Comput. 6, 1, 323--350.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. K&ulekciuml;, M. O. 2008. A method to overcome computer word size limitation in bit-parallel pattern matching. In Proceedings of the 19th International Symposium on Algorithms and Computation (ISAAC'08). S.-H. Hong, H. Nagamochi, and T. Fukunaga, Eds., Lecture Notes in Computer Science, vol. 5369, Springer, 496--506. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Külekci, M. O. 2009. Filter based fast matching of long patterns by using SIMD instructions. In Proceedings of the Prague Stringology Conference‘09, J. Holub and J. Žďárek, Eds., 118--128.Google ScholarGoogle Scholar
  49. Lecroq, T. 2007. Fast exact string matching algorithms. Inf. Process. Lett. 102, 6, 229--235. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Liu, C., Wang, Y., Liu, D., and Li, D. 2006. Two improved single pattern matching algorithms. In Proceedings of the 16th International Conference on Artificial Reality and Telexistence—Workshops (ICAT'06). IEEE Computer Society, 419--422. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Mancheron, A. and Moan, C. 2005. Combinatorial characterization of the language recognized by factor and suffix oracles. Int. J. Found. Comput. Sci. 16, 6, 1179--1191.Google ScholarGoogle ScholarCross RefCross Ref
  52. Morris, Jr, J. H. and Pratt, V. R. 1970. A linear pattern-matching algorithm. Report 40, University of California, Berkeley.Google ScholarGoogle Scholar
  53. Navarro, G. 2001. NR-grep: A fast and flexible pattern-matching tool. Softw. Pract. Exp. 31, 13, 1265--1312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Navarro, G. and Raffinot, M. 1998. A bit-parallel approach to suffix automata: Fast extended string matching. In Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching. M. Farach-Colton, Ed., Lecture Notes in Computer Science, vol. 1448, Springer, 14--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Navarro, G. and Raffinot, M. 2000. Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM J. Exper. Algor. 5, 4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Nebel, M. E. 2006. Fast string matching by using probabilities: on an optimal mismatch variant of Horspool's algorithm. Theor. Comput. Sci. 359, 1, 329--343. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Peltola, H. and Tarhio, J. 2003. Alternative algorithms for bit-parallel string matching. In Proceedings of the 10th International Symposium on String Processing and Information Retrieval (SPIRE'03), M. A. Nascimento, E. S. de Moura, and A. L. Oliveira, Eds., Lecture Notes in Computer Science, vol. 2857, Springer, 80--94.Google ScholarGoogle Scholar
  58. Raita, T. 1992. Tuning the Boyer-Moore-Horspool string searching algorithm. Softw. Pract. Exp. 22, 10, 879--884. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Sheik, S., Aggarwal, S., Poddar, A., Balakrishnan, N., and Sekar, K. 2004. A fast pattern matching algorithm. J. Chem. Inf. Comput. 44, 1251--1256.Google ScholarGoogle ScholarCross RefCross Ref
  60. Simon, I. 1993. String matching algorithms and automata. In Proceedings of the 1st South American Workshop on String Processing. R. Baeza-Yates and N. Ziviani, Eds., 151--157.Google ScholarGoogle Scholar
  61. Sunday, D. M. 1990. A very fast substring search algorithm. Comm. ACM 33, 8, 132--142. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Sustik, M. and Moore, J. 2007. String searching over small alphabets. Tech. rep. TR-07-62. Department of Computer Sciences, University of Texas at Austin.Google ScholarGoogle Scholar
  63. Thathoo, R., Virmani, A., Lakshmi, S. S., Balakrishnan, N., and Sekar, K. 2006. TVSBS: A fast exact pattern matching algorithm for biological sequences. J. Indian Acad. Sci., Current Sci. 91, 1, 47--53.Google ScholarGoogle Scholar
  64. Wu, S. and Manber, U. 1992. Fast text searching allowing errors. Comm. ACM 35, 10, 83--91. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Wu, S. and Manber, U. 1994. A fast algorithm for multi-pattern searching. Tech. rep. TR-94-17, Department of Computer Science, University of Arizona, Tucson, AZ.Google ScholarGoogle Scholar
  66. Yao, A. C. 1979. The complexity of pattern matching for a random string. SIAM J. Comput. 8, 3, 368--387.Google ScholarGoogle ScholarCross RefCross Ref
  67. Zhang, G., Zhu, E., Mao, L., and Yin, M. 2009. A bit-parallel exact string matching algorithm for small alphabet. In Proceedings of the 3rd International Workshop on Frontiers in Algorithmics (FAW'09). X. Deng, J. E. Hopcroft, and J. Xue, Eds., Lecture Notes in Computer Science, vol. 5598, Springer, 336--345. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Zhu, R. F. and Takaoka, T. 1987. On improving the average case of the Boyer-Moore string matching algorithm. J. Inf. Process. 10, 3, 173--177. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. The exact online string matching problem: A review of the most recent results

      Recommendations

      Reviews

      Bayard Kohlhepp

      String matching, especially searching for textual needles in data haystacks, is probably the most used function in programming today. Fortunes are literally being made, and lost, depending on the ability of a company to find text strings of interest. You can tell by the title that this paper is a survey of string-matching algorithms. What you can't tell is that the paper goes beyond a simple list of methods. It additionally classifies published algorithms according to performance and class of problem. In other words, turn to the back pages, identify the type of string-matching problem you are trying to solve, and find the algorithm that performs best for that class of problem. Decades of papers are summarized here, and there are significant recent advances. I had no idea how many algorithms were available. Now I feel like a slacker for settling on "strstr" so frequently. The paper itself is very readable, even if you aren't comfortable with set or logic notation (skip the notation and read the accompanying text). At 42 pages, it's too long to be called a quick read-there are too many algorithms for it to be a short document-but you can skip to the end and read the conclusions (section 5, figure 2, map of results) if you're in a hurry. Anyone creating "big data" analytics or business intelligence solutions should keep this paper as a primary reference document. Online Computing Reviews Service

      Access critical reviews of Computing literature here

      Become a reviewer for Computing Reviews.

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Computing Surveys
        ACM Computing Surveys  Volume 45, Issue 2
        February 2013
        417 pages
        ISSN:0360-0300
        EISSN:1557-7341
        DOI:10.1145/2431211
        Issue’s Table of Contents

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 March 2013
        • Accepted: 1 August 2011
        • Revised: 1 June 2011
        • Received: 1 December 2010
        Published in csur Volume 45, Issue 2

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader