Abstract
This article addresses the online exact string matching problem which consists in finding all occurrences of a given pattern p in a text t. It is an extensively studied problem in computer science, mainly due to its direct applications to such diverse areas as text, image and signal processing, speech analysis and recognition, information retrieval, data compression, computational biology and chemistry.
In the last decade more than 50 new algorithms have been proposed for the problem, which add up to a wide set of (almost 40) algorithms presented before 2000. In this article we review the string matching algorithms presented in the last decade and present experimental results in order to bring order among the dozens of articles published in this area.
- Ahmed, M., Kaykobad, M., and Chowdhury, R. A. 2003. A new string matching algorithm. Int. J. Comput. Math. 80, 7, 825--834.Google ScholarCross Ref
- Aho, A. V., Hopcroft, J. E., and Ullman, J. D. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley. Google ScholarDigital Library
- Allauzen, C., Crochemore, M., and Raffinot, M. 1999. Factor oracle: A new structure for pattern matching. In Proceedings of the 26th Conference on Current Trends in Theory and Practice of Informatics (SOFSEM'99). J. Pavelka, G. Tel, and M. Bartosek, Eds., Lecture Notes in Computer Science, vol. 1725, Springer, 291--306. Google ScholarDigital Library
- Allauzen, C. and Raffinot, M. 2000. Simple optimal string matching algorithm. J. Algor. 36, 1, 102--116. Google ScholarDigital Library
- Apostolico, A. and Giancarlo, R. 1986. The Boyer-Moore-Galil string searching strategies revisited. SIAM J. Comput. 15, 1, 98--105. Google ScholarDigital Library
- Arndt, J. 2009. Matters Computational. http://www.jjj.de/fxt/Google Scholar
- Baeza-Yates, R. and Gonnet, G. H. 1992. A new approach to text searching. Comm. ACM 35, 10, 74--8. Google ScholarDigital Library
- Berry, T. and Ravindran, S. 1999. A fast string matching algorithm and experimental results. In Proceedings of the Prague Stringology Club Workshop '99. J. Holub and M. Šimánek, Eds., 16--28, Collaborative Report DC--99--05.Google Scholar
- Blumer, A., Blumer, J., Ehrenfeucht, A., Haussler, D., Chen, M. T., and Seiferas, J. 1985. The smallest automaton recognizing the subwords of a text. Theor. Comput. Sci. 40, 1, 31--55.Google ScholarCross Ref
- Blumer, A., Blumer, J., Ehrenfeucht, A., Haussler, D., and McConnel, R. 1983. Linear size finite automata for the set of all subwords of a word: An outline of results. Bull. Eur. Assoc. Theor. Comput. Sci. 21, 12--20.Google Scholar
- Boyer, R. S. and Moore, J. S. 1977. A fast string searching algorithm. Comm. ACM 20, 10, 762--772. Google ScholarDigital Library
- Breslauer, D. 1996. Saving comparisons in the Crochemore-Perrin string matching algorithm. Theor. Comput. Sci. 158, 1--2, 177--192. Google ScholarDigital Library
- Cantone, D. and Faro, S. 2003a. Fast-Search: A new efficient variant of the Boyer-Moore string matching algorithm. In Proceedings of the 2nd International Workshop on Experimental and Efficient Algorithms. Lecture Notes in Computer Science, vol. 2647, Springer, 247--267. Google ScholarDigital Library
- Cantone, D. and Faro, S. 2003b. Forward-Fast-Search: Another fast variant of the boyer-moore string matching algorithm. In Proceedings of the Prague Stringology Conference‘03, M. Šimánek, Ed., 10--24.Google Scholar
- Cantone, D. and Faro, S. 2004. Searching for a substring with constant extra-space complexity. In Proceedings of the 3rd International Conference on Fun with Algorithms P. Ferragina and R. Grossi, Eds., 118--131.Google Scholar
- Cantone, D. and Faro, S. 2005. Fast-Search algorithms: New efficient variants of the Boyer-Moore pattern-matching algorithm. J. Autom. Lang. Comb. 10, 5/6, 589--608.Google Scholar
- Cantone, D., Faro, S., and Giaquinta, E. 2010a. Bit-(parallelism)2: Getting to the next level of parallelism. In Proceedings of the 5th International Conference on Fun with Algorithms. P. Boldi and L. Gargano, Eds., Lecture Notes in Computer Science, vol. 6099, Springer, 166--177. Google ScholarDigital Library
- Cantone, D., Faro, S., and Giaquinta, E. 2010b. A compact representation of nondeterministic (suffix) automata for the bit-parallel approach. In Proceedings of the 21st Annual Conference on Combinatorial Pattern Matching. A. Amir and L. Parida, Eds., Lecture Notes in Computer Science, vol. 6129, Springer, 288--298. Google ScholarDigital Library
- Charras, C. and Lecroq, T. 2004. Handbook of Exact String Matching Algorithms. King's College Publications. Google ScholarDigital Library
- Colussi, L. 1994. Fastest pattern matching in strings. J. Algor. 16, 2, 163--189. Google ScholarDigital Library
- Crochemore, M. 1985. Optimal factor transducers. In Proceedings of the NATO Advanced Research Workshop on Combinatorial Algorithms on Words. A. Apostolico and Z. Galil, Eds., NATO Advanced Science Institutes, Series F, vol. 12, Springer, 31--44.Google Scholar
- Crochemore, M. 1986. Transducers and repetitions. Theor. Comput. Sci. 45, 1, 63--86. Google ScholarDigital Library
- Crochemore, M., Czumaj, A., Gąsieniec, L., Jarominek, S., Lecroq, T., Plandowski, W., and Rytter, W. 1994. Speeding up two string matching algorithms. Algorithmica 12, 4/5, 247--267.Google Scholar
- Crochemore, M. and Lecroq, T. 2008. A fast implementation of the Boyer-Moore string matching algorithm. http://www-igm.univ-mlv.fr/~lecroq/articles/cl2008.pdfGoogle Scholar
- Crochemore, M. and Perrin, D. 1991. Two-Way string-matching. J. Assoc. Comput. Mach. 38, 3, 651--675. Google ScholarDigital Library
- Crochemore, M. and Rytter, W. 1994. Text Algorithms. Oxford University Press. Google ScholarDigital Library
- Deusdado, S. and Carvalho, P. 2009. GRASPm: An efficient algorithm for exact pattern-matching in genomic sequences. Int. J. Bioinf. Res. Appl. 5, 4, 385--401. Google ScholarDigital Library
- Dömölki, B. 1968. A universal compiler system based on production rules. BIT Numer. Math. 8, 262--275.Google ScholarCross Ref
- Durian, B., Holub, J., Peltola, H., and Tarhio, J. 2009. Tuning BNDM with q-grams. In Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX '09). I. Finocchi and J. Hershberger, Eds., SIAM, New York, 29--37.Google Scholar
- Fan, H., Yao, N., and Ma, H. 2009. Fast variants of the Backward-Oracle-Marching algorithm. In Proceedings of the 4th International Conference on Internet Computing for Science and Engineering (ICICSE '09). IEEE Computer Society, 56--59. Google ScholarDigital Library
- Faro, S. and Lecroq, T. 2008. Efficient variants of the Backward-Oracle-Matching algorithm. In Proceedings of the Prague Stringology Conference '08. J. Holub and J. Žďárek, Eds., 146--160.Google Scholar
- Faro, S. and Lecroq, T. 2010. The exact string matching problem: A comprehensive experimental evaluation. arXiv:1012.2547.Google Scholar
- Franek, F., Jennings, C. G., and Smyth, W. F. 2007. A simple fast hybrid pattern-matching algorithm. J. Discr. Algor. 5, 4, 682--695. Google ScholarDigital Library
- Fredriksson, K. and Grabowski, S. 2005. Practical and optimal string matching. In Proceedings of the 12th International Conference on String Processing and Information Retrieval (SPIRE '05), M. P. Consens and G. Navarro, Eds., Lecture Notes in Computer Science, vol. 3772, Springer, 376--387. Google ScholarDigital Library
- Galil, Z. 1981. String matching in real time. J. ACM 28, 1, 134--149. Google ScholarDigital Library
- Galil, Z. and Seiferas, J. 1983. Time-Space optimal string matching. J. Comput. Syst. Sci. 26, 3, 280--294.Google ScholarCross Ref
- Gasieniec, L. and Kolpakov, R. 2004. Real-Time string matching in sublinear space. In Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM). S. C. Sahinalp, S. Muthukrishnan, and U. Dogrus&ozuml;, Eds., Lecture Notes in Computer Science, vol. 3109, Springer, 117--129.Google Scholar
- Hancart, C. 1993. On Simon's string searching algorithm. Inf. Process. Lett. 47, 2, 95--99. Google ScholarDigital Library
- He, L., Fang, B., and Sui, J. 2005. The wide window string matching algorithm. Theor. Comput. Sci. 332, 1-3, 391--404. Google ScholarDigital Library
- Holub, J. and Durian, B. 2005. Talk: Fast variants of bit parallel approach to suffix automata. In Proceedings of the 2nd Haifa Annual International Stringology Research Workshop of the Israeli Science Foundation. http://www.cri.haifa.ac.il/events/2005/string/presentations/Holub.pdfGoogle Scholar
- Horspool, R. N. 1980. Practical fast searching in strings. Softw. Pract. Exp. 10, 6, 501--506.Google ScholarCross Ref
- Hudaib, A., Al-Khalid, R., Suleiman, D., Itriq, M., and Al-Anani, A. 2008. A fast pattern matching algorithm with two sliding windows (TSW). J. Comput. Sci. 4, 5, 393--401.Google ScholarCross Ref
- Hume, A. and Sunday, D. M. 1991. Fast string searching. Softw. Pract. Exp. 21, 11, 1221--1248. Google ScholarDigital Library
- Kalsi, P., Peltola, H., and Tarhio, J. 2008. Comparison of exact string matching algorithms for biological sequences. In Proceedings of the 2nd International Conference on Bioinformatics Research and Development (BIRD'08). M. Elloumi, J. Küng, M. Linial, R. F. Murphy, K. Schneider, and C. Toma, Eds., 417--426.Google Scholar
- Karp, R. M. and Rabin, M. O. 1987. Efficient randomized pattern-matching algorithms. IBM J. Res.Dev. 31, 2, 249--260. Google ScholarDigital Library
- Knuth, D. E., Morris, Jr, J. H., and Pratt, V. R. 1977. Fast pattern matching in strings. SIAM J. Comput. 6, 1, 323--350.Google ScholarDigital Library
- K&ulekciuml;, M. O. 2008. A method to overcome computer word size limitation in bit-parallel pattern matching. In Proceedings of the 19th International Symposium on Algorithms and Computation (ISAAC'08). S.-H. Hong, H. Nagamochi, and T. Fukunaga, Eds., Lecture Notes in Computer Science, vol. 5369, Springer, 496--506. Google ScholarDigital Library
- Külekci, M. O. 2009. Filter based fast matching of long patterns by using SIMD instructions. In Proceedings of the Prague Stringology Conference‘09, J. Holub and J. Žďárek, Eds., 118--128.Google Scholar
- Lecroq, T. 2007. Fast exact string matching algorithms. Inf. Process. Lett. 102, 6, 229--235. Google ScholarDigital Library
- Liu, C., Wang, Y., Liu, D., and Li, D. 2006. Two improved single pattern matching algorithms. In Proceedings of the 16th International Conference on Artificial Reality and Telexistence—Workshops (ICAT'06). IEEE Computer Society, 419--422. Google ScholarDigital Library
- Mancheron, A. and Moan, C. 2005. Combinatorial characterization of the language recognized by factor and suffix oracles. Int. J. Found. Comput. Sci. 16, 6, 1179--1191.Google ScholarCross Ref
- Morris, Jr, J. H. and Pratt, V. R. 1970. A linear pattern-matching algorithm. Report 40, University of California, Berkeley.Google Scholar
- Navarro, G. 2001. NR-grep: A fast and flexible pattern-matching tool. Softw. Pract. Exp. 31, 13, 1265--1312. Google ScholarDigital Library
- Navarro, G. and Raffinot, M. 1998. A bit-parallel approach to suffix automata: Fast extended string matching. In Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching. M. Farach-Colton, Ed., Lecture Notes in Computer Science, vol. 1448, Springer, 14--33. Google ScholarDigital Library
- Navarro, G. and Raffinot, M. 2000. Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM J. Exper. Algor. 5, 4. Google ScholarDigital Library
- Nebel, M. E. 2006. Fast string matching by using probabilities: on an optimal mismatch variant of Horspool's algorithm. Theor. Comput. Sci. 359, 1, 329--343. Google ScholarDigital Library
- Peltola, H. and Tarhio, J. 2003. Alternative algorithms for bit-parallel string matching. In Proceedings of the 10th International Symposium on String Processing and Information Retrieval (SPIRE'03), M. A. Nascimento, E. S. de Moura, and A. L. Oliveira, Eds., Lecture Notes in Computer Science, vol. 2857, Springer, 80--94.Google Scholar
- Raita, T. 1992. Tuning the Boyer-Moore-Horspool string searching algorithm. Softw. Pract. Exp. 22, 10, 879--884. Google ScholarDigital Library
- Sheik, S., Aggarwal, S., Poddar, A., Balakrishnan, N., and Sekar, K. 2004. A fast pattern matching algorithm. J. Chem. Inf. Comput. 44, 1251--1256.Google ScholarCross Ref
- Simon, I. 1993. String matching algorithms and automata. In Proceedings of the 1st South American Workshop on String Processing. R. Baeza-Yates and N. Ziviani, Eds., 151--157.Google Scholar
- Sunday, D. M. 1990. A very fast substring search algorithm. Comm. ACM 33, 8, 132--142. Google ScholarDigital Library
- Sustik, M. and Moore, J. 2007. String searching over small alphabets. Tech. rep. TR-07-62. Department of Computer Sciences, University of Texas at Austin.Google Scholar
- Thathoo, R., Virmani, A., Lakshmi, S. S., Balakrishnan, N., and Sekar, K. 2006. TVSBS: A fast exact pattern matching algorithm for biological sequences. J. Indian Acad. Sci., Current Sci. 91, 1, 47--53.Google Scholar
- Wu, S. and Manber, U. 1992. Fast text searching allowing errors. Comm. ACM 35, 10, 83--91. Google ScholarDigital Library
- Wu, S. and Manber, U. 1994. A fast algorithm for multi-pattern searching. Tech. rep. TR-94-17, Department of Computer Science, University of Arizona, Tucson, AZ.Google Scholar
- Yao, A. C. 1979. The complexity of pattern matching for a random string. SIAM J. Comput. 8, 3, 368--387.Google ScholarCross Ref
- Zhang, G., Zhu, E., Mao, L., and Yin, M. 2009. A bit-parallel exact string matching algorithm for small alphabet. In Proceedings of the 3rd International Workshop on Frontiers in Algorithmics (FAW'09). X. Deng, J. E. Hopcroft, and J. Xue, Eds., Lecture Notes in Computer Science, vol. 5598, Springer, 336--345. Google ScholarDigital Library
- Zhu, R. F. and Takaoka, T. 1987. On improving the average case of the Boyer-Moore string matching algorithm. J. Inf. Process. 10, 3, 173--177. Google ScholarDigital Library
Index Terms
- The exact online string matching problem: A review of the most recent results
Recommendations
Fast exact string matching algorithms
String matching is the problem of finding all the occurrences of a pattern in a text. We propose a very fast new family of string matching algorithms based on hashing q-grams. The new algorithms are the fastest on many cases, in particular, on small ...
A weak approach to suffix automata simulation for exact and approximate string matching
AbstractString matching is one of the most extensively studied problems in computer science, mainly due to its direct applications to such diverse areas as text, image and signal processing, speech analysis and recognition, ...
Highlights- We present a simple and efficient technique for simulating the suffix automaton of a string.
A Bit-Parallel Exact String Matching Algorithm for Small Alphabet
FAW '09: Proceedings of the 3d International Workshop on Frontiers in AlgorithmicsThis paper concentrates on the problem of finding all the occurrences of a pattern in a text. A novel bit-parallel exact string matching algorithm for small alphabet (SABP) is proposed based on a position related character matching table, which is ...
Comments