Skip to main content
Erschienen in: Theory of Computing Systems 1/2014

01.07.2014

String Indexing for Patterns with Wildcards

Erschienen in: Theory of Computing Systems | Ausgabe 1/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results.
  • A linear space index with query time O(m+σ j loglogn+occ). This significantly improves the previously best known linear space index by Lam et al. (in Proc. 18th ISAAC, pp. 846–857, [2007]), which requires query time Θ(jn) in the worst case.
  • An index with query time O(m+j+occ) using space \(O(\sigma^{k^{2}} n \log^{k} \log n)\), where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
  • A time-space trade-off, generalizing the index by Cole et al. (in Proc. 36th STOC, pp. 91–100, [2004]).
We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proc. 39th FOCS, pp. 534–543 (1998) Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proc. 39th FOCS, pp. 534–543 (1998)
2.
Zurück zum Zitat Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with k mismatches. J. Algorithms 50(2), 257–275 (2004) CrossRefMATHMathSciNet Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with k mismatches. J. Algorithms 50(2), 257–275 (2004) CrossRefMATHMathSciNet
3.
Zurück zum Zitat Belazzougui, D.: Faster and space-optimal edit distance “1” dictionary. In: Proc. 20th CPM, pp. 154–167 (2009) Belazzougui, D.: Faster and space-optimal edit distance “1” dictionary. In: Proc. 20th CPM, pp. 154–167 (2009)
4.
Zurück zum Zitat Bille, P., Gørtz, I.L.: Substring range reporting. In: Proc. 22nd CPM, pp. 299–308 (2011) Bille, P., Gørtz, I.L.: Substring range reporting. In: Proc. 22nd CPM, pp. 299–308 (2011)
5.
Zurück zum Zitat Bille, P., Gørtz, I.L., Vildhøj, H., Wind, D.: String matching with variable length gaps. In: Proc. 17th SPIRE, pp. 385–394 (2010) Bille, P., Gørtz, I.L., Vildhøj, H., Wind, D.: String matching with variable length gaps. In: Proc. 17th SPIRE, pp. 385–394 (2010)
6.
Zurück zum Zitat Bucher, P., Bairoch, A.: A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In: Proc. 2nd ISMB, pp. 53–61 (1994) Bucher, P., Bairoch, A.: A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In: Proc. 2nd ISMB, pp. 53–61 (1994)
7.
Zurück zum Zitat Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., Wong, S.S.: A linear size index for approximate pattern matching. J. Discrete Algorithms 9(4), 358–364 (2011) CrossRefMATHMathSciNet Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., Wong, S.S.: A linear size index for approximate pattern matching. J. Discrete Algorithms 9(4), 358–364 (2011) CrossRefMATHMathSciNet
9.
Zurück zum Zitat Chen, G., Wu, X., Zhu, X., Arslan, A., He, Y.: Efficient string matching with wildcards and length constraints. Knowl. Inf. Syst. 10(4), 399–419 (2006) CrossRef Chen, G., Wu, X., Zhu, X., Arslan, A., He, Y.: Efficient string matching with wildcards and length constraints. Knowl. Inf. Syst. 10(4), 399–419 (2006) CrossRef
11.
Zurück zum Zitat Coelho, L., Oliveira, A.: Dotted suffix trees a structure for approximate text indexing. In: Proc. 13th SPIRE, pp. 329–336 (2006) Coelho, L., Oliveira, A.: Dotted suffix trees a structure for approximate text indexing. In: Proc. 13th SPIRE, pp. 329–336 (2006)
12.
13.
Zurück zum Zitat Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proc. 34rd STOC, pp. 592–601 (2002) Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proc. 34rd STOC, pp. 592–601 (2002)
14.
Zurück zum Zitat Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proc. 36th STOC, pp. 91–100 (2004) Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proc. 36th STOC, pp. 91–100 (2004)
15.
Zurück zum Zitat Fischer, M.J., Paterson, M.S.: String-matching and other products. In: Complexity of Computation, SIAM-AMS Proceedings, pp. 113–125 (1974) Fischer, M.J., Paterson, M.S.: String-matching and other products. In: Complexity of Computation, SIAM-AMS Proceedings, pp. 113–125 (1974)
16.
Zurück zum Zitat Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with O(1) worst case access time. J. ACM 31, 538–544 (1984) CrossRefMATH Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with O(1) worst case access time. J. ACM 31, 538–544 (1984) CrossRefMATH
17.
Zurück zum Zitat Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance. Inf. Retr. 11(4), 335–357 (2008) CrossRef Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance. Inf. Retr. 11(4), 335–357 (2008) CrossRef
18.
Zurück zum Zitat Fredriksson, K., Grabowski, S.: Nested counters in bit-parallel string matching. In: Proc. 3rd LATA, pp. 338–349 (2009) Fredriksson, K., Grabowski, S.: Nested counters in bit-parallel string matching. In: Proc. 3rd LATA, pp. 338–349 (2009)
19.
Zurück zum Zitat Galil, Z., Giancarlo, R.: Improved string matching with k mismatches. SIGACT News 17(4), 52–54 (1986) CrossRef Galil, Z., Giancarlo, R.: Improved string matching with k mismatches. SIGACT News 17(4), 52–54 (1986) CrossRef
20.
Zurück zum Zitat Hagerup, T.: Sorting and searching on the word RAM. In: Proc. 15th STACS, pp. 366–398 (1998) Hagerup, T.: Sorting and searching on the word RAM. In: Proc. 15th STACS, pp. 366–398 (1998)
21.
22.
Zurück zum Zitat Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucleic Acids Res. 27(1), 215–219 (1999) CrossRef Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucleic Acids Res. 27(1), 215–219 (1999) CrossRef
23.
Zurück zum Zitat Iliopoulos, C.S., Rahman, M.S.: Pattern matching algorithms with don’t cares. In: Proc. 33rd SOFSEM, pp. 116–126 (2007) Iliopoulos, C.S., Rahman, M.S.: Pattern matching algorithms with don’t cares. In: Proc. 33rd SOFSEM, pp. 116–126 (2007)
24.
Zurück zum Zitat Kalai, A.: Efficient pattern-matching with don’t cares. In: Proc. 13th SODA, pp. 655–656 (2002) Kalai, A.: Efficient pattern-matching with don’t cares. In: Proc. 13th SODA, pp. 655–656 (2002)
25.
Zurück zum Zitat Lam, T.W., Sung, W.K., Tam, S.L., Yiu, S.M.: Space efficient indexes for string matching with don’t cares. In: Proc. 18th ISAAC, pp. 846–857 (2007) Lam, T.W., Sung, W.K., Tam, S.L., Yiu, S.M.: Space efficient indexes for string matching with don’t cares. In: Proc. 18th ISAAC, pp. 846–857 (2007)
27.
28.
Zurück zum Zitat Lewenstein, M.: Indexing with gaps. In: Proc. 18th SPIRE, pp. 135–143 (2011) Lewenstein, M.: Indexing with gaps. In: Proc. 18th SPIRE, pp. 135–143 (2011)
30.
Zurück zum Zitat Mehldau, G., Myers, G.: A system for pattern matching applications on biosequences. Comput. Appl. Biosci. 9(3), 299–314 (1993) Mehldau, G., Myers, G.: A system for pattern matching applications on biosequences. Comput. Appl. Biosci. 9(3), 299–314 (1993)
31.
Zurück zum Zitat Morgante, M., Policriti, A., Vitacolonna, N., Zuccolo, A.: Structured motifs search. J. Comput. Biol. 12(8), 1065–1082 (2005) CrossRef Morgante, M., Policriti, A., Vitacolonna, N., Zuccolo, A.: Structured motifs search. J. Comput. Biol. 12(8), 1065–1082 (2005) CrossRef
32.
Zurück zum Zitat Myers, E.: Approximate matching of network expressions with spacers. J. Comput. Biol. 3(1), 33–51 (1996) CrossRef Myers, E.: Approximate matching of network expressions with spacers. J. Comput. Biol. 3(1), 33–51 (1996) CrossRef
33.
Zurück zum Zitat Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Biol. 10(6), 903–923 (2003) CrossRef Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Biol. 10(6), 903–923 (2003) CrossRef
34.
Zurück zum Zitat Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing methods for approximate string matching. IEEE Data Eng. Bull. 24(4), 19–27 (2001) Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing methods for approximate string matching. IEEE Data Eng. Bull. 24(4), 19–27 (2001)
35.
Zurück zum Zitat Rahman, M.S., Iliopoulos, C.S., Lee, I., Mohamed, M., Smyth, W.F.: Finding patterns with variable length gaps or don’t cares. In: Proc. 12th COCOON, pp. 146–155 (2006) Rahman, M.S., Iliopoulos, C.S., Lee, I., Mohamed, M., Smyth, W.F.: Finding patterns with variable length gaps or don’t cares. In: Proc. 12th COCOON, pp. 146–155 (2006)
36.
Zurück zum Zitat Sahinalp, S., Vishkin, U.: Efficient approximate and dynamic matching of patterns using a labeling paradigm. In: Proc. 37th FOCS, pp. 320–328 (1996) Sahinalp, S., Vishkin, U.: Efficient approximate and dynamic matching of patterns using a labeling paradigm. In: Proc. 37th FOCS, pp. 320–328 (1996)
37.
Zurück zum Zitat Tam, A., Wu, E., Lam, T., Yiu, S.: Succinct text indexing with wildcards. In: Proc. 16th SPIRE, pp. 39–50 (2009) Tam, A., Wu, E., Lam, T., Yiu, S.: Succinct text indexing with wildcards. In: Proc. 16th SPIRE, pp. 39–50 (2009)
39.
Zurück zum Zitat Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th SWAT, pp. 1–11 (1973) Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th SWAT, pp. 1–11 (1973)
Metadaten
Titel
String Indexing for Patterns with Wildcards
Publikationsdatum
01.07.2014
Erschienen in
Theory of Computing Systems / Ausgabe 1/2014
Print ISSN: 1432-4350
Elektronische ISSN: 1433-0490
DOI
https://doi.org/10.1007/s00224-013-9498-4

Weitere Artikel der Ausgabe 1/2014

Theory of Computing Systems 1/2014 Zur Ausgabe