Skip to main content

2015 | OriginalPaper | Buchkapitel

Dictionary Matching with Uneven Gaps

verfasst von : Wing-Kai Hon, Tak-Wah Lam, Rahul Shah, Sharma V. Thankachan, Hing-Fung Ting, Yilin Yang

Erschienen in: Combinatorial Pattern Matching

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

A gap-pattern is a sequence of sub-patterns separated by bounded sequences of don’t care characters (called gaps). A one-gap-pattern is a pattern of the form \(P[\alpha ,\beta ]Q\), where \(P\) and \(Q\) are strings drawn from alphabet \(\varSigma \) and \([\alpha , \beta ]\) are lower and upper bounds on the gap size \(g\). The gap size \(g\) is the number of don’t care characters between \(P\) and \(Q\). The dictionary matching problem with one-gap is to index a collection of one-gap-patterns, so as to identify all sub-strings of a query text \(T\) that match with any one-gap-pattern in the collection. Let \({\mathcal D}\) be such a collection of \(d\) patterns, where \({\mathcal D}=\{P_i[\alpha _i,\beta _i]Q_i\mid 1\le i \le d\}\). Let \(n=\sum _{i=1}^d|P_i|+|Q_i|\). Let \(\gamma \) and \(\lambda \) be two parameters defined on \({\mathcal D}\) as follows: \(\gamma = |\{j\mid j \in [\alpha _i,\beta _i], 1\le i\le d\}|\) and \(\lambda = |\{\alpha _i,\beta _i \mid 1\le i\le d\}|\). Specifically \(\gamma \) is the total number gap lengths possible over all patterns in \({\mathcal D}\) and \(\lambda \) is the number of distinct gap boundaries across all the patterns. We present a linear space solution (i.e., \(O(n)\) words) for answering a dictionary matching query on \({\mathcal D}\) in time \(O(|T| \gamma \log \lambda \log d+occ)\), where \(occ\) is the output size. The query time can be improved to \(O(|T|\gamma +occ)\) using \(O(n+d^{1+\epsilon })\) space, where \(\epsilon >0\) is an arbitrarily small constant. Additionally, we show a compact/succinct space index offering a space-time trade-off. In the special case where parameters \(\alpha _i\) and \(\beta _i\)’s for all the patterns are same, our results improve upon the work by Amir et al. [CPM, 2014]. We also explore several related cases where gaps can occur at arbitrary locations and where gap can be induced in the text rather than pattern.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
This is just for the ease ensuring some properties.
 
2
The lower bound \(\alpha _i\) is redundant in this case and is set to zero. Otherwise, we can always omit first \(\alpha _i\) characters from \(Q_i\) obtaining \(Q_i'\) and work with \(P_i[\beta _i-\alpha _i]Q_i'\).
 
Literatur
1.
Zurück zum Zitat Afshani, P., Arge, L., Larsen, K.G.: Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model. In: Symposuim on Computational Geometry 2012, SoCG 2012, Chapel Hill, NC, USA, pp. 323–332, 17–20 June 2012 Afshani, P., Arge, L., Larsen, K.G.: Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model. In: Symposuim on Computational Geometry 2012, SoCG 2012, Chapel Hill, NC, USA, pp. 323–332, 17–20 June 2012
2.
3.
Zurück zum Zitat Amir, A., Farach, M.: Adaptive dictionary matching. In: 32nd Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, pp. 760–766, 1–4 October 1991 Amir, A., Farach, M.: Adaptive dictionary matching. In: 32nd Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, pp. 760–766, 1–4 October 1991
4.
Zurück zum Zitat Amir, A., Farach, M., Idury, R.M., Poutré, J.A.L., Schäffer, A.A.: Improved dynamic dictionary matching. Inf. Comput. 119(2), 258–282 (1995)MATHCrossRef Amir, A., Farach, M., Idury, R.M., Poutré, J.A.L., Schäffer, A.A.: Improved dynamic dictionary matching. Inf. Comput. 119(2), 258–282 (1995)MATHCrossRef
5.
Zurück zum Zitat Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)MATHMathSciNetCrossRef Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)MATHMathSciNetCrossRef
6.
Zurück zum Zitat Amir, A., Levy, A., Porat, E., Shalom, B.R.: Dictionary matching with one gap. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 11–20. Springer, Heidelberg (2014) Amir, A., Levy, A., Porat, E., Shalom, B.R.: Dictionary matching with one gap. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 11–20. Springer, Heidelberg (2014)
7.
Zurück zum Zitat Belazzougui, D.: Succinct dictionary matching with no slowdown. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 88–100. Springer, Heidelberg (2010) CrossRef Belazzougui, D.: Succinct dictionary matching with no slowdown. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 88–100. Springer, Heidelberg (2010) CrossRef
8.
Zurück zum Zitat Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)MATHCrossRef Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)MATHCrossRef
9.
Zurück zum Zitat Chan, T.M., Larsen, K.G., Patrascu, M.: Orthogonal range searching on the RAM, revisited. In: Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, pp. 1–10, 13–15 June 2011 Chan, T.M., Larsen, K.G., Patrascu, M.: Orthogonal range searching on the RAM, revisited. In: Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, pp. 1–10, 13–15 June 2011
11.
Zurück zum Zitat Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, pp. 91–100, 13–16 June 2004 Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, pp. 91–100, 13–16 June 2004
12.
Zurück zum Zitat Feigenblat, G., Porat, E., Shiftan, A.: An improved query time for succinct dynamic dictionary matching. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 120–129. Springer, Heidelberg (2014) Feigenblat, G., Porat, E., Shiftan, A.: An improved query time for succinct dynamic dictionary matching. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 120–129. Springer, Heidelberg (2014)
14.
Zurück zum Zitat Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance. Inf. Retr. 11(4), 335–357 (2008)CrossRef Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance. Inf. Retr. 11(4), 335–357 (2008)CrossRef
15.
Zurück zum Zitat Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)MATHMathSciNetCrossRef Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)MATHMathSciNetCrossRef
16.
Zurück zum Zitat Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucleic Acids Res. 27(1), 215–219 (1999)CrossRef Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucleic Acids Res. 27(1), 215–219 (1999)CrossRef
17.
Zurück zum Zitat Hon, W., Ku, T., Shah, R., Thankachan, S.V., Vitter, J.S.: Compressed dictionary matching with one error. In: 2011 Data Compression Conference (DCC 2011), Snowbird, UT, USA, pp. 113–122, 29–31 March 2011 Hon, W., Ku, T., Shah, R., Thankachan, S.V., Vitter, J.S.: Compressed dictionary matching with one error. In: 2011 Data Compression Conference (DCC 2011), Snowbird, UT, USA, pp. 113–122, 29–31 March 2011
18.
Zurück zum Zitat Hon, W., Ku, T., Shah, R., Thankachan, S.V., Vitter, J.S.: Faster compressed dictionary matching. Theor. Comput. Sci. 475, 113–119 (2013)MATHMathSciNetCrossRef Hon, W., Ku, T., Shah, R., Thankachan, S.V., Vitter, J.S.: Faster compressed dictionary matching. Theor. Comput. Sci. 475, 113–119 (2013)MATHMathSciNetCrossRef
19.
Zurück zum Zitat Hon, W., Lam, T.W., Shah, R., Tam, S., Vitter, J.S.: Compressed index for dictionary matching. In: 2008 Data Compression Conference (DCC 2008), Snowbird, UT, USA, pp. 23–32, 25–27 March 2008 Hon, W., Lam, T.W., Shah, R., Tam, S., Vitter, J.S.: Compressed index for dictionary matching. In: 2008 Data Compression Conference (DCC 2008), Snowbird, UT, USA, pp. 23–32, 25–27 March 2008
20.
Zurück zum Zitat Hon, W.-K., Lam, T.-W., Shah, R., Tam, S.-L., Vitter, J.S.: Succinct index for dynamic dictionary matching. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 1034–1043. Springer, Heidelberg (2009) CrossRef Hon, W.-K., Lam, T.-W., Shah, R., Tam, S.-L., Vitter, J.S.: Succinct index for dynamic dictionary matching. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 1034–1043. Springer, Heidelberg (2009) CrossRef
21.
Zurück zum Zitat Hon, W.-K., Ku, T.-H., Lam, T.-W., Shah, R., Tam, S.-L., Thankachan, S.V., Vitter, J.S.: Compressing dictionary matching index via sparsification technique. Algorithmica 72(2), 515–538 (2015)MathSciNetCrossRef Hon, W.-K., Ku, T.-H., Lam, T.-W., Shah, R., Tam, S.-L., Thankachan, S.V., Vitter, J.S.: Compressing dictionary matching index via sparsification technique. Algorithmica 72(2), 515–538 (2015)MathSciNetCrossRef
22.
23.
Zurück zum Zitat Karpinski, M., Nekrich, Y.: Space efficient multi-dimensional range reporting. In: Ngo, H.Q. (ed.) COCOON 2009. LNCS, vol. 5609, pp. 215–224. Springer, Heidelberg (2009) CrossRef Karpinski, M., Nekrich, Y.: Space efficient multi-dimensional range reporting. In: Ngo, H.Q. (ed.) COCOON 2009. LNCS, vol. 5609, pp. 215–224. Springer, Heidelberg (2009) CrossRef
24.
25.
Zurück zum Zitat Lewenstein, M.: Dictionary matching. In: Kao, M.-Y. (ed.) Encyclopedia of Algorithms, pp. 1–6. Springer, US (2015) Lewenstein, M.: Dictionary matching. In: Kao, M.-Y. (ed.) Encyclopedia of Algorithms, pp. 1–6. Springer, US (2015)
26.
27.
Zurück zum Zitat Mehldau, G., Myers, G.: A system for pattern matching applications on biosequences. Comput. Appl. Biosci. 9(3), 299–314 (1993) Mehldau, G., Myers, G.: A system for pattern matching applications on biosequences. Comput. Appl. Biosci. 9(3), 299–314 (1993)
28.
Zurück zum Zitat Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Biol. 10(6), 903–923 (2003)CrossRef Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Biol. 10(6), 903–923 (2003)CrossRef
30.
Zurück zum Zitat Weiner, P.: Linear pattern matching algorithms. In: 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, pp. 1–11, 15–17 October 1973 Weiner, P.: Linear pattern matching algorithms. In: 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, pp. 1–11, 15–17 October 1973
Metadaten
Titel
Dictionary Matching with Uneven Gaps
verfasst von
Wing-Kai Hon
Tak-Wah Lam
Rahul Shah
Sharma V. Thankachan
Hing-Fung Ting
Yilin Yang
Copyright-Jahr
2015
DOI
https://doi.org/10.1007/978-3-319-19929-0_21