Skip to main content
Erschienen in: Theory of Computing Systems 1/2020

20.12.2019

Periodicity in Data Streams with Wildcards

verfasst von: Funda Ergün, Elena Grigorescu, Erfan Sadeqi Azer, Samson Zhou

Erschienen in: Theory of Computing Systems | Ausgabe 1/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We investigate the problem of detecting periodic trends within a string S of length n, arriving in the streaming model, containing at most k wildcard characters, where k = o(n). A wildcard character is a special character that can be assigned any other character. We say that S has wildcard-period p if there exists an assignment to each of the wildcard characters so that in the resulting stream the prefix of length np equals the suffix of length np. We present a two-pass streaming algorithm that computes wildcard-periods of S using \(\mathcal {O}(k^{3} \text {polylog} n)\) bits of space, while we also show that this problem cannot be solved in sublinear space in one pass. We also give a one-pass randomized streaming algorithm that computes all wildcard-periods p of S with \(p<\frac {n}{2}\) and no wildcard characters appearing in the last p symbols of S, using \(\mathcal {O}(k^{3}\log ^{9} n)\) space.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
Although wildcard characters are usually denoted with ‘?’, we use ⊥ to differentiate from compilation errors - the https://static-content.springer.com/image/art%3A10.1007%2Fs00224-019-09950-y/MediaObjects/224_2019_9950_Fige_HTML.gifequivalent of wildcard characters
 
Literatur
1.
Zurück zum Zitat Amir, A., Eisenberg, E., Levy, A.: Approximate periodicity. Algorithms and Computation – 21st International Symposium, ISAAC Proceedings, Part I, 25–36 (2010) Amir, A., Eisenberg, E., Levy, A.: Approximate periodicity. Algorithms and Computation – 21st International Symposium, ISAAC Proceedings, Part I, 25–36 (2010)
2.
Zurück zum Zitat Apostolico, A., Galil, Z. (eds.): Pattern Matching Algorithms. Oxford University Press, Oxford (1997)MATH Apostolico, A., Galil, Z. (eds.): Pattern Matching Algorithms. Oxford University Press, Oxford (1997)MATH
3.
Zurück zum Zitat Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)CrossRef Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)CrossRef
4.
Zurück zum Zitat Andoni, A., Goldberger, A., McGregor, A., Porat, E.: Homomorphic fingerprints under misalignments: sketching edit and shift distances. In: Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pp. 931–940 (2013) Andoni, A., Goldberger, A., McGregor, A., Porat, E.: Homomorphic fingerprints under misalignments: sketching edit and shift distances. In: Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pp. 931–940 (2013)
5.
Zurück zum Zitat Breslauer, D., Galil, Z.: Real-Time Streaming String-Matching. In: Combinatorial Pattern Matching, pp. 162–172. Springer (2011) Breslauer, D., Galil, Z.: Real-Time Streaming String-Matching. In: Combinatorial Pattern Matching, pp. 162–172. Springer (2011)
6.
Zurück zum Zitat Blanchet-Sadri, F.: Algorithmic Combinatorics on Partial Words. Discrete mathematics and its applications. CRC Press (2008) Blanchet-Sadri, F.: Algorithmic Combinatorics on Partial Words. Discrete mathematics and its applications. CRC Press (2008)
7.
Zurück zum Zitat Blanchet-Sadri, F., Mercas, R., Rashin, A., Willett, E.: Periodicity algorithms and a conjecture on overlaps in partial words. Theor. Comput. Sci. 443, 35–45 (2012)MathSciNetCrossRef Blanchet-Sadri, F., Mercas, R., Rashin, A., Willett, E.: Periodicity algorithms and a conjecture on overlaps in partial words. Theor. Comput. Sci. 443, 35–45 (2012)MathSciNetCrossRef
8.
Zurück zum Zitat Clifford, P., Clifford, R.: Simple deterministic wildcard matching. Inf. Process. Lett. 101(2), 53–54 (2007)MathSciNetCrossRef Clifford, P., Clifford, R.: Simple deterministic wildcard matching. Inf. Process. Lett. 101(2), 53–54 (2007)MathSciNetCrossRef
9.
Zurück zum Zitat Clifford, R., Efremenko, K., Porat, E., Rothschild, A.: From coding theory to efficient pattern matching. In: Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 778–784 (2009) Clifford, R., Efremenko, K., Porat, E., Rothschild, A.: From coding theory to efficient pattern matching. In: Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 778–784 (2009)
10.
Zurück zum Zitat Clifford, R., Fontaine, A., Porat, E., Sach, B., Starikovskaya, T.A.: Dictionary Matching in a Stream. In: Algorithms - ESA 23Rd Annual European Symposium, Proceedings, pp. 361–372 (2015) Clifford, R., Fontaine, A., Porat, E., Sach, B., Starikovskaya, T.A.: Dictionary Matching in a Stream. In: Algorithms - ESA 23Rd Annual European Symposium, Proceedings, pp. 361–372 (2015)
11.
Zurück zum Zitat Clifford, R., Fontaine, A., Porat, E., Sach, B., Starikovskaya, T.A.: The k-mismatch problem revisited. In: Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 2039–2052 (2016) Clifford, R., Fontaine, A., Porat, E., Sach, B., Starikovskaya, T.A.: The k-mismatch problem revisited. In: Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 2039–2052 (2016)
12.
Zurück zum Zitat Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proceedings on 34th Annual ACM Symposium on Theory of Computing (STOC), pp. 592–601 (2002) Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proceedings on 34th Annual ACM Symposium on Theory of Computing (STOC), pp. 592–601 (2002)
13.
Zurück zum Zitat Clifford, R., Jalsenius, M., Porat, E., Sach, B.: Space lower bounds for online pattern matching. Theor. Comput. Sci. 483, 68–74 (2013)MathSciNetCrossRef Clifford, R., Jalsenius, M., Porat, E., Sach, B.: Space lower bounds for online pattern matching. Theor. Comput. Sci. 483, 68–74 (2013)MathSciNetCrossRef
14.
Zurück zum Zitat Clifford, R., Kociumaka, T., Porat, E.: The streaming k-mismatch problem. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 1106–1125 (2019)CrossRef Clifford, R., Kociumaka, T., Porat, E.: The streaming k-mismatch problem. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 1106–1125 (2019)CrossRef
15.
Zurück zum Zitat Crouch, M.S., McGregor, A.: Periodicity and Cyclic Shifts via Linear Sketches. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 14Th International Workshop, APPROX, and 15th International Workshop, RANDOM. Proceedings, pp. 158–170 (2011)CrossRef Crouch, M.S., McGregor, A.: Periodicity and Cyclic Shifts via Linear Sketches. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 14Th International Workshop, APPROX, and 15th International Workshop, RANDOM. Proceedings, pp. 158–170 (2011)CrossRef
16.
Zurück zum Zitat Elfeky, M.G., Aref, W.G., Elmagarmid, A.K.: STAGGER: periodicity mining of data streams using expanding sliding windows. In: Proceedings of the 6th IEEE International Conference on Data Mining (ICDM), pp. 188–199 (2006) Elfeky, M.G., Aref, W.G., Elmagarmid, A.K.: STAGGER: periodicity mining of data streams using expanding sliding windows. In: Proceedings of the 6th IEEE International Conference on Data Mining (ICDM), pp. 188–199 (2006)
17.
Zurück zum Zitat Ergu̇n, F., Grigorescu, E., Azer, E.S., Zhou, S.: Streaming Periodicity with Mismatches. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM, pp. 42:1–42:21 (2017) Ergu̇n, F., Grigorescu, E., Azer, E.S., Zhou, S.: Streaming Periodicity with Mismatches. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM, pp. 42:1–42:21 (2017)
18.
Zurück zum Zitat Ergu̇n, F., Jowhari, H., Saglam, M.: Periodicity in Streams. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 13Th International Workshop, APPROX 2010, and 14Th International Workshop, RANDOM 2010. Proceedings, pp. 545–559 (2010) Ergu̇n, F., Jowhari, H., Saglam, M.: Periodicity in Streams. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 13Th International Workshop, APPROX 2010, and 14Th International Workshop, RANDOM 2010. Proceedings, pp. 545–559 (2010)
19.
Zurück zum Zitat Ergu̇n, F., Muthukrishnan, S., Sahinalp, S.C.: Periodicity testing with sublinear samples and space. ACM Trans. Algorithm. 6(2), 43:1–43:14 (2010)MathSciNetCrossRef Ergu̇n, F., Muthukrishnan, S., Sahinalp, S.C.: Periodicity testing with sublinear samples and space. ACM Trans. Algorithm. 6(2), 43:1–43:14 (2010)MathSciNetCrossRef
20.
Zurück zum Zitat Gawrychowski, P.: Optimal pattern matching in lzw compressed strings. ACM Trans. Algorithm. (TALG) 9(3), 25 (2013)MathSciNetMATH Gawrychowski, P.: Optimal pattern matching in lzw compressed strings. ACM Trans. Algorithm. (TALG) 9(3), 25 (2013)MathSciNetMATH
21.
Zurück zum Zitat Golan, S., Kopelowitz, T., Porat, E.: Streaming Pattern Matching with D Wildcards. In: 24Th Annual European Symposium on Algorithms, pp. 44:1–44:16 (2016) Golan, S., Kopelowitz, T., Porat, E.: Streaming Pattern Matching with D Wildcards. In: 24Th Annual European Symposium on Algorithms, pp. 44:1–44:16 (2016)
22.
23.
Zurück zum Zitat Hermelin, D., Rozenberg, L.: Parameterized Complexity Analysis for the Closest String with Wildcards Problem. In: Combinatorial Pattern Matching - 25Th Annual Symposium, CPM Proceedings, pp. 140–149 (2014)CrossRef Hermelin, D., Rozenberg, L.: Parameterized Complexity Analysis for the Closest String with Wildcards Problem. In: Combinatorial Pattern Matching - 25Th Annual Symposium, CPM Proceedings, pp. 140–149 (2014)CrossRef
24.
Zurück zum Zitat Indyk, P., Koudas, N., Muthukrishnan, S.: Identifying representative trends in massive time series data sets using sketches. In: VLDB, Proceedings of 26th International Conference on Very Large Data Bases, pp. 363–372 (2000) Indyk, P., Koudas, N., Muthukrishnan, S.: Identifying representative trends in massive time series data sets using sketches. In: VLDB, Proceedings of 26th International Conference on Very Large Data Bases, pp. 363–372 (2000)
25.
Zurück zum Zitat Indyk, P.: Faster algorithms for string matching problems: Matching the Convolution Bound. In: 39Th Annual Symposium on Foundations of Computer Science, FOCS, pp. 166–173 (1998) Indyk, P.: Faster algorithms for string matching problems: Matching the Convolution Bound. In: 39Th Annual Symposium on Foundations of Computer Science, FOCS, pp. 166–173 (1998)
26.
Zurück zum Zitat Kalai, A.: Efficient pattern-matching with don’t cares. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 655–656 (2002) Kalai, A.: Efficient pattern-matching with don’t cares. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 655–656 (2002)
27.
Zurück zum Zitat Knuth, D.E., Morris, J.H. Jr, Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)MathSciNetCrossRef Knuth, D.E., Morris, J.H. Jr, Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)MathSciNetCrossRef
28.
Zurück zum Zitat Kane, D.M., Nelson, J., Woodruff, D.P.: An optimal algorithm for the distinct elements problem. In: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp. 41–52 (2010) Kane, D.M., Nelson, J., Woodruff, D.P.: An optimal algorithm for the distinct elements problem. In: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp. 41–52 (2010)
29.
Zurück zum Zitat Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM. J. Res. Dev. 31(2), 249–260 (1987)MATH Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM. J. Res. Dev. 31(2), 249–260 (1987)MATH
31.
Zurück zum Zitat Lewenstein, M., Nekrich, Y., Vitter, J.S.: Space-Efficient String Indexing for Wildcard Pattern Matching. In: 31St International Symposium on Theoretical Aspects of Computer Science (STACS), pp. 506–517 (2014) Lewenstein, M., Nekrich, Y., Vitter, J.S.: Space-Efficient String Indexing for Wildcard Pattern Matching. In: 31St International Symposium on Theoretical Aspects of Computer Science (STACS), pp. 506–517 (2014)
33.
Zurück zum Zitat Manea, F., Mercas, R., Tiseanu, C.: An algorithmic toolbox for periodic partial words. Discret. Appl. Math. 179, 174–192 (2014)MathSciNetCrossRef Manea, F., Mercas, R., Tiseanu, C.: An algorithmic toolbox for periodic partial words. Discret. Appl. Math. 179, 174–192 (2014)MathSciNetCrossRef
34.
Zurück zum Zitat Muthukrishnan, S., Ramesh, H.: String matching under a general matching relation. Inf. Comput. 122(1), 140–148 (1995)MathSciNetCrossRef Muthukrishnan, S., Ramesh, H.: String matching under a general matching relation. Inf. Comput. 122(1), 140–148 (1995)MathSciNetCrossRef
35.
Zurück zum Zitat Porat, E., Lipsky, O.: Improved Sketching of Hamming Distance with Error Correcting. In: Annual Symposium on Combinatorial Pattern Matching, pp. 173–182 (2007) Porat, E., Lipsky, O.: Improved Sketching of Hamming Distance with Error Correcting. In: Annual Symposium on Combinatorial Pattern Matching, pp. 173–182 (2007)
36.
Zurück zum Zitat Porat, B., Porat, E.: Exact and Approximate Pattern Matching in the Streaming Model. In: 50Th Annual IEEE Symposium on Foundations of Computer Science, FOCS, pp. 315–323 (2009) Porat, B., Porat, E.: Exact and Approximate Pattern Matching in the Streaming Model. In: 50Th Annual IEEE Symposium on Foundations of Computer Science, FOCS, pp. 315–323 (2009)
37.
Zurück zum Zitat Radoszewski, J., Starikovskaya, T.A.: Streaming K-Mismatch with Error Correcting and Applications. In: 2017 Data Compression Conference, DCC, pp. 290–299 (2017) Radoszewski, J., Starikovskaya, T.A.: Streaming K-Mismatch with Error Correcting and Applications. In: 2017 Data Compression Conference, DCC, pp. 290–299 (2017)
Metadaten
Titel
Periodicity in Data Streams with Wildcards
verfasst von
Funda Ergün
Elena Grigorescu
Erfan Sadeqi Azer
Samson Zhou
Publikationsdatum
20.12.2019
Verlag
Springer US
Erschienen in
Theory of Computing Systems / Ausgabe 1/2020
Print ISSN: 1432-4350
Elektronische ISSN: 1433-0490
DOI
https://doi.org/10.1007/s00224-019-09950-y

Weitere Artikel der Ausgabe 1/2020

Theory of Computing Systems 1/2020 Zur Ausgabe