Skip to main content
Erschienen in: Progress in Artificial Intelligence 1/2014

01.08.2014 | Regular Paper

Constructing fading histograms from data streams

verfasst von: Raquel Sebastião, João Gama, Teresa Mendonça

Erschienen in: Progress in Artificial Intelligence | Ausgabe 1/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The ability to collect data is changing drastically. Nowadays, data are gathered in the form of transient and finite data streams. Memory restrictions preclude keeping all received data in memory. When dealing with massive data streams, it is mandatory to create compact representations of data, also known as synopses structures or summaries. Reducing memory occupancy is of utmost importance when handling a huge amount of data. This paper addresses the problem of constructing histograms from data streams under error constraints. When constructing online histograms from data streams there are two main characteristics to embrace: the updating facility and the error of the histogram. Moreover, in dynamic environments, besides the need of compact summaries to capture the most important properties of data, it is also essential to forget old data. Therefore, this paper presents sliding histograms and fading histograms, an abrupt and a smooth strategies to forget outdated data.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
The square error is one of the most used error measures in histogram construction. It is also known as the V-Optimal measure and was introduced by [14].
 
Literatur
1.
Zurück zum Zitat Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Proceedings of the 21st ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems, PODS ’02, pp. 1–16. ACM, New York (2002). doi10.1145/543613.543615 Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Proceedings of the 21st ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems, PODS ’02, pp. 1–16. ACM, New York (2002). doi10.​1145/​543613.​543615
3.
Zurück zum Zitat Chakrabarti, K., Garofalakis, M.N., Rastogi, R., Shim, K.: Approximate query processing using wavelets. In: Abbadi, A.E., Brodie, M.L., Chakravarthy, S., Dayal, U., Kamel, N., Schlageter, G., Whang, K.Y. (eds.) VLDB 2000. Proceedings of 26th International Conference on Very Large Data Bases, 10–14 September 2000, Cairo, pp. 111–122. Morgan Kaufmann, Burlington (2000) Chakrabarti, K., Garofalakis, M.N., Rastogi, R., Shim, K.: Approximate query processing using wavelets. In: Abbadi, A.E., Brodie, M.L., Chakravarthy, S., Dayal, U., Kamel, N., Schlageter, G., Whang, K.Y. (eds.) VLDB 2000. Proceedings of 26th International Conference on Very Large Data Bases, 10–14 September 2000, Cairo, pp. 111–122. Morgan Kaufmann, Burlington (2000)
5.
Zurück zum Zitat Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005). doi:10.1145/1061318.1061325 Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005). doi:10.​1145/​1061318.​1061325
7.
Zurück zum Zitat Freedman, D., Diaconis, P.: On the histogram as a density estimator: L2 theory. Probab. Theory Relat. Fields 57(4), 453–476 (1981). doi:10.1007/BF01025868 Freedman, D., Diaconis, P.: On the histogram as a density estimator: L2 theory. Probab. Theory Relat. Fields 57(4), 453–476 (1981). doi:10.​1007/​BF01025868
8.
9.
Zurück zum Zitat Gibbons, P.B., Matias, Y.: Synopsis data structures for massive data sets. In: ACM–SIAM Symposium on Discrete Algorithms, pp. 909–910 (1999). doi:10.1145/314500.315083 Gibbons, P.B., Matias, Y.: Synopsis data structures for massive data sets. In: ACM–SIAM Symposium on Discrete Algorithms, pp. 909–910 (1999). doi:10.​1145/​314500.​315083
10.
Zurück zum Zitat Gilbert, A.C., Kotidis, Y., Muthukrishnan, S., Strauss, M.J.: One-pass wavelet decompositions of data streams. IEEE Trans. Knowl. Data Eng. 15(3), 541–554 (2003). doi:10.1109/TKDE.2003.1198389 Gilbert, A.C., Kotidis, Y., Muthukrishnan, S., Strauss, M.J.: One-pass wavelet decompositions of data streams. IEEE Trans. Knowl. Data Eng. 15(3), 541–554 (2003). doi:10.​1109/​TKDE.​2003.​1198389
11.
Zurück zum Zitat Guha, S., Koudas, N., Shim, K.: Approximation and streaming algorithms for histogram construction problems. ACM Trans. Database Syst. 31(1), 396–438 (2006). doi:10.1145/1132863.1132873 Guha, S., Koudas, N., Shim, K.: Approximation and streaming algorithms for histogram construction problems. ACM Trans. Database Syst. 31(1), 396–438 (2006). doi:10.​1145/​1132863.​1132873
12.
Zurück zum Zitat Guha, S., Shim, K., Woo, J.: Rehist: relative error histogram construction algorithms. In: Proceedings of the 30th International Conference on Very Large Data Bases, pp. 300–311 (2004) Guha, S., Shim, K., Woo, J.: Rehist: relative error histogram construction algorithms. In: Proceedings of the 30th International Conference on Very Large Data Bases, pp. 300–311 (2004)
14.
Zurück zum Zitat Ioannidis, Y.E., Poosala, V.: Balancing histogram optimality and practicality for query result size estimation. In: Carey, M.J., Schneider, D.A. (eds.) Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, 22–25 May 1995, pp. 233–244. ACM Press, New York (1995) Ioannidis, Y.E., Poosala, V.: Balancing histogram optimality and practicality for query result size estimation. In: Carey, M.J., Schneider, D.A. (eds.) Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, 22–25 May 1995, pp. 233–244. ACM Press, New York (1995)
15.
Zurück zum Zitat Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K.C., Suel, T.: Optimal histograms with quality guarantees. In: Proceedings of the 24th International Conference on Very Large Data Bases, VLDB ’98, pp. 275–286. Morgan Kaufmann Publishers Inc., San Francisco (1998). http://dl.acm.org/citation.cfm?id=645924.671191 Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K.C., Suel, T.: Optimal histograms with quality guarantees. In: Proceedings of the 24th International Conference on Very Large Data Bases, VLDB ’98, pp. 275–286. Morgan Kaufmann Publishers Inc., San Francisco (1998). http://​dl.​acm.​org/​citation.​cfm?​id=​645924.​671191
17.
19.
Zurück zum Zitat Poosala, V., Ioannidis, Y.E., Haas, P.J., Shekita, E.J.: Improved histograms for selectivity estimation of range predicates. In: SIGMOD Conference, pp. 294–305 (1996) Poosala, V., Ioannidis, Y.E., Haas, P.J., Shekita, E.J.: Improved histograms for selectivity estimation of range predicates. In: SIGMOD Conference, pp. 294–305 (1996)
20.
Zurück zum Zitat Rodrigues, P., Gama, J., Sebastipo, R.: Memoryless fading windows in ubiquitous settings. In: Proceedings of Ubiquitous Data Mining (UDM) Workshop, in conjunction with the 19th European Conference on Artificial Intelligence, ECAI 2010, pp. 27–32 (2010) Rodrigues, P., Gama, J., Sebastipo, R.: Memoryless fading windows in ubiquitous settings. In: Proceedings of Ubiquitous Data Mining (UDM) Workshop, in conjunction with the 19th European Conference on Artificial Intelligence, ECAI 2010, pp. 27–32 (2010)
22.
Zurück zum Zitat Street, W.N., Kim, Y.: A streaming ensemble algorithm (sea) for large-scale classification. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 377–382. ACM Press, New York (2001) Street, W.N., Kim, Y.: A streaming ensemble algorithm (sea) for large-scale classification. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 377–382. ACM Press, New York (2001)
23.
Zurück zum Zitat Sturges, H.A.: The choice of a class interval. Am. Stat. Assoc. 21, 65–66 (1926)CrossRef Sturges, H.A.: The choice of a class interval. Am. Stat. Assoc. 21, 65–66 (1926)CrossRef
Metadaten
Titel
Constructing fading histograms from data streams
verfasst von
Raquel Sebastião
João Gama
Teresa Mendonça
Publikationsdatum
01.08.2014
Verlag
Springer Berlin Heidelberg
Erschienen in
Progress in Artificial Intelligence / Ausgabe 1/2014
Print ISSN: 2192-6352
Elektronische ISSN: 2192-6360
DOI
https://doi.org/10.1007/s13748-014-0050-9

Weitere Artikel der Ausgabe 1/2014

Progress in Artificial Intelligence 1/2014 Zur Ausgabe

Premium Partner