Skip to main content

2015 | OriginalPaper | Buchkapitel

Weighted Random Sampling over Data Streams

verfasst von : Pavlos S. Efraimidis

Erschienen in: Algorithms, Probability, Networks, and Games

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case [3, 8], discuss sampling with and without replacement and show adaptations of the algorithms for several WRS problems and evolving data streams.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
We say “supposed” because even though WRS is best described with a sequential sampling procedure, it is not inherently sequential. Algorithm A-ES [8] which we will use to solve WRS-W problems can be executed on sequential, parallel and distributed settings.
 
Literatur
1.
Zurück zum Zitat Aggarwal, C.C.: On biased reservoir sampling in the presence of stream evolution. In: VLDB 2006: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 607–618. VLDB Endowment (2006) Aggarwal, C.C.: On biased reservoir sampling in the presence of stream evolution. In: VLDB 2006: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 607–618. VLDB Endowment (2006)
2.
Zurück zum Zitat Al-Kateb, M., Lee, B.S.: Adaptive stratified reservoir sampling over heterogeneous data streams. Inf. Syst. 39, 199–216 (2014)CrossRef Al-Kateb, M., Lee, B.S.: Adaptive stratified reservoir sampling over heterogeneous data streams. Inf. Syst. 39, 199–216 (2014)CrossRef
4.
Zurück zum Zitat Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In: Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, pp. 77–86. ACM, New York (2010) Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In: Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, pp. 77–86. ACM, New York (2010)
5.
Zurück zum Zitat Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Continuous sampling from distributed streams. J. ACM 59(2), 10:1–10:25 (2012)MathSciNetCrossRefMATH Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Continuous sampling from distributed streams. J. ACM 59(2), 10:1–10:25 (2012)MathSciNetCrossRefMATH
6.
Zurück zum Zitat Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward decay: a practical time decay model for streaming systems. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE 2009, pp. 138–149. IEEE Computer Society, Washington, DC (2009) Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward decay: a practical time decay model for streaming systems. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE 2009, pp. 138–149. IEEE Computer Society, Washington, DC (2009)
7.
8.
9.
Zurück zum Zitat Goldberg, G., Harnik, D., Sotnikov, D.: The case for sampling on very large file systems. In: 30th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–11, June 2014 Goldberg, G., Harnik, D., Sotnikov, D.: The case for sampling on very large file systems. In: 30th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–11, June 2014
10.
Zurück zum Zitat Hu, X., Qiao, M., Tao, Y.: Independent range sampling. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 246–255. ACM, New York (2014) Hu, X., Qiao, M., Tao, Y.: Independent range sampling. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 246–255. ACM, New York (2014)
11.
Zurück zum Zitat Knuth, D.E.: The Art of Computer Programming: Seminumerical Algorithms, vol. 2, 2nd edn. Addison-Wesley Publishing Company, Reading (1981)MATH Knuth, D.E.: The Art of Computer Programming: Seminumerical Algorithms, vol. 2, 2nd edn. Addison-Wesley Publishing Company, Reading (1981)MATH
12.
Zurück zum Zitat Li, K.-H.: Reservoir-sampling algorithms of time complexity o(n(1 + log(n/n))). ACM Trans. Math. Softw. 20(4), 481–493 (1994)CrossRefMATH Li, K.-H.: Reservoir-sampling algorithms of time complexity o(n(1 + log(n/n))). ACM Trans. Math. Softw. 20(4), 481–493 (1994)CrossRefMATH
13.
Zurück zum Zitat Longbo, Z., Zhanhuai, L., Yiqiang, Z., Min, Y., Yang, Z.: A priority random sampling algorithm for time-based sliding windows over weighted streaming data. In: Proceedings of the 2007 ACM Symposium on Applied Computing, SAC 2007, pp. 453–456. ACM, New York (2007) Longbo, Z., Zhanhuai, L., Yiqiang, Z., Min, Y., Yang, Z.: A priority random sampling algorithm for time-based sliding windows over weighted streaming data. In: Proceedings of the 2007 ACM Symposium on Applied Computing, SAC 2007, pp. 453–456. ACM, New York (2007)
14.
Zurück zum Zitat Olken, F.: Random sampling from databases. Ph.D. thesis, Department of Computer Science, University of California at Berkeley (1993) Olken, F.: Random sampling from databases. Ph.D. thesis, Department of Computer Science, University of California at Berkeley (1993)
15.
Zurück zum Zitat Tirthapura, S., Woodruff, D.P.: Optimal random sampling from distributed streams revisited. In: Peleg, D. (ed.) Distributed Computing. LNCS, vol. 6950, pp. 283–297. Springer, Heidelberg (2011) CrossRef Tirthapura, S., Woodruff, D.P.: Optimal random sampling from distributed streams revisited. In: Peleg, D. (ed.) Distributed Computing. LNCS, vol. 6950, pp. 283–297. Springer, Heidelberg (2011) CrossRef
Metadaten
Titel
Weighted Random Sampling over Data Streams
verfasst von
Pavlos S. Efraimidis
Copyright-Jahr
2015
DOI
https://doi.org/10.1007/978-3-319-24024-4_12