2015 | OriginalPaper | Buchkapitel
Tipp
Weitere Kapitel dieses Buchs durch Wischen aufrufen
Erschienen in:
Algorithms, Probability, Networks, and Games
In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case [3, 8], discuss sampling with and without replacement and show adaptations of the algorithms for several WRS problems and evolving data streams.
Bitte loggen Sie sich ein, um Zugang zu diesem Inhalt zu erhalten
Sie möchten Zugang zu diesem Inhalt erhalten? Dann informieren Sie sich jetzt über unsere Produkte:
Anzeige
1.
Zurück zum Zitat Aggarwal, C.C.: On biased reservoir sampling in the presence of stream evolution. In: VLDB 2006: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 607–618. VLDB Endowment (2006) Aggarwal, C.C.: On biased reservoir sampling in the presence of stream evolution. In: VLDB 2006: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 607–618. VLDB Endowment (2006)
2.
Zurück zum Zitat Al-Kateb, M., Lee, B.S.: Adaptive stratified reservoir sampling over heterogeneous data streams. Inf. Syst. 39, 199–216 (2014) CrossRef Al-Kateb, M., Lee, B.S.: Adaptive stratified reservoir sampling over heterogeneous data streams. Inf. Syst.
39, 199–216 (2014)
CrossRef
3.
Zurück zum Zitat Chao, M.T.: A general purpose unequal probability sampling plan. Biometrika 69(3), 653–656 (1982) MathSciNetCrossRefMATH Chao, M.T.: A general purpose unequal probability sampling plan. Biometrika
69(3), 653–656 (1982)
MathSciNetCrossRefMATH
4.
Zurück zum Zitat Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In: Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, pp. 77–86. ACM, New York (2010) Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In: Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, pp. 77–86. ACM, New York (2010)
5.
Zurück zum Zitat Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Continuous sampling from distributed streams. J. ACM 59(2), 10:1–10:25 (2012) MathSciNetCrossRefMATH Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Continuous sampling from distributed streams. J. ACM
59(2), 10:1–10:25 (2012)
MathSciNetCrossRefMATH
6.
Zurück zum Zitat Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward decay: a practical time decay model for streaming systems. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE 2009, pp. 138–149. IEEE Computer Society, Washington, DC (2009) Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward decay: a practical time decay model for streaming systems. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE 2009, pp. 138–149. IEEE Computer Society, Washington, DC (2009)
7.
Zurück zum Zitat Devroye, L.: Non-Uniform Random Variate Generation. Springer, New York (1986) CrossRefMATH Devroye, L.: Non-Uniform Random Variate Generation. Springer, New York (1986)
CrossRefMATH
8.
Zurück zum Zitat Efraimidis, P.S., Spirakis, P.G.: Weighted random sampling with a reservoir. Inf. Process. Lett. 97(5), 181–185 (2006) MathSciNetCrossRefMATH Efraimidis, P.S., Spirakis, P.G.: Weighted random sampling with a reservoir. Inf. Process. Lett.
97(5), 181–185 (2006)
MathSciNetCrossRefMATH
9.
Zurück zum Zitat Goldberg, G., Harnik, D., Sotnikov, D.: The case for sampling on very large file systems. In: 30th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–11, June 2014 Goldberg, G., Harnik, D., Sotnikov, D.: The case for sampling on very large file systems. In: 30th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–11, June 2014
10.
Zurück zum Zitat Hu, X., Qiao, M., Tao, Y.: Independent range sampling. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 246–255. ACM, New York (2014) Hu, X., Qiao, M., Tao, Y.: Independent range sampling. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 246–255. ACM, New York (2014)
11.
Zurück zum Zitat Knuth, D.E.: The Art of Computer Programming: Seminumerical Algorithms, vol. 2, 2nd edn. Addison-Wesley Publishing Company, Reading (1981) MATH Knuth, D.E.: The Art of Computer Programming: Seminumerical Algorithms, vol. 2, 2nd edn. Addison-Wesley Publishing Company, Reading (1981)
MATH
12.
Zurück zum Zitat Li, K.-H.: Reservoir-sampling algorithms of time complexity o(n(1 + log(n/n))). ACM Trans. Math. Softw. 20(4), 481–493 (1994) CrossRefMATH Li, K.-H.: Reservoir-sampling algorithms of time complexity o(n(1 + log(n/n))). ACM Trans. Math. Softw.
20(4), 481–493 (1994)
CrossRefMATH
13.
Zurück zum Zitat Longbo, Z., Zhanhuai, L., Yiqiang, Z., Min, Y., Yang, Z.: A priority random sampling algorithm for time-based sliding windows over weighted streaming data. In: Proceedings of the 2007 ACM Symposium on Applied Computing, SAC 2007, pp. 453–456. ACM, New York (2007) Longbo, Z., Zhanhuai, L., Yiqiang, Z., Min, Y., Yang, Z.: A priority random sampling algorithm for time-based sliding windows over weighted streaming data. In: Proceedings of the 2007 ACM Symposium on Applied Computing, SAC 2007, pp. 453–456. ACM, New York (2007)
14.
Zurück zum Zitat Olken, F.: Random sampling from databases. Ph.D. thesis, Department of Computer Science, University of California at Berkeley (1993) Olken, F.: Random sampling from databases. Ph.D. thesis, Department of Computer Science, University of California at Berkeley (1993)
15.
Zurück zum Zitat Tirthapura, S., Woodruff, D.P.: Optimal random sampling from distributed streams revisited. In: Peleg, D. (ed.) Distributed Computing. LNCS, vol. 6950, pp. 283–297. Springer, Heidelberg (2011) CrossRef Tirthapura, S., Woodruff, D.P.: Optimal random sampling from distributed streams revisited. In: Peleg, D. (ed.) Distributed Computing. LNCS, vol. 6950, pp. 283–297. Springer, Heidelberg (2011)
CrossRef
16.
Zurück zum Zitat Vitter, J.S.: Faster methods for random sampling. Commun. ACM 27(7), 703–718 (1984) MathSciNetCrossRefMATH Vitter, J.S.: Faster methods for random sampling. Commun. ACM
27(7), 703–718 (1984)
MathSciNetCrossRefMATH
17.
Zurück zum Zitat Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985) MathSciNetCrossRefMATH Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw.
11(1), 37–57 (1985)
MathSciNetCrossRefMATH
18.
Zurück zum Zitat WRS.: A stream sampler for weighted random sampling. https://euclid.ee.duth.gr/demo/wrs/ WRS.: A stream sampler for weighted random sampling.
https://euclid.ee.duth.gr/demo/wrs/
- Titel
- Weighted Random Sampling over Data Streams
- DOI
- https://doi.org/10.1007/978-3-319-24024-4_12
- Autor:
-
Pavlos S. Efraimidis
- Sequenznummer
- 12