2015 | OriginalPaper | Chapter
Hint
Swipe to navigate through the chapters of this book
Published in:
Algorithms, Probability, Networks, and Games
In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case [3, 8], discuss sampling with and without replacement and show adaptations of the algorithms for several WRS problems and evolving data streams.
Please log in to get access to this content
To get access to this content you need the following product:
Advertisement
1.
go back to reference Aggarwal, C.C.: On biased reservoir sampling in the presence of stream evolution. In: VLDB 2006: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 607–618. VLDB Endowment (2006) Aggarwal, C.C.: On biased reservoir sampling in the presence of stream evolution. In: VLDB 2006: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 607–618. VLDB Endowment (2006)
2.
go back to reference Al-Kateb, M., Lee, B.S.: Adaptive stratified reservoir sampling over heterogeneous data streams. Inf. Syst. 39, 199–216 (2014) CrossRef Al-Kateb, M., Lee, B.S.: Adaptive stratified reservoir sampling over heterogeneous data streams. Inf. Syst.
39, 199–216 (2014)
CrossRef
3.
go back to reference Chao, M.T.: A general purpose unequal probability sampling plan. Biometrika 69(3), 653–656 (1982) MathSciNetCrossRefMATH Chao, M.T.: A general purpose unequal probability sampling plan. Biometrika
69(3), 653–656 (1982)
MathSciNetCrossRefMATH
4.
go back to reference Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In: Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, pp. 77–86. ACM, New York (2010) Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In: Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, pp. 77–86. ACM, New York (2010)
5.
go back to reference Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Continuous sampling from distributed streams. J. ACM 59(2), 10:1–10:25 (2012) MathSciNetCrossRefMATH Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Continuous sampling from distributed streams. J. ACM
59(2), 10:1–10:25 (2012)
MathSciNetCrossRefMATH
6.
go back to reference Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward decay: a practical time decay model for streaming systems. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE 2009, pp. 138–149. IEEE Computer Society, Washington, DC (2009) Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward decay: a practical time decay model for streaming systems. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE 2009, pp. 138–149. IEEE Computer Society, Washington, DC (2009)
7.
go back to reference Devroye, L.: Non-Uniform Random Variate Generation. Springer, New York (1986) CrossRefMATH Devroye, L.: Non-Uniform Random Variate Generation. Springer, New York (1986)
CrossRefMATH
8.
go back to reference Efraimidis, P.S., Spirakis, P.G.: Weighted random sampling with a reservoir. Inf. Process. Lett. 97(5), 181–185 (2006) MathSciNetCrossRefMATH Efraimidis, P.S., Spirakis, P.G.: Weighted random sampling with a reservoir. Inf. Process. Lett.
97(5), 181–185 (2006)
MathSciNetCrossRefMATH
9.
go back to reference Goldberg, G., Harnik, D., Sotnikov, D.: The case for sampling on very large file systems. In: 30th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–11, June 2014 Goldberg, G., Harnik, D., Sotnikov, D.: The case for sampling on very large file systems. In: 30th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–11, June 2014
10.
go back to reference Hu, X., Qiao, M., Tao, Y.: Independent range sampling. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 246–255. ACM, New York (2014) Hu, X., Qiao, M., Tao, Y.: Independent range sampling. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 246–255. ACM, New York (2014)
11.
go back to reference Knuth, D.E.: The Art of Computer Programming: Seminumerical Algorithms, vol. 2, 2nd edn. Addison-Wesley Publishing Company, Reading (1981) MATH Knuth, D.E.: The Art of Computer Programming: Seminumerical Algorithms, vol. 2, 2nd edn. Addison-Wesley Publishing Company, Reading (1981)
MATH
12.
go back to reference Li, K.-H.: Reservoir-sampling algorithms of time complexity o(n(1 + log(n/n))). ACM Trans. Math. Softw. 20(4), 481–493 (1994) CrossRefMATH Li, K.-H.: Reservoir-sampling algorithms of time complexity o(n(1 + log(n/n))). ACM Trans. Math. Softw.
20(4), 481–493 (1994)
CrossRefMATH
13.
go back to reference Longbo, Z., Zhanhuai, L., Yiqiang, Z., Min, Y., Yang, Z.: A priority random sampling algorithm for time-based sliding windows over weighted streaming data. In: Proceedings of the 2007 ACM Symposium on Applied Computing, SAC 2007, pp. 453–456. ACM, New York (2007) Longbo, Z., Zhanhuai, L., Yiqiang, Z., Min, Y., Yang, Z.: A priority random sampling algorithm for time-based sliding windows over weighted streaming data. In: Proceedings of the 2007 ACM Symposium on Applied Computing, SAC 2007, pp. 453–456. ACM, New York (2007)
14.
go back to reference Olken, F.: Random sampling from databases. Ph.D. thesis, Department of Computer Science, University of California at Berkeley (1993) Olken, F.: Random sampling from databases. Ph.D. thesis, Department of Computer Science, University of California at Berkeley (1993)
15.
go back to reference Tirthapura, S., Woodruff, D.P.: Optimal random sampling from distributed streams revisited. In: Peleg, D. (ed.) Distributed Computing. LNCS, vol. 6950, pp. 283–297. Springer, Heidelberg (2011) CrossRef Tirthapura, S., Woodruff, D.P.: Optimal random sampling from distributed streams revisited. In: Peleg, D. (ed.) Distributed Computing. LNCS, vol. 6950, pp. 283–297. Springer, Heidelberg (2011)
CrossRef
16.
go back to reference Vitter, J.S.: Faster methods for random sampling. Commun. ACM 27(7), 703–718 (1984) MathSciNetCrossRefMATH Vitter, J.S.: Faster methods for random sampling. Commun. ACM
27(7), 703–718 (1984)
MathSciNetCrossRefMATH
17.
go back to reference Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985) MathSciNetCrossRefMATH Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw.
11(1), 37–57 (1985)
MathSciNetCrossRefMATH
18.
go back to reference WRS.: A stream sampler for weighted random sampling. https://euclid.ee.duth.gr/demo/wrs/ WRS.: A stream sampler for weighted random sampling.
https://euclid.ee.duth.gr/demo/wrs/
- Title
- Weighted Random Sampling over Data Streams
- DOI
- https://doi.org/10.1007/978-3-319-24024-4_12
- Author:
-
Pavlos S. Efraimidis
- Publisher
- Springer International Publishing
- Sequence number
- 12