Skip to main content
Top

2020 | OriginalPaper | Chapter

Load-Aware Shedding in Stream Processing Systems

Authors : Nicoló Rivetti, Yann Busnel, Leonardo Querzoni

Published in: Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVI

Publisher: Springer Berlin Heidelberg

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Distributed stream processing systems are today gaining momentum as a tool to perform analytics on continuous data streams. Load shedding is a technique used to handle unpredictable spikes in the input load whenever available computing resources are not adequately provisioned. In this paper, we propose Load-Aware Shedding (LAS), a novel load shedding solution that, unlike previous works, does not rely neither on a pre-defined cost model nor on any assumption on the tuple execution duration. Leveraging sketches, LAS efficiently estimates the execution duration of each tuple with small error bounds and uses this knowledge to proactively shed input streams at any operator to limiting queuing latencies while dropping as few tuples as possible. We provide a theoretical analysis proving that LAS is an \(({\varepsilon }, \delta )\)-approximation of the optimal online load shedder. Furthermore, through an extensive practical evaluation based on simulations and a prototype, we evaluate its impact on stream processing applications.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
1
In the data streaming literature, the frequency is the number of occurrences not divided by time, which differs from the classical (physics) definition  [17].
 
2
This is not the only possible definition of the load shedding problem. Other variants are briefly discussed in Sect. 6.
 
3
This correction factor derives from the fact that \(\hat{w}(t)\) is a \((\varepsilon ,\delta )\)-approximation of w(t) as shown in Sect. 4.
 
4
For readability reasons, proofs of these theorems are available in Appendix A.
 
Literature
1.
go back to reference Abadi, D.J., et al.: Aurora: a new model and architecture for data stream management. Int. J. Very Large Data Bases (VLDB J.) 12(2), 120–139 (2003)CrossRef Abadi, D.J., et al.: Aurora: a new model and architecture for data stream management. Int. J. Very Large Data Bases (VLDB J.) 12(2), 120–139 (2003)CrossRef
2.
go back to reference Babcock, B., Datar, M., Motwani, R.: Load shedding for aggregation queries over data streams. In: Proceedings of the 20th International Conference on Data Engineering (ICDE 2004), pp. 350–361. IEEE (2004) Babcock, B., Datar, M., Motwani, R.: Load shedding for aggregation queries over data streams. In: Proceedings of the 20th International Conference on Data Engineering (ICDE 2004), pp. 350–361. IEEE (2004)
3.
go back to reference Borkowski, M., Hochreiner, C., Schulte, S.: Minimizing cost by reducing scaling operations in distributed stream processing. Proc. VLDB Endow. 12(7), 724–737 (2019)CrossRef Borkowski, M., Hochreiner, C., Schulte, S.: Minimizing cost by reducing scaling operations in distributed stream processing. Proc. VLDB Endow. 12(7), 724–737 (2019)CrossRef
4.
5.
go back to reference Cormode., G.: Sketch techniques for approximate query processing. In: Synposes for Approximate Query Processing: Samples, Histograms, Wavelets and Sketches, Foundations and Trends in Databases. NOW Publishers (2011) Cormode., G.: Sketch techniques for approximate query processing. In: Synposes for Approximate Query Processing: Samples, Histograms, Wavelets and Sketches, Foundations and Trends in Databases. NOW Publishers (2011)
6.
go back to reference Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55, 58–75 (2005)MathSciNetCrossRef Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55, 58–75 (2005)MathSciNetCrossRef
8.
go back to reference Gedik, B., Wu, K., Yu, P.S., Liu, L.: GrubJoin: an adaptive, multi-way, windowed stream join with time correlation-aware CPU load shedding. IEEE Trans. Knowl. Data Eng. 19(10), 1363–1380 (2007)CrossRef Gedik, B., Wu, K., Yu, P.S., Liu, L.: GrubJoin: an adaptive, multi-way, windowed stream join with time correlation-aware CPU load shedding. IEEE Trans. Knowl. Data Eng. 19(10), 1363–1380 (2007)CrossRef
10.
go back to reference He, Y., Barman, S., Naughton, J.F.: On load shedding in complex event processing. In: Proceedings of the 17th International Conference on Database Theory (ICDT 2014), pp. 213–224 (2014). OpenProceedings.org He, Y., Barman, S., Naughton, J.F.: On load shedding in complex event processing. In: Proceedings of the 17th International Conference on Database Theory (ICDT 2014), pp. 213–224 (2014). OpenProceedings.​org
11.
go back to reference Heinze, T., Aniello, L., Querzoni, L., Jerzak, Z.: Cloud-based data stream processing. In: Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems (DEBS 2014), pp. 238–245. ACM (2014) Heinze, T., Aniello, L., Querzoni, L., Jerzak, Z.: Cloud-based data stream processing. In: Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems (DEBS 2014), pp. 238–245. ACM (2014)
12.
go back to reference Ilarri, S., Wolfson, O., Mena, E., Illarramendi, A., Sistla, P.: A query processor for prediction-based monitoring of data streams. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2009, pp. 415–426. Association for Computing Machinery, New York (2009) Ilarri, S., Wolfson, O., Mena, E., Illarramendi, A., Sistla, P.: A query processor for prediction-based monitoring of data streams. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2009, pp. 415–426. Association for Computing Machinery, New York (2009)
13.
go back to reference Kalyvianaki, E., Charalambous, T., Fiscato, M., Pietzuch, P.: Overload management in data stream processing systems with latency guarantees. In: 7th IEEE International Workshop on Feedback Computing (Feedback Computing 2012) (2012) Kalyvianaki, E., Charalambous, T., Fiscato, M., Pietzuch, P.: Overload management in data stream processing systems with latency guarantees. In: 7th IEEE International Workshop on Feedback Computing (Feedback Computing 2012) (2012)
14.
go back to reference Kalyvianaki, E., Fiscato, M., Salonidis, T., Pietzuch, P.: THEMIS: fairness in federated stream processing under overload. In: Proceedings of the 2016 International Conference on Management of Data, pp. 541–553. ACM (2016) Kalyvianaki, E., Fiscato, M., Salonidis, T., Pietzuch, P.: THEMIS: fairness in federated stream processing under overload. In: Proceedings of the 2016 International Conference on Management of Data, pp. 541–553. ACM (2016)
15.
go back to reference Kammoun, A.: Enhancing stream processing and complex event processing systems. Ph.D. thesis, Université Jean Monnet, Saint-Etienne (2019) Kammoun, A.: Enhancing stream processing and complex event processing systems. Ph.D. thesis, Université Jean Monnet, Saint-Etienne (2019)
16.
go back to reference Katsipoulakis, N.R., Labrinidis, A., Chrysanthis, P.K.: Concept-driven load shedding: reducing size and error of voluminous and variable data streams. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 418–427 (2018) Katsipoulakis, N.R., Labrinidis, A., Chrysanthis, P.K.: Concept-driven load shedding: reducing size and error of voluminous and variable data streams. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 418–427 (2018)
17.
go back to reference Muthukrishnan, S.: Data Streams: Algorithms and Applications. Now Publishers Inc. (2005) Muthukrishnan, S.: Data Streams: Algorithms and Applications. Now Publishers Inc. (2005)
18.
go back to reference Olston, C., Jiang, J., Widom, J.: Adaptive filters for continuous queries over distributed data streams. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, SIGMOD 2003, pp. 563–574. Association for Computing Machinery, New York (2003) Olston, C., Jiang, J., Widom, J.: Adaptive filters for continuous queries over distributed data streams. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, SIGMOD 2003, pp. 563–574. Association for Computing Machinery, New York (2003)
19.
go back to reference Quoc, D.L., Chen, R., Bhatotia, P., Fetzer, C., Hilt, V., Strufe, T.: StreamApprox: approximate computing for stream analytics. In: Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Middleware 2017, pp. 185–197. Association for Computing Machinery, New York (2017) Quoc, D.L., Chen, R., Bhatotia, P., Fetzer, C., Hilt, V., Strufe, T.: StreamApprox: approximate computing for stream analytics. In: Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Middleware 2017, pp. 185–197. Association for Computing Machinery, New York (2017)
20.
go back to reference Reiss, F., Hellerstein, J.M.: Data triage: an adaptive architecture for load shedding in TelegraphCQ. In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), pp. 155–156. IEEE (2005) Reiss, F., Hellerstein, J.M.: Data triage: an adaptive architecture for load shedding in TelegraphCQ. In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), pp. 155–156. IEEE (2005)
21.
go back to reference Rivetti, N., Busnel, Y., Mostefaoui, A.: Efficiently summarizing data streams over sliding windows. In: Proceedings of the 14th IEEE International Symposium on Network Computing and Applications (NCA 2015), Boston, USA, Best Student Paper Award, September 2015 Rivetti, N., Busnel, Y., Mostefaoui, A.: Efficiently summarizing data streams over sliding windows. In: Proceedings of the 14th IEEE International Symposium on Network Computing and Applications (NCA 2015), Boston, USA, Best Student Paper Award, September 2015
22.
go back to reference Slo, A., Bhowmik, S., Flaig, A., Rothermel, K.: pSPICE: partial match shedding for complex event processing. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 372–382. IEEE (2019) Slo, A., Bhowmik, S., Flaig, A., Rothermel, K.: pSPICE: partial match shedding for complex event processing. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 372–382. IEEE (2019)
23.
go back to reference Slo, A., Bhowmik, S., Rothermel, K.: eSPICE: probabilistic load shedding from input event streams in complex event processing. In: Proceedings of the 20th International Middleware Conference, pp. 215–227 (2019) Slo, A., Bhowmik, S., Rothermel, K.: eSPICE: probabilistic load shedding from input event streams in complex event processing. In: Proceedings of the 20th International Middleware Conference, pp. 215–227 (2019)
24.
go back to reference Stanoi, I., Mihaila, G., Palpanas, T., Lang, C.: WhiteWater: distributed processing of fast streams. IEEE Trans. Knowl. Data Eng. 19(9), 1214–1226 (2007)CrossRef Stanoi, I., Mihaila, G., Palpanas, T., Lang, C.: WhiteWater: distributed processing of fast streams. IEEE Trans. Knowl. Data Eng. 19(9), 1214–1226 (2007)CrossRef
25.
go back to reference Tatbul, N., Çetintemel, U., Zdonik, S.: Staying fit: efficient load shedding techniques for distributed stream processing. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 159–170. VLDB Endowment (2007) Tatbul, N., Çetintemel, U., Zdonik, S.: Staying fit: efficient load shedding techniques for distributed stream processing. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 159–170. VLDB Endowment (2007)
26.
go back to reference Tatbul, N., Çetintemel, U., Zdonik, S., Cherniack, M., Stonebraker, M.: Load shedding in a data stream manager. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), pp. 309–320. VLDB Endowment (2003) Tatbul, N., Çetintemel, U., Zdonik, S., Cherniack, M., Stonebraker, M.: Load shedding in a data stream manager. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), pp. 309–320. VLDB Endowment (2003)
28.
go back to reference Tok, W.H., Bressan, S., Lee., M.-L.: A stratified approach to progressive approximate joins. In: Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2008, pp. 582–593. Association for Computing Machinery, New York (2008) Tok, W.H., Bressan, S., Lee., M.-L.: A stratified approach to progressive approximate joins. In: Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2008, pp. 582–593. Association for Computing Machinery, New York (2008)
29.
go back to reference Tu, Y.-C., Liu, S., Prabhakar, S., Yao, B.: Load shedding in stream databases: a control-based approach. In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006), pp. 787–798. VLDB Endowment (2006) Tu, Y.-C., Liu, S., Prabhakar, S., Yao, B.: Load shedding in stream databases: a control-based approach. In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006), pp. 787–798. VLDB Endowment (2006)
30.
go back to reference Zhang, Y., Huang, C., Huang, C.: A novel adaptive load shedding scheme for data stream processing. In: Future Generation Communication and Networking (FGCN 2007), pp. 378–384. IEEE (2007) Zhang, Y., Huang, C., Huang, C.: A novel adaptive load shedding scheme for data stream processing. In: Future Generation Communication and Networking (FGCN 2007), pp. 378–384. IEEE (2007)
Metadata
Title
Load-Aware Shedding in Stream Processing Systems
Authors
Nicoló Rivetti
Yann Busnel
Leonardo Querzoni
Copyright Year
2020
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-62386-2_5

Premium Partner