Skip to main content

2020 | OriginalPaper | Buchkapitel

Load-Aware Shedding in Stream Processing Systems

verfasst von : Nicoló Rivetti, Yann Busnel, Leonardo Querzoni

Erschienen in: Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVI

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Distributed stream processing systems are today gaining momentum as a tool to perform analytics on continuous data streams. Load shedding is a technique used to handle unpredictable spikes in the input load whenever available computing resources are not adequately provisioned. In this paper, we propose Load-Aware Shedding (LAS), a novel load shedding solution that, unlike previous works, does not rely neither on a pre-defined cost model nor on any assumption on the tuple execution duration. Leveraging sketches, LAS efficiently estimates the execution duration of each tuple with small error bounds and uses this knowledge to proactively shed input streams at any operator to limiting queuing latencies while dropping as few tuples as possible. We provide a theoretical analysis proving that LAS is an \(({\varepsilon }, \delta )\)-approximation of the optimal online load shedder. Furthermore, through an extensive practical evaluation based on simulations and a prototype, we evaluate its impact on stream processing applications.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
In the data streaming literature, the frequency is the number of occurrences not divided by time, which differs from the classical (physics) definition  [17].
 
2
This is not the only possible definition of the load shedding problem. Other variants are briefly discussed in Sect. 6.
 
3
This correction factor derives from the fact that \(\hat{w}(t)\) is a \((\varepsilon ,\delta )\)-approximation of w(t) as shown in Sect. 4.
 
4
For readability reasons, proofs of these theorems are available in Appendix A.
 
Literatur
1.
Zurück zum Zitat Abadi, D.J., et al.: Aurora: a new model and architecture for data stream management. Int. J. Very Large Data Bases (VLDB J.) 12(2), 120–139 (2003)CrossRef Abadi, D.J., et al.: Aurora: a new model and architecture for data stream management. Int. J. Very Large Data Bases (VLDB J.) 12(2), 120–139 (2003)CrossRef
2.
Zurück zum Zitat Babcock, B., Datar, M., Motwani, R.: Load shedding for aggregation queries over data streams. In: Proceedings of the 20th International Conference on Data Engineering (ICDE 2004), pp. 350–361. IEEE (2004) Babcock, B., Datar, M., Motwani, R.: Load shedding for aggregation queries over data streams. In: Proceedings of the 20th International Conference on Data Engineering (ICDE 2004), pp. 350–361. IEEE (2004)
3.
Zurück zum Zitat Borkowski, M., Hochreiner, C., Schulte, S.: Minimizing cost by reducing scaling operations in distributed stream processing. Proc. VLDB Endow. 12(7), 724–737 (2019)CrossRef Borkowski, M., Hochreiner, C., Schulte, S.: Minimizing cost by reducing scaling operations in distributed stream processing. Proc. VLDB Endow. 12(7), 724–737 (2019)CrossRef
4.
5.
Zurück zum Zitat Cormode., G.: Sketch techniques for approximate query processing. In: Synposes for Approximate Query Processing: Samples, Histograms, Wavelets and Sketches, Foundations and Trends in Databases. NOW Publishers (2011) Cormode., G.: Sketch techniques for approximate query processing. In: Synposes for Approximate Query Processing: Samples, Histograms, Wavelets and Sketches, Foundations and Trends in Databases. NOW Publishers (2011)
6.
Zurück zum Zitat Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55, 58–75 (2005)MathSciNetCrossRef Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55, 58–75 (2005)MathSciNetCrossRef
8.
Zurück zum Zitat Gedik, B., Wu, K., Yu, P.S., Liu, L.: GrubJoin: an adaptive, multi-way, windowed stream join with time correlation-aware CPU load shedding. IEEE Trans. Knowl. Data Eng. 19(10), 1363–1380 (2007)CrossRef Gedik, B., Wu, K., Yu, P.S., Liu, L.: GrubJoin: an adaptive, multi-way, windowed stream join with time correlation-aware CPU load shedding. IEEE Trans. Knowl. Data Eng. 19(10), 1363–1380 (2007)CrossRef
9.
10.
Zurück zum Zitat He, Y., Barman, S., Naughton, J.F.: On load shedding in complex event processing. In: Proceedings of the 17th International Conference on Database Theory (ICDT 2014), pp. 213–224 (2014). OpenProceedings.org He, Y., Barman, S., Naughton, J.F.: On load shedding in complex event processing. In: Proceedings of the 17th International Conference on Database Theory (ICDT 2014), pp. 213–224 (2014). OpenProceedings.​org
11.
Zurück zum Zitat Heinze, T., Aniello, L., Querzoni, L., Jerzak, Z.: Cloud-based data stream processing. In: Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems (DEBS 2014), pp. 238–245. ACM (2014) Heinze, T., Aniello, L., Querzoni, L., Jerzak, Z.: Cloud-based data stream processing. In: Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems (DEBS 2014), pp. 238–245. ACM (2014)
12.
Zurück zum Zitat Ilarri, S., Wolfson, O., Mena, E., Illarramendi, A., Sistla, P.: A query processor for prediction-based monitoring of data streams. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2009, pp. 415–426. Association for Computing Machinery, New York (2009) Ilarri, S., Wolfson, O., Mena, E., Illarramendi, A., Sistla, P.: A query processor for prediction-based monitoring of data streams. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2009, pp. 415–426. Association for Computing Machinery, New York (2009)
13.
Zurück zum Zitat Kalyvianaki, E., Charalambous, T., Fiscato, M., Pietzuch, P.: Overload management in data stream processing systems with latency guarantees. In: 7th IEEE International Workshop on Feedback Computing (Feedback Computing 2012) (2012) Kalyvianaki, E., Charalambous, T., Fiscato, M., Pietzuch, P.: Overload management in data stream processing systems with latency guarantees. In: 7th IEEE International Workshop on Feedback Computing (Feedback Computing 2012) (2012)
14.
Zurück zum Zitat Kalyvianaki, E., Fiscato, M., Salonidis, T., Pietzuch, P.: THEMIS: fairness in federated stream processing under overload. In: Proceedings of the 2016 International Conference on Management of Data, pp. 541–553. ACM (2016) Kalyvianaki, E., Fiscato, M., Salonidis, T., Pietzuch, P.: THEMIS: fairness in federated stream processing under overload. In: Proceedings of the 2016 International Conference on Management of Data, pp. 541–553. ACM (2016)
15.
Zurück zum Zitat Kammoun, A.: Enhancing stream processing and complex event processing systems. Ph.D. thesis, Université Jean Monnet, Saint-Etienne (2019) Kammoun, A.: Enhancing stream processing and complex event processing systems. Ph.D. thesis, Université Jean Monnet, Saint-Etienne (2019)
16.
Zurück zum Zitat Katsipoulakis, N.R., Labrinidis, A., Chrysanthis, P.K.: Concept-driven load shedding: reducing size and error of voluminous and variable data streams. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 418–427 (2018) Katsipoulakis, N.R., Labrinidis, A., Chrysanthis, P.K.: Concept-driven load shedding: reducing size and error of voluminous and variable data streams. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 418–427 (2018)
17.
Zurück zum Zitat Muthukrishnan, S.: Data Streams: Algorithms and Applications. Now Publishers Inc. (2005) Muthukrishnan, S.: Data Streams: Algorithms and Applications. Now Publishers Inc. (2005)
18.
Zurück zum Zitat Olston, C., Jiang, J., Widom, J.: Adaptive filters for continuous queries over distributed data streams. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, SIGMOD 2003, pp. 563–574. Association for Computing Machinery, New York (2003) Olston, C., Jiang, J., Widom, J.: Adaptive filters for continuous queries over distributed data streams. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, SIGMOD 2003, pp. 563–574. Association for Computing Machinery, New York (2003)
19.
Zurück zum Zitat Quoc, D.L., Chen, R., Bhatotia, P., Fetzer, C., Hilt, V., Strufe, T.: StreamApprox: approximate computing for stream analytics. In: Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Middleware 2017, pp. 185–197. Association for Computing Machinery, New York (2017) Quoc, D.L., Chen, R., Bhatotia, P., Fetzer, C., Hilt, V., Strufe, T.: StreamApprox: approximate computing for stream analytics. In: Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Middleware 2017, pp. 185–197. Association for Computing Machinery, New York (2017)
20.
Zurück zum Zitat Reiss, F., Hellerstein, J.M.: Data triage: an adaptive architecture for load shedding in TelegraphCQ. In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), pp. 155–156. IEEE (2005) Reiss, F., Hellerstein, J.M.: Data triage: an adaptive architecture for load shedding in TelegraphCQ. In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), pp. 155–156. IEEE (2005)
21.
Zurück zum Zitat Rivetti, N., Busnel, Y., Mostefaoui, A.: Efficiently summarizing data streams over sliding windows. In: Proceedings of the 14th IEEE International Symposium on Network Computing and Applications (NCA 2015), Boston, USA, Best Student Paper Award, September 2015 Rivetti, N., Busnel, Y., Mostefaoui, A.: Efficiently summarizing data streams over sliding windows. In: Proceedings of the 14th IEEE International Symposium on Network Computing and Applications (NCA 2015), Boston, USA, Best Student Paper Award, September 2015
22.
Zurück zum Zitat Slo, A., Bhowmik, S., Flaig, A., Rothermel, K.: pSPICE: partial match shedding for complex event processing. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 372–382. IEEE (2019) Slo, A., Bhowmik, S., Flaig, A., Rothermel, K.: pSPICE: partial match shedding for complex event processing. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 372–382. IEEE (2019)
23.
Zurück zum Zitat Slo, A., Bhowmik, S., Rothermel, K.: eSPICE: probabilistic load shedding from input event streams in complex event processing. In: Proceedings of the 20th International Middleware Conference, pp. 215–227 (2019) Slo, A., Bhowmik, S., Rothermel, K.: eSPICE: probabilistic load shedding from input event streams in complex event processing. In: Proceedings of the 20th International Middleware Conference, pp. 215–227 (2019)
24.
Zurück zum Zitat Stanoi, I., Mihaila, G., Palpanas, T., Lang, C.: WhiteWater: distributed processing of fast streams. IEEE Trans. Knowl. Data Eng. 19(9), 1214–1226 (2007)CrossRef Stanoi, I., Mihaila, G., Palpanas, T., Lang, C.: WhiteWater: distributed processing of fast streams. IEEE Trans. Knowl. Data Eng. 19(9), 1214–1226 (2007)CrossRef
25.
Zurück zum Zitat Tatbul, N., Çetintemel, U., Zdonik, S.: Staying fit: efficient load shedding techniques for distributed stream processing. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 159–170. VLDB Endowment (2007) Tatbul, N., Çetintemel, U., Zdonik, S.: Staying fit: efficient load shedding techniques for distributed stream processing. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 159–170. VLDB Endowment (2007)
26.
Zurück zum Zitat Tatbul, N., Çetintemel, U., Zdonik, S., Cherniack, M., Stonebraker, M.: Load shedding in a data stream manager. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), pp. 309–320. VLDB Endowment (2003) Tatbul, N., Çetintemel, U., Zdonik, S., Cherniack, M., Stonebraker, M.: Load shedding in a data stream manager. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), pp. 309–320. VLDB Endowment (2003)
28.
Zurück zum Zitat Tok, W.H., Bressan, S., Lee., M.-L.: A stratified approach to progressive approximate joins. In: Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2008, pp. 582–593. Association for Computing Machinery, New York (2008) Tok, W.H., Bressan, S., Lee., M.-L.: A stratified approach to progressive approximate joins. In: Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2008, pp. 582–593. Association for Computing Machinery, New York (2008)
29.
Zurück zum Zitat Tu, Y.-C., Liu, S., Prabhakar, S., Yao, B.: Load shedding in stream databases: a control-based approach. In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006), pp. 787–798. VLDB Endowment (2006) Tu, Y.-C., Liu, S., Prabhakar, S., Yao, B.: Load shedding in stream databases: a control-based approach. In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006), pp. 787–798. VLDB Endowment (2006)
30.
Zurück zum Zitat Zhang, Y., Huang, C., Huang, C.: A novel adaptive load shedding scheme for data stream processing. In: Future Generation Communication and Networking (FGCN 2007), pp. 378–384. IEEE (2007) Zhang, Y., Huang, C., Huang, C.: A novel adaptive load shedding scheme for data stream processing. In: Future Generation Communication and Networking (FGCN 2007), pp. 378–384. IEEE (2007)
Metadaten
Titel
Load-Aware Shedding in Stream Processing Systems
verfasst von
Nicoló Rivetti
Yann Busnel
Leonardo Querzoni
Copyright-Jahr
2020
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-62386-2_5