Published in: Cluster Computing 2/2020

21-05-2019

Reliable stream data processing for elastic distributed stream processing systems

Authors: Xiaohui Wei, Yuan Zhuang, Hongliang Li, Zhiliang Liu


Abstract

Distributed stream processing systems (DSPSs) have proven to be an effective way to process and analyze large-scale data streams in real time, and their reliability has become a popular research topic in recent years. Novel elastic DSPSs can seamlessly adapt to changes in stream workload, which introduces new reliability challenges: (1) operators can be scaled up and down at runtime, so fault-tolerance methods must keep data backups consistent under these runtime dynamics; (2) rollback recovery to the last checkpoint may undo recent auto-scaling adjustments, which imposes a high cost and an unacceptable impact on the system. In this paper, we put forward a novel fault-tolerant mechanism to deal with these issues. In particular, we propose a self-adaptive backup unit, the elastic data slice (EDS), which can partition and merge data backups at runtime according to operator auto-scaling. Recovery consistency is guaranteed by new upstream backup protocols, which restart the system from its state after auto-scaling rather than from the last checkpoint, thereby avoiding high recovery latency. Based on these techniques, we implement a prototype system named SPATE. Evaluations on SPATE show that our mechanism supports auto-scaling changes with overhead similar to that of existing approaches, while achieving low recovery latency despite auto-scaling.
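The idea of a backup unit that is partitioned and merged as operators scale can be illustrated with a minimal sketch. It is not the EDS design from the paper: the abstract does not describe the data layout or the protocols, so the class `BackupSlice`, the `merge` function, and the choice to key slices by a key-hash range are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BackupSlice:
    """Hypothetical backup unit covering the key-hash range [lo, hi).

    Assumption: upstream backups are routed by key hash, so a slice can be
    described by the range of hashes whose tuples it buffers.
    """
    lo: int
    hi: int
    tuples: List[Tuple[int, str]] = field(default_factory=list)  # (key_hash, payload)

    def append(self, key_hash: int, payload: str) -> None:
        """Buffer one emitted tuple until the downstream operator acknowledges it."""
        if not self.lo <= key_hash < self.hi:
            raise ValueError("key hash outside this slice's range")
        self.tuples.append((key_hash, payload))

    def split(self, mid: int) -> Tuple["BackupSlice", "BackupSlice"]:
        """Partition the backup when the downstream operator scales out."""
        if not self.lo < mid < self.hi:
            raise ValueError("split point must lie strictly inside the range")
        left = BackupSlice(self.lo, mid, [t for t in self.tuples if t[0] < mid])
        right = BackupSlice(mid, self.hi, [t for t in self.tuples if t[0] >= mid])
        return left, right


def merge(a: BackupSlice, b: BackupSlice) -> BackupSlice:
    """Merge two adjacent slices when the downstream operator scales in.

    Tuples are concatenated, so arrival order across the two slices is not
    preserved; a real system would need sequence numbers for that.
    """
    if a.hi != b.lo:
        raise ValueError("slices must be adjacent, with a before b")
    return BackupSlice(a.lo, b.hi, a.tuples + b.tuples)
```

Under these assumptions, a scale-out of one operator into two instances would call `split` once and hand each half to the backup for the matching instance, while a scale-in would call `merge` on the two adjacent slices. After a failure, only the slice matching the failed instance's current key range would need to be replayed.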

Metadata
Title: Reliable stream data processing for elastic distributed stream processing systems
Authors: Xiaohui Wei, Yuan Zhuang, Hongliang Li, Zhiliang Liu
Publication date: 21-05-2019
Publisher: Springer US
Published in: Cluster Computing, Issue 2/2020
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-019-02939-9
