Skip to main content
Erschienen in: The VLDB Journal 5/2019

03.09.2019 | Regular Paper

On the performance and convergence of distributed stream processing via approximate fault tolerance

verfasst von: Zhinan Cheng, Qun Huang, Patrick P. C. Lee

Erschienen in: The VLDB Journal | Ausgabe 5/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Fault tolerance is critical for distributed stream processing systems, yet achieving error-free fault tolerance often incurs substantial performance overhead. We present AF-Stream, a distributed stream processing system that addresses the trade-off between performance and accuracy in fault tolerance. AF-Stream builds on a notion called approximate fault tolerance, whose idea is to mitigate backup overhead by adaptively issuing backups, while ensuring that the errors upon failures are bounded with theoretical guarantees. Specifically, AF-Stream allows users to specify bounds on both the state divergence and the loss of non-backup streaming items. It issues state and item backups only when the bounds are reached. Our AF-Stream design provides an extensible programming model for incorporating general streaming algorithms as well as exports only few threshold parameters for configuring approximation fault tolerance. Furthermore, we formally prove that AF-Stream preserves high algorithm-specific accuracy of streaming algorithms, and in particular the convergence guarantees of online learning. Experiments show that AF-Stream maintains high performance (compared to no fault tolerance) and high accuracy after multiple failures (compared to no failures) under various streaming algorithms.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
We also allow an operator to maintain an empty state (i.e., stateless).
 
Literatur
1.
Zurück zum Zitat Abadi, D.J., Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., Zdonik, S.: Aurora: a new model and architecture for data stream management. VLDB J. 12, 120–139 (2003)CrossRef Abadi, D.J., Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., Zdonik, S.: Aurora: a new model and architecture for data stream management. VLDB J. 12, 120–139 (2003)CrossRef
2.
Zurück zum Zitat Agarwal, S., Mozafari, B., Panda, A., Milner, H., Madden, S., Stoica, I.: BlinkDB: queries with bounded errors and bounded response times on very large data. In: Proceedings of EuroSys, pp. 29–42 (2013) Agarwal, S., Mozafari, B., Panda, A., Milner, H., Madden, S., Stoica, I.: BlinkDB: queries with bounded errors and bounded response times on very large data. In: Proceedings of EuroSys, pp. 29–42 (2013)
3.
Zurück zum Zitat Agarwal, S., Zeng, K.: BlinkDB and G-OLA: supporting continuous answers with error bars in SparkSQL. In: Spark Summit (2015) Agarwal, S., Zeng, K.: BlinkDB and G-OLA: supporting continuous answers with error bars in SparkSQL. In: Spark Summit (2015)
4.
Zurück zum Zitat Ahmed, A., Aly, M., Gonzalez, J., Narayanamurthy, S., Smola, A.J.: Scalable inference in latent variable models. In: Proceedings of WSDM (2012) Ahmed, A., Aly, M., Gonzalez, J., Narayanamurthy, S., Smola, A.J.: Scalable inference in latent variable models. In: Proceedings of WSDM (2012)
5.
Zurück zum Zitat Akidau, T., Balikov, A., Bekiro, K., Chernyak, S., Haberman, J., Lax, R., Mcveety, S., Mills, D., Nordstrom, P., Whittle, S.: MillWheel: fault-tolerant stream processing at internet scale. Proc. VLDB Endow. 6, 1033–1044 (2013)CrossRef Akidau, T., Balikov, A., Bekiro, K., Chernyak, S., Haberman, J., Lax, R., Mcveety, S., Mills, D., Nordstrom, P., Whittle, S.: MillWheel: fault-tolerant stream processing at internet scale. Proc. VLDB Endow. 6, 1033–1044 (2013)CrossRef
8.
Zurück zum Zitat Balazinska, M., Balakrishnan, H., Madden, S.R., Stonebraker, M.: Fault-tolerance in the borealis distributed stream processing system. In: Proceedings of SIGMOD, pp. 13–24 (2005) Balazinska, M., Balakrishnan, H., Madden, S.R., Stonebraker, M.: Fault-tolerance in the borealis distributed stream processing system. In: Proceedings of SIGMOD, pp. 13–24 (2005)
9.
Zurück zum Zitat Baldi, P., Sadowski, P., Whiteson, D.: Searching for exotic particles in high-energy physics with deep learning. Nat. Commun. 5, 4308:1–4308:9 (2014) Baldi, P., Sadowski, P., Whiteson, D.: Searching for exotic particles in high-energy physics with deep learning. Nat. Commun. 5, 4308:1–4308:9 (2014)
10.
Zurück zum Zitat Bellavista, P., Corradi, A., Kotoulas, S., Reale, A.: Adaptive fault-tolerance for dynamic resource provisioning in distributed stream processing systems. In: Proceedings of IEEE EDBT, pp. 85–96 (2014) Bellavista, P., Corradi, A., Kotoulas, S., Reale, A.: Adaptive fault-tolerance for dynamic resource provisioning in distributed stream processing systems. In: Proceedings of IEEE EDBT, pp. 85–96 (2014)
11.
Zurück zum Zitat Bhatotia, P., Wieder, A., Rodrigues, R., Acar, U., Pasquin, R.: Incoop: Mapreduce for incremental computations. In: Proceedings of SoCC, pp. 7:1–7:14 (2011) Bhatotia, P., Wieder, A., Rodrigues, R., Acar, U., Pasquin, R.: Incoop: Mapreduce for incremental computations. In: Proceedings of SoCC, pp. 7:1–7:14 (2011)
12.
Zurück zum Zitat Bottou, L.: Online algorithms and stochastic approximations. In: Saad, D. (ed.) Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK (1998)MATH Bottou, L.: Online algorithms and stochastic approximations. In: Saad, D. (ed.) Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK (1998)MATH
13.
Zurück zum Zitat Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT (2010)CrossRef Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT (2010)CrossRef
14.
Zurück zum Zitat Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: Proceedings of NIPS (2007) Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: Proceedings of NIPS (2007)
15.
Zurück zum Zitat Bradley, J.K., Kyrola, A., Bickson, D., Guestrin, C.: Parallel coordinate descent for L1-regularized loss minimization. In: Proceedings of ICML (2011) Bradley, J.K., Kyrola, A., Bickson, D., Guestrin, C.: Parallel coordinate descent for L1-regularized loss minimization. In: Proceedings of ICML (2011)
16.
Zurück zum Zitat Busuttil, S., Kalnishkan, Y.: Online regression competitive with changing predictors. In: Proceedings of Alogrithmic Learning Theory, pp. 181–195 (2007) Busuttil, S., Kalnishkan, Y.: Online regression competitive with changing predictors. In: Proceedings of Alogrithmic Learning Theory, pp. 181–195 (2007)
18.
Zurück zum Zitat Canini, K.R.K., Shi, L., Griffiths, T.L.: Online inference of topics with latent Dirichlet allocation. In: Proceedings of AISTATS (2009) Canini, K.R.K., Shi, L., Griffiths, T.L.: Online inference of topics with latent Dirichlet allocation. In: Proceedings of AISTATS (2009)
19.
Zurück zum Zitat Carbone, P., Katsifodimos, A., Ewen, S., Markl, V., Haridi, S., Tzoumas, K.: Apache Flink: stream and batch processing in a single engine. In: Bulletin of the IEEE Computer Society Technical Committee on Data Engineering (2015) Carbone, P., Katsifodimos, A., Ewen, S., Markl, V., Haridi, S., Tzoumas, K.: Apache Flink: stream and batch processing in a single engine. In: Bulletin of the IEEE Computer Society Technical Committee on Data Engineering (2015)
20.
Zurück zum Zitat Cherniack, M., Balakrishnan, H., Balazinska, M., Carney, D., Cetintemel, U., Xing, Y., Zdonik, S.: Scalable distributed stream processing. In: CIDR (2003) Cherniack, M., Balakrishnan, H., Balazinska, M., Carney, D., Cetintemel, U., Xing, Y., Zdonik, S.: Scalable distributed stream processing. In: CIDR (2003)
21.
Zurück zum Zitat Condie, T., Conway, N., Alvaro, P., Hellerstein, J.M., Elmeleegy, K., Sears, R.: MapReduce online. In: Proceedings of NSDI, pp. 21–21 (2010) Condie, T., Conway, N., Alvaro, P., Hellerstein, J.M., Elmeleegy, K., Sears, R.: MapReduce online. In: Proceedings of NSDI, pp. 21–21 (2010)
22.
Zurück zum Zitat Cormode, G., Garofalakis, M., Haas, P.J., Jermaine, C.: Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches. Now Publishers Inc., Hanover (2012)MATH Cormode, G., Garofalakis, M., Haas, P.J., Jermaine, C.: Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches. Now Publishers Inc., Hanover (2012)MATH
23.
Zurück zum Zitat Cormode, G., Muthukrishnan, S.: What’s new: finding significant differences in network data streams. In: Proceedings of INFOCOM, pp. 1534–1545 (2004) Cormode, G., Muthukrishnan, S.: What’s new: finding significant differences in network data streams. In: Proceedings of INFOCOM, pp. 1534–1545 (2004)
24.
Zurück zum Zitat Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)MathSciNetCrossRefMATH Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)MathSciNetCrossRefMATH
25.
Zurück zum Zitat Dai, W., Kumar, A., Wei, J., Ho, Q., Gibson, G., Xing, E.P.: High-performance distributed ML at scale through parameter server consistency models. In: Proceedings of AAAI (2015) Dai, W., Kumar, A., Wei, J., Ho, Q., Gibson, G., Xing, E.P.: High-performance distributed ML at scale through parameter server consistency models. In: Proceedings of AAAI (2015)
26.
Zurück zum Zitat Das, T., Zhong, Y., Stoica, I., Shenker, S.: Adaptive stream processing using dynamic batch sizing. In: Proceedings of SoCC, pp. 16:1–16:13 (2014) Das, T., Zhong, Y., Stoica, I., Shenker, S.: Adaptive stream processing using dynamic batch sizing. In: Proceedings of SoCC, pp. 16:1–16:13 (2014)
27.
Zurück zum Zitat Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of OSDI, pp. 107–113 (2004)CrossRef Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of OSDI, pp. 107–113 (2004)CrossRef
28.
Zurück zum Zitat Estan, C., Varghese, G.: New directions in traffic measurement and accounting. In: Proceedings of SIGCOMM, pp. 323–336 (2002)CrossRef Estan, C., Varghese, G.: New directions in traffic measurement and accounting. In: Proceedings of SIGCOMM, pp. 323–336 (2002)CrossRef
29.
Zurück zum Zitat Fernandez, R.C., Migliavacca, M., Kalyvianaki, E., Pietzuch, P.: Integrating scale out and fault tolerance in stream processing using operator state management. In: Proceedings of SIGMOD, pp. 725–736 (2013) Fernandez, R.C., Migliavacca, M., Kalyvianaki, E., Pietzuch, P.: Integrating scale out and fault tolerance in stream processing using operator state management. In: Proceedings of SIGMOD, pp. 725–736 (2013)
30.
Zurück zum Zitat Fernandez, R.C., Migliavacca, M., Kalyvianaki, E., Pietzuch, P.: Making state explicit for imperative big data processing. In: Proceedings of USENIX ATC, pp. 49–60 (2014) Fernandez, R.C., Migliavacca, M., Kalyvianaki, E., Pietzuch, P.: Making state explicit for imperative big data processing. In: Proceedings of USENIX ATC, pp. 49–60 (2014)
31.
Zurück zum Zitat Gulisano, V., Jimenez-Peris, R., Patino-Martinez, M., Valduriez, P.: StreamCloud: a large scale data streaming system. In: Proceedings of ICDCS, pp. 126–137 (2010) Gulisano, V., Jimenez-Peris, R., Patino-Martinez, M., Valduriez, P.: StreamCloud: a large scale data streaming system. In: Proceedings of ICDCS, pp. 126–137 (2010)
32.
Zurück zum Zitat Haas, P.J., Hellerstein, J.M.: Ripple joins for online aggregation. In: Proceedings of SIGMOD, pp. 287–298 (1999)CrossRef Haas, P.J., Hellerstein, J.M.: Ripple joins for online aggregation. In: Proceedings of SIGMOD, pp. 287–298 (1999)CrossRef
33.
Zurück zum Zitat Hazan, E., Agarwal, A., Kale, S.: Logarithmic regret algorithms for online convex optimization. J. Mach. Learn. 69(2–3), 169–192 (2007)CrossRefMATH Hazan, E., Agarwal, A., Kale, S.: Logarithmic regret algorithms for online convex optimization. J. Mach. Learn. 69(2–3), 169–192 (2007)CrossRefMATH
34.
Zurück zum Zitat He, B., Yang, M., Guo, Z., Chen, R., Su, B., Lin, W., Zhou, L.: Comet : Batched stream processing for data intensive distributed computing. In: Proceedings of SoCC, pp. 63–74 (2010) He, B., Yang, M., Guo, Z., Chen, R., Su, B., Lin, W., Zhou, L.: Comet : Batched stream processing for data intensive distributed computing. In: Proceedings of SoCC, pp. 63–74 (2010)
35.
Zurück zum Zitat Ho, Q., Cipar, J., Cui, H., Kim, J.K., Lee, S., Gibbons, P.B., Gibson, G.A., Ganger, G.R., Xing, E.P.: More effective distributed ML via a stale synchronous parallel parameter server. In: Proceedings of NIPS, pp. 1223–1231 (2013) Ho, Q., Cipar, J., Cui, H., Kim, J.K., Lee, S., Gibbons, P.B., Gibson, G.A., Ganger, G.R., Xing, E.P.: More effective distributed ML via a stale synchronous parallel parameter server. In: Proceedings of NIPS, pp. 1223–1231 (2013)
36.
Zurück zum Zitat Hoffman, M., Blei, D., Bach, F.: Online learning for latent Dirichlet allocation. In: Proceedings of NIPS, pp. 856–864 (2010) Hoffman, M., Blei, D., Bach, F.: Online learning for latent Dirichlet allocation. In: Proceedings of NIPS, pp. 856–864 (2010)
37.
Zurück zum Zitat Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. J. Mach. Learn. Res. 14(1), 1303–1347 (2013)MathSciNetMATH Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. J. Mach. Learn. Res. 14(1), 1303–1347 (2013)MathSciNetMATH
38.
Zurück zum Zitat Hu, L., Schwan, K., Amur, H., Chen, X.: ELF: efficient lightweight fast stream processing at scale. In: Proceedings of USENIX ATC, pp. 25–36 (2014) Hu, L., Schwan, K., Amur, H., Chen, X.: ELF: efficient lightweight fast stream processing at scale. In: Proceedings of USENIX ATC, pp. 25–36 (2014)
39.
Zurück zum Zitat Huang, Q., Lee, P.P.C.: Toward high-performance distributed stream processing via approximate fault tolerance. Proc. VLDB Endow. 10(3), 73–84 (2016)CrossRef Huang, Q., Lee, P.P.C.: Toward high-performance distributed stream processing via approximate fault tolerance. Proc. VLDB Endow. 10(3), 73–84 (2016)CrossRef
40.
Zurück zum Zitat Hunt, P., Konar, M., Junqueira, F.P., Reed, B.: ZooKeeper: wait-free coordination for internet-scale systems. In: Proceedings of USENIX ATC, pp. 11–11 (2010) Hunt, P., Konar, M., Junqueira, F.P., Reed, B.: ZooKeeper: wait-free coordination for internet-scale systems. In: Proceedings of USENIX ATC, pp. 11–11 (2010)
41.
Zurück zum Zitat Hwang, J.-H., Balazinska, M., Rasin, A., Çetintemel, U., Stonebraker, M., Zdonik, S.: High-availability algorithms for distributed stream processing. In: Proceedings of ICDE, pp. 779–790 (2005) Hwang, J.-H., Balazinska, M., Rasin, A., Çetintemel, U., Stonebraker, M., Zdonik, S.: High-availability algorithms for distributed stream processing. In: Proceedings of ICDE, pp. 779–790 (2005)
42.
Zurück zum Zitat Hwang, J.-H., Xing, Y., Cetintemel, U., Zdonik, S.: A cooperative, self-configuring high-availability solution for stream processing. In: Proceedings of ICDE, pp. 176 – 185 (2007) Hwang, J.-H., Xing, Y., Cetintemel, U., Zdonik, S.: A cooperative, self-configuring high-availability solution for stream processing. In: Proceedings of ICDE, pp. 176 – 185 (2007)
43.
Zurück zum Zitat Jain, N., Mahajan, P., Kit, D., Yalagandula, P., Dahlin, M., Zhang, Y.: Network imprecision: a new consistency metric for scalable monitoring. In: Proceedings of OSDI, pp. 87–102 (2008) Jain, N., Mahajan, P., Kit, D., Yalagandula, P., Dahlin, M., Zhang, Y.: Network imprecision: a new consistency metric for scalable monitoring. In: Proceedings of OSDI, pp. 87–102 (2008)
44.
Zurück zum Zitat Krishnamurthy, S., Franklin, M.J., Davis, J., Farina, D., Golovko, P., Li, A., Thombre, N.: Continuous analytics over discontinuous streams. In: Proceedings of SIGMOD, pp. 1081–1092 (2010) Krishnamurthy, S., Franklin, M.J., Davis, J., Farina, D., Golovko, P., Li, A., Thombre, N.: Continuous analytics over discontinuous streams. In: Proceedings of SIGMOD, pp. 1081–1092 (2010)
45.
Zurück zum Zitat Kulkarni, S., Bhagat, N., Fu, M., Kedigehalli, V., Kellogg, C., Mittal, S., Patel, J.M., Ramasamy, K., Taneja, S.: Twitter Heron: stream processing at scale. In: Proceedings of SIGMOD, pp. 239–250 (2015) Kulkarni, S., Bhagat, N., Fu, M., Kedigehalli, V., Kellogg, C., Mittal, S., Patel, J.M., Ramasamy, K., Taneja, S.: Twitter Heron: stream processing at scale. In: Proceedings of SIGMOD, pp. 239–250 (2015)
46.
Zurück zum Zitat Langford, J., Smola, A., Zinkevich, M.: Slow learners are fast. In: Proceedings of NIPS, pp. 2331–2339 (2009) Langford, J., Smola, A., Zinkevich, M.: Slow learners are fast. In: Proceedings of NIPS, pp. 2331–2339 (2009)
47.
Zurück zum Zitat Li, M., Andersen, D.G., Park, J.W., Smola, A.J., Ahmed, A., Josifovski, V., Long, J., Shekita, E.J., Su, B.-Y.: Scaling distributed machine learning with the parameter server. In: Proceedings of OSDI, pp. 583–598 (2014) Li, M., Andersen, D.G., Park, J.W., Smola, A.J., Ahmed, A., Josifovski, V., Long, J., Shekita, E.J., Su, B.-Y.: Scaling distributed machine learning with the parameter server. In: Proceedings of OSDI, pp. 583–598 (2014)
48.
Zurück zum Zitat Lin, W., Qian, Z., Xu, J., Yang, S., Zhou, J., Zhou, L.: StreamScope: continuous reliable distributed processing of big data streams. In: Proceedings of NSDI, pp. 439–454 (2016) Lin, W., Qian, Z., Xu, J., Yang, S., Zhou, J., Zhou, L.: StreamScope: continuous reliable distributed processing of big data streams. In: Proceedings of NSDI, pp. 439–454 (2016)
49.
Zurück zum Zitat Liu, Q., Lui, J.C., He, C., Pan, L., Fan, W., Shi, Y.: SAND: a fault-tolerant streaming architecture for network traffic analytics. In: Proceedings of DSN, pp. 80–87 (2014) Liu, Q., Lui, J.C., He, C., Pan, L., Fan, W., Shi, Y.: SAND: a fault-tolerant streaming architecture for network traffic analytics. In: Proceedings of DSN, pp. 80–87 (2014)
50.
Zurück zum Zitat Logothetis, D., Olston, C., Reed, B., Webb, K.C., Yocum, K.: Stateful bulk processing for incremental analytics. In: Proceedings of SoCC, pp. 51–62 (2010) Logothetis, D., Olston, C., Reed, B., Webb, K.C., Yocum, K.: Stateful bulk processing for incremental analytics. In: Proceedings of SoCC, pp. 51–62 (2010)
51.
Zurück zum Zitat Logothetis, D., Trezzo, C., Webb, K.C., Yocum, K.: In-situ MapReduce for log processing. In: Proceedings of USENIX ATC, pp. 9–9 (2011) Logothetis, D., Trezzo, C., Webb, K.C., Yocum, K.: In-situ MapReduce for log processing. In: Proceedings of USENIX ATC, pp. 9–9 (2011)
52.
Zurück zum Zitat Luo, G., Ellmann, C.J., Haas, P.J., Naughton, J.F.: A scalable hash ripple join algorithm. In: Proceedings of SIGMOD, pp. 252–262 (2002) Luo, G., Ellmann, C.J., Haas, P.J., Naughton, J.F.: A scalable hash ripple join algorithm. In: Proceedings of SIGMOD, pp. 252–262 (2002)
53.
Zurück zum Zitat Martin, A., Knauth, T., Creutz, S., Becker, D., Weigert, S., Fetzer, C., Brito, A.: Low-overhead fault tolerance for high-throughput data processing systems. In: Proceedings of ICDCS, pp. 689–699 (2011) Martin, A., Knauth, T., Creutz, S., Becker, D., Weigert, S., Fetzer, C., Brito, A.: Low-overhead fault tolerance for high-throughput data processing systems. In: Proceedings of ICDCS, pp. 689–699 (2011)
54.
Zurück zum Zitat Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., Freeman, J., Tsai, D., Amde, M., Owen, S., et al.: Mllib: machine learning in apache spark. J. Mach. Learn. Res. 17(34), 1–7 (2016)MathSciNetMATH Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., Freeman, J., Tsai, D., Amde, M., Owen, S., et al.: Mllib: machine learning in apache spark. J. Mach. Learn. Res. 17(34), 1–7 (2016)MathSciNetMATH
55.
Zurück zum Zitat Murray, D.G., McSherry, F., Isaacs, R., Isard, M., Barham, P., Abadi, M.: Naiad: a timely dataflow system. In: Proceedings of SOSP, pp. 439–455 (2013) Murray, D.G., McSherry, F., Isaacs, R., Isard, M., Barham, P., Abadi, M.: Naiad: a timely dataflow system. In: Proceedings of SOSP, pp. 439–455 (2013)
57.
Zurück zum Zitat Neumeyer, L., Robbins, B., Nair, A., Kesari, A.: S4: distributed stream computing platform. In: KDCloud, pp. 170 – 177 (2010) Neumeyer, L., Robbins, B., Nair, A., Kesari, A.: S4: distributed stream computing platform. In: KDCloud, pp. 170 – 177 (2010)
58.
Zurück zum Zitat Niu, Y., Wang, Y., Sun, G., Yue, A., Dalessandro, B., Perlich, C., Hamner, B.: The tencent dataset and KDD-Cup’12. In: KDD-Cup Workshop (2012) Niu, Y., Wang, Y., Sun, G., Yue, A., Dalessandro, B., Perlich, C., Hamner, B.: The tencent dataset and KDD-Cup’12. In: KDD-Cup Workshop (2012)
60.
Zurück zum Zitat Pundir, M., Leslie, L.M., Gupta, I., Campbell, R.H.: Zorro: zero-cost reactive failure recovery in distributed graph processing. In: Proceedings of SoCC, pp. 195–208 (2015) Pundir, M., Leslie, L.M., Gupta, I., Campbell, R.H.: Zorro: zero-cost reactive failure recovery in distributed graph processing. In: Proceedings of SoCC, pp. 195–208 (2015)
61.
Zurück zum Zitat Qian, Z., He, Y., Su, C., Wu, Z., Zhu, H., Zhang, T., Zhou, L., Yu, Y., Zhang, Z.: TimeStream: reliable stream computation in the cloud. In: Proceedings of EuroSys, pp. 1–14 (2013) Qian, Z., He, Y., Su, C., Wu, Z., Zhu, H., Zhang, T., Zhou, L., Yu, Y., Zhang, Z.: TimeStream: reliable stream computation in the cloud. In: Proceedings of EuroSys, pp. 1–14 (2013)
62.
Zurück zum Zitat Quanrud, K., Khashabi, D.: Online learning with adversarial delays. In: Proceedings of NIPS (2015) Quanrud, K., Khashabi, D.: Online learning with adversarial delays. In: Proceedings of NIPS (2015)
63.
Zurück zum Zitat Rabkin, A., Arye, M., Sen, S., Pai, V.S., Freedman, M.J.: Aggregation and degradation in JetStream: streaming analytics in the wide area. In: Proceedings of NSDI, pp. 275–288 (2014) Rabkin, A., Arye, M., Sen, S., Pai, V.S., Freedman, M.J.: Aggregation and degradation in JetStream: streaming analytics in the wide area. In: Proceedings of NSDI, pp. 275–288 (2014)
64.
Zurück zum Zitat Shah, M.a., Hellerstein, J.M., Brewer, E.: highly available, fault-tolerant, parallel dataflows. In: Proceedings of SIGMOD, pp. 827–838 (2004) Shah, M.a., Hellerstein, J.M., Brewer, E.: highly available, fault-tolerant, parallel dataflows. In: Proceedings of SIGMOD, pp. 827–838 (2004)
65.
Zurück zum Zitat Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: Proceedings of IEEE MSST, pp. 1–10 (2010) Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: Proceedings of IEEE MSST, pp. 1–10 (2010)
66.
Zurück zum Zitat Smola, A., Narayanamurthy, S.: An architecture for parallel topic models. Proc. VLDB Endow. 3, 703–710 (2010)CrossRef Smola, A., Narayanamurthy, S.: An architecture for parallel topic models. Proc. VLDB Endow. 3, 703–710 (2010)CrossRef
67.
Zurück zum Zitat Song, H.H., Cho, T.W., Dave, V., Zhang, Y., Qiu, L.: Scalable proximity estimation and link prediction in online social networks. In: Proceedings of IMC, pp. 322–335 (2009) Song, H.H., Cho, T.W., Dave, V., Zhang, Y., Qiu, L.: Scalable proximity estimation and link prediction in online social networks. In: Proceedings of IMC, pp. 322–335 (2009)
71.
Zurück zum Zitat Tatbul, N., Çetintemel, U., Zdonik, S., Cherniack, M., Stonebraker, M.: Load shedding in a data stream manager. Proc. VLDB 29, 309–320 (2003) Tatbul, N., Çetintemel, U., Zdonik, S., Cherniack, M., Stonebraker, M.: Load shedding in a data stream manager. Proc. VLDB 29, 309–320 (2003)
72.
Zurück zum Zitat Tatbul, N., Zdonik, S.: Staying FIT: efficient load shedding techniques for distributed stream processing. In: Proceedings of VLDB, pp. 159–170 (2007) Tatbul, N., Zdonik, S.: Staying FIT: efficient load shedding techniques for distributed stream processing. In: Proceedings of VLDB, pp. 159–170 (2007)
73.
Zurück zum Zitat Toshniwal, A., Taneja, S., Shukla, A., Ramasamy, K., Patel, J.M., Kulkarni, S., Jackson, J., Gade, K., Fu, M., Donham, J., et al.: Storm@Twitter. In: Proceedings of ACM SIGMOD (2014) Toshniwal, A., Taneja, S., Shukla, A., Ramasamy, K., Patel, J.M., Kulkarni, S., Jackson, J., Gade, K., Fu, M., Donham, J., et al.: Storm@Twitter. In: Proceedings of ACM SIGMOD (2014)
75.
Zurück zum Zitat Wang, H., Peh, L.S., Koukoumidis, E., Tao, S., Chan, M.C.: Meteor shower: a reliable stream processing system for commodity data centers. In: Proceedings of IPDPS, pp. 1180–1191 (2012) Wang, H., Peh, L.S., Koukoumidis, E., Tao, S., Chan, M.C.: Meteor shower: a reliable stream processing system for commodity data centers. In: Proceedings of IPDPS, pp. 1180–1191 (2012)
76.
Zurück zum Zitat Wei, J., Dai, W., Qiao, A., Ho, Q., Cui, H., Ganger, G.R., Gibbons, P.B., Gibson, G.A., Xing, E.P.: Managed communication and consistency for fast data-parallel iterative analytics. In: SoCC, pp. 381–394 (2015) Wei, J., Dai, W., Qiao, A., Ho, Q., Cui, H., Ganger, G.R., Gibbons, P.B., Gibson, G.A., Xing, E.P.: Managed communication and consistency for fast data-parallel iterative analytics. In: SoCC, pp. 381–394 (2015)
77.
Zurück zum Zitat Xing, E.P., Ho, Q., Dai, W., Kim, J.K., Wei, J., Lee, S., Zheng, X.: Petuum: a new platform for distributed machine learning on big data. In: Proceedings of KDD, pp. 49–67 (2015)CrossRef Xing, E.P., Ho, Q., Dai, W., Kim, J.K., Wei, J., Lee, S., Zheng, X.: Petuum: a new platform for distributed machine learning on big data. In: Proceedings of KDD, pp. 49–67 (2015)CrossRef
78.
Zurück zum Zitat Yao, L., Mimno, D., McCallum, A.: Efficient methods for topic model inference on streaming document collections. In: Proceedings of KDD (2009) Yao, L., Mimno, D., McCallum, A.: Efficient methods for topic model inference on streaming document collections. In: Proceedings of KDD (2009)
79.
Zurück zum Zitat Yu, M., Jose, L., Miao, R.: Software defined traffic measurement with OpenSketch. In: Proceedings of NSDI, pp. 29–42 (2013) Yu, M., Jose, L., Miao, R.: Software defined traffic measurement with OpenSketch. In: Proceedings of NSDI, pp. 29–42 (2013)
80.
Zurück zum Zitat Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., Stoica, I.: Discretized streams: fault-tolerant streaming computation at scale. In: Proceedings of SOSP, pp. 423–438 (2013) Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., Stoica, I.: Discretized streams: fault-tolerant streaming computation at scale. In: Proceedings of SOSP, pp. 423–438 (2013)
83.
Zurück zum Zitat Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of ICML, pp. 928–936 (2003) Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of ICML, pp. 928–936 (2003)
84.
Zurück zum Zitat Zinkevich, M., Weimer, M., Smola, A.J., Li, L.: Parallelized stochastic gradient descent. In: Proceedings of NIPS (2010) Zinkevich, M., Weimer, M., Smola, A.J., Li, L.: Parallelized stochastic gradient descent. In: Proceedings of NIPS (2010)
Metadaten
Titel
On the performance and convergence of distributed stream processing via approximate fault tolerance
verfasst von
Zhinan Cheng
Qun Huang
Patrick P. C. Lee
Publikationsdatum
03.09.2019
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 5/2019
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-019-00565-w

Weitere Artikel der Ausgabe 5/2019

The VLDB Journal 5/2019 Zur Ausgabe