Skip to main content
Top

2017 | OriginalPaper | Chapter

A Case for Abstract Cost Models for Distributed Execution of Analytics Operators

Authors : Rundong Li, Ningfang Mi, Mirek Riedewald, Yizhou Sun, Yi Yao

Published in: Big Data Analytics and Knowledge Discovery

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

We consider data analytics workloads on distributed architectures, in particular clusters of commodity machines. To find a job partitioning that minimizes running time, a cost model, which we more accurately refer to as makespan model, is needed. In attempting to find the simplest possible, but sufficiently accurate, such model, we explore piecewise linear functions of input, output, and computational complexity. They are abstract in the sense that they capture fundamental algorithm properties, but do not require explicit modeling of system and implementation details such as the number of disk accesses. We show how the simplified functional structure can be exploited by directly integrating the model into the makespan optimization process, reducing complexity by orders of magnitude. Experimental results provide evidence of good prediction quality and successful makespan optimization across a variety of cluster architectures.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Agarwal, R.C., Balle, S.M., Gustavson, F.G., Joshi, M., Palkar, P.: A three-dimensional approach to parallel matrix multiplication. IBM J. Res. Dev. 39(5), 575–582 (1995)CrossRef Agarwal, R.C., Balle, S.M., Gustavson, F.G., Joshi, M., Palkar, P.: A three-dimensional approach to parallel matrix multiplication. IBM J. Res. Dev. 39(5), 575–582 (1995)CrossRef
2.
go back to reference Akdere, M., Cetintemel, U., Riondato, M., Upfal, E., Zdonik, S.: Learning-based query performance modeling and prediction. In: Proceedings of the ICDE, pp. 390–401 (2012) Akdere, M., Cetintemel, U., Riondato, M., Upfal, E., Zdonik, S.: Learning-based query performance modeling and prediction. In: Proceedings of the ICDE, pp. 390–401 (2012)
3.
go back to reference Asanovic, K., Bodik, R., Demmel, J., Keaveny, T., Keutzer, K., Kubiatowicz, J., Morgan, N., Patterson, D., Sen, K., Wawrzynek, J., Wessel, D., Yelick, K.: A view of the parallel computing landscape. Commun. ACM 52(10), 56–67 (2009)CrossRef Asanovic, K., Bodik, R., Demmel, J., Keaveny, T., Keutzer, K., Kubiatowicz, J., Morgan, N., Patterson, D., Sen, K., Wawrzynek, J., Wessel, D., Yelick, K.: A view of the parallel computing landscape. Commun. ACM 52(10), 56–67 (2009)CrossRef
4.
go back to reference Ballard, G., Buluc, A., Demmel, J., Grigori, L., Lipshitz, B., Schwartz, O., Toledo, S.: Communication optimal parallel multiplication of sparse random matrices. In: Proceedings of the SPAA, pp. 222–231 (2013) Ballard, G., Buluc, A., Demmel, J., Grigori, L., Lipshitz, B., Schwartz, O., Toledo, S.: Communication optimal parallel multiplication of sparse random matrices. In: Proceedings of the SPAA, pp. 222–231 (2013)
5.
go back to reference Duggan, J., Cetintemel, U., Papaemmanouil, O., Upfal, E.: Performance prediction for concurrent database workloads. In: Proceedings of the SIGMOD, pp. 337–348 (2011) Duggan, J., Cetintemel, U., Papaemmanouil, O., Upfal, E.: Performance prediction for concurrent database workloads. In: Proceedings of the SIGMOD, pp. 337–348 (2011)
6.
go back to reference Duggan, J., Papaemmanouil, O., Çetintemel, U., Upfal, E.: Contender: A resource modeling approach for concurrent query performance prediction. In: Proceedings of the EDBT, pp. 109–120 (2014) Duggan, J., Papaemmanouil, O., Çetintemel, U., Upfal, E.: Contender: A resource modeling approach for concurrent query performance prediction. In: Proceedings of the EDBT, pp. 109–120 (2014)
7.
go back to reference Elmroth, E., Gustavson, F., Jonsson, I., Kågström, B.: Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Rev. 46(1), 3–45 (2004)MathSciNetCrossRefMATH Elmroth, E., Gustavson, F., Jonsson, I., Kågström, B.: Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Rev. 46(1), 3–45 (2004)MathSciNetCrossRefMATH
8.
go back to reference Ganapathi, A., Kuno, H.A., Dayal, U., Wiener, J.L., Fox, A., Jordan, M.I., Patterson, D.A.: Predicting multiple metrics for queries: Better decisions enabled by machine learning. In: Proceedings of the ICDE, pp. 592–603 (2009) Ganapathi, A., Kuno, H.A., Dayal, U., Wiener, J.L., Fox, A., Jordan, M.I., Patterson, D.A.: Predicting multiple metrics for queries: Better decisions enabled by machine learning. In: Proceedings of the ICDE, pp. 592–603 (2009)
9.
go back to reference van de Geijn, R.A., Watts, J.: Summa: Scalable universal matrix multiplication algorithm. University of Texas at Austin, Technical report (1995) van de Geijn, R.A., Watts, J.: Summa: Scalable universal matrix multiplication algorithm. University of Texas at Austin, Technical report (1995)
10.
go back to reference Herodotou, H., Babu, S.: Profiling, what-if analysis, and cost-based optimization of mapreduce programs. VLDB 4(11), 1111–1122 (2011) Herodotou, H., Babu, S.: Profiling, what-if analysis, and cost-based optimization of mapreduce programs. VLDB 4(11), 1111–1122 (2011)
11.
go back to reference Huang, B., Babu, S., Yang, J.: Cumulon: optimizing statistical data analysis in the cloud. In: Proceedings of the SIGMOD, pp. 1–12 (2013) Huang, B., Babu, S., Yang, J.: Cumulon: optimizing statistical data analysis in the cloud. In: Proceedings of the SIGMOD, pp. 1–12 (2013)
12.
go back to reference Irony, D., Toledo, S., Tiskin, A.: Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput. 64(9), 1017–1026 (2004)CrossRefMATH Irony, D., Toledo, S., Tiskin, A.: Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput. 64(9), 1017–1026 (2004)CrossRefMATH
13.
go back to reference Lichman, M.: UCI machine learning repository (2013) Lichman, M.: UCI machine learning repository (2013)
14.
go back to reference Morton, K., Balazinska, M., Grossman, D.: Paratimer: a progress indicator for mapreduce DAGs. In: Proceedings of the SIGMOD, pp. 507–518 (2010) Morton, K., Balazinska, M., Grossman, D.: Paratimer: a progress indicator for mapreduce DAGs. In: Proceedings of the SIGMOD, pp. 507–518 (2010)
15.
go back to reference Quinlan, J.R., et al.: Learning with continuous classes. In: Australian Joint Conference on Artificial Intelligence, vol. 92, pp. 343–348 (1992) Quinlan, J.R., et al.: Learning with continuous classes. In: Australian Joint Conference on Artificial Intelligence, vol. 92, pp. 343–348 (1992)
16.
go back to reference Ramakrishnan, R., Gehrke, J.: Database Management Systems, 3rd edn. McGraw-Hill, New York (2003)MATH Ramakrishnan, R., Gehrke, J.: Database Management Systems, 3rd edn. McGraw-Hill, New York (2003)MATH
17.
go back to reference Solomonik, E., Demmel, J.: Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011. LNCS, vol. 6853, pp. 90–109. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23397-5_10 CrossRef Solomonik, E., Demmel, J.: Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011. LNCS, vol. 6853, pp. 90–109. Springer, Heidelberg (2011). doi:10.​1007/​978-3-642-23397-5_​10 CrossRef
18.
go back to reference Valsalam, V., Skjellum, A.: A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels. In: Concurrency and Computation: Practice and Experience, vol. 14(10), pp. 805–839 (2002) Valsalam, V., Skjellum, A.: A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels. In: Concurrency and Computation: Practice and Experience, vol. 14(10), pp. 805–839 (2002)
19.
go back to reference Venkataraman, S., Yang, Z., Franklin, M., Recht, B., Stoica, I.: Ernest: efficient performance prediction for large-scale advanced analytics. In: NSDI, pp. 363–378 (2016) Venkataraman, S., Yang, Z., Franklin, M., Recht, B., Stoica, I.: Ernest: efficient performance prediction for large-scale advanced analytics. In: NSDI, pp. 363–378 (2016)
20.
go back to reference Vieth, E.: Fitting piecewise linear regression functions to biological responses. J. Appl. Physiol. 67(1), 390–396 (1989) Vieth, E.: Fitting piecewise linear regression functions to biological responses. J. Appl. Physiol. 67(1), 390–396 (1989)
21.
go back to reference Wang, G., Chan, C.Y.: Multi-query optimization in mapreduce framework. In: Proceedings of the VLDB, pp. 145–156 (2013) Wang, G., Chan, C.Y.: Multi-query optimization in mapreduce framework. In: Proceedings of the VLDB, pp. 145–156 (2013)
22.
go back to reference White, B., Lepreau, J., Stoller, L., Ricci, R., Guruprasad, S., Newbold, M., Hibler, M., Barb, C., Joglekar, A.: An integrated experimental environment for distributed systems and networks. In: Proceedings of the OSDI, pp. 255–270 (2002) White, B., Lepreau, J., Stoller, L., Ricci, R., Guruprasad, S., Newbold, M., Hibler, M., Barb, C., Joglekar, A.: An integrated experimental environment for distributed systems and networks. In: Proceedings of the OSDI, pp. 255–270 (2002)
23.
go back to reference Wu, W., Chi, Y., Hacígümüş, H., Naughton, J.F.: Towards predicting query execution time for concurrent and dynamic database workloads. Proc. VLDB 6(10), 925–936 (2013)CrossRef Wu, W., Chi, Y., Hacígümüş, H., Naughton, J.F.: Towards predicting query execution time for concurrent and dynamic database workloads. Proc. VLDB 6(10), 925–936 (2013)CrossRef
24.
go back to reference Zhang, X., Chen, L., Wang, M.: Efficient multi-way theta-join processing using mapreduce. Proc. VLDB 5(11), 1184–1195 (2012)CrossRef Zhang, X., Chen, L., Wang, M.: Efficient multi-way theta-join processing using mapreduce. Proc. VLDB 5(11), 1184–1195 (2012)CrossRef
Metadata
Title
A Case for Abstract Cost Models for Distributed Execution of Analytics Operators
Authors
Rundong Li
Ningfang Mi
Mirek Riedewald
Yizhou Sun
Yi Yao
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-64283-3_11

Premium Partner