Published in: Cluster Computing 1/2015

01-03-2015

A highly-accurate and low-overhead prediction model for transfer throughput optimization

Authors: JangYoung Kim, Esma Yildirim, Tevfik Kosar

Abstract

An important bottleneck for data-intensive scalable computing systems is efficient utilization of the network links that connect the collaborating institutions with their remote partners, data sources, and computational sites. To alleviate this bottleneck, we propose an application-layer throughput optimization model based on parallel stream number prediction. This new model extends our two previous models (Partial C-order and Full Second-order) to achieve higher accuracy and lower overhead predictions. Our new model, called Full C-order, outperforms both of our previous models as well as the three most relevant models by others (the Partial Second-order, Hacker et al., and Altman et al. models) in terms of both accuracy and efficiency. We test and compare these six models on emulated testbeds and on production environments using a wide variety of data set sizes, RTT, and bandwidth combinations. Our comprehensive experiments confirm the superiority of our new model to the other five models.
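The abstract does not state the model equations. As a rough illustration of what parallel stream number prediction can look like, the sketch below assumes the second-order form Th(n) = n / sqrt(a·n² + b·n + c), which is associated with the Full Second-order model in the authors' earlier work [16]. The function names, the three-sample fitting procedure, and the sample values are assumptions made for illustration only; this is not the Full C-order model evaluated in the article.

```python
import math


def fit_second_order(samples):
    """Fit (a, b, c) in Th(n) = n / sqrt(a*n^2 + b*n + c).

    `samples` is three (stream_count, measured_throughput) pairs.
    Rearranging gives n^2 / Th^2 = a*n^2 + b*n + c, which is linear
    in (a, b, c), so three samples give a 3x3 linear system.
    """
    rows = [(float(n * n), float(n), 1.0, (n / th) ** 2) for n, th in samples]

    def det3(m):
        # Determinant of a 3x3 matrix by cofactor expansion.
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    base = [list(r[:3]) for r in rows]
    d = det3(base)
    if d == 0:
        raise ValueError("samples must use three distinct stream counts")

    # Cramer's rule: replace one column at a time with the right-hand side.
    coeffs = []
    for col in range(3):
        m = [row[:] for row in base]
        for i in range(3):
            m[i][col] = rows[i][3]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


def predict(n, a, b, c):
    """Predicted aggregate throughput for n parallel streams."""
    return n / math.sqrt(a * n * n + b * n + c)


def best_stream_count(a, b, c, max_streams=64):
    """Integer stream count in [1, max_streams] with the highest prediction."""
    best_n, best_th = None, float("-inf")
    for n in range(1, max_streams + 1):
        if a * n * n + b * n + c <= 0:
            continue  # the model is undefined for this stream count
        th = predict(n, a, b, c)
        if th > best_th:
            best_n, best_th = n, th
    return best_n


if __name__ == "__main__":
    # Hypothetical measurements at 1, 4, and 16 streams (made-up values).
    true_coeffs = (0.001, -0.01, 0.05)
    samples = [(n, predict(n, *true_coeffs)) for n in (1, 4, 16)]
    a, b, c = fit_second_order(samples)
    print("fitted coefficients:", a, b, c)
    print("predicted best stream count:", best_stream_count(a, b, c))
```

The appeal of such a model is that a few sample transfers are enough to estimate the whole throughput curve, which keeps the prediction overhead low compared with probing every candidate stream count.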

Literature
5.
Garfinkel, S.: An evaluation of Amazon’s Grid computing services: EC2, S3 and SQS. Tech. Rep. TR-08-07, Aug. 2007
6.
Cho, B., Gupta, I.: Budget-constrained bulk data transfer via Internet and shipping networks. In: The 8th International Conference on Autonomic Computing (ICAC) (2011)
7.
Sivakumar, H., Bailey, S., Grossman, R.L.: Psockets: the case for application-level network striping for data intensive applications using high speed wide area networks. In: Proc. of Supercomputing (2000)
8.
Lee, J., Gunter, D., Tierney, B., Allcock, B., Bester, J., Bresnahan, J., Tuecke, S.: Applied techniques for high bandwidth data transfers across wide area networks. In: Proc. International Conference on Computing in High Energy and Nuclear Physics (CHEP01) (2001)
9.
Balakrishnan, H., Padmanabhan, V.N., Seshan, S., Stemm, M., Katz, R.H.: TCP behavior of a busy Internet server: analysis and improvements. In: Proc. of INFOCOM (1998)
10.
Hacker, T.J., Noble, B.D., Athey, B.D.: The end-to-end performance effects of parallel TCP sockets on a lossy wide area network. In: Proc. of IPDPS (2002)
11.
Eggert, L., Heidemann, J., Touch, J.: Effects of ensemble TCP. ACM Comput. Commun. Rev. 30(1), 15–29 (2000)
12.
Kola, G., Kosar, T., Livny, M.: Run-time adaptation of grid data-placement jobs. Scalable Comput., Pract. Exp. 6(3), 33–43 (2005)
13.
Karrer, R.P., Park, J., Kim, J.: Adaptive data block scheduling for parallel streams. Tech. Report, vol. 17(2) (2006)
14.
Yildirim, E., Suslu, I.H., Kosar, T.: Which network measurement tool is right for you? A multidimensional comparison study. In: Proc. of the 2008 9th IEEE/ACM International Conference on Grid Computing (GRID’08), Sep. 2008
15.
Lu, D., Qiao, Y., Dinda, P.A.: Characterizing and predicting TCP throughput on the wide area network. In: Proc. IEEE International Conference on Distributed Computing Systems (ICDCS05) (2005)
16.
Yildirim, E., Yin, D., Kosar, T.: Prediction of optimal parallelism level in wide area data transfers. IEEE Trans. Parallel Distrib. Syst. (TPDS) 22(12) (2011)
17.
Yin, D., Yildirim, E., Kosar, T.: A data throughput prediction and optimization service for widely distributed many-task computing. IEEE Trans. Parallel Distrib. Syst. 22(6) (2011)
18.
Allcock, W.: GridFTP protocol specification. GGF (2003)
19.
Yildirim, E., Kosar, T.: Network-aware end-to-end data throughput optimization. In: Proc. of the Network-Aware Data Management Workshop (NDM 2011) (2012)
20.
Lu, D., Qiao, Y., Dinda, P.A., Bustamante, F.E.: Modeling and taming parallel TCP on the wide area network. In: Proc. of IPDPS (2005)
21.
Altman, E., Barman, D., Tuffin, B., Vojnovic, M.: Parallel TCP sockets: simple model, throughput and validation. In: Proc. IEEE Conference on Computer Communications (INFOCOM06) (2006)
25.
Crowcroft, J., Oechslin, P.: Differentiated end-to-end Internet services using a weighted proportional fair sharing TCP. ACM SIGCOMM Comput. Commun. Rev. 28(3), 53–69 (1998)
26.
Kola, G., Vernon, M.K.: Target bandwidth sharing using endhost measures. Perform. Eval. 64(9–12), 948–964 (2007)
27.
Yildirim, E., Kim, J., Kosar, T.: Optimizing the sample size for a cloud-hosted data scheduling service. In: Proc. 2nd International Workshop on Cloud Computing and Scientific Applications (CCSA in Conjunction with CCGRID’12) (2012)
28.
Mathis, M., Heffner, J., Reddy, R.: Web100: extended TCP instrumentation for research, education and diagnosis. ACM Comput. Commun. Rev. 33(3) (2003)
29.
Kosar, T., Livny, M.: Stork: making data placement a first class citizen in the grid. In: Proceedings of ICDCS’04, pp. 342–349, March 2004
30.
Kosar, T., Balman, M.: A new paradigm: data-aware scheduling in grid computing. Future Gener. Comput. Syst. 25(4), 406–413 (2009)
31.
Kosar, T., Balman, M., Yildirim, E., Kulasekaran, S., Ross, B.: Stork data scheduler: mitigating the data bottleneck in e-science. Philos. Trans. R. Soc. Lond. A 369, 3254–3267 (2011)
Metadata
Title
A highly-accurate and low-overhead prediction model for transfer throughput optimization
Authors
JangYoung Kim
Esma Yildirim
Tevfik Kosar
Publication date
01-03-2015
Publisher
Springer US
Published in
Cluster Computing / Issue 1/2015
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-013-0305-4
