Wireless Networks

Published: 06-07-2019

Distributed machine learning load balancing strategy in cloud computing services

Authors: Mingwei Li, Jilin Zhang, Jian Wan, Yongjian Ren, Li Zhou, Baofu Wu, Rui Yang, Jue Wang

Published in: Wireless Networks | Issue 8/2020

Abstract

Mobile service computing is a new cloud computing model that provides cloud services to users of mobile intelligent terminals through mobile internet access. Quality of service is an essential problem for mobile service computing. In this paper, we present a series of studies on how to accelerate the training of distributed machine learning (ML) models on cloud services. Distributed training has become the mainstream way to train today's ML models. In traditional distributed ML based on bulk synchronous parallel (BSP), frequent synchronization barriers mean that a temporary slowdown of any node in the cluster delays the computation of all other nodes, degrading overall performance. We propose a load balancing strategy named adaptive fast reassignment (AdaptFR) and, based on it, build a distributed parallel computing model called adaptive-dynamic synchronous parallel (A-DSP). A-DSP uses a more relaxed synchronization model to reduce the overhead of synchronization while preserving the consistency of the model. A-DSP also implements the AdaptFR load balancing strategy, which addresses the straggler problem caused by performance differences between nodes without sacrificing model accuracy. Experiments show that A-DSP effectively improves training speed while maintaining model accuracy in distributed ML training.
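
The straggler effect described in the abstract can be illustrated with a toy model. The sketch below is not the authors' AdaptFR algorithm, whose details are not given on this page. It assumes a hypothetical rule that splits a fixed batch across nodes in proportion to their measured speed. The function names `bsp_iteration_time` and `rebalance`, and all of the numbers, are illustrative assumptions.

```python
from typing import List


def bsp_iteration_time(work: List[int], speed: List[float]) -> float:
    """Under a synchronous barrier, an iteration lasts as long as the slowest node."""
    return max(w / s for w, s in zip(work, speed))


def rebalance(total_work: int, speed: List[float]) -> List[int]:
    """Split total_work in proportion to node speed (hypothetical rule, not AdaptFR)."""
    total_speed = sum(speed)
    shares = [int(total_work * s / total_speed) for s in speed]
    # Integer truncation can leave a few samples unassigned;
    # give the remainder to the first fastest node.
    shares[speed.index(max(speed))] += total_work - sum(shares)
    return shares


# Assumed throughputs in samples per second; the last node is a straggler.
speed = [100.0, 100.0, 100.0, 25.0]
even = [250, 250, 250, 250]
balanced = rebalance(1000, speed)

print("even split:    ", even, bsp_iteration_time(even, speed))
print("balanced split:", balanced, bsp_iteration_time(balanced, speed))
```

With an even split, the barrier makes every node wait for the straggler. Shifting work away from it shortens the iteration even though the total amount of work is unchanged.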

Title: Distributed machine learning load balancing strategy in cloud computing services
Authors: Mingwei Li, Jilin Zhang, Jian Wan, Yongjian Ren, Li Zhou, Baofu Wu, Rui Yang, Jue Wang
Publication date: 06-07-2019
Publisher: Springer US
Published in: Wireless Networks / Issue 8/2020
Print ISSN: 1022-0038
Electronic ISSN: 1572-8196
DOI: https://doi.org/10.1007/s11276-019-02042-2
