Published in: Mobile Networks and Applications 2/2017

Published: 03.06.2016

Automatically Setting Parameter-Exchanging Interval for Deep Learning

Authors: Siyuan Wang, Xiaofei Liao, Xuepeng Fan, Hai Jin, Qiongjie Yao, Yu Zhang


Abstract

Parameter-server frameworks play an important role in scaling up distributed deep learning algorithms. However, the constant growth of neural network size has made exchanging parameters across machines a serious bottleneck. Recent efforts reduce communication overhead by manually setting a parameter-exchanging interval, without regard to the parameter server's resource availability. An inappropriately chosen interval leads to poor performance or inaccurate results, and request bursts may still occur, exacerbating the bottleneck.
In this paper, we propose an approach that automatically sets the optimal exchanging interval, aiming to remove the parameter-exchanging bottleneck and to utilize resources evenly without losing training accuracy. The key idea is to enlarge the interval on each training node according to the available resources, and to assign a different interval to each slave node so that request bursts are avoided. We applied this method to the parallel Stochastic Gradient Descent algorithm, speeding up the parameter-exchanging process by a factor of eight.
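
To make the key idea concrete, the minimal Python sketch below shows one way such interval-based exchange could work for parallel SGD: each worker accumulates gradients locally and contacts the parameter server only every `interval` mini-batches, with the interval enlarged when the server is loaded and staggered by worker rank to avoid request bursts. All names here (ParameterServer, estimate_interval, the toy gradient) are illustrative assumptions, not the paper's implementation.

```python
# Minimal, hypothetical sketch of interval-based parameter exchange for
# parallel SGD. Class and function names are illustrative assumptions,
# not the authors' implementation.
import random

class ParameterServer:
    """Holds the global model; workers push gradients and pull weights."""
    def __init__(self, dim):
        self.weights = [0.0] * dim

    def push_pull(self, accumulated_grads, lr=0.01):
        # Apply the worker's accumulated gradients, then hand back fresh weights.
        self.weights = [w - lr * g
                        for w, g in zip(self.weights, accumulated_grads)]
        return list(self.weights)

def estimate_interval(base_interval, server_load, rank, num_workers):
    # Enlarge the interval when the server is busy (fewer, larger exchanges)...
    interval = max(1, int(base_interval * (1.0 + server_load)))
    # ...and stagger each worker by its rank so pushes do not all arrive at once.
    return interval + rank % num_workers

def compute_gradients(weights, batch):
    # Toy least-squares gradient on one (x, y) sample; stands in for backprop.
    x, y = batch
    err = sum(w * xi for w, xi in zip(weights, x)) - y
    return [err * xi for xi in x]

def worker_loop(server, rank, num_workers, batches, dim, lr=0.01):
    weights = list(server.weights)
    accumulated = [0.0] * dim
    interval = estimate_interval(base_interval=4, server_load=0.5,
                                 rank=rank, num_workers=num_workers)
    for step, batch in enumerate(batches, start=1):
        grads = compute_gradients(weights, batch)          # local SGD step
        weights = [w - lr * g for w, g in zip(weights, grads)]
        accumulated = [a + g for a, g in zip(accumulated, grads)]
        if step % interval == 0:                           # exchange only here
            weights = server.push_pull(accumulated)
            accumulated = [0.0] * dim

if __name__ == "__main__":
    dim, num_workers = 3, 4
    server = ParameterServer(dim)
    data = [([random.random() for _ in range(dim)], 1.0) for _ in range(32)]
    for rank in range(num_workers):  # sequential stand-in for parallel workers
        worker_loop(server, rank, num_workers, data, dim)
    print(server.weights)
```

With these toy settings the four workers receive intervals 6, 7, 8, and 9, so no two slave nodes push to the server on the same schedule; in a real deployment the server_load term would come from monitoring the parameter server rather than a constant.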

Metadata
Title
Automatically Setting Parameter-Exchanging Interval for Deep Learning
Authors
Siyuan Wang
Xiaofei Liao
Xuepeng Fan
Hai Jin
Qiongjie Yao
Yu Zhang
Publication date
03.06.2016
Publisher
Springer US
Published in
Mobile Networks and Applications / Issue 2/2017
Print ISSN: 1383-469X
Electronic ISSN: 1572-8153
DOI
https://doi.org/10.1007/s11036-016-0740-6
