
2020 | Original Paper | Book Chapter

Trade-Offs in Large-Scale Distributed Tuplewise Estimation And Learning

Authors: Robin Vogel, Aurélien Bellet, Stephan Clémençon, Ons Jelassi, Guillaume Papa

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

The development of cluster computing frameworks has allowed practitioners to scale out various statistical estimation and machine learning algorithms with minimal programming effort. This is especially true for machine learning problems whose objective function is nicely separable across individual data points, such as classification and regression. In contrast, statistical learning tasks involving pairs (or more generally tuples) of data points—such as metric learning, clustering or ranking—do not lend themselves as easily to data-parallelism and in-memory computing. In this paper, we investigate how to balance between statistical performance and computational efficiency in such distributed tuplewise statistical problems. We first propose a simple strategy based on occasionally repartitioning data across workers between parallel computation stages, where the number of repartitioning steps rules the trade-off between accuracy and runtime. We then present some theoretical results highlighting the benefits brought by the proposed method in terms of variance reduction, and extend our results to design distributed stochastic gradient descent algorithms for tuplewise empirical risk minimization. Our results are supported by numerical experiments in pairwise statistical estimation and learning on synthetic and real-world datasets.
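To make the repartitioning idea concrete, here is a hypothetical Python sketch (an illustration under assumed names, not the authors' implementation): the data are split into per-worker blocks, each simulated worker averages a pairwise kernel `h` over its own block only, and the procedure is repeated over several random repartitions, averaging the results. More repartitions reduce the variance of the estimate at the cost of extra shuffling (communication), which is the trade-off the paper studies.

```python
import random

def local_pairwise_mean(block, h):
    """Average of h(x, y) over all pairs within one worker's block."""
    n = len(block)
    total = sum(h(block[i], block[j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

def distributed_estimate(data, h, num_workers, num_repartitions, seed=0):
    """Average within-block pairwise statistics over several random
    repartitions of the data across (simulated) workers."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(num_repartitions):
        shuffled = data[:]
        rng.shuffle(shuffled)                      # simulate one repartition step
        blocks = [shuffled[w::num_workers] for w in range(num_workers)]
        local = [local_pairwise_mean(b, h) for b in blocks if len(b) >= 2]
        estimates.append(sum(local) / len(local))  # combine worker results
    return sum(estimates) / len(estimates)
```

For instance, with `h(x, y) = |x - y|`, each stage only compares pairs within blocks, yet increasing `num_repartitions` makes the combined estimate concentrate around the full pairwise mean.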


Appendices
Accessible only with authorization
Footnotes
1. Alternatively, for scalability purposes, one may instead work with their incomplete counterparts, namely (7) and (9) respectively.
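The "incomplete counterparts" mentioned in footnote 1 refer to incomplete U-statistics, which replace the average over all \(O(n^2)\) pairs by a Monte Carlo average over a random sample of pairs. A minimal sketch, assuming a symmetric kernel `h` (the function name and interface are illustrative, not the paper's equations):

```python
import random

def incomplete_pairwise_mean(data, h, num_pairs, seed=0):
    """'Incomplete' version of the pairwise mean: average h over num_pairs
    index pairs drawn uniformly at random, instead of over all n*(n-1)/2
    pairs. Cost is O(num_pairs) rather than O(n^2)."""
    rng = random.Random(seed)
    n = len(data)
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)   # draw a pair of distinct indices
        total += h(data[i], data[j])
    return total / num_pairs
```

The estimator is unbiased for the complete pairwise mean, and its extra variance shrinks as `num_pairs` grows.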
 
2. When H is nonsmooth in \(\theta\), a subgradient may be used instead of the gradient.
 
Metadata
Title
Trade-Offs in Large-Scale Distributed Tuplewise Estimation And Learning
Authors
Robin Vogel
Aurélien Bellet
Stephan Clémençon
Ons Jelassi
Guillaume Papa
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-46147-8_14