Skip to main content
Erschienen in: Theory of Computing Systems 1/2016

01.01.2016

On Two Continuum Armed Bandit Problems in High Dimensions

verfasst von: Hemant Tyagi, Sebastian U. Stich, Bernd Gärtner

Erschienen in: Theory of Computing Systems | Ausgabe 1/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We consider the problem of continuum armed bandits where the arms are indexed by a compact subset of \(\mathbb {R}^{d}\). For large d, it is well known that mere smoothness assumptions on the reward functions lead to regret bounds that suffer from the curse of dimensionality. A typical way to tackle this in the literature has been to make further assumptions on the structure of reward functions. In this work we assume the reward functions to be intrinsically of low dimension kd and consider two models: (i) The reward functions depend on only an unknown subset of k coordinate variables and, (ii) a generalization of (i) where the reward functions depend on an unknown k dimensional subspace of \(\mathbb {R}^{d}\). By placing suitable assumptions on the smoothness of the rewards we derive randomized algorithms for both problems that achieve nearly optimal regret bounds in terms of the number of rounds n.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
Rewards sampled at each round in an i.i.d manner from an unknown probability distribution.
 
2
See Remark 1 in Section 3.1 for discussion on how the logn factor can be removed.
 
3
See Definition 2 in Section 2.1.
 
4
Indeed, any function that depends on k k coordinates also depends on at most k coordinates.
 
5
Indeed for a compact domain, any C 2 function is Lipschitz continuous but the converse is not necessarily true. Therefore, the mean reward functions that we consider, belong to a slightly restricted class of Lipschitz continuous functions.
 
6
This theorem is stated again in Section 4 for completeness.
 
7
This is actually true for k≥3. For k=1,2 the \(\left (n/\log n\right )^{\frac {4}{k+2}}\) factor dominates. See Remark 3 in Section 4.3 for details.
 
8
The interested reader can find a full analysis for the case k=1 in [42].
 
9
The above sampling scheme was first considered by Fornasier et al. [22], and later by Tyagi et al. [40], for the problem of approximating functions of the form f(x)=g(A x) from point queries.
 
10
Of course in practice we will not be able to solve (32) exactly, but will instead obtain a solution that can be made to come arbitrarily close to the actual solution. This difference will hence appear as an additional error term in the error bound of Lemma 4.
 
11
In the absence of external stochastic noise (i.e. σ=0) we can actually take 𝜖 to be arbitrarily small as shown by Tyagi et al. [41, Lemma 2]. This is also verified from (33), by plugging σ=0.
 
Literatur
1.
Zurück zum Zitat Abbasi-yadkori, Y., Pal, D., Szepesvari, C.: Online-to-confidence-set conversions and application to sparse stochastic bandits. In: Proceedings of AIStats (2012) Abbasi-yadkori, Y., Pal, D., Szepesvari, C.: Online-to-confidence-set conversions and application to sparse stochastic bandits. In: Proceedings of AIStats (2012)
2.
Zurück zum Zitat Abernethy, J., Hazan, E., Rakhlin, A.: Competing in the dark: An efficient algorithm for bandit linear optimization. In: Proceedings of the 21st Annual Conference on Learning Theory (COLT) (2008) Abernethy, J., Hazan, E., Rakhlin, A.: Competing in the dark: An efficient algorithm for bandit linear optimization. In: Proceedings of the 21st Annual Conference on Learning Theory (COLT) (2008)
4.
Zurück zum Zitat Audibert, J.Y., Bubeck, S.: Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11, 2635–2686 (2010)MathSciNet Audibert, J.Y., Bubeck, S.: Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11, 2635–2686 (2010)MathSciNet
5.
Zurück zum Zitat Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47 (2-3), 235–256 (2002)MATHCrossRef Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47 (2-3), 235–256 (2002)MATHCrossRef
6.
Zurück zum Zitat Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proceedings of 36th Annual Symposium on Foundations of Computer Science, 1995, pp. 322–331 (1995) Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proceedings of 36th Annual Symposium on Foundations of Computer Science, 1995, pp. 322–331 (1995)
7.
Zurück zum Zitat Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32 (1), 48–77 (2003)MathSciNetCrossRef Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32 (1), 48–77 (2003)MathSciNetCrossRef
8.
Zurück zum Zitat Auer, P., Ortner, R., Szepesvari, C.: Improved rates for the stochastic continuum-armed bandit problem. In: Proceedings of 20th Conference on Learning Theory (COLT), pp. 454–468 (2007) Auer, P., Ortner, R., Szepesvari, C.: Improved rates for the stochastic continuum-armed bandit problem. In: Proceedings of 20th Conference on Learning Theory (COLT), pp. 454–468 (2007)
9.
Zurück zum Zitat Bansal, N., Blum, A., Chawla, S., Meyerson, A.: Online oblivious routing. In: Proceedings of ACM Symposium in Parallelism in Algorithms and Architectures, pp. 44–49 (2003) Bansal, N., Blum, A., Chawla, S., Meyerson, A.: Online oblivious routing. In: Proceedings of ACM Symposium in Parallelism in Algorithms and Architectures, pp. 44–49 (2003)
10.
Zurück zum Zitat Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003)MATHCrossRef Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003)MATHCrossRef
11.
Zurück zum Zitat Blum, A., Kumar, V., Rudra, A., Wu, F.: Online learning in online auctions. In: Proceedings of 14th Symp. on Discrete Alg., pp. 202–204 (2003) Blum, A., Kumar, V., Rudra, A., Wu, F.: Online learning in online auctions. In: Proceedings of 14th Symp. on Discrete Alg., pp. 202–204 (2003)
12.
Zurück zum Zitat Bubeck, S., Munos, R., Stoltz, G., Szepesvari, C.: X-armed bandits. J. Mach. Learn. Res. (JMLR) 12, 1587–1627 (2011)MathSciNet Bubeck, S., Munos, R., Stoltz, G., Szepesvari, C.: X-armed bandits. J. Mach. Learn. Res. (JMLR) 12, 1587–1627 (2011)MathSciNet
13.
Zurück zum Zitat Bubeck, S., Stoltz, G., Yu, J.: Lipschitz bandits without the Lipschitz constant. In: Proceedings of the 22nd International Conference on Algorithmic Learning Theory (ALT), pp. 144–158 (2011) Bubeck, S., Stoltz, G., Yu, J.: Lipschitz bandits without the Lipschitz constant. In: Proceedings of the 22nd International Conference on Algorithmic Learning Theory (ALT), pp. 144–158 (2011)
14.
Zurück zum Zitat Candès, E., Plan, Y.: Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. CoRR abs/1001.0339 (2010) Candès, E., Plan, Y.: Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. CoRR abs/1001.0339 (2010)
15.
Zurück zum Zitat Carpentier, A., Munos, R.: Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. In: Proceedings of AIStats, pp. 190–198 (2012) Carpentier, A., Munos, R.: Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. In: Proceedings of AIStats, pp. 190–198 (2012)
16.
Zurück zum Zitat Chen, B., Castro, R., Krause, A.: Joint optimization and variable selection of high-dimensional gaussian processes. In: Proceedings International Conference on Machine Learning (ICML) (2012) Chen, B., Castro, R., Krause, A.: Joint optimization and variable selection of high-dimensional gaussian processes. In: Proceedings International Conference on Machine Learning (ICML) (2012)
18.
Zurück zum Zitat Cope, E.: Regret and convergence bounds for a class of continuum-armed bandit problems. IEEE Trans. Autom. Control 54, 1243–1253 (2009)MathSciNetCrossRef Cope, E.: Regret and convergence bounds for a class of continuum-armed bandit problems. IEEE Trans. Autom. Control 54, 1243–1253 (2009)MathSciNetCrossRef
19.
Zurück zum Zitat DeVore, R., Petrova, G., Wojtaszczyk, P.: Approximation of functions of few variables in high dimensions. Constr. Approx 33, 125–143 (2011)MATHMathSciNetCrossRef DeVore, R., Petrova, G., Wojtaszczyk, P.: Approximation of functions of few variables in high dimensions. Constr. Approx 33, 125–143 (2011)MATHMathSciNetCrossRef
20.
Zurück zum Zitat Djolonga, J., Krause, A., Cevher, V.: High dimensional gaussian process bandits. In: To Appear in Neural Information Processing Systems (NIPS) (2013) Djolonga, J., Krause, A., Cevher, V.: High dimensional gaussian process bandits. In: To Appear in Neural Information Processing Systems (NIPS) (2013)
21.
Zurück zum Zitat Flaxman, A., Kalai, A., McMahan, H.: Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 385–394 (2005) Flaxman, A., Kalai, A., McMahan, H.: Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 385–394 (2005)
22.
Zurück zum Zitat Fornasier, M., Schnass, K., Vybiral, J.: Learning functions of few arbitrary linear parameters in high dimensions. Found. Comput. Math. 12 (2), 229–262 (2012)MATHMathSciNetCrossRef Fornasier, M., Schnass, K., Vybiral, J.: Learning functions of few arbitrary linear parameters in high dimensions. Found. Comput. Math. 12 (2), 229–262 (2012)MATHMathSciNetCrossRef
23.
Zurück zum Zitat Fredman, M., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM. J. Algebr. Discret. Methods 5, 61–68 (1984)MATHCrossRef Fredman, M., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM. J. Algebr. Discret. Methods 5, 61–68 (1984)MATHCrossRef
24.
Zurück zum Zitat Fredman, M., Komlós, J., Szemerédi, E.: Storing a sparse table with 0(1) worst case access time. J. ACM 31 (3), 538–544 (1984)MATHCrossRef Fredman, M., Komlós, J., Szemerédi, E.: Storing a sparse table with 0(1) worst case access time. J. ACM 31 (3), 538–544 (1984)MATHCrossRef
25.
Zurück zum Zitat Greenshtein, E.: Best subset selection, persistence in high dimensional statistical learning and optimization under ℓ 1 constraint. Ann. Stat. 34, 2367–2386 (2006)MATHMathSciNetCrossRef Greenshtein, E.: Best subset selection, persistence in high dimensional statistical learning and optimization under 1 constraint. Ann. Stat. 34, 2367–2386 (2006)MATHMathSciNetCrossRef
26.
Zurück zum Zitat Kleinberg, R.: Nearly tight bounds for the continuum-armed bandit problem. In: 18th Advances in Neural Information Processing Systems (2004) Kleinberg, R.: Nearly tight bounds for the continuum-armed bandit problem. In: 18th Advances in Neural Information Processing Systems (2004)
27.
Zurück zum Zitat Kleinberg, R.: Online decision problems with large strategy sets. Ph.D. thesis. MIT, Boston (2005) Kleinberg, R.: Online decision problems with large strategy sets. Ph.D. thesis. MIT, Boston (2005)
28.
Zurück zum Zitat Kleinberg, R., Leighton, T.: The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In: Proceedings of Foundations of Computer Science, 2003., pp. 594–605 (2003) Kleinberg, R., Leighton, T.: The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In: Proceedings of Foundations of Computer Science, 2003., pp. 594–605 (2003)
29.
Zurück zum Zitat Kleinberg, R., Slivkins, A., Upfal, E.: Multi-armed bandits in metric spaces. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, STOC ’08, pp. 681–690 (2008) Kleinberg, R., Slivkins, A., Upfal, E.: Multi-armed bandits in metric spaces. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, STOC ’08, pp. 681–690 (2008)
30.
Zurück zum Zitat Körner, J.: Fredmankomlós bounds and information theory. SIAM J. Algebraic Discret. Methods 7 (4), 560–570 (1986)MATHCrossRef Körner, J.: Fredmankomlós bounds and information theory. SIAM J. Algebraic Discret. Methods 7 (4), 560–570 (1986)MATHCrossRef
31.
Zurück zum Zitat Laurent, B., Massart, P.: Adaptive estimation of a quadratic functional by model selection. Ann. Stat. 28 (5), 1302–1338 (2000)MATHMathSciNetCrossRef Laurent, B., Massart, P.: Adaptive estimation of a quadratic functional by model selection. Ann. Stat. 28 (5), 1302–1338 (2000)MATHMathSciNetCrossRef
32.
Zurück zum Zitat Li, Q., Racine, J.: Nonparametric econometrics: Theory and practice (2007) Li, Q., Racine, J.: Nonparametric econometrics: Theory and practice (2007)
33.
Zurück zum Zitat McMahan, B., Blum, A.: Online geometric optimization in the bandit setting against an adaptive adversary. In: Proceedings of the 17th Annual Conference on Learning Theory (COLT), pp. 109–123 (2004) McMahan, B., Blum, A.: Online geometric optimization in the bandit setting against an adaptive adversary. In: Proceedings of the 17th Annual Conference on Learning Theory (COLT), pp. 109–123 (2004)
34.
Zurück zum Zitat Mossel, E., O’Donnell, R., Servedio, R.: Learning juntas. In: Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC, pp. 206–212. ACM (2003) Mossel, E., O’Donnell, R., Servedio, R.: Learning juntas. In: Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC, pp. 206–212. ACM (2003)
35.
Zurück zum Zitat Naor, M., Schulman, L., Srinivasan, A.: Splitters and near-optimal derandomization. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 182–191 (1995) Naor, M., Schulman, L., Srinivasan, A.: Splitters and near-optimal derandomization. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 182–191 (1995)
37.
Zurück zum Zitat Orlitsky, A.: Worst-case interactive communication i: Two messages are almost optimal. IEEE Trans. Inf. Theory 36, 1111–1126 (1990)MATHMathSciNetCrossRef Orlitsky, A.: Worst-case interactive communication i: Two messages are almost optimal. IEEE Trans. Inf. Theory 36, 1111–1126 (1990)MATHMathSciNetCrossRef
38.
Zurück zum Zitat Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)MATHMathSciNetCrossRef Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)MATHMathSciNetCrossRef
40.
Zurück zum Zitat Tyagi, H., Cevher, V.: Active learning of multi-index function models. In: Advances in Neural Information Processing Systems, vol. 25, pp. 1475–1483 (2012) Tyagi, H., Cevher, V.: Active learning of multi-index function models. In: Advances in Neural Information Processing Systems, vol. 25, pp. 1475–1483 (2012)
41.
Zurück zum Zitat Tyagi, H., Cevher, V.: Learning non-parametric basis independent models from point queries via low-rank methods. Appl. Comput. Harmonic Anal. (2014) Tyagi, H., Cevher, V.: Learning non-parametric basis independent models from point queries via low-rank methods. Appl. Comput. Harmonic Anal. (2014)
42.
Zurück zum Zitat Tyagi, H., Gärtner, B.: Continuum armed bandit problem of few variables in high dimensions. CoRR abs/1304.5793 (2013) Tyagi, H., Gärtner, B.: Continuum armed bandit problem of few variables in high dimensions. CoRR abs/1304.5793 (2013)
43.
Zurück zum Zitat Wang, Z., Zoghi, M., Hutter, F., Matheson, D., de Freitas, N.: Bayesian optimization in high dimensions via random embeddings. In: Proc. IJCAI (2013) Wang, Z., Zoghi, M., Hutter, F., Matheson, D., de Freitas, N.: Bayesian optimization in high dimensions via random embeddings. In: Proc. IJCAI (2013)
45.
Zurück zum Zitat Weyl, H.: Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung). Mathematische Annalen 71, 441–479 (1912)MATHMathSciNetCrossRef Weyl, H.: Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung). Mathematische Annalen 71, 441–479 (1912)MATHMathSciNetCrossRef
Metadaten
Titel
On Two Continuum Armed Bandit Problems in High Dimensions
verfasst von
Hemant Tyagi
Sebastian U. Stich
Bernd Gärtner
Publikationsdatum
01.01.2016
Verlag
Springer US
Erschienen in
Theory of Computing Systems / Ausgabe 1/2016
Print ISSN: 1432-4350
Elektronische ISSN: 1433-0490
DOI
https://doi.org/10.1007/s00224-014-9570-8

Weitere Artikel der Ausgabe 1/2016

Theory of Computing Systems 1/2016 Zur Ausgabe

Premium Partner