Skip to main content
Top
Published in: Theory of Computing Systems 1/2016

01-01-2016

On Two Continuum Armed Bandit Problems in High Dimensions

Authors: Hemant Tyagi, Sebastian U. Stich, Bernd Gärtner

Published in: Theory of Computing Systems | Issue 1/2016

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

We consider the problem of continuum armed bandits where the arms are indexed by a compact subset of \(\mathbb {R}^{d}\). For large d, it is well known that mere smoothness assumptions on the reward functions lead to regret bounds that suffer from the curse of dimensionality. A typical way to tackle this in the literature has been to make further assumptions on the structure of reward functions. In this work we assume the reward functions to be intrinsically of low dimension kd and consider two models: (i) The reward functions depend on only an unknown subset of k coordinate variables and, (ii) a generalization of (i) where the reward functions depend on an unknown k dimensional subspace of \(\mathbb {R}^{d}\). By placing suitable assumptions on the smoothness of the rewards we derive randomized algorithms for both problems that achieve nearly optimal regret bounds in terms of the number of rounds n.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
1
Rewards sampled at each round in an i.i.d manner from an unknown probability distribution.
 
2
See Remark 1 in Section 3.1 for discussion on how the logn factor can be removed.
 
3
See Definition 2 in Section 2.1.
 
4
Indeed, any function that depends on k k coordinates also depends on at most k coordinates.
 
5
Indeed for a compact domain, any C 2 function is Lipschitz continuous but the converse is not necessarily true. Therefore, the mean reward functions that we consider, belong to a slightly restricted class of Lipschitz continuous functions.
 
6
This theorem is stated again in Section 4 for completeness.
 
7
This is actually true for k≥3. For k=1,2 the \(\left (n/\log n\right )^{\frac {4}{k+2}}\) factor dominates. See Remark 3 in Section 4.3 for details.
 
8
The interested reader can find a full analysis for the case k=1 in [42].
 
9
The above sampling scheme was first considered by Fornasier et al. [22], and later by Tyagi et al. [40], for the problem of approximating functions of the form f(x)=g(A x) from point queries.
 
10
Of course in practice we will not be able to solve (32) exactly, but will instead obtain a solution that can be made to come arbitrarily close to the actual solution. This difference will hence appear as an additional error term in the error bound of Lemma 4.
 
11
In the absence of external stochastic noise (i.e. σ=0) we can actually take 𝜖 to be arbitrarily small as shown by Tyagi et al. [41, Lemma 2]. This is also verified from (33), by plugging σ=0.
 
Literature
1.
go back to reference Abbasi-yadkori, Y., Pal, D., Szepesvari, C.: Online-to-confidence-set conversions and application to sparse stochastic bandits. In: Proceedings of AIStats (2012) Abbasi-yadkori, Y., Pal, D., Szepesvari, C.: Online-to-confidence-set conversions and application to sparse stochastic bandits. In: Proceedings of AIStats (2012)
2.
go back to reference Abernethy, J., Hazan, E., Rakhlin, A.: Competing in the dark: An efficient algorithm for bandit linear optimization. In: Proceedings of the 21st Annual Conference on Learning Theory (COLT) (2008) Abernethy, J., Hazan, E., Rakhlin, A.: Competing in the dark: An efficient algorithm for bandit linear optimization. In: Proceedings of the 21st Annual Conference on Learning Theory (COLT) (2008)
4.
go back to reference Audibert, J.Y., Bubeck, S.: Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11, 2635–2686 (2010)MathSciNet Audibert, J.Y., Bubeck, S.: Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11, 2635–2686 (2010)MathSciNet
5.
go back to reference Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47 (2-3), 235–256 (2002)MATHCrossRef Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47 (2-3), 235–256 (2002)MATHCrossRef
6.
go back to reference Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proceedings of 36th Annual Symposium on Foundations of Computer Science, 1995, pp. 322–331 (1995) Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proceedings of 36th Annual Symposium on Foundations of Computer Science, 1995, pp. 322–331 (1995)
7.
go back to reference Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32 (1), 48–77 (2003)MathSciNetCrossRef Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32 (1), 48–77 (2003)MathSciNetCrossRef
8.
go back to reference Auer, P., Ortner, R., Szepesvari, C.: Improved rates for the stochastic continuum-armed bandit problem. In: Proceedings of 20th Conference on Learning Theory (COLT), pp. 454–468 (2007) Auer, P., Ortner, R., Szepesvari, C.: Improved rates for the stochastic continuum-armed bandit problem. In: Proceedings of 20th Conference on Learning Theory (COLT), pp. 454–468 (2007)
9.
go back to reference Bansal, N., Blum, A., Chawla, S., Meyerson, A.: Online oblivious routing. In: Proceedings of ACM Symposium in Parallelism in Algorithms and Architectures, pp. 44–49 (2003) Bansal, N., Blum, A., Chawla, S., Meyerson, A.: Online oblivious routing. In: Proceedings of ACM Symposium in Parallelism in Algorithms and Architectures, pp. 44–49 (2003)
10.
go back to reference Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003)MATHCrossRef Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003)MATHCrossRef
11.
go back to reference Blum, A., Kumar, V., Rudra, A., Wu, F.: Online learning in online auctions. In: Proceedings of 14th Symp. on Discrete Alg., pp. 202–204 (2003) Blum, A., Kumar, V., Rudra, A., Wu, F.: Online learning in online auctions. In: Proceedings of 14th Symp. on Discrete Alg., pp. 202–204 (2003)
12.
go back to reference Bubeck, S., Munos, R., Stoltz, G., Szepesvari, C.: X-armed bandits. J. Mach. Learn. Res. (JMLR) 12, 1587–1627 (2011)MathSciNet Bubeck, S., Munos, R., Stoltz, G., Szepesvari, C.: X-armed bandits. J. Mach. Learn. Res. (JMLR) 12, 1587–1627 (2011)MathSciNet
13.
go back to reference Bubeck, S., Stoltz, G., Yu, J.: Lipschitz bandits without the Lipschitz constant. In: Proceedings of the 22nd International Conference on Algorithmic Learning Theory (ALT), pp. 144–158 (2011) Bubeck, S., Stoltz, G., Yu, J.: Lipschitz bandits without the Lipschitz constant. In: Proceedings of the 22nd International Conference on Algorithmic Learning Theory (ALT), pp. 144–158 (2011)
14.
go back to reference Candès, E., Plan, Y.: Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. CoRR abs/1001.0339 (2010) Candès, E., Plan, Y.: Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. CoRR abs/1001.0339 (2010)
15.
go back to reference Carpentier, A., Munos, R.: Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. In: Proceedings of AIStats, pp. 190–198 (2012) Carpentier, A., Munos, R.: Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. In: Proceedings of AIStats, pp. 190–198 (2012)
16.
go back to reference Chen, B., Castro, R., Krause, A.: Joint optimization and variable selection of high-dimensional gaussian processes. In: Proceedings International Conference on Machine Learning (ICML) (2012) Chen, B., Castro, R., Krause, A.: Joint optimization and variable selection of high-dimensional gaussian processes. In: Proceedings International Conference on Machine Learning (ICML) (2012)
18.
go back to reference Cope, E.: Regret and convergence bounds for a class of continuum-armed bandit problems. IEEE Trans. Autom. Control 54, 1243–1253 (2009)MathSciNetCrossRef Cope, E.: Regret and convergence bounds for a class of continuum-armed bandit problems. IEEE Trans. Autom. Control 54, 1243–1253 (2009)MathSciNetCrossRef
19.
go back to reference DeVore, R., Petrova, G., Wojtaszczyk, P.: Approximation of functions of few variables in high dimensions. Constr. Approx 33, 125–143 (2011)MATHMathSciNetCrossRef DeVore, R., Petrova, G., Wojtaszczyk, P.: Approximation of functions of few variables in high dimensions. Constr. Approx 33, 125–143 (2011)MATHMathSciNetCrossRef
20.
go back to reference Djolonga, J., Krause, A., Cevher, V.: High dimensional gaussian process bandits. In: To Appear in Neural Information Processing Systems (NIPS) (2013) Djolonga, J., Krause, A., Cevher, V.: High dimensional gaussian process bandits. In: To Appear in Neural Information Processing Systems (NIPS) (2013)
21.
go back to reference Flaxman, A., Kalai, A., McMahan, H.: Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 385–394 (2005) Flaxman, A., Kalai, A., McMahan, H.: Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 385–394 (2005)
22.
go back to reference Fornasier, M., Schnass, K., Vybiral, J.: Learning functions of few arbitrary linear parameters in high dimensions. Found. Comput. Math. 12 (2), 229–262 (2012)MATHMathSciNetCrossRef Fornasier, M., Schnass, K., Vybiral, J.: Learning functions of few arbitrary linear parameters in high dimensions. Found. Comput. Math. 12 (2), 229–262 (2012)MATHMathSciNetCrossRef
23.
go back to reference Fredman, M., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM. J. Algebr. Discret. Methods 5, 61–68 (1984)MATHCrossRef Fredman, M., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM. J. Algebr. Discret. Methods 5, 61–68 (1984)MATHCrossRef
24.
go back to reference Fredman, M., Komlós, J., Szemerédi, E.: Storing a sparse table with 0(1) worst case access time. J. ACM 31 (3), 538–544 (1984)MATHCrossRef Fredman, M., Komlós, J., Szemerédi, E.: Storing a sparse table with 0(1) worst case access time. J. ACM 31 (3), 538–544 (1984)MATHCrossRef
25.
go back to reference Greenshtein, E.: Best subset selection, persistence in high dimensional statistical learning and optimization under ℓ 1 constraint. Ann. Stat. 34, 2367–2386 (2006)MATHMathSciNetCrossRef Greenshtein, E.: Best subset selection, persistence in high dimensional statistical learning and optimization under 1 constraint. Ann. Stat. 34, 2367–2386 (2006)MATHMathSciNetCrossRef
26.
go back to reference Kleinberg, R.: Nearly tight bounds for the continuum-armed bandit problem. In: 18th Advances in Neural Information Processing Systems (2004) Kleinberg, R.: Nearly tight bounds for the continuum-armed bandit problem. In: 18th Advances in Neural Information Processing Systems (2004)
27.
go back to reference Kleinberg, R.: Online decision problems with large strategy sets. Ph.D. thesis. MIT, Boston (2005) Kleinberg, R.: Online decision problems with large strategy sets. Ph.D. thesis. MIT, Boston (2005)
28.
go back to reference Kleinberg, R., Leighton, T.: The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In: Proceedings of Foundations of Computer Science, 2003., pp. 594–605 (2003) Kleinberg, R., Leighton, T.: The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In: Proceedings of Foundations of Computer Science, 2003., pp. 594–605 (2003)
29.
go back to reference Kleinberg, R., Slivkins, A., Upfal, E.: Multi-armed bandits in metric spaces. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, STOC ’08, pp. 681–690 (2008) Kleinberg, R., Slivkins, A., Upfal, E.: Multi-armed bandits in metric spaces. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, STOC ’08, pp. 681–690 (2008)
30.
go back to reference Körner, J.: Fredmankomlós bounds and information theory. SIAM J. Algebraic Discret. Methods 7 (4), 560–570 (1986)MATHCrossRef Körner, J.: Fredmankomlós bounds and information theory. SIAM J. Algebraic Discret. Methods 7 (4), 560–570 (1986)MATHCrossRef
31.
32.
go back to reference Li, Q., Racine, J.: Nonparametric econometrics: Theory and practice (2007) Li, Q., Racine, J.: Nonparametric econometrics: Theory and practice (2007)
33.
go back to reference McMahan, B., Blum, A.: Online geometric optimization in the bandit setting against an adaptive adversary. In: Proceedings of the 17th Annual Conference on Learning Theory (COLT), pp. 109–123 (2004) McMahan, B., Blum, A.: Online geometric optimization in the bandit setting against an adaptive adversary. In: Proceedings of the 17th Annual Conference on Learning Theory (COLT), pp. 109–123 (2004)
34.
go back to reference Mossel, E., O’Donnell, R., Servedio, R.: Learning juntas. In: Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC, pp. 206–212. ACM (2003) Mossel, E., O’Donnell, R., Servedio, R.: Learning juntas. In: Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC, pp. 206–212. ACM (2003)
35.
go back to reference Naor, M., Schulman, L., Srinivasan, A.: Splitters and near-optimal derandomization. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 182–191 (1995) Naor, M., Schulman, L., Srinivasan, A.: Splitters and near-optimal derandomization. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 182–191 (1995)
37.
go back to reference Orlitsky, A.: Worst-case interactive communication i: Two messages are almost optimal. IEEE Trans. Inf. Theory 36, 1111–1126 (1990)MATHMathSciNetCrossRef Orlitsky, A.: Worst-case interactive communication i: Two messages are almost optimal. IEEE Trans. Inf. Theory 36, 1111–1126 (1990)MATHMathSciNetCrossRef
38.
go back to reference Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)MATHMathSciNetCrossRef Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)MATHMathSciNetCrossRef
40.
go back to reference Tyagi, H., Cevher, V.: Active learning of multi-index function models. In: Advances in Neural Information Processing Systems, vol. 25, pp. 1475–1483 (2012) Tyagi, H., Cevher, V.: Active learning of multi-index function models. In: Advances in Neural Information Processing Systems, vol. 25, pp. 1475–1483 (2012)
41.
go back to reference Tyagi, H., Cevher, V.: Learning non-parametric basis independent models from point queries via low-rank methods. Appl. Comput. Harmonic Anal. (2014) Tyagi, H., Cevher, V.: Learning non-parametric basis independent models from point queries via low-rank methods. Appl. Comput. Harmonic Anal. (2014)
42.
go back to reference Tyagi, H., Gärtner, B.: Continuum armed bandit problem of few variables in high dimensions. CoRR abs/1304.5793 (2013) Tyagi, H., Gärtner, B.: Continuum armed bandit problem of few variables in high dimensions. CoRR abs/1304.5793 (2013)
43.
go back to reference Wang, Z., Zoghi, M., Hutter, F., Matheson, D., de Freitas, N.: Bayesian optimization in high dimensions via random embeddings. In: Proc. IJCAI (2013) Wang, Z., Zoghi, M., Hutter, F., Matheson, D., de Freitas, N.: Bayesian optimization in high dimensions via random embeddings. In: Proc. IJCAI (2013)
45.
go back to reference Weyl, H.: Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung). Mathematische Annalen 71, 441–479 (1912)MATHMathSciNetCrossRef Weyl, H.: Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung). Mathematische Annalen 71, 441–479 (1912)MATHMathSciNetCrossRef
Metadata
Title
On Two Continuum Armed Bandit Problems in High Dimensions
Authors
Hemant Tyagi
Sebastian U. Stich
Bernd Gärtner
Publication date
01-01-2016
Publisher
Springer US
Published in
Theory of Computing Systems / Issue 1/2016
Print ISSN: 1432-4350
Electronic ISSN: 1433-0490
DOI
https://doi.org/10.1007/s00224-014-9570-8

Other articles of this Issue 1/2016

Theory of Computing Systems 1/2016 Go to the issue

Premium Partner