Top

Theory of Computing Systems

Published in:

01-01-2016

On Two Continuum Armed Bandit Problems in High Dimensions

Authors: Hemant Tyagi, Sebastian U. Stich, Bernd Gärtner

Published in: Theory of Computing Systems | Issue 1/2016

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

We consider the problem of continuum armed bandits where the arms are indexed by a compact subset of \(\mathbb {R}^{d}\). For large d, it is well known that mere smoothness assumptions on the reward functions lead to regret bounds that suffer from the curse of dimensionality. A typical way to tackle this in the literature has been to make further assumptions on the structure of reward functions. In this work we assume the reward functions to be intrinsically of low dimension k ≪ d and consider two models: (i) The reward functions depend on only an unknown subset of k coordinate variables and, (ii) a generalization of (i) where the reward functions depend on an unknown k dimensional subspace of \(\mathbb {R}^{d}\). By placing suitable assumptions on the smoothness of the rewards we derive randomized algorithms for both problems that achieve nearly optimal regret bounds in terms of the number of rounds n.

previous article Online Knapsack Revisited

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Available only for authorised users

Rewards sampled at each round in an i.i.d manner from an unknown probability distribution.

See Remark 1 in Section 3.1 for discussion on how the logn factor can be removed.

See Definition 2 in Section 2.1.

Indeed, any function that depends on k ^′ ≤ k coordinates also depends on at most k coordinates.

Indeed for a compact domain, any C ² function is Lipschitz continuous but the converse is not necessarily true. Therefore, the mean reward functions that we consider, belong to a slightly restricted class of Lipschitz continuous functions.

This theorem is stated again in Section 4 for completeness.

This is actually true for k≥3. For k=1,2 the \(\left (n/\log n\right )^{\frac {4}{k+2}}\) factor dominates. See Remark 3 in Section 4.3 for details.

The interested reader can find a full analysis for the case k=1 in [42].

The above sampling scheme was first considered by Fornasier et al. [22], and later by Tyagi et al. [40], for the problem of approximating functions of the form f(x)=g(A x) from point queries.

Of course in practice we will not be able to solve (32) exactly, but will instead obtain a solution that can be made to come arbitrarily close to the actual solution. This difference will hence appear as an additional error term in the error bound of Lemma 4.

In the absence of external stochastic noise (i.e. σ=0) we can actually take 𝜖 to be arbitrarily small as shown by Tyagi et al. [41, Lemma 2]. This is also verified from (33), by plugging σ=0.

Abbasi-yadkori, Y., Pal, D., Szepesvari, C.: Online-to-confidence-set conversions and application to sparse stochastic bandits. In: Proceedings of AIStats (2012)

Abernethy, J., Hazan, E., Rakhlin, A.: Competing in the dark: An efficient algorithm for bandit linear optimization. In: Proceedings of the 21st Annual Conference on Learning Theory (COLT) (2008)

Agrawal, R.: The continuum-armed bandit problem. SIAM J. Control Optim. 33, 1926–1951 (1995)MATHMathSciNetCrossRef

Audibert, J.Y., Bubeck, S.: Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11, 2635–2686 (2010)MathSciNet

Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47 (2-3), 235–256 (2002)MATHCrossRef

Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proceedings of 36th Annual Symposium on Foundations of Computer Science, 1995, pp. 322–331 (1995)

Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32 (1), 48–77 (2003)MathSciNetCrossRef

Auer, P., Ortner, R., Szepesvari, C.: Improved rates for the stochastic continuum-armed bandit problem. In: Proceedings of 20th Conference on Learning Theory (COLT), pp. 454–468 (2007)

Bansal, N., Blum, A., Chawla, S., Meyerson, A.: Online oblivious routing. In: Proceedings of ACM Symposium in Parallelism in Algorithms and Architectures, pp. 44–49 (2003)

10.

Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003)MATHCrossRef

11.

Blum, A., Kumar, V., Rudra, A., Wu, F.: Online learning in online auctions. In: Proceedings of 14th Symp. on Discrete Alg., pp. 202–204 (2003)

12.

Bubeck, S., Munos, R., Stoltz, G., Szepesvari, C.: X-armed bandits. J. Mach. Learn. Res. (JMLR) 12, 1587–1627 (2011)MathSciNet

13.

Bubeck, S., Stoltz, G., Yu, J.: Lipschitz bandits without the Lipschitz constant. In: Proceedings of the 22nd International Conference on Algorithmic Learning Theory (ALT), pp. 144–158 (2011)

14.

Candès, E., Plan, Y.: Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. CoRR abs/1001.0339 (2010)

15.

Carpentier, A., Munos, R.: Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. In: Proceedings of AIStats, pp. 190–198 (2012)

16.

Chen, B., Castro, R., Krause, A.: Joint optimization and variable selection of high-dimensional gaussian processes. In: Proceedings International Conference on Machine Learning (ICML) (2012)

17.

Coifman, R., Maggioni, M.: Diffusion wavelets. Appl. Comput. Harmon. Anal. 21, 53–94 (2006)MATHMathSciNetCrossRef

18.

Cope, E.: Regret and convergence bounds for a class of continuum-armed bandit problems. IEEE Trans. Autom. Control 54, 1243–1253 (2009)MathSciNetCrossRef

19.

DeVore, R., Petrova, G., Wojtaszczyk, P.: Approximation of functions of few variables in high dimensions. Constr. Approx 33, 125–143 (2011)MATHMathSciNetCrossRef

20.

Djolonga, J., Krause, A., Cevher, V.: High dimensional gaussian process bandits. In: To Appear in Neural Information Processing Systems (NIPS) (2013)

21.

Flaxman, A., Kalai, A., McMahan, H.: Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 385–394 (2005)

22.

Fornasier, M., Schnass, K., Vybiral, J.: Learning functions of few arbitrary linear parameters in high dimensions. Found. Comput. Math. 12 (2), 229–262 (2012)MATHMathSciNetCrossRef

23.

Fredman, M., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM. J. Algebr. Discret. Methods 5, 61–68 (1984)MATHCrossRef

24.

Fredman, M., Komlós, J., Szemerédi, E.: Storing a sparse table with 0(1) worst case access time. J. ACM 31 (3), 538–544 (1984)MATHCrossRef

25.

Greenshtein, E.: Best subset selection, persistence in high dimensional statistical learning and optimization under ℓ ₁ constraint. Ann. Stat. 34, 2367–2386 (2006)MATHMathSciNetCrossRef

26.

Kleinberg, R.: Nearly tight bounds for the continuum-armed bandit problem. In: 18th Advances in Neural Information Processing Systems (2004)

27.

Kleinberg, R.: Online decision problems with large strategy sets. Ph.D. thesis. MIT, Boston (2005)

28.

Kleinberg, R., Leighton, T.: The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In: Proceedings of Foundations of Computer Science, 2003., pp. 594–605 (2003)

29.

Kleinberg, R., Slivkins, A., Upfal, E.: Multi-armed bandits in metric spaces. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, STOC ’08, pp. 681–690 (2008)

30.

Körner, J.: Fredmankomlós bounds and information theory. SIAM J. Algebraic Discret. Methods 7 (4), 560–570 (1986)MATHCrossRef

31.

Laurent, B., Massart, P.: Adaptive estimation of a quadratic functional by model selection. Ann. Stat. 28 (5), 1302–1338 (2000)MATHMathSciNetCrossRef

32.

Li, Q., Racine, J.: Nonparametric econometrics: Theory and practice (2007)

33.

McMahan, B., Blum, A.: Online geometric optimization in the bandit setting against an adaptive adversary. In: Proceedings of the 17th Annual Conference on Learning Theory (COLT), pp. 109–123 (2004)

34.

Mossel, E., O’Donnell, R., Servedio, R.: Learning juntas. In: Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC, pp. 206–212. ACM (2003)

35.

Naor, M., Schulman, L., Srinivasan, A.: Splitters and near-optimal derandomization. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 182–191 (1995)

36.

Nilli, A.: Perfect hashing and probability. Comb. Probab. Comput. 3, 407–409 (1994)MATHMathSciNetCrossRef

37.

Orlitsky, A.: Worst-case interactive communication i: Two messages are almost optimal. IEEE Trans. Inf. Theory 36, 1111–1126 (1990)MATHMathSciNetCrossRef

38.

Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)MATHMathSciNetCrossRef

39.

Tropp, J.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12 (4), 389–434 (2012)MATHMathSciNetCrossRef

40.

Tyagi, H., Cevher, V.: Active learning of multi-index function models. In: Advances in Neural Information Processing Systems, vol. 25, pp. 1475–1483 (2012)

41.

Tyagi, H., Cevher, V.: Learning non-parametric basis independent models from point queries via low-rank methods. Appl. Comput. Harmonic Anal. (2014)

42.

Tyagi, H., Gärtner, B.: Continuum armed bandit problem of few variables in high dimensions. CoRR abs/1304.5793 (2013)

43.

Wang, Z., Zoghi, M., Hutter, F., Matheson, D., de Freitas, N.: Bayesian optimization in high dimensions via random embeddings. In: Proc. IJCAI (2013)

44.

Wedin, P.: Perturbation bounds in connection with singular value decomposition. BIT 12, 99–111 (1972)MATHMathSciNetCrossRef

45.

Weyl, H.: Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung). Mathematische Annalen 71, 441–479 (1912)MATHMathSciNetCrossRef

Title: On Two Continuum Armed Bandit Problems in High Dimensions
Authors: Hemant Tyagi
Sebastian U. Stich
Bernd Gärtner
Publication date: 01-01-2016
Publisher: Springer US
Published in: Theory of Computing Systems / Issue 1/2016
Print ISSN: 1432-4350
Electronic ISSN: 1433-0490
DOI: https://doi.org/10.1007/s00224-014-9570-8

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Other articles of this Issue 1/2016

Improved Approximation Algorithm for k-level Uncapacitated Facility Location Problem (with Penalties)

On Fixed Cost k-Flow Problems

Degree-Constrained Graph Orientation: Maximum Satisfaction and Minimum Violation

Approximately Counting Approximately-Shortest Paths in Directed Acyclic Graphs

Foreword of the Special Issue Dedicated to the 2013 Workshop on Approximation and Online Algorithms

Min-Sum 2-Paths Problems

Premium Partner