2025 | OriginalPaper | Chapter

Sparse Linear Regression and Lattice Problems

Authors: Aparna Gupte, Neekon Vafa, Vinod Vaikuntanathan

Published in: Theory of Cryptography

Publisher: Springer Nature Switzerland


Abstract

Sparse linear regression (SLR) is a well-studied problem in statistics where one is given a design matrix \(\textbf{X} \in \mathbb {R}^{m \times n}\) and a response vector \(\textbf{y} = \textbf{X} \boldsymbol{\theta }^* + \textbf{w}\) for a k-sparse vector \(\boldsymbol{\theta }^*\) (that is, \(\Vert \boldsymbol{\theta }^*\Vert _0 \le k\)) and small, arbitrary noise \(\textbf{w}\), and the goal is to find a k-sparse \(\widehat{\boldsymbol{\theta }} \in \mathbb {R}^{n}\) that minimizes the mean squared prediction error \(\frac{1}{m} \Vert \textbf{X} \widehat{\boldsymbol{\theta }} - \textbf{X} \boldsymbol{\theta }^*\Vert ^2_2\). While \(\ell _1\)-relaxation methods such as basis pursuit, Lasso, and the Dantzig selector solve SLR when the design matrix is well-conditioned, no general algorithm is known, nor is there any formal evidence of hardness in an average-case setting with respect to all efficient algorithms.
We give evidence of average-case hardness of SLR with respect to all efficient algorithms, assuming the worst-case hardness of lattice problems. Specifically, we give an instance-by-instance reduction from a variant of the bounded distance decoding (BDD) problem on lattices to SLR, where the condition number of the lattice basis that defines the BDD instance is directly related to the restricted eigenvalue condition of the design matrix, which characterizes some of the classical statistical-computational gaps for sparse linear regression. Moreover, by appealing to worst-case to average-case reductions from the world of lattices, this shows hardness for a distribution of SLR instances; while the design matrices are ill-conditioned, the resulting SLR instances are in the identifiable regime.
Furthermore, for well-conditioned (essentially) isotropic Gaussian design matrices, where Lasso is known to behave well in the identifiable regime, we show hardness of outputting any good solution in the unidentifiable regime where there are many solutions, assuming the worst-case hardness of standard and well-studied lattice problems.
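To make the SLR setup above concrete, here is a minimal sketch (not taken from the paper) of an instance with a well-conditioned i.i.d. Gaussian design, solved with the Lasso baseline mentioned in the abstract; the dimensions, noise level, and regularization weight are illustrative assumptions.

```python
# Minimal SLR sketch (illustrative, not the paper's construction):
# y = X @ theta_star + w with a k-sparse theta_star, estimated with Lasso.
import numpy as np
from sklearn.linear_model import Lasso

m, n, k = 200, 500, 10                          # samples, dimension, sparsity (illustrative)
rng = np.random.default_rng(0)

X = rng.standard_normal((m, n))                 # well-conditioned i.i.d. Gaussian design
theta_star = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
theta_star[support] = rng.standard_normal(k)    # k-sparse ground truth
w = 0.1 * rng.standard_normal(m)                # small noise
y = X @ theta_star + w

# l1-relaxation baseline; the regularization weight alpha is a heuristic choice
theta_hat = Lasso(alpha=0.05).fit(X, y).coef_

# Mean squared prediction error: (1/m) * ||X (theta_hat - theta_star)||_2^2
mspe = np.mean((X @ (theta_hat - theta_star)) ** 2)
print(f"prediction error {mspe:.4f}, support size {np.count_nonzero(theta_hat)}")
```

In the well-conditioned regime sketched here Lasso behaves well; the hardness results of the paper concern ill-conditioned designs and the unidentifiable regime.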


Footnotes
1
This bound assumes \(\boldsymbol{\textbf{X}}\) satisfies a certain column-normalization condition (see Definition 4).
 
2
They show slightly more: they construct a distribution over \(\boldsymbol{\theta }^*\) for which sparse linear regression is hard; however, to the best of our knowledge, this distribution is not polynomial-time sampleable. Indeed, if this distribution were sampleable, their result would give a worst-case to average-case reduction for NP, which remains a major open problem in complexity theory.
 
3
We actually use a slightly different definition of \(\lambda _1(\textbf{B})\) which is unimportant for this exposition; we refer the reader to Sect. 2 for more details.
 
4
We remark that the binary LWE (learning with errors) problem, where the LWE secret is binary, is not a special case of binary BDD even though LWE is a special case of BDD. Indeed, when writing a binary LWE instance as a BDD instance in the canonical way, only a part of the coefficient vector of the closest lattice point is binary. Thus, even though binary LWE is equivalent in hardness to LWE [GKPV10, BLP+13, Mic18], we do not know an analogous statement relating binary BDD and BDD that preserves the conditioning of the lattice basis.
 
5
The bottom \(k\times n\) sub-matrix of \(\textbf{X}\) (call it \(\textbf{X}_2\)) is a fixed, worst-case matrix, while each of the top rows is drawn i.i.d. from \(\mathcal {N}(\textbf{0}, \boldsymbol{\varSigma })\). There are two natural ways to remove the “worst-case” nature of \(\textbf{X}_2\). We discuss them in Sect. 3.1.
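As a hedged illustration of this block structure (the shapes, the placeholder \(\textbf{X}_2\), and the covariance below are illustrative assumptions, not the paper's actual choices), such a design can be assembled by stacking i.i.d. Gaussian rows on top of a fixed block:

```python
# Illustrative design with a fixed bottom k x n block X_2 and (m - k) i.i.d. N(0, Sigma) rows on top.
import numpy as np

m, n, k = 200, 500, 10                  # illustrative dimensions
rng = np.random.default_rng(0)

X2 = np.eye(k, n)                       # stand-in for the fixed, worst-case block
Sigma = np.eye(n)                       # stand-in covariance for the Gaussian rows
top = rng.multivariate_normal(np.zeros(n), Sigma, size=m - k)
X = np.vstack([top, X2])                # top rows Gaussian, bottom k rows fixed
```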
 
6
More precisely, \(\beta \)-improve Lasso for \(\beta > 1\).
 
7
In the unidentifiable regime, minimizing the prediction error is not information-theoretically possible without some constraint on the noise.
 
8
We greatly thank Kiril Bangachev for turning the expected value statement into a high probability statement.
 
Literature
[Bab86]
[BDT24]
Buhai, R.D., Ding, J., Tiegel, S.: Computational-statistical gaps for improper learning in sparse linear regression (2024). Personal communication
[BLP+13]
Brakerski, Z., Langlois, A., Peikert, C., Regev, O., Stehlé, D.: Classical hardness of learning with errors. In: Boneh, D., Roughgarden, T., Feigenbaum, J. (eds.) Symposium on Theory of Computing Conference, STOC 2013, Palo Alto, CA, USA, June 1-4, 2013, pp. 575–584. ACM (2013)
[BRST21]
Bruna, J., Regev, O., Song, M.J., Tang, Y.: Continuous LWE. In: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pp. 694–707 (2021)
[CDS98]
Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1), 33–61 (1998)
[CT07]
Candes, E., Tao, T.: The Dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35(6), 2313–2351 (2007)
[GKPV10]
Goldwasser, S., Kalai, Y.T., Peikert, C.: Robustness of the learning with errors assumption. In: Chi-Chih Yao, A. (ed.) Innovations in Computer Science - ICS 2010, Tsinghua University, Beijing, China, January 5-7, 2010. Proceedings, pp. 230–240. Tsinghua University Press (2010)
[GKZ21]
Gamarnik, D., Kizildag, E.C., Zadik, I.: Inference in high-dimensional linear regression via lattice basis reduction and integer relation detection. IEEE Trans. Inf. Theory 67(12), 8109–8139 (2021)
[GV21]
[GVV22]
Gupte, A., Vafa, N., Vaikuntanathan, V.: Continuous LWE is as hard as LWE & applications to learning Gaussian mixtures. In: 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pp. 1162–1173. IEEE (2022)
[GVV24]
[KKMR21]
Kelner, J.A., Koehler, F., Meka, R., Rohatgi, D.: On the power of preconditioning in sparse linear regression. CoRR, abs/2106.09207 (2021)
[KKMR22]
Kelner, J., Koehler, F., Meka, R., Rohatgi, D.: Lower bounds on randomly preconditioned lasso via robust sparse designs. Adv. Neural. Inf. Process. Syst. 35, 24419–24431 (2022)
[KKMR23]
Kelner, J., Koehler, F., Meka, R., Rohatgi, D.: Feature adaptation for sparse linear regression. CoRR, abs/2305.16892 (2023)
[LLM06]
[LM00]
Laurent, B., Massart, P.: Adaptive estimation of a quadratic functional by model selection. Ann. Stat. 28, 1302–1338 (2000)
[Mic18]
[Pei09]
Peikert, C.: Public-key cryptosystems from the worst-case shortest vector problem: extended abstract. In: Mitzenmacher, M. (ed.) Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pp. 333–342. ACM (2009)
[Reg09]
Regev, O.: On lattices, learning with errors, random linear codes, and cryptography. J. ACM 56(6), 34:1–34:40 (2009)
[RV10]
Rudelson, M., Vershynin, R.: Non-asymptotic theory of random matrices: extreme singular values. In: Proceedings of the International Congress of Mathematicians 2010 (ICM 2010), Vol. I: Plenary Lectures and Ceremonies, Vols. II–IV: Invited Lectures, pp. 1576–1602. World Scientific (2010)
[Sch87]
Schnorr, C.-P.: A hierarchy of polynomial time lattice basis reduction algorithms. Theor. Comput. Sci. 53, 201–224 (1987)
[Tib96]
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58(1), 267–288 (1996)
[ZSWB22]
Zadik, I., Song, M.J., Wein, A.S., Bruna, J.: Lattice-based methods surpass sum-of-squares in clustering. In: Loh, P.-L., Raginsky, M. (eds.) Conference on Learning Theory, 2-5 July 2022, London, UK, vol. 178 of Proceedings of Machine Learning Research, pp. 1247–1248. PMLR (2022)
[ZWJ14]
Zhang, Y., Wainwright, M.J., Jordan, M.I.: Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. In: Conference on Learning Theory, pp. 921–948 (2014)
Metadata
Title
Sparse Linear Regression and Lattice Problems
Authors
Aparna Gupte
Neekon Vafa
Vinod Vaikuntanathan
Copyright Year
2025
DOI
https://doi.org/10.1007/978-3-031-78017-2_10
