## Abstract

We consider least-squares regression using a randomly generated subspace $\G_P\subset\F$ of finite dimension $P$, where $\F$ is a function space of infinite dimension, e.g.~$L_2([0,1]^d)$. $\G_P$ is defined as the span of $P$ random features that are linear combinations of the basis functions of $\F$ weighted by random Gaussian i.i.d.~coefficients. In particular, we consider multi-resolution random combinations at all scales of a given mother function, such as a hat function or a wavelet. In this latter case, the resulting Gaussian objects are called {\em scrambled wavelets}, and we show that they make it possible to approximate functions in Sobolev spaces $H^{s}([0,1]^d)$. As a result, given $N$ data points, the least-squares estimate $\hat g$ built from $P$ scrambled wavelets has excess risk $||f^* - \hat g||_\P^2 = O(||f^*||^2_{H^{s}([0,1]^d)}(\log N)/P + P(\log N )/N)$ for target functions $f^*\in H^{s}([0,1]^d)$ of smoothness order $s>d/2$. An interesting aspect of the resulting bounds is that they do not depend on the distribution $\P$ from which the data are generated, which is important in the statistical regression setting considered here: randomization makes it possible to adapt to any possible distribution. We conclude by describing an efficient numerical implementation using lazy expansions with numerical complexity $\tilde O(2^d N^{3/2}\log N + N^2)$, where $d$ is the dimension of the input space.
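To make the construction concrete, the following is a minimal one-dimensional sketch of least-squares regression on random Gaussian combinations of multi-scale hat functions. It is an illustration under stated assumptions, not the implementation described in the paper: the expansion is truncated at a finite `max_level` instead of using lazy expansions, the rescaling $2^{-js}$ of the level-$j$ functions is an assumed normalization, and the names `hat`, `multiscale_basis`, and `fit_random_features` are hypothetical.

```python
import numpy as np


def hat(x):
    """Mother hat function: supported on [0, 1], equal to 1 at x = 1/2."""
    return np.maximum(0.0, 1.0 - np.abs(2.0 * x - 1.0))


def multiscale_basis(x, max_level, s):
    """Evaluate 2^{-j s} * hat(2^j x - l) for j = 0..max_level, l = 0..2^j - 1.

    Assumption: the 2^{-j s} rescaling and the finite truncation level are
    choices made for this sketch. Returns an array of shape (len(x), n_basis).
    """
    cols = []
    for j in range(max_level + 1):
        for l in range(2 ** j):
            cols.append(2.0 ** (-j * s) * hat(2.0 ** j * x - l))
    return np.stack(cols, axis=1)


def fit_random_features(x, y, P, max_level=6, s=1.0, seed=0):
    """Fit least squares in the span of P random Gaussian feature combinations."""
    rng = np.random.default_rng(seed)
    n_basis = 2 ** (max_level + 1) - 1
    # Each column of A holds the i.i.d. Gaussian coefficients of one random feature.
    A = rng.standard_normal((n_basis, P))
    Psi = multiscale_basis(x, max_level, s) @ A  # N x P design matrix
    beta, *_ = np.linalg.lstsq(Psi, y, rcond=None)

    def predict(x_new):
        return multiscale_basis(x_new, max_level, s) @ A @ beta

    return predict


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, 200)
    y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(200)
    predict = fit_random_features(x, y, P=20)
    print("training MSE:", np.mean((predict(x) - y) ** 2))
```

In this sketch the only random objects are the Gaussian coefficients in `A`; the data distribution plays no role in building the features, which mirrors the distribution-free nature of the bound stated above.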