Confidence sets for Gaussian linear models
In statistical inference, we often want to construct confidence sets for the parameter of interest; for example, you may have heard of the 95% confidence interval. Confidence sets are a way to quantify the uncertainty in our estimates. At its core, a confidence set is typically constructed by inverting a concentration inequality.
Gaussian linear model with fixed design
The setting. We observe n samples \{(Y_t, x_t)\}_{t=1}^n that follow the linear regression model
Y_t = \langle x_t, \theta_\star \rangle + \eta_t, where the design vector x_t \in \mathbb{R}^d is fixed and known, the noise \eta_t is drawn i.i.d. from a standard Gaussian \mathcal{N}(0, 1), and the parameter of interest is \theta_\star \in \mathbb{R}^d. For simplicity, we assume that the Gram matrix V := X^\top X = \sum_{t=1}^n x_t x_t^\top is non-singular (here, X = [x_1, \ldots, x_n]^\top \in \mathbb{R}^{n \times d} is the matrix whose t-th row is x_t^\top). In matrix form, we can write Y = X\theta_\star + \eta with Y \in \mathbb{R}^n and \eta \sim \mathcal{N}(0, I_n).
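To make the setting concrete, here is a minimal NumPy sketch of the data-generating process. The dimensions, the value of `theta_star`, and the way the design `X` is produced (drawn once at random, then treated as fixed) are illustrative assumptions, not part of the model above; `simulate_fixed_design` is a hypothetical helper name.

```python
import numpy as np

def simulate_fixed_design(X, theta_star, rng):
    """Return Y = X @ theta_star + eta with eta ~ N(0, I_n)."""
    n = X.shape[0]
    eta = rng.standard_normal(n)  # i.i.d. N(0, 1) noise
    return X @ theta_star + eta

rng = np.random.default_rng(0)
n, d = 50, 3

# Illustrative design: drawn once, then held fixed for everything that follows.
X = rng.standard_normal((n, d))
theta_star = np.array([1.0, -2.0, 0.5])  # assumed "true" parameter
Y = simulate_fixed_design(X, theta_star, rng)

# Gram matrix, computed two ways: X^T X and the sum of outer products x_t x_t^T.
V = X.T @ X
V_from_sum = sum(np.outer(x_t, x_t) for x_t in X)

print("both forms of V agree:", np.allclose(V, V_from_sum))
print("V is non-singular:", np.linalg.matrix_rank(V) == d)
```

The rows of `X` play the role of x_t^\top, so `X.T @ X` and the sum of outer products are the same matrix.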
Our goal is to construct a sequence of confidence sets C_1, C_2, \ldots such that \Pr(\exists n \ge 1: \theta_\star \notin C_n) \le \delta for a given failure probability \delta \in (0, 1), that is, at confidence level 1-\delta. We start with the maximum-likelihood estimator and first construct the confidence set for a fixed n. Under the above assumptions, the maximum-likelihood estimator \hat{\theta} is given by
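\hat{\theta} = V^{-1} X^\top Y, which is Gaussian because it is a linear function of the Gaussian vector Y. Substituting Y = X\theta_\star + \eta gives \hat{\theta} = \theta_\star + V^{-1} X^\top \eta, whose mean is \theta_\star and whose covariance is V^{-1} X^\top X V^{-1} = V^{-1}. In particular, we can write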
\begin{align*} \hat{\theta} &\sim \mathcal{N}(\theta_\star, V^{-1}) \\ V^{1/2}(\hat{\theta} - \theta_\star) &\sim \mathcal{N}(0, I_d) \end{align*} Let Z := V^{1/2}(\hat{\theta} - \theta_\star). Then \|Z\|_2^2 = \|\hat{\theta}-\theta_\star\|_{V}^2 follows a \chi_d^2-distribution, that is, a \chi^2-distribution with d degrees of freedom. From the tail bounds of the \chi^2-distribution, we have
\begin{align*} \Pr\left(\|Z\|_2^2 \geq d + 2\sqrt{d\log(1/\delta)} + 2\log(1/\delta)\right) &\leq \delta \end{align*} So if we define the confidence set C_n as
C_n = \left\{\theta \in \mathbb{R}^d: \|\hat{\theta}-\theta\|_{V}^2 \leq d + 2\sqrt{d\log(1/\delta)} + 2\log(1/\delta)\right\}, then it is a (1-\delta)-confidence set for \theta_\star.
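The fixed-n construction can be checked numerically. The sketch below computes the squared radius from the tail bound, tests membership in the ellipsoid, and estimates coverage by Monte Carlo. The helper names (`chi2_radius`, `sq_v_norm`, `in_confidence_set`), the random design, and the specific values of n, d, \delta, and \theta_\star are assumptions made for illustration only.

```python
import math
import numpy as np

def chi2_radius(d, delta):
    """Squared radius d + 2*sqrt(d*log(1/delta)) + 2*log(1/delta)."""
    log_term = math.log(1.0 / delta)
    return d + 2.0 * math.sqrt(d * log_term) + 2.0 * log_term

def sq_v_norm(theta_hat, theta, V):
    """Squared V-norm ||theta_hat - theta||_V^2 = diff^T V diff."""
    diff = theta_hat - theta
    return float(diff @ V @ diff)

def in_confidence_set(theta, theta_hat, V, delta):
    """True if theta lies in the ellipsoid centred at theta_hat."""
    return sq_v_norm(theta_hat, theta, V) <= chi2_radius(V.shape[0], delta)

rng = np.random.default_rng(1)
n, d, delta = 40, 3, 0.05
X = rng.standard_normal((n, d))          # illustrative design, held fixed
V = X.T @ X
theta_star = np.array([1.0, -2.0, 0.5])  # assumed "true" parameter

trials = 5000
sq_norms = np.empty(trials)
covered = 0
for i in range(trials):
    # Fresh noise each trial; the design X stays the same.
    Y = X @ theta_star + rng.standard_normal(n)
    theta_hat = np.linalg.solve(V, X.T @ Y)  # MLE: V^{-1} X^T Y
    sq_norms[i] = sq_v_norm(theta_hat, theta_star, V)
    covered += in_confidence_set(theta_star, theta_hat, V, delta)

coverage = covered / trials
print("empirical coverage:", coverage)
print("mean of squared V-norm (should be near d):", sq_norms.mean())
```

Because the tail bound is not tight, the empirical coverage should come out noticeably above 1-\delta; the set is valid but conservative.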
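Using a union bound, we can extend this to a sequence of confidence sets C_1, C_2, \ldots such that \Pr(\exists n \ge 1: \theta_\star \notin C_n) \leq \delta. We allot failure probability \delta_n = \delta/(n(n+1)) to the n-th set, which satisfies \sum_{n \ge 1} \delta_n = \delta because \sum_{n \ge 1} 1/(n(n+1)) = 1, and replace \delta by \delta_n in the radius. With \hat{\theta} and V computed from the first n samples, this yields a larger confidence set for each n: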
C_n = \left\{\theta \in \mathbb{R}^d: \|\hat{\theta}-\theta\|_{V}^2 \leq d + 2\sqrt{d\log(n(n+1)/\delta)} + 2\log(n(n+1)/\delta)\right\}.
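As a quick sanity check of the union-bound step, the sketch below verifies numerically that the per-round failure probabilities \delta/(n(n+1)) sum to at most \delta, and shows how the squared radius grows with n. The function names `per_round_delta` and `anytime_radius` and the values of d and \delta are illustrative assumptions.

```python
import math

def per_round_delta(n, delta):
    """Failure probability allotted to the n-th set: delta / (n (n + 1))."""
    return delta / (n * (n + 1))

def anytime_radius(d, n, delta):
    """Squared radius with delta replaced by delta / (n (n + 1))."""
    log_term = math.log(n * (n + 1) / delta)
    return d + 2.0 * math.sqrt(d * log_term) + 2.0 * log_term

delta, d = 0.05, 3
horizon = 100_000

# The sum telescopes: sum_{n<=N} 1/(n(n+1)) = 1 - 1/(N+1), so the total stays below delta.
total = sum(per_round_delta(n, delta) for n in range(1, horizon + 1))
print("total failure probability up to the horizon:", total)

# The radius grows only logarithmically in n.
radii = [anytime_radius(d, n, delta) for n in (1, 10, 100, 1000)]
for n, r in zip((1, 10, 100, 1000), radii):
    print(n, r)
```

The price of holding simultaneously for all n is the extra \log(n(n+1)) inside the radius, which grows slowly compared with the Gram matrix V as samples accumulate.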