Confidence sets for Gaussian linear models

In statistical inference, we often want to construct confidence sets for a parameter of interest; the familiar 95% confidence interval is one example. A confidence set quantifies the uncertainty in an estimate. At its core, constructing one typically amounts to inverting a concentration inequality: a tail bound on the estimation error is turned into a region that contains the true parameter with high probability.

Gaussian linear model with fixed design

The setting. We observe $n$ samples $\{(Y_t, x_t)\}_{t=1}^n$ that follow the linear regression model

$Y_t = \langle x_t, \theta_\star \rangle + \eta_t,$

where the design vectors $x_t \in \mathbb{R}^d$ are fixed and known, the noise terms $\eta_t$ are drawn i.i.d. from the standard Gaussian $\mathcal{N}(0, 1)$, and the parameter of interest is $\theta_\star \in \mathbb{R}^d$. For simplicity, we assume that the Gram matrix $V := X^\top X = \sum_{t=1}^n x_t x_t^\top$ is non-singular (here, $X = [x_1^\top, \ldots, x_n^\top]^\top$ is the $n \times d$ matrix whose $t$-th row is $x_t^\top$). In vector form, we write $Y = X\theta_\star + \eta$ with $Y, \eta \in \mathbb{R}^n$.

Our goal is to construct a sequence of confidence sets $C_1, C_2, \ldots$ such that $\Pr(\exists n \ge 1: \theta_\star \notin C_n) \le \delta$ for some failure probability $\delta \in (0, 1)$, that is, at confidence level $1 - \delta$. We will start with the maximum-likelihood estimator and first construct the confidence set for a fixed $n$. Under the above assumptions, the maximum-likelihood estimator $\hat{\theta}$ is given by

$\hat{\theta} = V^{-1} X^\top Y,$
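The formula above can be sketched numerically as follows. This is a minimal illustration, assuming NumPy; the function name `mle`, the sizes `n` and `d`, the seed, and the value of `theta_star` are all hypothetical choices made for the example, not part of the original setup.

```python
import numpy as np

def mle(X, Y):
    """Return (theta_hat, V) with theta_hat = V^{-1} X^T Y and V = X^T X."""
    V = X.T @ X                               # Gram matrix, assumed non-singular
    theta_hat = np.linalg.solve(V, X.T @ Y)   # solve V theta = X^T Y without inverting V
    return theta_hat, V

rng = np.random.default_rng(0)                # arbitrary seed for reproducibility
n, d = 200, 3                                 # illustrative sample size and dimension
X = rng.normal(size=(n, d))                   # drawn once, then treated as a fixed design
theta_star = np.array([1.0, -2.0, 0.5])       # hypothetical true parameter
Y = X @ theta_star + rng.normal(size=n)       # unit-variance Gaussian noise
theta_hat, V = mle(X, Y)
```

Solving the linear system rather than forming $V^{-1}$ explicitly is a common numerical choice; it returns the same estimator.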

which is itself Gaussian, because it is a linear transformation of the Gaussian vector $Y$. In particular, we can write

$\begin{align*} \hat{\theta} &\sim \mathcal{N}(\theta_\star, V^{-1}) \\ V^{1/2}(\hat{\theta} - \theta_\star) &\sim \mathcal{N}(0, I_d) \end{align*}$
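To see where the mean and covariance come from, one can substitute $Y = X\theta_\star + \eta$ into the estimator. The following short derivation uses only the definitions above, together with $\mathbb{E}[\eta\eta^\top] = I_n$ for standard Gaussian noise:

$\begin{align*} \hat{\theta} &= V^{-1} X^\top (X\theta_\star + \eta) = \theta_\star + V^{-1} X^\top \eta, \\ \operatorname{Cov}(\hat{\theta}) &= V^{-1} X^\top \mathbb{E}[\eta\eta^\top] X V^{-1} = V^{-1} X^\top X V^{-1} = V^{-1}. \end{align*}$

Multiplying $\hat{\theta} - \theta_\star$ by $V^{1/2}$ then gives covariance $V^{1/2} V^{-1} V^{1/2} = I_d$.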

Let $Z := V^{1/2}(\hat{\theta} - \theta_\star)$. Then $\|Z\|_2^2 = \|\hat{\theta}-\theta_\star\|_{V}^2$ follows a $\chi_d^2$-distribution, that is, a chi-squared distribution with $d$ degrees of freedom. From the tail bounds of the $\chi^2$-distribution, we have

$\begin{align*} \Pr\left(\|Z\|_2^2 \geq d + 2\sqrt{d\log(1/\delta)} + 2\log(1/\delta)\right) &\leq \delta \end{align*}$

So if we define the confidence set $C_n$ as

$C_n = \left\{\theta \in \mathbb{R}^d: \|\hat{\theta}-\theta\|_{V}^2 \leq d + 2\sqrt{d\log(1/\delta)} + 2\log(1/\delta)\right\},$

then it is a $(1-\delta)$-confidence set for $\theta_\star$.
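As a sanity check, the set $C_n$ can be sketched in code and its coverage estimated by simulation. This is an illustrative sketch, assuming NumPy; the helper names `radius_sq` and `in_confidence_set`, the problem sizes, the seed, and the number of trials are hypothetical choices, not part of the original text.

```python
import numpy as np

def radius_sq(d, delta):
    """Squared radius d + 2*sqrt(d*log(1/delta)) + 2*log(1/delta)."""
    log_term = np.log(1.0 / delta)
    return d + 2.0 * np.sqrt(d * log_term) + 2.0 * log_term

def in_confidence_set(theta, theta_hat, V, delta):
    """True if ||theta_hat - theta||_V^2 is at most the squared radius."""
    diff = theta_hat - theta
    return float(diff @ V @ diff) <= radius_sq(len(theta), delta)

rng = np.random.default_rng(1)            # arbitrary seed
n, d, delta = 100, 4, 0.05                # illustrative sizes and failure probability
X = rng.normal(size=(n, d))               # fixed design, reused across all trials
V = X.T @ X
theta_star = np.zeros(d)                  # hypothetical true parameter

trials, misses = 2000, 0
for _ in range(trials):
    Y = X @ theta_star + rng.normal(size=n)       # fresh noise, same design
    theta_hat = np.linalg.solve(V, X.T @ Y)
    misses += not in_confidence_set(theta_star, theta_hat, V, delta)
miss_rate = misses / trials               # empirical frequency of theta_star falling outside C_n
```

Because the tail bound is not tight, the empirical miss rate should come out at or below $\delta$, and typically well below it.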

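To obtain a sequence of confidence sets $C_1, C_2, \ldots$ such that $\Pr(\exists n \ge 1: \theta_\star \notin C_n) \leq \delta$, we apply the bound above at each $n$ with the smaller failure probability $\delta_n := \delta / (n(n+1))$, where $\hat{\theta}$ and $V$ are computed from the first $n$ samples. Since $\sum_{n \ge 1} \frac{1}{n(n+1)} = 1$, the union bound gives a total failure probability of at most $\sum_{n \ge 1} \delta_n = \delta$. This amounts to choosing a larger confidence set for each $n$: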

$C_n = \left\{\theta \in \mathbb{R}^d: \|\hat{\theta}-\theta\|_{V}^2 \leq d + 2\sqrt{d\log(n(n+1)/\delta)} + 2\log(n(n+1)/\delta)\right\}.$
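The effect of the union bound on the radius can be illustrated with a small sketch using only the Python standard library. The function name `anytime_radius_sq` and the numerical values are hypothetical. The partial sum relies on the telescoping identity $\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1}$, so the per-step failure probabilities never add up to more than $\delta$.

```python
import math

def anytime_radius_sq(d, n, delta):
    """Squared radius of C_n when step n is given failure probability delta / (n (n + 1))."""
    delta_n = delta / (n * (n + 1))
    log_term = math.log(1.0 / delta_n)            # equals log(n (n + 1) / delta)
    return d + 2.0 * math.sqrt(d * log_term) + 2.0 * log_term

delta = 0.05                                      # illustrative overall failure probability
# Partial sum of the per-step failure probabilities; telescopes to delta * (1 - 1/(N + 1)).
N = 10_000
partial_sum = sum(delta / (n * (n + 1)) for n in range(1, N + 1))

# The radius grows only logarithmically in n.
radii = [anytime_radius_sq(3, n, delta) for n in (1, 10, 100, 1000)]
```

The price of holding simultaneously over all $n$ is thus an extra $\log(n(n+1))$ inside the logarithmic terms, not a change to the leading $d$.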

Khánh Vũ
