Bayesian Sequential Hypothesis Testing

Frequentist algorithms are often Bayesian in disguise.

We show how the mSPRT can be thought of as a Bayesian algorithm.

A Bayesian Model

Consider the following Bayesian model for the data:

\[Y_1|\alpha, \delta, M_1 \sim \mathcal{N}\left(1_{n_1} \alpha + \tfrac{1}{2} 1_{n_1} \delta, \sigma^2 I_{n_1}\right)\] \[Y_2|\alpha, \delta, M_1 \sim \mathcal{N}\left(1_{n_2} \alpha - \tfrac{1}{2} 1_{n_2} \delta, \sigma^2 I_{n_2}\right)\] \[\delta|M_1 \sim \mathcal{N} (0, \tau^2)\] \[p(\alpha|M_1) \propto 1\]

Note that the prior distribution for the lift \(\delta\) is equal to the mixing distribution in the mSPRT. If you are not familiar with this vectorized notation, an alternative notation is

$$y_{1i}|\alpha, \delta, M_1 \sim \mathcal{N} \left(\alpha + \frac{\delta}{2}, \sigma^2\right) \quad \forall i = 1, \ldots, n_1$$
$$y_{2j}|\alpha, \delta, M_1 \sim \mathcal{N} \left(\alpha - \frac{\delta}{2}, \sigma^2\right) \quad \forall j = 1, \ldots, n_2$$
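To make the model concrete, here is a minimal simulation sketch in Python (NumPy). The values for \(n_1\), \(n_2\), \(\sigma\), \(\tau\), and \(\alpha\) are purely illustrative; none of them come from the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 100, 120          # illustrative sample sizes
sigma, tau = 1.0, 0.5      # illustrative noise and prior scales
alpha = 10.0               # alpha has a flat prior; fix a value to simulate

# Draw the lift from its N(0, tau^2) prior, then the data given (alpha, delta).
delta = rng.normal(0.0, tau)
y1 = rng.normal(alpha + delta / 2, sigma, size=n1)
y2 = rng.normal(alpha - delta / 2, sigma, size=n2)

# The difference in means is centered at delta, with variance
# sigma^2 * (1/n1 + 1/n2).
print(y1.mean() - y2.mean(), "vs delta =", delta)
```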

In this model for the “alternative” we have a grand mean \(\alpha\) and a lift parameter \(\delta\).

Following a Bayesian approach, we assign \(\alpha\) a uniform prior and \(\delta\) a normal prior. Under this model the interpretation of \(\delta\) is unchanged: it is \(E[y_{1i}] - E[y_{2j}]\), i.e., the expected difference in means. Moreover, it still follows that \(\bar{Y}_1 - \bar{Y}_2 \sim \mathcal{N}\left(\delta, \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)\). Notice, however, that we have conditioned on this model being the alternative model \(M_1\). The “null” model can be expressed as

$$Y_1|\alpha, M_0 \sim \mathcal{N} (1_{n_1} \alpha, \sigma^2 I_{n_1})$$
$$Y_2|\alpha, M_0 \sim \mathcal{N} (1_{n_2} \alpha, \sigma^2 I_{n_2})$$
$$p(\alpha|M_0) \propto 1$$

in which there is no \(\delta\); i.e., it is the same as the alternative model except that the prior on \(\delta\) concentrates all of its mass at zero. A Bayesian would test the hypothesis that \(\delta \neq 0\) by looking at the Bayes factor between models \(M_1\) and \(M_0\):

\[\frac{p(Y_1, Y_2|M_1)}{p(Y_1, Y_2|M_0)} = \frac{\iint p(Y_1, Y_2|\alpha, \delta, M_1)\, p(\delta|M_1)\, d\alpha\, d\delta}{\int p(Y_1, Y_2|\alpha, M_0)\, d\alpha},\]

note that we are explicitly integrating out the nuisance intercept parameter \(\alpha\) w.r.t. a uniform prior. Let’s go ahead and compute these integrals!
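Before computing them analytically, it helps to have a brute-force numerical answer to check against later. Here is a hedged sketch using SciPy quadrature on illustrative fake data; the wide finite range for \(\alpha\) stands in for the improper flat prior, which is harmless because the likelihood decays like a Gaussian in \(\alpha\), and a constant shift keeps the integrand well scaled (it cancels in the ratio).

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(1)
n1, n2, sigma, tau = 8, 10, 1.0, 0.5      # illustrative values only
y1 = rng.normal(10.25, sigma, size=n1)    # fake data for the check
y2 = rng.normal(10.00, sigma, size=n2)

def loglik(a, d):
    """Log-likelihood of the data given alpha = a, delta = d."""
    return (stats.norm.logpdf(y1, a + d / 2, sigma).sum()
            + stats.norm.logpdf(y2, a - d / 2, sigma).sum())

# Shift by a constant so the integrands are O(1); it cancels in the ratio.
shift = loglik(np.mean(np.concatenate([y1, y2])), 0.0)

# Numerator: likelihood integrated against the N(0, tau^2) prior on delta
# and a flat prior on alpha over a wide finite range.
num, _ = integrate.dblquad(
    lambda d, a: np.exp(loglik(a, d) - shift + stats.norm.logpdf(d, 0.0, tau)),
    8.0, 12.0,   # alpha range
    -3.0, 3.0)   # delta range
# Denominator: same flat prior on alpha, with delta pinned at zero.
den, _ = integrate.quad(lambda a: np.exp(loglik(a, 0.0) - shift), 8.0, 12.0)

print("Bayes factor by quadrature:", num / den)
```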

To make the mathematics easier, it is helpful to stack the observations \(Y_1, Y_2\) into a larger multivariate normal:

\[Y = [Y_1', Y_2']' \sim \mathcal{N} (1_n\alpha + X\delta, \sigma^2I_n)\]

where \(n = n_1 + n_2\), \(1_n\) is a vector of 1’s, \(I_n\) is an identity matrix and \(X' = \frac{1}{2}[1_{n_1}', -1_{n_2}']\).

The first trick is to marginalize out the intercept parameter by computing the component of \(Y\) that is orthogonal to the column space of \(1_n\). Dropping the \(n\) subscripts from now on, let \(P_1 = 1(1'1)^{-1}1' = \frac{1}{n} 11'\) be the projection operator onto the column space of \(1\). This neatly isolates the component of the quadratic form involving \(\alpha\), i.e.

\[\begin{aligned} \|Y - 1\alpha - X\delta\|^2_2 &= (Y - 1\alpha - X\delta)'(Y - 1\alpha - X\delta) \\ &= (Y - 1\alpha - X\delta)'(P_1 + I - P_1)(Y - 1\alpha - X\delta) \\ &= \|P_1(Y - 1\alpha - X\delta)\|^2_2 + \|(I - P_1)(Y - 1\alpha - X\delta)\|^2_2 \\ &= n(\bar{Y} - \alpha - \bar{X}\delta)^2 + \|Y_c - X_c\delta\|^2_2 \end{aligned}\]

where \(Y_c\) and \(X_c\) are the centered observations and design matrix

\[Y_c = (I - P_1)Y = Y - 1 \bar{Y}\] \[X_c = (I - P_1)X = \frac{1}{2} \begin{bmatrix} 1_{n_1} \\ -1_{n_2} \end{bmatrix} - \frac{1}{2} \frac{n_1 - n_2}{n_1 + n_2} \begin{bmatrix} 1_{n_1} \\ 1_{n_2} \end{bmatrix} = \frac{1}{n_1 + n_2} \begin{bmatrix} n_2 1_{n_1} \\ -n_1 1_{n_2} \end{bmatrix}.\]

Foreshadowing a later result, it is also suggestive to note at this point that

\[X_c'X_c = \frac{n_1n_2}{n_1 + n_2} = \frac{1}{\frac{1}{n_1} + \frac{1}{n_2}}.\]
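Both the closed form for \(X_c\) and the identities above, including the quadratic-form decomposition, are easy to sanity-check numerically. A small sketch with arbitrary illustrative sizes and parameter values:

```python
import numpy as np

n1, n2 = 3, 5                 # arbitrary small sizes for the check
n = n1 + n2
ones = np.ones(n)
X = 0.5 * np.concatenate([np.ones(n1), -np.ones(n2)])
P1 = np.outer(ones, ones) / n            # projection onto span{1}
Xc = (np.eye(n) - P1) @ X                # centered design

# Closed form for X_c and the X_c'X_c identity above.
Xc_closed = np.concatenate([n2 * np.ones(n1), -n1 * np.ones(n2)]) / n
assert np.allclose(Xc, Xc_closed)
assert np.isclose(Xc @ Xc, n1 * n2 / (n1 + n2))

# The quadratic-form decomposition from the previous display.
rng = np.random.default_rng(2)
Y = rng.normal(size=n)
alpha, delta = 0.7, 0.3                  # arbitrary parameter values
Yc = Y - Y.mean()
lhs = np.sum((Y - ones * alpha - X * delta) ** 2)
rhs = n * (Y.mean() - alpha - X.mean() * delta) ** 2 + np.sum((Yc - Xc * delta) ** 2)
assert np.isclose(lhs, rhs)
```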

We have now rearranged the quadratic form in the likelihood into two terms, only one of which involves \(\alpha\); since that term is a simple Gaussian quadratic, integrating \(\alpha\) out is easy:

\[p(Y_1, Y_2|\delta, M_1) = \frac{1}{\sqrt{n}} \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n-1}{2}} \exp\left(-\frac{1}{2\sigma^2} \|Y_c - X_c\delta\|^2\right)\] \[p(Y_1, Y_2|M_0) = \frac{1}{\sqrt{n}} \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n-1}{2}} \exp\left(-\frac{1}{2\sigma^2} \|Y_c\|^2\right)\]
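As a quick numerical check of this \(\alpha\)-integrated likelihood, including the \(1/\sqrt{n}\) constant, here is a sketch with illustrative data and an arbitrary fixed \(\delta\):

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(4)
n1, n2, sigma, delta = 5, 7, 1.0, 0.4    # illustrative values
n = n1 + n2
y1, y2 = rng.normal(size=n1), rng.normal(size=n2)
Y = np.concatenate([y1, y2])
Yc = Y - Y.mean()
Xc = np.concatenate([n2 * np.ones(n1), -n1 * np.ones(n2)]) / n

def lik(a):
    """Likelihood as a function of alpha, with delta held fixed."""
    return np.exp(stats.norm.logpdf(y1, a + delta / 2, sigma).sum()
                  + stats.norm.logpdf(y2, a - delta / 2, sigma).sum())

# Integrate alpha out over a wide range (tight epsabs: the values are tiny).
quad_val, _ = integrate.quad(lik, -6.0, 6.0, epsabs=1e-13)
closed = ((2 * np.pi * sigma**2) ** (-(n - 1) / 2) / np.sqrt(n)
          * np.exp(-np.sum((Yc - Xc * delta) ** 2) / (2 * sigma**2)))
assert np.isclose(quad_val, closed)
```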

Integrating the intercept out of the model costs us a degree of freedom and leaves us working with the centered observations and design matrix. The final integration marginalizes \(\delta\) w.r.t. its prior \(\mathcal{N}(0, \tau^2)\):

\[p(Y_1, Y_2|M_1) = \frac{1}{\sqrt{n}} \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n-1}{2}} \exp\left(-\frac{1}{2\sigma^2} \|Y_c\|^2\right) \left(\frac{\frac{1}{\tau^2}}{\frac{1}{\tau^2} + \frac{X_c'X_c}{\sigma^2}}\right)^{\frac{1}{2}} \exp\left(\frac{1}{2} \left(\frac{X_c'X_c}{\sigma^2} + \frac{1}{\tau^2}\right)^{-1} \left(\frac{X_c'Y_c}{\sigma^2}\right)^2\right)\]

When forming the Bayes factor, most of these terms cancel:

\[\frac{p(Y_1, Y_2|M_1)}{p(Y_1, Y_2|M_0)} = \left(\frac{\frac{1}{\tau^2}}{\frac{1}{\tau^2} + \frac{X_c'X_c}{\sigma^2}}\right)^{\frac{1}{2}} \exp\left(\frac{1}{2} \left(\frac{X_c'X_c}{\sigma^2} + \frac{1}{\tau^2}\right)^{-1} \left(\frac{X_c'Y_c}{\sigma^2}\right)^2\right).\]

Now, \(X_c'X_c/\sigma^2 = 1/\left(\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)\), which is \(1/\rho^2\) under our earlier definition of \(\rho^2\), and

\[X_c'Y_c = X_c'Y - X_c'1 \bar{Y} = X_c'Y = \frac{n_1 n_2}{n_1 + n_2} (\bar{Y}_1 - \bar{Y}_2) = \frac{\bar{Y}_1 - \bar{Y}_2}{\frac{1}{n_1} + \frac{1}{n_2}},\]

where the second equality uses \(X_c'1 = 0\).
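This identity is likewise easy to confirm numerically on arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2 = 4, 6                            # arbitrary sizes for the check
n = n1 + n2
y1, y2 = rng.normal(size=n1), rng.normal(size=n2)
Y = np.concatenate([y1, y2])
Xc = np.concatenate([n2 * np.ones(n1), -n1 * np.ones(n2)]) / n

assert np.isclose(Xc @ Y, Xc @ (Y - Y.mean()))   # Xc'1 = 0, so centering Y is free
assert np.isclose(Xc @ (Y - Y.mean()),
                  (y1.mean() - y2.mean()) / (1 / n1 + 1 / n2))
```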

The Bayes factor then simplifies to

\[\frac{p(Y_1, Y_2|M_1)}{p(Y_1, Y_2|M_0)} = \left(\frac{\rho^2}{\rho^2 + \tau^2}\right)^{\frac{1}{2}} \exp\left(\frac{1}{2} \frac{\tau^2}{\rho^2 + \tau^2} \frac{(\bar{Y}_1 - \bar{Y}_2)^2}{\rho^2}\right) = \Lambda,\]

which is exactly the mSPRT statistic \(\Lambda\): the mixture likelihood ratio is the Bayes factor of this Bayesian model.
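As an end-to-end sanity check, the closed-form \(\Lambda\) can be compared against the brute-force quadrature from the earlier sketch, reusing the same illustrative data and hyperparameters:

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(1)            # same seed/data as the earlier sketch
n1, n2, sigma, tau = 8, 10, 1.0, 0.5
y1 = rng.normal(10.25, sigma, size=n1)
y2 = rng.normal(10.00, sigma, size=n2)

# Closed-form Lambda.
rho2 = sigma**2 * (1 / n1 + 1 / n2)
diff = y1.mean() - y2.mean()
lam = np.sqrt(rho2 / (rho2 + tau**2)) * np.exp(
    0.5 * tau**2 / (rho2 + tau**2) * diff**2 / rho2)

# Brute-force Bayes factor by quadrature, as before.
def loglik(a, d):
    return (stats.norm.logpdf(y1, a + d / 2, sigma).sum()
            + stats.norm.logpdf(y2, a - d / 2, sigma).sum())

shift = loglik(np.mean(np.concatenate([y1, y2])), 0.0)
num, _ = integrate.dblquad(
    lambda d, a: np.exp(loglik(a, d) - shift + stats.norm.logpdf(d, 0.0, tau)),
    8.0, 12.0, -3.0, 3.0)
den, _ = integrate.quad(lambda a: np.exp(loglik(a, 0.0) - shift), 8.0, 12.0)

print(lam, num / den)   # the two agree to quadrature accuracy
```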