### 1. $$Z$$-scores

GWAS analysis typically proceeds by fitting single-SNP logistic regression models.

For each SNP $$i$$ typed in the study, the following model is fitted: $\begin{equation} \text{logit}(P(Y=1|X_i=x_i))=\beta_0+x_i\beta_{i}\,, \end{equation}$ where $$Y$$ is a binary indicator of disease (0 = no disease, 1 = diseased), $$X_{i}$$ is the genotype at SNP $$i$$ (0, 1 or 2, the number of copies of the risk allele at that position), and $$\beta_{i}$$ is the regression coefficient, i.e. the log odds ratio of disease per copy of the risk allele. Unlike linear regression, the model has no additive error term: the randomness in $$Y$$ is Bernoulli, with success probability given by the inverse logit of the linear predictor.

The marginal $$Z$$ score for each SNP is obtained by dividing the estimated regression coefficient by its standard error, $\begin{equation} Z_i=\dfrac{\hat\beta_i}{\sqrt{V_i}}\,, \end{equation}$ where $$V_i=var(\hat\beta_i)$$.
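To make these two steps concrete, here is a minimal pure-Python sketch that fits the single-SNP logistic model by Newton-Raphson and returns $$\hat\beta_i$$, $$V_i$$ (the relevant diagonal entry of the inverse Fisher information at the estimate) and $$Z_i$$. The function names are hypothetical, and it assumes the data are not perfectly separated so that the maximum likelihood estimate exists; in practice dedicated GWAS software or a statistics package would do this fit.

```python
import math


def _score_and_information(b0, b1, genotypes, phenotypes):
    """Score vector and Fisher information of the logistic log-likelihood at (b0, b1)."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for x, y in zip(genotypes, phenotypes):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # fitted P(Y=1 | X=x)
        w = p * (1.0 - p)                             # Bernoulli variance weight
        g0 += y - p
        g1 += (y - p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (g0, g1), (i00, i01, i11)


def single_snp_z(genotypes, phenotypes, max_iter=100, tol=1e-10):
    """Fit logit P(Y=1|X=x) = b0 + x*b1 and return (b0_hat, b1_hat, V, Z).

    genotypes: risk-allele counts (0, 1 or 2); phenotypes: 0 = control, 1 = case.
    Assumes the genotypes are not all identical and there is no perfect separation.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = _score_and_information(b0, b1, genotypes, phenotypes)
        det = i00 * i11 - i01 * i01
        # Newton-Raphson step: solve I @ delta = score for the 2x2 information matrix
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # V = var(b1_hat): the (2, 2) entry of the inverse information at the estimate
    _, (i00, i01, i11) = _score_and_information(b0, b1, genotypes, phenotypes)
    V = i00 / (i00 * i11 - i01 * i01)
    return b0, b1, V, b1 / math.sqrt(V)


# Hypothetical data: 10 people with 0 copies (3 cases), 10 with 1 copy (6 cases)
x = [0] * 10 + [1] * 10
y = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4
b0, b1, V, Z = single_snp_z(x, y)
```

With a genotype coded 0/1 the model is saturated, so $$\hat\beta_i$$ reproduces the log odds ratio of the 2×2 table, log(3.5), and $$V_i$$ reproduces Woolf's variance 1/3 + 1/7 + 1/6 + 1/4, which makes a convenient sanity check.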

### 2. Posterior probabilities of causality

The posterior probability of causality (PP) for each SNP $$i$$ in an associated genomic region containing $$k$$ SNPs is $\begin{equation} PP_i=P(\beta_i \sim N(0,W),\text{ }i \text{ causal}|D)\,, \quad i \in \{1,\ldots,k\}\,, \end{equation}$

where $$D$$ is the data for the entire genomic region (genotypes, coded as 0, 1 or 2 copies of the minor allele, together with disease status) and $$W$$ is the prior variance of the true log odds ratio, chosen to reflect the researcher’s prior belief about how large effects are likely to be. We set the prior standard deviation to $$\sqrt{W}=0.2$$ (i.e. $$W=0.2^2=0.04$$), reflecting a belief that 95% of ORs lie between $$\exp(-1.96\times 0.2)=0.68$$ and $$\exp(1.96\times 0.2)=1.48$$.
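Because $$\beta_i\sim N(0,W)$$ makes $$W$$ a variance, the prior interval for the OR is governed by the prior standard deviation $$\sqrt{W}$$. The small sketch below (function names hypothetical) checks the arithmetic for $$\sqrt{W}=0.2$$ and inverts it, which is one way to choose $$W$$ from a plausible upper limit for the OR.

```python
import math


def prior_or_interval(W, z=1.96):
    """Central 95% interval for the OR implied by beta ~ N(0, W), W a variance."""
    sd = math.sqrt(W)
    return math.exp(-z * sd), math.exp(z * sd)


def W_from_or_upper(or_upper, z=1.96):
    """Prior variance W whose 95% interval for the OR has upper limit or_upper."""
    return (math.log(or_upper) / z) ** 2


lo, hi = prior_or_interval(0.2 ** 2)
print(round(lo, 2), round(hi, 2))  # 0.68 1.48
```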

Bayes' theorem can be used to rewrite this in terms of the likelihood and the prior, \begin{equation} \begin{aligned} PP_i=P(\beta_i \sim N(0,W),\text{ }i \text{ causal}|D)\propto P(D|\beta_i\sim N(0,W),\text{ }i\text{ causal})\times P(\beta_i \sim N(0,W),\text{ }i\text{ causal}). \end{aligned} \end{equation}

The prior term, $$P(\beta_i \sim N(0,W),\text{ }i\text{ causal})$$, is simple because each of the $$k$$ SNPs is assumed to be equally likely to be causal, i.e. $$P(\beta_i \sim N(0,W),\text{ }i\text{ causal})=\frac{1}{k}$$.

The likelihood requires more thought. Assume that there is exactly one causal variant (CV) per region and that it is typed in the study. If SNP $$i$$ is causal, then $$\beta_i\neq 0$$, while $$\beta_j$$ (for $$j\neq i$$) is non-zero only through LD between SNPs $$i$$ and $$j$$, so that,

\begin{equation} \begin{aligned} P(D|\beta_i\sim N(0,W),\text{ }i\text{ causal}) = P(D_i|\beta_i\sim N(0,W),\text{ }i\text{ causal}) \times P(D_{-i}|D_i,\text{ }\beta_i\sim N(0,W),\text{ }i\text{ causal}) \\ = P(D_i |\beta_i\sim N(0,W),\text{ }i\text{ causal}) \times P(D_{-i}|D_i,\text{ }i\text{ causal})\,, \end{aligned} \end{equation}

since $$D_{-i}$$ is independent of $$\beta_i$$ given $$D_i$$ ($$D_i$$ and $$D_{-i}$$ are the genotype data at SNP $$i$$ and at the remaining SNPs in the genomic region, respectively).

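Substituting this form of the likelihood into the equation for the PPs, and noting that the prior $$\frac{1}{k}$$ is the same for every SNP, gives

$\begin{equation} PP_i\propto P(D_i|\beta_i \sim N(0,W),\text{ }i \text{ causal})\times P(D_{-i}|D_i,\text{ }i\text{ causal})\,. \end{equation}$

The second factor still depends on $$i$$, so it cannot simply be dropped. Instead, we divide by the probability of the data under the null hypothesis of no genetic effect, $$P(D|H_0)=P(D_i|H_0)\times P(D_{-i}|D_i,H_0)$$, which does not depend on $$i$$ and therefore preserves proportionality. When SNP $$i$$ is causal, the remaining SNPs carry no information about disease status beyond that already in $$D_i$$, so $$P(D_{-i}|D_i,\text{ }i\text{ causal})=P(D_{-i}|D_i,H_0)$$ (both reflect only the LD between SNP $$i$$ and the other SNPs) and these terms cancel, giving

$\begin{equation} PP_i\propto \frac{P(D_i|\beta_i \sim N(0,W),\text{ }i \text{ causal})}{P(D_i|H_0)}= BF_i\,, \end{equation}$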

where $$BF_i$$ is the Bayes factor for SNP $$i$$, measuring the ratio of the probabilities of the data at SNP $$i$$ given the alternative (SNP $$i$$ is causal) and the null (no genetic effect) models.
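Because exactly one SNP in the region is assumed causal, the PPs must sum to one over the $$k$$ SNPs, so proportionality fixes them: $$PP_i=BF_i/\sum_{j=1}^{k}BF_j$$. The sketch below (function name hypothetical) does this rescaling on the log scale, since BFs for strongly associated SNPs can exceed the range of floating-point numbers (for instance, `math.exp(900.0)` raises `OverflowError`).

```python
import math


def posterior_probs(log_bfs):
    """PP_i = BF_i / sum_j BF_j, computed from log Bayes factors via log-sum-exp.

    Assumes exactly one causal variant in the region and a uniform 1/k prior,
    which is common to all SNPs and therefore cancels when normalizing.
    """
    m = max(log_bfs)  # subtract the maximum so exp() cannot overflow
    weights = [math.exp(lb - m) for lb in log_bfs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log Bayes factors for a 3-SNP region
pps = posterior_probs([900.0, 899.0, 10.0])
```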

This means that the PPs are proportional to the per-SNP BFs, which can conveniently be approximated using Wakefield’s asymptotic Bayes factor (ABF).

Given that $$\hat\beta_i\sim N(\beta_i,V_i)$$ and $$\beta_i\sim N(0,W)$$,

$\begin{equation} ABF_i=\sqrt{\frac{V_i}{V_i+W}}exp\left(\frac{Z_i^2}{2}\frac{W}{(V_i+W)}\right)\,, \end{equation}$ where $$Z_i^2=\dfrac{\hat\beta_i^2}{V_i}$$ is the squared marginal $$Z$$ score for SNP $$i$$.
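The formula follows because, once $$\beta_i$$ is integrated out, $$\hat\beta_i\sim N(0,V_i+W)$$ under the alternative while $$\hat\beta_i\sim N(0,V_i)$$ under the null, so the ABF is the ratio of these two normal densities evaluated at $$\hat\beta_i$$. The sketch below (function names and summary statistics hypothetical) computes $$\log ABF_i$$ from $$Z_i$$, $$V_i$$ and $$W$$, working on the log scale to avoid overflow for large $$|Z_i|$$, and checks it against the explicit density ratio.

```python
import math


def log_abf(z, V, W):
    """Natural log of Wakefield's ABF (causal vs null) for a single SNP.

    z: marginal Z score, V: var(beta_hat), W: prior variance of beta.
    """
    r = W / (V + W)  # shrinkage factor W / (V + W)
    return 0.5 * math.log(V / (V + W)) + 0.5 * z * z * r


def normal_logpdf(x, var):
    """Log density of N(0, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - x * x / (2.0 * var)


def log_abf_from_densities(beta_hat, V, W):
    """Same quantity as a ratio of densities of beta_hat:
    N(0, V + W) under the alternative vs N(0, V) under the null."""
    return normal_logpdf(beta_hat, V + W) - normal_logpdf(beta_hat, V)


# Hypothetical summary statistics for one SNP, with prior sd sqrt(W) = 0.2
beta_hat, V, W = 0.3, 0.01, 0.2 ** 2
z = beta_hat / math.sqrt(V)
print(log_abf(z, V, W), log_abf_from_densities(beta_hat, V, W))
```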