1. Introduction
  2. Simple Example
  3. Realistic Example
  4. Comparison with Bayesian Approach
  5. Notes and Queries


The SUm of SIngle Effects (SuSiE) model extends single effect regression (SER), in which the vector of coefficients has exactly one non-zero entry, representing the single effect variable. SuSiE allows for multiple effect variables by writing the vector of coefficients, \(\pmb{b}\), as the sum of \(L\) single-effect vectors, \(\pmb{b_1},\dots,\pmb{b_L}\). So instead of a single \(\pmb{b}\) vector with one non-zero entry, there are now \(L\) of them. That is,

\[\pmb{y}=\pmb{Xb}+\pmb{e}\] \[\pmb{e} \sim N(0, \sigma^2I_n)\]

Here \(\sigma^2\) is the residual variance.

\[\pmb{b}=\sum_{l=1}^L \pmb{b_l}\] so that \(\pmb{b}\) is the sum of \(L\) vectors, each with exactly one non-zero element.

\[\pmb{b_l}=\pmb{\gamma_l} b_l\]

Non-bold \(b_l\) is a scalar for the effect size of the non-zero element, with

\[b_l \sim N(0, \sigma^2_{0l})\] and \(\pmb\gamma_l\) is a binary vector with exactly one non-zero element.

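\[\pmb{\gamma_l} \sim \text{Mult}(1, \pmb{\pi})\] where \(\pmb{\pi}\) is the vector of prior probabilities that each variable is the effect variable.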

Note that \(\sigma^2_{0l}\) can vary among components, and that if \(L=1\) this is just the standard SER model. For now, \(\sigma^2\) (the residual variance) and \(\pmb{\sigma^2_0}=(\sigma^2_{01},\dots,\sigma^2_{0L})\) (the prior variances of the non-zero effects) are assumed known, but they can be estimated as hyper-parameters using an empirical Bayes approach.

Some of the \(\bf{b_l}\) vectors may be the same (having the same non-zero entry) and thus, at most \(L\) variables have non-zero coefficients in the model.
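To make the generative model above concrete, here is a minimal simulation sketch in Python with NumPy. The function name `simulate_susie`, the standard-normal design matrix, the uniform default for \(\pmb{\pi}\), and the unit default variances are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def simulate_susie(n, p, L, sigma2=1.0, sigma0_2=None, pi=None, seed=0):
    """Draw (X, y) from the SuSiE model; illustrative defaults only."""
    rng = np.random.default_rng(seed)
    # Prior effect variances sigma^2_{0l}, one per single-effect component.
    sigma0_2 = np.ones(L) if sigma0_2 is None else np.asarray(sigma0_2, dtype=float)
    # Prior probabilities pi over which variable carries the effect.
    pi = np.full(p, 1.0 / p) if pi is None else np.asarray(pi, dtype=float)

    X = rng.standard_normal((n, p))  # assumed design; any n x p matrix would do
    B = np.zeros((L, p))             # row l holds the single-effect vector b_l
    for l in range(L):
        gamma_l = rng.multinomial(1, pi)                  # gamma_l ~ Mult(1, pi)
        b_l = rng.normal(0.0, np.sqrt(sigma0_2[l]))       # b_l ~ N(0, sigma^2_{0l})
        B[l] = gamma_l * b_l
    b = B.sum(axis=0)                                     # b = sum_l b_l
    y = X @ b + rng.normal(0.0, np.sqrt(sigma2), size=n)  # y = Xb + e
    return X, y, B, b
```

Because two rows of `B` can pick the same variable, `b` has at most `L` non-zero entries, matching the remark above.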

Two key advantages:

  1. Simple method for computing approximate posterior distributions.

  2. Simple way to calculate credible sets.

    • Each \(\pmb{b_l}\) captures only one effect, and therefore the posterior distribution on each \(\pmb{\gamma_l}\) can be used to compute a credible set that has a high probability of containing an effect variable.
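As a sketch of how a credible set could be read off the posterior on one \(\pmb{\gamma_l}\): sort the variables by posterior probability and keep adding them until the cumulative probability reaches the target coverage. The function name `credible_set` and the 0.95 default are assumptions for illustration.

```python
import numpy as np

def credible_set(alpha, coverage=0.95):
    """Smallest set of variable indices whose posterior mass reaches `coverage`.

    `alpha` is the posterior probability vector for one gamma_l (sums to 1).
    """
    alpha = np.asarray(alpha, dtype=float)
    order = np.argsort(-alpha, kind="stable")  # most probable variables first
    cum = np.cumsum(alpha[order])
    # Index of the first position where the cumulative mass reaches coverage.
    k = int(np.searchsorted(cum, coverage)) + 1
    return order[:min(k, alpha.size)]
```

A sharply peaked `alpha` gives a small set, while a flat `alpha` gives a set containing nearly every variable.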

Fitting SuSiE

The authors developed “Iterative Bayesian Stepwise Selection” (IBSS) to fit the model. It rests on the observation that if \(\pmb{b_1},\dots,\pmb{b_{L-1}}\) are known, then estimating \(\pmb{b_L}\) only requires fitting the simpler SER model: subtract the contribution of the known \(\pmb{b_l}\)s from \(\pmb{y}\) and run SER on the residuals (single-SNP Bayes factors (BFs), etc.). This suggests an iterative approach.
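The SER step itself can be sketched as follows. This is a minimal version under stated assumptions: \(\sigma^2\) and \(\sigma^2_0\) are known, \(\pmb{y}\) and the columns of \(\pmb{X}\) are centred (no intercept), and the single-SNP Bayes factor is the standard normal-prior form comparing each one-variable regression with the null. The name `fit_ser` is hypothetical.

```python
import numpy as np

def fit_ser(X, y, sigma2, sigma0_2, pi=None):
    """Single effect regression: posterior over which variable is the effect."""
    n, p = X.shape
    pi = np.full(p, 1.0 / p) if pi is None else np.asarray(pi, dtype=float)

    xtx = np.einsum("ij,ij->j", X, X)  # x_j' x_j for each column j
    bhat = X.T @ y / xtx               # univariate least-squares estimates
    s2 = sigma2 / xtx                  # their sampling variances
    z2 = bhat**2 / s2                  # squared z-scores

    # Log Bayes factor of "variable j is the effect" against the null.
    log_bf = 0.5 * np.log(s2 / (sigma0_2 + s2)) + 0.5 * z2 * sigma0_2 / (sigma0_2 + s2)

    # Posterior probabilities: alpha_j proportional to pi_j * BF_j.
    log_w = np.log(pi) + log_bf
    log_w -= log_w.max()               # stabilise the exponentials
    alpha = np.exp(log_w)
    alpha /= alpha.sum()

    # Posterior of the effect size, conditional on variable j being the effect.
    post_var = 1.0 / (1.0 / s2 + 1.0 / sigma0_2)
    post_mean = post_var / s2 * bhat
    return alpha, post_mean, post_var
```

Working with log Bayes factors avoids overflow when a variable has a very large z-score.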

Basic idea: cycle through \(l=1,\dots,L\) and, at each step, calculate the residuals after removing all the effects except the \(l\)-th, then use the SER model to fit the one left out (pretending the others are known). Repeat until the fit stops changing.

This is similar to forward selection, but instead of choosing the single best variable at each step, it computes a distribution on which variable to select, thus capturing uncertainty in the selected variable (which can be used to form credible sets). That is, for each SNP \(j\) it calculates its Bayes factor (BF) and posterior probability (PP), which gives a ‘weight’ on how good that SNP is, rather than just including the best one as forward selection does. When calculating the residuals, a PP-weighted average of the SNP effects is removed. Unlike forward selection, the procedure also works “backwards”: it keeps cycling through the components, so effects fitted earlier are re-estimated given those fitted later.
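The loop described above can be sketched as below. This is an illustrative simplification, not the authors' implementation: it fixes \(\sigma^2\) and a single shared \(\sigma^2_0\), uses a uniform prior \(\pmb{\pi}\), assumes centred data, and stops when the posterior probabilities change by less than a tolerance. The names `ser_step` and `ibss`, and the stopping rule, are assumptions.

```python
import numpy as np

def ser_step(X, r, sigma2, sigma0_2):
    """Compact SER fit on residuals r (uniform prior over variables)."""
    xtx = np.einsum("ij,ij->j", X, X)
    bhat = X.T @ r / xtx
    s2 = sigma2 / xtx
    log_bf = 0.5 * np.log(s2 / (sigma0_2 + s2)) \
        + 0.5 * (bhat**2 / s2) * sigma0_2 / (sigma0_2 + s2)
    w = np.exp(log_bf - log_bf.max())
    alpha = w / w.sum()                               # PP for each variable
    mu = bhat / (1.0 + s2 / sigma0_2)                 # posterior mean effect
    return alpha, mu

def ibss(X, y, L=10, sigma2=1.0, sigma0_2=1.0, n_iter=100, tol=1e-6):
    """Iteratively refit each single effect on the residuals of the others."""
    n, p = X.shape
    alpha = np.full((L, p), 1.0 / p)  # row l: posterior on gamma_l
    mu = np.zeros((L, p))             # row l: conditional posterior means
    for _ in range(n_iter):
        alpha_old = alpha.copy()
        for l in range(L):
            b_bar = alpha * mu        # PP-weighted mean of every b_l
            # Remove the expected contribution of all components except l.
            r = y - X @ (b_bar.sum(axis=0) - b_bar[l])
            alpha[l], mu[l] = ser_step(X, r, sigma2, sigma0_2)
        if np.max(np.abs(alpha - alpha_old)) < tol:
            break
    return alpha, mu
```

The weighted average appears in `b_bar = alpha * mu`: every SNP contributes to the fitted values in proportion to its posterior probability, rather than only the top SNP being removed. A per-SNP summary such as `1 - np.prod(1 - alpha, axis=0)` then gives the probability that a SNP is picked by at least one component.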

It is shown in the paper that overestimating the value for \(L\) is better than underestimating it, so \(L\) should be treated as an upper bound on the number of causal SNPs. This doesn’t turn out to be much of a problem: if there are, say, 2 effects and we’ve chosen \(L=10\), then the posterior probabilities of the remaining 8 components will be spread over a very large number of SNPs, as the model “doesn’t know where to put” them. The result is a very big credible set of not very correlated variants, and such sets can simply be removed.
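One way to remove such diffuse credible sets is to score each set by the smallest absolute pairwise correlation among its variants and drop the sets that score too low. The sketch below assumes this rule; the names `purity` and `filter_credible_sets` and the 0.5 default threshold are illustrative choices rather than fixed parts of the method.

```python
import numpy as np

def purity(X, cs):
    """Smallest absolute pairwise correlation among the variants in `cs`."""
    cs = np.asarray(cs)
    if cs.size == 1:
        return 1.0                    # a single variant is trivially "pure"
    R = np.abs(np.corrcoef(X[:, cs], rowvar=False))
    upper = np.triu_indices(cs.size, k=1)  # each pair once, no diagonal
    return float(R[upper].min())

def filter_credible_sets(X, sets, min_purity=0.5):
    """Keep only the credible sets whose variants are mutually correlated."""
    return [cs for cs in sets if purity(X, cs) >= min_purity]
```

A set made up of a causal SNP and its close proxies passes the filter, whereas a large set of roughly independent SNPs from a surplus component does not.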