The methods discussed here are concerned with estimating the enrichment of GWAS signals in specific functional annotations or cell types/tissues, either as an intermediate step towards answering a different biological question or as the ultimate goal of the method. An example would be finding that Crohn’s disease associated variants are enriched in open chromatin regions of a specific blood cell type. Such findings may help to identify disease-relevant cell types/tissues for the development of therapeutic targets, or to improve fine-mapping by reweighting association statistics (e.g. using enrichment information to define prior probabilities of causality in Bayesian fine-mapping).
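To make the fine-mapping use case concrete, the sketch below shows one common way enrichment can enter Bayesian fine-mapping. Under the simplifying assumption of a single causal variant per locus, each variant's posterior probability of causality is proportional to its prior times its Bayes factor, so up-weighting the prior of annotated variants shifts posterior mass towards them. The fold-enrichment-based prior and the function name `posterior_inclusion_probs` are illustrative assumptions, not the scheme used by any particular tool.

```python
"""Sketch: enrichment-informed priors in single-causal-variant fine-mapping.

Assumptions (illustrative only):
  * exactly one causal variant in the locus;
  * each variant already has an approximate Bayes factor (BF);
  * annotated variants get prior weight equal to a fold-enrichment estimate,
    unannotated variants get weight 1.
"""
from typing import List, Sequence


def posterior_inclusion_probs(bayes_factors: Sequence[float],
                              in_annotation: Sequence[bool],
                              fold_enrichment: float) -> List[float]:
    if len(bayes_factors) != len(in_annotation):
        raise ValueError("inputs must have the same length")
    # Unnormalised prior weights: annotated variants are up-weighted.
    weights = [fold_enrichment if a else 1.0 for a in in_annotation]
    total_w = sum(weights)
    priors = [w / total_w for w in weights]
    # With one causal variant, posterior is proportional to prior * BF.
    unnorm = [p * bf for p, bf in zip(priors, bayes_factors)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Two variants with equal evidence: the annotated one gains posterior mass.
pp = posterior_inclusion_probs([10.0, 10.0, 1.0], [True, False, False], 4.0)
```

With `fold_enrichment=1.0` the priors are uniform and the posteriors depend on the Bayes factors alone, which is a useful sanity check.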

The most basic functional enrichment methods test for enrichment by comparing association P values across the full set of GWAS variants (or only those reaching genome-wide significance). The variants are split into two sets of P values: those from variants overlapping the annotation and those from variants not overlapping it. A standard statistical test (e.g. the Kolmogorov-Smirnov test) is then used to test whether the two sets of P values are drawn from the same distribution; a significant difference, with smaller P values among the annotated variants, is taken as evidence of enrichment. This approach has highlighted some valid findings, but statistical concerns (e.g. failure to control for confounders) make unexpected results hard to trust.
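A minimal sketch of this basic, unadjusted test, assuming each variant comes with an association P value and a boolean flag for whether it overlaps the annotation. The two-sample Kolmogorov-Smirnov statistic is written by hand with the asymptotic Kolmogorov approximation for its P value so that only the standard library is needed (in practice `scipy.stats.ks_2samp` would be the usual choice). The function names are hypothetical, and no confounders are controlled for, which is exactly the weakness described above.

```python
import math
from typing import Sequence, Tuple


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step past every value <= x in both samples (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d: float, n: int, m: int, terms: int = 100) -> float:
    """Asymptotic two-sided P value; unreliable for very small samples."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the P value is ~1
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def annotation_ks_test(pvals: Sequence[float],
                       in_annotation: Sequence[bool]) -> Tuple[float, float]:
    """Compare GWAS P values of annotated vs unannotated variants."""
    inside = [p for p, a in zip(pvals, in_annotation) if a]
    outside = [p for p, a in zip(pvals, in_annotation) if not a]
    if not inside or not outside:
        raise ValueError("need variants both inside and outside the annotation")
    d = ks_statistic(inside, outside)
    return d, ks_pvalue(d, len(inside), len(outside))
```

For example, `annotation_ks_test([0.001, 0.2, 0.002, 0.5, 0.01, 0.7, 0.05, 0.9], [True, False] * 4)` returns `D = 1.0` because every annotated P value is smaller than every unannotated one.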

Because of these confounding effects, the P values need to be adjusted using a well-specified null distribution of the test statistic. For example, a method that ignores the non-random distribution of SNPs and annotations across the genome may produce spurious results. Existing methods vary in how they define this null distribution and deal with potential confounding. For example, GoShifter uses random permutations to estimate the empirical enrichment under the null; GREGOR uses 500 control SNPs matched to each index SNP on (i) number of variants in LD, (ii) MAF and (iii) distance to nearest gene to assess enrichment; and GARFIELD uses feature matching (on (i) distance to the nearest TSS and (ii) number of LD proxies) within a logistic regression model to quantify enrichment.
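The matched-SNP idea can be sketched as follows, as a simplified illustration rather than GREGOR's actual algorithm. Each SNP is assumed to be a dict with hypothetical keys `maf`, `n_ld` (number of LD proxies), `dist` (bp to nearest gene) and `annot` (overlaps the annotation); the matching bins are arbitrary choices made here. Every index SNP is repeatedly replaced by a random control SNP from the same bin, which yields a null distribution of annotation overlap counts that already reflects those confounders.

```python
import random
from typing import Dict, List, Sequence, Tuple

Snp = Dict[str, float]
Key = Tuple[int, int, int]


def match_key(snp: Snp) -> Key:
    """Coarse matching bins (illustrative choices, not GREGOR's)."""
    return (
        int(snp["maf"] * 10),                 # MAF in 0.1-wide bins
        min(int(snp["n_ld"]) // 10, 5),       # LD-proxy count, bins of 10, capped
        min(int(snp["dist"]) // 50_000, 10),  # distance to gene, 50 kb bins, capped
    )


def matched_enrichment(index_snps: Sequence[Snp], background: Sequence[Snp],
                       n_draws: int = 500, seed: int = 1) -> Tuple[int, float, float]:
    """Return (observed overlaps, fold enrichment, empirical P value)."""
    rng = random.Random(seed)
    pool: Dict[Key, List[Snp]] = {}
    for snp in background:
        pool.setdefault(match_key(snp), []).append(snp)
    keys = [match_key(snp) for snp in index_snps]
    if any(k not in pool for k in keys):
        raise ValueError("no background SNP matches some index SNP")
    observed = sum(bool(snp["annot"]) for snp in index_snps)
    # Each draw replaces every index SNP by a random matched control SNP.
    null_counts = [
        sum(bool(rng.choice(pool[k])["annot"]) for k in keys)
        for _ in range(n_draws)
    ]
    exceed = sum(c >= observed for c in null_counts)
    mean_null = sum(null_counts) / n_draws
    fold = observed / mean_null if mean_null > 0 else float("inf")
    # +1 in numerator and denominator keeps the empirical P value above zero.
    return observed, fold, (exceed + 1) / (n_draws + 1)
```

Because the null counts come from SNPs with the same MAF, LD and gene-distance profile, an annotation that is simply denser near genes no longer produces an apparent enrichment on its own.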

The existing methods discussed can be broadly grouped into:

  1. Matched SNP set methods (control SNPs are matched to the associated SNPs on confounding features, and the analysis is repeated many times on matched sets to compute null enrichment statistics, which are then used to adjust the observed P value for these confounders).

  2. Circularised permutation methods (originally based on LD-block subsampling methods; the positions of SNPs and annotations are circularly shifted relative to each other to generate a set of empirically derived null statistics, and the empirical P value is computed as the proportion of null statistics exceeding the observed overlap proportion).

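  3. Statistical modelling methods (enrichment is estimated as a parameter of a statistical model fitted to the association data, e.g. a regression or hierarchical model).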
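The circularised permutation idea from the second group can be sketched on a single circular region, assuming SNP positions and annotation intervals (half-open `[start, end)`, non-overlapping) are given on a region of known length. All annotation intervals are shifted by the same random offset with wrap-around, which preserves their sizes and spacing while breaking their alignment with the SNPs. This is a simplified illustration with hypothetical function names; implementations differ in the region over which the shift is applied. The empirical P value adds 1 to the numerator and denominator, a common correction so it cannot be exactly zero.

```python
import bisect
import random
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def overlap_fraction(snps: Sequence[int], intervals: Sequence[Interval]) -> float:
    """Fraction of SNP positions inside any (sorted, non-overlapping) interval."""
    starts = [s for s, _ in intervals]
    hits = 0
    for pos in snps:
        i = bisect.bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits / len(snps)


def shift_intervals(intervals: Sequence[Interval], offset: int,
                    length: int) -> List[Interval]:
    """Circularly shift intervals, splitting any that wrap past the end."""
    out: List[Interval] = []
    for start, end in intervals:
        s = (start + offset) % length
        e = s + (end - start)
        if e <= length:
            out.append((s, e))
        else:
            out.append((s, length))
            out.append((0, e - length))
    out.sort()
    return out


def circular_permutation_test(snps: Sequence[int], intervals: Sequence[Interval],
                              length: int, n_perm: int = 1000,
                              seed: int = 1) -> Tuple[float, float]:
    """Return (observed overlap fraction, empirical P value)."""
    rng = random.Random(seed)
    intervals = sorted(intervals)
    observed = overlap_fraction(snps, intervals)
    exceed = 0
    for _ in range(n_perm):
        shifted = shift_intervals(intervals, rng.randrange(length), length)
        if overlap_fraction(snps, shifted) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Shifting rather than randomly relocating each interval keeps the local clustering of the annotation intact, which is one of the sources of confounding a naive permutation would ignore.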


The main existing methods that integrate GWAS findings with functional annotations are:

1. GPA (2014):

2. fgwas (2014):

3. (Stratified) LD score regression (2015):

4. GoShifter (2015):

5. GREGOR (2015):

6. GARFIELD (2019):


Limitations