Identifying a disease-associated genomic locus is only a first step towards a better understanding of the biology of disease. Fine-mapping may then be used to identify the specific causal variant within the associated locus, after which the target gene and the mechanisms (which are often tissue-specific) still need to be identified. This is difficult partly because >90% of disease-associated variants are located in non-protein-coding regions of the genome, and many lie far from the nearest known gene 1,2.
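The fine-mapping step can be illustrated with one standard approach that these notes do not name: Wakefield's approximate Bayes factors (ABFs) computed from GWAS effect sizes and standard errors. This is a minimal sketch that assumes exactly one causal variant in the locus and an equal prior probability for every SNP. The prior effect variance `w = 0.04` and the 95% credible-set coverage are common but assumed choices, and the SNP IDs and numbers are made up.

```python
import math

def log_abf(beta: float, se: float, w: float = 0.04) -> float:
    """Wakefield's approximate Bayes factor for association, natural-log scale.

    beta, se: GWAS effect estimate and its standard error for one SNP.
    w: assumed prior variance of the true effect (0.04 means a prior SD of 0.2).
    """
    v = se ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1 - r) + 0.5 * r * z ** 2

def fine_map(snps: dict[str, tuple[float, float]], coverage: float = 0.95):
    """Posterior inclusion probabilities (PIPs) and a credible set, assuming
    one causal variant in the locus and an equal prior for every SNP."""
    labf = {snp: log_abf(beta, se) for snp, (beta, se) in snps.items()}
    top = max(labf.values())  # subtract the maximum to avoid overflow in exp()
    weights = {snp: math.exp(value - top) for snp, value in labf.items()}
    total = sum(weights.values())
    pips = {snp: weight / total for snp, weight in weights.items()}

    # Smallest set of SNPs whose PIPs add up to at least `coverage`
    credible_set, cumulative = [], 0.0
    for snp, pip in sorted(pips.items(), key=lambda kv: kv[1], reverse=True):
        credible_set.append(snp)
        cumulative += pip
        if cumulative >= coverage:
            break
    return pips, credible_set

# Hypothetical locus: (beta, SE) for each SNP
pips, credible_set = fine_map({"rsA": (0.30, 0.04), "rsB": (0.25, 0.04), "rsC": (0.05, 0.04)})
print(credible_set)  # ['rsA']
```

Working on the log scale and subtracting the largest log ABF keeps `exp()` from overflowing when z-scores are large, which is typical at genome-wide significant loci.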


General Results 3


Testing the function of a regulatory variant 4

Suppose that we have a causal cis-regulatory variant and we want to test its function, that is, whether it, e.g., alters gene expression, affects a binding site or disrupts protein structure.
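A common first check for whether a variant affects a binding site is to compare a transcription-factor position weight matrix (PWM) score for the reference and alternative alleles. The sketch below uses a made-up 4-bp motif with toy ±1 log-odds scores; a real analysis would use a curated matrix from a motif database. It scans both strands over every window that overlaps the variant. A negative delta suggests that the alternative allele weakens the predicted site.

```python
# Toy position weight matrix (PWM) scan: does a SNP change the best motif match?
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def toy_pwm(consensus: str, match: float = 1.0, mismatch: float = -1.0) -> list[dict[str, float]]:
    """Hypothetical log-odds PWM: +match for the consensus base, mismatch otherwise."""
    return [{b: (match if b == c else mismatch) for b in "ACGT"} for c in consensus]

def score(window: str, pwm: list[dict[str, float]]) -> float:
    """Sum of per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_overlapping_score(seq: str, var_pos: int, pwm) -> float:
    """Best PWM score, on either strand, among windows that contain var_pos."""
    k = len(pwm)
    best = float("-inf")
    for start in range(max(0, var_pos - k + 1), min(var_pos, len(seq) - k) + 1):
        window = seq[start:start + k]
        rev_comp = window.translate(COMPLEMENT)[::-1]
        best = max(best, score(window, pwm), score(rev_comp, pwm))
    return best

def allele_delta(seq: str, var_pos: int, ref: str, alt: str, pwm) -> float:
    """PWM score change (alt minus ref) at a single-nucleotide variant."""
    if seq[var_pos] != ref:
        raise ValueError("reference allele does not match the sequence")
    alt_seq = seq[:var_pos] + alt + seq[var_pos + 1:]
    return best_overlapping_score(alt_seq, var_pos, pwm) - best_overlapping_score(seq, var_pos, pwm)

pwm = toy_pwm("GATA")
print(allele_delta("CCGATACC", 3, "A", "C", pwm))  # -2.0
```

A change in predicted binding does not by itself show an effect on expression. It is one piece of evidence to combine with the expression-based tests discussed in these notes.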


Linking to target genes

There are three main lines of evidence for linking variants to their target genes:

  1. Physical contact (Hi-C)
  2. Functional (correlation of regulatory activity across the genome, e.g. using chromatin summary tracks)
  3. Genetic (eQTL analysis: linking genetic variants to the expression of particular genes)
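The genetic line of evidence can be illustrated with the simplest form of eQTL test: an ordinary least-squares regression of a gene's expression on genotype dosage (0, 1 or 2 copies of the alternative allele) across individuals. This is a minimal sketch with made-up data. Real eQTL pipelines also adjust for covariates, such as ancestry principal components and hidden expression factors, and correct for testing many variant-gene pairs.

```python
import math

def eqtl_test(dosages: list[float], expression: list[float]) -> tuple[float, float, float]:
    """Regress expression on genotype dosage by ordinary least squares.

    Returns (beta, standard error of beta, t statistic). beta is the change in
    expression per copy of the alternative allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matching genotype and expression values for at least 3 individuals")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se

# Hypothetical data: six individuals, alternative-allele dosage and expression level
beta, se, t = eqtl_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(f"beta={beta:.2f}, se={se:.3f}, t={t:.1f}")  # beta=1.00, se=0.061, t=16.3
```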

Determining molecular function 6


Summary

We now have vast amounts of GWAS data linking genomic loci to complex diseases, and the focus should now shift to finding meaning in these associations. Rather than only finding disease-associated genetic variation, we need to consider the intermediaries in this process. For example:

Note that once we move out of the genetic space, effects can be bi-directional: e.g. the disease could be affecting gene expression elsewhere in the genome, rather than the genetic basis of the disease affecting that gene expression, or the association may simply reflect correlation rather than causation.

“We thus suggest that an increased emphasis on the downstream functional dissection of already-identified GWAS loci, rather than a search for ever more GWAS loci, might be most likely to benefit knowledge of pathophysiology” 7.


Methods


E.g. GRAM

A generalized model to predict the molecular effect of a non-coding variant in a cell-type specific manner.

GRAM predicts the expression-modulating effect of a non-coding variant in a cell-type-specific manner, i.e. it estimates the expression consequence of a non-coding variant.

This method has been applied to fine-mapping the causal variants in 5 LD blocks associated with prostate cancer. It requires gene expression data and SELEX-based DeepBind scores (https://www.nature.com/articles/nmeth.3559). A total of 561 eQTL SNPs were identified in the 5 LD blocks, and “GRAMMAR” was used to obtain a prediction score for each allele in each patient.
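The published GRAM model is not reproduced here. As a purely illustrative toy, and an assumption rather than GRAM's actual formula, the sketch below shows why pairing allele-specific binding scores with expression data makes a prediction patient- or cell-type-specific. A change in the predicted binding of a transcription factor (TF) only matters in samples where that TF is expressed. The TF names and values are hypothetical.

```python
def allele_score(binding: dict[str, float], tf_expression: dict[str, float]) -> float:
    """Toy score for one allele in one sample: each TF's binding score
    (e.g. a DeepBind-style score) weighted by that TF's expression in the sample."""
    return sum(score * tf_expression.get(tf, 0.0) for tf, score in binding.items())

def toy_variant_effect(ref_binding, alt_binding, tf_expression) -> float:
    """Alt-minus-ref toy score; 0 means no predicted effect in this sample."""
    return allele_score(alt_binding, tf_expression) - allele_score(ref_binding, tf_expression)

# Hypothetical binding scores: the alternative allele weakens binding of TF_A only
ref_binding = {"TF_A": 2.0, "TF_B": 0.5}
alt_binding = {"TF_A": 0.5, "TF_B": 0.5}

# Hypothetical TF expression in two samples (e.g. two patients or cell types)
samples = {
    "sample_1": {"TF_A": 10.0, "TF_B": 1.0},  # TF_A expressed
    "sample_2": {"TF_A": 0.0, "TF_B": 5.0},   # TF_A not expressed
}
for name, expr in samples.items():
    print(name, toy_variant_effect(ref_binding, alt_binding, expr))
# sample_1 -15.0
# sample_2 0.0
```

The same variant receives a strong predicted effect in one sample and none in the other. A cell-type-specific model aims to capture this dependence, even though GRAM itself is a learned model rather than this linear toy.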


E.g. FUMA

FUMA functionally annotates GWAS findings and prioritises the most likely causal SNPs and genes using information from 18 biological data repositories and tools.

SNP2GENE process:

  1. The input is GWAS summary statistics. Using LD structure from the 1000 Genomes reference panel, independent significant SNPs are identified (\(P<5\times10^{-8}\) and mutually independent at \(r^2<0.6\)). For each independent significant SNP, all other SNPs in LD with it (\(r^2\geq0.6\)) are added to the list of “candidate SNPs”.

  2. The candidate SNPs are then annotated for functional consequences on genes (using ANNOVAR), deleteriousness (CADD score), potential regulatory function, effects on gene expression and 3D chromatin structure (Hi-C data).

  3. Functionally annotated SNPs are mapped to genes by (i) physical position on the genome (positional mapping), (ii) eQTL associations and (iii) 3D chromatin interactions. At the end of this step, the user has a set of prioritised genes.
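Steps 1 and 3(i) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not FUMA's code. Pairwise \(r^2\) values are supplied as a plain dictionary (in FUMA they come from the 1000 Genomes reference panel), and only SNPs present in the input are considered. Positional mapping uses an assumed fixed window of 10 kb on either side of each gene. SNP and gene names are hypothetical.

```python
def ld_r2(ld: dict[tuple[str, str], float], a: str, b: str) -> float:
    """Look up pairwise r^2 in either order; unlisted pairs are treated as r^2 = 0."""
    if a == b:
        return 1.0
    return ld.get((a, b), ld.get((b, a), 0.0))

def independent_and_candidate_snps(pvals, ld, p_thresh=5e-8, r2_thresh=0.6):
    """Step 1 (simplified): greedily pick independent significant SNPs in order of
    increasing P, then collect candidate SNPs in LD (r^2 >= r2_thresh) with each."""
    independent = []
    for snp in sorted(pvals, key=pvals.get):
        if pvals[snp] >= p_thresh:
            break  # all remaining SNPs are not genome-wide significant
        if all(ld_r2(ld, snp, lead) < r2_thresh for lead in independent):
            independent.append(snp)
    candidates = {
        lead: sorted(s for s in pvals if ld_r2(ld, s, lead) >= r2_thresh)
        for lead in independent
    }
    return independent, candidates

def positional_mapping(snp_chrom, snp_pos, genes, max_dist=10_000):
    """Step 3(i) (simplified): genes whose body, extended by max_dist bp on each
    side, contains the SNP. genes maps name -> (chrom, start, end)."""
    return sorted(
        name for name, (chrom, start, end) in genes.items()
        if chrom == snp_chrom and start - max_dist <= snp_pos <= end + max_dist
    )

pvals = {"rs1": 1e-10, "rs2": 3e-9, "rs3": 1e-8, "rs4": 0.01}
ld = {("rs1", "rs2"): 0.8, ("rs1", "rs3"): 0.2, ("rs3", "rs4"): 0.7}
print(independent_and_candidate_snps(pvals, ld))
# (['rs1', 'rs3'], {'rs1': ['rs1', 'rs2'], 'rs3': ['rs3', 'rs4']})

genes = {"GENE_A": ("1", 100_000, 120_000), "GENE_B": ("1", 200_000, 210_000)}
print(positional_mapping("1", 125_000, genes))  # ['GENE_A']
```

Note that rs4 is not significant on its own but still becomes a candidate SNP because it is in LD with rs3. This is how candidate SNPs extend the set of variants passed to annotation.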

GENE2FUNC process:

Biological information is provided for each prioritised gene. For example, tissue-specific expression patterns for each gene, based on GTEx v6 RNA-seq data, are visualised as an interactive heatmap.
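One common way to prepare values for such a heatmap is to log-transform expression and then centre each gene across tissues. The result shows where a gene is relatively high or low rather than its absolute level. The sketch below assumes TPM input and a log2(TPM + 1) transform; it is an illustration of that general idea, not necessarily the exact normalisation FUMA applies. The gene and its values are hypothetical.

```python
import math

def relative_expression(tpm: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """log2(TPM + 1) per tissue, then centred to zero mean across tissues for each
    gene, so positive values mean higher-than-average expression for that gene."""
    result = {}
    for gene, by_tissue in tpm.items():
        logged = {tissue: math.log2(value + 1) for tissue, value in by_tissue.items()}
        mean = sum(logged.values()) / len(logged)
        result[gene] = {tissue: value - mean for tissue, value in logged.items()}
    return result

# Hypothetical TPM values for one gene in three tissues
rel = relative_expression({"GENE_X": {"Liver": 15.0, "Brain": 1.0, "Lung": 0.0}})
print({tissue: round(v, 2) for tissue, v in rel["GENE_X"].items()})
# {'Liver': 2.33, 'Brain': -0.67, 'Lung': -1.67}
```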