1. Fine-mapping


Fine-mapping is limited in that it only pinpoints putative causal variants; it does not elucidate the mechanisms by which those variants cause disease. It is a necessary preliminary step to ensure efficient allocation of resources (e.g. to a handful of SNPs within the 99% credible sets), but functional genomic techniques should then be used to dissect the underlying biology. For example, our method was used as a proof-of-principle to show that the one variant in the region with a functional effect (measured using MPRA) was contained within the corrected 99% credible set, whilst the other two variants in the set showed no functional effect.
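As a reminder of what a 99% credible set is, here is a minimal sketch of the standard (uncorrected) construction: sort variants by PP and keep adding them until the cumulative PP reaches the threshold. The function name `credible_set` and the PPs are hypothetical, and this is not the corrected method referred to above.

```python
def credible_set(pp, threshold=0.99):
    """Indices of the smallest set of variants whose PPs sum to >= threshold."""
    # rank variants from highest to lowest posterior probability
    order = sorted(range(len(pp)), key=lambda i: pp[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += pp[i]
        if total >= threshold:
            break
    return chosen

# Hypothetical PPs for six variants in one region (illustrative only)
pp = [0.70, 0.20, 0.095, 0.003, 0.001, 0.001]
cs = credible_set(pp)
print(cs)
```

With these made-up values the set contains the three highest-ranked variants, so follow-up work would be directed at three SNPs rather than six.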


2. Functional Genomics


My current work is focussed on investigating the relationships between functional data and association/causality statistics. To do this, I have downloaded genomic annotations (100bp resolution) for 19 human cell types from the Segway encyclopedia and overlaid the SNPs in the T1D GWAS. For each of the 123,130 SNPs, I have the functional annotation in each of the 19 cell types and the P value for association with T1D (I need to extend this so that I have PPs too; currently I only have PPs for \(\approx 16,000\) SNPs from my previous analysis). Some questions I have explored using regression (logistic/penalised/quantile) include “which annotation in which cell type is the most significant predictor of the P value?” and “falling in an active region in which cell type is the most significant predictor of the P value?”. However, this analysis does not account for LD or the non-random distribution of genomic annotations, i.e. I have not specified a null distribution for the test statistic that accounts for this confounding.
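The "active region in which cell type" question can be sketched in its simplest form as follows. I assume here that the P values have been dichotomised at some threshold (my assumption, not necessarily how the regressions above were set up). For a single binary predictor, the logistic regression slope equals the log odds ratio of the 2x2 table, so no fitting routine is needed. The function name, cell type names and toy data are all hypothetical, and the sketch assumes all four cells of the table are non-zero.

```python
import math

def log_odds_ratio(active, significant):
    """Log odds ratio and Wald z for a binary annotation vs a binary outcome.

    The log odds ratio equals the slope from a logistic regression of
    `significant` on `active` alone. Assumes no empty cell in the 2x2 table.
    """
    pairs = list(zip(active, significant))
    a = sum(1 for x, y in pairs if x and y)          # active, significant
    b = sum(1 for x, y in pairs if x and not y)      # active, not significant
    c = sum(1 for x, y in pairs if not x and y)      # inactive, significant
    d = sum(1 for x, y in pairs if not x and not y)  # inactive, not significant
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)    # Wald standard error
    return log_or, log_or / se

# Toy data: 300 SNPs, the first 40 of which pass the (assumed) P value threshold
n = 300
significant = [i < 40 for i in range(n)]
annotations = {
    "cell_type_A": [i < 30 or 40 <= i < 110 for i in range(n)],
    "cell_type_B": [i < 10 or 40 <= i < 100 for i in range(n)],
}
results = {name: log_odds_ratio(act, significant) for name, act in annotations.items()}
top = max(results, key=lambda name: results[name][1])  # largest Wald z
print(top, results[top])
```

Like the regressions above, this treats SNPs as independent, so the Wald z would be anti-conservative in the presence of LD.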

I am also investigating techniques that use functional data to reweight association/causality statistics. Namely, I am hoping to use the cFDR method to reweight GWAS P values using a binary indicator of active/inactive chromatin. I would then like to extend the cFDR method to PPs (rather than P values) and implement the PP reweighting method from the prostate cancer paper.
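To fix ideas, a simplified stratum-wise estimator in the spirit of cFDR might look like the sketch below. It estimates Pr(null | P <= p, Q = q) as p divided by the empirical proportion of SNPs with P <= p in the same chromatin stratum. This is my own simplification for a binary covariate, not the published cFDR procedure; the function name and toy P values are hypothetical.

```python
from bisect import bisect_right

def stratified_cfdr(p_values, active):
    """Estimate Pr(null | P <= p_i, Q = q_i) for each SNP i, with Q binary."""
    # sorted P values within each stratum (active / inactive chromatin)
    strata = {True: [], False: []}
    for p, q in zip(p_values, active):
        strata[bool(q)].append(p)
    for ps in strata.values():
        ps.sort()

    estimates = []
    for p, q in zip(p_values, active):
        ps = strata[bool(q)]
        n_leq = bisect_right(ps, p)  # SNPs in this stratum with P <= p (at least 1)
        # Pr(P <= p | null) = p, divided by the empirical Pr(P <= p | Q = q)
        estimates.append(min(1.0, p * len(ps) / n_leq))
    return estimates

# Toy data: small P values are enriched in the active stratum
p_values = [1e-4, 1e-3, 0.01, 0.5, 0.01, 0.3, 0.6, 0.9]
active = [True] * 4 + [False] * 4
est = stratified_cfdr(p_values, active)
print(est[2], est[4])  # same P value (0.01), active vs inactive stratum
```

In the toy data the same P value of 0.01 receives a smaller estimate in the active stratum than in the inactive one, which is the reweighting behaviour I am after. The estimator assumes P values are uniform under the null and independent, so LD is again ignored.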

Note that my functional data only measures the activity of genomic regions (e.g. whether a region is transcribed); it does not model the biochemical specificity that could allow certain regions to regulate only specific other regions. It may be fruitful to incorporate contact information, such as that from CHi-C. Indeed, the CRISPRi-FlowFISH paper suggests that the percentage contribution of an enhancer to a gene's expression is proportional to the product of the enhancer's activity (the mean of the ATAC-seq and H3K27ac ChIP-seq peaks at the enhancer) and the KR-normalised Hi-C contact frequency between the enhancer and the gene. Because of the limited resolution of contact data, Peaky could be used to further this analysis.
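One reading of the activity-and-contact relationship described above is sketched below: each enhancer's weight is its activity multiplied by its contact frequency with the gene, normalised over all candidate enhancers for that gene. I take the activity to be the geometric mean of the two signals, which is an assumption on my part (the text above says only "mean"). The function name, dictionary keys and numbers are hypothetical.

```python
import math

def contribution_scores(elements):
    """Relative contribution of each candidate enhancer to a single gene.

    `elements` is a list of dicts with keys 'atac', 'h3k27ac' (signal at the
    enhancer) and 'contact' (normalised contact frequency with the gene).
    """
    weights = []
    for e in elements:
        # activity as the geometric mean of the two signals (assumption)
        activity = math.sqrt(e["atac"] * e["h3k27ac"])
        weights.append(activity * e["contact"])
    total = sum(weights)
    return [w / total for w in weights]

# Three hypothetical enhancers near one gene
elements = [
    {"atac": 4.0, "h3k27ac": 9.0, "contact": 0.5},
    {"atac": 1.0, "h3k27ac": 1.0, "contact": 1.0},
    {"atac": 16.0, "h3k27ac": 1.0, "contact": 0.25},
]
scores = contribution_scores(elements)
print(scores)
```

The scores are defined per gene, which bears on whether genes would have to be specified before contact data could be combined with the SNP-level annotations.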

Idea: Extend the current functional data (which measures activity at each SNP) using Peaky output (but then would I need to specify genes…).