Secret paper

Meeting with him on Friday: https://docs.google.com/document/d/1vIyvrJlLwQR49vzHn-a0XGlQ-4-vVXtfJI5EMSORwPM/edit


Nobel et al. data

My previous analysis of the Nobel et al. data (genome annotations for 167 cell types) only considered immunochip SNPs in the 39 T1D-associated genomic regions that I analysed in my previous project (approximately 17,000 SNPs).

I’ve found that the distribution of genomic annotations across the whole genome differs from that across SNPs. Since my research focusses on SNPs, it would be useful to use the distribution of genomic annotations across all immunochip SNPs as the baseline (rather than the whole genome). However, there are approximately 200,000 SNPs on the immunochip, and mapping all of them to the relevant genomic annotation in many cell types would take too long. For this reason, I keep the T1D SNPs and add a sample of additional SNPs genotyped on the immunochip, although in the future I should run this for all immunochip SNPs.

Note that in the existing methods, this “empirical distribution of the enrichment under the null hypothesis” is estimated in various ways. E.g. GARFIELD uses replication over variants matched by key metrics. GoShifter estimates the null enrichment statistics with a circularised permutation method, which accounts for the non-random distribution of genomic annotations with respect to each other and for the correlation between GWAS signals caused by LD.
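To make the circular-shift idea concrete for myself, here is a heavily simplified sketch. It is not GoShifter's actual implementation: the function names, the single-locus setup, the half-open intervals and the toy coordinates are all my own assumptions. It shifts the SNP positions around the locus by a random offset (equivalent to shifting the annotations the other way), which keeps the spacing between SNPs and the spacing between annotation intervals intact.

```python
import random

def overlap_count(snp_positions, intervals):
    """Count SNPs that fall inside any half-open annotation interval [start, end)."""
    return sum(
        any(start <= pos < end for start, end in intervals)
        for pos in snp_positions
    )

def circular_shift_null(snp_positions, intervals, locus_start, locus_end,
                        n_perm=1000, seed=0):
    """Null overlap counts from circular shifts within one locus (hypothetical helper)."""
    rng = random.Random(seed)
    length = locus_end - locus_start
    null = []
    for _ in range(n_perm):
        offset = rng.randrange(1, length)  # non-zero shift, wraps at the locus end
        shifted = [
            locus_start + (pos - locus_start + offset) % length
            for pos in snp_positions
        ]
        null.append(overlap_count(shifted, intervals))
    return null

# Toy locus (made-up coordinates): three SNPs sitting in one annotated interval.
snps = [10, 12, 15]
annotation = [(8, 20)]
observed = overlap_count(snps, annotation)
null = circular_shift_null(snps, annotation, 0, 100, n_perm=200)

# One-sided empirical p-value with the usual +1 correction.
p_value = (1 + sum(n >= observed for n in null)) / (len(null) + 1)
```

The point of shifting rather than shuffling is that the null keeps the local structure of the locus, so clustered SNPs stay clustered.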

My baseline SNP file contains annotation information of 19 cell types for 36,278 SNPs on the immunochip (including those in my original T1D analysis).


T1D credible set SNP enrichment

I investigate the enrichment of annotations amongst T1D 95% credible set variants. To do this, I obtain the proportion of each annotation across only the T1D 95% credible set variants and divide it by the baseline proportion (across the sample of immunochip SNPs). The plot is fairly sparse because a ratio can only be shown when the annotation is present among the 95% credible set T1D SNPs (only ~700 of these). Lines above 1 indicate positive enrichment of that annotation in that cell type, and lines below 1 indicate negative enrichment (depletion) of that annotation in that cell type in 95% credible set T1D SNPs.
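As a reminder of the calculation, a minimal sketch for a single cell type is below. The function names and the toy annotation labels are assumptions for illustration, not my real pipeline or data.

```python
from collections import Counter

def annotation_proportions(annotations):
    """Return {annotation: proportion} for a list of per-SNP annotation labels."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def enrichment_ratios(credible_annotations, baseline_annotations):
    """Credible set proportion divided by baseline proportion, per annotation.

    Annotations absent from the credible set get no ratio, which is why
    the plot ends up sparse.
    """
    cred = annotation_proportions(credible_annotations)
    base = annotation_proportions(baseline_annotations)
    return {
        label: p / base[label]
        for label, p in cred.items()
        if label in base
    }

# Toy labels for one cell type (made up, not from the real data).
baseline = ["Quiescent"] * 6 + ["Enhancer"] * 2 + ["Promoter"] * 2
credible = ["Enhancer"] * 2 + ["Quiescent"] + ["Promoter"]

ratios = enrichment_ratios(credible, baseline)
# Ratios above 1 mean enrichment, below 1 mean depletion.
```

In this toy example enhancers come out above 1 and quiescent below 1, which is the direction I would expect in the real plot.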

The results look sensible:

  • Constitutive heterochromatin is negatively enriched in credible set SNPs in most cell types (except pancreatic islets and CD14 and CD19 cells).

  • Facultative heterochromatin is negatively enriched in credible set SNPs in most cell types (except brain and CD14).

  • Enhancers are positively enriched in credible set SNPs in most cell types (except brain).

  • Promoters are positively enriched in credible set SNPs in most cell types for which there is data.

  • Quiescent is negatively enriched in credible set SNPs in most cell types.