1. Fine-mapping the IKZF1 gene region


The three fine-mapping results for the ImmunoChip region covering IKZF1 are shown below. Each study prioritises a different set of SNPs. All of the prioritised SNPs are present in all of the GWAS data sets, which rules out the possibility that the SNPs just weren’t included in a given study. The one exception I haven’t checked is the Onengut SNPs in Chiou: they are not in the Chiou credible set, and Chiou haven’t made their GWAS data public yet.


I use the LDmatrix tool (NIH LDlink) to compute the SNP correlations in European populations. The Onengut SNPs are very highly correlated with one another, as are the Chiou SNPs. However, the two sets are not strongly correlated with each other or with the Robertson SNP. This rules out using any single SNP as a proxy for all of them.

chiou_snps <- c("rs10262731", "rs28625633", "rs10236879")
onengut_snps <- c("rs11770117", "rs12719030", "rs11764792")
robertson_snps <- c("rs6944602")
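A minimal sketch of how the within-set versus between-set correlation could be summarised once an \(r^2\) matrix has been exported from LDmatrix. The function name `summarise_ld_blocks` is hypothetical, and the matrix `r2_demo` holds placeholder values only (not real LD estimates); it is there just to make the example run.

```r
# SNP sets as defined above
chiou_snps     <- c("rs10262731", "rs28625633", "rs10236879")
onengut_snps   <- c("rs11770117", "rs12719030", "rs11764792")
robertson_snps <- c("rs6944602")
snp_sets <- list(Chiou = chiou_snps, Onengut = onengut_snps,
                 Robertson = robertson_snps)

# Mean r^2 for every pair of SNP sets, given a symmetric r^2 matrix with
# rsIDs as row and column names. Within-set means exclude the diagonal.
summarise_ld_blocks <- function(r2, sets) {
  out <- matrix(NA_real_, length(sets), length(sets),
                dimnames = list(names(sets), names(sets)))
  for (a in names(sets)) {
    for (b in names(sets)) {
      block <- r2[sets[[a]], sets[[b]], drop = FALSE]
      # a single-SNP set has no within-set pairs, so its entry stays NA
      vals <- if (a == b) block[upper.tri(block)] else as.vector(block)
      if (length(vals) > 0) out[a, b] <- mean(vals)
    }
  }
  out
}

# PLACEHOLDER matrix (not real LD values): high r^2 within the Chiou and
# Onengut sets, low r^2 everywhere else.
all_snps <- unlist(snp_sets, use.names = FALSE)
r2_demo <- matrix(0.1, length(all_snps), length(all_snps),
                  dimnames = list(all_snps, all_snps))
r2_demo[chiou_snps, chiou_snps]     <- 0.9
r2_demo[onengut_snps, onengut_snps] <- 0.9
diag(r2_demo) <- 1

ld_summary <- summarise_ld_blocks(r2_demo, snp_sets)
print(ld_summary)
```

High diagonal entries with low off-diagonal entries would correspond to the pattern described above, where no single SNP tags all three sets.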


I now plot the available GWAS \(p\)-values against each other, highlighting the prioritised SNPs from each study (coloured as in the plot above).
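A rough sketch of the kind of comparison described here, assuming each GWAS is a data frame with hypothetical columns `snp` (rsID) and `p`. The \(p\)-values are simulated and the colours are arbitrary, so this only illustrates the join and the highlighting, not the real results.

```r
snp_sets <- list(
  Chiou     = c("rs10262731", "rs28625633", "rs10236879"),
  Onengut   = c("rs11770117", "rs12719030", "rs11764792"),
  Robertson = c("rs6944602")
)

# Join two GWAS tables on rsID, move p-values to the -log10 scale and
# record which study (if any) prioritised each SNP.
compare_pvalues <- function(gwas_x, gwas_y, sets) {
  both <- merge(gwas_x, gwas_y, by = "snp", suffixes = c("_x", "_y"))
  both$logp_x <- -log10(both$p_x)
  both$logp_y <- -log10(both$p_y)
  lookup <- setNames(rep(names(sets), lengths(sets)),
                     unlist(sets, use.names = FALSE))
  both$prioritised_by <- unname(lookup[as.character(both$snp)])
  both$prioritised_by[is.na(both$prioritised_by)] <- "none"
  both
}

# SIMULATED p-values, for illustration only
set.seed(1)
ids <- c(unlist(snp_sets, use.names = FALSE), paste0("snp", 1:20))
gwas_a <- data.frame(snp = ids, p = runif(length(ids)))
gwas_b <- data.frame(snp = ids, p = runif(length(ids)))
cmp <- compare_pvalues(gwas_a, gwas_b, snp_sets)

# Arbitrary colours; background SNPs in grey
cols <- c(none = "grey70", Chiou = "red", Onengut = "blue",
          Robertson = "darkgreen")
plot(cmp$logp_x, cmp$logp_y, col = cols[cmp$prioritised_by], pch = 19,
     xlab = "-log10(p), study A", ylab = "-log10(p), study B")
```

Merging on rsID keeps only the SNPs present in both data sets, which is what a pairwise comparison needs.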


2. ChIP data


I investigate the enrichment of T1D SNPs in IKZF1 ChIP peaks in aCD4 and LCLs.
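A minimal base-R sketch of the overlap step, assuming the SNPs have hypothetical columns `chr` and `pos` (1-based) and the peaks are BED-style with `chr`, `start` (0-based) and `end` (exclusive). The coordinates below are toy values. At genome scale something like `GenomicRanges::findOverlaps` would be a better fit than this loop.

```r
# Flag each SNP that falls inside any peak on the same chromosome.
# BED intervals are 0-based and half-open, so a 1-based position `pos`
# lies inside a peak when start < pos <= end.
flag_peak_overlap <- function(snps, peaks) {
  in_peak <- logical(nrow(snps))
  for (chr in unique(snps$chr)) {
    pk  <- peaks[peaks$chr == chr, , drop = FALSE]
    idx <- which(snps$chr == chr)
    if (nrow(pk) == 0) next  # no peaks on this chromosome
    in_peak[idx] <- vapply(
      snps$pos[idx],
      function(pos) any(pos > pk$start & pos <= pk$end),
      logical(1)
    )
  }
  in_peak
}

# TOY coordinates, for illustration only
toy_snps  <- data.frame(chr = c("7", "7", "7", "1"),
                        pos = c(100, 250, 300, 150))
toy_peaks <- data.frame(chr = "7", start = 200, end = 300)

toy_snps$in_peak <- flag_peak_overlap(toy_snps, toy_peaks)
print(toy_snps)
```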

The first plots include the MHC region; this is the pattern that we saw last week.

Next, I exclude the MHC region, and the results look much more sensible.
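A minimal sketch of the MHC exclusion, assuming hypothetical columns `chr` and `pos`. The window chr6:25-35 Mb is a commonly used conservative definition of the extended MHC and is an assumption here; the exact boundaries and genome build behind the plots may differ.

```r
# Drop SNPs inside an assumed MHC window (default chr6:25-35 Mb).
exclude_mhc <- function(snps, mhc_chr = "6",
                        mhc_start = 25e6, mhc_end = 35e6) {
  in_mhc <- snps$chr == mhc_chr &
    snps$pos >= mhc_start & snps$pos <= mhc_end
  snps[!in_mhc, , drop = FALSE]
}

# TOY positions, for illustration only
toy <- data.frame(
  snp = c("a", "b", "c", "d"),
  chr = c("6", "6", "7", "6"),
  pos = c(30e6, 40e6, 30e6, 25e6)
)
no_mhc <- exclude_mhc(toy)
print(no_mhc)
```

SNPs `a` and `d` sit inside the assumed window on chromosome 6 and are removed; `c` shares a position with `a` but is on chromosome 7, so it is kept.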


I next investigate only the SNPs present in both the Cooper and Robertson GWAS data sets. Here, the results differ slightly from those above, as the Cooper SNPs are not enriched.

On the Manhattan plots, I’ve coloured SNPs red if they overlap a ChIP peak.



As a sanity check, I also overlap the T1D SNPs with IKZF1 ChIP data in Hep G2 cells (liver). Of the 715,026 SNPs, only 238 (about 0.03%) overlap a peak, and these are depleted for small \(p\)-values rather than enriched.
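One simple way to put a number on enrichment or depletion is a Fisher's exact test on the 2x2 table of peak overlap against small \(p\). This is a sketch under stated assumptions, not necessarily the calculation behind the plots: the function name `peak_enrichment` and the threshold are hypothetical, and the data are simulated.

```r
# Cross-tabulate peak overlap against "small p" and test for association.
# An odds ratio below 1 indicates depletion of small p-values in peaks.
peak_enrichment <- function(p, in_peak, threshold = 1e-5) {
  small_p <- factor(p < threshold, levels = c(FALSE, TRUE))
  in_peak <- factor(in_peak, levels = c(FALSE, TRUE))
  tab <- table(in_peak = in_peak, small_p = small_p)
  list(table = tab, test = fisher.test(tab))
}

# SIMULATED data, for illustration only: uniform p-values and random
# peak membership, so no real association is expected.
set.seed(42)
n <- 5000
sim_p       <- runif(n)
sim_in_peak <- rbinom(n, 1, 0.05) == 1

res <- peak_enrichment(sim_p, sim_in_peak, threshold = 0.05)
print(res$table)
print(res$test$estimate)  # conditional MLE of the odds ratio
```

Fixing the factor levels keeps the table 2x2 even when one cell is empty, which matters when very few SNPs overlap a peak.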



So what does this tell us?


TF binding sites

I’ve been using the Funk et al. data, in which DNase-seq footprints are matched to the corresponding transcription factor using motif information. This is done at the tissue level for 1500 TFs.

I’ve also come across the paper “Global reference mapping of human transcription factor footprints” (Nature), which came out at a similar time to Funk et al., but the two papers don’t cite each other…

I don’t think they have data for predicted TF occupancy at the cell type level, but they do have consensus footprints (overlapping footprinted regions across individual biosamples) that they’ve allocated TFs to. Below, I’ve extracted the rows with “IKZF1_HUMAN.H11MO.0.C” from the collapsed_motifs_overlapping_consensus_footprints_hg38.bed.gz file (see https://www.vierstra.org/resources/dgf).

##    contig     start       end motif_cluster score strand thickStart  thickEnd
## 1:  chr10 100025265 100025284        ZNF143     0      +  100025265 100025284
## 2:  chr10 100027643 100027662        ZNF143     0      -  100027643 100027662
## 3:  chr10 100045854 100045873        ZNF143     0      +  100045854 100045873
## 4:  chr10 100101130 100101149        ZNF143     0      -  100101130 100101149
## 5:  chr10 100185435 100185454        ZNF143     0      -  100185435 100185454
## 6:  chr10 100246054 100246073        ZNF143     0      -  100246054 100246073
##     itemRgb            best_model match_score  DBD num_models
## 1: 0,28,255 IKZF1_HUMAN.H11MO.0.C      8.4535 C2H2          2
## 2: 0,28,255 IKZF1_HUMAN.H11MO.0.C      8.4535 C2H2          2
## 3: 0,28,255 IKZF1_HUMAN.H11MO.0.C      8.4535 C2H2          2
## 4: 0,28,255 IKZF1_HUMAN.H11MO.0.C      7.7488 C2H2          2
## 5: 0,28,255 IKZF1_HUMAN.H11MO.0.C      7.7488 C2H2          2
## 6: 0,28,255 IKZF1_HUMAN.H11MO.0.C      7.7488 C2H2          2

However, this isn’t cell-type specific…
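The extraction of the IKZF1 rows described above could be sketched in base R as follows. The column names are taken from the printed output; that the file is tab-separated with no header line is an assumption, and `extract_model_rows` is a hypothetical helper. The demo file reuses two of the rows printed above plus one clearly labelled placeholder row.

```r
footprint_cols <- c("contig", "start", "end", "motif_cluster", "score",
                    "strand", "thickStart", "thickEnd", "itemRgb",
                    "best_model", "match_score", "DBD", "num_models")

# Read a gzipped BED-like file (ASSUMED: tab-separated, no header) and
# keep the rows whose best-matching motif model equals `model`.
extract_model_rows <- function(path, model) {
  bed <- read.delim(gzfile(path), header = FALSE,
                    col.names = footprint_cols, stringsAsFactors = FALSE)
  bed[bed$best_model == model, , drop = FALSE]
}

# Small demo file: two rows copied from the output above and one
# PLACEHOLDER row (not real data) that should be filtered out.
demo_rows <- c(
  paste(c("chr10", "100025265", "100025284", "ZNF143", "0", "+",
          "100025265", "100025284", "0,28,255",
          "IKZF1_HUMAN.H11MO.0.C", "8.4535", "C2H2", "2"), collapse = "\t"),
  paste(c("chr10", "100101130", "100101149", "ZNF143", "0", "-",
          "100101130", "100101149", "0,28,255",
          "IKZF1_HUMAN.H11MO.0.C", "7.7488", "C2H2", "2"), collapse = "\t"),
  paste(c("chr10", "1", "20", "PLACEHOLDER", "0", "+",
          "1", "20", "0,0,0",
          "PLACEHOLDER_MODEL", "1.0", "C2H2", "1"), collapse = "\t")
)
demo_path <- tempfile(fileext = ".bed.gz")
con <- gzfile(demo_path, "w")
writeLines(demo_rows, con)
close(con)

ikzf1 <- extract_model_rows(demo_path, "IKZF1_HUMAN.H11MO.0.C")
print(ikzf1)
```

For the full file, `data.table::fread` would likely be much faster than `read.delim`.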

Maybe I should be focussing on the CAVs and whether these overlap any T1D SNPs.


Notes/comments