28th January

1. Fine-mapping IKZF1 gene region

The three fine-mapping results for the ImmunoChip region covering IKZF1 are shown below. Each of the three analyses prioritises different SNPs. All of the prioritised SNPs are present in all of the GWAS data sets, which rules out the possibility that a SNP was missed simply because it wasn’t included in a study. The one exception is that I haven’t checked whether the Onengut SNPs are in the Chiou data: they are not in the Chiou credible set, and Chiou haven’t made their GWAS public yet.

I use the LDmatrix tool (LDlink, NIH) to compute pairwise SNP correlations in European populations. The Onengut SNPs are very highly correlated with one another, as are the Chiou SNPs. However, the two sets are not strongly correlated with each other or with the Robertson SNP, so no single SNP can act as a proxy for them all.

chiou_snps <- c("rs10262731", "rs28625633", "rs10236879")
onengut_snps <- c("rs11770117", "rs12719030", "rs11764792")
robertson_snps <- c("rs6944602")
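
The within-set and between-set correlations described above can be summarised numerically. This is a minimal sketch, assuming the r² matrix has been exported from LDmatrix and read into R as a symmetric matrix with rsIDs as row and column names. `mean_r2` is a hypothetical helper, and the toy matrix uses placeholder values, not the real LD estimates.

```r
# Hypothetical helper: mean pairwise r^2 between two sets of SNPs.
# `ld` is assumed to be a symmetric matrix with SNP IDs as dimnames.
mean_r2 <- function(ld, set_a, set_b = set_a) {
  block <- ld[set_a, set_b, drop = FALSE]
  if (identical(set_a, set_b)) {
    # Within-set: use each distinct pair once and skip the diagonal
    if (length(set_a) < 2) return(NA_real_)
    return(mean(block[upper.tri(block)]))
  }
  mean(block)
}

# Toy example with PLACEHOLDER values (not real LD estimates)
snps <- c("snpA1", "snpA2", "snpB1")
ld <- matrix(c(1.0, 0.9, 0.1,
               0.9, 1.0, 0.2,
               0.1, 0.2, 1.0),
             nrow = 3, dimnames = list(snps, snps))

within_a <- mean_r2(ld, c("snpA1", "snpA2"))           # 0.9
between  <- mean_r2(ld, c("snpA1", "snpA2"), "snpB1")  # 0.15
```

With the real matrix, the same call would look like `mean_r2(ld, chiou_snps, onengut_snps)`; a single-SNP set such as `robertson_snps` has no within-set pairs, so the helper returns `NA` for it.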

I now plot the available GWAS \(p\)-values against each other, highlighting the prioritised SNPs from each study (coloured as in the plot above).
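
A comparison of this kind can be sketched as follows. The column names `snp` and `p`, the helper `pair_pvalues` and the toy \(p\)-values are all assumptions for illustration; the real GWAS files will have their own layouts.

```r
# Hypothetical helper: pair up p-values from two GWAS by SNP ID.
# Each input is assumed to be a data frame with columns `snp` and `p`.
pair_pvalues <- function(gwas_x, gwas_y) {
  both <- merge(gwas_x, gwas_y, by = "snp", suffixes = c("_x", "_y"))
  both$logp_x <- -log10(both$p_x)
  both$logp_y <- -log10(both$p_y)
  both
}

# Toy inputs with PLACEHOLDER p-values
gwas_x <- data.frame(snp = c("snp1", "snp2", "snp3"),
                     p = c(1e-8, 0.5, 0.01), stringsAsFactors = FALSE)
gwas_y <- data.frame(snp = c("snp2", "snp3", "snp4"),
                     p = c(0.1, 1e-4, 0.9), stringsAsFactors = FALSE)
paired <- pair_pvalues(gwas_x, gwas_y)  # only SNPs present in both studies

highlight <- c("snp3")  # stand-in for one study's prioritised SNPs
cols <- ifelse(paired$snp %in% highlight, "red", "grey60")
plot(paired$logp_x, paired$logp_y, col = cols, pch = 19,
     xlab = "-log10(p), study X", ylab = "-log10(p), study Y")
abline(0, 1, lty = 2)  # y = x reference line
```

Merging on the SNP ID keeps only variants present in both studies, so points far from the dashed line are SNPs on which the two studies disagree.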

2. ChIP data

I investigate the enrichment of T1D SNPs in IKZF1 ChIP peaks in aCD4 and LCLs.

The first plots are for when I do not exclude the MHC - this is the pattern that we saw last week.

Next, I exclude the MHC and the results are much more sensible.
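
The overlap and MHC exclusion steps can be sketched in base R. The helpers `in_peak` and `drop_mhc` are hypothetical, the MHC boundaries (chr6:25-35 Mb) are an assumption because the coordinates used here are not stated, and the toy data are placeholders.

```r
# Hypothetical helper: TRUE for SNPs inside any peak on the same chromosome.
# `snps`: data frame with `chr`, `pos` (1-based).
# `peaks`: data frame with `chr`, `start`, `end` (BED-style, 0-based half-open, assumed).
in_peak <- function(snps, peaks) {
  vapply(seq_len(nrow(snps)), function(i) {
    any(peaks$chr == snps$chr[i] &
        peaks$start < snps$pos[i] & snps$pos[i] <= peaks$end)
  }, logical(1))
}

# ASSUMED MHC boundaries; adjust to the definition actually used.
drop_mhc <- function(snps, chr = "chr6", from = 25e6, to = 35e6) {
  snps[!(snps$chr == chr & snps$pos >= from & snps$pos <= to), , drop = FALSE]
}

# Toy data with PLACEHOLDER positions and p-values
snps <- data.frame(chr = c("chr6", "chr6", "chr7", "chr7"),
                   pos = c(30e6, 50e6, 150, 500),
                   p   = c(1e-50, 0.2, 0.03, 0.4),
                   stringsAsFactors = FALSE)
peaks <- data.frame(chr = c("chr7", "chr6"),
                    start = c(100, 49999999),
                    end   = c(200, 50000100),
                    stringsAsFactors = FALSE)

kept <- drop_mhc(snps)                # removes the chr6:30 Mb SNP
kept$in_peak <- in_peak(kept, peaks)  # TRUE, TRUE, FALSE
```

The row-by-row loop is only suitable for small inputs; for genome-wide data an interval-overlap package such as GenomicRanges (`findOverlaps`) would be the more usual choice.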

Next, I restrict to the SNPs present in both the Cooper and Robertson GWAS data sets. Here the results differ slightly from those above, as the Cooper SNPs are not enriched.

On the Manhattan plots, I’ve coloured SNPs red if they overlap a ChIP peak.

As a sanity check, I also overlap the T1D SNPs with IKZF1 ChIP data in HepG2 cells (liver). Of the 715,026 SNPs, only 238 overlap a peak, and these are depleted for small \(p\)-values.
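
One way to put a number on enrichment or depletion of small \(p\)-values among peak-overlapping SNPs is a 2x2 Fisher's exact test. This is a sketch under stated assumptions and not necessarily the comparison behind the plots here: `peak_enrichment` is a hypothetical helper, the threshold is arbitrary, and the toy vectors are placeholders.

```r
# Hypothetical helper: 2x2 Fisher test of "in peak" against "p below threshold".
# The threshold is an ASSUMPTION; none is stated in these notes.
peak_enrichment <- function(p, in_peak, threshold = 1e-5) {
  tab <- table(in_peak = factor(in_peak, levels = c(FALSE, TRUE)),
               small_p = factor(p < threshold, levels = c(FALSE, TRUE)))
  fisher.test(tab)
}

# Toy example with PLACEHOLDER values
p       <- c(1e-8, 1e-7, 1e-6, 0.5, 0.6, 0.7, 0.8, 1e-9)
in_peak <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
res <- peak_enrichment(p, in_peak)
res$estimate  # odds ratio: > 1 suggests enrichment, < 1 suggests depletion
res$p.value
```

Fixing the factor levels keeps the table 2x2 even when one cell is empty, which matters when very few SNPs overlap a peak, as with the 238 here.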

So what does this tell us?

There’s not really much enrichment for T1D SNPs in IKZF1 ChIP peaks in any of the cell types…

TF binding sites

I’ve been using the Funk et al. data, in which DNase-seq footprints are matched to the corresponding transcription factor using motif information. This is done at the tissue level for 1500 TFs.

I’ve also come across the paper “Global reference mapping of human transcription factor footprints” in Nature, which came out at a similar time to Funk et al., but they don’t cite each other…

This paper states: “We also show that the enrichment of genetic variants associated with diseases or phenotypic traits in regulatory regions is almost entirely attributable to variants within footprints, and that functional variants that affect transcription factor occupancy are nearly evenly partitioned between loss- and gain-of-function alleles”… “We thus conclude that the genetic signals from disease- and trait-associated variants within DHSs emanate from TF footprints, and that variants within footprints are major contributors to trait heritability”, which gives me motivation for this work.
They found that within each biosample, footprints encompassed an average of around 7.6 Mb (0.2%) of the genome.
They focus on footprinting the DNA in various cell types (including loads of primary immune cell types) but then generate a consensus set of footprints (present in one or more tissues) and it is for these consensus footprints that they overlap TF archetypes. “Briefly, for each genomic locus, we aligned the location and dispersion of footprints across datasets to delineate consensus coordinates supported by at least 50% of all footprint-contributing datasets”.
This means that they do not have information on specific TFs (assignments are made at the non-redundant archetype level) or on specific cell types (they use consensus footprints, and I can’t find anything that relates these back to individual cell types).
92% of footprints contained a single TF archetype recognition site in their model. They suggest that a typical DHS contains about 5 or 6 directly bound TFs spaced roughly 20 bp apart.
They also have enough read depth to quantify allelic imbalance (where a variant at a heterozygous site affects DNA accessibility). They conservatively identified 120,00 chromatin altering variants (CAVs) that altered DNA accessibility on individual alleles (e.g. a variant creates a de novo TF footprint).
IKZF1 is in TF archetype 173 (“ZNF143:C2H2”), which has 7 members (IKZF1 in mice and human, ZNF76 in human, ZN143 in human and mice and THA11 in human and mice). See https://resources.altius.org/~jvierstra/projects/motif-clustering/releases/v1.0/cluster_viz.html

I don’t think they have data for predicted TF occupancy at the cell type level, but they do have consensus footprints (overlapping footprinted regions across individual biosamples) that they’ve allocated TFs to. Below, I’ve extracted rows with “IKZF1_HUMAN.H11MO.0.C” from the collapsed_motifs_overlapping_consensus_footprints_hg38.bed.gz file. (see https://www.vierstra.org/resources/dgf)

##    contig     start       end motif_cluster score strand thickStart  thickEnd
## 1:  chr10 100025265 100025284        ZNF143     0      +  100025265 100025284
## 2:  chr10 100027643 100027662        ZNF143     0      -  100027643 100027662
## 3:  chr10 100045854 100045873        ZNF143     0      +  100045854 100045873
## 4:  chr10 100101130 100101149        ZNF143     0      -  100101130 100101149
## 5:  chr10 100185435 100185454        ZNF143     0      -  100185435 100185454
## 6:  chr10 100246054 100246073        ZNF143     0      -  100246054 100246073
##     itemRgb            best_model match_score  DBD num_models
## 1: 0,28,255 IKZF1_HUMAN.H11MO.0.C      8.4535 C2H2          2
## 2: 0,28,255 IKZF1_HUMAN.H11MO.0.C      8.4535 C2H2          2
## 3: 0,28,255 IKZF1_HUMAN.H11MO.0.C      8.4535 C2H2          2
## 4: 0,28,255 IKZF1_HUMAN.H11MO.0.C      7.7488 C2H2          2
## 5: 0,28,255 IKZF1_HUMAN.H11MO.0.C      7.7488 C2H2          2
## 6: 0,28,255 IKZF1_HUMAN.H11MO.0.C      7.7488 C2H2          2

but this isn’t cell type specific…
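
The extraction above can be sketched in base R. `read_motif_rows` is a hypothetical helper; the column name `best_model` follows the printed table, but whether the downloaded file has a header line is an assumption (`header = TRUE` here). The toy file is a placeholder written to a temporary path.

```r
# Hypothetical helper: keep rows whose best-matching motif model equals `model`.
# ASSUMES the file has a header line containing a `best_model` column.
read_motif_rows <- function(path, model, header = TRUE) {
  # gzfile() reads gzip-compressed text; read.delim() opens and closes it
  bed <- read.delim(gzfile(path), header = header, stringsAsFactors = FALSE)
  bed[bed$best_model == model, , drop = FALSE]
}

# Toy gzipped file with PLACEHOLDER rows, to show the call
tmp <- tempfile(fileext = ".bed.gz")
toy <- data.frame(contig = c("chr1", "chr1"),
                  start = c(10L, 50L),
                  end = c(29L, 69L),
                  best_model = c("IKZF1_HUMAN.H11MO.0.C", "OTHER_MODEL"),
                  stringsAsFactors = FALSE)
con <- gzfile(tmp, "w")
write.table(toy, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

hits <- read_motif_rows(tmp, "IKZF1_HUMAN.H11MO.0.C")
```

If the real file turns out to have no header, the column names would need to be supplied explicitly before filtering; with `header = FALSE` and no names set, this helper returns zero rows.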

Maybe I should be focussing on the CAVs and whether these overlap any T1D SNPs.

Notes/comments

Need to run SEMpl method (having problems with the dependencies).
Look into case-only SNP interaction: check whether the fine-mapped IKZF1-region variant interacts with rs1633081. Less IKZF1 combined with a site that is harder to bind may be significant.
Chiou T1D GWAS - see Fig 3D, IKZFs are there.
Ikaros prevents autoimmunity by controlling…. https://www.nature.com/articles/s41590-019-0490-2.epdf?shared_access_token=XIS5wmzby9b3VYWR8S2VT9RgN0jAjWel9jnR3ZoTv0NoZPKihh9ntSrvM1OiwISwjnkJx0deQTv-2PFoMwRj2omFMhwn4NNoBAuRNP9s6tlkxT8VNORSQV0f5zCN56OvYHjSMGYTwU7xLFamnuQ0HQ%3D%3D
Loss of B-cell anergy in T1D… https://pubmed.ncbi.nlm.nih.gov/29343548/
Leveraging supervised learning… https://www.biorxiv.org/content/10.1101/2020.10.20.347294v2

Anna Hutchinson