T1D

I’ve started looking at the data from https://www.nature.com/articles/s41588-019-0505-9#Sec1 (Calderon et al., 2019). Specifically, dataset3 which contains significant stimulation-associated ATAC-seq peaks in a variety of immune cells (\(FDR<0.01\)).

I’ve mapped the SNPs from the Robertson 2020 T1D GWAS (https://www.biorxiv.org/content/10.1101/2020.06.19.158071v1) to any significant stimulation-associated peaks. I use the \(p\)-value of association from the meta-analysis of case-control and family-based estimates.


BACH2 example

For example, in Robertson et al., (2020) they suggest that rs72928038 is the causal variant in the \(\textit{BACH2}\) gene region (although they incorrectly refer to this SNP as rs72938038 in the manuscript abstract). Our corrcoverage method also picked this SNP out as the CV in this region.

Across all the immune cells, this SNP only falls in a significant stimulation-associated ATAC-seq peak in effector CD4+ T cells. The logFC=-1.67756 value indicates that the peak is greater in unstimulated cells than stimulated cells.

## [1] "Effector_CD4pos_T_S-Effector_CD4pos_T_U"
##           MarkerName ichipreg chromosome position         ID alleleA alleleB
## 1: chr6:90267049:G:A      yes          6 90267049 rs72928038       G       A
##    Effect_EUR    SE_EUR    AF_EUR EffectphaseIandII SEphaseIandII pphaseIandII
## 1:   0.172309 0.0202005 0.1731371            0.1635        0.0168    2.529e-22
##    hg19_pos    logFC  AveExpr         t      P.Value   adj.P.Val         B
## 1: 90976768 -1.67756 1.500223 -3.895642 0.0001394576 0.004728978 0.4942096
##    peak_id contrast    start      end  chr
## 1:   54569        4 90975833 90977316 chr6

Correlation between logFC at significant peaks

I look at the correlation between logFC values across conditions for the T1D SNPs (\(q\); if the SNP doesn’t fall in a significant peak then it is given a logFC value of 0). We see B cells clustering, various T cell subtypes clustering and monocytes and mature NK cells seperating.

qs_logFC <- lapply(res, function(x){
  
  x$logFC[is.na(x$logFC)] <- 0
  x$logFC
  
})

library(pheatmap)
pheatmap(cor(do.call(cbind, qs_logFC)), display_numbers = TRUE)


An idea would be to use the T1D GWAS \(p\)-values as \(p\) in cFDR and leverage the logFC values. However, very few of the T1D SNPs fall into significiant stimulation-associated ATAC peaks.

The proportion of SNPs that do not fall into a significant ATAC-peak in each condition is shown below:

lapply(qs_logFC, function(x) length(which(x==0))/length(x))
## $`Bulk_B_S-Bulk_B_U`
## [1] 0.9972784
## 
## $`CD8pos_T_S-CD8pos_T_U`
## [1] 0.9901304
## 
## $`Central_memory_CD8pos_T_S-Central_memory_CD8pos_T_U`
## [1] 0.9881444
## 
## $`Effector_CD4pos_T_S-Effector_CD4pos_T_U`
## [1] 0.9816202
## 
## $`Effector_memory_CD8pos_T_S-Effector_memory_CD8pos_T_U`
## [1] 0.9856424
## 
## $`Follicular_T_Helper_S-Follicular_T_Helper_U`
## [1] 0.9825516
## 
## $`Gamma_delta_T_S-Gamma_delta_T_U`
## [1] 0.9940589
## 
## $`Mature_NK_S-Mature_NK_U`
## [1] 0.9994434
## 
## $`Mem_B_S-Mem_B_U`
## [1] 0.9979385
## 
## $`Memory_Teffs_S-Memory_Teffs_U`
## [1] 0.977535
## 
## $`Memory_Tregs_S-Memory_Tregs_U`
## [1] 0.9948589
## 
## $`Monocytes_S-Monocytes_U`
## [1] 0.9999413
## 
## $`Naive_B_S-Naive_B_U`
## [1] 0.9957638
## 
## $`Naive_CD8_T_S-Naive_CD8_T_U`
## [1] 0.9961596
## 
## $`Naive_Teffs_S-Naive_Teffs_U`
## [1] 0.9825278
## 
## $`Naive_Tregs_S-Naive_Tregs_U`
## [1] 0.9998867
## 
## $`Regulatory_T_S-Regulatory_T_U`
## [1] 0.9920968
## 
## $`Th1_precursors_S-Th1_precursors_U`
## [1] 0.9776049
## 
## $`Th17_precursors_S-Th17_precursors_U`
## [1] 0.9816565
## 
## $`Th2_precursors_S-Th2_precursors_U`
## [1] 0.9810048

I think the data is too sparse to use cFDR in this way.


Credible set SNPs

Instead, I investigate whether this data can be useful to prioritise fine-mapped SNPs (from GUESSFM in immunochip regions).

I see whether any fine-mapped SNPs fall into any of the significant stimulation-associated peaks. 258 out of the 3612 SNPs fall into such peaks. Comparing the PPs for those SNPs that fall into these peaks and those that don’t, we see that there isn’t much difference.

## Warning: NAs introduced by coercion
##   [1] 1.751487e-02 4.463159e-03 1.032981e-02 1.032981e-02 1.034641e-03
##   [6] 9.777626e-04 9.659399e-04 9.291001e-04 4.831635e-03 2.979361e-03
##  [11] 4.992721e-03 3.191598e-02 5.527923e-03 5.254263e-03 1.125276e-02
##  [16] 6.335970e-03 1.610099e-03 5.075188e-03 3.278932e-01 4.719541e-02
##  [21] 3.054125e-04 3.261938e-04 2.531616e-04 1.354480e-02 7.103902e-04
##  [26] 2.471800e-04 6.892095e-03 2.800839e-02 4.981114e-02 3.506642e-02
##  [31] 5.749459e-03 7.183161e-03 3.472816e-03 6.128756e-03 7.117475e-03
##  [36] 4.605429e-03 6.682921e-03 4.582785e-03 6.682554e-03 4.568754e-03
##  [41]           NA 4.464318e-03 3.064808e-03 1.479767e-04 1.576589e-03
##  [46] 1.661200e-03 1.982979e-03 1.902357e-03 2.802051e-03 2.515725e-03
##  [51] 2.565474e-03 2.568997e-03 2.944060e-03 3.994876e-03 6.514253e-03
##  [56] 6.977154e-03 4.128837e-03 9.594409e-02 9.594409e-02 1.747487e-02
##  [61] 1.094140e-02 7.181717e-01 1.421529e-01 1.921072e-01           NA
##  [66]           NA 3.159697e-03 4.184411e-04 4.613359e-02 4.613359e-02
##  [71] 2.406655e-02 5.160325e-02 3.181202e-05 3.261307e-05 9.823060e-04
##  [76] 7.711309e-03 1.378706e-02 3.452535e-03 7.276422e-03 7.652751e-03
##  [81] 3.164903e-03 4.522758e-03 4.424692e-03 1.923178e-03 1.162119e-03
##  [86] 2.044447e-03 2.267487e-03 6.842337e-04 1.834072e-03 5.758467e-03
##  [91] 6.515168e-03 4.322066e-03 2.805850e-02 5.407387e-02 5.537699e-02
##  [96] 2.049351e-03 1.723237e-03 1.996392e-03 1.617782e-02 5.614851e-04
## [101] 4.564098e-04 4.114185e-04 7.172281e-04 2.128756e-04 3.196552e-04
## [106] 3.197025e-04 2.397953e-04 4.943888e-04 1.027246e-02 1.025522e-02
## [111] 9.864498e-02 2.846133e-03 2.533766e-03 5.879079e-03 2.587420e-03
## [116] 9.107300e-02 6.851087e-02 8.194620e-02 8.496321e-04 8.312506e-04
## [121] 7.226226e-03 8.124640e-03 8.012775e-03 8.603975e-03 7.751372e-03
## [126]           NA 2.249115e-02 7.424428e-01 1.411434e-02 2.064321e-02
## [131] 2.575877e-01 6.053995e-03 2.853635e-01 4.514695e-03 3.925788e-03
## [136] 8.165681e-03 5.130291e-03 3.914713e-03 2.164883e-02 6.391962e-03
## [141] 3.278436e-03 1.105648e-02 5.735602e-04 5.856577e-04 7.215018e-02
## [146] 7.215018e-02 6.904728e-02 6.131015e-04 6.880149e-02 6.093419e-04
## [151] 1.021231e-02 3.145773e-03 1.489722e-01 4.318087e-02 5.866830e-04
## [156] 1.283955e-03 1.048954e-02 1.926414e-02 5.127204e-01 7.506242e-03
## [161] 6.462077e-03 6.791220e-03 7.731364e-03 2.118929e-02 1.459075e-02
## [166] 1.797221e-03 1.755946e-02 1.658057e-02 1.950617e-02 6.057796e-02
## [171] 6.216390e-02 5.627531e-05 3.363373e-04 3.439321e-04 3.637896e-04
## [176] 3.362721e-04 3.385012e-04 3.546080e-04 3.546080e-04 3.794357e-04
## [181] 3.362721e-04 8.358353e-05 5.393640e-02 1.123277e-02 6.489534e-01
## [186] 9.742872e-03 2.878103e-04 9.872701e-02 3.042943e-04 3.479917e-02
## [191] 3.344597e-02 3.346390e-02 3.345903e-02 3.345903e-02 3.575665e-02
## [196] 2.137051e-02 3.210469e-03 1.480818e-03 1.535025e-03 1.525499e-03
## [201] 1.035605e-03 7.358710e-04 1.538794e-04 1.369361e-02 2.360346e-03
## [206] 3.213706e-02 6.176134e-03 5.795382e-04 4.782437e-03 6.188101e-03
## [211] 5.391730e-03 5.565050e-03 1.428015e-02 1.275601e-02 2.320828e-02
## [216] 5.027115e-02 8.796092e-03 5.341985e-03 7.027657e-04 6.945212e-02
## [221] 3.529393e-04 2.265702e-03 4.862118e-04 5.310963e-01 2.660564e-02
## [226] 5.864262e-04 3.702085e-03 1.953951e-01 1.799459e-02 1.040603e-02
## [231] 1.042768e-02 1.650826e-01 5.522610e-02 7.921473e-05 1.659734e-03
## [236] 1.437380e-03 7.510794e-02 1.347827e-01 6.264266e-04 2.315042e-03
## [241] 3.811841e-04 1.279520e-03 1.481390e-03 1.713126e-03 5.407633e-02
## [246] 5.346586e-02 4.873717e-02 2.194690e-02 3.217126e-02 4.169222e-02
## [251] 5.487649e-03 1.056871e-02 9.100420e-03 4.172347e-03 3.259407e-03
## [256] 2.199341e-04 3.430155e-04 3.702522e-02
## Warning in na.omit(as.numeric(fm_res[match(names(final_out), fm_res$Rsid), : NAs
## introduced by coercion
## Warning in na.omit(as.numeric(fm_res[-match(names(final_out), fm_res$Rsid), :
## NAs introduced by coercion


Other ideas:

  • Go through the fine-mapping results matched to ATAC-peaks and see if there is anything interesting.

  • IKZF1 CHiP-Seq data:

    • The protein encoded by \(\textit{IKZF1}\) is a transcription factor which functions as a regulator of lymphocyte differentiation.
    • The idea is that if we find T1D SNPs that fall within IKZF1 binding sites (in CD4+ T cells), then a putative causal mechanism for these SNPs would be that they disrupt IKZF1 binding in CD4+ T cells - thereby identifying the causal cell type and the functional mechanism.
    • But it seems that this is what they’ve already done in the tracks that they’ve sent = at least in the IL2RA and IL10 loci. Did they do anything with these results? Do they have the pre-processed peaks?