PAINTOR

Aim: Use PAINTOR for T1D fine-mapping and functional enrichment. “A key feature of our approach is that it empirically estimates the contribution of each functional annotation to the trait of interest directly from summary association statistics while allowing for multiple causal variants at any risk locus”.

Paper: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004722

Software: https://github.com/gkichaev/PAINTOR_V3.0

Implementation

1. Make input files

For each of the 39 T1D-associated regions, I generate 3 files with a row for each SNP:

Locus file (“CHR”,“POS”,“RSID”,“ZSCORE” columns)
LD matrix with pairwise correlations (a symmetric matrix of Pearson correlation coefficients)
Annotation matrix file (binary indicator of each annotation mark - note that these are mutually exclusive)

(all in PAINTOR_thymus on HPC).
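
As a rough illustration of the three per-locus files listed above, the Python sketch below writes a toy two-SNP locus. The helper write_locus, the file names (Locus1, Locus1.LD1, Locus1.annotations) and the space-delimited layout are assumptions made for illustration; the LD1 suffix is chosen to match the -LDname LD1 flag used when running PAINTOR, and all values are made up.

```python
import tempfile
from pathlib import Path


def write_locus(in_dir, name, snps, ld, annot_names, annot_rows):
    """Write the locus, LD and annotation files for one region (hypothetical helper)."""
    in_dir = Path(in_dir)
    in_dir.mkdir(parents=True, exist_ok=True)

    # Locus file: header line, then one row per SNP.
    with open(in_dir / name, "w") as fh:
        fh.write("CHR POS RSID ZSCORE\n")
        for chrom, pos, rsid, z in snps:
            fh.write(f"{chrom} {pos} {rsid} {z}\n")

    # LD file: square matrix, no header, rows in the same SNP order.
    with open(in_dir / f"{name}.LD1", "w") as fh:
        for row in ld:
            fh.write(" ".join(str(r) for r in row) + "\n")

    # Annotation file: header of annotation names, then 0/1 indicators per SNP.
    with open(in_dir / f"{name}.annotations", "w") as fh:
        fh.write(" ".join(annot_names) + "\n")
        for row in annot_rows:
            fh.write(" ".join(str(v) for v in row) + "\n")


# Toy example with made-up values, written to a temporary directory.
demo_dir = Path(tempfile.mkdtemp()) / "InDirectory"
write_locus(
    demo_dir,
    "Locus1",
    snps=[(1, 1000, "rsA", 2.5), (1, 2000, "rsB", -1.2)],
    ld=[[1.0, 0.3], [0.3, 1.0]],
    annot_names=["Enhancer", "Promoter"],
    annot_rows=[[1, 0], [0, 1]],
)
```

Each SNP row has exactly one 1 in the annotation file here, which mirrors the mutually exclusive annotations described above.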

2. Generate input and output directories

I make an input directory (InDirectory) consisting of each of the input files and an additional file listing the loci (called input.file). I also make an empty output directory (OutDirectory) to store the results.
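
A minimal sketch of this step in Python, assuming the loci are named Locus1 to Locus39 (the warning in step 3 mentions Locus1, but the full naming scheme is an assumption) and that input.file lists one locus name per line. A temporary directory stands in for the real working directory.

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())  # stand-in for the real working directory
in_dir = base / "InDirectory"
out_dir = base / "OutDirectory"
in_dir.mkdir()
out_dir.mkdir()  # left empty; PAINTOR writes its results here

# Assumed naming scheme: Locus1 ... Locus39, one name per line.
locus_names = [f"Locus{i}" for i in range(1, 40)]
(in_dir / "input.file").write_text("\n".join(locus_names) + "\n")
```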

3. Run PAINTOR

In the PAINTOR_V3.0 directory run:

./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 3 -annotations ConstitutiveHet,Enhancer,LowConfidence,Promoter,Quiescent,RegPermissive,Transcribed,NA

Note: this run gives the following warning:

Warning! The estimated N*h2_g,local for locus Locus1 is: 1990.26 This may potentially indicate mismatch/error in the LD-matrix

This produces output files containing the PPs.
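
Given the LD-matrix warning above, a quick structural check of each LD file may be worthwhile before trusting the PPs. The sketch below uses a hypothetical helper, check_ld, that only tests for a square, symmetric matrix with a unit diagonal and entries in [-1, 1]. It cannot detect allele flips or an ancestry mismatch between the GWAS and the LD reference, which are other possible causes of this warning.

```python
def check_ld(matrix, tol=1e-6):
    """Return a list of structural problems in a square LD matrix (empty if none)."""
    problems = []
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n:
            problems.append(f"row {i} has {len(row)} entries, expected {n}")
            continue
        if abs(row[i] - 1.0) > tol:
            problems.append(f"diagonal entry {i} is {row[i]}, expected 1")
        for j, r in enumerate(row):
            if abs(r) > 1.0 + tol:
                problems.append(f"entry ({i},{j}) is outside [-1, 1]")
            # Compare each lower-triangle entry with its mirror image.
            if j < i and len(matrix[j]) == n and abs(r - matrix[j][i]) > tol:
                problems.append(f"entries ({i},{j}) and ({j},{i}) differ")
    return problems


# Toy matrices with made-up values.
good = [[1.0, 0.3], [0.3, 1.0]]
bad = [[1.0, 0.3], [0.2, 1.0]]
print(check_ld(good))
print(check_ld(bad))
```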

Suggested pipeline to find relevant annotations

“In order to determine which annotations are relevant to the phenotype being considered, we recommend running PAINTOR on each annotation independently.” E.g.

./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -Gname Enrich.Base -Lname BF.Base

./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -annotations ConstitutiveHet -Gname Enrich.ConstitutiveHet -Lname BF.ConstitutiveHet

./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -annotations Enhancer -Gname Enrich.Enhancer -Lname BF.Enhancer

./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -annotations Promoter -Gname Enrich.Promoter -Lname BF.Promoter

./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -annotations Quiescent -Gname Enrich.Quiescent -Lname BF.Quiescent

./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -annotations RegPermissive -Gname Enrich.RegPermissive -Lname BF.RegPermissive

./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -annotations Transcribed -Gname Enrich.Transcribed -Lname BF.Transcribed
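
The seven commands above differ only in the annotation name, so they could be generated rather than typed out. The sketch below builds the same argument lists in Python; paintor_command is a hypothetical helper, and the commands are only printed here (they could be passed to subprocess.run from the PAINTOR_V3.0 directory).

```python
import shlex

# Arguments shared by every run, copied from the commands above.
COMMON = [
    "./PAINTOR",
    "-input", "InDirectory/input.file",
    "-in", "InDirectory/",
    "-out", "OutDirectory/",
    "-Zhead", "ZSCORE",
    "-LDname", "LD1",
    "-enumerate", "1",
]
ANNOTATIONS = [
    "ConstitutiveHet", "Enhancer", "Promoter",
    "Quiescent", "RegPermissive", "Transcribed",
]


def paintor_command(annotation=None):
    """Build the argument list for the baseline run (None) or one annotation."""
    label = annotation or "Base"
    args = list(COMMON)
    if annotation is not None:
        args += ["-annotations", annotation]
    args += ["-Gname", f"Enrich.{label}", "-Lname", f"BF.{label}"]
    return args


commands = [paintor_command()] + [paintor_command(a) for a in ANNOTATIONS]
for cmd in commands:
    print(shlex.join(cmd))
```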

“After obtaining the output for all of the annotations marginally, prioritize annotations based on the improvement in the model fit. Take the top annotations (usually no more than 4 or 5) to enter the final model that are roughly uncorrelated with one another. We recommend correlation matrices for this process.”
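
One way to look at how correlated the annotations are is to compute pairwise Pearson correlations between the 0/1 columns of the annotation files. The sketch below does this in plain Python on made-up toy columns; the helper names are hypothetical, and a column that is all 0s or all 1s would need to be dropped first, since its variance is zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations between named annotation columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy indicator columns (made-up values), one entry per SNP.
toy = {
    "Enhancer": [1, 0, 0, 1],
    "Promoter": [0, 1, 0, 0],
    "Quiescent": [0, 0, 1, 0],
}
corr = correlation_matrix(toy)
```

Because the annotations here are mutually exclusive, every off-diagonal correlation is negative by construction.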

E.g. determine significance of each annotation independently using the likelihood ratio test:

$> cat OutDirectory/BF.Base

204.6911542

$> cat OutDirectory/BF.ConstitutiveHet

204.7133103

LRT = -2[logBF(M0) - logBF(M1)], where M0 is the baseline model and M1 is the model with one annotation

= -2[204.6911542 - 204.7133103] = 0.0443122

1-pchisq(0.0443122, 1)=0.8332738
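
The same calculation can be sketched in Python without extra packages. For a chi-squared distribution with 1 degree of freedom, the upper tail probability equals erfc(sqrt(x/2)), so math.erfc stands in for 1-pchisq(x, 1). The function name lrt_pvalue is hypothetical.

```python
from math import erfc, sqrt


def lrt_pvalue(log_bf_null, log_bf_alt):
    """Likelihood ratio statistic and its 1-df chi-squared p value."""
    stat = -2.0 * (log_bf_null - log_bf_alt)
    # Upper tail of chi-squared with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    # A negative statistic is clamped to 0, which gives p = 1.
    p_value = erfc(sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Values read from BF.Base and BF.ConstitutiveHet above.
stat, p = lrt_pvalue(204.6911542, 204.7133103)
print(stat, p)
```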

P values for each annotation in thymus cells:

ConstitutiveHet: 0.8332738
Enhancer: 0.8508141
Promoter: 0.7893417
Quiescent: 0.1636341
RegPermissive: 0.6766891
Transcribed: 0.0628845

“Then use those annotations in a final model to compute trait-specific posterior probabilities for causality.”

Suggested pipeline to find the relative probability for a SNP to be causal given that it is in each annotation

The baseline prior probability for any SNP in the fine-mapping dataset to be causal is 0.001492543.

The prior probability for a SNP in each annotation to be causal is:

ConstitutiveHet: 3.08096e-12
Enhancer: 5.467224e-10
Promoter: 3.08096e-12
Quiescent: 9.714949e-09
RegPermissive: 3.287315e-12
Transcribed: 0.009433592

Thus, the relative probability for a SNP to be causal given that it is in each annotation (the annotation prior divided by the baseline prior) is:

ConstitutiveHet: 3.08096e-12/0.001492543=2.064235e-09
Enhancer: 3.663026e-07
Promoter: 2.064235e-09
Quiescent: 6.508991e-06
RegPermissive: 2.202493e-09
Transcribed: 6.320483
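
The division above can be reproduced for all annotations at once. This short Python sketch uses only the prior probabilities listed in these notes; the variable names are for illustration.

```python
# Baseline and per-annotation prior probabilities, as listed above.
baseline = 0.001492543
priors = {
    "ConstitutiveHet": 3.08096e-12,
    "Enhancer": 5.467224e-10,
    "Promoter": 3.08096e-12,
    "Quiescent": 9.714949e-09,
    "RegPermissive": 3.287315e-12,
    "Transcribed": 0.009433592,
}

# Relative probability = annotation prior / baseline prior.
relative = {name: p / baseline for name, p in priors.items()}
for name, value in relative.items():
    print(f"{name}: {value:.7g}")
```

A value above 1 (here only Transcribed) means a SNP in that annotation is more likely to be causal than a SNP under the baseline prior.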

Notes

Is the ancestry correct? The P values are from the T1D GWAS and the LD estimates are from paste0("/home/ah2011/rds/rds-cew54-wallace-share/Projects/anna-credsets/geno-",snpChr,".RDS"); do these match?
I can set the maximum number of causal variants (CVs) per region with -enumerate, or leave it unset (in which case it looks for at most 3).
What questions can I aim to answer using PAINTOR? I have data for the annotation of SNPs in these 39 genomic regions in 19 cell types (here I’ve only looked at annotations in thymus cells).