Aim: Use PAINTOR for type 1 diabetes (T1D) fine-mapping and functional enrichment. “A key feature of our approach is that it empirically estimates the contribution of each functional annotation to the trait of interest directly from summary association statistics while allowing for multiple causal variants at any risk locus”.

Paper: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004722

Software: https://github.com/gkichaev/PAINTOR_V3.0

For each of the 39 T1D-associated regions, I generate three files, with the SNPs in the same order in each:

- Locus file (“CHR”,“POS”,“RSID”,“ZSCORE” columns)
- LD matrix with pairwise correlations (a symmetric matrix of Pearson correlation coefficients)
- Annotation matrix file (one binary indicator column per annotation mark; note that these annotations are mutually exclusive)

(all in PAINTOR_thymus on HPC).

I make an input directory (InDirectory) containing all of these input files plus a file listing the loci (input.file), and an empty output directory (OutDirectory) for the results.
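A quick consistency check of the input directory can catch problems before PAINTOR is run. This is a minimal sketch, assuming the file naming used in the PAINTOR_V3.0 sample data: `input.file` lists one locus name per line, and each locus `X` has a locus file `X`, an LD file `X.LD1` (matching `-LDname LD1`) and an annotation file `X.annotations`, all whitespace-delimited, with a header row on the locus and annotation files but not on the LD matrix. The helper names are hypothetical, and the check can only confirm that the SNP counts agree, not that the SNP order matches.

```python
from pathlib import Path


def read_rows(path):
    """Read a whitespace-delimited text file into a list of token lists."""
    with open(path) as fh:
        return [line.split() for line in fh if line.strip()]


def check_tables(locus, locus_rows, ld_rows, annot_rows, tol=1e-6):
    """Return a list of problems for one locus (an empty list means it looks OK).

    locus_rows and annot_rows include a header row; ld_rows does not.
    """
    problems = []
    n = len(locus_rows) - 1  # number of SNPs, excluding the header
    if len(annot_rows) - 1 != n:
        problems.append(f"{locus}: {len(annot_rows) - 1} annotation rows, expected {n}")
    if len(ld_rows) != n or any(len(row) != n for row in ld_rows):
        problems.append(f"{locus}: LD matrix is not {n} x {n}")
        return problems  # the checks below need a square matrix
    r = [[float(x) for x in row] for row in ld_rows]
    for i in range(n):
        if abs(r[i][i] - 1.0) > tol:
            problems.append(f"{locus}: LD diagonal entry {i} is {r[i][i]}, expected 1")
        for j in range(i + 1, n):
            if abs(r[i][j] - r[j][i]) > tol:
                problems.append(f"{locus}: LD matrix not symmetric at ({i}, {j})")
    return problems


def check_input_dir(in_dir, input_file="input.file", ld_suffix="LD1",
                    annot_suffix="annotations"):
    """Check every locus listed (one name per line) in in_dir/input_file."""
    in_dir = Path(in_dir)
    problems = []
    for row in read_rows(in_dir / input_file):
        locus = row[0]
        paths = [in_dir / locus,
                 in_dir / f"{locus}.{ld_suffix}",
                 in_dir / f"{locus}.{annot_suffix}"]
        missing = [p.name for p in paths if not p.is_file()]
        if missing:
            problems.append(f"{locus}: missing {', '.join(missing)}")
            continue
        problems.extend(check_tables(locus, *(read_rows(p) for p in paths)))
    return problems
```

Usage: `for problem in check_input_dir("InDirectory"): print(problem)` prints nothing if every locus passes.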

In the PAINTOR_V3.0 directory run:

`./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 3 -annotations ConstitutiveHet,Enhancer,LowConfidence,Promoter,Quiescent,RegPermissive,Transcribed,NA`

Note: this run gives the following warning:

`Warning! The estimated N*h2_g,local for locus Locus1 is: 1990.26 This may potentially indicate mismatch/error in the LD-matrix`
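One cheap way to look for the LD/Z-score mismatch this warning hints at is to compare the Z-scores of SNPs in strong LD with the lead SNP against the signs implied by the LD matrix. If the lead SNP were the single causal variant, z_j would be roughly r_{lead,j} × z_lead, so a strongly correlated SNP whose Z-score has the opposite sign often points to an allele-coding flip between the GWAS and the LD reference. This is a hypothetical heuristic, not the calculation PAINTOR uses for the warning; the thresholds `r_min` and `z_min` are arbitrary, and with several causal variants in a region it can flag SNPs that are actually fine.

```python
def flag_sign_mismatches(z, ld, r_min=0.8, z_min=2.0):
    """Heuristic LD/Z-score consistency check for one locus (hypothetical helper).

    z  : Z-scores in the same SNP order as the LD matrix
    ld : signed correlation matrix as a list of lists
    Returns the indices of SNPs with |r| >= r_min to the lead SNP and
    |z| >= z_min whose Z-score sign is opposite to sign(r * z_lead), the
    sign expected if the lead SNP were the single causal variant.
    """
    lead = max(range(len(z)), key=lambda i: abs(z[i]))
    flagged = []
    for j, r in enumerate(ld[lead]):
        if j == lead or abs(r) < r_min or abs(z[j]) < z_min:
            continue
        if (r * z[lead]) * z[j] < 0:  # observed sign disagrees with expected sign
            flagged.append(j)
    return flagged


if __name__ == "__main__":
    # Toy locus: SNP 2 is in strong positive LD with the lead SNP 0
    # but has a negative Z-score, so it is flagged.
    z = [5.0, 4.5, -4.0, 0.5]
    ld = [[1.0, 0.9, 0.85, 0.1],
          [0.9, 1.0, 0.8, 0.1],
          [0.85, 0.8, 1.0, 0.1],
          [0.1, 0.1, 0.1, 1.0]]
    print(flag_sign_mismatches(z, ld))  # [2]
```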

The run writes the posterior probabilities of causality (PPs) for each SNP to one results file per locus in OutDirectory.
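To see which SNPs drive each locus, the results can be ranked by posterior probability. A minimal sketch, assuming each results file (e.g. `OutDirectory/Locus1.results`) is whitespace-delimited with a header containing the input columns plus a `Posterior_Prob` column; check the header of the actual output and adjust `pp_col` if it differs. `top_snps` is a hypothetical helper.

```python
def top_snps(results_text, n=5, id_col="RSID", pp_col="Posterior_Prob"):
    """Return the n (SNP id, posterior probability) pairs with the highest PP.

    results_text is the content of one PAINTOR results file; the column
    names are assumptions and should match the file's header row.
    """
    rows = [line.split() for line in results_text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    i_id, i_pp = header.index(id_col), header.index(pp_col)
    ranked = sorted(((row[i_id], float(row[i_pp])) for row in body),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    example = ("CHR POS RSID ZSCORE Posterior_Prob\n"
               "1 100 rs1 2.1 0.10\n"
               "1 200 rs2 5.3 0.85\n"
               "1 300 rs3 1.0 0.02\n")
    print(top_snps(example, n=2))  # [('rs2', 0.85), ('rs1', 0.1)]
```

On real output this would be called as, for example, `top_snps(Path("OutDirectory/Locus1.results").read_text(), n=10)` after `from pathlib import Path`.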

“In order to determine which annotations are relevant to the phenotype being considered, we recommend running PAINTOR on each annotation independently.” E.g.

`./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -Gname Enrich.Base -Lname BF.Base`

`./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -annotations ConstitutiveHet -Gname Enrich.ConstitutiveHet -Lname BF.ConstitutiveHet`

`./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -annotations Enhancer -Gname Enrich.Enhancer -Lname BF.Enhancer`

`./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -annotations Promoter -Gname Enrich.Promoter -Lname BF.Promoter`

`./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -annotations Quiescent -Gname Enrich.Quiescent -Lname BF.Quiescent`

`./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -annotations RegPermissive -Gname Enrich.RegPermissive -Lname BF.RegPermissive`

`./PAINTOR -input InDirectory/input.file -in InDirectory/ -out OutDirectory/ -Zhead ZSCORE -LDname LD1 -enumerate 1 -annotations Transcribed -Gname Enrich.Transcribed -Lname BF.Transcribed`
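The seven marginal runs above differ only in the annotation name, so they can be generated in a loop. This sketch only builds and prints the same command lines (`marginal_commands` is a hypothetical helper); actually running them with `subprocess.run` from the PAINTOR_V3.0 directory is left as a comment.

```python
import shlex

ANNOTATIONS = ["ConstitutiveHet", "Enhancer", "Promoter",
               "Quiescent", "RegPermissive", "Transcribed"]


def marginal_commands(annotations, input_file="InDirectory/input.file",
                      in_dir="InDirectory/", out_dir="OutDirectory/"):
    """Build the baseline run plus one single-annotation run per annotation."""
    base = ["./PAINTOR", "-input", input_file, "-in", in_dir, "-out", out_dir,
            "-Zhead", "ZSCORE", "-LDname", "LD1", "-enumerate", "1"]
    cmds = [base + ["-Gname", "Enrich.Base", "-Lname", "BF.Base"]]
    for annot in annotations:
        cmds.append(base + ["-annotations", annot,
                            "-Gname", f"Enrich.{annot}", "-Lname", f"BF.{annot}"])
    return cmds


if __name__ == "__main__":
    for cmd in marginal_commands(ANNOTATIONS):
        print(shlex.join(cmd))
        # To run instead of print (needs `import subprocess`):
        # subprocess.run(cmd, check=True, cwd="PAINTOR_V3.0")
```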

“After obtaining the output for all of the annotations marginally, prioritize annotations based on the improvement in the model fit. Take the top annotations (usually no more than 4 or 5) to enter the final model that are roughly uncorrelated with one another. We recommend correlation matrices for this process.”

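For example, test each annotation on its own with a likelihood ratio test (LRT) comparing the baseline model M0 (BF.Base) with the model M1 that adds that annotation, using the log Bayes factors written by `-Lname` (M1 has one extra parameter, so the test has 1 degree of freedom):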

`$> cat OutDirectory/BF.Base`

`204.6911542`

`$> cat OutDirectory/BF.ConstitutiveHet`

`204.7133103`

`LRT = -2[logBF(M0) - logBF(M1)]`

`    = -2[204.6911542 - 204.7133103] = 0.0443122`

`p = 1 - pchisq(0.0443122, df = 1) = 0.8332738`
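The same LRT can be computed for every annotation in one go. For a chi-square with 1 degree of freedom the upper tail is P(X > x) = erfc(√(x/2)), which is what R's `1 - pchisq(x, 1)` returns, so the Python standard library is enough. A minimal sketch, assuming each `BF.*` file holds a single number as in the `cat` output above; a negative LRT (annotation model fitting worse than baseline) is treated as 0, giving p = 1.

```python
import math
from pathlib import Path


def read_log_bf(path):
    """Read the single log Bayes factor written to a -Lname file (e.g. BF.Base)."""
    return float(Path(path).read_text().split()[0])


def lrt_pvalue(log_bf_base, log_bf_annot):
    """LRT = -2[logBF(M0) - logBF(M1)] and its p-value on 1 degree of freedom."""
    lrt = -2.0 * (log_bf_base - log_bf_annot)
    # Chi-square(1) upper tail: P(X > x) = erfc(sqrt(x / 2)), same as 1 - pchisq(x, 1) in R.
    # A negative LRT is clamped to 0, which gives p = 1.
    p = math.erfc(math.sqrt(max(lrt, 0.0) / 2.0))
    return lrt, p


if __name__ == "__main__":
    # Worked example from the ConstitutiveHet comparison above.
    lrt, p = lrt_pvalue(204.6911542, 204.7133103)
    print(f"LRT = {lrt:.7f}, p = {p:.7f}")
```

On real output: `base = read_log_bf("OutDirectory/BF.Base")`, then `lrt_pvalue(base, read_log_bf(f"OutDirectory/BF.{annot}"))` for each annotation name.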

P values for each annotation in thymus cells:

- ConstitutiveHet: 0.8332738
- Enhancer: 0.8508141
- Promoter: 0.7893417
- Quiescent: 0.1636341
- RegPermissive: 0.6766891
- Transcribed: 0.0628845
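The recommended selection (rank annotations by marginal p-value, then keep at most 4 or 5 that are roughly uncorrelated) can be sketched as a greedy pass. Correlation here is Pearson's r between annotation indicator columns pooled over all SNPs in the 39 loci; how those columns are loaded, the cutoff `alpha` and the threshold `max_abs_r` are illustrative assumptions, and both functions are hypothetical helpers. Because the thymus annotations are mutually exclusive, their pairwise correlations are all ≤ 0, and they become strongly negative when two annotations together cover most SNPs.

```python
import math

# Marginal LRT p-values for the thymus annotations (from the list above).
PVALUES = {
    "ConstitutiveHet": 0.8332738, "Enhancer": 0.8508141, "Promoter": 0.7893417,
    "Quiescent": 0.1636341, "RegPermissive": 0.6766891, "Transcribed": 0.0628845,
}


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: r is undefined, treated here as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def select_annotations(pvalues, columns, max_k=5, max_abs_r=0.5, alpha=None):
    """Greedy pick: best p-value first, skipping annotations with
    |r| > max_abs_r to any annotation already chosen.

    columns maps annotation name -> 0/1 indicators over all SNPs (pooled loci).
    alpha, if given, stops the search at the first p-value above it.
    """
    chosen = []
    for name in sorted(pvalues, key=pvalues.get):
        if alpha is not None and pvalues[name] > alpha:
            break
        if all(abs(pearson(columns[name], columns[c])) <= max_abs_r for c in chosen):
            chosen.append(name)
        if len(chosen) == max_k:
            break
    return chosen
```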

“Then use those annotations in a final model to compute trait-specific posterior probabilities for causality.”

The baseline prior probability for any SNP in the fine-mapping dataset to be causal is 0.001492543.

The prior probability for a SNP in each annotation to be causal is:

- ConstitutiveHet: 3.08096e-12
- Enhancer: 5.467224e-10
- Promoter: 3.08096e-12
- Quiescent: 9.714949e-09
- RegPermissive: 3.287315e-12
- Transcribed: 0.009433592

Thus, the relative probability for a SNP to be causal given that it is in each annotation (the annotation prior divided by the baseline prior) is:

- ConstitutiveHet: 3.08096e-12/0.001492543=2.064235e-09
- Enhancer: 3.663026e-07
- Promoter: 2.064235e-09
- Quiescent: 6.508991e-06
- RegPermissive: 2.202493e-09
- Transcribed: 6.320483
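In the PAINTOR model the prior probability that a SNP is causal is a logistic function of its annotations, P(C_i = 1) = 1/(1 + exp(-(γ_0 + Σ_k γ_k A_ik))). With one binary annotation, the baseline prior is logistic(γ_0) and a SNP in annotation k has prior logistic(γ_0 + γ_k). This sketch assumes γ_0 and γ_k are copied by hand from the `Enrich.*` files (their layout is not parsed here); as a round-trip check, the usage code backs γ values out of the Transcribed priors above and recomputes the ratio.

```python
import math


def logistic(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def annotation_prior(gamma0, gamma_k):
    """Baseline prior, prior inside annotation k, and their ratio,
    assuming P(causal) = logistic(gamma0 + gamma_k * A) with A in {0, 1}."""
    base = logistic(gamma0)
    annotated = logistic(gamma0 + gamma_k)
    return base, annotated, annotated / base


if __name__ == "__main__":
    # Back out gamma values from the Transcribed priors above, then recompute.
    gamma0 = logit(0.001492543)
    gamma_t = logit(0.009433592) - gamma0
    base, annotated, ratio = annotation_prior(gamma0, gamma_t)
    print(f"gamma0 = {gamma0:.4f}, gamma_Transcribed = {gamma_t:.4f}, ratio = {ratio:.6f}")
```

The ratio of priors is close to, but not the same as, the odds ratio exp(γ_k) (about 6.37 for Transcribed, versus 6.32), because the priors are small but not negligible.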

Do the ancestries match? The P values come from the T1D GWAS and the LD estimates from:

`paste0("/home/ah2011/rds/rds-cew54-wallace-share/Projects/anna-credsets/geno-",snpChr,".RDS")`

To check: can I set the maximum number of causal variants (CVs) per region with `-enumerate`, or leave it unset (in which case it looks for at most 3)?

What questions can I aim to answer using PAINTOR? I have annotations for the SNPs in these 39 genomic regions in 19 cell types (so far I have only used the thymus annotations).