CHEERS is a statistical method that accounts for subtle changes in the chromatin landscape to identify SNP enrichment across cell states. E.g. genome-wide significant SNPs for T1D are enriched in IFN\(\beta\).

Rather than converting annotations into a binary measure of present/absent at the SNP level (for which they found there were not many cell state specific changes in H3K27ac and ATAC-seq data for different T cell states), they model the peak properties.


Method

  1. Define a union of peak regions present across cell types and construct a matrix of normalised read counts.

  2. Calculate and rank specificity scores (the higher the rank score the more specific that peak is to that cell type). The peak specificity score is the normalised read count for that peak in that cell state divided by the square root of the sum of the squared normalised read counts for that peak across all cell states.

  3. Define SNP set as lead-SNPs and those with \(r^2>0.8\) and overlay these to the peaks.

  4. For the peaks that overlap the associated variants, calculate the mean specificity ranks and derive a P value representing the probability that SNP-peak overlap is random in that cell type.


Notes

"Within each cell state, all ranks (1,2,…,N) can be observed with an equal probability and thus follow a discrete uniform distribution (Kolmogorov-Smirnov test, P=1) with mean:

\[\mu = \dfrac{1+N}{2}\]

and variance:

\[\sigma^2=\dfrac{(max-min+1)^2-1}{12}.\]

When we substitute max with number of peaks (N) and min with the minimal rank (1), we obtain the following formula:

\[\sigma^2=\dfrac{N^2-1}{12}\] which, under the CLT coverges to,

\[\bar{\sigma}^2=\dfrac{N^2-1}{12n}\] where \(\bar{\sigma}^2\) is the variance of the mean of the \(n\) peak ranks overlapping the GWAS SNPs, assuming that these overlap at random.

Finally, we calculate P values as:

\[P = 1-\Phi(\dfrac{x-\mu}{\sigma})."\]