GoShifter (Genomic Annotation Shifter) uses a circularised permutation method to test for functional enrichment of GWAS variants. Unlike SNP-matching methods, which require the matching (confounding) parameters to be specified, GoShifter does not require any prior knowledge of the confounding factors, because the null distribution is derived within the tested loci.

The paper discusses two specific types of confounding that the authors claim their method accounts for:

1. Trait-associated SNPs often map to regions with greater gene density, genetic variation and LD than the rest of the genome.

2. Functional annotations that colocalise are often enriched within trait-associated loci (e.g. DHSs colocalise with exons). This means that an annotation could be labelled as enriched purely because it colocalises with another annotation that is actually enriched.

#### Method

1. Derive the set of potentially functional variants (index SNPs + those with $$r^2>0.8$$ in 1000 Genomes Project European samples).

2. Define each locus as the region between the furthest linked SNPs, extended by twice the median size of the tested annotation (X). The extension ensures the locus is large enough to test the significance of an overlap even when it is defined by an index variant with no other variants in linkage.

3. Quantify the proportion of loci in which at least one SNP in LD overlaps X.

4. Circularise each locus and randomly shift the X sites within it, keeping the SNP locations fixed, then quantify the proportion of loci overlapping X. Repeat this many times to generate a null distribution.

5. Compute P values as the proportion of iterations for which the number of overlapping loci was equal to or greater than that for the tested SNPs.
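
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the GoShifter implementation: the locus representation (a dict with `start`, `end`, `snps` and `annots`), the half-open interval convention and the function names are all assumptions made for the example, and annotations are assumed to be already clipped to the locus.

```python
import random

def locus_overlaps(snps, annots):
    """True if any SNP position falls inside any annotation interval [start, end)."""
    return any(s <= p < e for p in snps for s, e in annots)

def shift_circular(annots, start, end, offset):
    """Shift intervals by `offset` within [start, end), wrapping past the locus end.

    Assumes every interval lies inside the locus (hypothetical pre-clipping step).
    """
    length = end - start
    shifted = []
    for s, e in annots:
        new_s = start + (s - start + offset) % length
        new_e = new_s + (e - s)
        if new_e <= end:
            shifted.append((new_s, new_e))
        else:
            # The interval runs off the end: split it and wrap the remainder round.
            shifted.append((new_s, end))
            shifted.append((start, start + (new_e - end)))
    return shifted

def enrichment_p(loci, n_iter=1000, seed=0):
    """Return (observed number of overlapping loci, permutation P value)."""
    rng = random.Random(seed)
    observed = sum(locus_overlaps(l["snps"], l["annots"]) for l in loci)
    hits = 0
    for _ in range(n_iter):
        null = 0
        for l in loci:
            # SNPs stay fixed; only the annotation is shifted, locus by locus.
            offset = rng.randrange(l["end"] - l["start"])
            shifted = shift_circular(l["annots"], l["start"], l["end"], offset)
            null += locus_overlaps(l["snps"], shifted)
        hits += null >= observed  # "equal to or greater than", as in step 5
    return observed, hits / n_iter
```

Because each locus is shifted only within its own boundaries, the local annotation density and the number of linked SNPs are preserved in every null iteration, which is what removes the need for explicit matching parameters.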

The method is extended for “stratified enrichment of an association” whereby enrichment of an annotation (X) is calculated whilst controlling for a potentially colocalising second annotation (Y).

1. Fragment each locus on the basis of the presence of Y while fixing the relative positions of the SNPs and annotation X (splitting X annotations if they partially overlap Y).

2. Concatenate these fragments (preserving the relationships and relative positions among X, Y and the SNPs in the locus in both segments).

3. To generate the stratified null distribution, circularise and randomly shift X within the two segments (overlapping Y and not overlapping Y) independently and quantify the proportion of loci that had at least one SNP that overlapped X in either region.

4. Define the P value of the enrichment as the proportion of iterations where the number of loci with SNPs overlapping X exceeded the number of loci overlapping X prior to shifting.
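
A rough sketch of the stratified shift is given below. To keep it short it represents each locus with per-base boolean masks for X and Y (`x`, `y`) and SNP positions as indices into those masks; this representation and all function names are assumptions for illustration, not how GoShifter stores its data. Ties are counted towards the P value here, mirroring the unstratified test, which is also an assumption.

```python
import random

def rotate(values, offset):
    """Circularly rotate a list to the right by `offset` positions."""
    if not values:
        return list(values)
    k = offset % len(values)
    return values[-k:] + values[:-k] if k else list(values)

def stratified_shift(x_mask, y_mask, rng):
    """Shift X independently within the Y and non-Y parts of one locus.

    Bases covered by Y are concatenated into one segment and the remaining
    bases into another; X is rotated within each segment separately and then
    mapped back to the original locus coordinates.
    """
    in_y = [i for i, y in enumerate(y_mask) if y]
    out_y = [i for i, y in enumerate(y_mask) if not y]
    shifted = [False] * len(x_mask)
    for idx in (in_y, out_y):
        if not idx:
            continue
        segment = rotate([x_mask[i] for i in idx], rng.randrange(len(idx)))
        for i, value in zip(idx, segment):
            shifted[i] = value
    return shifted

def snp_hits(snps, x_mask):
    """True if any SNP index lands on a base covered by X."""
    return any(x_mask[p] for p in snps)

def stratified_p(loci, n_iter=1000, seed=0):
    """Return (observed number of overlapping loci, stratified P value)."""
    rng = random.Random(seed)
    observed = sum(snp_hits(l["snps"], l["x"]) for l in loci)
    count = 0
    for _ in range(n_iter):
        null = sum(
            snp_hits(l["snps"], stratified_shift(l["x"], l["y"], rng))
            for l in loci
        )
        count += null >= observed  # assumption: ties counted, as in the basic test
    return observed, count / n_iter
```

Shifting within the two segments separately keeps the amount of X that sits on Y constant in every iteration, so any enrichment of X that survives cannot be explained by its colocalisation with Y alone.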

#### Other info

• They define the “Delta-Overlap” parameter as the difference between the observed proportion of loci overlapping an annotation and the mean of the proportion of loci overlapping the annotation under the null derived by local shifting. If there is no enrichment, the difference will be very small (observed overlap close to mean overlap). This parameter can therefore be used to quantify the effect size of the overlap.

• They derive an “overlap score” for each locus to identify individual loci where the overlap between a SNP and an annotation was particularly informative. It is the probability that a locus overlaps an annotation by chance, and it is calculated only for loci that overlap an annotation. Low overlap scores therefore flag informative loci for further functional investigation (typically loci with few variants linked to the index SNP and a sparse density of the annotation). In the paper they find valid loci with low overlap scores for RA and breast cancer.
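
Both quantities are simple to compute once the shifting machinery exists. The sketch below is illustrative only: the function names and the per-base mask representation are assumptions, and the overlap score is obtained here by enumerating every possible circular offset for a single locus rather than by random sampling.

```python
def delta_overlap(observed_count, null_counts, n_loci):
    """Observed proportion of overlapping loci minus the mean null proportion."""
    observed_prop = observed_count / n_loci
    null_mean_prop = sum(null_counts) / len(null_counts) / n_loci
    return observed_prop - null_mean_prop

def overlap_score(snps, x_mask):
    """Chance that a locus overlaps X under a uniformly random circular shift.

    snps: SNP positions as indices into x_mask (hypothetical representation).
    x_mask: per-base booleans marking where annotation X lies in the locus.
    Intended only for loci that overlap X before shifting.
    """
    n = len(x_mask)
    # Rotating X right by k moves base i to (i + k) % n, so after the shift
    # the SNP at p sits on whatever was originally at (p - k) % n.
    hits = sum(any(x_mask[(p - k) % n] for p in snps) for k in range(n))
    return hits / n
```

For example, a 10-base locus with a single annotated base gives `overlap_score([0], [True] + [False] * 9)` equal to 0.1, while adding two more linked SNPs at positions 1 and 2 raises it to 0.3, which illustrates why loci with few linked variants and sparse annotation score lowest.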