This vignette contains supplementary information regarding the usage of the corrcoverage R package.


Key functions

The two main functions are:

  1. corrcov (or analogously corrcov_bhat): Provides a corrected coverage estimate of credible sets obtained using the Bayesian approach for fine-mapping (see “Corrected Coverage” vignette).
  • Use corrcov for \(Z\)-scores and minor allele frequencies
  • Use corrcov_bhat for beta hat estimates and their standard errors.
  1. corrected_cs (or analogously corrected_cs_bhat): Finds a corrected credible set, which is the smallest set of variants such that the corrected coverage is above some user defined “desired coverage” (see "Corrected credible set vignette).
  • Use corrected_cs for \(Z\)-scores and minor allele frequencies
  • Use corrected_cs_bhat for beta hat estimates and their standard errors.

Additional useful functions

  • corrcov_nvar (or analogously corrcov_nvar_bhat): Finds a corrected coverage estimate whereby the simulated credible sets used to derive the estimate are limited to those which contain a specified number of variants (parameter ‘nvar’). These functions should be used with caution and only ever for very small credible sets (fewer than 4 variants).

  • corrcov_CI (or analogously corrcov_CI_bhat): Finds a confidence interval for the corrected coverage estimate. The default is a 95% confidence interval (parameter CI = 0.95), but can be adjusted accordingly. This function involves repeating the correction procedure 100 times and therefore requires lots of memory.


Conversion functions

The Bayesian method for fine-mapping involves finding the posterior probability of causality for each SNP, before sorting these into descending order and adding variants to a ‘credible set’ until the combined posterior probabilities of these SNPs exceed some threshold. The supplementary text of Maller’s paper (available here) shows that these posterior probabilities are normalised Bayes factors.

Asymptotic Bayes factors (Wakefield, 2009) are commonly used in genetic association studies as these only require the specification of \(Z\)-scores (or equivalently the effect size coefficients, \(\beta\), and their standard errors, \(V\)), the standard errors of the effect sizes (\(V\)) and the prior variance of the estimated effect size (\(W^2\)), thus only requiring summary data from genetic association studies plus an estimate for the \(W\) parameter.

Consequently, the corrcoverage package contains functions for converting between \(P\)-values, \(Z\)-scores, asymptotic Bayes factors (ABFs) and posterior probabilities of causality (PPs). The following table shows what input these conversion functions require and what output they produce. The ‘include null model’ column is for whether the null model of no genetic effect is included in the calculation (PPs obtained using the standard Bayesian approach ignore this).

Function Include null model? Input Output
approx.bf.p YES \(P\)-values log(ABF)
pvals_pp YES \(P\)-values Posterior Probabilities
z0_pp YES Marginal \(Z\)-scores Posterior Probabilities
ppfunc NO Marginal \(Z\)-scores Posterior Probabilities

Marginal and joint Z scores

Functions are also provided to simulate marginal \(Z\)-scores from joint \(Z\)-scores (\(Z_j\)). The joint \(Z\)-scores are all 0, except at the causal variant where it is the “true effect”, \(\mu\).

\(\mu\) can be estimated using the est_mu function which requires sample sizes, marginal \(Z\)-scores and minor allele frequencies. We estimate \(\mu\) by \[\hat\mu=\sum_{j}|Z_j|\times PP_j\]

The z_sim function simulates marginal \(Z\)-scores from joint \(Z\)-scores, whilst zj_pp can be used to simulate posterior probability systems from a joint \(Z\)-score vector. These functions first calculate the expected marginal \(Z\) scores, \(E(Z)\), \[E(Z)=Z_j \times \Sigma\] where \(\Sigma\) is the correlation matrix between SNPs.

We can then simulate more \(Z\)-score systems from a multivariate normal distribution with mean \(E(Z)\) and variance \(\Sigma\). This is a key step in our corrected coverage method.

Summary

  • Joint \(Z\) score vectors (Z_j) are used to derive expected marginal \(Z\) score vectors, by multiplying with the SNP correlation matrix.

\[E(Z)=Z_j \times \Sigma\]

  • The expected marginal \(Z\) score vectors can be used as the mean in a multi-variate normal distribution with variance equal to the SNP correlation matrix to simulate more marginal \(Z\) score vectors.

\[Z \sim MVN(E(Z),\Sigma)\]

  • The z_sim function follows these steps to simulate nrep marginal \(Z\) score vectors.

  • The zj_pp function goes one step further and converts these simulated marginal \(Z\) scores to posterior probabilities of causality.