This vignette will show users how the
corrcoverage R package can be used to obtain a new credible set of variants that contains the true causal variant with some specified desired coverage value whilst containing as few variants as possible.
As in the corrected coverage vignette, let’s begin by simulating some GWAS data using the
set.seed(18) library(corrcoverage) library(simGWAS) # Simulate reference haplotypes nsnps <- 200 nhaps <- 1000 lag <- 30 # genotypes are correlated between neighbouring variants maf <- runif(nsnps + lag, 0.05, 0.5) # common SNPs laghaps <- do.call("cbind", lapply(maf, function(f) rbinom(nhaps, 1, f))) haps <- laghaps[, 1:nsnps] for (j in 1:lag) haps <- haps + laghaps[, (1:nsnps) + j] haps <- round(haps/matrix(apply(haps, 2, max), nhaps, nsnps, byrow = TRUE)) snps <- colnames(haps) <- paste0("s", 1:nsnps) freq <- as.data.frame(haps + 1) freq$Probability <- 1/nrow(freq) sum(freq$Probability)
##  1
OR <- 1.1 # odds ratios N0 <- 10000 # number of controls N1 <- 10000 # number of cases z0 <- simulated_z_score(N0 = N0, # number of controls N1 = N1, # number of cases snps = snps, # column names in freq W = CV, # causal variants, subset of snps gamma.W = log(OR), # log odds ratios freq = freq) # reference haplotypes
To calculate \(V\), the prior variance for the estimated effect size, we use
We can then use the
ppfunc function to calculate the posterior probabilities of causality for each variant.
We use the
est_mu function to obtain an estimate of the true effect at the causal variant.
##  4.970273
corrected_cov function is used to find the corrected coverage of a credible set with specified threshold, say 0.9.
Note that this function is similar to using
corrcov as explained in the “Corrected Coverage” vignette; which would require \(Z\)-scores, minor allele frequencies and sample sizes. Here, we have already calculated some of the intermediaries calculated in the
corrcov function (muhat, varbeta etc.) so we can use
## claimed.cov corr.cov nvar ## 1 0.9307395 0.967282 9
Using the Bayesian approach for statistical fine-mapping we obtain a 90% credible set consisting of 9 variants. The claimed coverage of this credible set is ~0.93, yet the corrected coverage estimate is ~0.97, suggesting that we can afford to be ‘more confident’ that we have captured the causal variant in our credible set.
Again, for the purpose of this vignette we can investigate how accurate this estimate is by simulating many credible sets from the same system, and finding the proportion of these that contain the true causal variant.
z0.tmp <- simulated_z_score(N0 = N0, # number of controls N1 = N1, # number of cases snps = snps, # column names in freq W = CV, # causal variants, subset of snps gamma.W = log(OR), # log odds ratios freq = freq, # reference haplotypes nrep = 5000) pps <- ppfunc.mat(zstar = z0.tmp, V = varbeta) # find pps cs.cov <- apply(pps, 1, function(x) credset(x, CV = iCV, thr = thr)$cov) true.cov.est <- mean(cs.cov) data.frame(claimed.cov = cs$claimed.cov, corr.cov = corrcov, true.cov = true.cov.est, nvar = cs$nvar)
## claimed.cov corr.cov true.cov nvar ## 1 0.9307395 0.967282 0.9696 9
We find that our corrected coverage value is very close to the empirical coverage of the credible set.
Our results suggest that we may be able to remove some variants from the credible set, whilst still achieving the desired coverage of 90%.
corrected_cs function uses GWAS summary statistics and some user-defined parameters to find the smallest credible set such that the coverage estimate is within some accuracy of the desired coverage.
The function requires the following parameters to be specified:
z(vector of marginal \(Z\)-scores)
f(vector of minor allele frequencies)
N1(number of controls and cases respectively)
Sigma(SNP correlation matrix)
desired.cov(desired coverage of the credible set)
In addition, there are a number of optional parameters:
lower(the lower value to try for the threshold)
upper(the upper value to try for the threshold)
acc(the accuracy required for the coverage of the resultant credible set)
max.iter(maximum number of iterations)
The function uses the bisection root finding method to converge to the smallest threshold such that the corrected coverage is larger than the desired coverage. The
upper parameters define the boundaries of the threshold values for the root finding method, with default values of 0 and 1 respectively. The
acc parameter has a default value set to 0.005, meaning that the algorithm will keep attempting to find a corrected credible set (until the number of iterations reaches
max.iter) that has coverage within 0.005 of the desired coverage value (e.g. between 0.895 and 0.905 for a 90% credible set).
The function reports the threshold values tested and their corresponding corrected coverage value at each iteration. The maximum number of iterations for the bisecting root finding algorithm is an optional parameter, with default value 20. The functions stops when either the number of iterations reaches the maximum, or the corrected coverage is within some accuracy of the desired coverage.
##  "thr: 0.75 , cov: 0.917720814110773" ##  "thr: 0.625 , cov: 0.878894070908612" ##  "thr: 0.6875 , cov: 0.89679762147096" ##  "thr: 0.71875 , cov: 0.907034158324742" ##  "thr: 0.703125 , cov: 0.901702620558738"
## $credset ##  "s54" "s58" "s57" "s67" ## ## $req.thr ##  0.703125 ## ## $corr.cov ##  0.9017026 ## ## $size ##  0.7343958
The first threshold value tested is the midpoint of the
lower (0.5) and
upper (1) parameter values, which here is 0.75. The coverage of this credible set is too high (\(>0.9\)) and so a smaller threshold value is tested, the midpoint of the
lower parameter value (0.5) and the previously tried threshold value (0.75). This credible set has a coverage that is too low, and so a higher threshold value is tested, and so on.
In this example we see that a much smaller threshold value is required to obtain a credible set with 90% corrected coverage of the causal variant, containing only 4 variants. In the standard Bayesian approach, the threshold value of 90% leads to over-coverage.
Finally, we can compare this coverage estimate of the corrected credible set to an empirical estimate of the true coverage.
Original 90% credible set:
## claimed.cov corr.cov true.cov nvar ## 0.931 0.967 0.97 9
New 90% credible set:
## claimed.cov corr.cov true.cov nvar ## 0.734 0.902 0.897 4
This vignette has shown how the
corrcoverage R package can be used to improve the resolution of a credible set from Bayesian genetic fine-mapping, without the use of any additional data.