Contents

  1. Obtain Required Threshold
  2. Idea 4
  3. Idea 4 Code
  4. Fewer Replicates in Correction
  5. Speeding Up corrcov Function
  6. Next Steps
  7. Questions

Obtain Required Threshold


Aim: Obtain a required threshold (\(x\)) from a desired coverage (\(y\)).

Idea 1: Obtain predicted threshold from each GAM (logit(covered)~logit(cumsum)), normalise each with pp and sum

Idea 2: Predict \(y\) for \(x \in (0,1)\) from each GAM, average these predictions and use the average to predict the required threshold

Idea 3: As above, but normalise each with pp (a rough sketch of Ideas 2/3 is given below)
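A rough sketch of Ideas 2/3 (all names here are hypothetical): for each CV, fit a binomial GAM so that logit(P(covered)) is modelled as a smooth of logit(cumsum), predict coverage over a grid of thresholds, pp-weight and average the curves, then invert the averaged curve to find the threshold giving the desired coverage. This assumes sim.data is a list (one element per CV) of data frames with columns cumsum (claimed coverage of the set) and covered (0/1 indicator).

library(mgcv)

predict_threshold <- function(sim.data, pp, desired.cov, grid = seq(0.01, 0.99, 0.01)) {
  preds <- sapply(sim.data, function(d) {
    fit <- gam(covered ~ s(qlogis(cumsum)), family = binomial, data = d)
    predict(fit, newdata = data.frame(cumsum = grid), type = "response")
  })
  avg <- as.vector(preds %*% pp)      # pp-weighted average coverage curve (Idea 3)
  grid[which(avg >= desired.cov)[1]]  # smallest grid threshold achieving the desired coverage
}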


Idea 4


The quick_corrcov function takes approximately 4 seconds to report the corrected coverage estimate of the credible set, using a specified threshold.

## "if you use this threshold, the resultant credible set has this corrected coverage"

quick_corrcov <- function(thr, simulated.pps, pp){
  n_pps <- length(simulated.pps)
  args <- 1:length(pp)  # index of the CV considered causal in each element of simulated.pps
  
  d5 <- lapply(1:n_pps, function(x) {
    credset3(simulated.pps[[x]], CV = rep(args[x], dim(simulated.pps[[x]])[1]), thr = thr)
  })
  
  prop_cov <- lapply(d5, pred_na) %>% unlist()
  sum(prop_cov * pp)
}
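For example, with pps (a hypothetical list of simulated pp matrices, one element per SNP considered causal) and the observed posterior probabilities pp0, as used further below:

quick_corrcov(0.9, pps, pp0)  # corrected coverage of the 90% credible set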

The quick_corrcov_cs function takes approximately 4 seconds to report a list of the corrected coverage estimate, the new credible set, the threshold used and the size.

## "if you use this threshold, the resultant credible set is this and has this corrected coverage"

quick_corrcov_cs <- function(thr, simulated.pps, pp){
  n_pps <- length(simulated.pps)
  args <- 1:length(pp)  # index of the CV considered causal in each element of simulated.pps
  
  d5 <- lapply(1:n_pps, function(x) {
    credset3(simulated.pps[[x]], CV = rep(args[x], dim(simulated.pps[[x]])[1]), thr = thr)
  })
  
  prop_cov <- lapply(d5, pred_na) %>% unlist()
  corr_cov <- sum(prop_cov * pp)
  
  o <- order(pp, decreasing = TRUE)  
  cumpp <- cumsum(pp[o])  
  wh <- which(cumpp > thr)[1]  
  list(credset = names(pp)[o[1:wh]], corr.cov = corr_cov, thr = thr, size = cumpp[wh])
}
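Similarly, with the same (assumed) pps and pp0 objects:

res <- quick_corrcov_cs(0.9, pps, pp0)
res$corr.cov  # corrected coverage estimate of the 90% credible set
res$size      # claimed coverage (sum of the pps) of that set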

Idea 4 Code


1. Vary threshold


  • BUT the aim of all of these ‘ideas’ is to provide the user with the threshold they should be using to obtain a credible set with the desired coverage. Of course, they could keep using quick_corrcov_cs above until the corrected coverage is where they want it to be, but we need to automate this.

  • Could use the following function to obtain a credible set with corrected coverage within some accuracy of the desired coverage. This uses C++ code to generate the credible sets from the simulated pps - do we really need to transfer the whole thing to C? We also need to decide when to break out of the function, i.e. when it is impossible to get within this accuracy because one CV holds all of the pp - a counter is included so that the loop gives up after 4 attempts.

new_cs <- function(desired.cov, acc = 0.05, simulated.pps, pp){
  
  corr.cov <- quick_corrcov(desired.cov, simulated.pps, pp)
  is_within_tolerance <- FALSE
  thr1 <- desired.cov
  counter <- 1
  
  while (counter < 4 && !is_within_tolerance) {
    if(dplyr::between(corr.cov, desired.cov, desired.cov+acc)){
      is_within_tolerance <- TRUE
    } else if(corr.cov > desired.cov+acc){
      # corr.cov too high: run again with a lower threshold (remove variants)
      thr1 <- thr1 - (corr.cov-desired.cov)
      corr.cov <- quick_corrcov(thr1, simulated.pps, pp)
    } else if(corr.cov < desired.cov){
      # corr.cov too low: run again with a higher threshold (add variants)
      thr1 <- thr1 + (desired.cov-corr.cov)
      corr.cov <- quick_corrcov(thr1, simulated.pps, pp)
    }
    counter <- counter + 1
  } 
  
  o <- order(pp, decreasing = TRUE)  # order index for true pp
  cumpp <- cumsum(pp[o])  # cum sums of ordered pps
  wh <- which(cumpp > thr1)[1]  # how many needed to exceed thr
  list(credset = o[1:wh], corr.cov = corr.cov, thr = thr1, counter=counter)
}

new_cs(0.8, 0.05, pps, pp0)

2. Systematically add/remove variants


  • A better (more robust) approach is the method below, which systematically removes/adds variants from the credible set until we have a new credible set with corrected coverage within some accuracy.

  • Odd scenarios I had to consider:

  1. Only 1 variant in the set (so can’t get a smaller credible set)
  2. Alternating endlessly between above desired.cov + acc and below desired.cov

desiredcov_cs <- function(desired.cov, acc = 0.05, simulated.pps, pp){
  n_pps <- length(simulated.pps)
  args <- 1:length(pp)
  thr <- desired.cov
  
  d5 <- lapply(1:n_pps, function(x) {
    credset3(simulated.pps[[x]], CV = rep(args[x], dim(simulated.pps[[x]])[1]), thr = thr)
  })
  
  prop_cov <- lapply(d5, pred_na) %>% unlist()
  corr_cov <- sum(prop_cov * pp)
  
  o <- order(pp, decreasing = TRUE)  
  cumpp <- cumsum(pp[o])  
  wh <- which(cumpp > thr)[1]  
  
  is_within_tolerance <- FALSE
  
  while (!is_within_tolerance) {
    if(dplyr::between(corr_cov, desired.cov, desired.cov+acc)){
      is_within_tolerance <- TRUE
    }
    
    if(wh==1){
      warning("Only 1 variant in the credible set, so can't make it any smaller")
      is_within_tolerance <- TRUE
    }
    
    # if corr.cov is too high
    if(wh != 1 && corr_cov > desired.cov+acc){ 
      wh <- wh-1 # remove a variant from the cs
      thr <- sum(pp[o][seq_len(wh-1)]) + pp[o][wh]/2 # new thr: midway between the cumulative pps at wh-1 and wh
      corr_cov <- quick_corrcov(thr, simulated.pps, pp)
      next
    }
    
    # if corr.cov is too low
    if(wh != 1 && corr_cov < desired.cov){ 
      wh <- wh+1 # add a variant to the cs
      thr <- sum(pp[o][seq_len(wh-1)]) + pp[o][wh]/2 # new thr: midway between the cumulative pps at wh-1 and wh
      corr_cov <- quick_corrcov(thr, simulated.pps, pp)
      
      if(corr_cov > desired.cov+acc){
        warning("This is the smallest credible set with corrected coverage above the desired coverage")
        is_within_tolerance <- TRUE
      }
      next
    }
  }
  list(credset = names(pp)[o[1:wh]], corr_cov = corr_cov, thr = thr, size = cumsum(pp[o])[wh], nvar = wh)
}

  • I’ve tried this with several datasets and it looks really promising - very quick and accurate.

  • Note that this function can be used to find the smallest credible set possible with corrected coverage greater than the desired coverage by setting acc to a very small value.
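For example (again assuming the pps and pp0 objects used above), setting acc very small effectively asks for the smallest credible set whose corrected coverage exceeds the desired coverage:

desiredcov_cs(0.9, acc = 0.001, pps, pp0)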


Can we use fewer replicates in the correction?



Playing around


This function computes the cumulative sum of a vector of pps:

Rcpp::cppFunction('List f2(NumericVector pp) {
                int n = pp.size();
                NumericVector out(n);
                
                List ret(1);
                
                out[0] = pp[0];
                for(int i = 1; i < n; ++i) {
                  out[i] = out[i - 1] + pp[i];
                }
                ret[0]=out;
                return(ret);
              }')
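A quick sanity check against base R (note that f2 returns the vector wrapped in a list):

f2(c(0.1, 0.4, 0.3, 0.2))[[1]]
cumsum(c(0.1, 0.4, 0.3, 0.2))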

Speeding up corrcov function


  • I have worked through each line of the corrected coverage functions to find the quickest way to perform each step. These are the new functions.

corrected_cov <- function(mu, V, Sigma, pp0, thresh, size, nrep = 1000) {
  nsnps <- length(pp0)

  # form joint z-score vectors
  temp <- diag(x = mu, nrow = nsnps, ncol = nsnps)
  zj <- lapply(seq_len(nrow(temp)), function(i) temp[i,]) # nsnp zj vectors for each snp considered causal

  # simulate pp systems
  pps <- mapply(zj_pp, zj, V, MoreArgs = list(nrep = nrep, Sigma = Sigma), SIMPLIFY = FALSE)

  # consider different CV as causal in each list
  n_pps <- length(pps)
  args <- 1:nsnps

  # obtain credible set for each simulation
  prop_cov <- lapply(1:n_pps, function(x) {
    tmp = credset3(pps[[x]], CV = rep(args[x], nrep), thr = thresh)
    sum(tmp[[2]])/length(tmp[[2]])
  }) %>% unlist()

  # final corrected coverage value
  sum(prop_cov * pp0)
}

corrcov <- function(z, f, N0, N1, Sigma, thr, W = 0.2) {
  ph0.tmp <- z0_pp(z, f, type = "cc", N = N0 + N1, s = N1/(N0 + N1), W = W)
  ph0 <- ph0.tmp[1]  # prob of the null
  pp0dash <- ph0.tmp[-1]  # pps of variants
  
  varbeta <- coloc:::Var.data.cc(f, N0 + N1, N1/(N0 + N1))  # variance of estimated effect size
  
  pp0 <- ppfunc(z, V = varbeta, W = W)  # posterior probs of system
  
  muhat <- est_mu(z, f, N0, N1)
  
  corrected_cov(mu = muhat, V = varbeta, Sigma = Sigma, pp0 = pp0, thresh = thr)
}

  • If we simulate the posterior probabilities outside of the function, then we can just use quick_corrcov, as in the sketch below.
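A rough sketch (assuming zj, V, the LD matrix and the observed pp0 are available in the workspace, as in corrected_cov above):

pps <- mapply(zj_pp, zj, V, MoreArgs = list(nrep = 1000, Sigma = LD), SIMPLIFY = FALSE)
sapply(c(0.8, 0.9, 0.95), quick_corrcov, simulated.pps = pps, pp = pp0)  # corrected coverage at several thresholds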

Next Steps


  • Is there a quicker way to simulate posterior probabilities from zj? This would require transferring the following function to C code. rmvnorm already calls compiled C code by default, so I’m not sure this would help much (profiling shows that this is the slow stage). It takes ~13 secs for 1000 simulations for each of 200 SNPs considered causal.

zj_pp <- function(Zj, V, nrep = 5000, W = 0.2, Sigma) {
    exp.zm <- Zj %*% Sigma  # find E(Z_m)
    zstar <- mvtnorm:::rmvnorm(nrep, exp.zm, Sigma)  # nrep rows, nsnps cols
    r <- W^2/(W^2 + V)
    bf <- 0.5 * (log(1 - r) + (r * zstar^2))  # log ABFs
    denom <- coloc:::logsum(bf)  # logsum(x) = max(x) + log(sum(exp(x - max(x)))) so sum is not inf
    pp.tmp <- exp(bf - denom)  # convert back from log scale
    pp.tmp/rowSums(pp.tmp)  # normalise so each simulated system (row) sums to 1
}

pps <- mapply(zj_pp, zj, V, MoreArgs = list(nrep = nrep, Sigma = LD), SIMPLIFY = FALSE)
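One way to check where the time goes is to time the rmvnorm call on its own, here with stand-in sizes and an identity matrix in place of the real LD matrix:

library(mvtnorm)
nsnps <- 200
Sigma <- diag(nsnps)  # stand-in for the LD matrix
Zj <- rep(0, nsnps)   # stand-in joint Z-score vector
system.time(rmvnorm(1000, as.vector(Zj %*% Sigma), Sigma))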

  • At the moment, corrcov uses the parameters z, f, N0, N1, Sigma, thr, W = 0.2. In terms of replacing the \(Z\) scores and minor allele frequencies with \(\hat\beta\) and \(V\), I don’t think this will work, since the MAFs are needed to find V (Var.data.cc) and the ABFs (approx.bf.p) within the function body.

  • Can I get going with the real T1D data?

  • In terms of finding the required threshold, the desiredcov_cs function above is working very well. I need to consider other scenarios where it may not work.


Questions


  • Any ideas where I can get my poster printed at A0 size - do I have to pay for this?

  • Tips for poster presentation (I am going to print off A4 versions to hand out if anyone is interested at the conference).

  • Where can I get the data from the T1D paper - how shall I get going with this?

  • Is my desiredcov_cs function good enough to put into my package? I’ll include a vignette on using it too.