We apply our method to the 44 regions in Table 1.

For the new credible set method, the accuracy was set to 0.0001 and the maximum number of iterations to 50.
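As a rough illustration of how these two parameters act as stopping rules, here is a hedged sketch of a bisection-style threshold search (the function and the toy coverage curve are hypothetical, not the package's actual internals):

```python
def search_threshold(coverage_fn, target=0.99, acc=1e-4, max_iter=50):
    """Bisection search for the threshold whose coverage is within
    `acc` of the target, giving up after `max_iter` iterations."""
    lo, hi = 0.0, 1.0
    thr = cov = None
    for _ in range(max_iter):
        thr = (lo + hi) / 2
        cov = coverage_fn(thr)
        if abs(cov - target) < acc:
            break
        if cov < target:
            lo = thr  # coverage too low: raise the threshold
        else:
            hi = thr  # coverage too high: lower the threshold
    return thr, cov

# toy monotone coverage curve, for illustration only
thr, cov = search_threshold(lambda t: t, target=0.99)
```

With a monotone coverage curve the loop converges long before 50 iterations; the `acc` parameter controls how close to the target coverage the search must get before stopping.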

We were only able to analyse 38/44 regions:

- (2nd) rs6691977: No position info
- (5th) rs4849135: No position info
- (10th) rs2611215: No position info
- (34th) rs1052553: No position info
- rs34536443: Not in ic_t1d_onengut_allresults.csv file?
- (19th) rs689: Eliminated after quality control. “We integrated preexisting rs689 data with Immunochip data for the 6,670 UK GRID (UK Genetic Resource Investigating Diabetes) cases and 6,304 British 1958 Birth Cohort controls and found rs689 to be the most strongly associated SNP.” It is not clear where the credible set reported in the paper comes from.

Be cautious about row 35 (SNP rs917911).

rs2476601 (second-last row) has OR = 1.89 and \(p < 10^{-100}\). Notice that the corrected coverage of the new credible set is only 0.94. This is because the top two SNPs have posterior probabilities of 0.8640309 (rs2476601) and 0.1359691 (rs6679677), which sum to 1, while the remaining SNPs have posterior probabilities on the order of \(10^{-90}\). The maximum corrected coverage attainable with a credible set containing just these two variants is ~95%, and our method cannot add any more variants to increase this coverage because of R's floating-point precision and how long the algorithm would take to reach these variants. For example, we got to

`"thr: 0.999999999883585 , cov: 0.926728061790903"`

after 50 iterations. Their credible sets differ from mine because they combine SIB and trio data. Interestingly, mine are almost always smaller. For this reason, in my paper I will compare the credible sets obtained using the standard Bayesian method with those obtained using the new required threshold method.
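The precision limit is easy to demonstrate: a double carries roughly 16 significant decimal digits, so posterior probabilities of order \(10^{-90}\) vanish entirely when added to a coverage near 0.95. A generic floating-point illustration (not package code):

```python
# doubles carry ~16 significant decimal digits, so a posterior
# probability of 1e-90 is lost entirely when added to 0.95
top_two = 0.8640309 + 0.1359691  # sums to 1 up to rounding
tiny_pp = 1e-90                  # order of the remaining SNPs' pps

print(0.95 + tiny_pp == 0.95)    # True: the addition has no effect
```

This is why no number of extra low-probability variants can push the corrected coverage past ~95% here.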

Investigating the median size of the credible set (as they do in the paper), it seems that applying my correction actually increases the size of the credible set. This suggests that there are more instances where the standard Bayesian approach has under-coverage (does not actually reach 0.99), though this might just be because there is ‘more room’ below 99% than above it. Also, when we do add variants to a set, we often add many more than we remove in the opposite case, because coverage can fall further below 99% than it can rise above it.

- This means that in most cases, the 99% credible set obtained using the standard Bayesian approach for fine-mapping does not actually achieve 99% coverage of the causal variant. For example, the 99% credible set for the first SNP, rs10277986, actually only has ~97% coverage of the CV. I have used the R package to find the smallest credible set that does exceed 99% coverage; this set contains an additional 4 variants, increasing the set size from 11 to 15 variants.
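The standard Bayesian construction behind these set sizes is simple: rank variants by posterior probability and take the smallest prefix whose cumulative sum exceeds the threshold. A minimal sketch with made-up posterior probabilities (not the real rs10277986 values):

```python
def credible_set(pps, thr=0.99):
    """Smallest set of variant indices whose summed posterior
    probability exceeds `thr`: sort descending, take shortest prefix."""
    order = sorted(range(len(pps)), key=lambda i: pps[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += pps[i]
        if total > thr:
            break
    return chosen, total

# toy example with four variants
pps = [0.6, 0.3, 0.07, 0.03]
cs, claimed = credible_set(pps, thr=0.85)  # the two top variants suffice
```

The correction then asks whether `claimed` matches the set's *true* coverage, and grows the set if not.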

- I have simulated many credible sets with varying sample size, OR and threshold values. I correct each credible set using the R package. For each threshold:
- The first plots show the true coverage of the original and the new credible sets. These show that our new credible sets have the desired true coverage but the original ones have over-coverage.
- The second plots show the relative error of the claimed and corrected coverage estimates of the new credible set. These show that the claimed coverage is inaccurate for low \(\mu\) and variable for high \(\mu\), whereas the corrected coverage is consistently more accurate and less variable.
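For reference, the true coverage in such simulations is just the proportion of simulated credible sets that contain the causal variant, and the relative error compares a coverage estimate against it. A toy sketch (invented numbers, not the actual simulation pipeline):

```python
import random

random.seed(1)

# toy stand-in: whether each simulated credible set contains the CV
contains_cv = [random.random() < 0.97 for _ in range(10_000)]
true_coverage = sum(contains_cv) / len(contains_cv)

def relative_error(estimate, truth):
    """Signed relative error of a coverage estimate."""
    return (estimate - truth) / truth

claimed, corrected = 0.99, 0.97  # illustrative values only
print(relative_error(claimed, true_coverage))
print(relative_error(corrected, true_coverage))
```

Plotting these relative errors across simulation runs gives the second set of plots described above.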

I have added

`notch=TRUE`

which provides roughly a 95% confidence interval about the median (\(\text{median} \pm 1.58 \times IQR/\sqrt{n}\)). If the notches of two groups do not overlap, their medians are significantly different. “The upper whisker extends from the hinge to the largest value no further than \(1.5 \times IQR\) from the hinge. The lower whisker extends from the hinge to the smallest value at most \(1.5 \times IQR\) of the hinge.”
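The notch half-width is easy to compute directly; a quick numerical check of \(\text{median} \pm 1.58 \times IQR/\sqrt{n}\) (plain Python; the quartiles here use the default exclusive method, which may differ slightly from R's):

```python
import statistics

def notch_interval(x):
    """Approximate 95% CI for the median: median +/- 1.58*IQR/sqrt(n)."""
    x = sorted(x)
    n = len(x)
    med = statistics.median(x)
    q1, _, q3 = statistics.quantiles(x, n=4)  # lower/upper quartiles
    half = 1.58 * (q3 - q1) / n ** 0.5
    return med - half, med + half

lo, hi = notch_interval(list(range(1, 101)))
print(round(lo, 3), round(hi, 3))  # roughly 42.521 58.479
```

Non-overlapping notch intervals between two boxplots suggest a significant difference in medians.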