This alternative corrected coverage estimate uses only those simulated credible sets where the number of variants matches that of the original credible set to be corrected.
I am running simulations to implement this method on credible sets with \(nvar < 4\), to ensure we are not conditioning on a particularly implausible event (e.g. a simulated set containing exactly 120 variants).
Our standard corrected coverage estimate uses 1000 simulations for each SNP considered causal. Since the matching estimate discards simulations whose credible set has the wrong nvar (and we don't want our estimate to be based on only a handful of samples), I increase the number of simulations for each SNP considered causal to 10,000. Even so, after matching, some considered-causal SNPs retain close to 10,000 simulations while others retain only a handful.
To get our final corrected coverage estimate, we need to deal with these unbalanced samples. To do this, we take a single pooled average: we stack the 'covered' binary indicators from every matching simulated credible set into one vector, and build a corresponding pp.vect vector of posterior probabilities. Both vectors have length sum(nsims). Pooling in this way means that considered-causal SNPs with more matching simulations are up-weighted.
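As a minimal sketch of this pooling (variable names hypothetical; the package internals may differ):

```r
# A hedged sketch of the pooled matching estimate, assuming:
# covered_list[[j]] : binary 'covered' indicators for the simulations where
#                     SNP j is considered causal AND the simulated credible
#                     set has the required nvar
# pp[j]             : posterior probability that SNP j is the causal variant
covered <- unlist(covered_list)                    # stack the indicators
pp.vect <- rep(pp, times = lengths(covered_list))  # one weight per simulation
# both vectors have length sum(nsims); pooling means considered-causal SNPs
# with more matching simulations contribute more to the final estimate
corr_cov_match <- weighted.mean(covered, w = pp.vect)
```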
The simulations are in /APRIL/trim_sims/
It seems that this matching method is not very accurate. Perhaps we could limit its use to high-power scenarios (large \(\mu\), e.g. \(\mu = 20\) for PTPN22).
No, this method is too noisy.
This method has been implemented on the T1D data.
## index_SNP claim nvar corr_cov corr_cov_match nsims_match
## 1: rs10277986 0.9960149 11 0.9754023 0.9932061 26.4226519
## 2: rs11203202 0.9993207 4 0.9955823 0.9878488 127.2089552
## 3: rs113010081 0.9901508 120 0.9941965 0.6856924 2.3640777
## 4: rs11954020 0.9901063 83 0.9921446 0.9770868 1.8967254
## 5: rs12416116 0.9915243 6 0.9908959 0.9982534 56.4000000
## 6: rs12453507 0.9904096 65 0.9915284 0.998754 3.1443038
## 7: rs12927355 0.9918667 20 0.9705063 0.9773273 11.1950207
## 8: rs13415583 0.9901788 92 0.9901757 0.9685234 2.2048193
## 9: rs1456988 0.9953685 26 0.9836469 0.9967821 13.5183246
## 10: rs151234 0.99019 10 0.9859141 0.9884305 18.2379032
## 11: rs1538171 0.9903378 87 0.9735812 0.9659006 3.6538462
## 12: rs1615504 0.9905884 26 0.9907459 0.9961016 18.1734104
## 13: rs1893217 0.9962419 11 0.9463931 0.9996313 25.0000000
## 14: rs193778 0.990299 23 0.9929865 0.9987876 13.1706485
## 15: rs2111485 0.9999631 4 0.9795897 0.9959734 18.1500000
## 16: rs229533 0.9902057 8 0.9961789 0.9998258 36.8798077
## 17: rs2476601 1 2 0.8042806 1 126.9217527
## 18: rs3024505 0.9919685 7 0.9908633 0.9993566 35.3620178
## 19: rs3087243 0.9914763 21 0.9457756 0.982225 13.7558594
## 20: rs34593439 0.9913195 2 0.9927623 0.9952842 77.6587838
## 21: rs402072 0.9909357 14 0.9913847 0.9999948 25.5695364
## 22: rs4820830 0.9904556 59 0.9853841 0.9947404 15.0179949
## 23: rs516246 0.9965928 12 0.9876407 0.9953469 21.3224044
## 24: rs56994090 0.9934261 8 0.9976826 0.9999007 39.1645570
## 25: rs6043409 0.9901574 16 0.996303 0.9991871 36.6371681
## 26: rs61839660 0.9985256 4 0.9469757 0.969185 96.7827476
## 27: rs62447205 0.9901583 40 0.9797742 0.9949124 7.7051482
## 28: rs6476839 0.9947365 18 0.9931858 0.9994083 25.5403226
## 29: rs6518350 0.9902603 41 0.9966605 0.996814 3.2250000
## 30: rs653178 0.9995797 2 0.8324161 0.8645506 91.4722222
## 31: rs705705 0.9932401 9 0.9266029 0.954514 28.8281250
## 32: rs72727394 0.9904069 19 0.9953029 0.9974931 10.4550000
## 33: rs72928038 0.9900544 36 0.9923231 0.9846845 8.4029851
## 34: rs757411 0.990144 54 0.9874525 0.9923193 7.7875000
## 35: rs75793288 0.9940996 22 0.9761336 0.9843889 8.0512821
## 36: rs8056814 0.9920644 8 0.9649084 0.9955263 16.3964844
## 37: rs917911 0.990001 133 0.9908669 0.940162 0.8049713
## 38: rs9585056 0.9907895 3 0.9989341 0.9967159 52.2662338
## index_SNP claim nvar corr_cov corr_cov_match nsims_match
If we limit its use to smaller credible sets (nvar < 4), then we have the following examples. Note that the matching method only seems to drastically alter the estimate for rs2476601.
## index_SNP claim nvar corr_cov corr_cov_match nsims_match
## 1: rs2476601 1 2 0.8042806 1 126.92175
## 2: rs34593439 0.9913195 2 0.9927623 0.9952842 77.65878
## 3: rs653178 0.9995797 2 0.8324161 0.8645506 91.47222
## 4: rs9585056 0.9907895 3 0.9989341 0.9967159 52.26623
This estimate is not as robust as the original. I have included the corrected coverage matching estimate as an optional extra function in the package (with a cautionary warning) for users who would like to condition on the number of variants in the credible set as well.
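For completeness, a hedged usage sketch. I believe the exported function is corrcov_nvar, but the argument names here are assumptions; check the package documentation:

```r
library(corrcoverage)
# Assumed interface: marginal Z-scores, MAFs, control/case sample sizes, the
# regional LD (correlation) matrix, the nvar to condition on and the credible
# set threshold; see ?corrcov_nvar for the actual signature
corrcov_nvar(z = z, f = maf, N0 = N0, N1 = N1, Sigma = LD,
             nvar = 2, thr = 0.99)
```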
The following index SNPs are not in aligned9.build37.RData (i.e. not in X@snps):

- rs75793288 (1kg_13_98879767): the only corresponding build 36 region is D_chr13_98723872_99034738, but the index SNP's position is chr13: 100081766, which does not fall in this region.
- D_chr7_50866661_51640000 (index SNP at chr7: 51028987): the index SNP does appear in this build and has the 12th largest absolute Z-score.
- D_chr16_10831557_11408130 (index SNP at chr16: 11194771).
rs34536443b: the corrected coverage for thr = 1 is 0.99999 and for thr = 0 is 0.988, so a root cannot be found in this interval to apply the correction. Perhaps this is because the maximum pp is 0.9885797.
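One possible explanation, sketched below: coverage is a step function of the threshold (the credible set composition only changes at the cumulative-pp cut points), so corr_cov(thr) - 0.99 can jump straight over zero. A toy illustration, not the package's actual root-finder:

```r
# Toy step function mimicking corrected coverage vs threshold: with max pp
# ~0.9886, the one-SNP set already has corrected coverage 0.988, and adding
# a second SNP jumps it to 0.99999, skipping the 0.99 target entirely
corr_cov_at_thr <- function(thr) ifelse(thr <= 0.9885797, 0.988, 0.99999)
f <- function(thr) corr_cov_at_thr(thr) - 0.99
uniroot(f, interval = c(0, 1))$root
# uniroot converges to the jump point (~0.9886), where corr_cov is either
# 0.988 or 0.99999, never 0.99, so there is no genuine root to find
```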
Using the https://chr1swallace.github.io/MFM-output/ tool, I can visualise the regions where the top SNP is not the index SNP. Indeed, in these regions there are lots of SNPs in tight LD, all with high pp.
I have two final data frames for the T1D results: one uses 90% confidence intervals and the other uses 95% confidence intervals.
I also have a final .csv file with these results, which could be a supplementary table in the paper.
Still need to figure out why our simulations show that the corrected coverage estimate for standard 99% credible sets is usually greater than 0.99, whilst in the T1D results the corrected coverage estimates of the standard credible sets fall both above and below 0.99.
I think this is most likely due to differences in LD (our simulations use African LD data). Maybe try simulations using the T1D LD data (see the post from 27th March, where I did this for a few regions).
Chris also suggested that it is due to differences in MAF.
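For reference, a sketch of how the LD bins in the table below might be tabulated from each region's LD matrix; the r² cut-points are assumptions, since the bin definitions are not stated here:

```r
# 'LD' is assumed to be the SNP x SNP correlation matrix for a region and
# 'index' the name of its index SNP; the r^2 cut-points are hypothetical
r2 <- LD[, index]^2
r2 <- r2[names(r2) != index]  # drop the index SNP itself
bins <- cut(r2, breaks = c(0, 0.1, 0.3, 0.7, 1), include.lowest = TRUE,
            labels = c("Lowest_LD", "Low_LD", "Mid_LD", "High_LD"))
table(bins)
```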
## Index_SNP topSNP_maf High_LD Mid_LD Low_LD Lowest_LD Total
## 1 rs2476601 0.09568158 2 2 9 164 177
## 2 rs1538171 0.44785858 103 9 79 101 292
## 3 rs62447205 0.48428046 13 6 5 35 59
## 4 rs10277986 0.04811343 29 12 10 118 169
## 5 rs6476839 0.49667853 7 0 2 7 16
## 6 rs61839660 0.10284698 16 12 0 34 62
## 7 rs12416116 0.28896797 6 2 11 21 40
## 8 rs917911 0.36008304 4 0 1 76 81
## 9 rs705705 0.34870608 13 0 0 0 13
## 10 rs653178 0.48629893 3 0 0 2 5
## 11 rs9585056 0.24857651 1 2 8 12 23
## 12 rs3024505 0.16087318 4 2 2 58 66
## 13 rs1456988 0.27752076 23 1 12 12 48
## 14 rs56994090 0.41436202 3 0 0 9 12
## 15 rs72727394 0.20011862 17 12 1 23 53
## 16 rs34593439 0.10505398 4 0 1 103 108
## 17 rs151234 0.12838319 10 1 1 7 19
## 18 rs12927355 0.31931427 26 1 12 41 80
## 19 rs193778 0.24475027 5 30 5 67 107
## 20 rs8056814 0.08007117 23 6 0 72 101
## 21 rs12453507 0.32666983 3 13 8 53 77
## 22 rs757411 0.36854906 12 17 0 5 34
## 23 rs13415583 0.39198007 3 2 30 103 138
## 24 rs1893217 0.16927639 11 25 2 59 97
## 25 rs1615504 0.47354686 26 3 2 1 32
## 26 rs402072 0.16603393 11 3 0 12 26
## 27 rs516246 0.49590699 17 4 1 8 30
## 28 rs6043409 0.15705471 1 3 17 59 80
## 29 rs11203202 0.33558719 5 7 4 12 28
## 30 rs6518350 0.18475682 9 0 6 14 29
## 31 rs4820830 0.38903394 62 13 0 18 93
## 32 rs229533 0.43300071 5 0 3 30 38
## 33 rs2111485 0.39074733 2 2 42 1 47
## 34 rs3087243 0.45438368 39 0 0 46 85
## 35 rs113010081 0.11731910 52 5 0 377 434
## 36 rs75793288 0.36251483 22 18 0 50 90
## 37 rs11954020 0.40226679 13 0 24 32 69
## 38 rs72928038 0.17580071 2 3 0 10 15
## 39 rs2476601 0.09568158 2 2 9 164 177
Comments and Next Steps
- Have updated the corrcoverage R package to include functions to implement the matching nvar method above (with a cautionary note) and to obtain a confidence interval for the coverage estimate (https://annahutch.github.io/corrcoverage/articles/Useful-Info.html) -> discuss averaging the estimates.
- Came across Lindley's paradox at APTS, where we saw an example of how the BF in favour of a simpler model can be made arbitrarily large by increasing the dispersion \(\sigma\) of a diffuse prior (large \(\sigma^2\)), irrespective of what the data tell us. This was used to emphasise the importance of specifying prior dispersion parameters when comparing models. Does our choice of \(W\) link to this? (slide 45 of the APTS notes; see the sketch below).
- Get a similar table to the one above (high, mid, low LD) for the simulations, to compare.
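A small sketch of that link, using a Wakefield-style approximate Bayes factor with a \(N(0, W)\) effect-size prior (the z and V values below are made up):

```r
# Approximate Bayes factor in favour of H0 for a SNP with Z-score z and
# effect estimate variance V, under a N(0, W) prior on the true effect
abf_h0 <- function(z, V, W) sqrt(1 + W / V) * exp(-z^2 / 2 * W / (V + W))
z <- 4; V <- 0.01  # a clearly 'significant' Z-score
sapply(c(0.2, 20, 2e3, 2e5, 2e7), function(W) abf_h0(z, V, W))
# the BF in favour of H0 grows without bound as W increases, eventually
# favouring the null even for z = 4: the Lindley behaviour described above
```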