This alternative corrected coverage estimate uses only those simulated credible sets where the number of variants matches that of the original credible set to be corrected.
I am running simulations to implement this method on credible sets with \(nvar < 4\), to ensure we are not conditioning on a particularly implausible event (e.g. a simulated set containing exactly 120 variants).
Our standard corrected coverage estimate uses 1000 simulations for each SNP considered causal. Since the matching estimate discards simulations whose credible set has the wrong nvar (and we don't want our estimate to be based on only a handful of samples), I increase the number of simulations for each SNP considered causal to 10,000. Even so, after matching, some considered-causal SNPs retain close to 10,000 simulations while others retain only a handful.
To get our final corrected coverage estimate, we need to deal with these unbalanced samples. To do this, we take a single pooled average: we stack the 'covered' binary indicators from every matching simulated credible set into one vector, and build a corresponding pp.vect vector of posterior probabilities. Both vectors have length sum(nsims). Pooling in this way means that considered-causal SNPs with more matching simulations are up-weighted.
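As a minimal sketch of this pooling (variable names hypothetical; the package internals may differ):

```r
# A hedged sketch of the pooled matching estimate, assuming:
# covered_list[[j]] : binary 'covered' indicators for the simulations where
#                     SNP j is considered causal AND the simulated credible
#                     set has the required nvar
# pp[j]             : posterior probability that SNP j is the causal variant
covered <- unlist(covered_list)                    # stack the indicators
pp.vect <- rep(pp, times = lengths(covered_list))  # one weight per simulation
# both vectors have length sum(nsims); pooling means considered-causal SNPs
# with more matching simulations contribute more to the final estimate
corr_cov_match <- weighted.mean(covered, w = pp.vect)
```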
The simulations are in /APRIL/trim_sims/
It seems that this matching method is not very accurate. Perhaps we could limit its use to high-power scenarios (large \(\mu\), e.g. \(\mu = 20\) for PTPN22).
No, this method is too noisy.
This method has been implemented on the T1D data.
## index_SNP claim nvar corr_cov corr_cov_match nsims_match
## 1: rs10277986 0.9960149 11 0.9754023 0.9932061 26.4226519
## 2: rs11203202 0.9993207 4 0.9955823 0.9878488 127.2089552
## 3: rs113010081 0.9901508 120 0.9941965 0.6856924 2.3640777
## 4: rs11954020 0.9901063 83 0.9921446 0.9770868 1.8967254
## 5: rs12416116 0.9915243 6 0.9908959 0.9982534 56.4000000
## 6: rs12453507 0.9904096 65 0.9915284 0.998754 3.1443038
## 7: rs12927355 0.9918667 20 0.9705063 0.9773273 11.1950207
## 8: rs13415583 0.9901788 92 0.9901757 0.9685234 2.2048193
## 9: rs1456988 0.9953685 26 0.9836469 0.9967821 13.5183246
## 10: rs151234 0.99019 10 0.9859141 0.9884305 18.2379032
## 11: rs1538171 0.9903378 87 0.9735812 0.9659006 3.6538462
## 12: rs1615504 0.9905884 26 0.9907459 0.9961016 18.1734104
## 13: rs1893217 0.9962419 11 0.9463931 0.9996313 25.0000000
## 14: rs193778 0.990299 23 0.9929865 0.9987876 13.1706485
## 15: rs2111485 0.9999631 4 0.9795897 0.9959734 18.1500000
## 16: rs229533 0.9902057 8 0.9961789 0.9998258 36.8798077
## 17: rs2476601 1 2 0.8042806 1 126.9217527
## 18: rs3024505 0.9919685 7 0.9908633 0.9993566 35.3620178
## 19: rs3087243 0.9914763 21 0.9457756 0.982225 13.7558594
## 20: rs34593439 0.9913195 2 0.9927623 0.9952842 77.6587838
## 21: rs402072 0.9909357 14 0.9913847 0.9999948 25.5695364
## 22: rs4820830 0.9904556 59 0.9853841 0.9947404 15.0179949
## 23: rs516246 0.9965928 12 0.9876407 0.9953469 21.3224044
## 24: rs56994090 0.9934261 8 0.9976826 0.9999007 39.1645570
## 25: rs6043409 0.9901574 16 0.996303 0.9991871 36.6371681
## 26: rs61839660 0.9985256 4 0.9469757 0.969185 96.7827476
## 27: rs62447205 0.9901583 40 0.9797742 0.9949124 7.7051482
## 28: rs6476839 0.9947365 18 0.9931858 0.9994083 25.5403226
## 29: rs6518350 0.9902603 41 0.9966605 0.996814 3.2250000
## 30: rs653178 0.9995797 2 0.8324161 0.8645506 91.4722222
## 31: rs705705 0.9932401 9 0.9266029 0.954514 28.8281250
## 32: rs72727394 0.9904069 19 0.9953029 0.9974931 10.4550000
## 33: rs72928038 0.9900544 36 0.9923231 0.9846845 8.4029851
## 34: rs757411 0.990144 54 0.9874525 0.9923193 7.7875000
## 35: rs75793288 0.9940996 22 0.9761336 0.9843889 8.0512821
## 36: rs8056814 0.9920644 8 0.9649084 0.9955263 16.3964844
## 37: rs917911 0.990001 133 0.9908669 0.940162 0.8049713
## 38: rs9585056 0.9907895 3 0.9989341 0.9967159 52.2662338
## index_SNP claim nvar corr_cov corr_cov_match nsims_match
If we limit its use to smaller credible sets (nvar < 4), then we have the following examples. Note that the matching method only seems to drastically alter the estimate for rs2476601.
## index_SNP claim nvar corr_cov corr_cov_match nsims_match
## 1: rs2476601 1 2 0.8042806 1 126.92175
## 2: rs34593439 0.9913195 2 0.9927623 0.9952842 77.65878
## 3: rs653178 0.9995797 2 0.8324161 0.8645506 91.47222
## 4: rs9585056 0.9907895 3 0.9989341 0.9967159 52.26623
This estimate is not as robust as the original. I have included the corrected coverage matching estimate as an optional extra function in the package (with a cautionary warning) for users who would like to condition on the number of variants in the credible set as well.
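For completeness, a hedged usage sketch. I believe the exported function is corrcov_nvar, but the argument names here are assumptions; check the package documentation:

```r
library(corrcoverage)
# Assumed interface: marginal Z-scores, MAFs, control/case sample sizes, the
# regional LD (correlation) matrix, the nvar to condition on and the credible
# set threshold; see ?corrcov_nvar for the actual signature
corrcov_nvar(z = z, f = maf, N0 = N0, N1 = N1, Sigma = LD,
             nvar = 2, thr = 0.99)
```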
The following index SNPs are not in aligned9.build37.RData (i.e. not in X@snps):

- rs75793288 (1kg_13_98879767): the only corresponding build 36 region is D_chr13_98723872_99034738, but the index SNP's position is chr13: 100081766, which does not fall in this region.
- D_chr7_50866661_51640000 (index SNP at chr7: 51028987): the index SNP does appear in this build and has the 12th largest absolute Z-score.
- D_chr16_10831557_11408130 (index SNP at chr16: 11194771).
rs34536443b: the corrected coverage for thr = 1 is 0.99999 and for thr = 0 is 0.988, so a root cannot be found in this interval to apply the correction. Perhaps this is because the maximum pp is 0.9885797.
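One possible explanation, sketched below: coverage is a step function of the threshold (the credible set composition only changes at the cumulative-pp cut points), so corr_cov(thr) - 0.99 can jump straight over zero. A toy illustration, not the package's actual root-finder:

```r
# Toy step function mimicking corrected coverage vs threshold: with max pp
# ~0.9886, the one-SNP set already has corrected coverage 0.988, and adding
# a second SNP jumps it to 0.99999, skipping the 0.99 target entirely
corr_cov_at_thr <- function(thr) ifelse(thr <= 0.9885797, 0.988, 0.99999)
f <- function(thr) corr_cov_at_thr(thr) - 0.99
uniroot(f, interval = c(0, 1))$root
# uniroot converges to the jump point (~0.9886), where corr_cov is either
# 0.988 or 0.99999, never 0.99, so there is no genuine root to find
```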
Using the https://chr1swallace.github.io/MFM-output/ tool, I can visualise the regions where the top SNP is not the index SNP. Indeed, in these regions there are lots of SNPs in tight LD, all with high pp.
I have two final data frames for the T1D results: one uses 90% confidence intervals and the other uses 95% confidence intervals.
I also have a final .csv file with these results, which could be a supplementary table in the paper.
Still need to figure out why our simulations show that the corrected coverage estimate for standard 99% credible sets is usually greater than 0.99, whilst in the T1D results the corrected coverage estimates of the standard credible sets fall both above and below 0.99.
I think this is most likely due to differences in LD (our simulations use African LD data). Maybe try simulations using the T1D LD data (see the post from 27th March, where I did this for a few regions).
Chris also suggested that it is due to differences in MAF.
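For reference, a sketch of how the LD bins in the table below might be tabulated from each region's LD matrix; the r² cut-points are assumptions, since the bin definitions are not stated here:

```r
# 'LD' is assumed to be the SNP x SNP correlation matrix for a region and
# 'index' the name of its index SNP; the r^2 cut-points are hypothetical
r2 <- LD[, index]^2
r2 <- r2[names(r2) != index]  # drop the index SNP itself
bins <- cut(r2, breaks = c(0, 0.1, 0.3, 0.7, 1), include.lowest = TRUE,
            labels = c("Lowest_LD", "Low_LD", "Mid_LD", "High_LD"))
table(bins)
```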
## Index_SNP topSNP_maf High_LD Mid_LD Low_LD Lowest_LD Total
## 1 rs2476601 0.09568158 2 2 9 164 177
## 2 rs1538171 0.44785858 103 9 79 101 292
## 3 rs62447205 0.48428046 13 6 5 35 59
## 4 rs10277986 0.04811343 29 12 10 118 169
## 5 rs6476839 0.49667853 7 0 2 7 16
## 6 rs61839660 0.10284698 16 12 0 34 62
## 7 rs12416116 0.28896797 6 2 11 21 40
## 8 rs917911 0.36008304 4 0 1 76 81
## 9 rs705705 0.34870608 13 0 0 0 13
## 10 rs653178 0.48629893 3 0 0 2 5
## 11 rs9585056 0.24857651 1 2 8 12 23
## 12 rs3024505 0.16087318 4 2 2 58 66
## 13 rs1456988 0.27752076 23 1 12 12 48
## 14 rs56994090 0.41436202 3 0 0 9 12
## 15 rs72727394 0.20011862 17 12 1 23 53
## 16 rs34593439 0.10505398 4 0 1 103 108
## 17 rs151234 0.12838319 10 1 1 7 19
## 18 rs12927355 0.31931427 26 1 12 41 80
## 19 rs193778 0.24475027 5 30 5 67 107
## 20 rs8056814 0.08007117 23 6 0 72 101
## 21 rs12453507 0.32666983 3 13 8 53 77
## 22 rs757411 0.36854906 12 17 0 5 34
## 23 rs13415583 0.39198007 3 2 30 103 138
## 24 rs1893217 0.16927639 11 25 2 59 97
## 25 rs1615504 0.47354686 26 3 2 1 32
## 26 rs402072 0.16603393 11 3 0 12 26
## 27 rs516246 0.49590699 17 4 1 8 30
## 28 rs6043409 0.15705471 1 3 17 59 80
## 29 rs11203202 0.33558719 5 7 4 12 28
## 30 rs6518350 0.18475682 9 0 6 14 29
## 31 rs4820830 0.38903394 62 13 0 18 93
## 32 rs229533 0.43300071 5 0 3 30 38
## 33 rs2111485 0.39074733 2 2 42 1 47
## 34 rs3087243 0.45438368 39 0 0 46 85
## 35 rs113010081 0.11731910 52 5 0 377 434
## 36 rs75793288 0.36251483 22 18 0 50 90
## 37 rs11954020 0.40226679 13 0 24 32 69
## 38 rs72928038 0.17580071 2 3 0 10 15
## 39 rs2476601 0.09568158 2 2 9 164 177
Comments and Next Steps
- Have updated the corrcoverage R package to include functions to implement the matching nvar method above (with a cautionary note) and to obtain a confidence interval for the coverage estimate (https://annahutch.github.io/corrcoverage/articles/Useful-Info.html) -> discuss averaging the estimates.
- Came across Lindley's paradox at APTS, where we saw an example of how the BF in favour of a simpler model can be made arbitrarily large by increasing the dispersion \(\sigma\) of a diffuse prior (large \(\sigma^2\)), irrespective of what the data tell us. This was used to emphasise the importance of specifying prior dispersion parameters when comparing models. Does our choice of \(W\) link to this? (slide 45 of the APTS notes; see the sketch below).
- Get a similar table to the one above (high, mid, low LD) for the simulations, to compare.
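A small sketch of that link, using a Wakefield-style approximate Bayes factor with a \(N(0, W)\) effect-size prior (the z and V values below are made up):

```r
# Approximate Bayes factor in favour of H0 for a SNP with Z-score z and
# effect estimate variance V, under a N(0, W) prior on the true effect
abf_h0 <- function(z, V, W) sqrt(1 + W / V) * exp(-z^2 / 2 * W / (V + W))
z <- 4; V <- 0.01  # a clearly 'significant' Z-score
sapply(c(0.2, 20, 2e3, 2e5, 2e7), function(W) abf_h0(z, V, W))
# the BF in favour of H0 grows without bound as W increases, eventually
# favouring the null even for z = 4: the Lindley behaviour described above
```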