This report documents my investigation into what could be causing the problem with our corrected coverage estimates.

However, we’ve now identified the problem (see https://annahutch.github.io/PhD/31july.html), so these findings are redundant.


PPs are Calibrated


We’ve seen that the posterior probabilities (PPs) are well calibrated in the UK10K simulations; let’s check whether they are also well calibrated in the low and high 1000 Genomes simulations.


Yep - they still are!
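One common way to check calibration is to bin SNPs by their PP and compare the mean PP in each bin with the proportion of SNPs in that bin that are actually causal. For well-calibrated PPs the two should agree. The Python sketch below is illustrative only. `calibration_table` is a hypothetical helper, the ten equal-width bins are an arbitrary choice, and the toy data are generated to be calibrated by construction. It is not necessarily how the checks above were carried out.

```python
import random


def calibration_table(pps, is_causal, n_bins=10):
    """Group SNPs into equal-width PP bins and, for each non-empty bin,
    return (lower edge, upper edge, mean PP, observed causal proportion, count)."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for pp, causal in zip(pps, is_causal):
        # PP == 1.0 is placed in the top bin rather than an extra one.
        b = min(int(pp * n_bins), n_bins - 1)
        sums[b] += pp
        hits[b] += int(causal)
        counts[b] += 1
    table = []
    for b in range(n_bins):
        if counts[b]:
            table.append((b / n_bins, (b + 1) / n_bins,
                          sums[b] / counts[b], hits[b] / counts[b], counts[b]))
    return table


if __name__ == "__main__":
    # Toy data (not from our simulations): each SNP is causal with
    # probability equal to its PP, so these PPs are calibrated by construction.
    rng = random.Random(1)
    pps = [rng.random() for _ in range(100_000)]
    causal = [rng.random() < pp for pp in pps]
    for lo, hi, mean_pp, prop, n in calibration_table(pps, causal):
        print(f"[{lo:.1f}, {hi:.1f})  mean PP = {mean_pp:.3f}  "
              f"prop causal = {prop:.3f}  n = {n}")
```

In a calibrated set of PPs the "mean PP" and "prop causal" columns track each other; a systematic gap in the high-PP bins would indicate over- or under-confidence.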


Correlation of UK10K Regions


What does the LD look like in the UK10K regions?

The geth() function samples a small LD block (ldd$dist < 1200000) on chromosome 22. There are 6 such blocks to choose from, containing 2121, 3927, 3788, 3516, 2960 or 5475 SNPs. Two random starting points are sampled within the chosen block, and the 100 adjacent SNPs from each starting point are selected.
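The sampling scheme described above can be sketched as follows. This is a minimal Python sketch, not the actual geth() code. The function name `sample_region`, the uniform choice of block, taking the 100 SNPs that start at each starting point, and the requirement that the two windows do not overlap are all assumptions. Only the block sizes and the 2 × 100 design come from the text. The sketch returns SNP indices within the chosen block rather than genotype data.

```python
import random

# SNP counts of the six small (ldd$dist < 1200000) chromosome 22 LD blocks
# listed in the text.
BLOCK_SIZES = [2121, 3927, 3788, 3516, 2960, 5475]


def sample_region(rng, block_sizes=BLOCK_SIZES, n_starts=2, window=100):
    """Hypothetical re-creation of the sampling step described for geth():
    pick one LD block, then take `window` adjacent SNPs from each of
    `n_starts` random starting points. Returns (block index, sorted SNP indices)."""
    block = rng.randrange(len(block_sizes))  # assumption: blocks chosen uniformly
    n_snps = block_sizes[block]
    while True:
        # Each start leaves room for `window` SNPs after it within the block.
        starts = sorted(rng.randrange(n_snps - window + 1) for _ in range(n_starts))
        # Assumption: windows must not overlap, so the region always has
        # n_starts * window distinct SNPs.
        if all(b - a >= window for a, b in zip(starts, starts[1:])):
            break
    snps = [i for s in starts for i in range(s, s + window)]
    return block, snps


if __name__ == "__main__":
    block, snps = sample_region(random.Random(22))
    print(f"block {block} ({BLOCK_SIZES[block]} SNPs): {len(snps)} SNPs selected, "
          f"windows starting at {snps[0]} and {snps[100]}")
```

Because the two windows can land far apart within the block, the resulting 200-SNP region may consist of two internally correlated chunks with weaker LD between them.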

An example of a 200-SNP region is shown below: