Simulations


I simulate GWAS results for 114,310 SNPs on chromosome 22, including 37 CVs.

I consider 3 use cases:

[I have the code ready, but need to finalise the functional cFDR function first].


Independent uniform


I first consider simulations where \(q\) is sampled from a uniform distribution, so that I can compare results to those using empirical cFDR.

Sample \(q\) from w*exp(-rexp(x, rate = 1)) + (1-w)*exp(-rexp(x, rate = 1/2)) where w <- sample(seq(0.3, 0.7, 0.1), 1).
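A minimal sketch of this sampling step in R (nsnps is my placeholder for the number of simulated SNPs):

    # total number of SNPs simulated on chromosome 22
    nsnps <- 114310

    # mixing weight, chosen at random from {0.3, 0.4, 0.5, 0.6, 0.7}
    w <- sample(seq(0.3, 0.7, 0.1), 1)

    # weighted combination of two transformed exponentials
    # (exp(-X) is U(0,1) when X ~ Exp(rate = 1))
    q <- w * exp(-rexp(nsnps, rate = 1)) + (1 - w) * exp(-rexp(nsnps, rate = 1/2))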

I need to ensure that the correlation between \(p\) and \(q\) is positive (for both the full dataset and the independent subset).
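For example (assuming p holds the simulated \(p\)-values and ind_idx indexes the independent subset of SNPs; both are placeholder names), the check could be:

    # q must be positively correlated with p in both the full data and the independent subset
    if (cor(p, q) <= 0 | cor(p[ind_idx], q[ind_idx]) <= 0) {
      stop("p and q are not positively correlated; re-sample q")
    }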


Independent mixture beta \(q\)


Sample \(q\) from 0.99*rbeta(x, shape1 = 30, shape2 = 0.5) + 0.01*rbeta(x, shape1 = 0.5, shape2 = 30).
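Reading the 0.99/0.01 weights as mixture proportions (my assumption here, rather than a literal weighted sum of two beta draws), this could be sampled as:

    nsnps <- 114310

    # for each SNP, pick the rare component with probability 0.01
    rare <- rbinom(nsnps, size = 1, prob = 0.01)

    # common component ~ Beta(30, 0.5) (mass near 1); rare component ~ Beta(0.5, 30) (mass near 0)
    q <- ifelse(rare == 1,
                rbeta(nsnps, shape1 = 0.5, shape2 = 30),
                rbeta(nsnps, shape1 = 30, shape2 = 0.5))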


Dependent mixture beta \(q\)


Sample \(q\) at null SNPs from 0.99*rbeta(x, shape1 = 30, shape2 = 0.5) + 0.01*rbeta(x, shape1 = 0.5, shape2 = 30), and \(q\) at “causals” from 0.01*rbeta(x, shape1 = 30, shape2 = 0.5) + 0.99*rbeta(x, shape1 = 0.5, shape2 = 30), where the “causals” are the SNPs defined in the 3 use cases above.
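Under the same mixture reading, and with causal_idx a placeholder for the indices of the causal SNPs from the use cases above:

    # draw q from the two-component beta mixture; p_low is the weight on the near-zero component
    sample_q <- function(n, p_low) {
      low <- rbinom(n, size = 1, prob = p_low)
      ifelse(low == 1,
             rbeta(n, shape1 = 0.5, shape2 = 30),  # concentrated near 0
             rbeta(n, shape1 = 30, shape2 = 0.5))  # concentrated near 1
    }

    q <- numeric(nsnps)
    q[-causal_idx] <- sample_q(nsnps - length(causal_idx), p_low = 0.01)  # null SNPs
    q[causal_idx]  <- sample_q(length(causal_idx), p_low = 0.99)          # causal SNPs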


Comparing to James


Note that in the independent simulations, the correlation between \(p\) and \(q\) is close to 0. In fact, it may be (just) \(>0\) in the whole dataset but (just) \(<0\) in the independent subset of SNPs, and this is what caused the different colouring between my results and James’. I amend the cFDR function to return an error if the sign of the correlation differs between the whole dataset and the independent subset.
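The added check might look roughly like this (ind_idx again a placeholder for the independent subset of SNPs):

    # stop if cor(p, q) changes sign between the full dataset and the independent subset
    if (sign(cor(p, q)) != sign(cor(p[ind_idx], q[ind_idx]))) {
      stop("sign of cor(p, q) differs between the whole dataset and the independent subset")
    }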


I compare the results from James’ KDE method (vly()), James’ empirical method (vl()) and my method for independent uniform \(q\).

Since James’ KDE method is most similar to mine, I directly compare the results from functional cFDR to James’ KDE cFDR (vly()). For high \(q\), James’ method increases the \(p\)-values more, and for low \(q\), James’ method decreases the \(p\)-values much more. I also need to investigate the gap in the points.


I step through my code and James’ KDE code to see what may be causing this discrepancy.

Firstly, both methods estimate the bivariate density.

That is, we use a folding method so that the KDE doesn’t have to “come down” at p=1 (zp=0). [Note: results are similar if I use n=200 and lims=c(0,12,0,12) in my method.]
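As an illustration of the folding idea (a sketch rather than either method’s exact code, and assuming both p and q are mapped to z-scores), zp can be reflected about 0 before estimating the bivariate KDE, and the estimate then folded back onto zp >= 0:

    library(MASS)

    # z-scores corresponding to p and q
    zp <- -qnorm(p / 2)
    zq <- -qnorm(q / 2)

    # reflect zp about 0 so the KDE is not forced down at p = 1 (zp = 0)
    kde <- kde2d(c(zp, -zp), c(zq, zq), n = 200, lims = c(0, 12, 0, 12))

    # fold back: doubling the estimate on zp >= 0 gives a density for (|zp|, zq)
    kde$z <- 2 * kde$z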


I then incorporate the adjustment directly (so that various terms cancel), whereas James incorporates the adjustment separately. My L curves are defined as the contours of the cFDR curves, whereas James’ are approximated using approx(cfx, xtest, ccut/correct[i], rule=2, method="const", f=1)$y, where correct[i] incorporates the adjustment.
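For the contour-based definition, the L curves can be traced directly from the cFDR surface, e.g. with contourLines (a sketch; cfdr_mat, a matrix of cFDR values on the KDE grid, and the threshold alpha are placeholder names):

    # extract the L curve as the contour of the cFDR surface at level alpha
    L <- contourLines(x = kde$x, y = kde$y, z = cfdr_mat, levels = alpha)

    # L[[1]]$x and L[[1]]$y trace the curve in (zp, zq) coordinates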

James’ L curves are defined on [0,1] whereas mine are defined on [lims[3], lims[4]].


Summarising annotation data



To do