I simulate GWAS results for 114,310 SNPs on chromosome 22, including 37 CVs.
I consider 3 use cases:
[I have the code ready, but need to finalise the functional cFDR function first].
I first consider simulations where \(q\) is sampled from a uniform distribution, so that I can compare results to those using empirical cFDR.
Sample \(q\) from w*exp(-rexp(x, rate = 1)) + (1-w)*exp(-rexp(x, rate = 1/2))
where w <- sample(seq(0.3, 0.7, 0.1), 1)
.
I need to ensure that the correlation between p and q is positive (for both the full dataset and the independent subset).
Sample \(q\) from 0.99*rbeta(x, shape1 = 30, shape2 = 0.5) + 0.01*rbeta(x, shape1 = 0.5, shape2 = 30)
.
Sample \(q\) at null SNPs from 0.99*rbeta(x, shape1 = 30, shape2 = 0.5) + 0.01*rbeta(x, shape1 = 0.5, shape2 = 30)
and \(q\) at “causals” from 0.01*rbeta(x, shape1 = 30, shape2 = 0.5) + 0.99*rbeta(x, shape1 = 0.5, shape2 = 30)
where “causals” are chosen as those SNPs defined in our 3 use cases above.
Note that in the independent simulations, the correlation between p and q is close to 0. Infact, it may be (just) \(>0\) in the whole dataset but (just) \(<0\) in the independent subset of SNPs. This is what was responsible for the different colouring between mine and James’ method. I amend the cFDR function to return an error if the sign of the correlation is different between the whole dataset and the independent subset.
I compare the results from James’ KDE method (vly()
), James’ empirical method (vl()
) and my method for independent uniform \(q\).
Since James’ KDE method is most similar to mine, I directly compare the results from functional cFDR to James’ KDE cFDR (vly()
). For high \(q\), James’ method increases the \(p\) values more and for low \(q\), James’ method decreases the \(p\) values (a lot) more. Also need to investigate the gap in points..
I step through mine and James’ KDE code to see what may be causing this descrepancy.
Firstly, both methods estimate the bivariate density.
vly()
: kde2d(zp[sub], zq[sub], n=200, lims=c(0,12,0,12))
cFDR()
: kde2d(c(zp, -zp), c(q, q), n = c(300, 500), lims = lims)
I.e. we use a folding method so that the KDE doesn’t have to “come down” at p=1
(zp=0
). [Note, results similar if I use n=200
and lims=c(0,12,0,12)
in my method].
I then incorporate the adjustment directly (so that various terms cancel). Whereas James incorporates the adjustment seperately. My L curves are defined as the contours of the cFDR curves, whereas James’ are approximated using approx(cfx,xtest,ccut/correct[i],rule=2,method="const",f=1)$y
where correct[i]
incorporates the adjustment.
James’ L curves are defined on [0,1] whereas mine are defined on [lims[3], lims[4]].
PCAmix: Current method, but possible non-monotonicity problem that orthogonal rotations (e.g. varimax) don’t seem to fix.
Non-negative matrix factorisation: but the following annotations have negative entries MAF_Adj_Predicted_Allele_Age; MAF_Adj_LLD_AFR; MAF_Adj_ASMC
and I can’t see how I can easily make them non-negative.
Finalise code.
Run simulations.
Run on Asthma data (once I’ve sorted out some dimensionality reduction for the annotation data).
Make R package.