5th March

Two options for cFDR

My method of using unbounded KDEs gives V values that never reach 1, so we investigate using empirical CDFs (ecdfs) instead. Integrating over ecdfs rather than KDEs extends the range of V values: small V get smaller and V near 1 get nearer to 1 (regardless of the annotation).
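To illustrate why the two estimators behave differently at the extremes, here is a minimal Python sketch of an ecdf (not the R code used for the analysis; `make_ecdf` and the toy values are hypothetical). An ecdf is exactly 0 below the smallest observation and exactly 1 at the largest. A KDE built from an unbounded kernel such as a Gaussian places some mass beyond the data, so its CDF stays strictly between 0 and 1 at any finite point.

```python
from bisect import bisect_right


def make_ecdf(sample):
    """Return F(x) = fraction of sample values <= x (a right-continuous step function)."""
    ordered = sorted(sample)
    n = len(ordered)

    def ecdf(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return ecdf


# Hypothetical toy values, not data from the analysis.
toy_q = [-1.2, -0.4, 0.0, 0.3, 0.3, 2.1]
F = make_ecdf(toy_q)

below_min = F(-5.0)  # no observations at or below -5
at_tie = F(0.3)      # ties are counted together
at_max = F(2.1)      # every observation is <= the maximum
```

This only shows the boundary behaviour of the ecdf itself; how that carries through to V depends on the integration over the L curves, which is not reproduced here.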

Our two options are:

Chris’ method: Convert James’ vl() function to take a continuous q (this new function is called vl2()) and integrate the ecdf over these L curves. This method takes 32 mins for all 121,000 SNPs.
Combo method: Integrate the ecdf over my L curves (convert q to the \([0,1]\) range, use James’ vl() function to find the L curves, then convert the \(y\) co-ordinates of these back to the \((-\infty, \infty)\) range). This method takes 29 mins for all 121,000 SNPs.

Chris’ method gives spikier L curves:

Looking at the results for all 121,000 SNPs, we see that Chris’ method adds noise. However, I am happiest with this method as it re-weights things well according to the annotation, whereas the combo method gives some very small V values for SNPs with inactive annotations.

Iterating results

We need to iterate over many dimensions, as only ~50% of the total variation in the data set is captured by the first dimension. However, things seem to be going wrong when iterating…

Conditional Q-Q curves

I investigate the relationship between P and Q in each analysis, focusing on the results using Chris’ method.

Conditional Q-Q curves show enrichment of SNPs associated with T1D as a function of association with functional annotations, as summarised by various dimensions from the MCA (see Andreassen et al.).

Note, here I am plotting against x=-log10((1:n)/(n+1)), which are the expected quantiles of -log10(U) for U uniform on (0,1), i.e. the P values expected under the null.
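The x-axis construction above can be sketched as follows. This is a Python illustration rather than the R code used for the plots; `qq_points` and the toy P values are hypothetical. The i-th smallest observed P value is paired with i/(n+1), the expected value of the i-th order statistic of n uniform draws, and both are put on the -log10 scale.

```python
import math


def qq_points(p_values):
    """Pair expected and observed -log10(P) for a Q-Q plot.

    Expected values use i/(n+1) for i = 1..n, matching
    x = -log10((1:n)/(n+1)) in the text.
    """
    n = len(p_values)
    expected = [-math.log10(i / (n + 1)) for i in range(1, n + 1)]
    # Sort ascending so the smallest P lines up with i = 1 (largest expected value).
    observed = [-math.log10(p) for p in sorted(p_values)]
    return list(zip(expected, observed))


# Hypothetical toy P values, not results from the analysis.
toy_p = [0.5, 0.01, 0.2, 0.9]
points = qq_points(toy_p)
```

Points lying above the line y = x indicate more small P values than expected under the null. For a conditional Q-Q curve, the same function would be applied to the P values within each stratum of Q.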

These results mostly make sense:

Dimension 1: Higher values of Q predict lower P (yep high Q is promoter/enhancer and low Q is heterochromatin).
Dimension 2: Lower values of Q predict lower P (yep low is transcribed) until top quantile which predicts lower P too (yep high is promoter/enhancer).
Dimension 3: Higher values of Q predict lower P (hmm low Q is promoter and high Q is reg permissive?).

Alternative stratified Q-Q plots

Rather than looking at non-overlapping strata of Q, I construct Q-Q plots for overlapping strata of Q (which I’ve seen more often in the literature). The cutoff values are quantiles of the q distribution.
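The difference between the two stratifications can be sketched as below. This is a hedged Python illustration with hypothetical names and toy data. It assumes the overlapping strata are nested sets of the form q >= cutoff (the direction of the inequality is my assumption), and it uses a crude order-statistic quantile rather than any particular R quantile type.

```python
def quantile_cutoffs(values, probs):
    """Crude empirical quantiles: the order statistic at index floor(p * n)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, int(p * n))] for p in probs]


def overlapping_strata(q, p, cutoffs):
    """Nested strata: each keeps the P values of all SNPs with q >= cutoff (assumed direction)."""
    return {c: [pi for qi, pi in zip(q, p) if qi >= c] for c in cutoffs}


def disjoint_strata(q, p, cutoffs):
    """Non-overlapping strata: each keeps the P values with lo <= q < hi."""
    edges = sorted(cutoffs) + [float("inf")]
    return {
        lo: [pi for qi, pi in zip(q, p) if lo <= qi < hi]
        for lo, hi in zip(edges, edges[1:])
    }


# Hypothetical toy data, not values from the analysis.
toy_q = [float(i) for i in range(1, 11)]
toy_p = [i / 11 for i in range(1, 11)]

cutoffs = quantile_cutoffs(toy_q, [0.0, 0.5, 0.8])
nested = overlapping_strata(toy_q, toy_p, cutoffs)
disjoint = disjoint_strata(toy_q, toy_p, cutoffs)
nested_sizes = [len(nested[c]) for c in cutoffs]
disjoint_sizes = [len(disjoint[c]) for c in cutoffs]
```

With nested strata every SNP in a stricter stratum also appears in the looser ones, so the curves share points. With disjoint strata each SNP contributes to exactly one curve.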

Distribution of Q

Chris said: “ok - so can you show me the distribution of q? I wonder if there is a bunch of q near its minimum value, all with high-ish p? This might cause the simple ecdf to fail, while the combo method smooths over them (by simply not fitting the points of the curve in that range)”.

From the conditional Q-Q plots above we see that the P values are smallest in the smallest quantile of Q2, which lends support to what Chris says above.

However, the combo method also fails?

Final comments

Paper that uses functional annotations and cFDR in a different way: https://academic.oup.com/hmg/article/26/22/4530/4097760#113429336
cFDR write up: https://www.overleaf.com/8936597366gszgkkddjtpr
Useful plot code that I could amend: https://github.com/KehaoWu/GWAScFDR/blob/master/R/Plot.R