`mclust::densityMclust` takes on average 10 seconds for 50,000 data points. `mixtools::normalmixEM` takes on average 24 seconds for 50,000 data points and requires specification of how many Gaussians.
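
For reference, a minimal timing sketch of this kind of comparison on toy data (not the original benchmark; the choice of k = 2 components for `normalmixEM` is an assumption):

```r
library(mclust)
library(mixtools)

set.seed(1)
x <- c(rnorm(25000, 0, 1), rnorm(25000, 2, 0.5))   # toy data: 50,000 points

# mclust chooses the number of components itself (BIC over G = 1:9 by default)
system.time(fit_mclust <- densityMclust(x))

# mixtools requires the number of Gaussians up front (k = 2 assumed here)
system.time(fit_mixtools <- normalmixEM(x, k = 2))
```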

At the moment I am using:

\[\begin{equation} P(P\leq p|Q\leq q) = \dfrac{P(P\leq p, Q\leq q)}{P(Q\leq q)} \end{equation}\]

where \(P(Q\leq q)\) cancels with part of the adjustment factor in the numerator.

\(P(P\leq p, Q\leq q)\) is then estimated using the `joint_func_zq` function.
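
For orientation only, a hypothetical empirical stand-in that makes the relationship explicit (the actual `joint_func_zq` is not reproduced here):

```r
# Hypothetical empirical stand-in; the real joint_func_zq (not shown) estimates
# P(P <= p, Q <= q) from the fitted density rather than the raw p/q vectors.
joint_tail <- function(p0, q0, p, q) mean(p <= p0 & q <= q0)   # P(P <= p0, Q <= q0)

cond_tail <- function(p0, q0, p, q) {
  joint_tail(p0, q0, p, q) / mean(q <= q0)                     # P(P <= p0 | Q <= q0)
}
```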

**For each (p,q) pair…**

`joint_func_zq` is called nt=5000 times (the length of xtest) to generate the cfdr curve used to estimate `ccut`. cfdr curves are then fit for each entry in yval2, meaning that for each (p,q) pair, `joint_func_zq` is called nv * nt = 1000 * 5000 times (length of yval2 * length of xtest).

That means for each (p,q) pair, `joint_func_zq` is called 5,000,000 times!! A single call takes 4.6 milliseconds, but in my sapply loop it takes 14 seconds to fit to the nt=5000 x coordinates.
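
To make the scaling concrete, the implied arithmetic (an illustration using the figures quoted above):

```r
# Rough arithmetic from the numbers above
nt <- 5000            # length of xtest
nv <- 1000            # length of yval2
nv * nt               # calls to joint_func_zq per (p,q) pair: 5,000,000
nv * 14 / 3600        # ~3.9 hours per pair at ~14 s per nt-length sapply fit
```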

Alternatively, Chris has suggested estimating this conditional probability directly using a univariate mixture of normals. I.e. we fit it to all (SIGNED) zp values and then do something else… We use signed z scores because then hopefully all the means are approximately 0. → may come back to this later.
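
A minimal sketch of what that might look like (hypothetical: `beta` stands in for whatever supplies the sign of each effect, and is not defined in this document):

```r
# Sketch of the signed-z idea; `beta` (hypothetical) provides the effect signs
zp_signed <- sign(beta) * qnorm(p / 2, lower.tail = FALSE)   # signed z-scores
fit_signed <- mclust::densityMclust(zp_signed)               # univariate mixture of normals
summary(fit_signed)
```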

I summarise the methods I have explored so far.

The idea now is to use the same density estimate to (i) generate the L curves (ii) estimate \(Q|H_0^p\) to integrate the L curves over.
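
Concretely, one KDE-based version of this might look something like the following sketch (assuming `p` and `q` are the p-value vectors; the p > 0.5 cut used to select apparently null SNPs is arbitrary, and this is not James' code):

```r
library(MASS)

zp <- -qnorm(p / 2)                  # |z| for the principal trait
zq <- -qnorm(q / 2)                  # |z| for the conditional trait

kde_joint <- kde2d(zp, zq, n = 200)  # joint density, used to trace the L curves
kde_null  <- density(zq[p > 0.5])    # crude estimate of Q | H0^p (p > 0.5 cut is arbitrary)
```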

I investigate using KDEs for the L curves and for estimating the null to integrate over. Using James' standard code in the cfdr package, I observe that the L curves are pretty different when using `vlyl()` compared with `vl()`. [Note that `vlyl()` includes the adjustment, so `adj=TRUE` needs to be specified in the `vl()` function.]
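
The shape of that comparison, as a sketch (`candidate_idx` is a hypothetical vector of indices for the (p,q) pairs of interest; `vlyl()` is from James' code as noted above, so it is only referenced in a comment):

```r
library(cfdr)

# Adjusted L curves via vl(); nt and nv left at their defaults (5000 and 1000)
L_adj <- vl(p, q, indices = candidate_idx, adj = TRUE)

# The corresponding vlyl() curves (from James' code) would be generated for the
# same indices and overlaid on a plot of L_adj for a visual comparison.
```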