### 1. Tying up loose ends

#### Comparing mixtools and mclust for density estimation using GMM

• mclust::densityMclust takes on average 10 seconds for 50,000 data points.

• mixtools::normalmixEM takes on average 24 seconds for 50,000 data points and requires the number of Gaussians to be specified up front.
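A minimal sketch of the two fits being compared (the simulated data and the component count k = 2 are my assumptions; the timings above were measured on 50,000 real data points):

```r
# Hypothetical benchmark of the two GMM density fits on simulated data
# (assumes the mclust and mixtools packages are installed).
set.seed(1)
x <- c(rnorm(25000, 0, 1), rnorm(25000, 3, 2))  # 50,000 points, two components

# mclust chooses the number of components itself (BIC over a default grid)
system.time(fit_mclust <- mclust::densityMclust(x))

# mixtools needs the number of Gaussians (k) specified up front
system.time(fit_mixtools <- mixtools::normalmixEM(x, k = 2))
```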

#### Estimating $$P(P\leq p|Q\leq q)$$ (denominator in cfdr)

At the moment I am using:

$$P(P\leq p|Q\leq q) = \dfrac{P(P\leq p, Q\leq q)}{P(Q\leq q)}$$

where $$P(Q\leq q)$$ cancels with part of the adjustment factor on the numerator.

$$P(P\leq p, Q\leq q)$$ is then estimated using the joint_func_zq function.
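As a minimal empirical analogue of this decomposition (the simulated independent p- and q-values are my assumption; this does not reproduce joint_func_zq itself):

```r
# Empirical estimate of P(P <= p | Q <= q) as the ratio of empirical
# probabilities P(P <= p, Q <= q) / P(Q <= q). The simulated values are
# independent here, so the conditional should be close to the marginal P(P <= p).
set.seed(42)
n <- 1e5
p_vals <- runif(n)
q_vals <- runif(n)

cond_prob <- function(p, q) {
  joint <- mean(p_vals <= p & q_vals <= q)  # P(P <= p, Q <= q)
  marg  <- mean(q_vals <= q)                # P(Q <= q)
  joint / marg
}

cond_prob(0.05, 0.5)  # close to 0.05 under independence
```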

For each (p,q) pair…

• joint_func_zq is called nt=5000 times (length of xtest) to generate the cfdr curve to estimate ccut.

• cfdr curves are then fitted for each entry in yval2, so for each (p,q) pair joint_func_zq is called a further nv=1000 * nt=5000 times (length of yval2 * length of xtest).

That means for each (p,q) pair, joint_func_zq is called over 5,000,000 times (nt + nv × nt = 5,005,000)! A single call takes 4.6 milliseconds, but in my sapply loop it takes 14 seconds to fit to nt=5000 x coordinates.
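The call counts above can be tallied directly (nt and nv are taken from the bullets; the per-call time is the 4.6 ms quoted above):

```r
nt <- 5000  # length(xtest): x coordinates per cfdr curve
nv <- 1000  # length(yval2): curves fitted per (p, q) pair

calls_ccut   <- nt       # one curve to estimate ccut
calls_curves <- nv * nt  # one curve per yval2 entry
calls_total  <- calls_ccut + calls_curves

calls_total                   # 5,005,000 calls per (p, q) pair
calls_total * 0.0046 / 3600   # at 4.6 ms per call: roughly 6.4 hours per pair
```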

Alternatively, Chris has suggested estimating this conditional probability directly using a univariate mixture of normals, i.e. fitting it to all (SIGNED) zp values and then doing something else… We use signed z-scores because then hopefully all the component means are approximately 0. (May come back to this later.)

### 2. Taking a step back

I summarise the methods I have explored so far.

The idea now is to use the same density estimate both to (i) generate the L curves and (ii) estimate $$Q|H_0^p$$ to integrate the L curves over.

#### vl vs vlyl

I investigate using KDEs both for the L curves and for estimating the null to integrate over. Using James' standard code in the cfdr package, I observe that the L curves are noticeably different when using vlyl() compared with vl(). [Note that vlyl() includes the adjustment internally, so you need to specify adj=TRUE in the vl() function for a fair comparison.]
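A hedged sketch of the comparison being run (the data simulation is mine, and the argument list for vlyl() is assumed to mirror vl(); only the adj=TRUE requirement for vl() comes from the note above):

```r
# Comparing L curves from vl() and vlyl() in James' cfdr package
# (check the package documentation for vlyl()'s exact arguments).
library(cfdr)
set.seed(3)
n  <- 10000
zp <- rnorm(n); zq <- rnorm(n)
p  <- 2 * pnorm(-abs(zp))
q  <- 2 * pnorm(-abs(zq))
idx <- 1:100  # indices at which to draw L curves

L_adj  <- vl(p, q, indices = idx, adj = TRUE)  # vl() needs adj = TRUE here ...
L_vlyl <- vlyl(p, q, indices = idx)            # ... as vlyl() adjusts internally
```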