### Aim: Write cFDR parametrically

$$\begin{split} \hat{cFDR} &= P(H_0^p\text{ | }P\leq p, Q\leq q)\\[2ex] &= \dfrac{P(P\leq p\text{ | }H_0^p, Q\leq q) P(H_0^p\text{ | }Q\leq q)}{P(P\leq p \text{ | }Q\leq q)}\\[2ex] &= \dfrac{p \times P(H_0^p\text{ | }Q\leq q)}{P(P\leq p \text{ | }Q\leq q)}\\[2ex] &= \dfrac{p \times \frac{P(H_0^p) P(Q\leq q\text{ | }H_0^p)}{P(Q\leq q)}}{\frac{P(P\leq p, Q\leq q)}{P(Q\leq q)}}\\[2ex] &= \dfrac{p \times P(H_0^p) P(Q\leq q\text{ | }H_0^p)}{P(P\leq p, Q\leq q)} \end{split}$$

- We can approximate $$P(H_0^p)$$ by 1 to obtain a conservative cFDR estimator.
- $$P(Q\leq q\text{ | }H_0^p)$$ can be approximated by $$P(Q\leq q\text{ | }P>1/2)$$, which can be modelled using a univariate Gaussian mixture.
- $$P(P\leq p, Q\leq q)$$ can be modelled using a bivariate mixture of normals.
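
Putting these pieces together, the plug-in estimator is $$\hat{cFDR} \approx \dfrac{p \times P(Q\leq q\text{ | }H_0^p)}{P(P\leq p, Q\leq q)}$$. The sketch below (base R, with small made-up mixture parameters purely for illustration, not the fitted values used later) shows how the three approximations combine; the joint model uses diagonal component covariances, so each component factorises into a Z part and a Q part.

```r
# Hypothetical null mixture for Q | H0^p (2 components, illustrative only)
w_null  <- c(0.6, 0.4)
mu_null <- c(-0.5, 0.8)
sd_null <- c(0.2, 0.6)

# Hypothetical bivariate mixture for (Z, Q) with diagonal covariances,
# so each component factorises into a Z part and a Q part
w_j  <- c(0.7, 0.3)
mu_z <- c(0.8, 2.5); sd_z <- c(0.5, 1.2)
mu_q <- c(-0.3, 0.4); sd_q <- c(0.3, 0.5)

cfdr_hat <- function(p, q) {
  # P(Q <= q | H0^p): univariate mixture CDF
  pq_null <- sum(w_null * pnorm(q, mu_null, sd_null))
  # P(P <= p, Q <= q): P <= p corresponds to Z >= -qnorm(p/2)
  # on the folded Z-score scale
  z <- -qnorm(p / 2)
  p_joint <- sum(w_j * (1 - pnorm(z, mu_z, sd_z)) * pnorm(q, mu_q, sd_q))
  # P(H0^p) approximated by 1 (conservative)
  p * pq_null / p_joint
}

cfdr_hat(0.001, 0.1)
```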

### 1. Modelling $$f(Q\text{ | }H_0^p)$$

I model $$f(Q\text{ | }H_0^p) \approx f(Q\text{ | }P>1/2)$$ and use this to approximate $$P(Q\leq q\text{ | }H_0^p)$$ for Q1, Q2 and Q3.

#### Q1

I use the mclust R package to model $$f(Q1\text{ | }H_0^p) \approx f(Q1\text{ | }P>1/2)$$ using a Gaussian mixture model. This is modelled using 6 Gaussians, with their weights, means and variances shown in the table below.
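
For reference, a fit like this can be produced along the following lines. This is only a sketch: the data here are a simulated stand-in (the real fit is to the null-region values $$q1[p > 1/2]$$), and the `modelNames = "V"` and `plot = FALSE` options are illustrative.

```r
library(mclust)  # provides densityMclust()

# Simulated stand-in for the null-region values q1[p > 1/2]
set.seed(1)
q1_null_sample <- c(rnorm(2000, -0.6, 0.1), rnorm(1000, 1.0, 0.6))

# Univariate, unequal-variance ("V") Gaussian mixture density estimate;
# the number of components is chosen by BIC
fit <- densityMclust(q1_null_sample, modelNames = "V", plot = FALSE)
fit$G  # number of components selected
```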

## -------------------------------------------------------
## Density estimation via Gaussian finite mixture modeling
## -------------------------------------------------------
##
## Mclust V (univariate, unequal variance) model with 6 components:
##
##  log-likelihood     n df       BIC       ICL
##       -25788.06 49359 17 -51759.84 -79139.49
##       Weight       Mean     Variance
## 1 0.11624221 -0.7104803 0.0004452516
## 2 0.25382864 -0.6299951 0.0034543467
## 3 0.20929539 -0.4406951 0.0185984792
## 4 0.19171657  0.1426794 0.1336937072
## 5 0.08149797  0.8100512 0.0061517932
## 6 0.14741922  1.2848616 0.4350631819

This density estimation can then be used to approximate $$P(Q1\leq q\text{ | }H_0^p)$$.

q1_null <- function(x) {
  sum <- 0
  for(i in 1:length(q1_mclust$parameters$mean)){
    sum <- sum + q1_mclust$parameters$pro[i] * pnorm(x, mean = q1_mclust$parameters$mean[i], sd = sqrt(q1_mclust$parameters$variance$sigmasq[i]))
  }
  print(sum)
}

For example, $$P(Q1\leq 0.1\text{ | }H_0^p) \approx 0.672$$.

q1_null(0.1)
## [1] 0.6716489

#### Q2

$$f(Q2\text{ | }H_0^p) \approx f(Q2\text{ | }P>1/2)$$ is modelled by 8 Gaussians, with their weights, means and variances shown in the table below.

## -------------------------------------------------------
## Density estimation via Gaussian finite mixture modeling
## -------------------------------------------------------
##
## Mclust V (univariate, unequal variance) model with 8 components:
##
##  log-likelihood     n df       BIC       ICL
##       -22538.47 49359 23 -45325.49    -82751
##       Weight       Mean     Variance
## 1 0.03014613 -1.7862069 6.277092e-05
## 2 0.01623277 -1.6705778 4.426572e-04
## 3 0.04362900 -1.4327442 2.665499e-02
## 4 0.33556434  0.1584163 1.404298e-02
## 5 0.04483450  0.2047153 2.681079e-04
## 6 0.17355507  0.2645235 1.848158e-03
## 7 0.34488972  0.0687391 5.302849e-01
## 8 0.01114846  2.0400943 1.431901e-02

This density estimation can then be used to approximate $$P(Q2\leq q\text{ | }H_0^p)$$.

q2_null <- function(x) {
  sum <- 0
  for(i in 1:length(q2_mclust$parameters$mean)){
    sum <- sum + q2_mclust$parameters$pro[i] * pnorm(x, mean = q2_mclust$parameters$mean[i], sd = sqrt(q2_mclust$parameters$variance$sigmasq[i]))
  }
  print(sum)
}

For example, $$P(Q2\leq 0.1\text{ | }H_0^p) \approx 0.374$$.

q2_null(0.1)
## [1] 0.3727371

#### Q3

$$f(Q3\text{ | }H_0^p) \approx f(Q3\text{ | }P>1/2)$$ is modelled by 5 Gaussians, with their weights, means and variances shown in the table below.

## -------------------------------------------------------
## Density estimation via Gaussian finite mixture modeling
## -------------------------------------------------------
##
## Mclust V (univariate, unequal variance) model with 5 components:
##
##  log-likelihood     n df       BIC       ICL
##       -18809.26 49359 14 -37769.81 -82026.24
##       Weight        Mean    Variance
## 1 0.01596818 -1.95645661 0.240837277
## 2 0.32882455 -0.22453928 0.016075237
## 3 0.16677963 -0.04297176 0.005651741
## 4 0.14404266  0.06747826 0.014848945
## 5 0.34438497  0.26579276 0.285519094

This density estimation can then be used to approximate $$P(Q3\leq q\text{ | }H_0^p)$$.

q3_null <- function(x) {
  sum <- 0
  for(i in 1:length(q3_mclust$parameters$mean)){
    sum <- sum + q3_mclust$parameters$pro[i] * pnorm(x, mean = q3_mclust$parameters$mean[i], sd = sqrt(q3_mclust$parameters$variance$sigmasq[i]))
  }
  print(sum)
}

For example, $$P(Q3\leq 0.1\text{ | }H_0^p) \approx 0.722$$.

q3_null(0.1)
## [1] 0.7224955

I've checked these against the empirical values (by counting), and the probabilities look reasonable.
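
One way to sketch that sanity check in a self-contained fashion: sample from the fitted Q3 null mixture (parameters copied from the table above) and compare the empirical proportion below $$q = 0.1$$ with the mixture CDF. In the real analysis the counting would of course be done on the data themselves (the values with $$P > 1/2$$); the simulation here is only a stand-in.

```r
set.seed(1)
# Q3 null-fit parameters from the table above
w  <- c(0.01596818, 0.32882455, 0.16677963, 0.14404266, 0.34438497)
mu <- c(-1.95645661, -0.22453928, -0.04297176, 0.06747826, 0.26579276)
s2 <- c(0.240837277, 0.016075237, 0.005651741, 0.014848945, 0.285519094)

# Simulate from the fitted mixture in place of the real null-region values
n <- 1e5
comp  <- sample(length(w), n, replace = TRUE, prob = w)
q_sim <- rnorm(n, mean = mu[comp], sd = sqrt(s2[comp]))

mean(q_sim <= 0.1)                 # empirical proportion, ~0.72
sum(w * pnorm(0.1, mu, sqrt(s2)))  # parametric value, 0.7224955
```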

### 2. Modelling $$f(P, Q)$$

I use a bivariate mixture of normals to model $$f(Z_P, Q)$$ and then use this to approximate $$P(P\leq p, Q\leq q)$$. Note that I first convert the P-values to Z-scores: P-values are uniformly distributed under the null, so it doesn't make sense to model them directly with a GMM, whereas the corresponding Z-scores are well suited to a mixture-of-normals fit.
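
The transformation $$z = -\Phi^{-1}(p/2)$$ maps small two-sided P-values to large positive (folded) Z-scores; for example:

```r
p_vals <- c(1, 0.05, 5e-8)  # example two-sided P-values
-qnorm(p_vals / 2)          # approximately 0, 1.96, 5.45
```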

df_pq <- data.frame("z" = -qnorm(p/2), "q1" = q1)

# joint <- densityMclust(df_pq)

summary(joint, parameters = TRUE)
## -------------------------------------------------------
## Density estimation via Gaussian finite mixture modeling
## -------------------------------------------------------
##
## Mclust VVI (diagonal, varying volume and shape) model with 9 components:
##
##  log-likelihood      n df       BIC       ICL
##       -223412.4 121879 44 -447340.1 -526574.5
##
## Mixing probabilities:
##          1          2          3          4          5          6
## 0.05023858 0.09498104 0.05950001 0.12375738 0.18296299 0.12167108
##          7          8          9
## 0.04682377 0.14927248 0.17079268
##
## Means:
##         [,1]       [,2]     [,3]      [,4]      [,5]       [,6]      [,7]
## z  4.3189527  0.2274909 1.213810 0.3502670 1.4406614  1.9182004 0.9496216
## q1 0.5008692 -0.6315374 1.816454 0.5592739 0.3556564 -0.5668422 0.8231415
##          [,8]       [,9]
## z   0.6797708  0.9382327
## q1 -0.4162857 -0.6732045
##
## Variances:
## [,,1]
##           z        q1
## z  6.638785 0.0000000
## q1 0.000000 0.6277607
## [,,2]
##             z          q1
## z  0.02130109 0.000000000
## q1 0.00000000 0.005886361
## [,,3]
##            z        q1
## z  0.5789952 0.0000000
## q1 0.0000000 0.2186955
## [,,4]
##             z        q1
## z  0.04845419 0.0000000
## q1 0.00000000 0.3760025
## [,,5]
##            z        q1
## z  0.4883255 0.0000000
## q1 0.0000000 0.2399573
## [,,6]
##            z        q1
## z  0.7846349 0.0000000
## q1 0.0000000 0.0119245
## [,,7]
##            z          q1
## z  0.3203132 0.000000000
## q1 0.0000000 0.002147366
## [,,8]
##            z        q1
## z  0.1450473 0.0000000
## q1 0.0000000 0.0263504
## [,,9]
##            z          q1
## z  0.2351207 0.000000000
## q1 0.0000000 0.002233113
# plot(joint, what = "density", data = df_pq,
#      drawlabels = FALSE, points.pch = 20)
plot(joint, what = "density", type = "hdr")
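
Because the VVI model constrains each component's covariance to be diagonal, $$Z$$ and $$Q1$$ are independent within each component, so $$P(P\leq p, Q1\leq q)$$ factorises component-wise. A sketch of how it could be evaluated, assuming the fitted `joint` object above (the commented lines showing how the arguments would be read off follow mclust's parameter layout, which is an assumption here):

```r
# P(P <= p, Q1 <= q) from a bivariate GMM with diagonal component covariances.
# pro: weights; mu_z, mu_q: component means; v_z, v_q: component variances.
# P <= p corresponds to Z >= -qnorm(p/2) on the folded Z-score scale.
pq_joint <- function(p, q, pro, mu_z, mu_q, v_z, v_q) {
  z <- -qnorm(p / 2)
  sum(pro * (1 - pnorm(z, mu_z, sqrt(v_z))) * pnorm(q, mu_q, sqrt(v_q)))
}

# With the fitted 'joint' object, the arguments would be read off as e.g.
# pro  = joint$parameters$pro
# mu_z = joint$parameters$mean["z", ];  mu_q = joint$parameters$mean["q1", ]
# v_z  = joint$parameters$variance$sigma["z", "z", ]   (similarly for q1)
```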