Aim: Write cFDR parametrically


\[\begin{equation} \begin{split} \hat{cFDR} &= P(H_0^p\text{ | }P\leq p, Q\leq q)\\[2ex] &= \dfrac{P(P\leq p\text{ | }H_0^p, Q\leq q) P(H_0^p\text{ | }Q\leq q)}{P(P\leq p \text{ | }Q\leq q)}\\[2ex] &= \dfrac{p \times P(H_0^p\text{ | }Q\leq q)}{P(P\leq p \text{ | }Q\leq q)}\\[2ex] &= \dfrac{p \times \frac{P(H_0^p) P(Q\leq q\text{ | }H_0^p)}{P(Q\leq q)}}{\frac{P(P\leq p, Q\leq q)}{P(Q\leq q)}}\\[2ex] &= \dfrac{p \times P(H_0^p) P(Q\leq q\text{ | }H_0^p)}{P(P\leq p, Q\leq q)} \end{split} \end{equation}\]

The second line is Bayes' theorem; the third uses the assumption that, under \(H_0^p\), \(P\) is uniform and independent of \(Q\), so that \(P(P\leq p\text{ | }H_0^p, Q\leq q) = p\). The quantities left to model are therefore \(P(Q\leq q\text{ | }H_0^p)\) (part 1) and \(P(P\leq p, Q\leq q)\) (part 2).


1. Modelling \(f(Q\text{ | }H_0^p)\)


I approximate \(f(Q\text{ | }H_0^p)\) by \(f(Q\text{ | }P>1/2)\), since SNPs with \(P>1/2\) should be predominantly null for the principal trait, and use this to approximate \(P(Q\leq q\text{ | }H_0^p)\) for Q1, Q2 and Q3.


Q1

I use the mclust R package to model \(f(Q1\text{ | }H_0^p) \approx f(Q1\text{ | }P>1/2)\) with a Gaussian mixture model. The selected model has 6 Gaussian components, with their weights, means and variances shown in the table below.
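The fitting call itself is not shown above; a minimal sketch of how such a fit could be obtained, assuming p and q1 are the vectors of principal-trait P values and Q1 values used later, is:

library(mclust)

# fit a univariate GMM (number of components chosen by BIC) to the Q1 values
# of approximately-null SNPs, i.e. those with P > 1/2
q1_mclust <- densityMclust(q1[p > 1/2])
summary(q1_mclust, parameters = TRUE)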

## ------------------------------------------------------- 
## Density estimation via Gaussian finite mixture modeling 
## ------------------------------------------------------- 
## 
## Mclust V (univariate, unequal variance) model with 6 components: 
## 
##  log-likelihood     n df       BIC       ICL
##       -25788.06 49359 17 -51759.84 -79139.49
##       Weight       Mean     Variance
## 1 0.11624221 -0.7104803 0.0004452516
## 2 0.25382864 -0.6299951 0.0034543467
## 3 0.20929539 -0.4406951 0.0185984792
## 4 0.19171657  0.1426794 0.1336937072
## 5 0.08149797  0.8100512 0.0061517932
## 6 0.14741922  1.2848616 0.4350631819

This density estimation can then be used to approximate \(P(Q1\leq q\text{ | }H_0^p)\).

# P(Q1 <= x | H_0^p): CDF of the fitted mixture, i.e. the weighted sum of
# the component normal CDFs
q1_null <- function(x) {
  prob <- 0
  for (i in seq_along(q1_mclust$parameters$mean)) {
    prob <- prob + q1_mclust$parameters$pro[i] *
      pnorm(x, mean = q1_mclust$parameters$mean[i],
            sd = sqrt(q1_mclust$parameters$variance$sigmasq[i]))
  }
  prob
}

For example, \(P(Q1\leq 0.1\text{ | }H_0^p) \approx 0.672\).

q1_null(0.1)
## [1] 0.6716489

Q2

\(f(Q2\text{ | }H_0^p) \approx f(Q2\text{ | }P>1/2)\) is modelled by 8 Gaussians, with their weights, means and variances shown in the table below.

## ------------------------------------------------------- 
## Density estimation via Gaussian finite mixture modeling 
## ------------------------------------------------------- 
## 
## Mclust V (univariate, unequal variance) model with 8 components: 
## 
##  log-likelihood     n df       BIC    ICL
##       -22538.47 49359 23 -45325.49 -82751
##       Weight       Mean     Variance
## 1 0.03014613 -1.7862069 6.277092e-05
## 2 0.01623277 -1.6705778 4.426572e-04
## 3 0.04362900 -1.4327442 2.665499e-02
## 4 0.33556434  0.1584163 1.404298e-02
## 5 0.04483450  0.2047153 2.681079e-04
## 6 0.17355507  0.2645235 1.848158e-03
## 7 0.34488972  0.0687391 5.302849e-01
## 8 0.01114846  2.0400943 1.431901e-02

This density estimation can then be used to approximate \(P(Q2\leq q\text{ | }H_0^p)\).

# P(Q2 <= x | H_0^p): CDF of the fitted mixture, i.e. the weighted sum of
# the component normal CDFs
q2_null <- function(x) {
  prob <- 0
  for (i in seq_along(q2_mclust$parameters$mean)) {
    prob <- prob + q2_mclust$parameters$pro[i] *
      pnorm(x, mean = q2_mclust$parameters$mean[i],
            sd = sqrt(q2_mclust$parameters$variance$sigmasq[i]))
  }
  prob
}

For example, \(P(Q2\leq 0.1\text{ | }H_0^p) \approx 0.374\).

q2_null(0.1)
## [1] 0.3727371

Q3

\(f(Q3\text{ | }H_0^p) \approx f(Q3\text{ | }P>1/2)\) is modelled by 5 Gaussians, with their weights, means and variances shown in the table below.

## ------------------------------------------------------- 
## Density estimation via Gaussian finite mixture modeling 
## ------------------------------------------------------- 
## 
## Mclust V (univariate, unequal variance) model with 5 components: 
## 
##  log-likelihood     n df       BIC       ICL
##       -18809.26 49359 14 -37769.81 -82026.24
##       Weight        Mean    Variance
## 1 0.01596818 -1.95645661 0.240837277
## 2 0.32882455 -0.22453928 0.016075237
## 3 0.16677963 -0.04297176 0.005651741
## 4 0.14404266  0.06747826 0.014848945
## 5 0.34438497  0.26579276 0.285519094

This density estimation can then be used to approximate \(P(Q3\leq q\text{ | }H_0^p)\).

# P(Q3 <= x | H_0^p): CDF of the fitted mixture, i.e. the weighted sum of
# the component normal CDFs
q3_null <- function(x) {
  prob <- 0
  for (i in seq_along(q3_mclust$parameters$mean)) {
    prob <- prob + q3_mclust$parameters$pro[i] *
      pnorm(x, mean = q3_mclust$parameters$mean[i],
            sd = sqrt(q3_mclust$parameters$variance$sigmasq[i]))
  }
  prob
}

For example, \(P(Q3\leq 0.1\text{ | }H_0^p) \approx 0.722\).

q3_null(0.1)
## [1] 0.7224955

I have checked these against the empirical values (by counting the proportion of Q values below each threshold among the \(P>1/2\) SNPs), and the probabilities look reasonable.
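For example, a minimal version of this check for Q1 (assuming, as above, that p and q1 hold the P values and Q1 values):

# empirical estimate of P(Q1 <= 0.1 | H_0^p): the proportion of Q1 values
# at most 0.1 among SNPs with P > 1/2, to compare against q1_null(0.1)
mean(q1[p > 1/2] <= 0.1)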


2. Modelling \(f(P, Q)\)


I use a bivariate mixture of normals to model \(f(Z_P, Q)\), where \(Z_P\) is the absolute Z score corresponding to \(P\), and then use this to approximate \(P(P\leq p, Q\leq q)\) (shown below for Q1). I convert the P values to Z scores because P values are confined to \([0, 1]\) (and uniform under the null), so it does not make sense to model them directly with a GMM.

# absolute Z scores from the two-sided P values, paired with the Q1 values
df_pq <- data.frame("z" = -qnorm(p/2), "q1" = q1)

# joint <- densityMclust(df_pq)

joint <- readRDS("zq_joint.RDS")
summary(joint, parameters = TRUE)
## ------------------------------------------------------- 
## Density estimation via Gaussian finite mixture modeling 
## ------------------------------------------------------- 
## 
## Mclust VVI (diagonal, varying volume and shape) model with 9 components: 
## 
##  log-likelihood      n df       BIC       ICL
##       -223412.4 121879 44 -447340.1 -526574.5
## 
## Mixing probabilities:
##          1          2          3          4          5          6 
## 0.05023858 0.09498104 0.05950001 0.12375738 0.18296299 0.12167108 
##          7          8          9 
## 0.04682377 0.14927248 0.17079268 
## 
## Means:
##         [,1]       [,2]     [,3]      [,4]      [,5]       [,6]      [,7]
## z  4.3189527  0.2274909 1.213810 0.3502670 1.4406614  1.9182004 0.9496216
## q1 0.5008692 -0.6315374 1.816454 0.5592739 0.3556564 -0.5668422 0.8231415
##          [,8]       [,9]
## z   0.6797708  0.9382327
## q1 -0.4162857 -0.6732045
## 
## Variances:
## [,,1]
##           z        q1
## z  6.638785 0.0000000
## q1 0.000000 0.6277607
## [,,2]
##             z          q1
## z  0.02130109 0.000000000
## q1 0.00000000 0.005886361
## [,,3]
##            z        q1
## z  0.5789952 0.0000000
## q1 0.0000000 0.2186955
## [,,4]
##             z        q1
## z  0.04845419 0.0000000
## q1 0.00000000 0.3760025
## [,,5]
##            z        q1
## z  0.4883255 0.0000000
## q1 0.0000000 0.2399573
## [,,6]
##            z        q1
## z  0.7846349 0.0000000
## q1 0.0000000 0.0119245
## [,,7]
##            z          q1
## z  0.3203132 0.000000000
## q1 0.0000000 0.002147366
## [,,8]
##            z        q1
## z  0.1450473 0.0000000
## q1 0.0000000 0.0263504
## [,,9]
##            z          q1
## z  0.2351207 0.000000000
## q1 0.0000000 0.002233113
# plot(joint, what = "density", data = df_pq,
#      drawlabels = FALSE, points.pch = 20)
plot(joint, what = "density", type = "hdr")