\[\begin{equation} \begin{split} \hat{cFDR} &= P(H_0^p\text{ | }P\leq p, Q\leq q)\\[2ex] &= \dfrac{P(P\leq p\text{ | }H_0^p, Q\leq q) P(H_0^p\text{ | }Q\leq q)}{P(P\leq p \text{ | }Q\leq q)}\\[2ex] &= \dfrac{p \times P(H_0^p\text{ | }Q\leq q)}{P(P\leq p \text{ | }Q\leq q)}\\[2ex] &= \dfrac{p \times \frac{P(H_0^p) P(Q\leq q\text{ | }H_0^p)}{P(Q\leq q)}}{\frac{P(P\leq p, Q\leq q)}{P(Q\leq q)}}\\[2ex] &= \dfrac{p \times P(H_0^p) P(Q\leq q\text{ | }H_0^p)}{P(P\leq p, Q\leq q)} \end{split} \end{equation}\]
I model \(f(Q\text{ | }H_0^p) \approx f(Q\text{ | }P>1/2)\) and use this to approximate \(P(Q\leq q\text{ | }H_0^p)\) for Q1, Q2 and Q3.
I use the mclust
R package to model \(f(Q1\text{ | }H_0^p) \approx f(Q1\text{ | }P>1/2)\) using a Gaussian mixture model. This is modelled using 6 Gaussians, with their weights, means and variances shown in the table below.
## -------------------------------------------------------
## Density estimation via Gaussian finite mixture modeling
## -------------------------------------------------------
##
## Mclust V (univariate, unequal variance) model with 6 components:
##
## log-likelihood n df BIC ICL
## -25788.06 49359 17 -51759.84 -79139.49
## Weight Mean Variance
## 1 0.11624221 -0.7104803 0.0004452516
## 2 0.25382864 -0.6299951 0.0034543467
## 3 0.20929539 -0.4406951 0.0185984792
## 4 0.19171657 0.1426794 0.1336937072
## 5 0.08149797 0.8100512 0.0061517932
## 6 0.14741922 1.2848616 0.4350631819
This density estimation can then be used to approximate \(P(Q1\leq q\text{ | }H_0^p)\).
q1_null <- function(x) {
sum <- 0
for(i in 1:length(q1_mclust$parameters$mean)){
sum <- sum + q1_mclust$parameters$pro[i] * pnorm(x,mean = q1_mclust$parameters$mean[i], sd = sqrt(q1_mclust$parameters$variance$sigmasq[i]))
}
print(sum)
}
For example, \(P(Q1\leq 0.1\text{ | }H_0^p) \approx 0.672\).
q1_null(0.1)
## [1] 0.6716489
\(f(Q2\text{ | }H_0^p) \approx f(Q2\text{ | }P>1/2)\) is modelled by 8 Gaussians, with their weights, means and variances shown in the table below.
## -------------------------------------------------------
## Density estimation via Gaussian finite mixture modeling
## -------------------------------------------------------
##
## Mclust V (univariate, unequal variance) model with 8 components:
##
## log-likelihood n df BIC ICL
## -22538.47 49359 23 -45325.49 -82751
## Weight Mean Variance
## 1 0.03014613 -1.7862069 6.277092e-05
## 2 0.01623277 -1.6705778 4.426572e-04
## 3 0.04362900 -1.4327442 2.665499e-02
## 4 0.33556434 0.1584163 1.404298e-02
## 5 0.04483450 0.2047153 2.681079e-04
## 6 0.17355507 0.2645235 1.848158e-03
## 7 0.34488972 0.0687391 5.302849e-01
## 8 0.01114846 2.0400943 1.431901e-02
This density estimation can then be used to approximate \(P(Q2\leq q\text{ | }H_0^p)\).
q2_null <- function(x) {
sum <- 0
for(i in 1:length(q2_mclust$parameters$mean)){
sum <- sum + q2_mclust$parameters$pro[i] * pnorm(x,mean = q2_mclust$parameters$mean[i], sd = sqrt(q2_mclust$parameters$variance$sigmasq[i]))
}
print(sum)
}
For example, \(P(Q2\leq 0.1\text{ | }H_0^p) \approx 0.374\).
q2_null(0.1)
## [1] 0.3727371
\(f(Q3\text{ | }H_0^p) \approx f(Q3\text{ | }P>1/2)\) is modelled by 5 Gaussians, with their weights, means and variances shown in the table below.
## -------------------------------------------------------
## Density estimation via Gaussian finite mixture modeling
## -------------------------------------------------------
##
## Mclust V (univariate, unequal variance) model with 5 components:
##
## log-likelihood n df BIC ICL
## -18809.26 49359 14 -37769.81 -82026.24
## Weight Mean Variance
## 1 0.01596818 -1.95645661 0.240837277
## 2 0.32882455 -0.22453928 0.016075237
## 3 0.16677963 -0.04297176 0.005651741
## 4 0.14404266 0.06747826 0.014848945
## 5 0.34438497 0.26579276 0.285519094
This density estimation can then be used to approximate \(P(Q3\leq q\text{ | }H_0^p)\).
q3_null <- function(x) {
sum <- 0
for(i in 1:length(q3_mclust$parameters$mean)){
sum <- sum + q3_mclust$parameters$pro[i] * pnorm(x,mean = q3_mclust$parameters$mean[i], sd = sqrt(q3_mclust$parameters$variance$sigmasq[i]))
}
print(sum)
}
For example, \(P(Q3\leq 0.1\text{ | }H_0^p) \approx 0.722\).
q3_null(0.1)
## [1] 0.7224955
I’ve checked with the empirical values (by counting), and these probabilities look reasonable.
I use a bivariate mixture of normals to model \(f(ZP, Q)\) and then use this to approximate \(P(P\leq p, Q\leq q)\). Note that I convert the P values to Z scores to be able to model them using a GMM (P values are uniformly distributed so it doesn’t make sense to model them using a GMM).
df_pq <- data.frame("z" = -qnorm(p/2), "q1" = q1)
# joint <- densityMclust(df_pq)
joint <- readRDS("zq_joint.RDS")
summary(joint, parameters = TRUE)
## -------------------------------------------------------
## Density estimation via Gaussian finite mixture modeling
## -------------------------------------------------------
##
## Mclust VVI (diagonal, varying volume and shape) model with 9 components:
##
## log-likelihood n df BIC ICL
## -223412.4 121879 44 -447340.1 -526574.5
##
## Mixing probabilities:
## 1 2 3 4 5 6
## 0.05023858 0.09498104 0.05950001 0.12375738 0.18296299 0.12167108
## 7 8 9
## 0.04682377 0.14927248 0.17079268
##
## Means:
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## z 4.3189527 0.2274909 1.213810 0.3502670 1.4406614 1.9182004 0.9496216
## q1 0.5008692 -0.6315374 1.816454 0.5592739 0.3556564 -0.5668422 0.8231415
## [,8] [,9]
## z 0.6797708 0.9382327
## q1 -0.4162857 -0.6732045
##
## Variances:
## [,,1]
## z q1
## z 6.638785 0.0000000
## q1 0.000000 0.6277607
## [,,2]
## z q1
## z 0.02130109 0.000000000
## q1 0.00000000 0.005886361
## [,,3]
## z q1
## z 0.5789952 0.0000000
## q1 0.0000000 0.2186955
## [,,4]
## z q1
## z 0.04845419 0.0000000
## q1 0.00000000 0.3760025
## [,,5]
## z q1
## z 0.4883255 0.0000000
## q1 0.0000000 0.2399573
## [,,6]
## z q1
## z 0.7846349 0.0000000
## q1 0.0000000 0.0119245
## [,,7]
## z q1
## z 0.3203132 0.000000000
## q1 0.0000000 0.002147366
## [,,8]
## z q1
## z 0.1450473 0.0000000
## q1 0.0000000 0.0263504
## [,,9]
## z q1
## z 0.2351207 0.000000000
## q1 0.0000000 0.002233113
# plot(joint, what = "density", data = df_pq,
# drawlabels = FALSE, points.pch = 20)
plot(joint, what = "density", type = "hdr")
plot(joint, what = "density", type = "persp")
test_prob <- function(z0, q0) {
sum <- 0
for(i in 1:ncol(joint$parameters$mean)){
sum <- sum + (joint$parameters$pro[i] * pmvnorm(lower = c(z0,-Inf), upper = c(Inf, q0), mean = joint$parameters$mean[,i], sigma = joint$parameters$variance$sigma[, , i]))[1]
}
print(sum)
}
test_prob(z0 = 3, q0 = 2)
## [1] 0.05015193
I begin by simulating some data and setting the parameter values.
set.seed(1) # ensure reproducibility
n=10000; n1p=100; n1pq=100; n1q=100 # parameters
zp=c(rnorm(n1p,sd=3), rnorm(n1q,sd=1),
rnorm(n1pq,sd=3), rnorm(n-n1p-n1q-n1pq,sd=1)) # simulate z-scores corresponding to p
zq=c(rnorm(n1p,sd=1), rnorm(n1q,sd=3),
rnorm(n1pq,sd=3), rnorm(n-n1p-n1q-n1pq,sd=1)) # simulate z-scores corresponding to q
p=2*pnorm(-abs(zp)); q=2*pnorm(-abs(zq)) # convert to p-values
fold_id=(1:n) %% 3 # chr for each p,q pair
candidate_indices=which(p<0.01 | q< 0.001)
fold1=which(fold_id==1) # which p,q pair on chr 1
ind1=intersect(candidate_indices,fold1) # calculate coords for indices on chr 1
indices=ind1
mode=2 # specifies leave-one-out method
fold=fold1 # parameter fold specifies indices to leave out
nt = 5000 # number of test points in x-direction (p)
nv = 1000 # resolution for constructing L-curves (y=q)
p_threshold = 0
adj = FALSE
gx = 10^-5
closed = TRUE
I then follow James’ code to obtain the cfdr values for each \((p_i,q_i)\) pair, i.e. ccut_i=cfdr(zp_i, zq_i)
. He does this by:
Plotting cfsub_i-gx*xtest + gx*mx
(strictly increasing version of cfdr for fixed q).
Finding the corresponding cfsub_i-gx*xtest + gx*mx
(y) value for zp=zp[indices[i]]
. This is named ccut
.
He states: “Set ccut NOT equal to cfdr at the points; needs to be adjusted since an additional test point is used in determination of L” - not sure what this means.
zp=-qnorm(p/2); zq=-qnorm(q/2) # convert to Z scores
if (any(!is.finite(zp+zq))) stop("P-values p,q must be in [1e-300,1]")
mx=max(c(10,abs(-qnorm(p/2))))
my=max(c(10,abs(-qnorm(q/2)))) # maximum limits for integration
if (is.null(indices)) {
if (is.null(at)) stop("One of the parameters 'indices', 'at' must be set")
ccut=at
mode=0
} else ccut=rep(0,length(indices))
yval2=seq(0,my,length.out=nv+1)[1:nv] # equally spaced values
xval2=outer(rep(1,length(ccut)),yval2) # yval2 repeated for each ind1
pval2=2*pnorm(-yval2) # convert yval2 to p vals
xtest=seq(0,mx,length.out=nt) # Z value scale
ptest=2*pnorm(-xtest) # P value scale
ccut=rep(0,length(indices))
for (i in 1:length(indices)) {
w=which(zq[-fold] >= zq[indices[i]])
if (length(w) >= 1) {
cfsub= (1+(1/length(w)))*ptest/(1+(1/length(w))-ecdf(zp[-fold][w])(xtest)); cfsub=cummin(cfsub)
ccut[i]=approx(xtest,cfsub-gx*xtest + gx*mx,zp[indices[i]],rule=2)$y
} else ccut[i]=p[indices[i]]
}
An example:
w_ex=which(zq[-fold] >= zq[indices[50]])
cfsub_ex= (1+(1/length(w_ex)))*ptest/(1+(1/length(w_ex))-ecdf(zp[-fold][w_ex])(xtest))
cfsub_ex=cummin(cfsub_ex)
ccut_1=approx(xtest,cfsub_ex-gx*xtest + gx*mx,zp[indices[50]],rule=2)$y
par(mfrow = c(1,1))
plot(xtest,cfsub_ex-gx*xtest + gx*mx, xlab = "Z values (xtest)", main = "Example for (p[50],q[50]) pair")
abline(v = zp[indices[50]], lty = "dashed")
text(x = 5, y = 0.5, "x = zp[50] = 3.06")
abline(h = ccut_1, lty = "dashed")
text(x = 1, y = 0.25, "y = ccut[50] = 0.229")
Note that James’ cfsub
function is slightly odd, it uses 1+(1/length(w))
(which is usually approximately 1) and then approximates \(P(P\leq p|Q\leq q)\) using an ecdf on the Z scale. He subtracts the values from 1 to get back to the P value orientation. His approximation is therefore,
\[\begin{equation} \begin{split} \text{cfsub} &= \dfrac{(1+\dfrac{1}{|w|})*ptest}{(1+\dfrac{1}{|w|})-ecdf_{zp|zq\geq zq[1]}(xtest)}\\[2ex] &\approx \dfrac{ptest}{ecdf_{p|q\leq q[1]}(ptest)}. \end{split} \end{equation}\]
ecdf_z <- ecdf(zp[-fold][w])(xtest)
ecdf_p <- ecdf(p[-fold][w])(ptest)
par(mfrow = c(1,2))
plot(ecdf_z, ecdf_p)
plot(1-ecdf_z, ecdf_p)
Then, for each \((p_i,q_i)\) pair, he finds the xval2[j]
value such that cfdr(xval2[j],yval2[j])=ccut_i
.
ccut=ccut*(1+ 1e-6) # ccut + 1e-8 # prevent floating-point comparison errors
if (mode==2) {
if (adj) {
correct=cummin((1+ecdf(q[-fold][which(p[-fold]>0.5)])(pval2)*length(p[-fold]))/
(1+ ecdf(q[-fold])(pval2)*length(p[-fold])))
if (!is.null(indices)) correct_ccut=approx(pval2,correct,q[indices],rule=2)$y #cummin((1+ecdf(pc[which(p>0.5)])(pc)*length(p))/(1+ rank(pc)))
} else {
correct=rep(1,length(pval2)) # adjustment factor for pc[i]
correct_ccut=rep(1,length(ccut))
}
if (!is.null(indices)) ccut=ccut*correct_ccut
for (i in 1:length(yval2)) {
w=which(zq[-fold] > yval2[i])
if (length(w) >= 1 ) {
cfsub= (1+(1/length(w)))*ptest/(1+(1/length(w))-ecdf(zp[-fold][w])(xtest)); cfsub=cummin(cfsub)
xval2[,i]=approx((cfsub-gx*xtest + gx*mx)*correct[i],xtest,ccut,rule=2,f=1)$y
} else xval2[,i]=-qnorm((ccut/correct_ccut)/2)
}
}
An example:
test_xval <- rep(0, length(yval2))
for (i in 1:length(yval2)) {
w=which(zq[-fold] > yval2[i])
if (length(w) >= 1 ) {
cfsub= (1+(1/length(w)))*ptest/(1+(1/length(w))-ecdf(zp[-fold][w])(xtest)); cfsub=cummin(cfsub)
test_xval[i]=approx((cfsub-gx*xtest + gx*mx)*correct[i],xtest,ccut[50],rule=2,f=1)$y
} else test_xval[i]=-qnorm((ccut[50]/correct_ccut[50])/2)
}
plot((cfsub-gx*xtest + gx*mx)*correct[i],xtest, main = "Example for (p[50],q[50]) pair", ylab = "Z values (xtest)")
abline(v = ccut[50], lty = "dashed")
text(x = 0.4, y = 6, "x = ccut[5] = 0.229")
abline(h = test_xval[i], lty = "dashed")
text(x = 0.1, y = 3, "y = xval2[i] = 1.2")
Overall, it seems that James:
Finds ccut_i=cfdr(p_i,q_i)
for each \((p_i, q_i)\) pair.
For each element \(j\) of yval2
(i.e. how many equally spaced points yval2 is made up of) finds the xval2
value such that cfdr(xval2[i,j], yval2[i,j]) = ccut_i
.
My cfsub approximation is,
\[\begin{equation} \text{cfsub_{anna}} = \dfrac{ptest * P(q\leq q[1]|p>1/2)}{P(p\leq ptest, q\leq q[1])} \end{equation}\]
where \(P(q\leq q[1]|p>1/2)\) is approximated using a GMM and \(P(p\leq ptest, q\leq q[1])\) is approximated using a bivariate GMM.
# GMM for P(Q | H_0)
zq_h0 <- densityMclust(zq[-fold][which(p[-fold]>1/2)])
# equation to evaluate P(Q < q| H_0)
zq_h0_func <- function(x) {
sum <- 0
for(i in 1:length(zq_h0$parameters$mean)){
sum <- sum + zq_h0$parameters$pro[i] * pnorm(x,mean = zq_h0$parameters$mean[i], sd = sqrt(zq_h0$parameters$variance$sigmasq[i]), lower.tail = FALSE) # lower.tail FALSE to get P(Z>z)
}
return(sum)
}
par(mfrow=c(1,1))
plot(zq_h0, what = "density", data = zq[-fold][which(p[-fold]>1/2)], breaks = 100, xlab = "ZQ | P>1/2")
x <- seq(0, 7, length=100)
hx <- dnorm(x, mean = zq_h0$parameters$mean[1], sd = sqrt(zq_h0$parameters$variance$sigmasq[1]))
degf <- c(1, 3, 8, 30)
colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c("df=1", "df=3", "df=8", "df=30", "normal")
for (i in 1:length(zq_h0$parameters$mean)){
lines(x, zq_h0$parameters$pro[i] * dnorm(x,mean = zq_h0$parameters$mean[i], sd = sqrt(zq_h0$parameters$variance$sigmasq[i])), lwd=2, col=colors[i])
}
zq_h0_func(0.8)
## [1] 0.4135718
# Bivariate GMM for P(P,Q)
df_zpzq <- data.frame("zp" = zp[-fold], "zq" = zq[-fold])
joint <- densityMclust(df_zpzq)
joint_func <- function(zp0, zq0) {
sum <- 0
for(i in 1:ncol(joint$parameters$mean)){
sum <- sum + (joint$parameters$pro[i] * pmvnorm(lower = c(zp0, zq0), upper = c(Inf, Inf), mean = joint$parameters$mean[,i], sigma = joint$parameters$variance$sigma[, , i]))[1]
}
return(sum)
}
ccut_me=rep(0,length(indices))
i = 50
cfsub_me_orig <- (ptest * zq_h0_func(zq[indices[i]]))/sapply(xtest, function(x) joint_func(x, zq[indices[i]]))
ccut_me=approx(xtest,cfsub_me_orig-gx*xtest + gx*mx,zp[indices[50]],rule=2)$y
# can rewrite the joint_func to take a vector as input later
I compare his results so far (black) to mine (red) [note I now stick to P value range].
w=which(zq[-fold] >= zq[indices[50]])
cfsub_james_orig <- (1+(1/length(w)))*ptest/(1+(1/length(w))-ecdf(zp[-fold][w])(xtest))
par(mfrow=c(1,3))
plot(ptest, cfsub_james_orig, pch = 16, cex = 0.5, xlab = "P", ylab = "cfdr", main = "Original")
points(ptest, cfsub_me_orig, col = "red", pch = 16, cex = 0.5)
cfsub_james=cummin(cfsub_james_orig)
cfsub_me=cummin(cfsub_me_orig)
plot(ptest, cfsub_james, pch = 16, cex = 0.5, xlab = "P", ylab = "cfdr", main = "Non-decreasing")
points(ptest, cfsub_me, col = "red", pch = 16, cex = 0.5)
plot(ptest, cfsub_james-gx*ptest + gx*mx, pch = 16, cex = 0.5, xlab = "P", ylab = "cfdr", main = "Strictly increasing")
points(ptest, cfsub_me-gx*ptest + gx*mx, col = "red", pch = 16, cex = 0.5)
par(mfrow=c(1,3))
plot(cfsub_me_orig, cfsub_james_orig, main = "Original", xlab = "Me", ylab = "James")
abline(0,1)
plot(cfsub_me, cfsub_james, main = "Non-decreasing", xlab = "Me", ylab = "James")
abline(0,1)
plot(cfsub_me-gx*ptest + gx*mx, cfsub_james-gx*ptest + gx*mx, main = "Strictly increasing", xlab = "Me", ylab = "James")
abline(0,1)
The difference in ccut
values for the 50th p,q pair using mine and James’ method is shown below:
Hmm something is going wrong…
Recall that his cfdr approximation is:
\[\begin{equation} \begin{split} \text{cfsub} &= \dfrac{(1+\dfrac{1}{|w|})*ptest}{(1+\dfrac{1}{|w|})-ecdf_{zp|zq\geq zq[1]}(xtest)}\\[2ex] &\approx \dfrac{ptest}{ecdf_{p|q\leq q[1]}(ptest)}. \end{split} \end{equation}\]
and mine is:
\[\begin{equation} \text{cfsub_{anna}} = \dfrac{ptest * P(q\leq q[1]|p>1/2)}{P(p\leq ptest, q\leq q[1])} \end{equation}\]
I write functions to obtain ccut
values using (i) James original method (on the Z scale), (ii) James method on the P value scale (just to check these give the same results as on the Z scale - they do) and (iii) my method.
james_ccut_z <- function(p,q,adj=TRUE,indices=NULL,mode=0,fold=NULL,nt=5000, nv=1000, gx=10^-5){
zp = -qnorm(p/2)
zq = -qnorm(q/2)
mx = max(c(10,abs(-qnorm(p/2))))
my = max(c(10,abs(-qnorm(q/2)))) # maximum limits for integration
#gx=0 #1/1000 # use for 'tiebreaks'- if a point is on a curve with nonzero area, force the L-curve through that point
ccut = rep(0,length(indices))
xtest = seq(0,mx,length.out=nt)
ptest = 2*pnorm(-xtest)
if (mode==2) {
ccut=rep(0,length(indices))
for (i in 1:length(indices)) {
w=which(zq[-fold] >= zq[indices[i]])
if (length(w) >= 1) {
cfsub= (1+(1/length(w)))*ptest/(1+(1/length(w))-ecdf(zp[-fold][w])(xtest)); cfsub=cummin(cfsub)
ccut[i]=approx(xtest,cfsub-gx*xtest + gx*mx,zp[indices[i]],rule=2)$y
} else ccut[i]=p[indices[i]]
}
}
return(ccut)
}
james_ccut_p <- function(p,q,adj=TRUE,indices=NULL,mode=0,fold=NULL,nt=5000, nv=1000, gx=10^-5){
zp = -qnorm(p/2)
zq = -qnorm(q/2)
mx = max(c(10,abs(-qnorm(p/2))))
my = max(c(10,abs(-qnorm(q/2)))) # maximum limits for integration
#gx=0 #1/1000 # use for 'tiebreaks'- if a point is on a curve with nonzero area, force the L-curve through that point
ccut = rep(0,length(indices))
xtest = seq(0,mx,length.out=nt)
ptest = 2*pnorm(-xtest)
if (mode==2) {
ccut=rep(0,length(indices))
for (i in 1:length(indices)) {
w=which(zq[-fold] >= zq[indices[i]])
if (length(w) >= 1) {
cfsub= (1+(1/length(w)))*ptest/(1+(1/length(w))-ecdf(zp[-fold][w])(xtest)); cfsub=cummin(cfsub)
ccut[i]=approx(ptest,cfsub-gx*ptest + gx*mx,p[indices[i]],rule=2)$y
} else ccut[i]=p[indices[i]]
}
}
return(ccut)
}
james_ccut_z_vals <- james_ccut_z(p, q, adj = FALSE, indices = ind1, mode = 2, fold = fold1)
james_ccut_p_test <- james_ccut_p(p, q, adj = FALSE, indices = ind1, mode = 2, fold = fold1)
par(mfrow=c(1,1))
plot(james_ccut_z_vals, james_ccut_p_test, main = "Comparing ccut value when on P vrs Z scale")
abline(0,1)
###########################
anna_ccut <- function(p,q,adj=TRUE,indices=NULL,mode=0,fold=NULL,nt=5000, nv=1000, gx=10^-5){
zp = -qnorm(p/2)
zq = -qnorm(q/2)
mx = max(c(10,abs(-qnorm(p/2))))
#gx=0 #1/1000 # use for 'tiebreaks'- if a point is on a curve with nonzero area, force the L-curve through that point
ccut = rep(0,length(indices))
xtest = seq(0,mx,length.out=nt)
ptest = 2*pnorm(-xtest)
# GMM for P(Q | H_0)
zq_h0 <- densityMclust(zq[-fold][which(p[-fold]>1/2)])
# equation to evaluate P(Q < q| H_0)
zq_h0_func <- function(x) {
sum <- 0
for(i in 1:length(zq_h0$parameters$mean)){
sum <- sum + zq_h0$parameters$pro[i] * pnorm(x,mean = zq_h0$parameters$mean[i], sd = sqrt(zq_h0$parameters$variance$sigmasq[i]), lower.tail = FALSE)
}
return(sum)
}
joint <- densityMclust(data.frame("zp" = zp[-fold], "zq" = zq[-fold]))
joint_func <- function(zp0, zq0) {
sum <- 0
for(i in 1:ncol(joint$parameters$mean)){
sum <- sum + (joint$parameters$pro[i] * pmvnorm(lower = c(zp0, zq0), upper = c(Inf, Inf), mean = joint$parameters$mean[,i], sigma = joint$parameters$variance$sigma[, , i]))[1]
}
return(sum)
}
if (!is.null(indices)) { # set ccut. NOT equal to cfdr at the points; needs to be adjusted since an additional test point is used in determination of L
if (mode==2) {
ccut=rep(0,length(indices))
for (i in 1:length(indices)) {
w=which(zq[-fold] >= zq[indices[i]])
if (length(w) >= 1) {
cfsub = (ptest * zq_h0_func(zq[indices[i]]))/sapply(xtest, function(x) joint_func(x, zq[indices[i]]))
cfsub =cummin(cfsub)
ccut[i] = approx(xtest,cfsub-gx*xtest + gx*mx,zp[indices[i]],rule=2)$y
} else ccut[i] = p[indices[i]]
}
}
}
return(ccut)
}
# anna_ccut_vals <- anna_ccut(p, q, adj = FALSE, indices = ind1, mode = 2, fold = fold1)
anna_ccut_vals <- readRDS("anna_ccut_vals.RDS")
par(mfrow=c(1,1))
plot(anna_ccut_vals, james_ccut_z_vals, xlab = "anna's ccut values", ylab = "james' ccut values", main = "ccut values")
abline(0,1)