1. De-inflation of large P


I have a problem whereby very large \(p\)-values never reach 1. This occurs whichever of the following two methods I use to estimate cFdr:

  1. “kgrid method”: \(cFdr=\dfrac{p}{kgrid}=\dfrac{p}{P(P\leq p, Q\leq q)/P(Q\leq q|H0)}\)

  2. “Bivariate method”: \(cFdr=\dfrac{P(P\leq p, Q\leq q|H0)}{P(P\leq p, Q\leq q)}\)
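
A short derivation may help show why the two expressions can agree. It assumes that, under \(H0\), \(P\) is uniform on \([0,1]\) and independent of \(Q\); this assumption is introduced here for illustration and is not stated above. Under it, the joint null probability factorises:

\[
P(P\leq p, Q\leq q|H0) = P(P\leq p|H0)\,P(Q\leq q|H0) = p\,P(Q\leq q|H0),
\]

so that

\[
\dfrac{P(P\leq p, Q\leq q|H0)}{P(P\leq p, Q\leq q)} = \dfrac{p\,P(Q\leq q|H0)}{P(P\leq p, Q\leq q)} = \dfrac{p}{P(P\leq p, Q\leq q)/P(Q\leq q|H0)}.
\]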


Recall that we find the L curves in a two-step approach:

  1. Find ccut=cfdr(p,q) (note that ccut values are identical when using kgrid and bivariate methods)
ccut = interp.surface(cgrid, cbind(zp[indices], q[indices])) 
  2. Construct the L curves by finding, for each value of yval2, the xval2 such that cfdr(xval2, yval2) = ccut. That is, we build a cfdr curve for each value of yval2 and find the x coordinate (xval2) where that curve equals ccut.
# "kgrid" method
for (i in 1:length(yval2)) {
  xdenom=interp.surface(kgrid,cbind(xtest,rep(yval2[i],length(xtest))))
  cfx=cummin(ptest/xdenom)
  xval2[,i]=approx(cfx, xtest, ccut, rule=2, method="const", f=1)$y
}

# "bivariate method" (where cgrid is the ratio of two bivariates)
for (i in 1:length(yval2)) {
  xdenom=interp.surface(cgrid,cbind(xtest,rep(yval2[i],length(xtest))))
  cfx=xdenom
  xval2[,i]=approx(cfx, xtest, ccut, rule=2, method="const", f=1)$y
}
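
The inversion used in the second step can be sketched on a toy curve with base R only. This is a minimal illustration, not the actual pipeline: `x_test`, `cf_curve` and `c_cut` are made-up stand-ins, and the curve is simply `2 * pnorm(-x)` rather than anything derived from kgrid or cgrid.

```r
# Toy inversion of a non-increasing curve (all values are made up).
x_test <- seq(0, 5, length.out = 501)    # hypothetical z-score grid
cf_curve <- cummin(2 * pnorm(-x_test))   # stand-in for one cfdr curve
c_cut <- c(0.5, 0.05, 0.001)             # hypothetical target cfdr values

# Swap the roles of x and y so that approx() maps a cfdr value back to a z-score.
x_val <- approx(cf_curve, x_test, c_cut, rule = 2, method = "constant", f = 1)$y

# Each returned z-score is a grid node whose curve value is at or just above the target.
print(cbind(c_cut, x_val, curve_at_x_val = 2 * pnorm(-x_val)))
```

With `f = 1` the returned grid node always sits on the side of the target where the curve is larger, so the recovered x coordinate is never more extreme than the exact crossing point, and the gap is bounded by the grid spacing.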

Interpolation methods

Note that different interpolation methods are used to read off the cfdr curves: interp.surface (bilinear interpolation) in the first step to get the ccut values, and constant interpolation using approx in the second step to generate the L curves. I stick with the kgrid method for now, as the results are the same when using the ratio of bivariates.
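
To see how the read-off rule alone can shift a value, here is a small base R sketch on a toy curve. The grid, the curve and the read-off point `x0` are all made up for illustration; only the `approx` arguments mirror the ones used in these notes.

```r
# Toy, non-increasing curve on a coarse grid (all values here are made up).
x_grid <- seq(0, 4, by = 0.5)            # hypothetical z-score grid
cf_curve <- cummin(2 * pnorm(-x_grid))   # monotone stand-in for a cfdr curve

x0 <- 1.2                                # read-off point between grid nodes 1.0 and 1.5

# Linear interpolation between the two neighbouring nodes.
read_linear <- approx(x_grid, cf_curve, x0, rule = 2, method = "linear")$y
# Constant interpolation taking the right-hand node (f = 1).
read_f1 <- approx(x_grid, cf_curve, x0, rule = 2, method = "constant", f = 1)$y
# Constant interpolation taking the left-hand node (f = 0).
read_f0 <- approx(x_grid, cf_curve, x0, rule = 2, method = "constant", f = 0)$y

print(c(f1 = read_f1, linear = read_linear, f0 = read_f0))
```

Because the toy curve decreases in x, the right-hand node gives the smallest value and the left-hand node the largest, with the linear read-off in between.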

I investigate how these two interpolation methods differ when used to generate the ccut values.

# ccut1: read the values directly off the cgrid surface with interp.surface
ccut1 = interp.surface(cgrid, cbind(zp[indices], q[indices]))

# ccut2: for-loop method, kgrid-based cfdr curve on kpq$x read off with constant interpolation
ccut2 = rep(1, length(p[indices]))
for (i in 1:length(p[indices])){
  xdenom = interp.surface(kgrid, cbind(kpq$x, rep(q[indices[i]], length(kpq$x))))
  cfx = cummin(2*pnorm(-kpq$x)/xdenom)
  ccut2[i] = approx(kpq$x, cfx, zp[indices[i]], rule=2, method="const", f=1)$y
}

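# ccut3: for-loop method, cgrid (ratio of bivariates) curve on kpq$x read off with constant interpolation
ccut3 = rep(1, length(p[indices]))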
for (i in 1:length(p[indices])){
  xdenom = interp.surface(cgrid, cbind(kpq$x, rep(q[indices[i]], length(kpq$x))))
  cfx = xdenom
  ccut3[i] = approx(kpq$x, cfx, zp[indices[i]], rule=2, method="const", f=1)$y
}

Findings:

  1. The ccut values are de-inflated when using the for-loop method rather than the interp.surface method. Perhaps this contributes to the de-inflation problem. Could I use the interp.surface method in the final step to prevent the de-inflation? That said, this de-inflation is fixed when increasing the resolution of the generated L curves (using xtest rather than kpq$x, which is what is actually used in the method).

  2. Separating out the kgrid step in the for loop doesn’t change the results (ccut2 and ccut3 are essentially the same).

  3. By switching method="const" to method="linear" in approx I can get the same results as when using interp.surface (this makes sense, as both are now doing linear interpolation).

  4. Using left-continuous interpolation (f=1, so that the right-hand point is used) decreases the ccut values; using right-continuous interpolation (f=0, so that the left-hand point is used) increases them.

  5. If I increase the resolution so that the cfdr curve is defined on xtest (length 5000) rather than kpq$x (length 500), then the ccut values increase.

# ccut4: as ccut2, but with the cfdr curve defined on the finer grid xtest
ccut4 = rep(1, length(p[indices]))
for (i in 1:length(p[indices])){
  xdenom = interp.surface(kgrid, cbind(xtest, rep(q[indices[i]], length(xtest))))
  cfx = cummin(ptest/xdenom)
  ccut4[i] = approx(xtest, cfx, zp[indices[i]], rule=2, method="const", f=1)$y
}

# ccut5: as ccut3, but with the cfdr curve defined on the finer grid xtest
ccut5 = rep(1, length(p[indices]))
for (i in 1:length(p[indices])){
  xdenom = interp.surface(cgrid, cbind(xtest, rep(q[indices[i]], length(xtest))))
  cfx = xdenom
  ccut5[i] = approx(xtest, cfx, zp[indices[i]], rule=2, method="const", f=1)$y
}
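
The resolution effect in finding 5 can be reproduced on a toy curve with base R. This is only a sketch under simplifying assumptions: the curve is `2 * pnorm(-x)`, the grids `x_coarse` and `x_fine` and the read-off points `x0` are made up, and the fine grid is chosen to contain the coarse one.

```r
# Toy comparison of a coarse and a fine grid (all values are made up).
x_coarse <- seq(0, 4, length.out = 41)     # hypothetical coarse grid, spacing 0.1
x_fine <- seq(0, 4, length.out = 4001)     # hypothetical fine grid, spacing 0.001
x0 <- c(0.2345, 1.0172, 2.5558, 3.33333)   # read-off points lying between grid nodes

# Read a non-increasing curve off a given grid with constant interpolation and f = 1.
read_off <- function(grid, at) {
  cf_curve <- cummin(2 * pnorm(-grid))
  approx(grid, cf_curve, at, rule = 2, method = "constant", f = 1)$y
}

coarse <- read_off(x_coarse, x0)
fine <- read_off(x_fine, x0)
truth <- 2 * pnorm(-x0)                    # exact value of the toy curve

print(cbind(x0, coarse, fine, truth))
```

On a decreasing curve, `f = 1` takes the value at the next grid node to the right, which lies below the exact curve; a finer grid moves that node closer to the read-off point, so the value increases towards the exact one.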