I have updated the flexible_cfdr()
function to:
if( median(v) < 0.8*median(p) | median(v) > 1.2*median(p) ) warning('v-values very different to input p-values - check results')
locfdr
and add a warning suggesting users to adjust this accordingly if locfdr
prints a warninglfdr <- tryCatch(
{
locfdr(c(zp_ind, -zp_ind), bre = c(kpq$x[-length(kpq$x)] + diff(kpq$x)/2, kpq$x[length(kpq$x)] + diff(kpq$x)[length(diff(kpq$x))]/2, -c(kpq$x[-length(kpq$x)] + diff(kpq$x)/2, kpq$x[length(kpq$x)] + diff(kpq$x)[length(diff(kpq$x))]/2)), mlests = c(mlests[1], b*mlests[2]), plot = 0, df = locfdr_df)
},
warning=function(cond) {
message("Warning from locfdr:")
message(cond)
message("Try running flexible_cfdr again and adjusting locfdr_df parameter")
}
)
As explained on slack, I can fix the disaster
example by censoring q values.
It may be worth checking how the log and qnorm simulation results look when using this censoring, and whether it saves the FDR etc.
init_censor <- function(q, val = 10^-10){
q[which(q<val)] <- val
return(q)
}
In terms of arbitrary data having long tails, I don’t know of a statistic that quantifies the heavy-tailed-ness of a distribution (kurtosis doesn’t seem right). It may be that we do a quick check of their data based on something like percentiles and print a warning if it looks long tailed, suggesting censoring.
set.seed(5)
N <- 100000
q <- -rnbinom(N, 10, 0.7)
hist(q, main = "Histogram of example data\ngray line is 1st percentile\nblue line is half the data range")
abline(v = quantile(q, 0.01), col = "gray", lty = "dashed")
abline(v = range(q)[1]+(range(q)[2]-range(q)[1])*0.5, col = "blue", lty = "dashed")
quantile(q, 0.01) > range(q)[1]+(range(q)[2]-range(q)[1])*0.5
## 1%
## FALSE
load("disaster.RData")
hist(q)
abline(v = quantile(q, 0.01), col = "gray", lty = "dashed")
abline(v = range(q)[1]+(range(q)[2]-range(q)[1])*0.5, col = "blue", lty = "dashed")
quantile(q, 0.01) > range(q)[1]+(range(q)[2]-range(q)[1])*0.5
## 1%
## TRUE
if( quantile(q, 0.01) > range(q)[1]+(range(q)[2]-range(q)[1])*0.5 ) warning("Your data may have very long tails, try left censoring")
## Warning: Your data may have very long tails, try left censoring
Could also try something based on data quartiles:
set.seed(5)
N <- 100000
q <- -rnbinom(N, 10, 0.7)
quans <- quantile(q)
if((quans[3]-quans[2]) > 3 *(quans[4]-quans[3])) warning("Your data may have very long tails, try left censoring")
load("disaster.RData")
quans <- quantile(q)
if((quans[3]-quans[2]) > 3 *(quans[4]-quans[3])) warning("Your data may have very long tails, try left censoring")
## Warning: Your data may have very long tails, try left censoring
I’ve implemented this censoring on Chris’ simulations (pqonly) and it improves FDR control (gets to 0.06 on iteration 5 rather than 0.13) but FDR still isn’t controlled after ~ 4 iterations:
(Side Note: in the pqonly
simulations, although p and q are more correlated (\(~0.08\)) the q vectors are also correlated (\(~0.04\)))