I have updated the flexible_cfdr() function to:

if( median(v) < 0.8*median(p) | median(v) > 1.2*median(p) ) warning('v-values very different to input p-values - check results')
lfdr <- tryCatch(
    {
      locfdr(c(zp_ind, -zp_ind), bre = c(kpq$x[-length(kpq$x)] + diff(kpq$x)/2, kpq$x[length(kpq$x)] + diff(kpq$x)[length(diff(kpq$x))]/2, -c(kpq$x[-length(kpq$x)] + diff(kpq$x)/2, kpq$x[length(kpq$x)] + diff(kpq$x)[length(diff(kpq$x))]/2)), mlests = c(mlests[1], b*mlests[2]), plot = 0, df = locfdr_df)
    },
    warning=function(cond) {
      message("Warning from locfdr:")
      message(cond)
      message("Try running flexible_cfdr again and adjusting locfdr_df parameter")
    }
  )

Long tails

init_censor <- function(q, val = 10^-10){
  q[which(q<val)] <- val
  return(q)
}

In terms of arbitrary data having long tails, I don’t know of a statistic that quantifies the heavy-tailed-ness of a distribution (kurtosis doesn’t seem right). It may be that we do a quick check of their data based on something like percentiles and print a warning if it looks long tailed, suggesting censoring.

set.seed(5)
N <- 100000
q <- -rnbinom(N, 10, 0.7)
hist(q, main = "Histogram of example data\ngray line is 1st percentile\nblue line is half the data range")
abline(v = quantile(q, 0.01), col = "gray", lty = "dashed")
abline(v = range(q)[1]+(range(q)[2]-range(q)[1])*0.5, col = "blue", lty = "dashed")

quantile(q, 0.01) > range(q)[1]+(range(q)[2]-range(q)[1])*0.5
##    1% 
## FALSE
load("disaster.RData")
hist(q)
abline(v = quantile(q, 0.01), col = "gray", lty = "dashed")
abline(v = range(q)[1]+(range(q)[2]-range(q)[1])*0.5, col = "blue", lty = "dashed")

quantile(q, 0.01) > range(q)[1]+(range(q)[2]-range(q)[1])*0.5
##   1% 
## TRUE
if( quantile(q, 0.01) > range(q)[1]+(range(q)[2]-range(q)[1])*0.5 ) warning("Your data may have very long tails, try left censoring")
## Warning: Your data may have very long tails, try left censoring

Could also try something based on data quartiles:

set.seed(5)
N <- 100000
q <- -rnbinom(N, 10, 0.7)
quans <- quantile(q)
if((quans[3]-quans[2]) > 3 *(quans[4]-quans[3])) warning("Your data may have very long tails, try left censoring")

load("disaster.RData")
quans <- quantile(q)
if((quans[3]-quans[2]) > 3 *(quans[4]-quans[3])) warning("Your data may have very long tails, try left censoring")
## Warning: Your data may have very long tails, try left censoring

I’ve implemented this censoring on Chris’ simulations (pqonly) and it improves FDR control (gets to 0.06 on iteration 5 rather than 0.13) but FDR still isn’t controlled after ~ 4 iterations:

(Side Note: in the pqonly simulations, although p and q are more correlated (\(~0.08\)) the q vectors are also correlated (\(~0.04\)))