Note: Code for all of this is on HPC in reweight_prior/new_alpha


Introduction

In the Dirichlet distribution, the concentration parameters, \(\boldsymbol{\alpha}=(\alpha_1, ..., \alpha_n)\), control the shape of the distribution. In our method we assume that these parameters are all equal, so that \(\alpha_i=\alpha \text{ for } i=1,...,n\). Their sum, \(\alpha_0=\sum\alpha_i=n\times \alpha\), controls the strength of the distribution (i.e. how peaked it is). If \(\alpha<1\), the mass spikes at the corners of the simplex; if \(\alpha>1\), the mass tends towards the middle of the simplex (samples get flatter). As \(\alpha_0\) increases, the distribution becomes more tightly concentrated around the centre of the simplex.
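
As a quick numerical illustration of this (a minimal sketch, assuming rdirichlet from MCMCpack, the sampler used later in this document), the average size of the largest component of a draw shrinks as \(\alpha\) grows:

library(MCMCpack) # for rdirichlet (assumption; gtools::rdirichlet also works)
set.seed(1)
for (a in c(0.1, 1, 10)) {
  draws <- rdirichlet(1000, rep(a, 5)) # 1000 draws from a symmetric Dirichlet
  # near 1 for spiky draws (alpha < 1), smaller for flat draws (alpha > 1)
  cat("alpha =", a, ": mean max component =", round(mean(apply(draws, 1, max)), 2), "\n")
}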

We’ve identified that the value of \(\alpha\) needs to vary across studies so that the shape of the reweighted prior accurately reflects our prior belief of variant causality. That is, in lower-powered scenarios the resultant reweighted prior should be flatter and in higher-powered scenarios it should be steeper. This reflects the belief that in lower-powered scenarios the ordering of SNPs tells us little about variant causality, whilst in higher-powered scenarios we expect the ordering to be indicative of variant causality.

Herein, I consider two approaches to estimate the parameter of the Dirichlet distribution, \(\alpha\), for a simulated genomic region with no correlations between SNPs:

  1. \(\alpha = \frac{1}{\sum BF}\)

  2. Use the entropy of the system to estimate \(\alpha\).

The idea is that if we can find a method that works in the instance of no SNP correlations, then we can extend it to situations where there are SNP correlations.


1. Set \(\alpha = \frac{1}{\sum BF}\)

I use alpha = 1/exp(logsum(lbf)). Since logsum(lbf) = log(sum(exp(lbf))) = log(sum(BF)), this is exactly \(\frac{1}{\sum BF}\), computed on the log scale for numerical stability.
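
To make the equivalence concrete (a toy sketch using the logsum helper defined at the end of this section):

lbf <- c(0.2, 1.5, 3.1) # toy log Bayes factors
1/exp(logsum(lbf)) # alpha
1/sum(exp(lbf)) # identical value, but this form overflows for large lbf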


Low power

In low power, we’ve already seen that using \(\alpha=1\) upweights the top variant too much and we become “too sure” that this variant is the causal variant. We hope that this new method of selecting \(\alpha\) will increase its value, therefore making the prior flatter.

Low power → exp(denom) = sum(BF) low → \(\alpha\) big → flatter prior.

Let’s consider an example.

# low power simulation
NN = 1000
OR = 1.05

tmp <- simref(N0 = NN, N1 = NN, OR = OR)
maf <- tmp$maf

varbeta <- Var.data.cc(maf, N=2*NN, 0.5) # variance of beta

z_low <- tmp$z
pp_low <- ppfunc(z_low, varbeta)

plot(pp_low, ylim = c(0,1), ylab = "PP", main = "PP plot for low power simulation")

mu <- est_mu(z = z_low, f = maf, N0 = NN, N1 = NN)
mu
## [1] 1.581177
o <- order(pp_low, decreasing = TRUE)

W = 0.2 # prior standard deviation of the effect size
V = varbeta
r = W^2/(W^2 + V) # shrinkage factor
lbf = 0.5 * (log(1 - r) + (r * z_low^2)) # Wakefield log approximate BFs

abf <- exp(lbf)
# alpha_1 = 1/sum(abf)
# alpha_2 = 1/sum(lbf)
# alpha_3 = 1/logsum(lbf)

# data.frame(alpha_1, alpha_2, alpha_3)

denom = logsum(lbf) # log(sum(BF))
alpha = 1/exp(denom) # alpha = 1/sum(BF)
  
nsnps = length(maf)
  
prior <- rowMeans(sapply(1:50000, function(i) draw_order(nsnps, alpha)))
  
o_w = order(abf, decreasing=TRUE)
ordered_pp = abf[o_w]*prior/sum(abf[o_w]*prior)
  
# for comparison do alpha = 1
  
alpha_1 = 1
  
prior_1 <- rowMeans(sapply(1:50000, function(i) draw_order(nsnps, alpha_1)))
  
ordered_pp_1 = abf[o_w]*prior_1/sum(abf[o_w]*prior_1)
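
Note that draw_order is not defined in this document (it lives in reweight_prior/new_alpha); a minimal sketch consistent with how it is used here (one symmetric Dirichlet draw, sorted so that its \(i\)-th element is the prior weight for the SNP with the \(i\)-th largest BF) would be:

# hypothetical sketch of draw_order, not the actual implementation
draw_order <- function(nsnps, alpha) {
  sort(rdirichlet(1, rep(alpha, nsnps)), decreasing = TRUE)
}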

Above, the purple points show the PPs calculated using the standard method, the red points show the flat prior, the green points show the prior where \(\alpha=1\) and the blue points where \(\alpha=\frac{1}{\sum BF}\).

We had hoped that this method would select a value for \(\alpha\) that is \(\gg 1\); instead, it selects a tiny value for \(\alpha\), making the prior far too steep.

As we saw before, using this method upweights the probability of causality for the first SNP too much.


High power

High power → exp(denom) = sum(BF) high → \(\alpha\) small → steeper prior.

Note that when using this approach, \(\alpha\) can be very small and the rdirichlet function returns NAs, so I’ve had to include the following if statement.

if (alpha<0.001) {
  alpha = 0.001 # floor alpha so that rdirichlet does not return NAs
}
# high power simulation
NN = 50000
OR = 1.1

tmp <- simref(N0 = NN, N1 = NN, OR = OR)
maf <- tmp$maf

varbeta <- Var.data.cc(maf, N=2*NN, 0.5) # variance of beta

z_high <- tmp$z
pp_high <- ppfunc(z_high, varbeta)

o <- order(pp_high, decreasing = TRUE)
plot(pp_high, ylim = c(0,1), ylab = "PP", main = "PP plot for high power simulation")

mu <- est_mu(z = z_high, f = maf, N0 = NN, N1 = NN)
mu
## [1] 10.90276
W = 0.2 # prior standard deviation of the effect size
V = varbeta
r = W^2/(W^2 + V) # shrinkage factor
lbf = 0.5 * (log(1 - r) + (r * z_high^2)) # Wakefield log approximate BFs

denom = logsum(lbf) # log(sum(BF))

alpha = 1/exp(denom) # alpha = 1/sum(BF)

if (alpha<0.001) {
  alpha = 0.001
}
  
nsnps = length(maf)
  
prior <- rowMeans(sapply(1:50000, function(i) draw_order(nsnps, alpha)))
  
abf <- exp(lbf)
  
o_w = order(abf, decreasing=TRUE)
ordered_pp = abf[o_w]*prior/sum(abf[o_w]*prior)
  
# for comparison do alpha = 1
  
alpha_1 = 1
  
prior_1 <- rowMeans(sapply(1:50000, function(i) draw_order(nsnps, alpha_1)))
  
ordered_pp_1 = abf[o_w]*prior_1/sum(abf[o_w]*prior_1)


Summary of setting \(\alpha = \frac{1}{\sum BF}\)

This method is not working: below I look at the values of \(\alpha\) as a function of the minimum \(P\) value (a proxy for power).

\(\alpha\) is never greater than 1, yet values greater than 1 are exactly what we need in low power to make the prior flatter.

logsum <- function(x) {
    my.max <- max(x) # subtract the max for numerical stability (log-sum-exp trick)
    my.res <- my.max + log(sum(exp(x - my.max)))
    return(my.res)
}
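
The max-subtraction matters because in high power the log BFs are large enough that exp() overflows (a small demonstration):

x <- c(1000, 1001) # large log BFs
log(sum(exp(x))) # Inf: exp() overflows
logsum(x) # 1001.313: stable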

# my code to derive alpha (W, V and z as defined above)
r = W^2/(W^2 + V)
lbf = 0.5 * (log(1 - r) + (r * z^2))
denom = logsum(lbf)
alpha = 1/exp(denom)


2. Use entropy to estimate \(\alpha\)


Idea: the entropy of the PP system can be calculated with entropy <- function(p) -mean(log(p)), and the closed-form entropy of the Dirichlet distribution can be set equal to this and solved for \(\alpha\).

I’ve coded up the entropy of the Dirichlet distribution (https://en.wikipedia.org/wiki/Dirichlet_distribution#Entropy):

\[H(X)=\log \mathrm{B}(\boldsymbol{\alpha})+(\alpha_0-K)\psi(\alpha_0)-\sum_{j=1}^K(\alpha_j-1)\psi(\alpha_j),\]

where \(\mathrm{B}\) is the multivariate beta function, \(\psi\) is the digamma function and \(\alpha_0=\sum_{j=1}^K\alpha_j\).
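
In our symmetric case, \(\alpha_j=\alpha\) for all \(j\) and \(K=n\), so this reduces to the form implemented in dir_ent below:

\[H(X)=\log \mathrm{B}(\alpha,...,\alpha)+n(\alpha-1)\left[\psi(n\alpha)-\psi(\alpha)\right].\]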

[Note that I’ve checked my code is correct by comparing the results to this StackExchange article: https://stats.stackexchange.com/questions/213954/interpreting-the-entropy-of-a-dirichlet-distribution]

mbeta <- function(...) { 
  # multivariate beta function, computed via lgamma for numerical stability
  exp(sum(lgamma(c(...)))-lgamma(sum(c(...))))
}

dir_ent <- function(n, alpha){
  # entropy of a symmetric Dirichlet of dimension n (K = n, alpha_0 = n * alpha)
  log(mbeta(rep(alpha, n))) + n*(alpha-1)*digamma(n*alpha) - n*(alpha-1)*digamma(alpha)
}

dir_ent(n = 3, alpha = 1)
## [1] -0.6931472

Let’s look at the entropy as a function of \(\alpha\) (note the differences in the \(y\) axis scales!).
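
These curves come from evaluating dir_ent over a grid of \(\alpha\) values; a sketch of the kind of code behind such panels (the grids, \(n=3\) and the layout are my assumptions, not the original code):

par(mfrow = c(1, 2))
for (alphas in list(seq(0.01, 2, length.out = 200), seq(1, 100, length.out = 200))) {
  ents <- sapply(alphas, function(a) dir_ent(n = 3, alpha = a))
  plot(alphas, ents, type = "l", xlab = "alpha", ylab = "Entropy",
       main = "Dirichlet entropy, n = 3")
}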


The maximum entropy value occurs at \(\alpha=1\); let’s investigate how this maximum changes with \(n\) (= nsnps).
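
At \(\alpha=1\) the digamma terms in dir_ent vanish, so the maximum entropy is \(\log \mathrm{B}(1,...,1)=-\log\Gamma(n)\), which decreases rapidly with \(n\); a sketch:

ns <- 2:100
max_ent <- sapply(ns, function(n) dir_ent(n = n, alpha = 1))
plot(ns, max_ent, type = "l", xlab = "n (= nsnps)", ylab = "Maximum entropy")
all.equal(max_ent, -lgamma(ns)) # TRUE: maximum entropy is -log(Gamma(n))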


Explanation of entropy results

Recall that entropy is a measure of uncertainty. The higher the entropy, the more uncertainty in the system.

  • When \(\alpha\) is below 1, we expect sparse samples from the edges of the simplex (steeper prior: one variant carries most of the probability, the others are close to 0). So we can predict with greater certainty what a sample will look like, and the entropy is therefore lower.
## alpha = 1e-2
draws <- rdirichlet(200, c(0.01,0.01,0.01) )
bivt.contour(draws)

  • When \(\alpha=1\), the probability distribution is uniform over the simplex, so the entropy is highest. (This is a basic property of entropy: uniform distributions have maximum uncertainty.)
## alpha = 1
draws <- rdirichlet(200, c(1,1,1) )
bivt.contour(draws)

  • As \(\alpha\) increases above 1, the components of each sample are more likely to be near-equal (“flat”), so we can again predict with greater certainty what a sample will look like, and the entropy decreases.
## alpha = 100
draws <- rdirichlet(200, c(100,100,100) )
bivt.contour(draws)


Problem with entropy idea

The idea was to calculate the entropy of the PP system and set this equal to the Dirichlet entropy formula before solving for \(\alpha\). However, entropy values calculated using the standard formula shown below are positive (the logarithm of a probability is negative, so its negated mean is positive), and we therefore cannot set them equal to the entropy values found above, which are all negative.

entropy <- function(p){
  -mean(log(p))
}

ent_low <- entropy(pp_low)
ent_high <- entropy(pp_high)

par(mfrow=c(1,2))
plot(pp_low, ylim = c(0,1), ylab = "PP", main = paste0("Low power \n entropy = ",ent_low))
plot(pp_high, ylim = c(0,1), ylab = "PP", main = paste0("High power \n entropy = ",ent_high))

We cannot find the corresponding \(\alpha\) values for these probability systems using the entropy of the Dirichlet distribution because the maximum entropy of the Dirichlet distribution for 100 SNPs is approximately \(-359\).
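
This is easy to verify with the function defined above:

dir_ent(n = 100, alpha = 1)
## [1] -359.1342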

The following plot shows the distribution of entropy values for simulations with various OR values and N0=N1=1000.


Alternatively, we could look at the KL divergence, which gives negative values. But these are not “as negative” as those given by the Dirichlet entropy, and so the method will still not work.