We expect that size and covered are highly correlated for unordered sets, and that this correlation is weaker for ordered sets. We hope to incorporate entropy in the model for ordered sets to account for some of the extra noise.
Unordered: 0.40
Size and coverage slightly more correlated in unordered
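As a minimal sketch of how a correlation such as the one above could be computed, assuming it is a plain Pearson correlation between the continuous size and the binary covered indicator (the notes do not state which coefficient was used). The toy rows below are made up purely for illustration and are not taken from our simulations.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy rows (size, covered); NOT real results.
toy_sets = [(0.91, 1), (0.95, 1), (0.99, 1), (0.90, 0), (0.93, 0), (0.97, 1)]
size = [s for s, _ in toy_sets]
covered = [c for _, c in toy_sets]

r = pearson(size, covered)
print(f"toy correlation between size and covered: {r:.3f}")
```

The same function would be applied separately to the ordered and unordered datasets so that the two coefficients can be compared.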
We expect that OR and entropy are highly correlated. These variables are the same in the ordered and non-ordered datasets as they reflect information on the system, and the same systems were used to form ordered and non-ordered credible sets.
Ordered/Unordered: 0.545
OR and entropy are highly correlated
We expect that entropy and covered are more correlated in ordered than non-ordered sets. We hope to include entropy as a predictor for coverage in ordered sets.
Unordered: 0.202
Since the correlation is low, perhaps entropy will not be a significant predictor of coverage in the following logistic regression section.
We see there is much higher correlation between OR and covered in ordered sets. We hope that by incorporating entropy as a predictor for coverage in ordered sets, we will not need to include the OR, which is not known to experimenters.
Unordered: 0.217
OR and covered show high correlation in ordered sets
We see that nvar and entropy have stronger negative correlation in ordered sets.
Unordered: 0.023
nvar and entropy show higher negative correlation in ordered sets
Similarly, nvar and OR have stronger (negative) correlation in ordered sets.
Unordered: -0.064
nvar and OR show higher negative correlation in ordered sets
We see that nsnps and nvar are much more correlated in unordered sets. This intuitively makes sense.
Unordered: 0.875
nsnps and nvar highly correlated in unordered sets
We see that thr and nvar are more correlated in ordered sets: as the threshold increases, so does nvar. I would have expected this correlation to be higher in unordered sets. If there is a SNP with very high posterior probability, it should be included in the set sooner by the ordered method than by the non-ordered method, making the set smaller, whereas for non-ordered sets more SNPs have to be added to the set before ‘finding’ this high-pp SNP.
Unordered: 0.189
thr and size more correlated in ordered sets
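The intuition about how quickly a high-pp SNP enters the set can be sketched as follows. This is only an illustration under assumptions: ordered sets are taken to add SNPs by decreasing posterior probability, non-ordered sets are taken to add them in whatever order they are given, and the function name `credible_set` and the pp values are hypothetical.

```python
def credible_set(pp, thr, ordered=True):
    """Add SNPs until the cumulative posterior probability reaches thr.

    ordered=True  : SNPs are added by decreasing posterior probability.
    ordered=False : SNPs are added in the order given (assumed arbitrary).
    Returns (indices of SNPs in the set, summed pp of the set).
    """
    idx = list(range(len(pp)))
    if ordered:
        idx.sort(key=lambda i: pp[i], reverse=True)
    chosen = []
    total = 0.0
    for i in idx:
        chosen.append(i)
        total += pp[i]
        if total >= thr:
            break
    return chosen, total

# Made-up posterior probabilities with one dominant SNP at index 3.
pp = [0.02, 0.03, 0.05, 0.80, 0.10]

ordered_set, ordered_sum = credible_set(pp, thr=0.85, ordered=True)
unordered_set, unordered_sum = credible_set(pp, thr=0.85, ordered=False)

print("ordered set:  ", ordered_set, "nvar =", len(ordered_set))
print("unordered set:", unordered_set, "nvar =", len(unordered_set))
```

In this toy case the ordered method reaches the threshold with two SNPs, while the non-ordered method needs four because the dominant SNP happens to come late in the list.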
The next section will analyse the following claims:
Claim 1: \[\log\left(\frac{p}{1-p}\right)\sim \log\left(\frac{size}{1-size}\right)\] works well for non-ordered sets and less well for ordered sets.
Claim 2: Can we improve the accuracy of the above model in ordered sets by incorporating entropy as a predictor?
Claim 3: We hope that adding OR to the \(\log\left(\frac{p}{1-p}\right)\sim \log\left(\frac{size}{1-size}\right)+entropy\) model does not improve it much, i.e. that entropy has already absorbed the information carried by OR.
Claim 4: Entropy has a non-linear effect on coverage. Use the rcs function to analyse its non-linear effect.
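To make the model in Claim 1 concrete, here is a minimal sketch of a logistic regression of covered on logit(size), fitted by Newton-Raphson. It assumes size lies strictly between 0 and 1 and that covered is a 0/1 indicator; the helper `fit_logistic` and the toy data are hypothetical and are not the fitting routine or results used in the analysis.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def fit_logistic(x, y, iters=25, tol=1e-10):
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system (information) * step = score.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0 += s0
        b1 += s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1

# Made-up (size, covered) pairs for illustration only; NOT real results.
size = [0.90, 0.91, 0.92, 0.94, 0.95, 0.96, 0.97, 0.99]
covered = [0, 0, 1, 0, 1, 0, 1, 1]

x = [logit(s) for s in size]
b0, b1 = fit_logistic(x, covered)
print(f"intercept = {b0:.3f}, slope on logit(size) = {b1:.3f}")
```

Claims 2 and 3 would extend the linear predictor with an entropy column and then an OR column, which needs a general linear solve in place of the 2x2 step used here.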