Selecting q (iteration 1)


Using only the subset of independent SNPs (~500,000 of 2 million, about 25%), I regress the \(\chi^2\) statistics on the coordinates of each dimension of the PCAmix results.

## 
## Call:
## lm(formula = chisq ~ ., data = regression_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -1.583  -0.920  -0.577   0.297 104.827 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.069e+00  2.764e-03 386.568  < 2e-16 ***
## dim.1        1.267e-02  7.947e-04  15.948  < 2e-16 ***
## dim.2        1.038e-04  1.108e-03   0.094 0.925402    
## dim.3        2.820e-03  1.243e-03   2.268 0.023331 *  
## dim.4        3.368e-05  1.318e-03   0.026 0.979615    
## dim.5        1.406e-02  1.297e-03  10.840  < 2e-16 ***
## dim.6       -1.439e-02  1.496e-03  -9.617  < 2e-16 ***
## dim.7       -7.999e-04  1.529e-03  -0.523 0.600924    
## dim.8        1.587e-02  1.532e-03  10.360  < 2e-16 ***
## dim.9        1.287e-02  1.810e-03   7.111 1.15e-12 ***
## dim.10      -9.382e-03  1.664e-03  -5.638 1.73e-08 ***
## dim.11       2.722e-02  1.798e-03  15.140  < 2e-16 ***
## dim.12      -8.858e-03  1.774e-03  -4.994 5.91e-07 ***
## dim.13      -5.232e-03  1.802e-03  -2.903 0.003694 ** 
## dim.14       2.361e-02  1.825e-03  12.939  < 2e-16 ***
## dim.15       1.427e-02  1.915e-03   7.448 9.50e-14 ***
## dim.16       1.813e-03  2.030e-03   0.893 0.371637    
## dim.17      -1.593e-03  2.067e-03  -0.771 0.440920    
## dim.18       1.970e-04  2.033e-03   0.097 0.922773    
## dim.19      -1.184e-03  2.145e-03  -0.552 0.581091    
## dim.20       2.323e-03  2.527e-03   0.919 0.358016    
## dim.21      -5.577e-03  2.452e-03  -2.274 0.022968 *  
## dim.22      -3.404e-03  2.343e-03  -1.453 0.146240    
## dim.23       1.404e-04  2.394e-03   0.059 0.953226    
## dim.24       3.613e-03  2.309e-03   1.565 0.117613    
## dim.25       1.106e-03  2.238e-03   0.494 0.621382    
## dim.26       2.810e-03  2.172e-03   1.294 0.195706    
## dim.27      -1.134e-03  2.091e-03  -0.543 0.587459    
## dim.28       1.128e-02  2.030e-03   5.555 2.78e-08 ***
## dim.29       9.297e-04  2.009e-03   0.463 0.643466    
## dim.30      -5.843e-03  2.162e-03  -2.702 0.006884 ** 
## dim.31      -9.956e-03  2.138e-03  -4.656 3.23e-06 ***
## dim.32      -8.050e-03  2.148e-03  -3.748 0.000179 ***
## dim.33       8.989e-03  2.153e-03   4.175 2.98e-05 ***
## dim.34      -1.527e-03  2.147e-03  -0.711 0.476890    
## dim.35      -6.074e-03  2.162e-03  -2.810 0.004961 ** 
## dim.36       1.241e-03  2.181e-03   0.569 0.569274    
## dim.37      -8.856e-04  2.155e-03  -0.411 0.681075    
## dim.38      -4.526e-03  2.151e-03  -2.104 0.035400 *  
## dim.39       1.579e-03  2.132e-03   0.741 0.458898    
## dim.40       6.740e-03  2.122e-03   3.176 0.001493 ** 
## dim.41      -2.626e-03  2.136e-03  -1.229 0.218939    
## dim.42      -7.717e-04  2.217e-03  -0.348 0.727783    
## dim.43       5.616e-03  2.277e-03   2.466 0.013645 *  
## dim.44      -2.551e-03  2.302e-03  -1.108 0.267773    
## dim.45      -3.993e-03  2.288e-03  -1.745 0.080954 .  
## dim.46       1.702e-03  2.317e-03   0.734 0.462759    
## dim.47      -2.686e-03  2.336e-03  -1.150 0.250239    
## dim.48       1.314e-03  2.329e-03   0.564 0.572754    
## dim.49       2.781e-03  2.338e-03   1.190 0.234192    
## dim.50       1.377e-02  2.265e-03   6.078 1.21e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.678 on 512477 degrees of freedom
##   (5930 observations deleted due to missingness)
## Multiple R-squared:  0.00253,    Adjusted R-squared:  0.002433 
## F-statistic:    26 on 50 and 512477 DF,  p-value: < 2.2e-16
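
For reference, a fit of this form can be sketched on simulated data as below. Only the call `lm(chisq ~ ., data = regression_df)` comes from the output above; the data, the sizes `n_snps` and `n_dims`, and the way `regression_df` is assembled are assumptions made purely for illustration.

```r
# Minimal sketch with simulated data. `regression_df` mirrors the name in the
# call above, but its contents here are made up for illustration.
set.seed(1)
n_snps <- 1000  # hypothetical stand-in for the ~500,000 independent SNPs
n_dims <- 5     # hypothetical stand-in for the 50 PCAmix dimensions

# Hypothetical coordinates: one column per dimension, named dim.1, dim.2, ...
coords <- matrix(
  rnorm(n_snps * n_dims),
  nrow = n_snps,
  dimnames = list(NULL, paste0("dim.", seq_len(n_dims)))
)

# Hypothetical chi-squared statistics (1 df), unrelated to the coordinates
regression_df <- data.frame(chisq = rchisq(n_snps, df = 1), coords)

# Regress the chi-squared statistics on every dimension at once
fit <- lm(chisq ~ ., data = regression_df)
summary(fit)
```

Because the simulated statistics are independent of the coordinates, none of the dimensions should be systematically significant here, unlike in the real fit.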

I then rank the dimensions by the absolute value of their t-statistics to decide which to iterate over.

##  (Intercept)        dim.1       dim.11       dim.14        dim.5        dim.8 
## 386.56780315  15.94846255  15.14036283  12.93865664  10.84006800  10.36001754 
##        dim.6       dim.15        dim.9       dim.50       dim.10       dim.28 
##   9.61702473   7.44791817   7.11147365   6.07841542   5.63753951   5.55499607 
##       dim.12       dim.31       dim.33       dim.32       dim.40       dim.13 
##   4.99426904   4.65555664   4.17521508   3.74766142   3.17598369   2.90322393 
##       dim.35       dim.30       dim.43       dim.21        dim.3       dim.38 
##   2.80958011   2.70243932   2.46648739   2.27396723   2.26798054   2.10375315 
##       dim.45       dim.24       dim.22       dim.26       dim.41       dim.49 
##   1.74518227   1.56487544   1.45294324   1.29388343   1.22935554   1.18963075 
##       dim.47       dim.44       dim.20       dim.16       dim.17       dim.39 
##   1.14977056   1.10820573   0.91915268   0.89341195   0.77064192   0.74066254 
##       dim.46       dim.34       dim.36       dim.48       dim.19       dim.27 
##   0.73431187   0.71131455   0.56912093   0.56400112   0.55179195   0.54252279 
##        dim.7       dim.25       dim.29       dim.37       dim.42       dim.18 
##   0.52307229   0.49389356   0.46285884   0.41099676   0.34807639   0.09694169 
##        dim.2       dim.23        dim.4 
##   0.09363085   0.05865578   0.02555196
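
A ranking like the one above can be obtained from the coefficient table of the fitted model. The sketch below uses a small simulated data set (an assumption, not the real data); the names `df`, `t_vals`, `ranked` and `ranked_dims` are hypothetical.

```r
# Minimal sketch: rank model terms by the absolute value of their t-statistic.
# The data are simulated for illustration only.
set.seed(1)
df <- data.frame(
  chisq = rchisq(200, df = 1),
  dim.1 = rnorm(200),
  dim.2 = rnorm(200),
  dim.3 = rnorm(200)
)
fit <- lm(chisq ~ ., data = df)

# Pull the t-statistics out of the coefficient matrix returned by summary()
t_vals <- summary(fit)$coefficients[, "t value"]

# Sort by magnitude, largest first (the intercept is included, as above)
ranked <- sort(abs(t_vals), decreasing = TRUE)

# Drop the intercept to leave only the candidate dimensions
ranked_dims <- ranked[names(ranked) != "(Intercept)"]
ranked_dims
```

Taking the absolute value matters because a strongly negative coefficient is as informative as a strongly positive one.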

For example, dimension 1 is the most significant here. I check that this is monotonic in \(p\); it appears to be.


Results


I check the results under various parameter settings, again using only the ~500,000 independent SNPs for the KDE.


  1. res_p=500; res_q=500; nxbin=500 (run time ~25 minutes)