T1D GWAS project

Introduction

Topic:

Identifying genetic variants associated with elevated ACR in patients with T1D using longitudinal ACR measurements.

What is novel about this research?

This research uses data on longitudinal changes in ACR, rather than the static readings used in previous related research (e.g. Sandholm et al. 2014).
Conclusions from GWAS using static readings: “This genetic variant associates with higher ACR for this ACR measurement snapshot”. Conclusions from GWAS using longitudinal readings: “This genetic variant associates with higher ACR based on longitudinal data which captures within-subject variation over time”.
That is, using longitudinal ACR measurements lets us capture within-subject variation over time (e.g. the “mean adjusted ACR trait” described in Marcovecchio et al. 2009).

Biological significance:

Progression to microalbuminuria and diabetic nephropathy can be predicted by changes in ACR within the normal range (Dunger et al. 2007). ACR is also an independent risk factor for CVD (Dunger et al. 2007).
Thus, if we find genetic variants predisposing T1D patients to elevated ACR, patients carrying those variants could be targeted for interventions such as ACE inhibitors or statins.

Possible hindrances:

We may not be able to capture the more dynamic changes in ACR over time when deriving a single “score” from the longitudinal data (e.g. see the figure from Marcovecchio et al. 2019).

Our sample size may be too small to find anything. Nothing came out as significant when using (cross-sectional) ACR readings in 12,540 individuals in Sandholm et al. (2017) (our sample size is approximately 2000).
How relevant is our phenotype biologically? How accurate is ACR as a biomarker for DKD? “Albuminuria was reported to have a poor positive predictive value for DKD as only about a third of those with microalbuminuria had progressive renal function decline” (Krolewski et al. 2015). Also see Marcovecchio et al. 2018 where they list clinical risk factors to predict GFR decline as age, diabetes duration, HbA1c, systolic BP, albuminuria, prior eGFR and retinopathy status.
The data for many samples will be limited (e.g. NFS only has ACR measurements at 1-2 assessments).

Data

Genotype data

I have imputed and quality-controlled genotype data for 1193 NFS samples, 310 ORPS samples and 544 AdDIT samples (\(N=2047\)). The write-up of the genotype data processing and QC is available here: https://github.com/annahutch/T1D-GWAS/blob/main/T1D-GWAS_annahutchinson.pdf

ACR

I have 1-3 consecutive ACR measurements for each individual at each annual assessment.
The number of annual assessments varies for each individual (e.g. ORPS have up to 10 assessments, AdDIT have up to 4 assessments and NFS have only 1-2 assessments).
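
As a minimal sketch of how the 1-3 readings per assessment could be collapsed before modelling, assuming a long-format table with one row per urine sample (the data frame `samples`, its column names and its values are all hypothetical, for illustration only):

```r
# Hypothetical long-format table: one row per urine sample.
# Column names and values are illustrative assumptions, not the real data.
samples <- data.frame(
  indiv      = c("A", "A", "A", "A", "A", "B", "B"),
  assessment = c(1, 1, 1, 2, 2, 1, 1),
  acr        = c(0.8, 1.1, 0.9, 1.4, 1.2, 2.0, 2.6)
)

# Mean of the 1-3 consecutive readings at each annual assessment
per_assessment <- aggregate(acr ~ indiv + assessment, data = samples, FUN = mean)
per_assessment <- per_assessment[order(per_assessment$indiv,
                                       per_assessment$assessment), ]

# Number of assessments available for each individual
n_assessments <- table(per_assessment$indiv)
```

Here `per_assessment` has one row per individual per assessment, which is the shape a longitudinal model would need, and `n_assessments` shows how unbalanced the panel is.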

Auxiliary data

Auxiliary data that I have available includes age, sex, age at diagnosis, duration of diabetes, age at assessment, HbA1c, BMI, blood pressure, smoking status etc.
For HbA1c, Loredana said that the readings may not have been measured at the same time as the ACR; in practice they just use the HbA1c reading taken in the same year as the ACR reading.
An additional covariate to consider for the AdDIT cohort is the treatment group (AHT medication or not), since it was an intervention trial.

Our analysis

Each individual \(i\) (\(i=1,\ldots,2047\)) is a member of a cohort \(k\) (\(k=1,2,3\) for NFS, ORPS and AdDIT respectively) and has up to three consecutive, first voided, early morning urine samples collected for centralised measurement of the albumin:creatinine ratio (ACR) at each of \(l_i\) annual assessments.
Since the number of annual assessments varies for each individual, we essentially have unbalanced panel data for individuals nested within 3 separate cohorts.
The highest level is the cohort to which each sample belongs; the next level down is the longitudinal aspect, i.e. the year in which the measurement was taken.
I’ve had a quick look at the lmer function (from the lme4 R package):

# https://www.youtube.com/watch?v=QCqF-2E86r0

library(lme4)

# Assume `dat` is a long-format data frame with one row per individual per assessment,
# where ACR is the mean of the (up to 3) consecutive ACR readings at that assessment.
# For now, just say that we have one predictor, x.

# random intercept for each cohort (fixed slope)
lmer(log10(ACR) ~ x + (1 | cohort), data = dat)

# random intercept and slope for each cohort
lmer(log10(ACR) ~ x + (1 + x | cohort), data = dat)

# random intercept and slope for each cohort and for each individual within a cohort
# (i.e. capturing the longitudinal aspect)
lmer(log10(ACR) ~ x + (x | cohort) + (x | cohort:indiv), data = dat)
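
Written out, the last of these models would correspond roughly to the following, where \(j=1,\ldots,l_i\) indexes the assessments of individual \(i\) and \(k(i)\) is that individual's cohort. The symbols \(a\), \(b\), \(c\), \(d\), \(\Sigma\) and \(\sigma^2\) are notation introduced here for illustration, and the normality assumptions are lmer's defaults rather than anything checked against our data:

```latex
\[
\log_{10}(\mathrm{ACR}_{ij}) = (\beta_0 + a_{k(i)} + b_i) + (\beta_1 + c_{k(i)} + d_i)\, x_{ij} + \varepsilon_{ij},
\qquad j = 1, \ldots, l_i,
\]
\[
(a_k, c_k)^\top \sim N(0, \Sigma_{\text{cohort}}), \quad
(b_i, d_i)^\top \sim N(0, \Sigma_{\text{indiv}}), \quad
\varepsilon_{ij} \sim N(0, \sigma^2).
\]
```

With a single predictor this already has nine parameters: two fixed effects, three (co)variances in each of \(\Sigma_{\text{cohort}}\) and \(\Sigma_{\text{indiv}}\), and the residual variance. Note that \(\Sigma_{\text{cohort}}\) would be estimated from only three cohorts.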

I could then use some variable selection procedure to decide which predictors to include in the model.
But do we have enough data to estimate all these parameters? Most samples (e.g. those in NFS) will only have 1-2 measurements.
Also, is this really what we want to be doing? Our purpose isn’t prediction. If we follow the method of Dunger et al. 2006, we would use our model to extract residuals that show how “extreme” the ACR measurements are for each individual compared to the fitted model. Is that still meaningful when each individual has their own fitted line?

Remember that our aim is to develop some “score” for each individual that captures longitudinal ACR measurements and is adjusted for confounders.
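
One possible way to sketch such a score (an assumption for discussion, not a settled method) is to fit a mixed model with the confounders as fixed effects and a random intercept per individual, then take each individual's predicted random intercept as their covariate-adjusted longitudinal ACR level. The example below uses simulated data with hypothetical column names, and treats cohort as a fixed effect on the assumption that three cohorts are too few to estimate a cohort-level variance well:

```r
library(lme4)

set.seed(1)

# Simulated unbalanced panel (hypothetical data, not the real cohorts)
n_indiv <- 300
indiv <- data.frame(
  indiv  = factor(seq_len(n_indiv)),
  cohort = factor(sample(c("NFS", "ORPS", "AdDIT"), n_indiv, replace = TRUE)),
  sex    = factor(sample(c("F", "M"), n_indiv, replace = TRUE)),
  u      = rnorm(n_indiv, sd = 0.2)  # true individual-level deviation
)

# The number of annual assessments varies by cohort
max_visits <- c(NFS = 2, ORPS = 10, AdDIT = 4)
n_visits <- sapply(as.character(indiv$cohort),
                   function(k) sample(max_visits[[k]], 1))

# Expand to one row per individual per assessment
long <- indiv[rep(seq_len(n_indiv), times = n_visits), ]
long$duration <- ave(rep(1, nrow(long)), long$indiv, FUN = cumsum)
long$log_acr <- 0.02 * long$duration + 0.05 * (long$sex == "M") +
  long$u + rnorm(nrow(long), sd = 0.15)

# Confounders as fixed effects, random intercept for each individual
fit <- lmer(log_acr ~ duration + sex + cohort + (1 | indiv), data = long)

# Predicted random intercepts: one adjusted "score" per individual
score <- ranef(fit)$indiv[, "(Intercept)"]
names(score) <- rownames(ranef(fit)$indiv)
```

Individuals with only 1-2 assessments have their score shrunk towards zero, so the score is less informative for them than for individuals with many assessments.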

Queries and comments

If needed, we have audit data for ACR during adulthood for (some of) the samples.
Loredana’s long-term interest is to develop some predictive model for DN. Our work may help to identify any genetic risk factors to include in the model.
Would we still need to include covariates such as sex in the GWAS if we’ve already included them in the phenotype definition? Would this be doubly adjusting for confounders?

Anna Hutchinson
