umx!

Instrumental Variables

This is not finished: e-mail me to prioritise this page if you want it sooner.

IV Instrumental variable analyses are widely used in fields as diverse as [economics], and [genetic epidemiology]. This page shows how to implement these analyses in umx and OpenMx.

IV analyses allow the estimation of causal relationships when confounding is likely but controlled experiments are not feasible. This ubiquitous situation has lead to many non-replicable findings driven by unmeasured or excluded confounders (Ioannidis, 2012).

Motivation: linear regression can be misleading

Consider some explanatory equation e.g. Y ~ 𝛽×X + ε. As a scientific construct, this is interpreted as the causal claim that X plays a causal role in Y, revealed in these data. Some scientists prevaricate about the word cause, but if the relation is not causal, the equation is ambiguous at best, at worst terribly misleading. We seek conditions that allow us to validly interpret significant estimates of 𝛽 as support for a causal role of X If, however covariates are correlated with the error terms, ordinary least squares regression produces biased and inconsistent estimates.[2]

Such correlation with error occurs when:

  1. The DV (Y) causes one or more of the covariates (“reverse” causation).
  2. One or more explanatory variables is unmeasured.
  3. Covariates are subject to measurement error.

Of course one or more of these are almost inevitable, making regression results suspect and often misleading.

If an instrument is available, these problems can be overcome. An instrumental variable is a variable that does not suffer from the problems of the confounded predictor. It must:

  1. Not be in the explanatory equation.
  2. Controlling for (“conditional on”) any other covariates, it must correlate with the endogenous explanatory variable(s).

In linear models, there are two main requirements for using an IV:

  1. The instrument must be correlated with the endogenous explanatory variables, conditional on the other covariates.
  2. The instrument cannot be correlated with the error term in the explanatory equation (conditional on the other covariates)

In this example, we examine the causal influence of X on Y, using an instrumental variable (qtl) which affects only X, based on Professor David Evans’ presentation at the 2016 International twin workshop,

The next block of code simply sets up a simulated dataset containing X, Y, qtl (a SNP affecting X), and U - a covariate which induces a correlation between X and Y and which, if not measured, confounds the association of X and Y, allowing an errant researcher to assert evidence that X affects Y.

Our first analysis one is a simple linear model (or ordinary least squares regression). This will reveal a large association of X with Y and is the kind of analysis that has been criticised in false-positive epidemiology {GDS reference}.

library(umx)
df = umx_make_MR_data(df, nSubjects = 10000)

m1 = lm(Y ~ X    , data = df); coef(m1) # "appears" that Y is caused by X:  𝛽= .35
m1 = lm(Y ~ X + U, data = df); coef(m1) # Controlling U reveals the true link: 𝛽= 0.1

Mendelian randomization analysis

Next, we analyse a Mendelian randomization trial.

A conventional implementation involves two-stage least squares. In R, we can do this using John Fox’s tsls function from the from sem library

m1 = sem::tsls(formula = Y ~ X, instruments = ~ qtl, data = df); coef(m1)
#                 Estimate  Std. Error   t value     Pr(>|t|)
# (Intercept) 0.0009797078 0.003053891 0.3208064 7.483577e-01
# X           0.1013835358 0.021147133 4.7941976 1.635616e-06

Now we can see that X may indeed affect Y, but with the simulated effect size now correctly estimated at .1 (as specified in the simulated data) rather than the confounded .35 reported from the simple lm previously presented.

Implementing MR in OpenMx

manifests <- c("qtl", "X", "Y")
latents   <- c("e1", "e2")

IVModel <- mxModel("IV Model", type="RAM",
	manifestVars = manifests,
	latentVars = latents,
	mxPath(from = c("qtl"), arrows=2, free=TRUE, values=1, labels=c("qtl") ),  #Variance of SNP 
	mxPath(from = "e1", to = "X", arrows = 1, free = FALSE, values = 1, labels = "e1"), # Residual error X variable. Value set to 1.
	mxPath(from = "e2", to = "Y", arrows = 1, free = FALSE, values = 1, labels = "e2"), # Residual error Y variable. Value set to 1.
	mxPath(from = latents, arrows = 2, free = TRUE, values = 1, labels = c("var_e1", "var_e2") ), # Variance of residual errors
	mxPath(from = "e1", to = "e2", arrows = 2, free = TRUE, values = 0.2, labels = "phi" ), # Correlation between residual errors
	mxPath(from = "qtl",  to = "X", arrows = 1, free = TRUE, values = 1, labels = "b_zx"), # SNP effect on X variable
	mxPath(from = "X",  to = "Y", arrows = 1, free = TRUE, values = 0, labels = "b_xy"), # Causal effect of X on Y
	# means and intercepts
	mxPath(from = "one", to = c("qtl", "X", "Y"), arrows = 1, free = TRUE, values =1, labels = c("meansnp", "alpha0", "alpha1") ),
	mxData(df, type="raw")
)

IVModel <- mxRun(IVModel); umx_time(IVModel) # IV Model: 14.34 seconds for 100,000 subjects
coef(IVModel)
plot(IVModel, std = FALSE, showFixed = TRUE, showMeans = FALSE, digits = 3)
# https://www.dropbox.com/s/ljqrhpuskbihda3/IV_Model.png?dl=0

Now an exactly equivalent model in umxRAM


m2 <- umxRAM("myMR", data = df, autoRun = F,
	umxPath(v.m. = c("qtl", "X", "Y")),
	umxPath("qtl", to = "X"),
	umxPath("X", to = "Y")
	# umxPath("X", with = "Y") # Due to OpenMx rules, will be deleted!
)
m3 <- umxModify(m2, "X_with_Y", free = T, value = .2)
plot(m3, std = F, digits = 3)
# https://www.dropbox.com/s/1sg32yglfuzgwoz/myMR.png?dl=0

SB <- umxRAM("SB_IV", data = mxData(df, type="raw"),
	umxPath("ex", to = "X", fixedAt = 1, labels = "ex"),
	umxPath("ey", to = "Y", fixedAt = 1, labels = "ey"),
	umxPath("Tx", to = "X", fixedAt = 1, labels = "Tx"),
	umxPath("Ty", to = "Y", fixedAt = 1, labels = "Ty"),
	umxPath("Tz", to = "qtl", fixedAt = 1, labels = "Tz"),
	umxPath("ex", with = "ey", values = 0.2, labels = "phi"),
	umxPath("Tz",  to = "Tx", labels = "b_zx"),
	umxPath("Tx",  to = "Ty", labels = "b_xy"),
	umxPath(var = c("Tx", "Ty", "Tz", "ex", "ey"), values = 1),
	umxPath(means = c("qtl", "X", "Y"))
)
 
plot(SB, std = F, showFixed = T, digits = 3)
umxCompare(SB, m3)

Ioannidis, J. P. (2012). Why Science Is Not Necessarily Self-Correcting. Perspect Psychol Sci, 7, 645-654. doi:10.1177/1745691612464056