Simulation

Often, having simulated data can help explore an idea, or test an idea.

This post covers:

Using mxGenerateData to create data by specifying the path model that fits it.
Making twin data using umx_make_TwinData
Making data for Mendelian Randomization using umx_make_MR_data

1. Making data by specifying the model it comes from

You can use the mxGenerateData function to build a model, setting parameters to their desired values, and then generate data from that model! (no data required!)

Make a target model:

vars = c("a", "b", "c")
m1 = umxRAM("simData", data = vars,
	umxPath(v1m0 = vars, values= 0),
	umxPath("a", with= "b", values= .4),
	umxPath("a", with= "c", values= .3),
	umxPath("b", with= "c", values= .4)
)

Or in lavaan syntax:

m1 = umxRAM("#simData
a ~~ 1*a
b ~~ 1*b
c ~~ 1*c
a ~~ .4*b
a ~~ .3*c
b ~~ .4*c
a ~ 0*1
")

If you just want a data.frame back:

simData = mxGenerateData(m1, nrows = s1000)
umxAPA(m2) # have a look at it.

	a	b	c
a	1 (0)
b	0.27 (0.03)	1 (0)
c	0.26 (0.03)	0.41 (0.03)	1 (0)
Mean (SD)	0.02 (1)	-0.03 (1)	-0.03 (0.99)

You can implant that dataframe into a new model

m1$expectation$data =mxData(simData)

Or just ask mxGenerateData to do that for you:

m2 = mxGenerateData(m1, nrows = 1000, returnModel = TRUE)

The help on ?mxGenerateData is nice also.

2. Making Twin Data

umx offers umx_make_TwinData

This is a great way to create data for twin models, where you want

An MZ dataset and a DZ dataset
You know the Mzr and DZr or the A, C, and E values you want to simulate.

You can also add a moderator (dragging A across a range according to a moderator)

It’s this easy:

tmp = umx_make_TwinData(nMZpairs = 10000, AA = .30, CC = .00, EE = .70)
 AA  CC  EE 
0.3 0.0 0.7 
   a    c    e 
0.55 0.00 0.84 

The results come back as dataframe with a column for zygosity (“MZ” and “DZ” at present).

How to consume the built datasets

mzData = tmp[tmp$zygosity == "MZ", ]
dzData = tmp[tmp$zygosity == "DZ", ]
str(mzData); str(dzData); 
cov(mzData[,c("var_T1","var_T2")])
cov(dzData[,c("var_T1","var_T2")])
umxAPA(mzData[,c("var_T1","var_T2")])
    

	var_T1	var_T2
var_T1	1
var_T2	0.31	1
Mean (SD)	0.01 (0.99)	0 (1)

Prefer to work in path coefficient values? (little a?)

tmp = umx_make_TwinData(200, AA = .6^2, CC = .2^2)

If omitted, nDZpairs defaults to MZ numbers. But you can control both.

Variance doesn’t need to sum to 1:

tmp = umx_make_TwinData(100, AA = 3, CC = 2, EE = 3, sum2one = FALSE) 
mzData = tmp[tmp$zygosity == "MZ", ]
cov(mzData[,c("var_T1","var_T2")])

Moderator Example

For our meta-analysis of gene-SES interaction with IQ heritability, we found an a` value of .06.

This makes a data set that corresponds to this (with some assumed values for mean A, and for the values of C and E).

x = umx_make_TwinData(100, AA = c(avg = .4, min = .2, max = .65), CC = .2, EE = .4)
str(x)

You can also make Thresholded data, just use MZr and DZr. Finally, the function can create data for Bivariate GxSES (see ?umxGxEbiv)

3. Making Mendelian Randomization Data

umx_make_MR_data allows you to simulate data based on Mendelian Randomization.

You get back a 4-variable data set:

The outcome variable of interest (Y)
The putative causal variable (X)
A qtl (quantitative trait locus) influencing X
A confounding variable (U) affecting both X and Y.

Here’s a simple example:

df = umx_make_MR_data(nSubjects = 1000, Vqtl = 0.02, bXY = 0.1, bUX = 0.5, bUY = 0.5, pQTL = 0.5, seed = 123)

Which looks like this:

umxAPA(df)

	X	Y	U	qtl
X	1 (0)
Y	0.39 (0.03)	1 (0)
U	0.54 (0.02)	0.54 (0.02)	1 (0)
qtl	0.18 (0.03)	0.04 (0.03)	0.02 (0.03)	1 (0)
Mean (SD)	0 (1.02)	0.01 (1)	0.01 (1)	1 (0.7)

umx_print(head(df))

X	Y	U	qtl
-1.0023978	-0.9650462	-0.6018928	1
-0.9593700	-0.1157263	-0.9936986	0
-0.2573603	-0.0975572	1.0267851	1
0.7112984	0.0031000	0.7510613	0
0.0026485	-0.1110663	-1.5091665	0
1.7699183	-0.2656627	-0.0951475	1