vignettes/getting-started.Rmd
getting-started.RmdNow, we simulate some data from a linear latent growth curve model of
educational attainment over time (that is, a random intercept and random
slope over time). The dataset will be called growth_data.
The dataset contains five observations for each individual
(X1 to X5) and two predictors representing
children’s SES (low/high) and the
preschool quality.
set.seed(23)
N <- 1000
M <- 5
icept <- rnorm(N, 10, sd = 4)
slope <- rnorm(N, 3, sd = 1.2)
ses <- factor(sample(c("low", "high"), size = N, replace = TRUE))
preschool_quality <- sample(c(1:5), N, TRUE)
loadings <- 0:4
x <-
(slope + as.numeric(ses) * 5) %*% t(loadings) +
matrix(rep(icept+preschool_quality+as.numeric(ses), each = M), byrow = TRUE, ncol = M) +
rnorm(N * M, sd = .08)
growth_data <- data.frame(x, factor(ses), ordered(preschool_quality))
names(growth_data) <- c(paste0("X", 1:M), "SES","Preschool Quality")Now, we specify a linear latent growth curve model using OpenMx’s path specification. The model has five observed variables. Residual variances are assumed to be identical over time.
manifests <- names(growth_data)[1:5]
growthCurveModel <- mxModel("Linear Growth Curve Model Path Specification",
type="RAM",
manifestVars=manifests,
latentVars=c("intercept","slope"),
mxData(growth_data, type="raw"),
# residual variances
mxPath(
from=manifests,
arrows=2,
free=TRUE,
values = c(.1, .1, .1, .1, .1),
labels=c("residual","residual","residual","residual","residual")
),
# latent variances and covariance
mxPath(
from=c("intercept","slope"),
arrows=2,
connect="unique.pairs",
free=TRUE,
values=c(2, 0, 1),
labels=c("vari", "cov", "vars")
),
# intercept loadings
mxPath(
from="intercept",
to=manifests,
arrows=1,
free=FALSE,
values=c(1, 1, 1, 1, 1)
),
# slope loadings
mxPath(
from="slope",
to=manifests,
arrows=1,
free=FALSE,
values=c(0, 1, 2, 3, 4)
),
# manifest means
mxPath(
from="one",
to=manifests,
arrows=1,
free=FALSE,
values=c(0, 0, 0, 0, 0)
),
# latent means
mxPath(
from="one",
to=c("intercept", "slope"),
arrows=1,
free=TRUE,
values=c(1, 1),
labels=c("meani", "means")
)
) # close model
# fit the model to the entire dataset
growthCurveModel <- mxRun(growthCurveModel)
#> Running Linear Growth Curve Model Path Specification with 6 parametersNow, we grow a SEM tree using the semtree function,
which takes the model and the dataset as input. If not specified
otherwise, SEM tree will assume that all variables in the dataset, which
are not observed variables in the dataset are potential predictors.
tree <- semtree(model = growthCurveModel,
data = growth_data)We can also obtain the parameter estimates for each leaf (terminal node) of the tree. Here we use knitr to pretty print the table:
| Preschool Quality | SES | residual | vari | cov | vars | meani | means |
|---|---|---|---|---|---|---|---|
| <= 3 & <= 2 | not ( high ) | 0.006 | 14.792 | 0.478 | 1.538 | 13.715 | 12.967 |
| <= 3 & > 2 | not ( high ) | 0.007 | 14.384 | 0.088 | 1.098 | 14.940 | 13.052 |
| > 3 | not ( high ) | 0.006 | 17.054 | -0.052 | 1.439 | 17.115 | 12.980 |
| <= 4 & <= 3 | high | 0.006 | 17.533 | -0.003 | 1.417 | 12.871 | 8.011 |
| <= 4 & > 3 | high | 0.006 | 18.046 | -0.743 | 1.388 | 14.718 | 7.831 |
| > 4 | high | 0.007 | 13.541 | 0.541 | 1.438 | 16.438 | 7.923 |
Large trees can be pruned back to display only the first few levels
of splits. The argument max.depth determines the maximal
depth of the resulting tree by counting the maximal number of decision
nodes until a leaf is reached. For example, max.depth=1
results in trees that only show one or zero decision nodes.

There are various (hyper)parameters that govern the tree growing
process. These hyperparameters can be adjusted by a
semtree.control object that can be passed to the
semtree() function. For example, the following control
object specifies that:
tree_control = semtree.control(alpha = 0.01,
min.N = 25,
max.depth = 5,
bonferroni = TRUE)Here is a tree that is grown using the adjusted hyper-parameters in
tree_control.
tree <- semtree(model = growthCurveModel,
data = growth_data,
control = tree_control)
#> Beginning initial fit attemptFit attempt 0, fit=7745.53852713508, new current best! (was 7745.53852713964) Beginning initial fit attemptFit attempt 0, fit=2836.90806227974, new current best! (was 3687.59332338577) Beginning initial fit attemptFit attempt 0, fit=1713.80837616561, new current best! (was 1739.87876535343) Beginning initial fit attemptFit attempt 0, fit=1061.98006591109, new current best! (was 1097.02929672133) Beginning initial fit attemptFit attempt 0, fit=3206.09542712537, new current best! (was 4057.94520393808) Beginning initial fit attemptFit attempt 0, fit=2454.87187575858, new current best! (was 2466.44438222973) Beginning initial fit attemptFit attempt 0, fit=691.794906946785, new current best! (was 739.651045009226) ✔ Tree construction finished [took 3s].
plot(tree)