This vignette describes how
Show 7 can be used to model interactions between latent and observed variables. The models described here can be considered extensions of the covariate measurement error model described in the , by allowing the latent variables to interact with observed variables.

Linear Mixed Model with Latent Covariates

For this example we use the simulated
8 dataset, of which the first six rows are displayed below:
Model Formulation

The response variable
9 contains both measurements of a latent variable and measurements of the response that we actually are interested in modeling, and the
0 variable distinguishes these responses. In this case we have complete observations for each subject ID, and for a given ID, the measurement model can be written as follows:

\[ \begin{pmatrix} y_{1} \\ y_{2} \\ y_{3} \end{pmatrix} = \boldsymbol{\beta}_{0} + \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & x \end{pmatrix} \begin{pmatrix} 1 \\ \lambda_{2} \\ \lambda_{3} \\ \lambda_{4} \end{pmatrix} \eta + \begin{pmatrix} 0 \\ 0 \\ x \beta \end{pmatrix} + \boldsymbol{\epsilon}. \]

In this equation, \(\boldsymbol{\beta}_{0} \in \mathbb{R}^{3}\) is a vector of intercepts, \(\eta\) is a latent variable, the loading of the latent variable onto the first measurement \(y_{1}\) is fixed to 1 for identifiability, \(\lambda_{2}\) is the loading of the latent variable onto the second measurement \(y_{2}\), \(\lambda_{3}\) is the main effect of the latent variable on the response of interest \(y_{3}\), \(\beta\) is the effect of the observed covariate \(x\) on \(y_{3}\), and \(\lambda_{4}\) is the interaction effect of \(x\) and \(\eta\) on \(y_{3}\).

We assume that the residuals \(\boldsymbol{\epsilon}\) are independent and identically normally distributed. This assumption is valid in this simulated case, but since the response \(y_{3}\) is qualitatively different from the measurements \(y_{1}\) and \(y_{2}\), it will not hold in general. In practice, a heteroscedastic measurement model or a model with mixed response types should be used. For a more detailed explanation of this way of formulating latent variable models in matrix form we refer to the first four pages of Rockwood and Jeon ().

The structural model is simply \(\eta = \zeta \sim N(0, \psi)\), where \(\psi\) is the variance of the latent variable.

Model Without Interaction

It can be instructive to start by considering a model in which we fix \(\lambda_{4} = 0\). This type of model would be estimated with the following code:
In the data-generating simulation, the true values were \(\lambda_{1}=1\), \(\lambda_{2} = 1.3\) and \(\lambda_{3} = -0.3\). The first two are very well recovered, but the estimate of \(\lambda_{3}\) is too positive, which is likely due to our omitting the interaction \(\lambda_{4}\), whose true value was 0.2.
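The direction of this bias can be made plausible with a small calculation. Multiplying out the design matrix and the parameter vector in the measurement model gives a loading of \(\lambda_{3} + \lambda_{4} x\) on \(y_{3}\), so a model with a single constant loading has to stand in for a quantity that varies with \(x\). The sketch below uses plain Python; the function names and the assumption that \(x\) is uniform on [0, 1] are illustrative and are not taken from the simulation that generated the data.

```python
import random

# True values quoted in the text.
LAMBDA_2, LAMBDA_3, LAMBDA_4 = 1.3, -0.3, 0.2


def loading_vector(x):
    """Loadings of eta on (y1, y2, y3) for a given covariate value x.

    Computed as the product of the 3 x 4 design matrix and the parameter
    vector (1, lambda_2, lambda_3, lambda_4) from the measurement model.
    """
    design = [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, x],
    ]
    params = [1.0, LAMBDA_2, LAMBDA_3, LAMBDA_4]
    return [sum(d * p for d, p in zip(row, params)) for row in design]


def average_response_loading(n=100_000, seed=1):
    """Average loading on y3 when x ~ Uniform(0, 1) (an assumed distribution)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += loading_vector(rng.random())[2]
    return total / n


if __name__ == "__main__":
    print(loading_vector(0.0))  # loadings at x = 0
    print(loading_vector(1.0))  # loadings at x = 1
    print(round(average_response_loading(), 3))
```

Under the assumed distribution the average loading on \(y_{3}\) is about -0.2, that is, less negative than \(\lambda_{3} = -0.3\). A misspecified model need not recover exactly this average, but it indicates why the estimate is pulled in the positive direction whenever \(x\) is mostly positive.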
Linear Interaction Between Observed and Latent Covariates

The measurement model can be equivalently written as

\[ \begin{pmatrix} y_{1} \\ y_{2} \\ y_{3} \end{pmatrix} = \boldsymbol{\beta}_{0} + \begin{pmatrix} 1 \\ \lambda_{2} \\ \lambda_{3} + \lambda_{4} x \end{pmatrix} \eta + \begin{pmatrix} 0 \\ 0 \\ x \beta \end{pmatrix} + \boldsymbol{\epsilon}. \]

This way of writing shows more explicitly which factor loadings are connected with which observation. In order to fit this model with
7, we must provide formulas for the terms in the loading matrix \[ \begin{pmatrix} 1 \\ \lambda_{2} \\ \lambda_{3} + \lambda_{4} x \end{pmatrix}. \] We specify the factor interactions with a list of lists. The reason for this notation is that we need one list to hold the regression terms for each loading variable specified in
2.
This specifies that for the first two rows, there are no covariates, but for the third row, we want a linear regression with \(x\) as covariate. Next, we specify the loading matrix without the interaction parameter, i.e., we reuse the
3 object that was specified for
4 above. This lets us fit the model as follows:
A model comparison shows overwhelming evidence in favor of this model, which is not surprising since this is how the data were simulated.
The summary also shows that the bias in \(\lambda_{3}\) has largely disappeared: the estimate has moved from -0.195 to -0.318, with the true value being -0.3. The interaction is estimated at 0.233, which is also close to the true value of 0.2. It should be noted that the noise level in this simulated dataset was set unrealistically low, to let us confirm that the implementation itself is correct.
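The size of the improvement can be checked with a few lines of arithmetic. The sketch below uses only the estimates and true values quoted in this section; the dictionary layout and the function name `bias` are illustrative choices.

```python
# True values and estimates as quoted in the text.
TRUE_VALUES = {"lambda_3": -0.3, "lambda_4": 0.2}
ESTIMATES = {
    "without interaction": {"lambda_3": -0.195},
    "with interaction": {"lambda_3": -0.318, "lambda_4": 0.233},
}


def bias(estimates, truth):
    """Return estimate minus true value for every reported parameter."""
    return {
        model: {name: round(value - truth[name], 3) for name, value in params.items()}
        for model, params in estimates.items()
    }


if __name__ == "__main__":
    for model, params in bias(ESTIMATES, TRUE_VALUES).items():
        print(model, params)
```

The bias in \(\lambda_{3}\) is 0.105 without the interaction and -0.018 with it, while the interaction estimate is off by 0.033.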
Interaction Between Latent Covariate and a Quadratic Term

We can also try to add an interaction between \(x^{2}\) and \(\eta\). We first update the formula in
5:
Then we fit the model as before:
As can be seen, the coefficient for this squared interaction is not significantly different from zero.
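One common way to judge whether a coefficient differs from zero is a Wald test, which compares the estimate to its standard error. The sketch below shows the calculation in plain Python. The numbers passed to it are hypothetical placeholders, not values from the model summary, and the function name `wald_test` is an illustrative choice.

```python
import math


def wald_test(estimate, std_error):
    """Two-sided Wald z-test of the null hypothesis that a coefficient is zero.

    Returns the z-statistic and the p-value under a standard normal
    reference distribution.
    """
    z = estimate / std_error
    # P(|Z| > |z|) for Z ~ N(0, 1), written with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical estimate and standard error, for illustration only.
    z, p = wald_test(estimate=0.05, std_error=0.10)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

An estimate that is small relative to its standard error gives a z-statistic near zero and a large p-value, which is the pattern described above for the squared interaction.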
Models with Additional Random Effects

It is also straightforward to include additional random effects in models containing interactions between latent and observed covariates. The dataset
6 is similar to
8 that was used above, but it has six repeated measurements of the response for each subject. The first ten rows of the dataset are shown below.
For these data we add a random intercept for the response terms, in addition to the terms that were used above. We start by resetting the interaction models to a linear term:
Next we fit the model using
7. The difference to notice here is that we added
9 to the formula. This implies that for observations that are responses, for which
0, there should be a random intercept per subject.
From the summary, we see that the factor loadings are very well recovered in this case as well.
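For reference, the response part of the model in this section can be written out explicitly. The notation is an assumption that extends the earlier measurement model: \(j = 1, \dots, 6\) indexes the repeated measurements of the response, \(\beta_{0,3}\) is the third element of \(\boldsymbol{\beta}_{0}\), \(b\) is the subject-specific random intercept with variance \(\sigma_{b}^{2}\), and \(x_{j}\) allows the covariate to vary between repetitions (the index can be dropped if it does not vary in the data).

\[ y_{3j} = \beta_{0,3} + (\lambda_{3} + \lambda_{4} x_{j}) \eta + x_{j} \beta + b + \epsilon_{3j}, \qquad b \sim N(0, \sigma_{b}^{2}), \quad j = 1, \dots, 6. \]

The equations for the measurements \(y_{1}\) and \(y_{2}\) are unchanged, since the random intercept applies only to the response rows.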
Model with Smooth Terms

We can also include smooth terms in models containing interactions between latent and observed variables. In this example, we replace the linear term
1 with a smooth term
2. Since this smooth term also includes the main effect of
3, which corresponds to an intercept for the response observations, we must remove the
0 term and instead insert two dummy variables, one for each measurement. We first create these dummy variables:
We then fit the model:
The summary output again suggests that the factor loadings are very well recovered.
We can also plot the smooth term, which is linear. That is, in this case the smooth term was not necessary. Note that we can also see this from the zero variance estimate of the random effect named
5 in the summary above, which means that the smoothing parameter for this term is infinite, and hence that the smooth term is exactly linear.
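The link between a zero variance estimate and an infinite smoothing parameter follows from the standard mixed-model representation of a penalized smooth. As a sketch, with notation that is assumed here rather than taken from the summary output, write the smooth as an unpenalized part with coefficients \(\boldsymbol{\gamma}\) plus a penalized part whose coefficients \(\mathbf{b}\) are treated as random effects, and let \(\tau\) denote the smoothing parameter:

\[ f = \mathbf{X}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{b}, \qquad \mathbf{b} \sim N(\mathbf{0}, \sigma_{b}^{2}\mathbf{I}), \qquad \tau = \frac{\sigma_{\epsilon}^{2}}{\sigma_{b}^{2}}. \]

As \(\sigma_{b}^{2} \to 0\), we get \(\tau \to \infty\) and \(\mathbf{b}\) is shrunk to zero, so \(f\) reduces to its unpenalized part \(\mathbf{X}\boldsymbol{\gamma}\). For the commonly used second-derivative penalty, that unpenalized part is a straight line.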