Structural Equation Models - The Effective Statistician

In this episode, Christian Geiser and I embark on a journey to uncover the complexities and applications of SEMs, a statistical method that often flies under the radar at typical biostatistics conferences yet holds significant potential in research, particularly within psychology and social sciences.

How do SEMs differ from traditional statistical methods?
What makes them so valuable in dealing with measurement error and complex variable relationships?

Join us as we explore these questions and more, shedding light on a tool that is both powerful and underutilized.

We also discuss the following key points:

Unpacking Structural Equation Models
Applications and Software for SEMs
Learning Resources and Personal Insights

This episode shines a light on the often overlooked yet incredibly potent tool of structural equation modeling. Through Christian’s expertise and practical examples, you’ll receive a thorough introduction to SEMs, from their theoretical foundations to their diverse applications in research. Whether you’re an experienced researcher or new to the field, this episode delivers valuable insights on how SEMs can enhance your analytical skills and refine the accuracy of your research findings.

So share this link with your friends and colleagues!

Christian Geiser, PhD

Christian is a former professor of quantitative psychology. He currently works as an instructor and statistical consultant with QuantFish, LLC. His areas of expertise are in structural equation modeling, measurement, longitudinal data analysis, latent class modeling, and multitrait-multimethod analysis. For more information, visit https://christiangeiser.com/ and https://www.goquantfish.com/.

Transcript

Structural Equation Models

[00:00:00] Alexander: Welcome to another episode of The Effective Statistician. And today I’m talking with another German. However, he is actually sitting in the U.S. Hi, Christian. How are you doing?

[00:00:15] Christian: Hi, Alexander. Thanks for having me.

[00:00:17] Alexander: Very good. So yeah, we met via your wife, who’s also a quantitative scientist. And Christian had an interesting career move from Germany to the US, through academia, into now running his own company. Tell us a little bit about how that went.

[00:00:42] Christian: Yeah, so I went to the United States after my PhD, which I got from Free University Berlin in Germany in 2008. And so in late 2009, I moved to Arizona and joined Arizona State University’s quantitative psychology program for two years. But then my wife, [00:01:00] who I met in Arizona, got a job offer

[00:01:02] for a tenure-track position at Utah State University, and then they were so kind as to offer me a tenure-track position as well. And so we moved as a couple to Utah in 2011 and were there for a while. And then after a while, we decided that we wanted to start our own company, be a little bit more independent, and, I think, increase our reach with our teaching. And so we founded QuantFish, which is a statistics training company where we offer online, on-demand workshops on advanced statistical methods. And so that’s what we’re doing right now. And now we’re in West Virginia, here in the Appalachian Mountains.

[00:01:44] Alexander: Awesome, and today we will talk about a topic that I rarely see being discussed at the typical biostats conferences, and that is [00:02:00] structural equation models. What is that actually?

[00:02:05] Christian: Yeah, it’s actually very interesting that this is something that we as psychologists are so focused on, almost obsessed with, structural equation models, and all other social scientists as well, like sociologists, for example. But then other statisticians might look at this and might say, well, what is this?

[00:02:23] This is weird, these latent variable structural equation models. How can a variable be latent? That is strange, and maybe they don’t like it. So I’m really excited that you’re giving me the opportunity here to make a case for structural equation modeling. And then maybe people in biostatistics,

[00:02:41] your audience, hopefully will see that it’s not voodoo science, that it’s not weird. It has a solid mathematical foundation and it’s really useful. So what is structural equation modeling about? You could think of it as the more general model that encompasses regression analysis, multiple regression, [00:03:00] as a special case.

[00:03:01] So it’s a lot more general. We can have more dependent variables, we can have multiple independent variables, and we can also have variables that are so-called mediator variables. So we can have variables that are at the same time independent variables and dependent variables or outcome variables, so that we can also look at

[00:03:21] indirect effects. So it’s a multivariate statistical technique that allows us to analyze complex relationships between multiple variables simultaneously. And moreover, the most general type of structural equation model also includes a measurement model, so to say, or a latent variable model, which allows us to account for measurement error in our scores.

[00:03:48] So in a structural equation model, we use more than one measure for each of the constructs or attributes that we want to measure, for example, let’s say anxiety, depression, [00:04:00] or something like that. We have more than one item or more than one scale for measuring each construct. And then that allows us to account for measurement error by

[00:04:09] specifying a factor model where we have multiple indicators that measure a latent variable. And so that latent variable then, so to say, sucks up the reliable variance that is shared across multiple items for the same construct and allows us to separate unique item variance and error variance from shared variance.

[00:04:30] And so then this latent variable contains only the so-called true score variance, meaning the reliable variance, and that allows us then to look at the structural relationships between the different variables in a more precise way, because those latent variables don’t contain measurement error. And so that is probably the main advantage

[00:04:52] of structural equation models: they allow us to separate out measurement error, and then the path coefficients or regression [00:05:00] coefficients, and also the correlations that we might estimate between the variables in our structural model or causal model, would be less biased, because there’s no attenuating effect, no biasing effect of measurement error anymore.

[00:05:15] And that’s important for us because in psychology, our measurements are just not so reliable. When you are in physics, then you have maybe measurements that are so reliable, so precise that you don’t need this. But many of our measurements aren’t so precise. When we, for example, assess depression or anxiety or intelligence or personality traits, then those are always measured with some amount of error.

[00:05:39] And so then that can lead to distortion of the estimation of path coefficients, regression coefficients, correlations. And so, in order to account for that, we use latent variables in structural equation models. Also, this framework is...

[00:05:53] Alexander: That is a very, very important point. Yeah. So you see, I think that is very often [00:06:00] not, you know, seen within biostatistics.

[00:06:03] That is a very, you know, typical thing in epidemiology. Yeah. Let’s say you look for the effect of, I don’t know, some exposure on some kind of disease. Then if your measurement of the exposure has a lot of variability, a lot of error, that will always decrease the regression coefficient.

[00:06:32] And of course you would always underestimate how large the effect truly is. And so this is a very, very interesting concept: we can much better account for the measurement error, especially in patient-reported outcomes, but also physician-assessed things, everything [00:07:00] that typically contains a lot of measurement error.
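As a rough illustration of the attenuation described here, the following Python sketch simulates a single exposure measured with random error. All names and numbers (a true slope of 0.5, an error standard deviation of 1.0, the variable names) are assumptions made up for the illustration and do not come from the episode. In this simple one-predictor case, the observed slope shrinks toward the true slope multiplied by the reliability of the exposure measure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_slope = 0.5  # assumed true effect of the exposure on the outcome

# Error-free exposure and an outcome that depends on it
exposure_true = rng.normal(0.0, 1.0, n)
outcome = true_slope * exposure_true + rng.normal(0.0, 1.0, n)

# Observed exposure = true exposure + random measurement error (assumed SD)
error_sd = 1.0
exposure_obs = exposure_true + rng.normal(0.0, error_sd, n)

# Reliability = true variance / observed variance = 1 / (1 + 1) = 0.5
reliability = 1.0 / (1.0 + error_sd**2)


def ols_slope(x, y):
    """Slope of the simple linear regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


slope_true = ols_slope(exposure_true, outcome)
slope_obs = ols_slope(exposure_obs, outcome)

print(f"slope using error-free exposure: {slope_true:.3f}")
print(f"slope using noisy exposure:      {slope_obs:.3f}")
print(f"true slope times reliability:    {true_slope * reliability:.3f}")
```

With several predictors or mediators in the model, the direction of the bias is no longer this predictable, which is the point Christian makes next.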

[00:07:03] Christian: Right? And Alexander, it’s not just that the coefficients will be underestimated. When you have a complex model with direct and indirect effects, then it might have the opposite effect as well. And it’s difficult to predict. And so that is important for us because people like to estimate these complicated structural models where you have multiple dependent, independent and mediator variables.

[00:07:24] And then when you have measurement error, you cannot predict whether your effects will be under- or overestimated, even, and it could create a whole mess when you don’t account for unreliability in the measures. And so that is the key advantage.

[00:07:40] Alexander: There is also an interesting thing you mentioned before we started: in psychology, you would have, for example, certain interventions that would, let’s say, trigger a certain behavior or help with a certain behavior. And then, [00:08:00] via this behavior, you also get, let’s say, improved anxiety or less depression or fewer other stress factors or whatsoever. I think we have similar scenarios in medicine, and psychiatry, of course, is also medicine. If you think about it, you help to reduce pain, and then through the pain reduction you also have better physical functioning, or you have improved other health outcomes, or you have less caregiver burden or whatsoever. Yeah, there are all kinds of different follow-ups from that.

[00:08:46] Christian: That’s exactly right. And so those types of indirect effects, where your intervention might target an intermediate variable, not directly your outcome, can be assessed with path analysis, which is a special case [00:09:00] of structural equation modeling, where you look at the mediated or indirect effects.

[00:09:05] They can be estimated. Their confidence intervals can be estimated using bootstrap methods, for example, so that you get an asymmetric confidence interval because an indirect effect is a product of two regression coefficients or more. And so then you need special techniques for figuring out the statistical significance of that, and that’s all possible within the SEM framework.

[00:09:27] And that’s actually something that in my field, people do all the time, where they estimate indirect effects between variables. And you can also estimate nonlinear effects, for example. So you can have product terms, or you can have interaction effects, squared terms. So moderated structural equation modeling, all that is also possible.
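The bootstrap confidence interval for an indirect effect that Christian describes can be sketched in Python as follows. This is a minimal illustration with observed variables only (a plain path analysis fitted by least squares, not a latent variable SEM), and the data, the path values 0.5 and 0.4, the sample size, and the helper name `indirect_effect` are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
a_true, b_true = 0.5, 0.4  # assumed paths: treatment -> mediator -> outcome

# Simulated data: binary treatment, continuous mediator and outcome
treatment = rng.integers(0, 2, n).astype(float)
mediator = a_true * treatment + rng.normal(0.0, 1.0, n)
outcome = b_true * mediator + 0.1 * treatment + rng.normal(0.0, 1.0, n)


def indirect_effect(t, m, y):
    """Product a*b of the two regression coefficients on the indirect path."""
    # a path: slope from regressing the mediator on the treatment
    a = np.polyfit(t, m, 1)[0]
    # b path: coefficient of the mediator when regressing the outcome
    # on the mediator and the treatment (with an intercept)
    design = np.column_stack([np.ones_like(t), m, t])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return a * coef[1]


estimate = indirect_effect(treatment, mediator, outcome)

# Percentile bootstrap: resample persons with replacement and re-estimate
n_boot = 2000
boot = np.empty(n_boot)
for i in range(n_boot):
    idx = rng.integers(0, n, n)
    boot[i] = indirect_effect(treatment[idx], mediator[idx], outcome[idx])

ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect: {estimate:.3f}")
print(f"95% percentile bootstrap CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

Because the interval is read off the percentiles of the bootstrap distribution, it does not have to be symmetric around the estimate, which suits the skewed sampling distribution of a product of coefficients.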

[00:09:49] Alexander: Okay. Now you mentioned another term, and that is latent variables. So you mentioned a latent variable is a variable that I [00:10:00] can’t directly observe.

[00:10:03] And as an example, you said anxiety. So when I think about anxiety, I very often think about, well, we have scores for that. We have, you know, the Hamilton... no, it’s the Hospital Anxiety and Depression Scale. And we have all kinds of other scales for that. But you wouldn’t say they measure anxiety directly, but indirectly?

[00:10:32] Christian: Yeah, see, this is a really great question. And we could probably make a podcast just about that. And we could make one that lasts for more than a week, probably, because people have so many different ideas about what latent variables are and how latent they are versus not. My idea of latent variables is actually a lot less obscure, I would say.

[00:10:52] I do think that these scales directly measure anxiety. So the idea is that there’s a direct link between a factor or latent [00:11:00] variable and its indicators, which are, as we say, reflective indicators of that latent factor. So there’s a direct link, and it’s actually modeled through factor analysis, specifically confirmatory factor analysis.

[00:11:13] And that has its roots in test theory, classical test theory. So there’s a very solid psychometric foundation for what a factor means and how it is identified. A latent variable in classical test theory is defined as a conditional expectation of the observed variable. So, so to say hypothetically, over an infinite number of repeated trials, assuming that you could give people the same scale over and over and over again

[00:11:38] and record their scores without there being any memory effects, test repetition effects, fatigue, annoyance, or something like that. If you could do that and wipe out their brain in between, then, so to say, the true score would be the mean for that person across this infinite number of trials.

[00:11:56] And so that’s very non-latent. Yeah. Right. I mean, that’s pretty [00:12:00] concrete, so it’s very bound to the observed score. And of course, we can’t do that in practice. We can’t get at the true score of a person because we can’t expose people to an infinite number of trials. But what we can do is we can estimate the variance of a true score variable, and we can estimate the mean of a true score variable.

[00:12:18] We can estimate the covariances and correlations of different true score variables. And for all of that, there’s a solid foundation for how to identify it statistically and mathematically. So once you have two measures, for example, you can assume tau-equivalence. Let’s say you have two parallel anxiety scales, or you have scale halves.

[00:12:36] You split your scale into two parcels or test halves. Then you can assume that they might be tau-equivalent, which means they measure the same true score variable, the same factor, with equal weights. And then, so to say, based on that you can identify the variance of the true score, of the factor, as the covariance between the two test scores.

[00:12:57] So that’s not very latent. Right. But [00:13:00] other people have very different ideas about latent variables that I don’t share personally, because I like it to be concrete and well defined psychometrically and mathematically. But there are a lot of other ideas out there as well.
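The identification argument for two tau-equivalent test halves can be checked with a small simulation. In this hedged sketch, the true score variance of 4.0, the two error standard deviations, and all variable names are assumptions chosen for the example. Both halves share the same true score with equal weights and have independent errors, so their covariance should recover the true score variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
true_var = 4.0  # assumed variance of the true score (the factor)

# One true score per person, shared by both test halves with equal weight
true_score = rng.normal(0.0, np.sqrt(true_var), n)

# Tau-equivalent halves: same true score, independent errors whose
# variances are allowed to differ (assumed SDs of 1.0 and 1.5)
half_a = true_score + rng.normal(0.0, 1.0, n)
half_b = true_score + rng.normal(0.0, 1.5, n)

# With uncorrelated errors, Cov(half_a, half_b) equals Var(true score)
cov_ab = np.cov(half_a, half_b)[0, 1]

# Reliability of half A = true score variance / observed variance of A
reliability_a = cov_ab / np.var(half_a, ddof=1)

print(f"covariance of the two halves:    {cov_ab:.3f}")
print(f"assumed true score variance:     {true_var:.3f}")
print(f"estimated reliability of half A: {reliability_a:.3f}")
```

The true scores themselves are never used in the estimate; only the two observed halves are, which is the sense in which the latent variance is identified from observed data.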

[00:13:11] Alexander: Okay. I see. So if you, for example, have a questionnaire that has, let’s say, 20 items on it, you could have multiple latent variables behind that, and the different items could contribute to these different latent variables in different ways, right?

[00:13:38] Christian: That’s right. And so that’s what factor analysis is for. There are two different kinds: exploratory factor analysis and confirmatory factor analysis. Exploratory factor analysis would be applied if you had 20 items and you had no idea what they’re measuring. So you want to find out what factors are reflected in the covariances or correlations between those 20 items, and [00:14:00] you would subject the 20 items to an exploratory factor analysis, and

[00:14:04] the number of factors and the loading pattern would then be the result of the analysis; they would be empirically derived. And so you would find out: okay, is one factor sufficient? So is this a unidimensional scale, or do you need two factors, three factors, four factors? That you could find out with an EFA.

[00:14:20] Now, the more typical case is that you have a scale where you already have a theory as to how many factors that scale should measure. So, for example, a 20-item anxiety scale. You would then look at whether there’s a single factor, and you could fit a single-factor confirmatory factor model to it and see if that fits. And often it will not, because there’s often some multidimensionality involved.

[00:14:43] But yes, so that’s what factor analysis does. And so factor analysis is part of the structural equation modeling framework, where it, so to say, constitutes the measurement model or the measurement part. And then the structural part is where the latent variables are interconnected through either [00:15:00] regression equations or correlations, so covariances.

[00:15:05] Alexander: And then you could also include things like interventions, say two different groups that get two different treatments, and you can also measure the effect of the intervention on these latent variables.

[00:15:25] Christian: Absolutely. And in quasi-experimental designs, you could also include covariates to control for background things such as age, gender, socioeconomic status, whatever. You can include manifest variables; for example, age you would think is probably pretty reliably measured oftentimes.

[00:15:45] So you wouldn’t have multiple indicators for age. Typically, you would just assume that that variable is pretty reliable. Same for gender, probably, where you have only a single indicator, but that’s no problem. You don’t have to have a latent variable for each construct in your model. You can have [00:16:00] observed variables. Same with intervention and control.

[00:16:02] You would have a binary dummy variable, or multiple dummy variables, that represent intervention and control or placebo groups. And those could be entered into your model. And also your outcome variables could be continuous or binary, so it’s very flexible in terms of the scale level of the variables.

[00:16:22] Alexander: And you could also have something that actually has measurement error, like the baseline assessment of anxiety or depression, or, you know, the questionnaires that you have before the intervention as well.

[00:16:38] Christian: Right. And you would typically want to include that as a latent variable if you can. So you want to control for measurement error if you can. So you’d have multiple indicators for anxiety at pre-test, and then you would have the same latent variable again at post-test, with the same indicators typically. And so that can be modeled longitudinally. And in fact, [00:17:00] structural equation modeling has a big advantage, especially for modeling longitudinal data, because with longitudinal data, when we measure change across time, measurement error has an especially devastating effect, because in a change score,

[00:17:15] the unreliability from both the pre-test and the post-test goes into the change score. So if you compute an observed difference score variable or change score variable between measured variables, then this can be especially unreliable, which is a problem for some things, not for others.

[00:17:36] So for example, when you want to look at interindividual differences in change over time, meaning differences between people in how much they change over time, then it is a problem, because then a lot of the variance in the change score will just be error variance, and you don’t want that. And so with structural equation models you can model change as a latent variable, so you can construct [00:18:00]

[00:18:00] true score difference variables or latent change score variables, which means differences between latent variables. And then those are not affected by measurement error. And that has big advantages when you are interested in finding out, for example, why some people changed more than others. Yeah.

[00:18:16] Sometimes, for example, we see that interventions work for some people, so they see a change, and not for others. And then we want to know why. So what background variables or covariates might explain that there are differences in change over time between people? Like, for example, is that a gender thing? Is it that the intervention works only for females, not for males?

[00:18:37] Yeah. And so we can then test that as well by regressing the latent change score variable on gender or other variables.
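The point about unreliable observed change scores can be made concrete with a short simulation. The sketch below is only an illustration under assumed values (true change with mean 0.5 and standard deviation 0.5, measurement error with standard deviation 0.5 at both occasions); the variable names are hypothetical. It compares the reliability of the pre-test and post-test scores with the reliability of their observed difference.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# True pre-test scores and true individual change (assumed distributions)
pre_true = rng.normal(0.0, 1.0, n)
change_true = rng.normal(0.5, 0.5, n)
post_true = pre_true + change_true

# Observed scores contain independent measurement error at each occasion
error_sd = 0.5
pre_obs = pre_true + rng.normal(0.0, error_sd, n)
post_obs = post_true + rng.normal(0.0, error_sd, n)

# Observed change score: the errors of both occasions end up in it
change_obs = post_obs - pre_obs


def reliability(true, observed):
    """Share of observed variance that is true score variance."""
    return np.var(true, ddof=1) / np.var(observed, ddof=1)


rel_pre = reliability(pre_true, pre_obs)
rel_post = reliability(post_true, post_obs)
rel_change = reliability(change_true, change_obs)

print(f"reliability of pre-test score:  {rel_pre:.3f}")
print(f"reliability of post-test score: {rel_post:.3f}")
print(f"reliability of observed change: {rel_change:.3f}")
```

Under these assumptions, each occasion is measured with a reliability of about 0.8, yet only about a third of the variance in the observed change score is true change. A latent change score model targets the true change variable instead of this noisy difference.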

[00:18:45] Alexander: That’s awesome. Now, as you have shown, structural equation models are a very, very big toolbox with lots of different special cases. Like, we talked about exploratory and [00:19:00] confirmatory factor analysis.

[00:19:01] We talked about looking into changes from baseline, also with interventions, be it randomized studies or observational data. Now, if people want to actually do these kinds of things and implement them, they have a data set they would like to run these algorithms with. What can they use to implement that?

[00:19:28] Christian: Yeah, there are plenty of software options available, and a bunch of them are free and they work great. So for example, in the R statistical software environment, there is lavaan, L-A-V-A-A-N, which is an awesome, very easy to use R package that allows you to do pretty much everything that you can imagine with structural equation models.

[00:19:50] And there’s also OpenMx, which is also an R package for structural equation modeling, and there are a bunch of commercial options as well. So I personally like [00:20:00] Mplus. The Mplus software is very flexible. It integrates a lot of different types of latent variable methodology. So it can, for example, not

[00:20:09] only do structural equation modeling and factor analysis, but also latent class analysis and mixture modeling. So you can even combine those and have a factor mixture model or growth mixture model, for example. You can have latent profile analysis, latent transition analysis. You can do multilevel modeling in

[00:20:27] Mplus, and you can do multilevel structural equation modeling as well. And I like that. So in this package, everything is, so to say, integrated in one package, and the syntax is easy. So I like that, but it’s not free. And then there are also other software programs that are commercial, like AMOS, EQS, LISREL. Those are all options that can be used for structural equation modeling.

[00:20:50] Alexander: And we’ll put links to these different options into the show notes so that they’re easy to find. And now, where [00:21:00] can people learn more about these things from you?

[00:21:03] Christian: So I now have this workshop business, like I mentioned at the beginning that I do together with my wife.

[00:21:08] It’s called QuantFish, and on goquantfish.com you can see all our workshops. We have a bunch of free online workshops that you can try out. And on goquantfish.com, we offer a lot of different courses on structural equation modeling and other latent variable methodologies. We, for example, also offer courses on longitudinal modeling, growth curve modeling, latent state-trait modeling, all that kind of stuff.

[00:21:34] And you can also check out my YouTube channel, the QuantFish YouTube channel, where I provide weekly statistics tutorials. I have a weekly stats newsletter that people can sign up for, and I do offer personal consulting as well. So those things can be found on my website, christiangeiser.com.

[00:21:54] Alexander: And last but not least, of course, well, what should I say? Of course, [00:22:00] Christian is also an author. So there are also books that he has written. And so of course, these are also great resources to learn about these kinds of different things.

[00:22:11] Christian: That’s right. Thanks for mentioning that. How can you forget your books? I know it’s been a while, so you forget after you write a book.

[00:22:19] Alexander: Awesome. Thanks so much for this great deep dive into structural equation models: what they are, how we can apply them in all kinds of different settings, where they’re especially helpful. We talked a little bit about what these models can look like, what mediators are, what latent factors are, how we can enrich models with different, more complex designs, and also how we can implement them using open software like R, but also off-the-shelf software like Mplus. And it’s always great to have another person who is investing [00:23:00] in helping our field move forward and apply all these kinds of different things. Thanks so much for being on the show.

[00:23:08] Christian: You’re very welcome. Thank you so much for having me. It’s been a pleasure.