The best of causal inference

Interview with Miguel Hernán

If there is one topic that is really hot right now, it’s causal inference. I first came into contact with it about 20 years ago when analysing observational studies, at a time when nobody working on clinical trials would consider it. But now, after the introduction of the estimands framework, it is becoming part of every statistician’s toolkit. Today, we have one of the world’s leading experts in this field as a guest: Miguel Hernán. I’m talking with him about:

What is causal inference?
How can this help generate and analyze data to identify better strategies for the treatment and prevention of both infectious and noninfectious diseases?

In today’s episode, we dive deep into this interesting topic and talk specifically about the following:

  • How did you get interested in causal inference?
  • If we have an observational study with 2 arms and differences in baseline variables, what are the best ways to adjust for these?
  • What does this look like for multiple treatment arms?
  • In longer studies, we happen to see switches between different treatments. How do we compare treatments appropriately in these cases?


Miguel Hernán

Director of the CAUSALab at the Harvard T.H. Chan School of Public Health

Miguel conducts research to learn what works to improve human health. He is the Director of the CAUSALab at the Harvard T.H. Chan School of Public Health, where he and his collaborators design analyses of healthcare databases, epidemiologic studies, and randomized trials. As Kolokotrones Professor of Biostatistics and Epidemiology, he teaches causal inference methodology at the Harvard Chan School and clinical epidemiology at the Harvard-MIT Division of Health Sciences and Technology. His edX course “Causal Diagrams” and his book “Causal Inference: What If”, co-authored with James Robins, are freely available online and widely used for the training of researchers. Miguel is an elected Fellow of the American Association for the Advancement of Science and of the American Statistical Association, Editor Emeritus of Epidemiology, and past Associate Editor of Biometrics, American Journal of Epidemiology, and the Journal of the American Statistical Association.

Listen to this episode and share this with your friends and colleagues!

Do you want to boost your career as a statistician in the health sector? Our podcast helps you to achieve this by teaching you relevant knowledge about all the different aspects of becoming a more effective statistician.


Alexander: You’re listening to The Effective Statistician podcast, the weekly podcast with Alexander Schacht and Benjamin Piske, designed to help you reach your potential, lead great science and serve patients without becoming overwhelmed by work. Today, it’s a really scientific topic, causal inference, and I have probably the world’s most renowned researcher in this area here, Miguel Hernán. So stay tuned. This is a really, really great episode. I learned about Miguel Hernán something like 15 years ago, when he was publishing about marginal structural models and how you can use them to better understand causal inference for HIV treatment and things like that. At the time I found it fascinating how you can adjust not only for baseline covariates but also for time-varying covariates in observational research. So talking to him in person is really an honor for me. Stay tuned for this outstanding interview with Miguel Hernán. By the way, talking about outstanding scientific content, there’s a lot of really great stuff coming up in June in Gothenburg at the PSI conference. So if you want to have a lot of really, really great content packed into three days, then see you there in June, in Gothenburg in Sweden, where you will probably suffer from having too many choices. Usually when you go to a conference you see maybe one talk in the morning here and another talk in the afternoon there that you want to go to. At the PSI conference, you very often want to clone yourself so that you can be in different sessions at the same time, you know, like in Harry Potter. That is how great the content is. So see you there, learn more about this conference at 

Welcome to another episode of The Effective Statistician. Today, I’m really pleased to have Miguel Hernán with me. Hi, how are you doing? 

Miguel: Hi, how are you? I’m fine. Thank you for having me here. 

Alexander: Yeah, and it’s actually quite easy to do this interview as he’s currently not in Boston but in Madrid, and that is a much easier time zone for this interview. But before we get into the main topic of the interview, causal inference, maybe you can explain a little bit where you are coming from and how you got interested in causal inference. 

Miguel: Well, I’m a physician by training. When I was in medical school, something that fascinated me was learning how you have to treat a patient, and I always wondered how they knew that. When they told me that for this type of patient you have to give this particular treatment, my question was always: how do we know that? And that was really the beginning. It turns out to be a question that is very hard to answer, and I’m still trying to figure it out. 

Alexander: Okay, okay, and some of your earlier research was in HIV. Did you specifically work in this area? 

Miguel: Yes, I did. I have done a lot of work on HIV with my colleagues. It is an area that was, first, very important. I trained as a doctor at a time when the internal medicine wards of the hospitals were full of people with HIV. It was a problem that I witnessed from very, very close. And on top of that, it turns out that it is an area in which the treatments are given over time. So these are time-varying treatments, and not only do the treatments change over time but also, of course, the confounders change over time: time-varying confounders. So it was the ideal setting for a lot of the causal inference methods that I was working on with Jamie Robins in the mid-90s. So that became a very important part of my own research.

Alexander: Yeah. Yeah, and I think it’s also an area where more and more treatments were becoming available, and it was pretty much impossible to run head-to-head comparisons of all treatments against each other and to understand what works best in all these different scenarios. I think observational data were becoming available much faster than many of the clinical trials. Also, at that time network meta-analysis wasn’t really a thing. It came much later in the research. If at the time it had been possible to do network meta-analysis and more indirect comparisons, would you have gone in the same direction? 

Miguel: Who knows? That is a counterfactual question. There are some things that you can do with network meta-analysis and some things that you cannot do. Some of the questions that were asked were questions about the effect of treatment strategies that are sustained over time. And that means you need to adjust for non-adherence to that treatment strategy. Think of things like: let’s start antiretroviral therapy the first time that the CD4 count goes below 500. A network meta-analysis of something like that would be very hard to do even today because you have to work with the groups as they were defined in the randomized trials, and there is typically no possibility of adjusting for adherence. So, for some questions, you will still need to use observational data with rich information on time-varying confounders. 

Alexander: Yeah. Yes, that’s a really good point: network meta-analysis is not the answer to everything, especially given that usually you don’t have access to patient-level data for it. You need to rely on literature data. You can only work with what’s published in terms of summary statistics. You can’t recompute anything, and you very, very rarely have any information about post-randomization, time-varying covariates. So that is nearly impossible to have. But before we get into the more difficult things, let’s start quite easy. If we have an observational study that has only two arms and, as usual in observational studies, these have differences in the baseline variables, what is the best way to adjust for these differences so that we can compare like with like?

Miguel: There are many levels to that question. One is that, of course, it doesn’t matter whether it is an observational study or a randomized trial. If you have an imbalance in baseline risk factors, in both types of designs you will need to adjust, but we expect that to happen in an observational study much more often than in a randomized trial. Then, about how to adjust for those imbalances in risk factors, the short answer is that any method works, really, if you have enough data, and you can do whatever you like best. For some people it would be propensity score matching, for some people it would be inverse probability weighting, for others putting the variables as covariates in the model for the outcome, g-estimation, standardization; really any method will work in that simple case. And it becomes simply a matter of personal taste in many, many cases. But the really important part, I think, for causal inference from observational data is this. We have, as you said, two groups, two treatments or two treatment strategies that we want to compare. I don’t think the most difficult part is the method that we need to choose to adjust for the confounders. The most difficult part is defining the two groups. And that is where a lot of the problems with observational studies are. I mean, if you recall, in the past people were comparing current users of some treatment with never users, and that was the real problem. The problem was not so much adjustment for confounding; you could adjust for baseline imbalances beautifully, and the study would still be biased because we were comparing current users, prevalent users, people who had already been selected after having used a treatment for some time, with people who were not using it. Everybody knows now that we shouldn’t do that. And now we use new-user designs or things like that, in which we compare people who start treatment to people who don’t. 
But that’s only the first step. There are many questions for which the comparison is not really new user versus non-user; it is someone who is starting a treatment strategy versus someone who is not starting that treatment strategy. And starting a treatment strategy may mean starting a new treatment, or stopping a treatment, or switching to a different treatment, or a million other things. So it’s really a comparison that needs to resemble what we would do in a randomized trial. In our analysis of problems with observational studies in the last couple of decades, it has been really hard to find examples in which the problem was bad adjustment for baseline confounders. It is much more common to find studies that are horribly biased because they were not defining the groups and the start of follow-up, time zero of follow-up, at the right time. 

Alexander: Yeah. I think if you have observational data that comes, for example, from a claims database or from any other kind of database that is updated on a regular basis, then understanding what exactly the index date is that you are speaking about, what the start date is for each of the patients, is not a trivial topic. It becomes much easier if you have something like a prospective observational study that you’re implementing, where you set the inclusion criteria in such a way that all the patients start a new treatment at baseline or switch treatment at baseline. Then you have a much clearer index date, but in real life we usually don’t have that. So yeah, that is a very, very good point. In terms of selecting the groups, this treatment algorithm is a really interesting thing. I’m just thinking about when there’s a new guideline coming out, or these kinds of things, and you want to compare, for example, what’s happening before and after the guideline. Then maybe things are easier to define. What are typically the problems in defining the groups in these observational settings? 

Miguel: Typically, at least in many cases, the problem is how to do, in the observational analysis, what is very natural and easy to do in a randomized trial: making sure that the start of follow-up, time zero, is the time when an individual meets the eligibility criteria and also is assigned to a treatment strategy. This is obviously done in randomized trials when we start the follow-up at the time of randomization, because the person is eligible at that time and that’s also the time when the person is assigned to a particular treatment strategy. So in trials it is very simple, but the fact that it is so simple may make it hard to see that it is a fundamental principle of study design, and any study that doesn’t respect that basic principle is at high risk of bias. In observational studies, therefore, we say as a rule, or by default (there may be exceptions in some cases and we can talk about those), that we should try to make sure that the start of follow-up for every individual is the time when the person meets the eligibility criteria and the person is assigned to a treatment strategy. And this is, in our experience, the most difficult part for a lot of people. Sometimes it is because, as you were saying before, if we compare starting a treatment versus not starting a treatment, or having a vaccination versus not having a vaccination, we can define time zero for those who start the treatment or are vaccinated as the time of the vaccination or the start of the treatment. But what do we do for those who are not receiving treatment? What is their time zero? There are simple ways of dealing with that, like just choosing a random time zero. That is exactly what we do in a randomized trial, really: we are choosing a random time in the life of a person who happens to be close to us when we are recruiting for the trial. 
So we could do that and choose a random time zero, or we could match the time zero of those who are not treated to that of those who are treated; there are multiple ways in which that could be done. It’s not always done right, but there’s no reason not to do it right. A more complicated aspect of this is when we want to compare a treatment strategy that involves something that you have to do over time. Let me give you a very, very simple case. Say I want to know whether taking aspirin for three years is better than taking aspirin for one year for some outcome. At time zero in a randomized trial, people would be assigned to three years of aspirin or one year of aspirin, and we would know who is assigned to each group. So, very easy. But in an observational database, we see people who start aspirin at time zero when they are eligible, but we don’t know who is going to take it for three years and who is going to take it for one year. I’m sure everyone listening to this knows that the naive thing is not the right thing to do. If we do the naive thing of looking at people who take aspirin for three years and comparing them with people who take aspirin for one year, that is a recipe for immortal time bias. And that’s a reason why there are a lot of studies in the literature that are biased. So that’s something that we cannot do, and this applies to any strategy that is sustained over time. We cannot look into the future from time zero to decide who is in each group. Without looking into the future, some investigators are lost because they don’t know how to classify people if they don’t know what those people are going to be doing. Again, there are ways of doing this that are correct. And they never use information that we don’t have at time zero, at baseline, to classify people into one group or the other. Exactly the same as we never use information from the future in a randomized trial to classify people at baseline. 
So once these techniques are more widespread and people use them in a more standard way, things like the g-formula or things like cloning, censoring, and weighting, then we will have eliminated what may be the most important problem in observational analyses now, which is this mishandling of time zero. And once we do that, we can start talking about whether the risk factors are not balanced between one treatment strategy and the other. But we can only have that conversation after we have fixed the other problem, because there’s no way we are going to adjust our way out of the selection bias and immortal time bias that we can have when time zero is not in the right place.
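The aspirin example can be made concrete with a small simulation. The following is a minimal Python sketch, not anything taken from the interview: the death rate, follow-up length, sample size, and all function names are assumptions chosen for illustration. Aspirin is simulated to have no effect at all, and the strategy each person follows is known to the simulation (which is exactly what a real database does not tell us at time zero). Grouping people by how long they ended up taking aspirin still produces a large apparent benefit of three years of use, because people can only be classified as three-year users if they survive three years.

```python
import random

def simulate(n=20000, rate=0.1, horizon=5.0, seed=1):
    """Simulate people who all start aspirin at time zero (hypothetical numbers).

    Each person follows a 1-year or a 3-year strategy chosen at random.
    Aspirin has no effect: time of death is exponential with the same rate
    for everyone.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        strategy = rng.choice([1, 3])           # fixed at time zero
        death_time = rng.expovariate(rate)      # independent of strategy
        years_used = min(strategy, death_time)  # what the database records
        died = death_time <= horizon            # death within follow-up
        people.append((strategy, years_used, died))
    return people

def risk(rows):
    rows = list(rows)
    return sum(died for _, _, died in rows) / len(rows)

def naive_risks(people):
    """Classify by observed duration of use, i.e. by looking into the future."""
    three = [p for p in people if p[1] >= 3]
    one = [p for p in people if 1 <= p[1] < 3]
    return {1: risk(one), 3: risk(three)}

def time_zero_risks(people):
    """Classify by the strategy fixed at time zero, as a trial would."""
    return {s: risk(p for p in people if p[0] == s) for s in (1, 3)}

people = simulate()
print("naive grouping:    ", naive_risks(people))
print("time-zero grouping:", time_zero_risks(people))
```

With these assumed settings the naive grouping shows a much lower five-year risk of death among "three-year users", while the grouping fixed at time zero shows roughly equal risks, which is the truth in this simulation. In real observational data the strategy is not recorded at time zero, which is why approaches such as cloning, censoring, and weighting are needed; this sketch only demonstrates the bias, not those remedies.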

Alexander: Okay. Yeah. Yeah. So once we have adjusted for these biases between the two treatment groups, we can only adjust for the variables that we have observed, not for those that we haven’t observed or haven’t observed accurately enough. How do you communicate how vulnerable the analysis is to unobserved variables and to any other bias that you couldn’t adjust for?

Miguel: That is the fundamental problem, of course, of observational analysis. That’s why, all things being equal, we would prefer a randomized trial to an observational study. If we could do both in the same population with the same treatment strategies, the same follow-up, the same outcomes, if everything were the same, any sane person would prefer the study with randomization at baseline. And so what we do in observational analyses is that we have to think hard about the reasons why the people in each treatment group are different on average, and measure them and adjust for them carefully. That’s really the best that we can do. After that there’s no way of proving that we have succeeded. But there are some indirect ways of building confidence in what we have done. One of them is the use of negative controls. Let me just give you an example. We mentioned vaccination before, right? Earlier this year, we were doing the first real-world evaluation of the Covid vaccines in Israel. We had to compare people who had been vaccinated and people who had not been vaccinated and see what the effectiveness was. We had that information from the randomized trial, from the phase 3 trial that Pfizer did (this was the BioNTech vaccine), but that trial was relatively small and we couldn’t have precise estimates for the more serious outcomes, or by age group, or for pregnant women, who were not even included in the trial. So we were doing this with real data from the electronic medical records from Israel. We designed our analysis in such a way that there were no problems with time zero, using all the things that I mentioned before, because that is the first part. After that the question was: are the vaccinated and the unvaccinated comparable in terms of their risk of Covid-19? 
Because we suspected that they were not: people who choose to be vaccinated early have a different behavior and a different way of interacting with others. Maybe they’re using masks more often, maybe they’ve been more careful in gatherings, etc. And that makes a direct comparison of vaccinated and unvaccinated people very dangerous. In fact, when we compared vaccinated and unvaccinated people without any adjustment except for age and sex, we found that the vaccinated had a lower risk of Covid. But they had a lower risk of Covid one day after the vaccination, two days after receiving it; throughout the first week after vaccination, or 10 days, they had a lower risk, and that was impossible. That couldn’t be an effect of the Covid vaccine because we know from the phase 3 trial that that doesn’t happen. There is a period during which the vaccine has no effect, maybe between 7 and 12 days. So by combining this with information that we had before the observational analysis was done, we could conclude that there was confounding, that the people who got vaccinated were different from the unvaccinated and they could not be compared. But that is very helpful, because it means that we have one way of, quote, testing whether there is confounding. Then we started to adjust for more and more things, and we realized that there were very subtle differences between these two groups that had to do not only with the population group that they were in, but even with the neighborhood in which they lived, because people in some neighborhoods were more likely or less likely to get vaccinated, and that was also associated with the incidence of Covid in that part of the country. So we had to adjust very, very finely by place of residence and many other variables. After doing that we saw that in the first 10 days there were no differences between the vaccinated and the unvaccinated. 
Then after that you start to see that the differences are increasing and the vaccinated had a lower risk of Covid, but it cannot be because they’re different from the unvaccinated, because if they had been different, we would have seen those differences in the first 10 days. So that is an example in which we can use something that we know about how the vaccine works to design our observational analysis in a way that gives us confidence that confounding cannot explain the whole thing. And now we can proceed and generate inferences for things that we could not get from the randomized trial because it was not large enough. 
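The logic of this negative control check can be written down in a few lines. The Python sketch below is only an illustration under stated assumptions: the counts are invented and are not results from the Israeli study, the function names are hypothetical, and the fixed tolerance stands in for what would be a confidence interval in a real analysis. It compares the risk of infection in an early window after vaccination, when no vaccine effect is expected, and flags residual confounding when the risk ratio in that window is far from 1.

```python
def risk_ratio(events_vaccinated, n_vaccinated, events_unvaccinated, n_unvaccinated):
    """Risk ratio of infection, vaccinated versus unvaccinated."""
    return (events_vaccinated / n_vaccinated) / (events_unvaccinated / n_unvaccinated)

def suggests_confounding(rr_early_window, tolerance=0.1):
    """Flag residual confounding if the early-window risk ratio is far from 1.

    In the early window the vaccine is assumed to have no effect, so any
    difference between the groups must come from something other than the
    vaccine. The tolerance is an arbitrary illustrative threshold.
    """
    return abs(rr_early_window - 1.0) > tolerance

# Invented counts for the early window after time zero.
rr_age_sex_only = risk_ratio(60, 100_000, 100, 100_000)    # coarse adjustment
rr_fine_adjusted = risk_ratio(98, 100_000, 100, 100_000)   # after finer adjustment

print(rr_age_sex_only, suggests_confounding(rr_age_sex_only))
print(rr_fine_adjusted, suggests_confounding(rr_fine_adjusted))
```

In this made-up example the coarse comparison fails the check and the finely adjusted one passes it. Passing does not prove the absence of confounding; it only removes one visible sign of it, which matches the way the check is described above as building confidence rather than providing proof.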

Alexander: Yeah. Yeah, and I really love this example with vaccination because it also shows that treatment outcomes and selection can depend on many, many things other than the typical biological things that you would measure in clinical databases, like pretreatment symptoms, disease state, and so on. It can depend on your personal risk profile, on your behavior, on your profession, or, in this case, on your neighborhood. Having this kind of understanding of how the data comes about is really important. So digging deeper into the background of the data is immensely helpful here. We have just talked about what we would do if we had two treatments. How does it differ if we get into multiple treatments? For example, if you use a propensity score, you look at the probability of getting one treatment versus the other, and very often logistic regression or similar tools are used. How would you do it if you have multiple treatments? How do you come up with good propensity scores then? 

Miguel: When there are multiple groups, we have people who follow multiple treatment strategies. That doesn’t happen very often in randomized trials because it’s hard to do a trial with many arms, but that’s really one of the advantages of observational studies: we can do that almost for the same price. So it is a very important setting. I’ll just give you an example. I was talking before about HIV. This is now known, but I’m talking about the early 2000s, 2010s: people wanted to know when the best time was to start antiretroviral therapy. Eventually there were a few trials, but they had only two arms: you start when your CD4 count first drops below 500 or when it drops below 350. With the observational studies, we could look at 30 arms. We could look at essentially every level of CD4 count between 600 and 200 in steps of 10. And that is an advantage if we can adjust for confounding in the right way. So, when we have multiple arms, the treatment strategies are sustained over time. When we are talking about things that are sustained over time, we cannot use some adjustment methods. For example, we couldn’t use propensity score matching because that method is designed to match, at baseline, the treated and the untreated. But now we are talking about treatment that keeps happening over time, and at each time we need to determine whether the person is starting treatment or not. We couldn’t adjust for confounding by putting the potential confounders in the model for the outcome, because the confounders are changing over time, and if we put post-baseline variables in the model for the outcome, we can get bias. We cannot use instrumental variables, even if we had an instrument, because the conventional instrumental variable estimator doesn’t incorporate time-varying treatments. 
Essentially, we cannot use any conventional method, but we can use what is known as the g-methods. The g-methods are essentially of three types: the g-formula, inverse probability weighting, and g-estimation. These are methods that Jamie Robins and others have developed since the 1980s. They are designed to deal with time-varying treatments and therefore allow us to compare treatment strategies, even multiple treatment strategies, that are sustained over time. And that is really the way to go. In practice, the most commonly used g-methods are either the g-formula or, even more frequently, inverse probability weighting, which is the easiest one to use. But as soon as we have strategies that are sustained over time, whether or not we have multiple arms, the only option that we have left in general is the use of g-methods. All the other, more conventional ways of adjusting for confounding, like propensity score matching or putting the covariates in a regression model for the outcome, or any of these classical ways of confounding adjustment, generally cannot be used for sustained treatment strategies. 

Alexander: Okay. For these g-methods, if you are a statistician and you want to explain how they work to a physician, how would you do that? 

Miguel: That’s a very good question. I’ve been trying to for many years, really. I don’t think I’ve always been successful. Maybe the easiest one to explain is inverse probability weighting, which in the simple cases is actually the same as epidemiologic standardization. Here, you explain to people: look, if you have information on the confounders, then you can give different weights to different people in your study population, with the goal of making sure that after weighting you mathematically eliminate the association between the confounder and the treatment. There is an association between the confounder and the treatment, which is why there is confounding, plus an association between the confounder and the outcome. You keep the association between the confounder and the outcome, but you give weights to people in such a way that the association between the confounder and the treatment mathematically disappears. And as soon as you break that association, the confounding is gone; the weights themselves are eliminating the confounding. The advantage of using this method is that you never have to condition on the confounders, unlike with all the other methods. With matching, you are conditioning on the propensity score, which is a function of the confounders. If you put the confounders in a model for the outcome, you are conditioning on those confounders in the model. And it is that conditioning that creates bias when we have time-varying confounders and there is treatment-confounder feedback. By weighting we never condition. We just weight people, and we can weight people differently at different times. So we eliminate the time-varying confounding. The disadvantage of weighting is that if there are people in the data who are doing things that are not very common, they will end up with very high weights, and that will increase the variance and can create some problems with the estimate not being stable. That has to be dealt with, and there are different ways of doing so. 
But it’s probably the simplest of these three g-methods to explain, so we typically start there. 
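The weighting idea described above can be illustrated with a small worked example. This is a minimal Python sketch for the simplest case only: one binary baseline confounder and a treatment given once. The counts, variable names, and function names are all assumptions made up for illustration. Each person gets a weight equal to one over the probability of receiving the treatment they actually received, given their confounder value.

```python
from collections import defaultdict

# Hypothetical counts, invented for illustration:
# (confounder L, treatment A) -> (number of people, number with the outcome)
cells = {
    (0, 0): (300, 30),
    (0, 1): (100, 5),
    (1, 0): (100, 40),
    (1, 1): (300, 60),
}

def crude_risks(cells):
    """Unadjusted risk of the outcome in each treatment group."""
    n, events = defaultdict(int), defaultdict(int)
    for (_, a), (count, ev) in cells.items():
        n[a] += count
        events[a] += ev
    return {a: events[a] / n[a] for a in n}

def ip_weights(cells):
    """Weight = 1 / P(A = a | L = l), estimated from the cell counts."""
    n_by_l = defaultdict(int)
    for (l, _), (count, _) in cells.items():
        n_by_l[l] += count
    return {(l, a): n_by_l[l] / count for (l, a), (count, _) in cells.items()}

def ip_weighted_risks(cells):
    """Risk in each treatment group after weighting everyone."""
    w = ip_weights(cells)
    n, events = defaultdict(float), defaultdict(float)
    for (l, a), (count, ev) in cells.items():
        n[a] += w[(l, a)] * count
        events[a] += w[(l, a)] * ev
    return {a: events[a] / n[a] for a in n}

def weighted_share_l1(cells):
    """Share of people with L = 1 in each treatment group after weighting."""
    w = ip_weights(cells)
    total, l1 = defaultdict(float), defaultdict(float)
    for (l, a), (count, _) in cells.items():
        total[a] += w[(l, a)] * count
        if l == 1:
            l1[a] += w[(l, a)] * count
    return {a: l1[a] / total[a] for a in total}

print("crude risks:      ", crude_risks(cells))
print("weighted risks:   ", ip_weighted_risks(cells))
print("weighted L=1 share:", weighted_share_l1(cells))
```

In these made-up data, 75% of the treated but only 25% of the untreated have L = 1. After weighting, the share is 50% in both groups, so the association between confounder and treatment has disappeared, while the risks within each level of L are untouched. The weighted risks (about 0.125 treated versus 0.25 untreated) equal what standardization over L would give. With time-varying treatments the weights would be products over time of such terms, which this sketch does not attempt.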

Alexander: Okay. Yeah, that’s a good point. And I think the last point that you made is that the weights don’t need to be constant for an individual patient; they can vary over time, as you have time-varying covariates that have an impact on your treatment, such as CD4 counts in HIV that determine whether you switch treatment. Or in schizophrenia, it could be certain symptoms that trigger treatment switches. These are the types of things you can take into account to make sure that you compare different treatments or different treatment strategies in the best way. So for the listeners: we talked about causal inference, and we talked quite a lot about one of the key things that we need to think about whenever we talk about estimands, which is what the treatment, or the treatment strategies, that we are interested in actually are. We talked about the advantages of observational data versus clinical trial data, even when we use network meta-analysis. We talked about different ways to adjust for baseline and also for time-varying covariates. And finally, we talked about how to best explain that to someone who is not a statistician, which I think is really, really useful. So, for the statisticians, what do you think are the key takeaways that they should have when they go into observational research? And maybe you can also point to some of your favorite resources that you would recommend.

Miguel: To me, the most important point is to realize that good study design is good study design, and that it applies both to randomized trials and to observational studies. Anything that we do in a randomized trial and think is good study design should also be done in an observational study. A randomized trial will never compare current users with never users. Anything that we don’t do in a randomized trial, it’s probably for a good reason, and then we shouldn’t do it in an observational study either. One example that I like a lot is how in a randomized trial we always have measures of absolute risk and measures of relative risk. But in many observational analyses, we only see the measures of relative risk; we don’t see the measures of absolute risk. That is just because people have not been taught how to generate adjusted absolute risks, which is not very hard to do. So, the message is: if you know how to analyze a randomized trial well, then you know how to analyze a follow-up observational study well, and vice versa. There is really no fundamental difference in the analysis, with the difference, of course, that in an observational study we need to adjust for baseline confounders when we are trying to estimate the observational analog of the intention-to-treat effect in a randomized trial. But if we think about estimating the per-protocol effect in a randomized trial, or the observational analog of the per-protocol effect in an observational study, in both cases the analysis is exactly the same. And in both cases we need to adjust for confounders both at baseline and post-baseline. So if we know how to analyze an observational study with time-varying confounders, then we know how to conduct a per-protocol analysis. 
The per-protocol analysis of a randomized trial with non-random non-adherence is really the same, and we need to make an effort to make sure that statisticians and data analysts can go from one to the other, as opposed to saying "I only do trials and do this" or "I only do observational studies and do that", because that prevents us from understanding that data analysis for causal inference from follow-up studies is the same whether the treatments are randomized or not. 
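The remark above that adjusted absolute risks are not hard to generate can be shown with standardization in the simplest setting. The Python sketch below assumes a single categorical baseline confounder and a treatment given once; the stratum names, counts, and function names are invented for illustration. It standardizes the stratum-specific risks to the confounder distribution of the whole study population and reports absolute risks together with the risk difference and risk ratio.

```python
# Hypothetical data: stratum -> arm -> (number of people, number with the outcome)
strata = {
    "younger": {"treated": (200, 4), "untreated": (600, 24)},
    "older": {"treated": (600, 60), "untreated": (200, 40)},
}

def crude_risk(strata, arm):
    """Unadjusted risk in one arm, ignoring the confounder."""
    n = sum(s[arm][0] for s in strata.values())
    events = sum(s[arm][1] for s in strata.values())
    return events / n

def standardized_risk(strata, arm):
    """Risk in one arm, standardized to the whole population's strata."""
    total = sum(n for s in strata.values() for n, _ in s.values())
    risk = 0.0
    for s in strata.values():
        stratum_n = sum(n for n, _ in s.values())
        n_arm, events_arm = s[arm]
        # stratum share of the population times stratum-specific risk
        risk += (stratum_n / total) * (events_arm / n_arm)
    return risk

r1 = standardized_risk(strata, "treated")
r0 = standardized_risk(strata, "untreated")
print("crude risks:    ", crude_risk(strata, "treated"), crude_risk(strata, "untreated"))
print("adjusted risks: ", r1, r0)
print("risk difference:", r1 - r0)
print("risk ratio:     ", r1 / r0)
```

In these made-up data the crude risks are identical in the two arms (0.08), while the adjusted risks are about 0.06 for the treated and 0.12 for the untreated, that is, a risk difference of about -0.06 and a risk ratio of about 0.5. Reporting both adjusted absolute risks alongside the ratio mirrors how a randomized trial would be reported. Confidence intervals, for example by bootstrapping, are left out of this sketch.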

Alexander: Yep. That is a very, very good point. As I have worked both in observational studies and in clinical trials, I can see that there are a lot of benefits in learning from each other. It’s quite interesting to see that a lot of the research on causal inference that was done in observational studies is now being used in randomized trials, with the advent of estimands and more care in understanding what the treatment is that we’re interested in. I think that wasn’t very well addressed in the clinical trial setting before we talked about estimands. However, it was a topic of a lot of discussion in the observational area long before. And that’s a really good point. Any specific references you would like to guide the listener to?

Miguel: I would like more people to read the causal inference book that I wrote with Jamie Robins, Causal Inference: What If. You can just google it and find it. It’s free. You can download it from our website, and there we go into much more detail on many of these issues. There are also a number of papers by members of the CAUSALab that I direct at the Harvard School of Public Health, papers that apply these methods, this way of thinking in which observational analyses for causal inference are really just our attempt to emulate a hypothetical randomized trial, which we call the target trial. So these papers, and other papers that have been published in the last couple of years, can serve as a guide for people who want to learn more about this way of thinking. 

Alexander: Thanks so much. Yeah, and we will share these links in the corresponding blog post. Just head over to The Effective Statistician and search for this episode and you’ll easily find this. Thanks so much Miguel. It was great talking to you. 

Miguel: Thank you very much. 

Alexander: This show was created in association with PSI. And the PSI conference happens in June in Gothenburg. Head over to Thanks to Reine and her team, who help the show in the background and thank you for listening. Reach your potential, lead great science and serve patients. Just Be an Effective Statistician.
