Estimands in Observational Studies - The Effective Statistician

Ever wondered about the intricate connections between observational studies and the Estimand framework in the pharmaceutical industry?

How do the challenges in multi-country primary data collection unfold, and what crucial role do assumptions play in propensity score methods?

In this captivating episode, I’m excited to have Artemis Koukounari, an accomplished statistician and associate director in real-world evidence at Novartis, join me for an insightful conversation.

Join us as we explore these questions and more, unraveling the complexities of real-world evidence and offering a glimpse into the future of data research in the pharmaceutical realm.

We also discuss the following key points:

Observational Studies and Estimands
Observational Research Experience
Defining Estimands in Observational Studies
Prospective Studies Challenges
Retrospective Studies Challenges
Comparison with Clinical Trials

Whether you’re a seasoned statistician or a curious listener, the insights shared about observational studies, Estimand frameworks, and the future of real-world data research are bound to leave you with a wealth of knowledge.

Share this with your friends and colleagues who would benefit from this. Tune in now!

Artemis Koukounari

Associate Director RWE Manager at Novartis

Artemis has over 13 years of post-PhD experience within both academia and the pharma industry in observational research and real-world evidence. During her 2 pharmaceutical appointments to date, she has effectively supported several development plans across various stages of the drug development lifecycle and therapeutic areas (synthetic controls in early-phase clinical trials and post-launch products, real-world data studies in oncology, multiple sclerosis, hidradenitis suppurativa, and filing for COVID-19 treatment and prevention). Prior to these roles she held academic appointments at the London School of Hygiene and Tropical Medicine, King’s College London and the Liverpool School of Tropical Medicine, focusing on applied latent variable modelling methodology and the counterfactual framework for causal inference for infectious disease and mental health epidemiology. To date, she has co-authored 53 peer-reviewed publications (17 as first author) with 2171 citations and an H-index of 23 (Web of Science).

Transcript

Estimands in Observational Studies

[00:00:00] Alexander: Welcome to another episode of the Effective Statistician. Today I’m really excited to talk with another listener of this podcast and someone that also has a lot of knowledge in an area that I have worked a lot in. Welcome Artemis, how are you doing?

[00:00:21] Artemis: I’m really good. Hi Alexander, thank you very much for having me on your show.

[00:00:26] Alexander: Very good. And today we are talking about two topics that come together here. Real-world evidence, or observational research, on the one hand, which has been a pretty hot topic for many years and has only grown hotter over the last few years. And the other is estimands. I think the estimands topic, of course, came from the clinical trial area.

But it’s, you know, similarly important, or maybe even more important, in the observational area. And what I really love about the estimand framework is that it kind of pulls a lot of methods, approaches, and problems that have been tackled in observational research for many, many years into the clinical trial area. And so I love that, you know, here we are connecting the dots in a much better way. Before we go into the content part, please introduce yourself.

[00:01:33] Artemis: Thank you. Thank you very much, Alexander. Most of my experience is in observational research. I studied for a BSc and an MSc in Statistics and then I did a PhD in Statistical Epidemiology. In the last three years, I have transitioned to the pharma industry from academia. And now I work as an Associate Director in Real World Evidence with Novartis. So it’s my great pleasure to be here with you and talk about this topic. Very important topic.

[00:02:08] Alexander: Awesome. So let’s dive into where we’re coming from. In terms of observational studies, what is your experience around the history of estimand-type questions we have there, both in prospective and in retrospective observational studies?

[00:02:30] Artemis: Yep. In retrospective secondary data use I have been involved in teams looking at comparative effectiveness. And also on hybrid controls, where you have phase 2 trials with really small sample sizes and you try to enhance the existing control arm with historical data, either from trials or real-world data.

So over there again the target of inference becomes very important. Who do you want to estimate the treatment effect for: the overall population, just the treated, and so on? In terms of prospective studies, currently, I’m part of a team that tries to set up multi-country primary data collection, where again the main aim is comparative effectiveness.

However, the fact that it is from multi-country sites creates a lot of issues with the inherent assumptions of propensity score methods. For instance, the consistency assumption, where you must have clearly defined treatments, becomes a nightmare because you need the same standard of care in all the countries involved.

The positivity assumption is in doubt because there are different access issues. So do all the patients have the same probability of getting the treatment? That is in question. And then, you know, the main assumption of exchangeability and measured confounding: can we take into account the confounders that are needed in order to implement these propensity score methods, to achieve balance between the treatment groups and be able to infer causal treatment effects?

But in all these experiences I feel that we are not driven by first defining the estimand and then coupling methods with it, or by asking who our stakeholders are: what is the research question that they pose, so that we can define the estimands? So I think, as in the world of trials, we really ought to become more serious about first defining the estimand, as an iterative process, and then looking at what methods can back up the target of inference.

And also, in terms of retrospective studies, another experience that I had was trying to extrapolate the findings of a trial to a target population with real-world data. So over there, the estimand was the average treatment effect in the target population. And there are issues there: how are the covariates that you are looking at, from the trial to the real world, measured? Have they been measured in the same way? Do you miss some other effect modifiers? So yeah.

[00:05:36] Alexander: There are a lot of other topics there. Yeah. I am just thinking back to my early time in observational research, where we also did comparative observational studies, and I’m just thinking about a particular one that I worked on about 20 years ago, a really big schizophrenia observational study with many different treatments.

And at the start, it was pretty easy. We looked into three-month follow-up data and, well, in the first three months after starting therapy, nothing changed a lot. Yeah. So it was pretty easy. We looked into the data based on a treatment policy, more or less: you know, what you start with, we analyzed accordingly.

Yeah. But then, of course, the first problem was, okay, in an observational setting, you don’t have randomization, and so you do the propensity scoring approach to compare apples to apples. Yeah, similar patients. And thinking back, you know, we had these different propensity score categories: having a probability of the reference treatment of zero to 20%, 20 to 40%, and so on.

Thinking back, you know, we just pooled across these. Yeah. That’s it. But we never really thought about, okay, does pooling actually make sense? Is there a treatment-by-propensity-score interaction? And if there is, well, that is pretty interesting, yeah, because then, of course, the treatment effect depends on your propensity score.

Now, the propensity score is not something that, you know, you can give to patients easily, yeah, so you can’t just look into their data and say, hey, you have a propensity score of that, and therefore your treatment effect will be this.
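The pooling question raised here can be made concrete with a small sketch. The stratum labels and every number below are hypothetical, not data from the study discussed; the code only shows how stratum-specific effects and a size-weighted pooled effect might be computed, and how a single pooled value can hide a treatment-by-stratum interaction.

```python
# Hypothetical summaries per propensity score stratum (illustrative numbers only):
# (stratum label, n treated, n reference, mean outcome treated, mean outcome reference)
STRATA = [
    ("0-20%", 20, 80, 11.0, 10.0),
    ("20-40%", 40, 60, 12.0, 10.0),
    ("40-60%", 50, 50, 13.0, 10.0),
    ("60-80%", 60, 40, 14.0, 10.0),
    ("80-100%", 80, 20, 15.0, 10.0),
]


def stratum_effects(strata):
    """Difference in mean outcome (treated minus reference) within each stratum."""
    return [(label, mean_t - mean_r) for label, _, _, mean_t, mean_r in strata]


def pooled_effect(strata):
    """Pool the stratum effects, weighting each stratum by its total size."""
    total_n = sum(n_t + n_r for _, n_t, n_r, _, _ in strata)
    weighted = sum(
        (n_t + n_r) * (mean_t - mean_r) for _, n_t, n_r, mean_t, mean_r in strata
    )
    return weighted / total_n


for label, effect in stratum_effects(STRATA):
    print(f"stratum {label}: effect = {effect:.1f}")
print(f"pooled effect = {pooled_effect(STRATA):.1f}")
```

In this made-up example the pooled effect sits in the middle of stratum effects that differ fivefold, which is the situation in which pooling across strata deserves a second look.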

It’s a calculated score. Yeah. So that comes with completely different topics than the typical things in clinical trials, where you have randomized comparisons, unless of course you also do propensity scoring there. Then you get into the same problem. Yeah. The other thing that I found really interesting that I learned from observational studies was this kind of survival bias.

Yeah. So in the end, this observational study extended not just for three months, but for three years. And already after one year, if we compared the patients that started on a medication and that were still on the same medication at one year, they were all the same. Yeah. There was no treatment effect whatsoever anymore.

Yes. And then the physicians said, well, that’s pretty clear, because if you haven’t responded, yeah, and you’re not doing well, then you will switch. So, more or less, because of the standard of care, of the practice, by design you don’t see any differences. And a kind of on-treatment estimand after one year, this kind of thing, was pretty useless because everybody would be doing well.
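A minimal sketch of why comparing only patients who are still on their initial treatment can erase a real difference. The response rates and switching probabilities are hypothetical, and the model is deliberately simple: non-responders switch with some probability, responders stay.

```python
def responder_share_among_persisters(response_rate, switch_prob_nonresponder):
    """Share of responders among patients still on their initial treatment.

    Assumes (for illustration only) that responders never switch and that
    non-responders switch with the given probability.
    """
    staying_responders = response_rate
    staying_nonresponders = (1.0 - response_rate) * (1.0 - switch_prob_nonresponder)
    return staying_responders / (staying_responders + staying_nonresponders)


# Hypothetical treatments: A works for 60% of patients, B for 40%.
RESPONSE_A, RESPONSE_B = 0.6, 0.4
gap_all_starters = RESPONSE_A - RESPONSE_B

for switch_prob in (0.0, 0.9, 1.0):
    share_a = responder_share_among_persisters(RESPONSE_A, switch_prob)
    share_b = responder_share_among_persisters(RESPONSE_B, switch_prob)
    print(
        f"switch prob {switch_prob:.1f}: "
        f"A = {share_a:.3f}, B = {share_b:.3f}, gap = {share_a - share_b:.3f}"
    )
```

Under these assumptions the gap among persisters shrinks as switching among non-responders becomes more common and vanishes when every non-responder switches, even though the gap among all starters is unchanged.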

[00:09:21] Artemis: Yes.

[00:09:22] Alexander: And so yeah, that showed, you know, this kind of problem about different estimands. And yeah, only more than 15 years later did this kind of thing emerge around the estimands.

So how do you see it? You mentioned one thing that is the positivity assumption. Can you speak a little bit more to what this is and why that might be a challenge?

[00:09:52] Artemis: Well, with the propensity score methods, as you know, nothing comes for free. It’s a very nice method of trying to tackle measured confounding and achieve balance of those confounders among the treatment arms, but it comes with assumptions. One of the main assumptions is exchangeability, which refers to measured confounding: conditional on these covariates, treatment assignment is independent of the potential outcomes. Then there is the positivity assumption.

Every patient needs to have a positive probability of receiving the treatment that you are studying, not being excluded from the treatment because of some contraindication or something. And then there is, as I said, the consistency assumption, which in my case, as I have now seen in practice, becomes a nightmare in a multi-country study.

Like how clear the definitions of your treatments and your treatment strategy are, and so on. So what I’m trying to say is that if any of these assumptions gets violated by reality, by the real world, by practice, your propensity score methods will be problematic and you will have bias in the obtained results.

But as statisticians we still have hope, in the sense that we can use diagnostics to detect whether, you know, some of these assumptions might have been violated. For positivity, if you apply propensity score weighting methods, you can get some hints from extreme weights and from some specific graphs. And that gives you a lead to perhaps try to apply some trimming or truncation, but then you have to be careful about what that does to your estimand.

What are you left with? How many patients, you know, did you chop out because they might have violated your positivity assumption?
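A minimal sketch of the weight diagnostics mentioned here, assuming the propensity scores have already been estimated. The patient data, the 0.05 to 0.95 trimming window, and the function names are all illustrative choices, not a recommendation.

```python
def ipw_weights(treated, propensity):
    """Inverse probability weights: 1/e for treated, 1/(1-e) for untreated."""
    return [
        1.0 / e if t == 1 else 1.0 / (1.0 - e)
        for t, e in zip(treated, propensity)
    ]


def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def trim_indices(propensity, lower=0.05, upper=0.95):
    """Indices of patients whose propensity score lies inside the trimming window."""
    return [i for i, e in enumerate(propensity) if lower <= e <= upper]


# Hypothetical patients: treatment indicator and estimated propensity score.
treated = [1, 1, 0, 0, 1, 0, 0, 1]
propensity = [0.5, 0.6, 0.4, 0.5, 0.02, 0.98, 0.3, 0.7]

weights = ipw_weights(treated, propensity)
kept = trim_indices(propensity)
trimmed_weights = [weights[i] for i in kept]

print(f"largest weight before trimming: {max(weights):.1f}")
print(f"effective sample size before trimming: {effective_sample_size(weights):.2f}")
print(f"patients kept after trimming: {len(kept)} of {len(treated)}")
print(f"effective sample size after trimming: {effective_sample_size(trimmed_weights):.2f}")
```

Two patients with near-zero chance of the treatment they actually received dominate the weights here. Trimming them stabilises the analysis, but the result then describes only the patients who remain, which is exactly the change to the estimand that needs to be stated.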

[00:12:16] Alexander: I think the positivity assumption sounds kind of easy. Yeah. Well, but it’s actually not. I’ve seen, for example, an oncology study where certain patients had contraindications for certain treatments. And of course it doesn’t really make a lot of sense to include these patients in your comparison because they’ll never get the other treatment. But it could also be what you mentioned, in real life, HTA things: maybe all patients first need to step through one treatment and then go to the other treatment.

Yeah, and then, of course, nobody that hasn’t gone through the first treatment can get the second treatment. Yeah. And so this kind of understanding of the real world is really, really important. So checking on the data is important, but also understanding what the local guidelines are, what the typical kinds of problems are, and what the standards of care are is really, really important. And you don’t have that in clinical trials because there you just specify how to treat patients.

[00:13:37] Artemis: You design the perfect experiment and you mandate everything through your protocol, right?

[00:13:43] Alexander: Yeah, yeah, yeah. About exchangeability: that is, if I understand it correctly, on the treatment side. So basically, it’s that all the patients in your database who get treatment X all get the same treatment X, isn’t it?

[00:14:01] Artemis: The exchangeability, as I understand it is more about the measured confounding. The same treatment, I think it’s about the consistency assumption. So one thing that often I see that we neglect is that, oh, we applied propensity score methods. Yes, but on your measured confounders. What about unmeasured confounders?

It’s a fundamental assumption. And this year, if I’m not mistaken, there was an initiative from Duke-Margolis together with the FDA on negative controls, on trying to leverage epidemiological tools in order to, you know, check: do I have unmeasured confounding? And to detect, once I have applied the propensity score, whether my inference is robust and how sensitive it is to unmeasured confounding.

And I found that fascinating. These tools have been there for decades, but I personally had never heard about negative controls before. I think there is an amazing opportunity, you know, with the wealth of real-world data, to also investigate those negative controls in order to enhance the inference from the propensity score methods.

Because imagine you go to a health authority and you say, I applied propensity score methods. But on top of that, I’ve also adopted negative controls to show you whether, you know, my methods and results are robust or not. I think we really ought to be transparent and rigorous in such evaluations when we try to infer causal treatment effects from such data.
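Negative controls are one way to probe unmeasured confounding. A different and much simpler tool, not mentioned in the conversation, is the E-value of VanderWeele and Ding, which asks how strong an unmeasured confounder would have to be to fully explain away an observed risk ratio. The sketch below assumes the effect is expressed as a risk ratio; the example value of 2.0 is hypothetical.

```python
import math


def e_value(risk_ratio):
    """E-value for an observed risk ratio: RR + sqrt(RR * (RR - 1)).

    Protective effects (RR < 1) are inverted first, so the result is always
    at least 1. This is an illustrative helper, not a library function.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1.0 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


# Hypothetical observed risk ratio from a propensity score analysis.
observed_rr = 2.0
print(f"E-value for RR = {observed_rr}: {e_value(observed_rr):.2f}")
```

Read loosely, an unmeasured confounder would need to be associated with both treatment and outcome by a risk ratio of at least the E-value, beyond the measured covariates, to move the observed estimate to the null.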

[00:15:57] Alexander: Completely agree. And the estimands framework really nicely kind of lays it out. Yeah, that basically you have the same estimand, but you do sensitivity analyses here. Because you check on your assumptions and you put in different assumptions and then you can see how robust the results are: you know, how much unmeasured confounding do you need to have to basically invalidate the results and come to a completely different conclusion?

And then you can think about, yeah, is that sensible or is that pretty unlikely to happen? Awesome. Very good. Let’s go through the ICH framework step by step: we have the treatment, we have the population, we have the variable, we have the population-level summary and the handling of intercurrent events.

So, we already talked about treatment. There’s surely a difference between clinical trials and observational studies, yeah? And that, you know, it could vary between the different countries, things like this. What else could happen in observational research that usually doesn’t happen in clinical trials in terms of managing treatment in the estimand framework?

[00:17:28] Artemis: I think also, if I may say, the treatment history or treatment patterns. In the clinical trial, you know, you highly select the population and exclude, perhaps, specific pre-treatment histories and so on. In the real world, no.

Although you can go into secondary data and try to apply the same eligibility, inclusion, and exclusion criteria as in trials. But then with treatment also, and I think it overlaps and I will talk about this more when we go into the intercurrent events, you get, you know, all the issues of adherence, compliance, switching treatment. But if you prefer, we can talk about that more when you ask me about the intercurrent events.

[00:18:22] Alexander: And these two things are closely related, especially in observational research. I think in clinical trials, you usually have the protocol providing what to do, the different steps in terms of changing treatment.

But in real life, lots of different things can happen. Yeah. So you just see the variety of different treatment patterns is much, much bigger. And if you think about wanting to do estimands based on these real treatment patterns, then you have for sure an observational setting within clinical trials.

So, you know, you really have this kind of problem with propensity scoring and all these kinds of different things. So that’s what I mean by observational. Of course, it’s still a clinical trial, but then you have these problems, yeah, and even clustering different treatment patterns together is a topic. So it’s kind of, okay, patients that, I don’t know, let’s say, switch from treatment A to treatment B within a certain time period, where maybe in a clinical trial you only allow that at a certain time point.

Or if a certain criterion is met. Yeah, all these different things are different in real life. And so, you see, defining what treatment is, when it’s not just treatment policy, is really much more difficult in observational research than in clinical trials. In terms of the population, you also mentioned that pre-treatment is surely one of the topics. Generally, observational studies have a much more heterogeneous patient population compared to clinical trials, usually. Where else do you see problems in terms of differences between the two?

[00:20:27] Artemis: I’ve seen a very nice paper specifically on this topic, on estimands, trying to extend what we discussed to observational studies, keeping everything else steady: the intercurrent events, the treatment, all the four other attributes, and just talking about the population. And it says that in the population element for observational studies, you inherently also have the covariates, because in observational studies, because of what we talked about, the lack of randomization, you are trying to achieve balance between the treatment arms based on these baseline covariates. And what differs from clinical trials, specifically for estimands in observational studies, are these average quantities which are not equal: the average treatment effect, the average treatment effect among the treated, the average treatment effect among the untreated. In observational studies these are not the same quantities, while in clinical trials you wouldn’t be worried about the inequality of these quantities.

And then different propensity score methods, they are, how to say, coupled with these estimands. Pair matching can only give you the average treatment effect among the treated. Full matching can give you both. And sometimes, I’ve even seen, when you apply a caliper on the propensity scores, you lose that estimand, because you impose that value to achieve the balance.

And sometimes I’ve seen people going and comparing these methods, but you are not targeting the same estimands. And also, those estimands are coupled with different research questions. So if a question concerns a treatment policy intended to apply to all qualifying patients, then your target population should be the whole indicated patient population.

And this estimand should be the average treatment effect. If the question concerns a policy of withholding a treatment among those currently receiving treatment, or not receiving it, the estimand should be the average treatment effect among the treated, or among the untreated. And then it’s also what kind of data you have at hand.

If you have a product registry, then you’re confined only to the average treatment effect among the treated. So those elements, I think, you know, one can classify them under the population attribute of the ICH estimand, and these are the fundamental differences.
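A small worked example of why the average treatment effect (ATE), the effect among the treated (ATT), and the effect among the untreated (ATU) can differ in observational data but coincide under randomization. The two patient strata, their treatment probabilities, and their treatment effects are hypothetical, and the true effects are taken as known purely for illustration.

```python
def average_effects(strata):
    """Return (ATE, ATT, ATU) for a population described by strata.

    Each stratum is (population share, probability of treatment, treatment effect).
    ATT weights strata by how many treated patients they contain, ATU by how
    many untreated patients they contain.
    """
    ate = sum(share * effect for share, _, effect in strata)
    p_treated = sum(share * prob for share, prob, _ in strata)
    att = sum(share * prob * effect for share, prob, effect in strata) / p_treated
    atu = sum(
        share * (1.0 - prob) * effect for share, prob, effect in strata
    ) / (1.0 - p_treated)
    return ate, att, atu


# Hypothetical observational setting: severe patients benefit more (effect 3.0)
# and are also more likely to be treated (80% versus 20%).
observational = [(0.5, 0.2, 1.0), (0.5, 0.8, 3.0)]

# Same population under 1:1 randomization: treatment probability is 50% everywhere.
randomized = [(0.5, 0.5, 1.0), (0.5, 0.5, 3.0)]

for name, strata in (("observational", observational), ("randomized", randomized)):
    ate, att, atu = average_effects(strata)
    print(f"{name}: ATE = {ate:.2f}, ATT = {att:.2f}, ATU = {atu:.2f}")
```

With these numbers the observational setting gives three different values, while the randomized setting gives the same value three times, because treatment no longer depends on the covariate that modifies the effect.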

[00:23:20] Alexander: Let’s go to the third one. It’s the variable. Yeah. The endpoint that we look into. So the first thing that comes to my mind is that very often we don’t even observe, you know, in observational research these highly specialized endpoints that we have in clinical trials. Yeah.

Because you need a lot of specific interventions, or you need trained physicians that can actually collect these data, and these kinds of things. Or you may have, you know, a claims database where you don’t collect these kinds of things at all. So that’s the first thing that comes to my mind. What other differences do you see between clinical trials and observational research with respect to the endpoints?

[00:24:14] Artemis: The choice of endpoint in observational research does not depend only on the research question, but also on the available real-world data sources. And within those, you need to understand how the measures or variables that they list were derived, you know, for instance through the records of the physicians.

Did they apply some machine learning? How did they extract these data? And for instance, former colleagues at Roche, as I saw yesterday, have a very nice website and even R code on calculating real-world PFS, progression-free survival, because that can vary by cancer indication, by treatment, by, you name it. And so they have a very nice case where they studied such an endpoint between the clinical trial and real-world data, the Flatiron data, which are amazing real-world data. And they found that for chemotherapy, the survival was pretty much the same, but for immunotherapy, it was not. And there are so many things for one to consider over there, like what you were talking about before, the time window, and what sensitivity analyses you do. And they have very nice guidelines for how to go to experts about what is already known.

So, basically, they say that the mechanism of data collection differs between real-world data and clinical trials. For instance, in the case of PFS, progression-free survival: in clinical trials, tumor assessments are scheduled at regular intervals and progression is assessed using standardized criteria, but in routine health care there is no protocol and there are no regularly scheduled tumor assessments. And progression may or may not be assessed following standard criteria. And yeah, this is why, for instance, we talk about real-world PFS and clinical trial PFS, and they recommend more studies to try to acquire more empirical evidence per cancer indication and per treatment.

To make sure, you know, let’s say that you’re interested in an external control for PFS: are your endpoints similar? And then from my previous experience, having worked a lot on measurement error issues and patient-reported outcomes, there I can also see a lot of issues, like how did you collect or record the quality of life?

Did you use the same questionnaire for all patients? And even if you used the same questionnaire, did that questionnaire have the same number of questions? Because for some questionnaires you have short or long versions of the scale. So if you look in different real-world data sources, how do you pool all of these?

And there are methods where one could look into harmonization of such measures, but in my three years of experience in the industry, to be honest, I haven’t seen such techniques being implemented or even discussed.

[00:27:45] Alexander: There’s one thing that you just mentioned that I think I’ve never seen, you know, elaborated on more, and that is the variability.

Yeah. It’s the variability of the endpoint itself. That can be much more harmonized and structured in clinical trials, because of training and all these kinds of different things. The variability, you know, can be much bigger in observational studies. Yeah. Because of lack of training, or because it’s not all exactly at the same time point, or because, you know, the patient population is different.

There are so many different factors that can influence that, you know, inherent variability within your endpoint. Yeah. It’s very, very different. And therefore, of course, that has an impact on how likely it is that you’ll see treatment differences and all those kinds of different things. Yeah. Also, if you think about any kind of standardized measures, standardized by standard deviations.

Yeah. Then, of course, that has an impact as well. By the way, probably the same is true for measurement errors in the covariates. That is yet another completely different topic, which could be different in clinical trials compared to observational data. There’s one thing, however, that I think is really good about observational data: in observational data, of course, you also have these endpoints that people will see in clinical practice.

Yeah. So at least if you have some kind of, you know, medical data in it as well, not just the claims data, you can see patient-reported outcomes there. You can see endpoints like how long patients stay on treatment, if that is of interest in your indication. Yeah. Especially for chronic indications, that is a sign of a well-working treatment. If patients stay on it, it works. If they don’t, it doesn’t work for them.

Yeah. So these kinds of things can be amazingly nice for observational research and may be much harder to actually study in clinical trials, because in these studies, yeah, the treatment is provided.

It’s not, you know, coming through the reimbursement criteria and these kinds of things. There’s usually no copay by patients and all these kinds of other things that happen in real life. And so there are some areas where clinical trials, of course, are definitely better. Yeah. But there are also some areas where observational research has its advantages with respect to variables.

The next topic in terms of the ICH framework is the population-level summary. Do you see any differences there between clinical trials and observational research?

[00:30:56] Artemis: Yes, I do. In clinical trials, again, because of the randomization, the results bear statistical interpretation and often only simpler statistical methods are required to estimate the estimand, but in real-world data, we often need causal methods to account for issues with non-randomization.

And we can only interpret results causally if the causal assumptions hold. So I think that per se creates what we were talking about before: we have a duty to, you know, verify the inherent assumptions of the causal methods that one might use. And then another thing, very, very simple, just a mathematical fact: non-collapsibility might complicate many important areas.

In clinical trials, when you have, you know, a logistic regression model or a Cox regression model, with log odds ratios or hazard ratios, if you adjust for covariates there are issues with the standard errors when different baseline covariate adjustment sets have been used. But if you do that in observational studies, adjusting for covariates is not only about the precision.

It’s also the problem of confounding, and so in terms of estimands that comes down to: do I aim for a marginal estimand or a conditional estimand? Right. Those are different things. With marginal estimands, we would have everything that we talked about before, ATE, ATT, ATU, and so on. I’m not going to go over there.

But if you are in the land of non-collapsible effects, with odds ratios or hazard ratios, it’s not the same whether I run a Cox regression model and adjust for covariates, or I run a weighted Cox regression model with weights from the propensity scores. One approach is giving me conditional effects and the other is giving me marginal effects.
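Non-collapsibility can be shown with plain arithmetic, here for the odds ratio. The sketch assumes a hypothetical randomized setting with one binary prognostic covariate, so there is no confounding at all; the coefficients are made up. The conditional odds ratio is 3 in both covariate strata, yet the marginal odds ratio is noticeably smaller.

```python
import math


def expit(x):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def odds(p):
    return p / (1.0 - p)


# Hypothetical logistic model: logit P(Y=1) = INTERCEPT + LOG_OR_TREATMENT*A + COVARIATE_EFFECT*X
INTERCEPT = -2.0
LOG_OR_TREATMENT = math.log(3.0)  # conditional odds ratio of 3
COVARIATE_EFFECT = 4.0            # strong prognostic covariate X (0 or 1)
SHARE_X1 = 0.5                    # half the population has X = 1


def risk(treated, x):
    return expit(INTERCEPT + LOG_OR_TREATMENT * treated + COVARIATE_EFFECT * x)


def conditional_odds_ratio(x):
    """Odds ratio for treatment within one covariate stratum."""
    return odds(risk(1, x)) / odds(risk(0, x))


def marginal_odds_ratio():
    """Odds ratio after averaging the risks over the covariate distribution."""
    risk_treated = (1.0 - SHARE_X1) * risk(1, 0) + SHARE_X1 * risk(1, 1)
    risk_untreated = (1.0 - SHARE_X1) * risk(0, 0) + SHARE_X1 * risk(0, 1)
    return odds(risk_treated) / odds(risk_untreated)


print(f"conditional OR, X=0: {conditional_odds_ratio(0):.2f}")
print(f"conditional OR, X=1: {conditional_odds_ratio(1):.2f}")
print(f"marginal OR:         {marginal_odds_ratio():.2f}")
```

Because X is independent of treatment in this setup, the gap between the two numbers is not bias. They answer different questions, which is why a covariate-adjusted estimate and a propensity-weighted estimate should not be compared as if they targeted the same estimand.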

[00:33:11] Alexander: That is really, really good. Yeah. You talk about odds ratios or hazard ratios in both cases. But really, it’s different. Yes.

[00:33:23] Artemis: Yeah. And I’ve seen many times an observational study applying an adjusted Cox regression model, and then comparing that with the clinical trial results. You know, you can’t, because normally a clinical trial gives you a marginal effect, let alone, you know, what population, what inclusion and exclusion criteria; let’s not go there. But yeah.

[00:33:47] Alexander: This is brilliant. Yeah. Yeah. And lastly, of course, we have the different intercurrent events. And we talked already about lots of different things there. What haven’t we covered yet?

[00:34:01] Artemis: Yes, I think we haven’t covered that what would be the same for intercurrent events in clinical trials and observational research is the lack of efficacy or the safety.

But then in observational research there are the personal factors: you know, the relationship that the patient has with his physician, or a family member, I don’t know, who influenced him to get this treatment. That you probably wouldn’t have in a clinical trial, or at least in the clinical trial, perhaps, you would have recorded it in the protocol.

And then also events related to non-behavioral factors, like a change of health insurance program, relocation to another place where the current therapy is unavailable, development of new conditions that contraindicate the use of the current therapy, or participation in a clinical trial that requires discontinuation of the current therapy.

[00:35:06] Alexander: A lot of additional things to have in mind when you write your intercurrent events. Yeah, yeah. Awesome. Thanks so much for going through all these different things. It is super helpful to understand what we can learn about the estimand framework from observational studies and from clinical trials, and how they actually fit together.

And it was really, really interesting to understand how treatment, population, variable, population-level summary, and intercurrent events all differ between clinical trials and observational data. Where do you see the future of real-world data research going as a result of this estimands framework?

[00:35:58] Artemis: I think, as we discussed, with real-world data we have to take into account many additional considerations in relation to estimands. We also have to understand the stakeholders and their questions of interest, and then accordingly evaluate how fit for purpose the real-world data are that we have at hand, or can find, or can seek, coupled with all these issues. And definitely articulating each of these five attributes can help identify and define appropriate estimands, and we should definitely not miss other frameworks that are closely related, like causal inference and the target trial framework.

And yeah, basically, to align the analytical methods we select with the estimands of real-world evidence studies. And definitely apply rigorous sensitivity analyses to ensure the robustness of the study findings.

[00:37:05] Alexander: Thanks so much. That was an awesome outlook and summary for this podcast episode. Thanks for being on the show, Artemis.

[00:37:15] Artemis: Thank you so much. Thank you, Alexander. It was a pleasure and honor to be here and discuss these things with you.