RWE and the FDA (Part 1)

Interview with Josephine Wolfram

  • What is Real World Evidence (RWE) about?
  • What is the perspective of the FDA on real world data and evidence?
  • The FDA guidance contains 3 big topics – what are these?

RWE is one of the hottest topics in the industry, and the way the FDA views this data source will shape the industry.

Listen to Josephine Wolfram, an expert in RWE, and me, and learn more about how the FDA sees the value of RWE and how this affects the procedures considered when approving new medicines and medical procedures that are offered to patients.

Here are some key takeaways to look forward to when listening to this podcast:

  • What is RWD from an FDA perspective?

    • FDA defines RWD as “data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources”. They give the examples of patients’ electronic health records, medical insurance claims data, data from product and disease registries, patient-generated data including from in-home use, and data gathered from other sources that can inform on health status, such as digital health technologies.

    • The likelihood here is that the data isn’t being collected primarily for the purpose that you are going to use it for; you’ll be making what’s called secondary use of that data. Some might also see data collected for primary use as RWD, as long as you don’t intervene in the patient’s care, for example by randomising a treatment. This guideline looks specifically at data from electronic health records (the notes the doctor makes when you visit them) and at claims data, which are captured through the process of payment by the health insurer (commercial providers, Medicare or Medicaid). These insurance claims, also called administrative claims data, include the diagnoses, the treatments, and the costs billed and paid.

  • What is RWE from an FDA perspective?

    • FDA defines RWE as “the clinical evidence about the usage and potential benefits or risks of a medical product derived from analysis of RWD.” I think this is a pretty clear definition, and it’s useful to distinguish clearly the data itself from the evidence that can be derived from it.

  • What are the 3 big topics of this guidance?

    • Indeed, the guidance is organised into 3 major themes: Data Sources, Study Design Elements and Data Quality.

    • Data Sources discusses the general nature of claims and EHR data and the limitations that come with that, and discusses important elements you must consider in deciding if they will meet the needs of your study.

    • Study Design Elements talks about the ascertainment and validation of the key building blocks for your study. It’s not really talking about statistical design, such as which methods you should use to compare to a control group, but FDA have said they intend to issue guidance in the future on designing studies that use RWD.

    • And finally, Data Quality is about what it says on the tin: it provides points for consideration when examining the quality of data over the course of the data life cycle, and describes what information FDA want to be available to them to enable their evaluation of the data.

  • What is the reliability and relevance of RWD?

    • Reliability and relevance are key terms that are used repeatedly throughout the guidance, and FDA provide definitions for these.

    • Reliability of RWD includes data accuracy, completeness, provenance and traceability. All terms that in turn have definitions provided! This is getting at the quality of the data and the ability to demonstrate the quality.

    • The term relevance includes the availability of key data elements and sufficient numbers of representative patients for the study. Those key data elements are information on the exposure (typically a drug) that you want to study, information on the outcomes you are interested in, and information on important covariates, particularly confounders that might influence both the choice of treatment and the outcome, as these should be accounted for. And even if you have those key data elements, if there are not enough patients in there that fit your target population, then that data source won’t be relevant for your study.

  • The guideline speaks about 4 issues, which need to be addressed to determine reliability and relevance. What are these?


  1. The appropriateness and potential limitations of the data source for the study question and to support key study elements.

  2. Time periods for ascertainment of study design elements.

  3. Conceptual definitions and operational definitions for study design elements, along with validation of these definitions. The key elements are the inclusion/exclusion criteria that define the study population, the exposure, the outcomes and the covariates.

  4. Quality assurance and quality control procedures for data accrual, curation, and transformation into the final study-specific dataset.

There’s more to come in the second part of the episode.

Listen to this episode and share this with your friends and colleagues!


Alexander: You’re listening to The Effective Statistician podcast, a weekly podcast with Alexander Schacht and Benjamin Piske, designed to help you reach your potential, lead great sciences and serve patients without becoming overwhelmed by work. Today, we are talking about real-world evidence and the viewpoint of the FDA on it. Stay tuned for a really really good discussion with Josephine Wolfram and by the way, this is part 1 of it.

There is hardly any other topic as hot as real-world evidence. It touches so many perspectives, and of course combining this hot topic with the viewpoint of the FDA makes it super hot. So I’m really glad to have Josie on the call today. She’s an expert in real-world evidence and has worked on it quite a lot, and we are talking about what the FDA currently thinks about it. What is currently out there and what is currently hot in that regard? So stay tuned for this really, really good episode. By the way, we have so much content that we split it into two episodes. This is part 1. For part 2, you may need to wait a little bit, or maybe by the time you listen to this the second part has also been published; then just scroll a little bit in your podcast player to find part 2 as well.

I’m producing this podcast in association with PSI, a community dedicated to leading and promoting the use of statistics within the healthcare industry for the benefit of patients. And PSI is doing a lot in terms of real-world evidence. There is, for example, the real-world data special interest group so if you want to learn more about this, head over to and you will find there’s this special interest group, as well as other special interest groups that might be of interest for you and now let’s dive into the content. 

Welcome to another episode of The Effective Statistician. Today I have Josie with me and we will talk about the FDA and about real-world data, which is a really, really hot topic. I recently had an episode with Rachel Tom, which we published on the first of March; within two weeks, we had over 1,000 downloads, which is amazing given that episodes very often get to about 500 to 600 after a couple of months. So you can see it’s a really, really hot topic. So I’m really happy to have Josie with me today. How are you doing?

Josie: I’m doing fine. Thank you. And you? 

Alexander: Very, very good. Maybe you can start a little bit with an introduction of yourself. How did you get into the industry and how especially did you get into real-world data? 

Josie: Yes. Hi, I’m Josephine Wolfram. I work for Astellas Pharma and in my current role, I lead a small team focused on leveraging real-world data to support development projects across various use cases. I made this leap to focusing on real-world evidence three to four years ago. Given that I joined the industry 25 years ago, I guess it’s a relatively short part of my career. Previously I was focused on providing statistical support to drug development and also marketed products, with a large part of that focus being on clinical trials. However, over the years, I did get the chance to experience what real-world data can offer in a few interesting ways. Most particularly, I was involved in a couple of post-approval safety studies, which I found fascinating, as it opened up this whole new world of different methods and different opportunities that leveraging real-world data can provide. So that piqued my interest, and then in the company we had the opportunity to change departments. I was excited to do that and I haven’t looked back.

Alexander: That’s cool. So from an organizational structure perspective, you don’t report to the statistics department anymore but to the real-world evidence group?

Josie: Yes, that’s right. I sit within the real-world data and evidence group, which for our organization sits within a division called Advanced Informatics and Analytics. But I still work quite closely with my statistical colleagues, and I think having made that move helps build a bridge between these two different groups, since we have a lot in common and it’s important to work together.

Alexander: Yeah. And statistics is probably a more operational R&D type of space, not so much in this advanced analytic space, is it? 

Josie: It sits within the development organization. I hesitate to say it is an operational organization, as statisticians are recognized for their strategic contributions.

Alexander: Yeah, I know it’s a pity that very often statistics organizations are cornered, so to say, in this part of bigger Pharma organizations. It’s not unusual that you see these kinds of splits in terms of responsibilities, and it is all the more important to work closely together and break down any silos that exist within different companies.

Josie: Fully agree and I think there are so many topics, there’s always this tension between specialization and having generalists. And actually, maybe before we continue, please let me share my disclaimer, actually, having introduced the company I work for.

Alexander: Yes, for sure. 

Josie: So, I must say that the opinions expressed in this podcast are solely those of myself and not necessarily those of Astellas. And Astellas does not guarantee the accuracy or reliability of the information provided here.

Alexander: Okay. Let’s briefly talk about a really cool thing that you’re actually co-leading, and that is the real-world data special interest group. How did that come into place?

Josie: Right. So actually, that’s the role I have that’s led to us having this conversation today. Together with Annie, I’m co-leading the PSI real-world data SIG, which started, I think, about this time last year. I’m not sure I can completely answer how it came into place because, you know, I saw that it was being formed by Annie, who I also knew from being a colleague in the past and who was leading it, and I reached out and joined the group. And then I started to co-lead with her a little bit further down the line.

Alexander: That is so cool. It’s exactly this type of community that makes PSI so great, and it helps us to work effectively within PSI. We know each other, and when there’s some kind of common interest, you can form these platforms to work together. So if you’re interested in this space, just call Josie or send an email if you want to join this group, which has lots of very interesting discussions. There are also ongoing meetings, contributions to conferences, webinars being set up and things like this. You can also learn so much about real-world data in these kinds of groups, so you don’t necessarily need to be a super expert with 20 years of experience to join. If you have a passion for it or an interest in it, if you want to contribute and move things forward, then that’s a great place to start. So much for a little bit of promotion about special interest groups. Let’s talk about real-world data, and especially today we want to focus on the FDA perspective on it. Having real-world data and FDA in one title is usually a guarantee for lots and lots of attention. So let’s start with this: when the FDA talks about real-world data, what do they actually talk about?

Josie: Yes. Well, the FDA has a definition of real-world data that they mention in their real-world evidence framework document and in the recently released draft guidance. So the FDA definition of real-world data is, and I’ll quote, “data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources”. They give examples of what those sources can be: a patient’s electronic health records, medical insurance claims data, as well as data coming from product and disease registries. It can be patient-generated data, including from in-home use, or data gathered from other sources that can inform on health status, such as digital health technologies. So I think the likelihood here is that the data isn’t being collected primarily for the purpose that you’re going to use it for, but you’re making what’s called secondary use of that data.

Alexander: It’s not necessarily that way, is it? Data relating to patient health status is actually a pretty broad thing. So how about, for example, patient-reported outcomes? Are they part of real-world data? Potentially, aren’t they?

Josie: I guess potentially, yes. I focused here perhaps on data reuse, but some would also see data collected for primary use as real-world data, as long as you’re not intervening in the patient’s care, such as by randomizing a treatment or, let’s say, taking that care away from what would happen to them under real-world treatment if you weren’t enrolling them in an experiment.

Alexander: Yeah.

Josie: But I think definitions of real-world data vary according to those making them.


Alexander: Yes. It’s quite interesting: if you, for example, look at the IFSO guidelines, they make more of a definition based on how you collect data and not so much on what data you collect. Here the FDA, I think, focuses more on the what, not the how, and the how comes a little bit later, doesn’t it? From the FDA perspective, they mention that it’s likely a kind of secondary use, but it’s not necessarily only that, is it? How would you read it?

Josie: I think you’re right that it’s not necessarily secondary use. When we look across the set of guidelines, there’s a suggestion that at least part of your data probably is secondary use, but you may complement that with additional data that you collect primarily.

Alexander: Yeah. And then the next step is, how do we get from real-world data to real-world evidence?

Josie: The definition for real-world evidence is the clinical evidence about the usage and potential benefits or risks of a medical product derived from analysis of real-world data. I think that’s a fairly clear definition, and it’s useful to distinguish clearly between the data itself and the evidence that you derive from those data. The definitions of the two are very closely intertwined that way.

Alexander: Yeah. I’ve once seen something like “real-world data plus the right statistical tools leads to real-world evidence”. Do you agree with that kind of equation or is there anything else in it? Does real-world data plus statistical tools equal real-world evidence?

Josie: I agree with that, but I think you probably have to add the right study design to it. So your research question and objectives should come first, to drive which real-world data and which statistical analysis you select to derive your evidence.

Alexander: Yep, that’s a good point. The framework, the strength of the data, and all these kinds of things play a role in the real-world evidence because, even with sophisticated methods, garbage in, garbage out. Yes, that’s great. I think the other part is “about the usage and potential benefits or risks of medical products”. So it’s about three parts. It’s not only about the benefit-risk ratio but also about how the product is used. When you see this word usage, what kind of questions come to your mind that this could be about?

Josie: I think the usage of a product in practice can be compared to what the intended usage was. If we’re talking about something that’s already marketed, is it being used in accordance with the label? It’s about understanding the nature of the population that’s really receiving a given treatment. How long are they receiving it for? What doses are they receiving? For that whole picture of the actual usage of treatments when they are on the market, real-world data is very valuable.

Alexander: I think this is really, really fascinating because that gives you data that you don’t get from clinical trials or from lots of other sources, and that can help you understand, for example, safety events that you see. Maybe in real life much higher doses are used, or much higher frequencies of doses, or maybe patients are treated who have additional comorbidities or receive a complementary therapy, and all kinds of different things. And that may lead to safety events that you haven’t seen before. So I think it is really, really important to have this data. Awesome, let’s go a little bit more into the guidance itself. I’d say there are three big topics in this guidance. What are these about?

Josie: Yes. Well, they have this set of four guidance documents that were released in four consecutive months last year, between September and December. So here we’re talking about the first one, on using electronic health records and medical claims data to support regulatory decision-making for drugs and biologics. Actually, this is the largest and I think the most comprehensive of the set of guidance documents recently released, and they organized it into three major themes, as you say: data sources, study design elements, and data quality. The first big theme, data sources, discusses the general nature of claims and electronic health records data. Let me just say EHR; we shall probably mix that with EMR. It also covers the limitations that come with those data types, and they talk about the important elements you must consider in deciding if a given data source will meet the needs of your study. Then in terms of the study design elements, that’s really about the ascertainment and the validation of the key building blocks for your study.

Alexander: So that is the real-world evidence study, so to say.

Josie: Yeah. The real-world evidence study. Though I think here they don’t talk so much about study design from a statistical perspective. They’ve indicated that there’s likely to be further guidance in that space. It’s more about the definitions of the critical concepts, like your exposure definition, your covariates, your population and your outcomes, and the properties around those definitions and how the data can support them.

Alexander: Yeah, that is really good. In terms of these aspects, I think covariates and outcomes, endpoints, are a pretty clear thing and we know them from clinical trials. Exposure is a very, very different beast if you look into real-world data, isn’t it?

Josie: Well, you need to know that the exposure you’re interested in studying can be reliably ascertained from your data source. So, say you’re using medical records data and you don’t see evidence that they received a certain treatment in that data source: are you confident they didn’t receive the treatment? An obvious example might be a drug that’s available over the counter; maybe you’re going to misclassify them as not exposed to your treatment when actually they were. And then you need to be able to make sure you can capture the indication for which they received the exposure fully within your data source, though that’s changing topic a little bit more to your inclusion-exclusion criteria. Then there is whether you know the dose. Maybe they were prescribed the treatment, but do you have evidence that they actually filled that prescription and took it? That’s not a unique problem, actually; in clinical trials, I guess you also have that issue.

Alexander: That’s a big topic. That’s really interesting. I think when you look into these kinds of aspects, don’t just glance over them thinking it’s clear, clear, clear; it’s actually not. There’s much more thought to be put in than in a clinical trial, where we just write things into the protocol and so they’re collected that way; here, they usually are not.

Josie: Yes, absolutely. And in this guidance document, FDA really put a lot of emphasis on defining your conceptual definition, your case definition for exposure and for outcomes, and then also on how you are going to validate that your operational definition is really pulling out the concept that you wanted. They’re really seeking information on what you’ve done to validate that. Have you taken ideally all, or if not, some random sample of your patients and then compared some kind of alternative data source or source notes against what you’ve derived from your operational definitions, to assess whether you have classified the patients correctly and to quantify any misclassification you have? We’re talking about sensitivity, specificity, positive predictive value, and negative predictive value: measures that help quantify the extent of misclassification you may have.
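As a minimal sketch of the four measures named here, assume a hypothetical validation sample in which each patient has been classified twice: once by the operational definition and once by a reference source such as chart review. The counts, the class name `ValidationCounts` and the function name are illustrative assumptions, not taken from the guidance.

```python
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    """2x2 table: operational definition versus reference standard (hypothetical)."""
    tp: int  # flagged by the operational definition, confirmed by the reference
    fp: int  # flagged by the operational definition, not confirmed
    fn: int  # missed by the operational definition, present in the reference
    tn: int  # not flagged, and absent in the reference


def misclassification_metrics(c: ValidationCounts) -> dict:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Illustrative counts only, not real study data.
counts = ValidationCounts(tp=80, fp=10, fn=20, tn=90)
metrics = misclassification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that PPV and NPV depend on how common the condition is in the validation sample, so values from one data source may not carry over to another.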

Alexander: Yeah. Well, you can maybe rely on some other research in the same field that someone else has already done and published, and you can reuse certain things. Yeah, that is really good. What’s the third topic?

Josie: The third topic is data quality: whether the quality of the data suffices to support your study. They give points for examining data quality over the course of the data lifecycle, outlining what information the FDA would like to have available to enable their evaluation of data quality. So they recommend that automated data quality reports be generated. From a clinical trial perspective, I think we are familiar with FDA inspections starting from the data collection point and wanting to follow that journey through to your report. And I think that same thinking is applying here in the real-world data space. They want to be assured of data integrity from source to use in the study. I think here, the complexity comes in the ecosystem, because it’s not just the sponsor that collected that data; it has come through vendors and then sort of chains of identification.
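One very small piece of an automated data quality report could be a per-field completeness check. The sketch below assumes records are plain dictionaries and treats `None`, an empty string, or a missing key as "not recorded"; the field names, codes and the function `completeness_report` are hypothetical, and a real report would cover far more (accuracy, provenance, traceability).

```python
def completeness_report(records, fields):
    """Return, for each field, the proportion of records with a recorded value."""
    n = len(records)
    report = {}
    for field in fields:
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = present / n if n else 0.0
    return report


# Illustrative extract only, not real patient data.
records = [
    {"patient_id": "p1", "diagnosis_code": "E11", "dose_mg": 500},
    {"patient_id": "p2", "diagnosis_code": "", "dose_mg": None},
    {"patient_id": "p3", "diagnosis_code": "I10", "dose_mg": 250},
    {"patient_id": "p4", "diagnosis_code": "E11"},  # dose_mg key is absent
]

report = completeness_report(records, ["patient_id", "diagnosis_code", "dose_mg"])
for field, share in report.items():
    print(f"{field}: {share:.0%} complete")
```

Running such a check at each step of the data journey (source extract, curated data, analysis dataset) is one way to show where values are lost.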


Alexander: Yeah, it’s not as easy as clinical trials, for sure. The other point is that it’s not stable. A clinical trial will finish and be locked. With real-world data, especially when you look into claims data and all these kinds of things, the databases are updated on a continuous basis. So that’s another piece that is very, very different from clinical trials, and having that in mind is important. Data quality is always a really, really big thing for real-world data. There are two things that come up again and again in the guidance: reliability and relevance. What are they, actually?


Josie: Right. As you say, these terms do appear in many places in the guidance documents. They actually have sub-definitions within them; there is a glossary where they define all these terms, which is a useful reference. So with the reliability of real-world data, they’re talking about data accuracy, data completeness, data provenance, and data traceability. And each of those terms in and of itself also has a definition. So reliability is getting at the quality of the data and the ability to demonstrate that quality as well.

Alexander: Okay. And I think here, quality is not a tick box of high quality versus no quality. It’s probably a more granular assessment of quality, isn’t it?

Josie: I think you’re right. And I think it has to relate to the intended objectives of the study itself. Perhaps the data have sufficient quality for one study and not for another, according to which parts of the data you’re using.

Alexander: Yeah. And you can also probably assess what the likely impact of the quality is. Will it, for example, likely tend to overestimate or underestimate something? Which direction is the bias going? These kinds of things, isn’t it?

Josie: Yeah, absolutely. And I think if you look through some of the comments that have been publicly shared by various groups that have commented on the guidelines, this question of what they mean by quality comes up. There’s a request to give perhaps more detail on what quality really needs to entail, and also an encouragement to take a risk-based approach to looking at that. What’s the impact on potential bias in the study?

Alexander: Yeah. That’s a really good point. So for one aspect that might be good enough for another aspect, it’s not good enough. Yeah, that’s great. What about relevance? I’m not sure I’ve heard that term very often within statistics other than relevant clinically meaningful differences or something like this. What’s the relevance?

Josie: So relevance is referring to the availability of key data elements that are needed for your study, as well as sufficient numbers of representative patients. Perhaps you could say that data reliability is more an attribute of the data, and relevance is really about how that data source relates to your objectives. So when they talk about the availability of the key data elements, those elements are the exposure, the outcomes, and also information on covariates, most particularly information on confounders. Those are variables that might influence both the choice of treatment for a patient and their outcome, and you need to have them present to be able to adjust for them using appropriate statistical methods to support inferences about the treatment. So that’s relevance in terms of having the data elements. And even if you have those key data elements, if there are not enough patients in there that fit your target population, then that’s a bit like when the clinically relevant difference is too small for the number of patients you’re going to have available. Of course, the data are not going to support that.

Alexander: Yeah. And I think looking into the availability of covariates is a really, really important thing because, in clinical trials, we can just include them in the protocol and think about ‘all the things that could be important, let’s collect those’. But that’s not the case in real-world data, so that’s an important thing. The guidelines speak specifically about four issues that need to be addressed to determine reliability and relevance. Can you speak a little to these?

Josie: Yes. So they speak about the appropriateness and potential limitations of the data source for the study question, and to support those key study elements that we described. They talk about the time periods for ascertainment of the study design elements. When you collect data prospectively, your time period is clear, with the visit at your baseline or whatever, but particularly if you have a retrospective study, you have to define what those time periods are.

Alexander: Yeah, that’s a really important thing, and in this previous episode with Rachel, we talked quite a lot about the index date, and that is not rocket science. But for sure you need to really deeply understand your data and know what is really baseline, where patients start, and how you can align all the different patients in a certain way. In clinical trials, that automatically aligns with the study design; that’s not the case in real-world data. And especially if you look into diseases that are chronic for a very long time, where patients go on drug, off drug, on drug again and all kinds of different things happen, that can be quite difficult.

Josie: And particularly for your comparator group. So you may say the index date is the start of your exposure of interest, but if you compare them to patients who don’t receive that exposure, when didn’t they start?

Alexander: You can’t define when they didn’t start a placebo.


Josie: Right.
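The index date idea can be sketched under one common but assumed convention: the index date is the first recorded exposure, and baseline covariates are taken from a fixed lookback window ending the day before it. The 365-day window, the event codes and both function names are hypothetical choices for illustration; as discussed above, picking an index date for unexposed comparators needs a separate, study-specific rule that this sketch does not attempt.

```python
from datetime import date, timedelta


def index_date(exposure_dates):
    """Index date assumed to be the earliest recorded exposure (None if unexposed)."""
    return min(exposure_dates) if exposure_dates else None


def baseline_events(events, index, lookback_days=365):
    """Events falling in the window [index - lookback_days, index)."""
    start = index - timedelta(days=lookback_days)
    return [e for e in events if start <= e["date"] < index]


# Illustrative patient history only.
exposures = [date(2021, 6, 15), date(2021, 3, 1), date(2021, 9, 30)]
events = [
    {"date": date(2019, 12, 1), "code": "I10"},   # before the lookback window
    {"date": date(2020, 11, 20), "code": "E11"},  # inside the baseline window
    {"date": date(2021, 4, 2), "code": "N18"},    # after the index date
]

idx = index_date(exposures)
baseline = baseline_events(events, idx)
print(idx, [e["code"] for e in baseline])
```

Events after the index date are deliberately excluded from baseline, since they may already be affected by the treatment.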

Alexander: What is the third point? 

Josie: The third point is around the conceptual definitions and operational definitions for the study design elements. These two terms, conceptual and operational definitions, are again used throughout the document. Your conceptual definition is the concept you’re trying to capture; your operational definition is what codes and what vocabulary you are going to look for, and in which table in your data.

Alexander: Yeah. The former is more what would be in the protocol and the latter is more what would be in the SAP, so to say.

Josie: Yeah. And these definitions are needed for the key elements, which are again the inclusion-exclusion criteria that define your study population, the exposure, the outcomes, and the covariates: these critical pieces.
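To make the conceptual versus operational distinction concrete, here is a hypothetical sketch. The conceptual definition is "patient has type 2 diabetes"; the operational definition assumed here is "at least two claims on different dates with an ICD-10 code starting with E11". The two-claim rule, the data layout and the function name are assumptions for illustration, not a validated algorithm and not something prescribed by the guidance.

```python
def meets_operational_definition(claims, code_prefixes=("E11",), min_claims=2):
    """True if qualifying codes appear on at least `min_claims` distinct dates."""
    qualifying_dates = {
        c["date"] for c in claims if c["code"].startswith(code_prefixes)
    }
    return len(qualifying_dates) >= min_claims


# Illustrative claims only.
patient_a = [
    {"date": "2021-01-05", "code": "E11.9"},
    {"date": "2021-03-10", "code": "E11.65"},
    {"date": "2021-03-10", "code": "I10"},
]
patient_b = [
    {"date": "2021-02-01", "code": "E11.9"},
    {"date": "2021-02-01", "code": "E11.9"},  # duplicate claim on the same day
]

print(meets_operational_definition(patient_a))
print(meets_operational_definition(patient_b))
```

Writing the rule down this explicitly is what makes it possible to validate it against a reference source and report its misclassification.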


Alexander: Let’s go to the last of the four issues that the guideline talks about which has to do with quality assurance and quality control. How does that play a role here?

Josie: So this is coming back to the data quality topic that we were talking about just before. It’s about the procedures that are in place for data accrual, curation, and transformation into the final study-specific dataset: the journey that data takes from a patient through to your final analytic dataset.

Alexander: Yeah, so what is usually handled by data management in clinical trials.

Josie: Right.

Alexander: And I guess here you need to have much more statistical input into it, because it’s just not that straightforward. It’s much messier. Okay, very good. We already talked quite a lot about all kinds of different things around this guidance. And to be honest, in the outline, we had at least enough material for yet another episode on this topic because it’s such a rich topic. We talked about what real-world data and real-world evidence actually are, how they are looked at from an FDA perspective, and how that compares to other definitions. We looked at the three big topics of this guidance, which are data sources, study design elements, and data quality. And we talked about two key terms, reliability and relevance. I’m pretty sure we could talk for another hour about these, but given that we are already quite long into the episode, let’s have a cut here. Thanks so much, Josie, for this awesome discussion. Do you have any final recommendations for a listener who would like to read this guidance? What would be your number one recommendation? Is that starting with the definitions or the background, or what would be your guidance for our reader?

Josie: Wow, that’s a great question. Perhaps it depends on the level of experience. I think if you’re quite new to the area, then starting with those definitions is indeed a good place to start. I’m sure some more experienced colleagues who have a particular interest in the data quality area or in the validation area are likely to zoom into those areas. I think the other big area that’s interesting, and that we didn’t talk about so much, is the importance of pre-specification, early consultation, and some of these themes. I would also call out to the reader to keep an eye on expectations from that perspective, too.

Alexander: Yeah. Maybe we can do another episode about this.

Josie: I would be delighted. 

Alexander: Okay, so watch out for more to come about this topic in the future. Thanks so much, Josie. 

Josie: Thank you. 

Alexander: This show was created in association with PSI. Thanks to Reine and Kacey, who helped with the show in the background. And thank you for listening. Reach your potential, lead great sciences, and serve patients. Just be an effective statistician.
