Solving for Missingness: Enhancing the Utility of Real-World Data for Oncology Research

Real-world data (RWD) from electronic health records (EHRs) is increasingly recognized as a powerful asset for oncology research. It enables stakeholders across industry and academia to evaluate treatment patterns, assess comparative effectiveness, identify safety signals, and inform regulatory and market access strategies. RWD has supported external control arms, augmented clinical trial enrollment, and generated post-marketing insights that are gaining acceptance among regulators and payers alike, as described in the FDA’s Real-World Evidence Framework.

Yet the very nature of RWD introduces complexity. Because it reflects care delivered in heterogeneous, uncontrolled environments, the completeness and consistency of data capture can vary significantly. Critical clinical elements are often buried in unstructured physician notes or are inconsistently documented across sites.

The potential variability in real-world documentation creates the need for robust data curation processes and a strong quality management system. At COTA, we address this challenge through a hybrid methodology that combines technology with deep oncology expertise. Our process includes:

  • Technology-enabled data curation that leverages both available structured data and trained, audited manual abstraction
  • Medical review to validate clinical plausibility and contextualize ambiguous documentation
  • The development of algorithms to help bridge the gap between real-world data availability and the desired application(s), such as external control arms or other regulatory use cases

One example of necessary algorithm development is the identification of lines of therapy (LoT) in oncology. In clinical trials, LoT is protocol-defined and prospectively captured. In routine practice, clinicians sequence treatment according to patient needs, yet documentation often does not make clear which line of therapy a given regimen represents. Dates of administration and prescribed agents may be present in the EHR, but without explicit annotations, algorithms may struggle to infer line progression. COTA’s LoT algorithms are developed internally, drawing on clinical expertise and extensive knowledge of our data, and are subsequently vetted with external oncologists.
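
To make the inference problem concrete, the following is a minimal, purely illustrative sketch of a rule-based LoT assignment in Python. It is not COTA's algorithm. The function name infer_lines, the Line record, and the two thresholds (a 28-day window for assembling a regimen and a 120-day treatment gap) are hypothetical choices made for this example only; production rules are typically disease-specific and clinically reviewed.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Line:
    """One inferred line of therapy (hypothetical record for this sketch)."""
    number: int
    start: date
    last_admin: date
    agents: set = field(default_factory=set)


def infer_lines(administrations, regimen_window_days=28, gap_days=120):
    """Assign (date, agent) administrations to lines of therapy.

    Assumed rules, for illustration only:
      * agents first given within `regimen_window_days` of a line's start
        belong to that line's regimen;
      * a new agent after that window starts a new line;
      * a gap of more than `gap_days` since the last administration
        also starts a new line.
    """
    lines = []
    for admin_date, agent in sorted(administrations):
        current = lines[-1] if lines else None
        starts_new_line = (
            current is None
            or (admin_date - current.last_admin).days > gap_days
            or (
                agent not in current.agents
                and (admin_date - current.start).days > regimen_window_days
            )
        )
        if starts_new_line:
            current = Line(number=len(lines) + 1, start=admin_date,
                           last_admin=admin_date)
            lines.append(current)
        current.agents.add(agent)
        current.last_admin = admin_date
    return lines


# Made-up administration records for a single patient.
records = [
    (date(2023, 1, 5), "carboplatin"),
    (date(2023, 1, 5), "pemetrexed"),
    (date(2023, 1, 26), "carboplatin"),
    (date(2023, 1, 26), "pemetrexed"),
    (date(2023, 6, 1), "docetaxel"),
]

for line in infer_lines(records):
    print(line.number, line.start, sorted(line.agents))
```

With these made-up records, the two agents started on the same day are grouped into one regimen, and the later agent is assigned to a second line. A sketch this simple ignores maintenance therapy, dose holds, and clinically equivalent substitutions, which is where clinical expertise and oncologist review become necessary.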

We apply this approach to data curation and algorithm development across multiple oncology research use cases. For example:

COTA’s methodological rigor ensures that the curated RWD is both scientifically credible and analytically reliable. Our flexible data model also allows for customized algorithm development, enabling sponsors and investigators to tailor datasets to specific protocols or regulatory requirements.

As regulatory agencies, life sciences companies, and clinicians continue to embrace RWD as a complement to traditional evidence generation, addressing nuances of real-world documentation with precision and transparency will be essential. At COTA, we are committed to delivering real-world oncology data that is curated with clinical intent and designed to support research that meets the highest standards.