Minimizing “missingness” in cancer mortality statistics with real-world data

When it comes to analyzing patterns in clinical oncology care, sometimes what’s missing is just as important as what’s present. 

Researchers need complete and comprehensive datasets to generate an accurate portrait of how patients interact with different therapies and treatment options, especially when trying to use real-world data (RWD) to compare the impact of new approaches to existing standards of care.

When key data elements are missing, it indicates a problem somewhere in the pathway between clinic visit and data curation. Did a provider fail to capture the right documentation? Is there an issue with how technical standards are applied when transferring datasets from one system to another? Are there gaps in the methodology used during data curation?

Understanding why, where, when, and how data elements go missing is a crucial part of working successfully with partners to improve the RWD lifecycle. 

But life-changing cancer research can’t wait for every variable to fall perfectly in line across the entire data ecosystem – and it doesn’t have to. 

Instead, researchers can address “missingness” from a different angle: finding reliable and trustworthy proxies, alternatives, and supplements for missing elements that can offer equally (or more) powerful insights into high-priority areas of concern.

To demonstrate the validity of this strategy, researchers from COTA and the University of Texas Southwestern Medical Center explored the best method for identifying mortality data in more than 20,000 patients with certain hematological cancers. Accurate mortality data is essential to cancer research and fundamental to assessing the clinical benefit of therapeutic agents.

The research, published in the Journal of Clinical Oncology, looked at how a composite mortality variable, generated by combining EHR data with obituary data sources, compared to using either of these data sources alone. The team also investigated how the composite variable performed against gold-standard information from the National Death Index.

The team found that the use of a composite mortality variable actually improved the capture of death data as compared to either structured EHR data or obituary data sources independently. The composite RWD variable also performed strongly in comparison to the National Death Index in terms of assessed validation metrics.
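To make the idea concrete, here is a minimal sketch of how a composite mortality flag could be built and checked against a reference source. It is illustrative only and is not the method used in the study: the "flag a death if either source reports one" rule, the choice of metrics (sensitivity, specificity, PPV, NPV), the field names, and the six toy patient records are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PatientRecord:
    """Hypothetical per-patient death flags from each source."""
    ehr_death: bool        # death recorded in structured EHR data
    obituary_death: bool   # death found in an obituary data source
    reference_death: bool  # death per the reference standard (e.g. NDI)


def composite_death(record: PatientRecord) -> bool:
    """Assumed rule: a patient is flagged as deceased if ANY source says so."""
    return record.ehr_death or record.obituary_death


def validation_metrics(
    records: List[PatientRecord],
    predict: Callable[[PatientRecord], bool],
) -> Dict[str, Optional[float]]:
    """Compare a death flag against the reference standard."""
    tp = fp = tn = fn = 0
    for rec in records:
        predicted, actual = predict(rec), rec.reference_death
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        # Return None rather than dividing by zero on empty categories.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data, invented for illustration only.
records = [
    PatientRecord(ehr_death=True, obituary_death=False, reference_death=True),
    PatientRecord(ehr_death=False, obituary_death=True, reference_death=True),
    PatientRecord(ehr_death=False, obituary_death=False, reference_death=True),
    PatientRecord(ehr_death=False, obituary_death=False, reference_death=False),
    PatientRecord(ehr_death=True, obituary_death=True, reference_death=True),
    PatientRecord(ehr_death=False, obituary_death=False, reference_death=False),
]

ehr_only = validation_metrics(records, lambda r: r.ehr_death)
obituary_only = validation_metrics(records, lambda r: r.obituary_death)
composite = validation_metrics(records, composite_death)

for name, metrics in [("EHR", ehr_only), ("Obituary", obituary_only), ("Composite", composite)]:
    print(name, metrics)
```

In this toy dataset each source on its own captures half of the reference deaths, while the composite flag captures three of the four. A real analysis would also need to reconcile dates of death across sources and handle patient linkage, both of which are out of scope for this sketch.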

These results indicate that composite variables, compiled from available RWD sources, can be an effective way to gain clarity on endpoints that support more robust and accurate clinical research, and can help to bridge unintentional gaps in raw datasets. The benefits also extend beyond research: inaccurate mortality data may cause unnecessary burden for hospital systems through erroneous patient follow-up and other operational inefficiencies.

By taking a creative approach to developing composite measures that are stronger than any single element alone, research teams can access valuable insights into important areas of concern even when their original source data has some degree of missingness.

As RWD becomes more deeply integrated into the clinical research process, it will become increasingly important to establish and share this type of evidence on how best to use available data to support meaningful breakthroughs in cancer research.