How to identify and address data biases when creating an ECA

In this blog, we’ll explore the most common biases encountered in ECAs and how to mitigate their impact on research results.

Datasets used for an ECA are a reflection of the people and circumstances in which they were created. And since people and circumstances are rarely perfect, large datasets typically include biases that could skew the outcomes of research in one direction or another.

In the context of a clinical trial, where we are interested in assessing the differences in outcomes based on specific exposures, we may consider the results biased if the data used lacks internal validity or proper control of underlying factors that could influence outcomes. Naturally, minimizing bias is critical.

Ideally, researchers conducting a traditional randomized clinical trial (RCT) will eliminate as much bias as possible at the design stage via randomization, blinding, and transparent data collection. But in an ECA, blinding and randomization aren't possible. That creates new challenges for addressing biases in both the design and analysis phases.

The five types of bias in ECA datasets

ECA biases can be divided into five main buckets:

  1. Selection bias
  2. Misclassification or information bias
  3. Confounding bias
  4. Immortal time bias
  5. Temporal bias

Let’s take a closer look at each.

Selection bias arises from selecting patients based on characteristics related to either the exposure or the outcome, creating systematic differences between the groups chosen for different interventions or for outcome analysis.

When creating an ECA, we are most interested in identifying a set of patients comparable to the patients in the investigational arm. We do this by applying the same set of inclusion and exclusion criteria to both groups to create a fit-for-purpose (FFP) database. If the data is incomplete or differs significantly from the investigational arm in some way, selection bias can creep in.
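
To make this concrete, here is a minimal pandas sketch of applying one shared set of inclusion/exclusion criteria to both cohorts. The column names and thresholds are hypothetical stand-ins for a real protocol's criteria, not a prescription:

```python
import pandas as pd

# Toy cohorts; the columns (age, ecog, prior_lines) are hypothetical
# placeholders for whatever the protocol's criteria actually use.
trial_df = pd.DataFrame({"age": [64, 17, 58], "ecog": [1, 0, 2], "prior_lines": [0, 1, 1]})
rwd_df = pd.DataFrame({"age": [70, 49, 61], "ecog": [1, 1, 3], "prior_lines": [2, 0, 1]})

def apply_ie_criteria(df: pd.DataFrame) -> pd.DataFrame:
    """Apply one shared set of inclusion/exclusion criteria."""
    return df[
        (df["age"] >= 18)           # inclusion: adults only
        & (df["ecog"] <= 1)         # inclusion: adequate performance status
        & (df["prior_lines"] < 2)   # exclusion: heavily pretreated patients
    ]

# The identical filter is applied to both cohorts, so neither arm is
# selected on criteria the other was not held to.
trial_arm = apply_ie_criteria(trial_df)
eca_arm = apply_ie_criteria(rwd_df)
```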

Sometimes, minor imbalances can be handled with propensity score matching or inverse probability weighting at the analysis stage. But researchers should strive to develop the best possible dataset from the beginning to avoid potential complications.
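
As an illustrative sketch (not a full analysis), inverse probability of treatment weighting fits a propensity model and weights each patient by the inverse probability of the arm they actually occupy. The covariates and simulated data below are assumptions for the example:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Simulated cohort for illustration: treated = 1 (investigational arm),
# 0 (ECA). Real analyses would use the study's actual covariates.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "severity": rng.integers(0, 3, n),
})
df["treated"] = rng.binomial(1, 1 / (1 + np.exp(-(df["age"] - 60) / 10)))

# Step 1: propensity model, P(treated | covariates).
ps = LogisticRegression().fit(df[["age", "severity"]], df["treated"])
df["ps"] = ps.predict_proba(df[["age", "severity"]])[:, 1]

# Step 2: inverse probability of treatment weights (ATE form):
# treated patients weighted by 1/ps, controls by 1/(1 - ps).
df["iptw"] = np.where(df["treated"] == 1, 1 / df["ps"], 1 / (1 - df["ps"]))

# Extreme weights signal poor overlap between the arms; truncation at
# a high percentile is a common, if blunt, safeguard.
df["iptw"] = df["iptw"].clip(upper=df["iptw"].quantile(0.99))
```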

Misclassification or information bias occurs when key covariates and outcomes are captured differently in the clinical trial and the ECA. The best way to avoid this bias is at the design stage, by ensuring that the two data sources capture information in a similar way and at a similar quality. At the analysis stage, researchers can conduct sensitivity analyses to quantify the bias and its impact on the treatment estimates.
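
One common sensitivity analysis is quantitative bias analysis for outcome misclassification. The sketch below uses the standard back-correction formula under assumed sensitivity and specificity of outcome capture; the numbers are illustrative, not real trial values:

```python
def corrected_risk(observed_risk: float, sensitivity: float, specificity: float) -> float:
    """Back-correct an observed outcome proportion for nondifferential
    misclassification, given assumed sensitivity/specificity of outcome
    capture (standard quantitative bias analysis formula)."""
    return (observed_risk + specificity - 1) / (sensitivity + specificity - 1)

# Sweep plausible capture quality for the RWD arm and see how the
# risk difference versus the trial arm moves.
trial_risk = 0.30          # assumed well-captured in the trial
observed_rwd_risk = 0.25   # as recorded in the RWD source
for se in (0.80, 0.90, 0.95):
    for sp in (0.95, 0.99):
        rd = trial_risk - corrected_risk(observed_rwd_risk, se, sp)
        print(f"sensitivity={se:.2f} specificity={sp:.2f} -> risk diff={rd:+.3f}")
```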

Confounding bias is a major concern for ECAs. This type of bias distorts the estimated effect of exposure on outcome through the influence of other variables that are associated with both the exposure and the outcome.

For example, one of the arms may include a disproportionate number of patients with higher disease severity, leading to erroneous conclusions about the benefits (or lack thereof) of the intervention. This is somewhat different from selection bias. In selection bias, patients with certain features are actively excluded from analysis to begin with, which affects the external validity of the data. With confounding bias, these patients are included in the analysis, which can compromise the internal validity of the results.

Researchers can compensate for confounding bias at the design stage with matching and population restriction. At the analysis stage, they can use matching, weighting, stratification, and regression to adjust for confounders and reduce their impact.
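
As a toy illustration of regression adjustment (using statsmodels, with simulated data standing in for real patients), comparing a crude and a covariate-adjusted estimate shows how controlling a measured confounder shifts the treatment effect:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data in which `severity` confounds the treatment-outcome
# relationship: milder cases are treated more often, and severity
# itself drives the outcome.
rng = np.random.default_rng(1)
n = 1000
severity = rng.integers(0, 3, n)
treated = rng.binomial(1, 0.3 + 0.2 * (severity == 0))
outcome = rng.binomial(1, 0.2 + 0.15 * severity - 0.05 * treated)
df = pd.DataFrame({"treated": treated, "severity": severity, "outcome": outcome})

# Crude vs. covariate-adjusted logistic regression: the adjusted odds
# ratio is the estimate with the measured confounder controlled.
crude = smf.logit("outcome ~ treated", data=df).fit(disp=0)
adjusted = smf.logit("outcome ~ treated + C(severity)", data=df).fit(disp=0)
print("crude OR:   ", np.exp(crude.params["treated"]))
print("adjusted OR:", np.exp(adjusted.params["treated"]))
```

Note that regression can only adjust for confounders that are measured in both data sources, which is why the design-stage choice of RWD matters so much.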

Immortal time bias happens when the starting timepoint for assessing outcomes differs between the ECA and the investigational arm, giving one of the arms an artificial advantage.

This bias is best avoided at the design stage by carefully considering when the follow-up period starts relative to when a patient is assigned to treatment. However, challenges arise when the standard of care is no treatment, since there is then no treatment-initiation date to anchor on. Possible solutions include using a discrete, identifiable clinical event, such as the date of surgery or radiation (if treatment is initiated soon after), or the date of a myocardial infarction or stroke.
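
A minimal sketch of that idea, assuming hypothetical column names: anchor the ECA's time zero to the chosen clinical event, and drop records whose outcome predates it, since those patients never truly entered the at-risk period:

```python
import pandas as pd

# Hypothetical ECA records with a candidate anchoring event (surgery).
eca = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "surgery_date": pd.to_datetime(["2021-03-01", "2021-06-15", None]),
    "death_or_censor_date": pd.to_datetime(["2022-01-10", "2021-09-01", "2022-05-05"]),
})

# Anchor time zero to the discrete clinical event so the ECA follow-up
# clock starts at the same kind of milestone as the trial arm's
# treatment start.
eca = eca.dropna(subset=["surgery_date"]).copy()
eca["followup_days"] = (eca["death_or_censor_date"] - eca["surgery_date"]).dt.days

# Guard: negative follow-up means the outcome preceded the index
# event, i.e., the patient was never actually at risk after time zero.
eca = eca[eca["followup_days"] >= 0]
```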

Temporal bias arises when there are historical differences in clinical practice and in the collection of real-world data (RWD) compared with the present, such as when a study conducted several decades ago is compared with a modern investigation. This bias can be largely eliminated by using a contemporaneous RWD source and demonstrating that the standard of care remains similar. If the treatment landscape has changed over time, appropriate data cut-offs can be applied to the RWD.
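
In code, this can be as simple as restricting the RWD to an era with a comparable standard of care. The date window below is illustrative, not a recommendation; it should be set from the actual treatment landscape for the indication:

```python
import pandas as pd

# Hypothetical RWD with each patient's index date.
rwd = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "index_date": pd.to_datetime(["2009-05-01", "2017-02-10", "2021-08-20"]),
})

# Keep only patients indexed within the era whose standard of care
# matches the trial's.
era_start, era_end = pd.Timestamp("2015-01-01"), pd.Timestamp("2022-12-31")
contemporaneous = rwd[rwd["index_date"].between(era_start, era_end)]
```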

Getting ahead of biases in ECA design and analysis

Biases are by no means limited to ECAs; they can also occur in traditional RCTs. But ECAs often face additional challenges when collecting and curating real-world data that is appropriate for use, especially if values are missing at the source. Incomplete data collection and missing values can create a multi-bias situation, in which selection bias, misclassification bias, and confounding bias all compound to degrade results across the trial.

For example, Omburtamab is a drug that was discussed at the FDA's Oncologic Drugs Advisory Committee (ODAC) in October 2022. A single-arm trial conducted in the US was compared with an ECA constructed from patients in a German registry. Several of the biases discussed above, including immortal time bias and selection bias, were cited as reasons for not taking a positive action during the meeting.

Researchers can avoid a similar fate by carefully choosing their RWD at the design stage based on the completeness of the elements that directly relate to the inclusion and exclusion criteria at hand. When gaps are unavoidable, researchers can use a variety of sensitivity analyses to compensate. Additionally, teams should ensure appropriate firewalls between the data investigation and feasibility teams and the analysts who conduct the outcome analysis.
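
A simple completeness screen along these lines can be run during feasibility, before any outcome data is touched. The column names and the 80% threshold below are illustrative assumptions:

```python
import pandas as pd

# Hypothetical candidate source, limited to the elements the planned
# inclusion/exclusion criteria depend on.
rwd = pd.DataFrame({
    "age": [62, 55, None, 71],
    "ecog": [1, None, None, 0],
    "stage": ["III", "IV", "IV", None],
})
required = ["age", "ecog", "stage"]

# Per-element completeness: a quick screen for whether the source can
# support the planned criteria at all.
completeness = rwd[required].notna().mean().sort_values()
print(completeness)

# Flag elements that fall below a prespecified threshold.
threshold = 0.80
print(completeness[completeness < threshold])
```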