Understanding Sample Size: Lessons from the SOHO Trial

In our latest CRITKNOW episode (SOHO trial: https://youtu.be/S9ClzTV5z9A?si=p2Kvz5oJ0rtVlCWX), we discussed the assumptions underlying the sample size calculation for the study. In this post, we take that conversation a step further and explore the key considerations that influence how researchers determine sample size.

Unless otherwise specified, the discussion below assumes a parallel-arm superiority trial.

The Four Key Components of Sample Size Calculation (Figure)

1. Alpha (Type I Error Rate)

Alpha is the probability of concluding that a difference exists when none truly does: a false positive, or incorrect rejection of the null hypothesis.

By convention, most studies use an alpha of 5%.
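One way to see what a 5% alpha means is to simulate many trials in which the two arms truly do not differ and count how often a test still comes out "positive". The sketch below is illustrative only: the 18% mortality in both arms, the 500 patients per arm, the 2000 simulated trials, and the simple unpooled z-test are all assumptions chosen for the example, not details of any real trial.

```python
import random
from math import sqrt
from statistics import NormalDist


def trial_is_positive(p_control, p_treat, n_per_arm, rng, alpha=0.05):
    """Simulate one two-arm trial; return True if a two-sided z-test rejects."""
    deaths_c = sum(rng.random() < p_control for _ in range(n_per_arm))
    deaths_t = sum(rng.random() < p_treat for _ in range(n_per_arm))
    pc, pt = deaths_c / n_per_arm, deaths_t / n_per_arm
    se = sqrt((pc * (1 - pc) + pt * (1 - pt)) / n_per_arm)
    if se == 0:
        return False  # degenerate arms (all or no events): skip the test
    return abs(pc - pt) / se > NormalDist().inv_cdf(1 - alpha / 2)


rng = random.Random(2024)  # fixed seed so the run is reproducible
n_trials = 2000            # hypothetical number of simulated trials

# Both arms share the same true mortality, so every "positive" is a false positive.
false_positives = sum(trial_is_positive(0.18, 0.18, 500, rng) for _ in range(n_trials))
false_positive_rate = false_positives / n_trials
print(f"False-positive rate: {false_positive_rate:.3f}")
```

With enough simulated trials, the printed rate should sit close to the chosen alpha.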

Implication for sample size:
Lowering the alpha (e.g., to 1%) reduces the risk of false positives but requires a larger sample size.

2. Beta (Type II Error Rate) and Power

Beta represents the probability of failing to detect a difference when one actually exists—a false negative.

Researchers typically choose a beta of 10% or 20%, corresponding to a statistical power of 90% or 80%, respectively.

Implication for sample size:
Lowering beta (i.e., increasing power) increases the required sample size.
For example, a study designed with 90% power will need more participants than one designed with 80% power.
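The effect of alpha and power on sample size can be sketched with the standard normal-approximation formula for comparing two proportions. This is a rough illustration, not the calculation any particular trial used: the function name `n_per_arm` is made up for this post, the 18% control mortality and 6% absolute reduction are the SOHO assumptions, and the grid of alpha and power values is chosen purely for comparison.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_treat, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided comparison of two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile matching power = 1 - beta
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2)


# 18% vs 12% mortality; the alpha and power values below are illustrative.
for alpha in (0.05, 0.01):
    for power in (0.80, 0.90):
        n = n_per_arm(0.18, 0.12, alpha=alpha, power=power)
        print(f"alpha={alpha:.2f}, power={power:.0%}: about {n} per arm")
```

Dedicated sample size software applies refinements (continuity corrections, allowance for dropout), so real protocols will report somewhat different numbers. The direction of the effect is the same.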

3. Control Arm (Baseline Event Rate)

This refers to the expected rate of the outcome (e.g., mortality) in the control or standard care group. Ideally, this estimate should be informed by prior data from similar populations—either from previous trials or observational studies. In the absence of such data, a pilot phase may help inform this assumption.

Implication for sample size:
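If the actual control event rate turns out to be lower than expected, the trial accrues fewer events than planned. When the treatment effect is roughly constant in relative terms, the absolute difference between the arms shrinks as well, and the trial’s power decreases. This is a common and important issue in clinical trials.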
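How much power changes depends on whether the treatment effect is thought of in relative or absolute terms. The sketch below compares the two views using a normal approximation. Everything in it is an assumption for illustration: the 550 patients per arm, the lower control rates of 15% and 12%, and the helper name `power_two_proportions`. Only the planned 18% control mortality and 6% absolute reduction come from the SOHO assumptions.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treat, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test (ignores the negligible far tail)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    se = sqrt((p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_per_arm)
    return z.cdf(abs(p_control - p_treat) / se - z_alpha)


N = 550                          # hypothetical patients per arm
RELATIVE_REDUCTION = 0.06 / 0.18  # planned effect expressed as a relative reduction
ABSOLUTE_REDUCTION = 0.06         # planned effect expressed as an absolute reduction

for p_control in (0.18, 0.15, 0.12):  # 0.15 and 0.12 are hypothetical observed rates
    fixed_relative = power_two_proportions(p_control, p_control * (1 - RELATIVE_REDUCTION), N)
    fixed_absolute = power_two_proportions(p_control, p_control - ABSOLUTE_REDUCTION, N)
    print(f"control {p_control:.0%}: power {fixed_relative:.0%} (same relative effect), "
          f"{fixed_absolute:.0%} (same absolute effect)")
```

In this sketch, power falls as the control rate drops when the relative reduction is held fixed, because the absolute difference shrinks with it. Holding the absolute reduction fixed at a lower baseline is the more optimistic assumption, and usually the less plausible one.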

4. Expected Treatment Effect

This is the anticipated improvement in outcomes with the intervention compared to the control.

For instance, in the SOHO trial, the investigators assumed a 6% absolute reduction in 28-day mortality with high-flow nasal cannula (HFNC) oxygen compared to standard oxygen therapy.

Implication for sample size:
Assuming a larger treatment effect reduces the required sample size. However, overly optimistic assumptions can be problematic:

They may not be realistic or justifiable
They increase the risk that the trial is underpowered if the true effect is smaller

For example, if the true mortality reduction is only 3%, a study powered to detect a 6% difference may fail to identify a clinically meaningful benefit.
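The cost of an optimistic effect size can be estimated with the same normal approximation. The sketch below sizes a trial for a 6% absolute reduction from an 18% control mortality, then asks what power that trial would have if the true reduction were 3%. The 80% power, the two-sided 5% alpha, and the function names are assumptions made for this illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()


def n_per_arm(p_control, p_treat, alpha=0.05, power=0.80):
    """Approximate patients per arm (two-sided test, normal approximation)."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil(z_sum ** 2 * variance / (p_control - p_treat) ** 2)


def power_at(p_control, p_treat, n, alpha=0.05):
    """Approximate power with n patients per arm (ignores the negligible far tail)."""
    se = sqrt((p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n)
    return Z.cdf(abs(p_control - p_treat) / se - Z.inv_cdf(1 - alpha / 2))


n_planned = n_per_arm(0.18, 0.12)             # sized for a 6% absolute reduction
power_if_3 = power_at(0.18, 0.15, n_planned)  # true reduction is only 3%
n_for_3 = n_per_arm(0.18, 0.15)               # size needed to detect 3%

print(f"Planned for 6%: {n_planned} per arm")
print(f"Power if the true reduction is 3%: {power_if_3:.0%}")
print(f"Needed to detect 3%: {n_for_3} per arm")
```

Under these assumptions, power drops well below 50%, and the sample size needed for the smaller effect is roughly four times larger, because the required number of patients scales with the inverse square of the difference between arms.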

Challenges in Critical Care Trials

Sample size estimation becomes particularly challenging in trials involving critically ill patients.

Prior data (https://pubmed.ncbi.nlm.nih.gov/33031148/ and https://pubmed.ncbi.nlm.nih.gov/34353458/) suggest that:

Over 75% of such trials overestimate control arm mortality in their power calculations
The median observed absolute risk difference in mortality is very small (around 0.2%, with an interquartile range from −1.7% to +2.0%)
In an analysis of 100 trials, only 5 demonstrated an absolute risk reduction greater than 10%

In practice, researchers determine sample size based on:

Statistical assumptions (as outlined above)
Feasibility (number of centres, recruitment rates, eligible patients)
Funding constraints

These competing factors often lead to trade-offs, increasing the risk of underpowered studies and potentially contributing to research waste.

Interpreting the SOHO Trial

In light of these considerations, the SOHO investigators’ assumption of a 6% mortality reduction is not inherently unreasonable. However, the plausibility of this assumption depends heavily on the nature of the intervention.

A key question for readers is:
Is it realistic to expect HFNC to reduce mortality by 6%?

Additionally, the trial faced a common challenge—the observed control group mortality was lower than the assumed 18%. This further reduced the study’s statistical power.

Should Mortality Be the Primary Outcome?

This raises a broader and important question:

While mortality is rightly considered one of the most important patient-centered outcomes, is it the most appropriate primary outcome for an intervention such as HFNC?

From the initiation of oxygen therapy to death, multiple pathways and factors influence outcomes. A respiratory support intervention may not impact all of these pathways, which complicates the ability to detect mortality benefits.

Take-Home Message

The assumptions underlying sample size calculations are not just technical details—they are central to how we interpret trial results.

Understanding these assumptions helps us critically appraise whether a “negative” trial truly reflects a lack of effect, or simply a lack of power to detect one.

Suggested reading: