1. Clinical effectiveness assessment in HTA
1.3. Understanding the differences between outcomes
Once all important outcomes are identified, there may still be several challenges in comparing the effects of a new medicine with the standard of care. Outcomes may be measured in different ways, or two medicines may seem to have similar outcomes until closer inspection shows differences.
In cases where the important identified outcomes are difficult to measure, or have never been measured before, scientists must carefully create a measure that can then be reproduced in a study. For instance, a patient with lower back pain may want to know whether a medicine will help them return to work or get out of bed, while scientists may create a numeric pain-rating scale to measure the effect. Where a study measures a change in such a ‘laboratory’ parameter (a methodologically reproducible and comparable measurement), this change needs to be translated into a measure that matters more to patients, such as the ability to return to work.
Some outcomes may seem intuitive but, upon closer examination, may be difficult to interpret. For example, a reduction of the risk of five-year mortality (death within five years) by an average of 50% does not mean that the medicine can prevent premature death in 50% of all cases. It could simply be:
- extending life expectancy from 4.9 to 5.1 years (or worse, 4.99 years to 5.01 years) in some patients, or
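
The life-expectancy example above can be made concrete with a small sketch. The two cohorts below are entirely hypothetical (100 patients each, with invented survival times) and are chosen only so that five-year mortality halves while average survival barely moves.

```python
def five_year_mortality(survival_years):
    """Share of patients who die before reaching five years."""
    deaths = sum(1 for years in survival_years if years < 5)
    return deaths / len(survival_years)


def mean_survival(survival_years):
    """Average survival time in years."""
    return sum(survival_years) / len(survival_years)


# Hypothetical cohorts (assumption, not study data): survival in years.
standard_care = [4.9] * 40 + [10.0] * 60
# With the new medicine, 20 of the 40 early deaths are delayed to 5.1 years.
new_medicine = [4.9] * 20 + [5.1] * 20 + [10.0] * 60

mortality_standard = five_year_mortality(standard_care)
mortality_new = five_year_mortality(new_medicine)
relative_reduction = (mortality_standard - mortality_new) / mortality_standard
survival_gain = mean_survival(new_medicine) - mean_survival(standard_care)

print(f"Five-year mortality: {mortality_standard:.0%} vs {mortality_new:.0%}")
print(f"Relative reduction in five-year mortality: {relative_reduction:.0%}")
print(f"Average gain in survival: {survival_gain:.2f} years")
```

In this invented example the relative reduction in five-year mortality is 50%, yet the average patient gains only a few weeks of life, because the deaths are postponed from just before the five-year mark to just after it.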
Even if differences are observed in measures that are meaningful to patients, they may still be open to more than one interpretation. For example, studies may indicate that a new medicine reduces the risk of hospitalisation from infection by 33%. However, this may mean different things. It could mean that:
- The chance of hospitalisation is reduced by 33% relative to the chance of being hospitalised without the medicine (this is called a relative reduction in risk). A 33% reduction may seem to be a considerable benefit.
- If the chance of being hospitalised in the absence of the new medicine is 3 out of 1,000, then a 33% reduction lowers this to 2 out of 1,000. This means that 1 out of every 1,000 people taking the medicine will benefit. This is called an absolute reduction in risk. A benefit that reaches only 1 out of 1,000 patients may well seem marginal.
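
The arithmetic behind these two readings can be checked with a short sketch using the figures from the example (3 out of 1,000 without the medicine, 2 out of 1,000 with it). The function name is illustrative, and the "number needed to treat" is an addition that does not appear in the text above: it is simply the inverse of the absolute risk reduction.

```python
from fractions import Fraction


def risk_reductions(baseline_risk, treated_risk):
    """Return (absolute reduction, relative reduction, number needed to treat)."""
    absolute_reduction = baseline_risk - treated_risk
    relative_reduction = absolute_reduction / baseline_risk
    number_needed_to_treat = 1 / absolute_reduction
    return absolute_reduction, relative_reduction, number_needed_to_treat


# Figures taken from the hospitalisation example in the text.
baseline = Fraction(3, 1000)  # risk without the new medicine
treated = Fraction(2, 1000)   # risk with the new medicine

arr, rrr, nnt = risk_reductions(baseline, treated)

print(f"Relative risk reduction: {float(rrr):.0%}")
print(f"Absolute risk reduction: {arr.numerator} in {arr.denominator}")
print(f"People treated for one to benefit: {nnt}")
```

Exact fractions are used instead of floating-point numbers so that the one-third relative reduction and the 1-in-1,000 absolute reduction come out without rounding noise.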
A final challenge in correctly interpreting the differences between a new health technology and the standard of care is the use and misuse of statistical tests. Statistical tests are intended to help researchers judge whether the differences they have detected are likely to reflect a real effect of the medicine or could have happened by chance. Often, this is reported in the form of a p-value. The p-value is the probability of seeing a difference at least as large as the one observed if the medicine in fact made no difference; the smaller the p-value, the less plausible it is that chance alone explains the result. However, p-values do not reflect the magnitude (size) of the difference, or whether that difference is clinically meaningful.
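
The point that a p-value says nothing about the size of a difference can be illustrated with a minimal sketch. It assumes two hypothetical trials with the same risks as the hospitalisation example (3 versus 2 per 1,000) but very different numbers of participants, and it uses a pooled two-proportion z-test with a normal approximation, which is one simple choice among many and not necessarily what a real study would use.

```python
import math


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (risk_a - risk_b) / standard_error
    # For a standard normal variable, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical trials (assumption): identical risks, different sizes.
p_small_trial = two_proportion_p_value(30, 10_000, 20, 10_000)
p_large_trial = two_proportion_p_value(3_000, 1_000_000, 2_000, 1_000_000)

print(f"10,000 per group:    p = {p_small_trial:.3f}")
print(f"1,000,000 per group: p = {p_large_trial:.2e}")
```

Both trials show the same absolute difference of 1 in 1,000, yet only the larger one yields a very small p-value. The p-value reflects how much data there is as well as how large the effect is, so it cannot by itself show whether a difference matters to patients.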
Other statistical measures, such as confidence intervals, are more informative than the p-value because they also give some sense of the size of the difference between the new health technology and the standard of care. Confidence intervals also reflect the uncertainty about the estimate of the magnitude of the difference. For example, a new medicine may be reported to reduce the chance of having a future heart attack by 33% (with a 95% confidence interval of 5% to 45%) relative to the current chance of having a heart attack. The best estimate of the reduction is 33%, but the data are also compatible with a reduction as small as 5% or as large as 45%.
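
As a rough illustration of where such an interval comes from, the sketch below computes a 95% confidence interval for a relative risk reduction using the common log-transformation method for a relative risk. The event counts are hypothetical and were not taken from any study, so the resulting interval differs from the 5% to 45% quoted above; the function name is likewise illustrative.

```python
import math


def relative_risk_reduction_ci(events_treated, n_treated,
                               events_control, n_control, z=1.96):
    """Return (estimate, lower, upper) for the relative risk reduction.

    Uses the normal approximation on the log relative risk;
    z = 1.96 corresponds to a 95% confidence level.
    """
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    relative_risk = risk_treated / risk_control
    se_log_rr = math.sqrt(1 / events_treated - 1 / n_treated
                          + 1 / events_control - 1 / n_control)
    log_rr = math.log(relative_risk)
    rr_low = math.exp(log_rr - z * se_log_rr)
    rr_high = math.exp(log_rr + z * se_log_rr)
    # A lower relative risk means a larger reduction, so the bounds swap.
    return 1 - relative_risk, 1 - rr_high, 1 - rr_low


# Hypothetical counts (assumption): heart attacks among 3,000 people per group.
estimate, lower, upper = relative_risk_reduction_ci(200, 3_000, 300, 3_000)

print(f"Relative risk reduction: {estimate:.0%} "
      f"(95% CI {lower:.0%} to {upper:.0%})")
```

With fewer participants or fewer events the interval widens, which is how a confidence interval conveys both the size of the estimated effect and how uncertain that estimate is.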