2. SAP Contents

2.3. Analysis Methods

The SAP describes which statistical methods are to be used to analyse the data. The following aspects need to be covered where applicable:

  • Main/primary analysis: to obtain the main clinical trial results on the specified trial endpoint(s).

  • Supportive/sensitivity analyses: analyses on different sets of patients, or using different analysis techniques from those of the main analysis. These are used to assess whether the conclusions of the main analysis hold.

  • Exploratory analyses: all other analyses, e.g. further data exploration.

(Primary) Analysis Set

This section is designed to identify which of the recruited patients are to be included in the different analyses.
The criteria typically relate to situations where the intended protocol could not be, or was not, followed; for example, a patient who did not fulfil the eligibility criteria was wrongly included. It is important to identify these ‘protocol violations’ and to deal with them appropriately in the analysis, because they may bias the final results of a trial or affect the power of the final analysis.

Case study: Consider a clinical trial comparing a new experimental treatment with the standard of care, in which some patients taking the experimental treatment become too sick, because of side effects, to attend the next visit within the allotted time. One possible approach would be to include only patients with complete follow-up (all visits), thereby excluding patients with incomplete follow-up (missing visits) from the analysis. However, by doing so, one selects a sub-group of patients who, by definition, tolerated the experimental treatment well enough to stay in the trial, and who will therefore present an artificially positive picture of the treatment under investigation.
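The case study can be illustrated with a small simulation. This is a minimal sketch under hypothetical assumptions: each patient has a latent ‘health’ score that equals their outcome, the treatment has no true effect, and sicker patients in the experimental arm miss visits more often. The names and probabilities are invented for illustration. Because this is a simulation, the outcomes of patients who miss visits are known, which is never the case in a real trial.

```python
import random
import statistics


def simulate_trial(n_per_arm=5000, seed=1):
    """Simulate a two-arm trial with NO true treatment effect (hypothetical model).

    The outcome (higher is better) is a latent health score drawn from the same
    distribution in both arms. In the experimental arm, patients in poor health
    are more likely to miss the follow-up visit (assumed side-effect mechanism).
    """
    rng = random.Random(seed)
    patients = []
    for arm in ("control", "experimental"):
        for _ in range(n_per_arm):
            health = rng.gauss(0.0, 1.0)
            outcome = health  # same distribution in both arms: no true effect
            if arm == "experimental":
                # sicker patients drop out far more often (assumed probabilities)
                p_miss = 0.6 if health < -0.5 else 0.05
            else:
                p_miss = 0.05
            completed = rng.random() >= p_miss
            patients.append({"arm": arm, "outcome": outcome, "completed": completed})
    return patients


def mean_difference(patients, completers_only):
    """Mean outcome, experimental minus control, optionally among completers only."""
    def arm_mean(arm):
        values = [p["outcome"] for p in patients
                  if p["arm"] == arm and (p["completed"] or not completers_only)]
        return statistics.mean(values)
    return arm_mean("experimental") - arm_mean("control")


patients = simulate_trial()
print(f"All randomised patients: {mean_difference(patients, completers_only=False):+.3f}")
print(f"Completers only:         {mean_difference(patients, completers_only=True):+.3f}")
```

With these settings, the comparison over all randomised patients should be close to zero, whereas the completers-only comparison shows a spurious advantage for the experimental arm even though the treatment has no effect.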

One potential solution to this problem is the intention-to-treat (ITT) analysis. ITT analysis includes every randomised patient and analyses each one in the arm to which they were assigned by the randomisation, regardless of the treatment they actually received. As such, ITT analyses preserve the balance in patients' baseline characteristics between the trial arms that was achieved by randomisation. ‘Protocol deviations’, such as non-compliance with the assigned treatment (schedule, dosing, etc.), are part of routine clinical practice. Therefore, treatment-effect estimates obtained from ITT analysis are considered to be more representative of the actual benefit of a new treatment in real life.

Per-protocol (PP) analysis: This analysis population is restricted to the participants who strictly fulfil the protocol requirements in terms of eligibility criteria, treatment compliance and outcome assessment. PP analyses usually exclude patients who have not received at least one dose of the allocated treatment, all ineligible patients, patients with major protocol violations and sometimes patients with incomplete data for the targeted endpoint. A PP analysis is useful in determining the biological effect of a treatment. However, it may not reflect the value of the treatment in real-life practice, since it is restricted to a highly selected sub-group of patients corresponding to an ‘ideal’ setting.

Safety population: All randomised patients who have started their allocated treatment (at least one dose of the trial medicine). This analysis population is often used to describe the safety profile of a treatment.
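As a sketch of how the ITT, per-protocol and safety populations might be derived from patient-level data: the `Patient` fields and the example patients below are hypothetical, and a real SAP would define each criterion (for instance, what counts as a major protocol violation) precisely.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical per-patient fields, for illustration only.
    patient_id: str
    randomised_arm: str
    doses_received: int
    eligible: bool
    major_protocol_violation: bool
    primary_endpoint_observed: bool


def itt(patients):
    """Every randomised patient, kept in the arm assigned at randomisation."""
    return [(p.patient_id, p.randomised_arm) for p in patients]


def safety(patients):
    """Randomised patients who received at least one dose of the trial medicine."""
    return [p.patient_id for p in patients if p.doses_received >= 1]


def per_protocol(patients, require_complete_endpoint=True):
    """Dosed, eligible patients without major violations (optionally with endpoint data)."""
    return [
        p.patient_id for p in patients
        if p.doses_received >= 1
        and p.eligible
        and not p.major_protocol_violation
        and (p.primary_endpoint_observed or not require_complete_endpoint)
    ]


patients = [
    Patient("P01", "experimental", 12, True, False, True),
    Patient("P02", "experimental", 0, True, False, False),   # never dosed
    Patient("P03", "control", 10, False, False, True),       # ineligible
    Patient("P04", "control", 8, True, True, True),          # major violation
    Patient("P05", "experimental", 3, True, False, False),   # endpoint missing
]
print(len(itt(patients)), safety(patients), per_protocol(patients))
# prints: 5 ['P01', 'P03', 'P04', 'P05'] ['P01']
```

The `require_complete_endpoint` switch reflects the text's point that PP analyses only *sometimes* exclude patients with incomplete endpoint data; with it set to `False`, P05 would also be retained.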

Sub-group Analysis

This section of the SAP details which sub-group analyses will be performed. Controlled clinical trials are designed to investigate the effect of a treatment in a given population of patients. Sub-group analyses involve splitting the trial participants into sub-groups, based for example on:
  • demographic characteristics (e.g. sex, age)
  • baseline characteristics (e.g. a specific genomic profile)
  • use of concomitant therapy.
The principle is to look at the effect of treatment separately in different types of patients, in order to learn who is likely to benefit most from the investigated treatment. Sometimes, sub-group analyses are used to explore heterogeneous treatment effects, e.g. when certain patient characteristics drive the response to treatment.
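A per-sub-group estimate can be sketched as follows, assuming a binary response endpoint and hypothetical data. The risk difference with a simple Wald 95% interval is one possible effect measure, chosen here for brevity rather than taken from the text.

```python
import math
from collections import defaultdict


def subgroup_effects(records, subgroup_key):
    """Risk difference (experimental minus control) in response rate per sub-group.

    records: dicts with 'arm' ('experimental' or 'control'), 'response' (0/1)
    and the sub-grouping variable named by subgroup_key.
    Returns {level: (risk_difference, lower95, upper95)} using a Wald interval.
    Assumes both arms are represented in every sub-group.
    """
    counts = defaultdict(lambda: {"experimental": [0, 0], "control": [0, 0]})
    for r in records:
        cell = counts[r[subgroup_key]][r["arm"]]
        cell[0] += r["response"]  # responders
        cell[1] += 1              # patients
    results = {}
    for level, arms in counts.items():
        (x1, n1), (x0, n0) = arms["experimental"], arms["control"]
        p1, p0 = x1 / n1, x0 / n0
        se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
        rd = p1 - p0
        results[level] = (rd, rd - 1.96 * se, rd + 1.96 * se)
    return results


def make_records(level, arm, responders, total):
    """Hypothetical helper: expand aggregate counts into patient-level records."""
    return [{"sex": level, "arm": arm, "response": int(i < responders)}
            for i in range(total)]


records = (make_records("female", "experimental", 30, 50)
           + make_records("female", "control", 20, 50)
           + make_records("male", "experimental", 24, 50)
           + make_records("male", "control", 22, 50))
for level, (rd, lo, hi) in subgroup_effects(records, "sex").items():
    print(f"{level}: {rd:+.2f} (95% CI {lo:+.2f} to {hi:+.2f})")
```

A significant effect in one sub-group and a non-significant one in another is not, on its own, evidence that the treatment effect differs between them; that question calls for a formal test of interaction.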

Findings from sub-group analyses might be misleading for several reasons. Firstly, comparisons between sub-groups are observational (sub-groups are defined by patients’ observed characteristics) rather than based on randomised comparisons. Secondly, when sub-groups are chosen after the data have been seen, they are prone to ‘hindsight bias’, also known as the ‘I-knew-it-all-along’ bias: the inclination to see events that have already occurred as being more predictable than they were before they took place. This is why sub-group analyses should be pre-planned.

Even when pre-planned, sub-group analyses are still open to the criticism of ‘multiplicity’. When multiple sub-group analyses are performed, the risk of finding a false positive result (i.e. a type I error) increases with the number of sub-group comparisons. Multiplicity issues generally arise from repeated ‘looks’ at the same data set, in different ways, until something ‘statistically significant’ emerges. Given the wealth of data sometimes obtained, all signals should be considered carefully, and researchers must be cautious about possible over-interpretation. Techniques exist to protect against multiplicity, but most of them require stronger evidence for statistical significance in order to control the overall type I error of the analysis; widely used examples include the Bonferroni and Holm corrections (see also FDA Guidance on complex trial results).
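The growth of the false-positive risk can be quantified: if k independent tests are each performed at level alpha and all null hypotheses are true, the probability of at least one false positive (the family-wise error rate) is 1 - (1 - alpha)^k. The sketch below computes this and applies two standard adjustments, the Bonferroni and Holm procedures; the independence assumption and the example p-values are illustrative only.

```python
def familywise_error_rate(alpha, k):
    """P(at least one false positive) over k independent tests, all nulls true."""
    return 1 - (1 - alpha) ** k


def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: compare the j-th smallest p-value (j = 0, 1, ...) with alpha / (m - j)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for j, i in enumerate(order):
        if p_values[i] <= alpha / (m - j):
            reject[i] = True
        else:
            break  # first non-rejection: all larger p-values are retained
    return reject


for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
# prints: 1 0.05 / 5 0.226 / 10 0.401 / 20 0.642 (one pair per line)

p_values = [0.004, 0.011, 0.020, 0.30]  # hypothetical sub-group p-values
print(bonferroni(p_values))  # [True, True, False, False]
print(holm(p_values))        # [True, True, True, False]
```

Holm's step-down procedure controls the same family-wise error rate as Bonferroni but always rejects at least as many hypotheses, which is why it rejects one more in this example.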


Finally, there is a tendency to conduct analyses comparing sub-groups defined by information collected during the trial, i.e. after randomisation. A typical example is comparing survival between patients who respond to treatment and those who do not. Patients who respond to treatment are, by definition, patients who stayed on treatment long enough for a response to be observed. They may therefore simply represent a sub-group of patients with a better prognosis, which biases the comparison. This is an example of what is often referred to as ‘guarantee-time bias’ (also known as ‘immortal time bias’). One way of dealing with this is to use a landmark as the starting point for the survival analysis: only patients still alive and under follow-up at the landmark are included, and the categories are based on the patients’ status at the time of this landmark (e.g. did a patient respond by three months, yes/no?).
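Guarantee-time bias and the landmark approach can be illustrated with a simulation in which response has, by construction, no effect on survival. This is a minimal sketch under hypothetical assumptions: exponential survival with a 12-month mean, a 50% chance of response, response times between 1 and 6 months, and no censoring, so mean survival is compared directly instead of using Kaplan-Meier curves or a Cox model.

```python
import random
import statistics


def simulate(n=20000, seed=7):
    """Response is independent of survival, but a patient can only be
    recorded as a responder if still alive at the response time."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        survival = rng.expovariate(1 / 12)     # months, mean 12 (assumed)
        would_respond = rng.random() < 0.5     # independent of survival (assumed)
        response_time = rng.uniform(1, 6)      # months from start (assumed)
        responded = would_respond and response_time < survival
        patients.append((survival, responded, response_time))
    return patients


def mean_gap(groups):
    """Mean of the True group minus mean of the False group."""
    return statistics.mean(groups[True]) - statistics.mean(groups[False])


def naive_comparison(patients):
    """Overall survival of 'ever responders' versus the rest (biased)."""
    groups = {True: [], False: []}
    for survival, responded, _ in patients:
        groups[responded].append(survival)
    return mean_gap(groups)


def landmark_comparison(patients, landmark=3.0):
    """Keep patients alive at the landmark, classify them by response status
    at the landmark, and measure survival from the landmark onwards."""
    groups = {True: [], False: []}
    for survival, responded, response_time in patients:
        if survival <= landmark:
            continue  # died before the landmark: excluded
        responder_at_landmark = responded and response_time <= landmark
        groups[responder_at_landmark].append(survival - landmark)
    return mean_gap(groups)


patients = simulate()
print(f"Naive responder vs non-responder gap: {naive_comparison(patients):.1f} months")
print(f"Landmark (3 months) gap:              {landmark_comparison(patients):.1f} months")
```

The naive comparison shows responders surviving several months longer purely because they had to be alive to respond, while the landmark comparison is close to zero. Patients who respond only after the landmark are counted as non-responders, which is a known trade-off of the choice of landmark. With censored data, the same landmark restriction would be applied before a Kaplan-Meier or Cox analysis.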