- Incidence of a specific disease, i.e. how many people develop the disease over a given period.
- Mortality from a specific disease, i.e. how many people die from the disease.
- All-cause mortality, i.e. deaths from any cause.
In a cohort study, the following are defined:
- The ‘outcome of interest’, i.e. the disease or condition being studied.
- The ‘exposure of interest’, i.e. a risk factor like smoking.
Participants must be free of the disease or outcome at the start of the study, and their exposure to the risk factor is recorded. They are then followed over time to see whether the disease or outcome of interest develops.
Because exposure is identified before the outcome, cohort studies have a temporal framework for assessing causality. In other words, the relationship between cause and effect can be assessed over time. Among observational study designs, they therefore have the potential to provide the strongest scientific evidence.
In cohort studies, there may be a comparison group or ‘cohort’. The comparison cohort may be the general population from which the first cohort was drawn. It may also be another cohort of people thought to be similar but with little or no exposure to the factor under investigation (e.g. people who have never smoked). Alternatively, sub-groups within one cohort may be compared with each other (e.g. based on the number of cigarettes smoked per day). In addition, the investigator can examine multiple outcomes in the same study.
As the study proceeds, the outcomes of participants in each cohort are measured, and their relationships with specific characteristics (e.g. risk factors) are assessed.
An example of an epidemiological question that can be answered using a cohort study is:
Does exposure to X (say, smoking) have a link with outcome Y (say, lung cancer)?
Such a study would recruit a group of smokers (the exposed group) and a group of non-smokers (the unexposed group). It would then follow them for a set period of time and, at the end of this period, compare the incidence of lung cancer between the groups. Ideally, the groups are comparable (for example, matched) on other variables that could also influence the outcome, such as:
- Socioeconomic status (e.g. education, income and occupation).
- Health status (e.g. presence of other diseases).
This helps to isolate the effect of the variable being assessed, the ‘independent variable’ (in this case, smoking), on the ‘dependent variable’ (in this case, lung cancer).
In this example, a statistically significant increase in the incidence of lung cancer in the smoking group compared with the non-smoking group supports a causal relationship between smoking and lung cancer. However, rare outcomes such as lung cancer are generally not studied with cohort studies, because very large cohorts and long follow-up would be needed to observe enough cases. They are usually studied with a ‘case-control study’, described further below.
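One common way to quantify the comparison of incidence between the exposed and unexposed groups is the risk ratio (relative risk): the incidence in the exposed group divided by the incidence in the unexposed group. The sketch below is a minimal illustration. It assumes a simple fixed-period cohort with no loss to follow-up and uses made-up counts. The approximate 95% confidence interval uses the log (Katz) method, which is one of several available approximations. The function name `cohort_risk_ratio` is hypothetical.

```python
import math


def cohort_risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total, z=1.96):
    """Return (risk in exposed, risk in unexposed, risk ratio, (lower, upper) CI).

    Assumes a fixed follow-up period with no loss to follow-up (hypothetical setup).
    The CI uses the log (Katz) approximation, which needs at least one case per group.
    """
    if exposed_cases == 0 or unexposed_cases == 0:
        raise ValueError("log-method CI is undefined when a group has zero cases")

    # Cumulative incidence (risk) = new cases / people at risk at the start
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed

    # Standard error of ln(RR)
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return risk_exposed, risk_unexposed, rr, (lower, upper)


if __name__ == "__main__":
    # Hypothetical counts: 1000 smokers (30 cases), 1000 non-smokers (5 cases)
    r_exp, r_unexp, rr, (lo, hi) = cohort_risk_ratio(30, 1000, 5, 1000)
    print(f"risk (smokers)     = {r_exp:.3f}")
    print(f"risk (non-smokers) = {r_unexp:.3f}")
    print(f"risk ratio         = {rr:.1f} (95% CI {lo:.2f} to {hi:.2f})")
```

With these invented counts, the risk ratio is 6 and the interval lies entirely above 1. On its own, this indicates an association, not proof of causation. Confounding still has to be addressed through the study design and analysis.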
Two examples of cohort studies that have been running for more than 50 years are the Framingham Heart Study and the National Child Development Study (NCDS), the most widely researched of the British birth cohort studies. One of the largest and longest-running cohort studies in women is the Nurses’ Health Study. Started in 1976, it enrolled over 120,000 nurses and has been analysed for many different conditions and outcomes.
Cohort studies can be prospective or retrospective:
- Prospective studies are carried out from the present time into the future. Because a prospective study is designed with specific data collection methods, it can be tailored to collect exactly the exposure data needed, so these data may be more complete. A disadvantage of a prospective cohort study can be the long follow-up period while waiting for events or diseases to occur. This makes the design less practical for investigating diseases with long latency periods (the time between exposure and the onset of disease). It is also prone to a high ‘loss to follow-up’ rate, which occurs when researchers can no longer contact participants at follow-up points, so their data cannot be collected.
- Retrospective cohort studies, also known as historical cohort studies, are carried out at the present time and look back to examine past medical events or outcomes. In other words, the cohort is selected at the present time on the basis of past exposure status. Outcome data (e.g. disease status, event status) that were recorded in the past are then reconstructed for analysis. The main disadvantage of this design is the limited control the investigator has over data collection. The existing data may be incomplete, inaccurate, or inconsistently measured between subjects. However, because the data are already available, this design is less costly and quicker to complete than a prospective cohort study.