4. Measurement methodology in PRO research

Site:	EUPATI Open Classroom
Course:	HTA and Evaluation Methods: Qualitative
Book:	4. Measurement methodology in PRO research

Date:	Wednesday, 2 July 2025, 5:03 AM

1. Measurement methodology in PRO research


Patient-reported outcome measures (PROMs) are the tools and/or instruments used to report patient-reported outcomes (PROs). Every PRO instrument contains a concept of measurement (which may be symptoms, sensation, functioning or quality of life) and a way of measuring it (a rating scale of some kind).

Once the concept and the items are identified and set out, careful decisions also need to be made about:

how the questions are delivered to patients,
when the questions are delivered to patients,
how answers are recorded, and
how the data is interpreted.

Typically, PROs are measured with questionnaires or surveys that are:

completed by the patients themselves,
completed by the patients in the presence of the researcher,
completed by the researcher through face-to-face or telephone interview, or
completed via different interfaces such as hand-held devices or computers (see ePROMs below).

There are strengths and weaknesses to the different approaches to collecting information. For example, while the use of trained interviewers reduces errors and ensures surveys are completed, trial/treatment resources may not allow for this. It is crucial that the approaches and methods used capture patients’ perceptions and the actual concepts being measured, rather than reflecting the interviewer or the way questions are asked (watch out for interviewer bias). Timing matters too: in the example given in the section above, morning symptoms can be ascertained more reliably if the questionnaire is administered in the morning than if it is completed later in the day.

1.1. Properties of PROMs

The researchers who develop these tools/instruments must make every attempt to ensure that they are measuring concepts important to patients in a way that is repeatable and understandable.

A well-designed PRO questionnaire should assess a single underlying characteristic or, where it addresses multiple characteristics, should consist of a number of scales that each address a single characteristic.

Questionnaires may be generic (designed to be used in any disease population and covering broad aspects of the construct measured) or condition-targeted (developed specifically to measure those aspects of outcome that are important for people with a particular medical condition).

Table 2 below provides an overview of important aspects to be considered in PROMs.

Table 2: Important properties of PROMs

Property	Description	Explanatory notes
RELIABILITY	Measurements are repeatable and consistent, and must distinguish between changes in response and changes due to errors in administration. Reliability is a necessary (but not sufficient) component of validity.	1) Test-retest reliability: a measure of the ability of an instrument (e.g., a psychological testing instrument) to yield the same result for a single patient at two different, closely spaced test periods, so that any variation detected reflects the reliability of the instrument rather than changes in the patient's status. May be considered as intra-interviewer reliability for interviewer-administered PROs. 2) Internal consistency: consistency of the results delivered in a test, ensuring that the various items measuring the same construct deliver consistent scores. 3) Inter-interviewer reliability: determines the changes in the results when the instrument is administered by two or more interviewers.
VALIDITY

Criterion validity	The extent to which the scores of a PRO instrument are related to a known gold standard measure of the same concept (correlate with another measure considered more accurate).	In practice, criterion validity is difficult to determine because there is no gold standard for most PROs. There are two types: “predictive” and “concurrent”. Check for predictive validity: the ability of the instrument to predict something in the real world that is expected theoretically. For instance, if slowness of movement in Parkinson's disease is measured, one could take the results of the respective patients and determine whether there is a high correlation between the scores and the degree of slowness. A high correlation is evidence of predictive validity. Check for concurrent validity: the ability of the item to distinguish between groups in the real world as theoretically expected. For example, in the assessment of breathlessness, the item should be able to differentiate between asthma and COPD.
Content validity: “Face validity” (appears to measure the concept of interest); “Content validity in context of use” (adequately covers the concept/domain of interest)	The extent to which an instrument measures what it is intended to measure (the concept of interest).	A lack of content validity means that the data gathered may appear useful but is in fact measuring a different effect (off target). Check, e.g., by: literature review; expert opinion; evaluation of how well items deal with the complete continuum of patient experiences; patient and clinician evaluation of the relevance and comprehensiveness of the content contained in the measures, through qualitative research with the targeted patient population (essential); input from the target population of patients to document the understandability and comprehensiveness of the measure; diversity in demographic and disease characteristics of the target population.
Construct validity	Logical relationships should exist between assessment measures.	A lack of construct validity means that the data gathered may appear useful but is in fact measuring a different effect. Check: evidence that relationships among concepts (domains) and items conform to a priori hypotheses concerning logical relationships that should exist with other measures or characteristics of patients and patient groups. Example: in COPD patients, expect that patients with lower treadmill exercise capacity will generally have more dyspnoea (shortness of breath) in daily life than those with higher exercise capacity. As a further logical connection, expect substantial correlations between a new measure of emotional function and existing emotional function questionnaires.
RESPONSIVENESS (also: sensitivity to change or ability to detect change)	Captures changes over time in the construct being measured, either before and after an intervention or in different disease or treatment states.	If the instrument is not adequately responsive, it may fail to detect a patient response when the intervention does improve how patients feel, and so return false-negative results. Check: responsiveness is evaluated within specific populations and is not a fixed/inherent property of the item; determine the relevant, clinically meaningful effect size.
PRACTICALITY and FEASIBILITY	Measurements are easily obtained, and the instrument is easy to operate.	If the instrument is difficult to operate, patient response may change but not be measured, or be measured inappropriately. If the instrument takes an inordinate amount of time to complete, compliance is jeopardised. Check: verify the acceptability of the PROM by patients; review the instructions, questions, response options and recall period of the PROM.
INTERPRETABILITY	The degree to which one can assign qualitative meaning (clinical or commonly understood connotations) to a PROM’s quantitative scores or change in scores[1]	Patients, clinicians or payers may not make the best decisions if the output of the tool is difficult to understand. PROMs vary in scale, measurement, and interpretation, making it difficult for patients and physicians alike to reach conclusions when evaluating results from these measures, e.g., in clinical trials. Some metrics indicate a favourable result with higher values, whereas others indicate a favourable result with lower values. This can lead to a lack of patient or clinician acceptance and may limit the utility of the PROM. Check the meaningfulness of the scores produced by the questionnaire: What does a score mean? What is the minimal clinically important difference (MID)? (see box below) Should an overall score be computed and presented to support a given claim?
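
As an illustration of the internal consistency check listed under RELIABILITY in Table 2, the sketch below computes Cronbach's alpha, a statistic commonly used for this purpose. The table itself does not name a specific statistic, so the choice of Cronbach's alpha is an assumption, as are the function name and the example responses, which are made up for illustration only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    `responses` is a list of rows, one row per patient, each row holding
    that patient's scores on the k items of the scale (hypothetical layout).
    """
    if len(responses) < 2:
        raise ValueError("need responses from at least two patients")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every patient must answer every item")

    # Sample variance of each item across patients.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the patients' total scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up answers from four patients to a three-item scale.
example_responses = [
    [3, 4, 3],
    [1, 2, 1],
    [4, 4, 5],
    [2, 1, 2],
]
alpha = cronbach_alpha(example_responses)
print(round(alpha, 2))
```

Values closer to 1 indicate that the items of the scale move together. What counts as an acceptable value depends on the context of use and is not specified in this section.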

Note on minimal clinically important difference (MID)
An important advance in HRQoL research is the concept of the minimal clinically important difference (MID), defined as the smallest difference in score on an HRQoL instrument that patients perceive as beneficial and that would suggest, in the absence of troublesome side effects and excessive cost, a change in the patient’s management. Differences in scores smaller than the MID are considered unimportant, regardless of whether statistical significance is reached. For example, although an average change of 0.15 points on the HAQ-DI (the Health Assessment Questionnaire (HAQ) Disability Index (DI)) may be statistically significant in a clinical trial, it may not be perceived as meaningful by study participants, and would not meet MID criteria since the MID for the HAQ-DI is 0.22 points in that study. MID estimates of HRQoL measures have influenced the designs of subsequent clinical trials aimed at improving HRQoL.
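
The HAQ-DI example in the note can be written out as a small check. This is a minimal sketch: the function name is hypothetical, and it assumes that a change counts as meaningful when its size reaches the MID, regardless of direction (whether a rise or a fall is favourable depends on the instrument).

```python
def meets_mid(mean_change, mid):
    """Return True if the size of the change is at least the MID.

    Assumption: a change exactly equal to the MID counts as meaningful.
    Statistical significance is deliberately not an input, because a
    change can be statistically significant and still fall below the MID.
    """
    if mid <= 0:
        raise ValueError("the MID must be a positive number")
    return abs(mean_change) >= mid


# Figures taken from the note: an average change of 0.15 points on the
# HAQ-DI, against an MID of 0.22 points in that study.
HAQ_DI_MID = 0.22
observed_change = 0.15
print(meets_mid(observed_change, HAQ_DI_MID))  # False: below the MID
```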

[1] Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., et al. (2010). The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. Journal of Clinical Epidemiology, 63(7), 737–745.

1.2. Patient involvement in developing PROMs

Until recently, many PROMs were developed by clinicians without the extensive participation of patients (1). These tended to gather data that is important from the clinician’s perspective rather than the patient’s, and were often in a language and format that was a barrier to engagement and participation. This means that PROMs are not necessarily measuring concepts important to patients. This gap has been recognised, and increasingly efforts are made to include patients in the development of PROMs, although barriers remain, notably the necessary time investment and budget impact[1][2]. There are a number of things patient groups can do to address this gap before, during, and after PROM development:

Identifying the need for PROMs - This is of particular importance to pharmaceutical companies, who must identify measures very early on in medicines development.

Patient input to setting the research question to be addressed, as well as to the forming of outputs, can facilitate the design of tools and the capture of relevant PROs. Patient input into setting the question can enhance payback in a number of ways, by bringing a perspective:

on the impact of the research process and on the experience of receiving care and support,
on the trend to complexity in R&D and the demands this makes on research participants (focusing investigations),
on what (if any) tangible results patients might anticipate seeing as a result of their participation (making the recruitment of subjects easier and more honest).

Evaluating and reviewing PROMs - Patients and patient groups can learn to appraise the quality of PROMs. They can then use the information they gain to inform similar patients regarding which scales are acceptable and which are not. This may be particularly important for patients consenting to participate in clinical trials.

The following (non-comprehensive) aspects of such an evaluation assess comprehension of the PROM and its relevance by:

- Ensuring that the items within the PROM cover the aspects of the concept aimed to be measured.

- Verifying the acceptability of the PROM by patients.

- Collecting feedback on each question (item) of the PROM (i.e. perceived patient interpretation of each question, misunderstandings related to any questions and any proposed reformulations of questions).

- Critically reviewing the instructions, questions, response options and recall period of the PROM.

Developing and evaluating conceptual and/or theoretical frameworks - Validating potential tools requires qualitative research with patients. Although in the past patients have been consulted, there is an identified need for collaboration in tool development.

Triggering further patient education and interventions - This demonstrates to patients that the data they input is meaningful to their care.

[1] Wiering, B., de Boer, D. & Delnoij, D. Patient involvement in the development of patient-reported outcome measures: The developers’ perspective. BMC Health Serv Res 17, 635 (2017). https://doi.org/10.1186/s12913-017-2582-8

[2] Carlton, J., Peasgood, T., Khan, S. et al. An emerging framework for fully incorporating public involvement (PI) into patient-reported outcome measures (PROMs). J Patient Rep Outcomes 4, 4 (2020). https://doi.org/10.1186/s41687-019-0172-8

1.3. Points to consider when adapting PROMs

Users might be tempted to make changes to established measures by modifying the original design or approach of the instruments/tools. Such a change may seem harmless, like translating the tool into a local language or using an electronic questionnaire instead of a paper-based one. However, even simple modifications like these may significantly change the performance of a tool and the ability to interpret its results. Thorough efforts must go into re-validating and testing a PRO instrument when changing its use from what it was originally intended to measure, which would lead to a new PROM (e.g., in a different therapeutic field), but also when using a different approach (e.g., moving from paper to ePROM).