Methods¶
4a Source of Data¶
Description
Describe the study design or source of data (e.g., randomized trial, cohort, or registry data), separately for the development and validation data sets, if applicable.
Information about the quality of the data, the statistical analyses performed, the sampling strategy, and the generalisability of the prediction model helps readers determine whether the model should be used in a specific context or how the study can be replicated.
Supports critical appraisal of whether the study design and data source were appropriate for the study objective.
Providing information about the study design helps ensure that possible methodological biases in the data dimension of research design are more easily identified (e.g. ‘allocation bias’).
Providing information about the source of data helps evaluate risk of discriminatory harms to patient groups that may be impacted by use of the model.
Providing information about sampling methods helps identify risk of bias that may arise from an unrepresentative dataset.
Providing details of data collection and sources promotes critical appraisal of ethical issues related to data usage (e.g. Were the data collected adequate for the study? Were principles such as ‘data minimisation’ adhered to?).
4b Source of Data¶
Description
Specify the key study dates, including start of accrual; end of accrual; and, if applicable, end of follow-up.
Reporting the date interval situates the study within a historical and temporal context, which is critical in prognostic studies, and may provide additional information salient to the study’s objectives or further research (e.g. additional validation studies).
Promotes sustainability of research by supporting replication and future advances (e.g. information about temporal context may allow future research to deduce which medical technologies were available in this period for measuring certain predictors and to determine whether/how newer technologies differ in performance).
5a Participants¶
Description
Specify key elements of the study setting (e.g., primary care, secondary care, general population) including number and location of centres.
Prediction models are developed and validated in a particular medical context (e.g. primary, secondary, or tertiary care), with varying distribution of predictors, participant or setting characteristics, and outcome prevalence or incidence. A detailed description of the study setting is important for assessing the generalisability, transportability and efficacy of the model to additional contexts. Consider including a table summarising the key elements of the study characteristics for the development and any validation sample, to provide the reader insight into any differences in case mix and its potential consequences.
The predictive performance of a model will likely vary across contexts. As such, details of the study settings can help identify and mitigate the risk of discriminatory harm that could arise from deployment of the model outside of its original context and in situations where the “case mix” (the participant or setting characteristics, outcome prevalence, and predictor distribution) of the study differs from that of the target population.
Information about the study setting also supports comparative assessments of how a model supports ethical goals of fairness (e.g. comparative performance of multiple models based on how they advance overall public health equity—this is important for policy decisions).
5b Participants¶
Description
Describe eligibility criteria for participants.
Information about the eligibility criteria is necessary for understanding the potential applicability and generalisability of the prediction model.
Information about the eligibility criteria can help identify possible biases that may impact the predictive performance of the model (e.g. ‘performance bias’ or statistical bias arising from handling of missing data).
5c Participants¶
Description
Give details of treatments received, if relevant.
Details about any treatment received by individuals are necessary to understand how the treatment could affect the health outcomes being studied or how it could have been influenced by the same predictors that are included in the statistical modelling.
Accurate reporting of treatments received is vital for the assessment of the model’s performance and to determine the clinical efficacy of using the model in a particular context. Failing to specify this information may result in the misuse of the model, leading to avoidable harm to patients or wasted clinical resources.
6a Outcome¶
Description
Clearly define the outcome that is predicted by the prediction model, including how and when assessed.
Unambiguous details about the health outcome and the reference standard used to determine the outcome (e.g. radiology assessments) are central components of the prediction model. Inadequate or inconsistent details about the predicted outcomes can prevent future research from validating the model, replicating the study, or using the model effectively.
Details about the reference standard used to determine the outcome can be a valuable source of information for detecting and mitigating biases that arise in the collection of data (e.g. different assessment practices in multi-site studies that affect the consistency of the data). Clear presentation of the outcome definition can also enable a higher degree of transparency about label choice, allowing for easier peer evaluation of its validity.
6b Outcome¶
Description
Report any actions to blind assessment of the outcome to be predicted.
Prior knowledge of the predictors being studied can influence the outcome assessment by clinicians, leading to a biased estimation of the association between predictors and outcomes, most notably when the outcome assessment requires interpretation (e.g. cause-specific death, consensus diagnosis). As such, it is important to report any actions taken to blind the assessment of the outcome to be predicted.
Failure to specify actions to blind assessment can lead to an over-optimistic assessment of the model’s performance, resulting in performance deficiencies that may cause harm and inappropriate use or deployment of the model.
Providing information about actions taken to blind assessment can help uncover, identify and mitigate possible biases that could impact performance of the model (e.g. ‘observer bias’).
7a Predictors¶
Description
Clearly define all predictors used in developing or validating the multivariable prediction model, including how and when they were measured.
Defining the predictors in an unambiguous manner, including how and when the predictors were measured, is vital to ensure that future investigators can replicate the study, and validate or implement the model.
Clearly and concisely defining predictors supports the interpretability and reproducibility of a prediction model. It also facilitates critical assessment, future improvement and reevaluation.
7b Predictors¶
Description
Report any actions to blind assessment of predictors for the outcome and other predictors.
In certain contexts, assessment of predictors may require subjective judgement (e.g. assessment of imaging results). Knowledge of the outcome, therefore, could artificially increase the association between the outcome and the predictors, and knowledge of other predictors may bias the interpretation of specific predictors.
Failure to specify actions to blind assessment can lead to an over-optimistic assessment of the model’s performance, leading to inappropriate use or deployment of the model.
Providing information about actions taken to blind assessment can help uncover and identify possible biases that could impact performance of the model (e.g. ‘observer bias’).
8 Sample Size¶
Description
Explain how the study size was arrived at.
Details about the number of outcome events and the sample size are vital for assessing the performance of the model (e.g. whether the performance is overestimated given the effective sample size). This is a particular issue when a model is developed and assessed for its predictive accuracy on the same data set with a small sample, as the model’s performance will likely be overestimated.
Accurate and adequate information about a model’s performance helps determine whether there is a risk of statistical bias (e.g. from overfitting), which could impact the clinical efficacy of the model. Having proper mechanisms of validation is essential for building confidence among clinicians about the trustworthiness of a model.
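To make the notion of effective sample size concrete, the following is a minimal sketch (not a formal sample size calculation) that checks the number of events per candidate predictor parameter for a binary outcome; all figures are hypothetical, and formal minimum sample size criteria are preferable in practice.

```python
# Minimal sketch: a rough check of events per candidate predictor parameter
# (EPP) for a binary outcome. All numbers are hypothetical; formal minimum
# sample size criteria are preferable in practice.

n_participants = 2000          # participants in the development data set
n_events = 160                 # outcome events (the driver of effective sample size)
n_candidate_parameters = 12    # candidate parameters, including dummy/spline terms

outcome_proportion = n_events / n_participants
events_per_parameter = n_events / n_candidate_parameters

print(f"Outcome proportion: {outcome_proportion:.3f}")
print(f"Events per candidate parameter: {events_per_parameter:.1f}")

# A low ratio signals a high risk of overfitting, and hence optimistic apparent
# performance when the model is developed and assessed on the same data.
if events_per_parameter < 10:
    print("Warning: likely too few events for the number of candidate parameters.")
```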
9 Missing Data¶
Description
Describe how missing data were handled (e.g., complete-case analysis, single imputation, multiple imputation) with details of any imputation method.
The absence of a description of how missing data were handled should not be taken to imply that participants with missing data were simply omitted from the analyses; such ambiguity should be avoided through transparent reporting of any imputation methods used to address missing outcome or predictor data. Furthermore, including only participants with complete data is inefficient (i.e. it reduces the effective sample size) and may lead to biased results when the remaining individuals are not representative of the original study sample.
Reporting how missing data were handled (e.g. details of the imputation method selected in a software package; rationale for excluding particular individuals) can help identify biases (e.g. ‘selection bias’) that may impact the outcomes of using the model in a clinical context.
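For illustration, a minimal sketch of multiple imputation using scikit-learn's IterativeImputer; the variables and values are hypothetical, and the imputation model, the variables included, and the number of imputations actually used should be reported as described above.

```python
# Minimal sketch: multiple imputation of missing predictor values using
# scikit-learn's IterativeImputer. Variable names and values are hypothetical.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

data = pd.DataFrame({
    "age":        [63, 71, np.nan, 55, 68],
    "sbp":        [142, np.nan, 130, 118, 155],
    "creatinine": [1.1, 0.9, 1.4, np.nan, 1.2],
})

# Draw several completed data sets, each from a different random seed.
imputed_sets = []
for m in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=m)
    completed = pd.DataFrame(imputer.fit_transform(data), columns=data.columns)
    imputed_sets.append(completed)

# The analysis would be run on each completed data set and the results pooled
# (e.g. using Rubin's rules), rather than on a single imputed data set.
print(imputed_sets[0].round(2))
```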
10a Statistical Analysis Methods¶
Description
Describe how predictors were handled in the analyses.
If a continuous variable has been converted into a categorical variable, details about the cut points and how they were chosen should be reported to allow readers to evaluate whether the information loss was significant enough to impact the performance of the model and whether the decision was justified given the objective (e.g. to segment individuals into distinct risk groups). Likewise, if predictors have been grouped, separated, or recategorised, this should be clearly reported along with the rationale for doing so.
Information about how predictors were handled supports the critical appraisal of the study against several ethical benchmarks, including privacy (e.g. was the choice to categorise a variable motivated by privacy concerns related to the risk of deanonymisation?) and responsibility (e.g. does the potential loss of information cause difficulties for researchers wishing to be responsive to future updating of the model?). Moreover, choices to aggregate, bunch, or break up predictors may reflect biases that cause discriminatory harm, so transparently reporting the reasoning behind such choices is crucial.
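As a simple illustration of transparent predictor handling, a minimal sketch of categorising a continuous predictor with explicitly documented cut points; the predictor, thresholds, and labels are hypothetical.

```python
# Minimal sketch: converting a continuous predictor into categories using
# explicitly documented cut points. The predictor and thresholds are
# hypothetical; the report should state the cut points chosen and the
# rationale for them (e.g. pre-specified clinical thresholds).
import pandas as pd

age = pd.Series([34, 47, 52, 61, 70, 83], name="age")

# Pre-specified, reported cut points let readers judge the information loss
# introduced by categorisation.
age_group = pd.cut(
    age,
    bins=[0, 50, 65, 120],
    labels=["<50", "50-64", ">=65"],
    right=False,
)
print(pd.concat([age, age_group.rename("age_group")], axis=1))
```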
10b Statistical Analysis Methods¶
Description
Specify type of model, all model-building procedures (including any predictor selection), and method for internal validation.
Details about the type of model (e.g. regression model or support vector machine) support subsequent comparison studies exploring relative performance of different models.
Details about predictor selection (e.g. rationale for excluding specific predictors a priori from the full set of available predictors; details about whether this was performed before or during modelling) inform readers about potential biases in the selection process. Furthermore, if the predictors were selected following examination of their interactions (e.g. selecting only the strongest), this can provide clues about possible overfitting.
Presenting details about the internal validation method chosen (e.g. cross-validation or bootstrapping) allows readers to appreciate the apparent performance of the model in the context of any further external validation.
Information about the type of model used is important for justifying ethical decisions that may arise in the context of explainability, for example whether the loss of interpretability stemming from the use of a complex or opaque model was justified by the gain in performance.
Design choices related to the model architecture can introduce a range of biases that may result in discriminatory harm to patients from protected groups. Providing sufficient information promotes transparency and accountability.
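For illustration, a minimal sketch that specifies a model type (logistic regression with pre-specified predictors) and an internal validation method (repeated stratified cross-validation); the data are synthetic placeholders, and bootstrapping would be an equally common choice.

```python
# Minimal sketch: reporting the model type and the internal validation method.
# The data are synthetic placeholders.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           random_state=0)

model = LogisticRegression(max_iter=1000)

# Apparent discrimination: evaluated on the same data used for fitting, so
# typically optimistic.
apparent_auc = roc_auc_score(y, model.fit(X, y).predict_proba(X)[:, 1])

# Internal validation by repeated cross-validation gives a less optimistic
# estimate; bootstrapping is a common alternative.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
cv_auc = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)

print(f"Apparent c-statistic:        {apparent_auc:.3f}")
print(f"Cross-validated c-statistic: {cv_auc.mean():.3f} (sd {cv_auc.std():.3f})")
```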
10c Statistical Analysis Methods¶
Description
For validation, describe how the predictions were calculated.
Evaluating the performance of an existing prediction model for a new set of individuals is a comparative process (i.e. making predictions from the original model and comparing with the outcomes from the validation data set). Therefore, the authors need to state how the predictions were calculated to allow readers to critically assess and evaluate the validation study.
A robust, useable, and reproducible model will facilitate evaluation and validation. This is a critical element of its sustainability and of the continuous process of understanding how models can safely evolve and be revised and improved in the midst of shifts in underlying data distributions.
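For illustration, a minimal sketch of calculating predicted probabilities in a validation data set from a previously published logistic regression model; the intercept, coefficients, and variable names are hypothetical placeholders rather than a real model.

```python
# Minimal sketch: calculating predicted risks in a validation data set from a
# previously published logistic regression model. The intercept, coefficients
# and variable names below are hypothetical placeholders, not a real model.
import numpy as np
import pandas as pd

validation = pd.DataFrame({
    "age":      [55, 67, 72],
    "sbp":      [130, 150, 145],
    "diabetes": [0, 1, 1],
})

intercept = -5.2
coefficients = {"age": 0.04, "sbp": 0.01, "diabetes": 0.60}

# Apply the original model unchanged: linear predictor, then inverse logit.
linear_predictor = intercept + sum(
    coef * validation[name] for name, coef in coefficients.items()
)
predicted_risk = 1 / (1 + np.exp(-linear_predictor))

print(validation.assign(predicted_risk=predicted_risk.round(3)))
```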
10d Statistical Analysis Methods¶
Description
Specify all measures used to assess model performance and, if relevant, to compare multiple models.
There are various measures of model performance (e.g. calibration and discrimination), each with strengths and weaknesses that depend on the context in which it is used. As such, details about the measures used and a justification of the rationale for selecting them help readers assess whether the measures are adequate for the purpose of the study.
Discrimination refers to the ability of a model to differentiate between classes of individuals who do or do not experience the outcome event being studied. A model’s performance (measured as discriminatory ability) may differ across sub-groups of the population, leading to discriminatory harm for protected or minority groups. Reporting these measures is therefore a crucial part of justifying and evaluating a model’s fairness. However, as discrimination and calibration are statistical properties, neither captures the clinical consequences of implementing a biased model in a particular context; assessing those consequences requires a fuller understanding of the sociotechnical system in which the model is deployed, perhaps alongside complementary methods such as decision curve analysis. In addition, clear specification of performance measures and uncertainty metrics will foster the useability and interpretability of the model on the clinical side. A clear understanding of model performance and its limitations is a hallmark of responsible implementation.
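For illustration, a minimal sketch of assessing discrimination (c-statistic) and calibration (intercept and slope from a logistic recalibration model) for hypothetical predicted risks and observed outcomes.

```python
# Minimal sketch: discrimination (c-statistic) and calibration (intercept and
# slope from a logistic recalibration model) for hypothetical predicted risks
# and observed outcomes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
predicted_risk = rng.uniform(0.05, 0.90, size=500)
observed = rng.binomial(1, predicted_risk)  # hypothetical outcomes

# Discrimination: ability to separate individuals with and without the outcome.
c_statistic = roc_auc_score(observed, predicted_risk)

# Calibration: regress the outcome on the log-odds of the predicted risk;
# a slope near 1 and an intercept near 0 indicate good calibration.
log_odds = np.log(predicted_risk / (1 - predicted_risk)).reshape(-1, 1)
recal = LogisticRegression(C=1e6).fit(log_odds, observed)  # large C ~ no penalty

print(f"c-statistic: {c_statistic:.3f}")
print(f"calibration slope: {recal.coef_[0][0]:.2f}, "
      f"intercept: {recal.intercept_[0]:.2f}")
```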
10e Statistical Analysis Methods¶
Description
Describe any model updating (e.g., recalibration) arising from the validation, if done.
Predictive performance of a model can decrease when the model is validated in a different setting (e.g. in geographically different individuals). If a model is subsequently updated in response to this validation, it is important to report the methods by which the model was updated to allow reviewers to critically evaluate the study and the subsequent performance of the updated model.
A model being used in changing environments where underlying populations may differ or change needs to be treated as a “living” system and adapted accordingly to remain safe and sustainable. This process of updating and recalibration must be undertaken in a maximally transparent and well-documented way, so that changes made to an existing model can be assessed and critically evaluated.
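For illustration, a minimal sketch of one common updating method, logistic recalibration of an existing model's intercept and slope in a new setting; the data are hypothetical.

```python
# Minimal sketch: logistic recalibration of an existing model in a new setting,
# i.e. re-estimating the intercept and calibration slope while keeping the
# original predictors' relative weights. All data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
original_lp = rng.normal(-1.0, 1.2, size=400)  # linear predictor from the original model
# Outcomes generated so that the original model is miscalibrated in this setting.
observed = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * original_lp - 0.5))))

# Regress observed outcomes on the original linear predictor (large C ~ no penalty).
recal = LogisticRegression(C=1e6).fit(original_lp.reshape(-1, 1), observed)
alpha, beta = recal.intercept_[0], recal.coef_[0][0]

# Updated model: new linear predictor = alpha + beta * original linear predictor.
print(f"updated intercept: {alpha:.2f}, calibration slope applied: {beta:.2f}")
```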
11 Risk Groups¶
Description
Provide details on how risk groups were created, if done.
As there is no clear consensus on how to create risk groups or how many to use, it is vital that authors present details about how they were chosen and how thresholds between them were determined to enable subsequent assessment or replication of the prediction study.
Where the distribution of risk groups is intended to support clinical decision-making, the rationale for the choice of thresholds provides important information that can be used to identify potential misuse of the decision tool (e.g. cognitive or discriminatory biases that lead to the disproportionate harm of a particular risk group). Clear description of risk group determination is also necessary to support the explainability and interpretability of the model in clinical decision-support environments.
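For illustration, a minimal sketch of forming risk groups from predicted probabilities using pre-specified thresholds; the thresholds and labels are hypothetical, and the rationale for the actual choices should be reported.

```python
# Minimal sketch: forming risk groups from predicted probabilities using
# pre-specified thresholds. The thresholds and labels are hypothetical; the
# report should state how they were chosen (e.g. clinical decision thresholds
# or quantiles of the predicted risk distribution).
import pandas as pd

predicted_risk = pd.Series([0.03, 0.08, 0.15, 0.32, 0.61], name="predicted_risk")

risk_group = pd.cut(
    predicted_risk,
    bins=[0.0, 0.05, 0.20, 1.0],
    labels=["low", "intermediate", "high"],
    include_lowest=True,
)
print(pd.concat([predicted_risk, risk_group.rename("risk_group")], axis=1))
```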
12 Development vs. Validation¶
Description
For validation, identify any differences from the development data in setting, eligibility criteria, outcome, and predictors.
Authors of validation studies should clearly report any modifications in setting, eligibility criteria, predictors, outcome definition and measurement, and how these differences were handled, as such modifications could impact on the veracity of the validation study.
A robust, useable, and reproducible model will facilitate iterative evaluation and validation. This is a critical element of its sustainability and of the continuous process of understanding how models can safely evolve and be revised and improved in the midst of shifts in underlying data distributions.