Project Design and Pre-processing Biases


Fig. 2 A simplified schematic of a project workflow.

Representation Bias

Deliberative Prompts

  • How have you measured and evaluated the representativeness of the dataset to ensure that the sample is adequate?

  • Have you consulted stakeholder groups to verify that your dataset is representative?
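
One simple way to quantify representativeness is to compare each subgroup's share of the dataset with its share of a reference population (for example, census or registry figures). The Python sketch below assumes you already have such reference shares. The group labels, the reference shares and the 0.8 flagging threshold are illustrative assumptions rather than recommended values, and the function names are hypothetical.

```python
from collections import Counter

def representation_ratios(sample_groups, reference_shares):
    """Each group's share of the sample divided by its share of the reference
    population: 1.0 is proportionate, below 1.0 is under-represented."""
    counts = Counter(sample_groups)
    total = sum(counts.values())
    return {
        group: (counts.get(group, 0) / total) / ref_share
        for group, ref_share in reference_shares.items()
    }

def under_represented(ratios, threshold=0.8):
    """Hypothetical flagging rule: groups whose ratio falls below `threshold`."""
    return sorted(g for g, r in ratios.items() if r < threshold)

# Illustrative data only: one group label per record, plus assumed reference shares.
sample = ["A"] * 70 + ["B"] * 25 + ["C"] * 5
reference = {"A": 0.60, "B": 0.25, "C": 0.15}

ratios = representation_ratios(sample, reference)
print({g: round(r, 2) for g, r in ratios.items()})  # {'A': 1.17, 'B': 1.0, 'C': 0.33}
print(under_represented(ratios))                     # ['C']
```

A ratio only tells you that a group is under-represented relative to the chosen reference; whether the sample is adequate for that group still depends on how the results will be used, which is where stakeholder input helps.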

Related biases: Missing Data Bias, Label Bias, Training-Serving Skew

Label Bias

Deliberative Prompts

  • How have you identified problematic labels (or features) in your dataset that may be imperfect proxies for what you intend to measure? What steps have you taken to mitigate the harms that could arise from using these labels?

Related biases: Representation Bias, Chronological Bias

Missing Data Bias

Deliberative Prompts

  • How have you handled missing data, and how have you recorded that handling?

  • Have you consulted domain experts to help identify possible explanations for the missing data, and whether the pattern of missingness may itself be informative?
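
A per-group missingness table is one way to record how missing data were handled and to give domain experts something concrete to review: if a value is missing far more often in one group than another, the missingness may itself carry information. The sketch below assumes records are Python dictionaries with None marking a missing value; the field names (`site`, `bmi`) and both helper functions are hypothetical.

```python
from collections import defaultdict

def missingness_by_group(records, feature, group_key):
    """Fraction of records in each group where `feature` is missing (None)."""
    missing = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec.get(feature) is None:
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}

def add_missing_indicator(records, feature):
    """Return copies of the records with an explicit `<feature>_missing` flag,
    so the fact of missingness is kept (and documented) before any imputation."""
    return [{**rec, f"{feature}_missing": rec.get(feature) is None} for rec in records]

# Hypothetical records from two sites.
records = [
    {"site": "urban", "bmi": 24.1},
    {"site": "urban", "bmi": 31.0},
    {"site": "urban", "bmi": None},
    {"site": "urban", "bmi": 27.5},
    {"site": "rural", "bmi": None},
    {"site": "rural", "bmi": None},
    {"site": "rural", "bmi": 29.3},
    {"site": "rural", "bmi": None},
]
print(missingness_by_group(records, "bmi", "site"))  # {'urban': 0.25, 'rural': 0.75}
flagged = add_missing_indicator(records, "bmi")
```

Keeping the indicator alongside any imputed values means the missingness pattern remains available for later analysis and for reporting.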

Related biases: Admission Rate Bias, Diagnostic Access Bias

Measurement Bias

Deliberative Prompts

  • Have the data extraction methods and instruments been evaluated in conjunction with domain experts and relevant stakeholder groups?
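
Where an instrument can be compared against a reference standard, one common way to evaluate agreement is a Bland-Altman analysis (mean difference, or bias, and limits of agreement), computed separately for each subgroup so that systematic differences become visible. The paired readings and group names below are invented for illustration, and the 1.96 multiplier assumes the differences are roughly normally distributed.

```python
from statistics import mean, stdev

def bland_altman(instrument, reference):
    """Mean difference (bias) and approximate 95% limits of agreement
    between paired instrument and reference-standard readings."""
    diffs = [i - r for i, r in zip(instrument, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # requires at least two pairs
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired readings (instrument, reference) for two subgroups.
readings = {
    "group_x": ([97, 95, 92, 90], [96, 94, 91, 89]),
    "group_y": ([97, 96, 94, 93], [94, 92, 91, 89]),
}
results = {g: bland_altman(inst, ref) for g, (inst, ref) in readings.items()}
for group, (bias, lower, upper) in results.items():
    print(f"{group}: bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")
```

A larger bias in one subgroup (here `group_y`) is the kind of finding to discuss with domain experts and stakeholder groups before the instrument's readings are used as model inputs or labels.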

Related biases: Spectrum Bias

Chronological Bias

Deliberative Prompts

  • Have you worked with domain experts (e.g. clinicians or public health experts) to map the patient journey and identify systematic variations between patient cohorts?

Related biases: Prevalence-Incidence Bias

Prevalence-Incidence Bias

Deliberative Prompts

  • Have you worked with domain experts (e.g. clinicians or public health experts) to explore whether the incidence rate of the disease (as opposed to its prevalence) offers evidence of systematic exclusion?
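
A toy calculation can make the difference between the two measures concrete. Point prevalence counts people with an ongoing episode on a given day, whereas cumulative incidence counts new episodes over a period, so a prevalence-based sample tends to miss short episodes (for example, rapidly fatal or quickly resolving cases). The sketch below assumes each episode is an (onset_day, end_day) pair in a population of 1,000 people who were all disease-free at day 0; the episodes, the population size and the 7-day cut-off for a "short" episode are hypothetical.

```python
def point_prevalence(episodes, population, day):
    """Share of the population with an ongoing episode on `day`."""
    active = sum(1 for onset, end in episodes if onset <= day < end)
    return active / population

def cumulative_incidence(episodes, population_at_risk, start, stop):
    """New episodes with onset in [start, stop) per person at risk at `start`."""
    new = sum(1 for onset, _ in episodes if start <= onset < stop)
    return new / population_at_risk

def short_share(episodes, max_days=7):
    """Share of episodes lasting at most `max_days` days."""
    if not episodes:
        return 0.0
    return sum(1 for onset, end in episodes if end - onset <= max_days) / len(episodes)

# Hypothetical episodes: six long-lasting and four short ones.
episodes = [(0, 365), (30, 365), (60, 365), (90, 365), (120, 365), (150, 365),
            (10, 12), (50, 53), (100, 101), (170, 172)]

print(point_prevalence(episodes, 1000, 180))         # 0.006
print(cumulative_incidence(episodes, 1000, 0, 180))  # 0.01
incident = [e for e in episodes if 0 <= e[0] < 180]
prevalent = [e for e in episodes if e[0] <= 180 < e[1]]
print(short_share(incident), short_share(prevalent))  # 0.4 0.0
```

In this toy example, none of the short episodes that occurred before day 180 appear in the day-180 snapshot, which is the kind of systematic exclusion the prompt above asks you to look for.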

Related biases: Chronological Bias, Missing Data Bias

Admission Rate Bias

Deliberative Prompts

  • Is there a secondary source of data that allows you to compare groups of patients that have been admitted to hospital with similar groups that have not been admitted? If not, have you considered how this may affect your results or model, and how to report this limitation?
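
Admission rate bias is often illustrated by Berkson's paradox: if two unrelated conditions each raise the chance of hospital admission, they can appear negatively associated among admitted patients. The Python simulation below is a minimal sketch under invented probabilities (10% prevalence for each condition, a 50% chance that each condition triggers admission, and a 5% chance of admission for other reasons). It compares the odds ratio in the whole simulated population with the odds ratio in the admitted subset, which is the kind of comparison a secondary source covering non-admitted patients would allow.

```python
import random

def odds_ratio(pairs):
    """Odds ratio between two binary conditions, from (has_a, has_b) pairs."""
    both = sum(1 for a, b in pairs if a and b)
    a_only = sum(1 for a, b in pairs if a and not b)
    b_only = sum(1 for a, b in pairs if b and not a)
    neither = sum(1 for a, b in pairs if not a and not b)
    return (both * neither) / (a_only * b_only)

def simulate(n=100_000, seed=0):
    """Return (population odds ratio, admitted-only odds ratio)."""
    rng = random.Random(seed)
    population, admitted = [], []
    for _ in range(n):
        has_a = rng.random() < 0.10  # condition A, generated independently of B
        has_b = rng.random() < 0.10  # condition B
        # Each condition, or some unrelated reason, can independently trigger admission.
        admit = (
            (has_a and rng.random() < 0.5)
            or (has_b and rng.random() < 0.5)
            or rng.random() < 0.05
        )
        population.append((has_a, has_b))
        if admit:
            admitted.append((has_a, has_b))
    return odds_ratio(population), odds_ratio(admitted)

pop_or, admitted_or = simulate()
print(f"population OR: {pop_or:.2f}, admitted-only OR: {admitted_or:.2f}")
```

With these settings the population odds ratio is close to 1, while the admitted-only odds ratio falls well below 1, even though the two conditions were generated independently.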

Related biases: Chronological Bias

Diagnostic Access Bias

Deliberative Prompts

  • Are there stakeholder groups who can help you identify variation in healthcare access between the sub-groups in your study population? If not, have you taken the necessary steps to ensure sufficient reporting of the study sample?

Related biases: Representation Bias, Admission Rate Bias, Missing Data Bias

Spectrum Bias

Deliberative Prompts

  • Are there any systematic reviews or other evidence of variation in the accuracy of the diagnostic test being used that are relevant to the settings under investigation?

  • If there is no evidence to allow for investigation of spectrum bias, have you taken the necessary steps to ensure sufficient reporting of the study sample?
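
Where a reference standard is available for at least part of the sample, estimating sensitivity and specificity separately for each setting or subgroup is one way to look for spectrum effects. The sketch below assumes rows of (group, test result, reference result); the "primary" and "secondary" care groups and their counts are invented for illustration, and `accuracy_by_group` is a hypothetical helper.

```python
from collections import defaultdict

def accuracy_by_group(rows):
    """rows: iterable of (group, test_positive, reference_positive).
    Returns {group: (sensitivity, specificity)} against the reference standard;
    a value is None when the group has no reference positives (or negatives)."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, test_pos, ref_pos in rows:
        c = counts[group]
        if ref_pos:
            c["tp" if test_pos else "fn"] += 1
        else:
            c["fp" if test_pos else "tn"] += 1
    result = {}
    for group, c in counts.items():
        pos, neg = c["tp"] + c["fn"], c["tn"] + c["fp"]
        sensitivity = c["tp"] / pos if pos else None
        specificity = c["tn"] / neg if neg else None
        result[group] = (sensitivity, specificity)
    return result

# Invented counts: (group, test positive?, reference positive?)
rows = (
    [("primary", True, True)] * 6 + [("primary", False, True)] * 4
    + [("primary", False, False)] * 18 + [("primary", True, False)] * 2
    + [("secondary", True, True)] * 9 + [("secondary", False, True)] * 1
    + [("secondary", False, False)] * 16 + [("secondary", True, False)] * 4
)
print(accuracy_by_group(rows))
# {'primary': (0.6, 0.9), 'secondary': (0.9, 0.8)}
```

In this toy data the test is less sensitive in the "primary" group, as might happen if that setting sees milder cases; with real data, such differences would need confidence intervals before being interpreted.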

Related biases: Measurement Bias

Wrong Sample Size

Deliberative Prompts

  • Which methods or statistical indicators (e.g. p-values, confidence intervals) have you used and reported to help assess whether the findings could have arisen by chance?

  • Have you considered the likely use case for the results? How will this be reported (e.g. in a ‘Limitations’ section) to help readers assess the relevance of the results?
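
For a single proportion, a standard normal-approximation calculation gives the sample size needed for a chosen confidence-interval half-width, and the Wilson score interval shows how uncertain an estimate from a small subgroup is. The sketch below assumes a 95% confidence level (z = 1.96); the function names are illustrative, and other study designs, such as comparing groups or developing prediction models, need their own sample-size methods.

```python
import math

def sample_size_for_proportion(p, margin, z=1.96):
    """Smallest n for which a normal-approximation CI around an expected
    proportion p has half-width no larger than `margin`."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

print(sample_size_for_proportion(0.5, 0.05))  # 385

# Same observed proportion (0.4) in a small and a larger subgroup.
small_lo, small_hi = wilson_interval(8, 20)
large_lo, large_hi = wilson_interval(80, 200)
print(f"n=20:  ({small_lo:.3f}, {small_hi:.3f})")
print(f"n=200: ({large_lo:.3f}, {large_hi:.3f})")
```

With an expected proportion of 0.5 and a margin of 0.05, the calculation gives 385 participants; the interval from 8 of 20 is roughly three times as wide as the one from 80 of 200, which is worth stating when results are reported for small subgroups.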

Related biases: Representation Bias