Area Under Precision-Recall Curve

Description

Area Under Precision-Recall Curve (AUPRC) measures model performance by plotting precision (the proportion of positive predictions that are correct) against recall (the proportion of actual positives that are correctly identified) at various classification thresholds, then calculating the area under the resulting curve. Unlike accuracy or AUC-ROC, AUPRC is particularly valuable for imbalanced datasets where the minority class is of primary interest: a perfect score is 1.0, whilst the expected score of a random classifier equals the positive class proportion. By focusing on the precision-recall trade-off, it provides a more informative assessment than overall accuracy in scenarios where false positives and false negatives carry different costs, especially when positive examples are rare.
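
The following is a minimal sketch of computing AUPRC with scikit-learn's precision_recall_curve and auc functions; the synthetic dataset, logistic regression model, and 5% positive rate are illustrative assumptions rather than part of the technique itself.

```python
# Minimal AUPRC sketch (illustrative assumptions: synthetic data, logistic regression).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, precision_recall_curve
from sklearn.model_selection import train_test_split

# Imbalanced binary problem: roughly 5% positives.
X, y = make_classification(n_samples=20_000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

# Precision and recall at every threshold, then the area under that curve.
precision, recall, _ = precision_recall_curve(y_test, scores)
auprc = auc(recall, precision)

print(f"AUPRC: {auprc:.3f}")
print(f"Random baseline (positive prevalence): {y_test.mean():.3f}")
```

Printing the positive prevalence alongside the score makes the random baseline explicit for the dataset at hand, which helps when judging whether an AUPRC value is actually good.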

Example Use Cases

Reliability

Evaluating fraud detection models where genuine transactions far outnumber fraudulent ones, using AUPRC to optimise the balance between catching fraud (high recall) and minimising false alarms (high precision) for cost-effective operations.
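
A hedged sketch of how that trade-off might be operationalised: among thresholds that keep false alarms low (an assumed floor of 90% precision), pick the one that recovers the most fraud. The synthetic labels and scores below stand in for real model outputs, and the 90% figure is a hypothetical business constraint.

```python
# Hypothetical threshold selection under an assumed 90% precision floor.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(1)
y_true = (rng.random(50_000) < 0.01).astype(int)                       # ~1% fraudulent
scores = np.clip(0.5 * y_true + rng.normal(0.2, 0.15, 50_000), 0, 1)   # stand-in model scores

precision, recall, thresholds = precision_recall_curve(y_true, scores)
precision, recall = precision[:-1], recall[:-1]                         # align with thresholds

eligible = np.flatnonzero(precision >= 0.90)   # thresholds meeting the precision floor
if eligible.size:
    best = eligible[np.argmax(recall[eligible])]   # maximise recall among them
    print(f"threshold={thresholds[best]:.3f} "
          f"precision={precision[best]:.2f} recall={recall[best]:.2f}")
else:
    print("No threshold reaches 90% precision.")
```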

Transparency

Providing transparent performance metrics for rare disease detection systems to medical regulators, where AUPRC clearly shows model effectiveness on the minority positive class rather than being masked by high accuracy on negative cases.

Fairness

Ensuring fair evaluation of loan default prediction across demographic groups by comparing AUPRC scores, revealing whether models perform equally well at identifying high-risk borrowers regardless of protected characteristics.
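
One way to make such a comparison concrete is a disaggregated evaluation that computes AUPRC per group and reports each group's own random baseline, since baselines differ when positive prevalence differs. The group names, labels, and scores below are synthetic placeholders for real evaluation data.

```python
# Sketch of per-group AUPRC comparison; group labels and scores are synthetic placeholders.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(2)
n = 20_000
groups = rng.choice(["group_a", "group_b"], size=n)              # stand-in protected attribute
base_rate = np.where(groups == "group_a", 0.05, 0.10)            # different default rates per group
y_true = (rng.random(n) < base_rate).astype(int)
scores = np.clip(0.4 * y_true + rng.normal(0.3, 0.2, n), 0, 1)   # stand-in model scores

for g in np.unique(groups):
    idx = groups == g
    ap = average_precision_score(y_true[idx], scores[idx])
    baseline = y_true[idx].mean()
    # Report each group's prevalence too: raw AUPRC values are not directly
    # comparable when the groups' positive class proportions differ.
    print(f"{g}: AUPRC={ap:.3f}  baseline={baseline:.3f}  n={idx.sum()}")
```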

Limitations

  • More sensitive to class distribution than ROC curves, making it difficult to compare models across datasets with different positive class proportions or to set universal performance thresholds.
  • Can be overly optimistic on extremely imbalanced datasets: with very few positive examples the estimate has high variance, so even random predictions can occasionally achieve seemingly high AUPRC scores by chance.
  • Provides limited insight into performance at specific operating points, so additional analysis is needed to select an appropriate threshold for deployment.
  • Interpolation methods for calculating the area under the curve can vary between implementations, potentially leading to slightly different scores for the same model (see the sketch after this list).
  • Less interpretable than simple metrics like precision or recall at a fixed threshold, making it harder to communicate performance to non-technical stakeholders.
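
As a rough illustration of the interpolation point above, the snippet below computes the area in two common ways on identical predictions: scikit-learn's step-wise average precision and trapezoidal integration of the precision-recall curve. The synthetic data is only there to make it runnable.

```python
# Two estimators of the same area: step-wise average precision vs trapezoidal integration.
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

rng = np.random.default_rng(3)
y_true = (rng.random(5_000) < 0.05).astype(int)              # ~5% positives
scores = np.clip(0.4 * y_true + rng.random(5_000), 0, 1)     # stand-in model scores

ap = average_precision_score(y_true, scores)                 # step-wise, no interpolation
precision, recall, _ = precision_recall_curve(y_true, scores)
trapezoid = auc(recall, precision)                           # linear (trapezoidal) interpolation

# Typically close but not identical, so state which estimator was used
# whenever an AUPRC figure is reported.
print(f"average precision (step-wise): {ap:.4f}")
print(f"trapezoidal AUPRC:             {trapezoid:.4f}")
```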

Resources

scikit-learn Precision-Recall
Documentation

Comprehensive guide to precision-recall curves and AUPRC calculation in scikit-learn with practical examples

Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence
Research Paper · Qi Qi et al.

Technical paper on optimising AUPRC directly during model training with convergence guarantees

A Closer Look at AUROC and AUPRC under Class Imbalance
Research Paper · Matthew B. A. McDermott et al. · Jan 11, 2024

Recent analysis of AUPRC behaviour under extreme class imbalance with practical recommendations

DominikRafacz/auprc
Software Package

R package for calculating AUPRC with functions for plotting precision-recall curves and mlr3 integration
