Description

Temperature scaling is a post-hoc calibration method that adjusts a model's confidence using a single scalar parameter, the temperature. When a model is over-confident, for example assigning high probabilities to wrong answers, temperature scaling softens its predicted probabilities so that they better reflect how often it is actually right. It works by dividing the model's raw outputs (logits) by the temperature before the softmax converts them to probabilities. Temperatures above 1 make the model less confident, whilst temperatures below 1 make it more confident; a temperature of 1 leaves the predictions unchanged. Because dividing every logit by the same positive value does not change which class scores highest, the technique preserves the model's accuracy whilst aiming to ensure that when it says it's 90% confident, it's right about 90% of the time.
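The mechanism described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name softmax_with_temperature and the example logits are assumptions made for illustration, not part of any particular library.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide the logits by the temperature, then apply a softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a three-class problem.
logits = [4.0, 1.0, 0.5]

for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

In this sketch the top class stays the same at every temperature, while its probability shrinks as the temperature grows. That is why the method changes confidence but not accuracy.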

Example Use Cases

Reliability

Adjusting a deep learning image classifier's confidence scores so that they match observed accuracy: when the classifier reports 90% confidence, it should be right about 90% of the time.
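One common way to check whether confidence matches accuracy is the expected calibration error (ECE), which groups predictions into confidence bins and compares each bin's average confidence with its accuracy. The sketch below is a simplified, standard-library version; the function name, the equal-width binning and the toy data are illustrative assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy across bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece

# Hypothetical predictions: always 90% confident, but right only 6 times in 10.
confidences = [0.9] * 10
correct = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(expected_calibration_error(confidences, correct))
```

An ECE near zero suggests the reported confidences are realistic. Comparing the value before and after temperature scaling gives a rough indication of how much the calibration has improved.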

Transparency

Making medical diagnosis model predictions more trustworthy by providing realistic confidence scores that doctors can interpret and use to make informed decisions about patient care.

Fairness

Supporting fair treatment across patient demographics by checking that calibrated confidence scores are equally reliable for different groups, helping to prevent systematic over-confidence in predictions for certain populations.

Limitations

  • Only addresses calibration at the overall dataset level, not subgroup-specific miscalibration issues.
  • Does not improve the rank ordering or accuracy of predictions, only adjusts confidence levels.
  • Assumes that calibration errors are consistent across different types of inputs and feature values.
  • Requires a separate validation set for temperature parameter optimisation, which may not be available in small datasets.
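The validation-set requirement in the last point comes from how the temperature is chosen: it is typically the value that minimises the negative log-likelihood (NLL) of held-out labels. The sketch below uses a coarse grid search for simplicity, where a gradient-based optimiser is a common alternative. The function names, the candidate grid and the toy validation data are all illustrative assumptions.

```python
import math

def log_softmax(logits, temperature):
    """Log-probabilities after dividing the logits by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]

def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for row, label in zip(logit_rows, labels):
        total -= log_softmax(row, temperature)[label]
    return total / len(labels)

def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the candidate temperature with the lowest validation NLL."""
    if candidates is None:
        candidates = [0.05 * k for k in range(1, 201)]  # 0.05 to 10.0
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))

# Hypothetical held-out logits: four confident predictions, one of them wrong.
val_logits = [[5.0, 0.0], [0.0, 5.0], [5.0, 0.0], [5.0, 0.0]]
val_labels = [0, 1, 0, 1]

t = fit_temperature(val_logits, val_labels)
print("fitted temperature:", t)
print("NLL before:", mean_nll(val_logits, val_labels, 1.0))
print("NLL after: ", mean_nll(val_logits, val_labels, t))
```

Because the toy model is over-confident, the fitted temperature should come out above 1. The temperature must be fitted on data that was not used for training, otherwise the model's training-set confidence would make it look better calibrated than it is.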

Resources

gpleiss/temperature_scaling
Software Package
Exploring the Impact of Temperature Scaling in Softmax for Classification and Adversarial Robustness
Research Paper · Hao Xuan, Bokai Yang, and Xingyu Li · Feb 28, 2025
Neural Clamping: Joint Input Perturbation and Temperature Scaling for Neural Network Calibration
Research Paper · Yung-Chen Tang, Pin-Yu Chen, and Tsung-Yi Ho · Jul 24, 2024
On Calibration of Modern Neural Networks
Research Paper · Chuan Guo et al. · Jun 14, 2017
On the Limitations of Temperature Scaling for Distributions with Overlaps
Research Paper · Muthu Chidambaram and Rong Ge · Jun 1, 2023

Tags

Applicable Models:
Data Type:
Evidence Type:
Explanatory Scope:
Expertise Needed:
Lifecycle Stage:
Technique Type: