Temperature Scaling
Description
Temperature scaling adjusts a model's confidence by applying a single parameter, the temperature, to its predictions after training. When a model is systematically over-confident in its wrong answers, temperature scaling can correct this by softening the predicted probabilities. It works by dividing the model's raw outputs (logits) by the temperature value before they are converted to probabilities, with the temperature typically fitted on a held-out validation set. Temperatures above 1 make the model less confident, whilst temperatures below 1 increase confidence. Because dividing by a positive constant does not change which class scores highest, the technique maintains the model's accuracy whilst aiming to ensure that when it says it's 90% confident, it's actually right about 90% of the time.
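A minimal sketch of the fitting step, assuming the uncalibrated logits and integer labels are available as NumPy arrays (the function names and the SciPy-based optimisation below are illustrative choices, not part of the technique's definition):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(temperature, logits, labels):
    # Negative log-likelihood of the true labels under temperature-scaled probabilities.
    probs = softmax(logits / temperature)
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()

def fit_temperature(val_logits, val_labels):
    # Find the single temperature that minimises NLL on a held-out validation set.
    result = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded",
                             args=(val_logits, val_labels))
    return result.x

# Usage: fit on validation data, then apply the same temperature at test time.
# T = fit_temperature(val_logits, val_labels)
# calibrated_probs = softmax(test_logits / T)
```

Note that the fit uses a validation set separate from the training data, which is the source of the final limitation listed below.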
Example Use Cases
Reliability
Calibrating a deep learning image classifier's confidence scores so that when it reports 90% confidence, it is correct about 90% of the time.
Transparency
Making medical diagnosis model predictions more trustworthy by providing calibrated confidence scores that doctors can interpret and use to make informed decisions about patient care.
Fairness
Ensuring fair treatment across patient demographics by calibrating confidence scores equally across different groups, preventing systematic over-confidence in predictions for certain populations.
Limitations
- Only addresses calibration at the overall dataset level, not subgroup-specific miscalibration; a single global temperature can leave individual groups miscalibrated (see the sketch after this list).
- Does not improve the rank ordering or accuracy of predictions, only adjusts confidence levels.
- Assumes that calibration errors are consistent across different types of inputs and feature values.
- Requires a separate validation set for temperature parameter optimisation, which may not be available in small datasets.
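To make the first limitation concrete, miscalibration is commonly measured with the expected calibration error (ECE). The sketch below, assuming NumPy arrays of predicted probabilities and labels plus a hypothetical array of subgroup identifiers, shows how calibration can be checked per group rather than only overall:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    # Average gap between confidence and accuracy across confidence bins.
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# A single fitted temperature can give a low overall ECE while individual
# subgroups stay miscalibrated; evaluating per group makes this visible.
# (group_ids is a hypothetical array assigning each example to a subgroup.)
# for group in np.unique(group_ids):
#     mask = group_ids == group
#     print(group, expected_calibration_error(probs[mask], labels[mask]))
```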