Gradient-weighted Class Activation Mapping

Explainability Fairness

Description

Grad-CAM creates visual heatmaps showing which regions of an image a convolutional neural network focuses on when making a specific classification. Unlike pixel-level techniques, Grad-CAM produces coarser region-based explanations by using gradients from the predicted class to weight the CNN's final feature maps, then projecting these weighted activations back to create an overlay on the original image. This provides intuitive visual explanations of where the model is 'looking' for evidence of different classes.

Example Use Cases

Explainability

Validating that a melanoma detection model focuses on the actual skin lesion rather than surrounding healthy skin, medical equipment, or artifacts when making cancer/benign classifications.

Debugging an autonomous vehicle's traffic sign recognition system by visualising whether the model correctly focuses on the sign itself rather than background objects, shadows, or irrelevant visual elements.

Fairness

Auditing a medical imaging system for racial bias by examining whether diagnostic predictions inappropriately focus on skin tone regions rather than actual pathological features, ensuring equitable healthcare AI deployment.

Limitations

Requires access to the CNN's internal feature maps and gradients, limiting use to white-box scenarios.
Resolution is constrained by the final convolutional layer's feature map size, producing coarser localisation than pixel-level methods.
Only applicable to CNN architectures with clearly defined convolutional layers, not suitable for other neural network types.
May highlight regions that correlate with the class but aren't causally important for the model's decision-making process.

Resources

jacobgil/pytorch-grad-cam

Software Package

Grad-CAM: Visualize class activation maps with Keras, TensorFlow ...

Tutorial

kazuto1011/grad-cam-pytorch

Software Package

A Tutorial on Explainable Image Classification for Dementia Stages Using Convolutional Neural Network and Gradient-weighted Class Activation Mapping

Documentation•Kevin Kam Fung Yuen

A Guide to Grad-CAM in Deep Learning - Analytics Vidhya

Tutorial

Related Techniques

Name	Description	Assurance Goals
Partial Dependence Plots	Partial Dependence Plots show how changing one or two features affects a model's predictions on average. The technique works by varying the selected feature(s) across their full range whilst keeping all other features fixed at their original values, then averaging the predictions. This creates a clear visualisation of whether increasing or decreasing a feature tends to increase or decrease predictions, and reveals patterns like linear trends, plateaus, or threshold effects that help explain model behaviour.	Explainability
Internal Review Boards	Internal Review Boards (IRBs) provide independent, systematic evaluation of AI/ML projects throughout their lifecycle to identify ethical, safety, and societal risks before they materialise. Typically composed of multidisciplinary experts including ethicists, domain specialists, legal counsel, community representatives, and technical staff, IRBs review project proposals, assess potential harms to individuals and communities, evaluate mitigation strategies, and establish ongoing monitoring requirements. Unlike traditional research ethics committees, AI-focused IRBs address algorithmic bias, fairness concerns, privacy implications, and societal impact at scale, providing essential governance for responsible AI development and deployment.	Safety Fairness Transparency
Bootstrapping	Bootstrapping estimates uncertainty by repeatedly resampling the original dataset with replacement to create many new training sets, training a model on each sample, and analysing the variation in predictions. This approach provides confidence intervals and stability measures without making strong statistical assumptions. By showing how predictions change with different random samples of the data, it reveals how sensitive the model is to the specific training examples and provides robust uncertainty estimates.	Reliability Transparency Fairness
Individual Conditional Expectation Plots	ICE plots display the predicted output for individual instances as a function of a feature, with all other features held fixed for each instance. Each line on an ICE plot represents one instance's prediction trajectory as the feature of interest changes, revealing whether different instances are affected differently by that feature.	Explainability
Adversarial Debiasing	Adversarial debiasing reduces bias by training models using a competitive adversarial setup, similar to Generative Adversarial Networks (GANs). The technique involves two neural networks: a predictor that learns to make accurate predictions on the main task, and an adversary (bias detector) that attempts to predict protected attributes (such as race, gender, or age) from the predictor's internal representations. Through adversarial training, the predictor learns to produce representations that retain predictive power for the main task whilst being uninformative about protected characteristics, thereby reducing discriminatory bias.	Fairness
Area Under Precision-Recall Curve	Area Under Precision-Recall Curve (AUPRC) measures model performance by plotting precision (the proportion of positive predictions that are correct) against recall (the proportion of actual positives that are correctly identified) at various classification thresholds, then calculating the area under the resulting curve. Unlike accuracy or AUC-ROC, AUPRC is particularly valuable for imbalanced datasets where the minority class is of primary interest---a perfect score is 1.0, whilst random performance equals the positive class proportion. By focusing on the precision-recall trade-off, it provides a more informative assessment than overall accuracy for scenarios where false positives and false negatives have different costs, especially when positive examples are rare.	Reliability Transparency Fairness

Tags

Applicable Models:

Convolutional Neural Network

Data Requirements:

Access To Model Internals

Data Type:

Evidence Type:

Expertise Needed:

Explanatory Scope:

Lifecycle Stage:

Model Development

Technique Type: