Safety

19 techniques

Ensuring AI systems operate safely and do not cause harm.

Goals | Models | Data Types | Description
Out-of-Distribution Detector for Neural Networks
Algorithmic
Neural Network
Any
ODIN (Out-of-Distribution Detector for Neural Networks) identifies when a neural network encounters inputs significantly...
Synthetic Data Generation
Algorithmic
Model Agnostic
Any
Synthetic data generation creates artificial datasets that aim to preserve the statistical properties, distributions,...
Federated Learning
Algorithmic
Model Agnostic
Any
Federated learning enables collaborative model training across multiple distributed parties (devices, organisations, or...
Homomorphic Encryption
Algorithmic
Model Agnostic
Any
Homomorphic encryption allows computation on encrypted data without decrypting it first, producing encrypted results...
Deep Ensembles
Algorithmic
Neural Network
Any
Deep ensembles combine predictions from multiple neural networks trained independently with different random...
Safety Envelope Testing
Testing
Model Agnostic
Any
Safety envelope testing systematically evaluates AI system performance at the boundaries of its intended operational...
Internal Review Boards
Process
Model Agnostic
Any
Internal Review Boards (IRBs) provide independent, systematic evaluation of AI/ML projects throughout their lifecycle to...
Red Teaming
Procedural
Model Agnostic
Any
Red teaming involves systematic adversarial testing of AI/ML systems by dedicated specialists who attempt to identify...
Anomaly Detection
Algorithmic
Model Agnostic
Any
Anomaly detection identifies unusual behaviours, inputs, or outputs that deviate significantly from established normal...
Human-in-the-Loop Safeguards
Process
Model Agnostic
Any
Human-in-the-loop safeguards establish systematic checkpoints where human experts review, validate, or override AI/ML...
Confidence Thresholding
Algorithmic
Model Agnostic
Any
Confidence thresholding creates decision boundaries based on model uncertainty scores, routing predictions into...
Runtime Monitoring and Circuit Breakers
Algorithmic
Model Agnostic
Any
Runtime monitoring and circuit breakers establish continuous surveillance of AI/ML systems in production, tracking...
Model Cards
Documentation
Model Agnostic
Any
Model cards are standardised documentation frameworks that systematically document machine learning models through...
Model Distillation
Algorithmic
Neural Network
Any
Model distillation transfers knowledge from a large, complex model (teacher) to a smaller, more efficient model...
Model Pruning
Algorithmic
Neural Network
Any
Model pruning systematically removes less important weights, neurons, or entire layers from neural networks to create...
Neuron Activation Analysis
Algorithmic
Neural Network
LLM
Text
Neuron activation analysis examines the firing patterns of individual neurons in neural networks by probing them with...
Prompt Sensitivity Analysis
Experimental
LLM
Text
Prompt Sensitivity Analysis systematically evaluates how variations in input prompts affect large language model...
Causal Mediation Analysis in Language Models
Mechanistic Interpretability
LLM
Transformer
Text
Causal mediation analysis in language models is a mechanistic interpretability technique that systematically...
Feature Attribution with Integrated Gradients in NLP
Gradient Based
Transformer
LLM
Text
Applies Integrated Gradients to natural language processing models to attribute prediction importance to individual...
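
As a rough illustration of one entry above, confidence thresholding, the following minimal Python sketch routes a single prediction according to its top-class probability. The description in the table is truncated, so the three route names ("accept", "review", "reject"), the function name `route_prediction`, and the default threshold values are all assumptions made for this example, not details taken from the catalogue entry.

```python
from typing import List, Tuple


def route_prediction(
    probabilities: List[float],
    accept_at: float = 0.9,   # hypothetical threshold for automatic acceptance
    review_at: float = 0.6,   # hypothetical threshold for human review
) -> Tuple[int, str]:
    """Return (predicted class index, route) for one prediction.

    The route labels are illustrative assumptions:
      "accept" - confidence is high enough to act on automatically
      "review" - confidence is moderate, so defer to a human reviewer
      "reject" - confidence is too low to use the prediction
    """
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    if review_at > accept_at:
        raise ValueError("review_at must not exceed accept_at")

    # The predicted class is the one with the highest probability.
    top_class = max(range(len(probabilities)), key=probabilities.__getitem__)
    confidence = probabilities[top_class]

    if confidence >= accept_at:
        return top_class, "accept"
    if confidence >= review_at:
        return top_class, "review"
    return top_class, "reject"
```

In practice the thresholds would usually be tuned on held-out data, and the probabilities would need to be reasonably well calibrated for the routes to be meaningful.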