Safety

19 techniques

Ensuring AI systems operate safely and do not cause harm.

Technique	Goals	Type	Models	Data Types	Description
Out-of-Distribution Detector for Neural Networks		Algorithmic	Neural Network	Any	ODIN (Out-of-Distribution Detector for Neural Networks) identifies when a neural network encounters inputs significantly...
Synthetic Data Generation		Algorithmic	Model Agnostic	Any	Synthetic data generation creates artificial datasets that aim to preserve the statistical properties, distributions,...
Federated Learning		Algorithmic	Model Agnostic	Any	Federated learning enables collaborative model training across multiple distributed parties (devices, organisations, or...
Homomorphic Encryption		Algorithmic	Model Agnostic	Any	Homomorphic encryption allows computation on encrypted data without decrypting it first, producing encrypted results...
Deep Ensembles		Algorithmic	Neural Network	Any	Deep ensembles combine predictions from multiple neural networks trained independently with different random...
Safety Envelope Testing		Testing	Model Agnostic	Any	Safety envelope testing systematically evaluates AI system performance at the boundaries of its intended operational...
Internal Review Boards		Process	Model Agnostic	Any	Internal Review Boards (IRBs) provide independent, systematic evaluation of AI/ML projects throughout their lifecycle to...
Red Teaming		Procedural	Model Agnostic	Any	Red teaming involves systematic adversarial testing of AI/ML systems by dedicated specialists who attempt to identify...
Anomaly Detection		Algorithmic	Model Agnostic	Any	Anomaly detection identifies unusual behaviours, inputs, or outputs that deviate significantly from established normal...
Human-in-the-Loop Safeguards		Process	Model Agnostic	Any	Human-in-the-loop safeguards establish systematic checkpoints where human experts review, validate, or override AI/ML...
Confidence Thresholding		Algorithmic	Model Agnostic	Any	Confidence thresholding creates decision boundaries based on model uncertainty scores, routing predictions into...
Runtime Monitoring and Circuit Breakers		Algorithmic	Model Agnostic	Any	Runtime monitoring and circuit breakers establish continuous surveillance of AI/ML systems in production, tracking...
Model Cards		Documentation	Model Agnostic	Any	Model cards are standardised documentation frameworks that systematically document machine learning models through...
Model Distillation		Algorithmic	Neural Network	Any	Model distillation transfers knowledge from a large, complex model (teacher) to a smaller, more efficient model...
Model Pruning		Algorithmic	Neural Network	Any	Model pruning systematically removes less important weights, neurons, or entire layers from neural networks to create...
Neuron Activation Analysis		Algorithmic	Neural Network LLM +1	Text	Neuron activation analysis examines the firing patterns of individual neurons in neural networks by probing them with...
Prompt Sensitivity Analysis		Experimental	LLM	Text	Prompt Sensitivity Analysis systematically evaluates how variations in input prompts affect large language model...
Causal Mediation Analysis in Language Models		Mechanistic Interpretability	LLM Transformer	Text	Causal mediation analysis in language models is a mechanistic interpretability technique that systematically...
Feature Attribution with Integrated Gradients in NLP		Gradient Based	Transformer LLM	Text	Applies Integrated Gradients to natural language processing models to attribute prediction importance to individual...
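
As a rough illustration of one entry in the table, confidence thresholding can be sketched in a few lines. This is only a minimal sketch under stated assumptions: the function name `route_prediction`, the 0.8 threshold, and the two route labels ("accept" and "human_review") are hypothetical and do not come from the catalogue, and the top class probability stands in for whatever confidence score a real system would use.

```python
from typing import Sequence, Tuple


def route_prediction(
    probabilities: Sequence[float], threshold: float = 0.8
) -> Tuple[int, str]:
    """Pick the most likely class and decide how to route the prediction.

    Assumption: confidence is the highest class probability, and
    predictions below `threshold` are sent for human review.
    """
    if not probabilities:
        raise ValueError("probabilities must not be empty")

    # Index of the class with the highest probability.
    predicted = max(range(len(probabilities)), key=lambda i: probabilities[i])
    confidence = probabilities[predicted]

    # Hypothetical route labels; a real deployment may use more bands.
    route = "accept" if confidence >= threshold else "human_review"
    return predicted, route


if __name__ == "__main__":
    print(route_prediction([0.05, 0.90, 0.05]))
    print(route_prediction([0.40, 0.35, 0.25]))
```

In practice the threshold would be tuned on held-out data, and the raw top probability is often poorly calibrated, so a calibrated or ensemble-based score may be a better input.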
