Safety
19 techniques
Ensuring AI systems operate safely and do not cause harm.
Technique | Category | Models | Data Types | Description
---|---|---|---|---
Out-of-Distribution Detector for Neural Networks | Algorithmic | Neural Network | Any | ODIN (Out-of-Distribution Detector for Neural Networks) identifies when a neural network encounters inputs significantly...
Synthetic Data Generation | Algorithmic | Model Agnostic | Any | Synthetic data generation creates artificial datasets that aim to preserve the statistical properties, distributions,...
Federated Learning | Algorithmic | Model Agnostic | Any | Federated learning enables collaborative model training across multiple distributed parties (devices, organisations, or...
Homomorphic Encryption | Algorithmic | Model Agnostic | Any | Homomorphic encryption allows computation on encrypted data without decrypting it first, producing encrypted results...
Deep Ensembles | Algorithmic | Neural Network | Any | Deep ensembles combine predictions from multiple neural networks trained independently with different random...
Safety Envelope Testing | Testing | Model Agnostic | Any | Safety envelope testing systematically evaluates AI system performance at the boundaries of its intended operational...
Internal Review Boards | Process | Model Agnostic | Any | Internal Review Boards (IRBs) provide independent, systematic evaluation of AI/ML projects throughout their lifecycle to...
Red Teaming | Procedural | Model Agnostic | Any | Red teaming involves systematic adversarial testing of AI/ML systems by dedicated specialists who attempt to identify...
Anomaly Detection | Algorithmic | Model Agnostic | Any | Anomaly detection identifies unusual behaviours, inputs, or outputs that deviate significantly from established normal...
Human-in-the-Loop Safeguards | Process | Model Agnostic | Any | Human-in-the-loop safeguards establish systematic checkpoints where human experts review, validate, or override AI/ML...
Confidence Thresholding | Algorithmic | Model Agnostic | Any | Confidence thresholding creates decision boundaries based on model uncertainty scores, routing predictions into...
Runtime Monitoring and Circuit Breakers | Algorithmic | Model Agnostic | Any | Runtime monitoring and circuit breakers establish continuous surveillance of AI/ML systems in production, tracking...
Model Cards | Documentation | Model Agnostic | Any | Model cards are standardised documentation frameworks that systematically document machine learning models through...
Model Distillation | Algorithmic | Neural Network | Any | Model distillation transfers knowledge from a large, complex model (teacher) to a smaller, more efficient model...
Model Pruning | Algorithmic | Neural Network | Any | Model pruning systematically removes less important weights, neurons, or entire layers from neural networks to create...
Neuron Activation Analysis | Algorithmic | Neural Network, LLM (+1 more) | Text | Neuron activation analysis examines the firing patterns of individual neurons in neural networks by probing them with...
Prompt Sensitivity Analysis | Experimental | LLM | Text | Prompt Sensitivity Analysis systematically evaluates how variations in input prompts affect large language model...
Causal Mediation Analysis in Language Models | Mechanistic Interpretability | LLM, Transformer | Text | Causal mediation analysis in language models is a mechanistic interpretability technique that systematically...
Feature Attribution with Integrated Gradients in NLP | Gradient Based | Transformer, LLM | Text | Applies Integrated Gradients to natural language processing models to attribute prediction importance to individual...
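
The Confidence Thresholding entry above describes routing predictions according to model uncertainty scores. As a rough illustration of that idea only, the sketch below assumes a single confidence score in [0, 1] and three hypothetical routes (`auto_accept`, `human_review`, `reject`). The threshold values, route names, and function names are all assumptions made for this example and do not come from the catalogue entry.

```python
# Hypothetical thresholds, chosen for illustration only. In practice they
# would be tuned on validation data for the specific application.
ACCEPT_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.60


def route_prediction(confidence: float) -> str:
    """Map a confidence score in [0, 1] to a hypothetical handling route."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= ACCEPT_THRESHOLD:
        return "auto_accept"    # confident enough to act on automatically
    if confidence >= REVIEW_THRESHOLD:
        return "human_review"   # uncertain: escalate to a person
    return "reject"             # too uncertain to use at all


def route_batch(confidences):
    """Route a sequence of confidence scores, preserving input order."""
    return [route_prediction(c) for c in confidences]
```

Sending the middle band to a reviewer rather than rejecting it is one way this technique can be paired with the Human-in-the-Loop Safeguards entry in the same table. Whether that pairing is appropriate depends on the deployment.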