Data Poisoning Detection

Description

Data poisoning detection identifies malicious training data designed to compromise model behavior. This technique detects a range of poisoning attacks, including backdoor injection (triggers that cause specific behaviors), availability attacks (degrading overall performance), and targeted attacks (causing errors on specific inputs). Detection methods include statistical outlier analysis, influence function analysis to identify high-impact training points, and validation-based approaches that test for suspicious model behaviors indicative of poisoning.
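
As a concrete illustration of the statistical outlier analysis mentioned above, the following sketch flags training points that sit unusually far from their class centroid in feature space. It assumes each example has already been mapped to a fixed-length feature vector (for example, a model embedding), and the median-absolute-deviation threshold is an illustrative choice rather than a standard:

```python
# Minimal sketch of statistical outlier analysis for poison detection.
# Assumes `features` holds a fixed-length feature vector per training
# example and `labels` holds the class labels; the 3-MAD threshold is
# illustrative only.
import numpy as np

def flag_outliers(features: np.ndarray, labels: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Return a boolean mask marking training points suspected of being poisoned."""
    suspicious = np.zeros(len(features), dtype=bool)
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        centroid = features[idx].mean(axis=0)
        dists = np.linalg.norm(features[idx] - centroid, axis=1)
        median = np.median(dists)
        mad = np.median(np.abs(dists - median)) + 1e-12
        # Flag points whose distance from the class centroid exceeds the
        # class median by more than k median absolute deviations.
        suspicious[idx] = dists > median + k * mad
    return suspicious

# Example with random stand-in data: 500 points, 32-dim features, 5 classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 32))
labs = rng.integers(0, 5, size=500)
print(f"{flag_outliers(feats, labs).sum()} points flagged for manual review")
```

Flagged points are candidates for human review or removal rather than automatic deletion, which helps limit the false-positive cost noted in the limitations below.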

Example Use Cases

Safety

Scanning training data for an autonomous driving system to detect images that might contain backdoor triggers designed to cause unsafe behavior when specific objects appear in the environment.

Detecting poisoned training data in recidivism prediction models used for sentencing recommendations, where malicious actors might inject manipulated records to systematically bias predictions for specific demographic groups.

Security

Protecting a federated learning system for hospital patient diagnosis models from malicious participants who might inject poisoned medical records designed to create backdoors that misclassify specific patient profiles or degrade overall diagnostic reliability.

Reliability

Ensuring a content moderation model hasn't been poisoned with data designed to make it fail on specific types of harmful content, maintaining reliable safety filtering.

Scanning training data for algorithmic trading models to identify poisoned market data designed to create exploitable patterns, ensuring reliable trading decisions and preventing market manipulation vulnerabilities.

Limitations

  • Sophisticated poisoning attacks can be designed to evade statistical detection by closely mimicking benign data distributions.
  • High false positive rates may lead to incorrectly filtering legitimate but unusual training examples, reducing training data quality and diversity.
  • Influence function analysis and other deep-inspection methods are computationally expensive on very large training datasets; gradient-based detection over datasets with millions of samples can take hours to days.
  • Difficult to detect poisoning in scenarios where attackers have knowledge of detection methods and can adapt attacks accordingly.
  • Establishing ground truth for what constitutes 'poisoned' versus legitimate but unusual data is challenging, particularly when dealing with naturally occurring outliers or edge cases in the data distribution.

Resources

Research Papers

Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models
W. K. M Mithsara et al., Nov 4, 2025

The widespread integration of wearable sensing devices in Internet of Things (IoT) ecosystems, particularly in healthcare, smart homes, and industrial applications, has required robust human activity recognition (HAR) techniques to improve functionality and user experience. Although machine learning models have advanced HAR, they are increasingly susceptible to data poisoning attacks that compromise the data integrity and reliability of these systems. Conventional approaches to defending against such attacks often require extensive task-specific training with large, labeled datasets, which limits adaptability in dynamic IoT environments. This work proposes a novel framework that uses large language models (LLMs) to perform poisoning detection and sanitization in HAR systems, utilizing zero-shot, one-shot, and few-shot learning paradigms. Our approach incorporates "role play" prompting, whereby the LLM assumes the role of expert to contextualize and evaluate sensor anomalies, and "think step-by-step" reasoning, guiding the LLM to infer poisoning indicators in the raw sensor data and plausible clean alternatives. These strategies minimize reliance on curation of extensive datasets and enable robust, adaptable defense mechanisms in real-time. We perform an extensive evaluation of the framework, quantifying detection accuracy, sanitization quality, latency, and communication cost, thus demonstrating the practicality and effectiveness of LLMs in improving the security and reliability of wearable IoT systems.

SAFELOC: Overcoming Data Poisoning Attacks in Heterogeneous Federated Machine Learning for Indoor Localization
Akhil Singampalli, Danish Gufran, and Sudeep Pasricha, Nov 13, 2024

Machine learning (ML) based indoor localization solutions are critical for many emerging applications, yet their efficacy is often compromised by hardware/software variations across mobile devices (i.e., device heterogeneity) and the threat of ML data poisoning attacks. Conventional methods aimed at countering these challenges show limited resilience to the uncertainties created by these phenomena. In response, in this paper, we introduce SAFELOC, a novel framework that not only minimizes localization errors under these challenging conditions but also ensures model compactness for efficient mobile device deployment. Our framework targets a distributed and co-operative learning environment that uses federated learning (FL) to preserve user data privacy and assumes heterogeneous mobile devices carried by users (just like in most real-world scenarios). Within this heterogeneous FL context, SAFELOC introduces a novel fused neural network architecture that performs data poisoning detection and localization, with a low model footprint. Additionally, a dynamic saliency map-based aggregation strategy is designed to adapt based on the severity of the detected data poisoning scenario. Experimental evaluations demonstrate that SAFELOC achieves improvements of up to 5.9x in mean localization error, 7.8x in worst-case localization error, and a 2.1x reduction in model inference latency compared to state-of-the-art indoor localization frameworks, across diverse building floorplans, mobile devices, and ML data poisoning attack scenarios.

Understanding Influence Functions and Datamodels via Harmonic Analysis
Nikunj Saunshi et al., Oct 3, 2022

Influence functions estimate effect of individual data points on predictions of the model on test data and were adapted to deep learning in Koh and Liang [2017]. They have been used for detecting data poisoning, detecting helpful and harmful examples, influence of groups of datapoints, etc. Recently, Ilyas et al. [2022] introduced a linear regression method they termed datamodels to predict the effect of training points on outputs on test data. The current paper seeks to provide a better theoretical understanding of such interesting empirical phenomena. The primary tool is harmonic analysis and the idea of noise stability. Contributions include: (a) Exact characterization of the learnt datamodel in terms of Fourier coefficients. (b) An efficient method to estimate the residual error and quality of the optimum linear datamodel without having to train the datamodel. (c) New insights into when influences of groups of datapoints may or may not add up linearly.

Documentations

art.defences.detector.poison — Adversarial Robustness Toolbox ...
Adversarial-robustness-toolbox Developers
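
A minimal usage sketch based on this module: ART's ActivationDefence clusters hidden-layer activations per class and flags anomalous clusters as likely poison. The tiny model and random data below are stand-ins, and keyword arguments such as nb_clusters, nb_dims, and reduce follow the ART documentation but may differ across ART versions:

```python
# Sketch of activation-clustering poison detection with the Adversarial
# Robustness Toolbox (ART); requires ART and PyTorch. Data and model here
# are placeholders for the training set and classifier under inspection.
import numpy as np
import torch
import torch.nn as nn

from art.estimators.classification import PyTorchClassifier
from art.defences.detector.poison import ActivationDefence

# Tiny stand-in network; replace with the model trained on the suspect data.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
    input_shape=(1, 28, 28),
    nb_classes=10,
)

# Random stand-in data; in practice, pass the training set being screened.
x_train = np.random.rand(256, 1, 28, 28).astype(np.float32)
y_train = np.eye(10)[np.random.randint(0, 10, size=256)].astype(np.float32)
classifier.fit(x_train, y_train, nb_epochs=1, batch_size=64)

# Cluster per-class activations; points in anomalous clusters are marked
# as not clean in the returned list.
defence = ActivationDefence(classifier, x_train, y_train)
report, is_clean_list = defence.detect_poison(nb_clusters=2, nb_dims=10, reduce="PCA")
suspect = [i for i, clean in enumerate(is_clean_list) if clean == 0]
print(f"Flagged {len(suspect)} of {len(x_train)} training points as suspicious")
```
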
Introduction — trojai 0.2.22 documentation
Trojai Developers, Jan 1, 2021
