LLM04:2025 Data and Model Poisoning - Simple Explanation
Data and model poisoning is when an attacker corrupts the data or model behind an AI system. The goal is to add bias, backdoors, or hidden behavior that appears later in production.
Prompt injection happens at runtime. Poisoning happens earlier. It changes what the system learns or what the retrieval system trusts. That makes the problem harder to see because the model can look normal until the right trigger appears.
[Diagram: trace llm04.poisoning]
Why it happens
LLMs depend on huge data pipelines. Pre-training data can come from the open web. Fine-tuning data can come from vendors or users. RAG systems can index shared drives, wikis, and public sources. If those sources are not controlled, poisoned content can enter the system.
Models from public hubs can also be poisoned before a team downloads them. That is where LLM04 overlaps with LLM03 supply chain. LLM03 asks how the bad artifact entered your stack. LLM04 asks what the poisoned data or poisoned model does after it gets there.
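One simple control at this boundary is to pin the digest of a model file when it is reviewed and refuse to load anything that does not match. The sketch below is a minimal illustration, not a complete supply chain control. The function names `sha256_of_file` and `verify_artifact` are hypothetical, and it assumes the team records a SHA-256 digest somewhere trustworthy at review time.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large weight files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, pinned_sha256: str) -> bool:
    """Return True only if the file matches the digest pinned at review time.

    Assumption: pinned_sha256 was recorded when the artifact was reviewed
    and is stored separately from the download location.
    """
    return sha256_of_file(path) == pinned_sha256.strip().lower()
```

A matching digest only shows that the file is the one that was reviewed. It does not show that the reviewed model was clean, so behavioral testing is still needed.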
Where poisoning enters
Common attack patterns
Backdoor example
An attacker adds training examples where a rare phrase appears next to a malicious answer. The model behaves normally most of the time. When that phrase appears in production, the model follows the hidden pattern.
[Diagram: inspect backdoor-trigger]
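The pattern above leaves a statistical trace in the training set: a rare token that almost always appears next to the same answer. The sketch below is a toy heuristic for spotting that trace in prompt and response pairs. The function name `suspicious_tokens` and the thresholds are assumptions chosen for illustration. Real backdoors can use triggers that this kind of whitespace token count would miss.

```python
from collections import Counter, defaultdict


def suspicious_tokens(examples, min_count=3, max_share=0.05, purity=0.9):
    """Flag rare tokens whose examples almost always share one response.

    examples: list of (prompt, response) string pairs.
    min_count: ignore tokens seen in fewer examples than this.
    max_share: ignore tokens seen in more than this fraction of examples.
    purity: minimum fraction of a token's examples with the same response.
    """
    token_responses = defaultdict(Counter)
    for prompt, response in examples:
        # Count each token once per example, however often it repeats.
        for token in set(prompt.lower().split()):
            token_responses[token][response] += 1

    total = len(examples)
    flagged = []
    for token, responses in token_responses.items():
        count = sum(responses.values())
        if count < min_count or count / total > max_share:
            continue  # too rare to judge, or too common to be a trigger
        top_response, top_count = responses.most_common(1)[0]
        if top_count / count >= purity:
            flagged.append((token, top_response, count))
    return sorted(flagged)
```

Each flagged entry is a token, the response it keeps appearing with, and the number of examples involved. A flag is a reason to review those examples by hand, not proof of poisoning.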
Split-view and frontrunning
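Attackers can target web-scale datasets by controlling what a crawler or downloader sees. In split-view poisoning, the content served at a URL when the dataset was curated differs from what later downloaders fetch, so the page looks clean at one time and poisoned at another. In frontrunning poisoning, the attacker times the poisoned content to land just before a scheduled dataset collection, so that it is captured before anyone can revert it.
This is why expired domains and publicly editable pages matter. If a dataset snapshot trusts a source, an attacker may try to take control of that source before the next collection pass.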
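For datasets that are distributed as a list of URLs, one way to blunt split-view poisoning is to record a content digest for each item at curation time and drop anything that no longer matches at download time. The sketch below assumes such an index exists. The names `content_digest` and `filter_unchanged` are hypothetical.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """SHA-256 of the raw bytes fetched from a URL."""
    return hashlib.sha256(body).hexdigest()


def filter_unchanged(index, fetched):
    """Keep only items whose content still matches the curation-time digest.

    index: dict of url -> digest recorded when the dataset was curated.
    fetched: dict of url -> bytes downloaded now.
    Returns (kept, rejected): a dict of url -> bytes, and a list of urls
    that changed or could not be fetched.
    """
    kept, rejected = {}, []
    for url, expected in index.items():
        body = fetched.get(url)
        if body is not None and content_digest(body) == expected:
            kept[url] = body
        else:
            rejected.append(url)
    return kept, rejected
```

This check only covers content that changed after curation. It does not help against frontrunning, because in that case the poisoned content is already present when the snapshot and its digests are taken.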
RAG document poisoning
RAG makes poisoning easier to understand. If the knowledge base says a fake policy is real, the model may repeat it with confidence. The poison never had to be trained into the model's weights. The model only needed to retrieve poisoned context.
[Diagram: map rag-poison.path]
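Because the poison travels through retrieval, one place to stop it is between the retriever and the prompt. The sketch below filters retrieved chunks by provenance before they reach the model. Everything in it is an assumption for illustration: the `Chunk` fields, the `reviewed` flag, and the source names in `TRUSTED_SOURCES` are hypothetical and would depend on how documents are labeled at ingestion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str     # hypothetical provenance label attached at ingestion
    reviewed: bool  # hypothetical flag set by a review step


# Hypothetical allowlist of sources that are permitted to state policy.
TRUSTED_SOURCES = frozenset({"hr-policy-repo", "legal-wiki"})


def trusted_context(chunks, trusted_sources=TRUSTED_SOURCES):
    """Return only chunks from allowlisted sources that passed review."""
    return [c for c in chunks if c.source in trusted_sources and c.reviewed]


retrieved = [
    Chunk("Refunds are issued within 30 days.", "hr-policy-repo", True),
    Chunk("Refunds are never issued.", "public-upload", True),
    Chunk("Draft policy, not approved.", "legal-wiki", False),
]
context = trusted_context(retrieved)
```

Here only the first chunk survives. A filter like this narrows who can plant content, but it depends on the provenance labels being trustworthy. If an attacker can write to an allowlisted source, the poisoned text still gets through.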
How to defend against it
Legal and compliance risk
Poisoning can create real compliance problems. A poisoned model may produce discriminatory or unsafe outputs. A poisoned dataset can weaken data quality and accuracy. Under the EU AI Act, GDPR, and sector rules, this can become more than a technical bug.
For financial and healthcare systems the issue is even sharper, because model integrity and validation are part of the duty to ship safely.
Framework mapping
OWASP maps this risk to MITRE ATLAS techniques covering poisoned training data, backdoored models, published poisoned datasets, and supply chain compromise. NIST AI 100-2 also treats poisoning as a primary adversarial machine learning attack category.
- AML.T0020 - Poison Training Data
- AML.T0018 - Backdoor ML Model
- AML.T0019 - Publish Poisoned Datasets
- AML.T0010 - ML Supply Chain Compromise
One sentence: LLM04 is the risk that bad data or a tampered model changes what the AI learns or trusts before the user ever asks a question.
Copyright and source notes
No third-party images are embedded in this post. The diagrams above are original HTML/CSS illustrations made for promptexploit. The factual risk description and mitigation categories are based on the official OWASP LLM04 page.
- Official OWASP LLM04 page: genai.owasp.org/llmrisk/llm042025-data-and-model-poisoning
- NIST AI 100-2 adversarial machine learning report: nist.gov/node/1878291