LLM02:2025 Sensitive Information Disclosure - Simple Explanation

Sensitive information disclosure is when an LLM reveals data it should keep private. That data could be personal information, business secrets, passwords, source code, or internal instructions.

Imagine a new employee who can see customer records, HR files, and internal memos. If that employee accidentally reads a customer's credit card number to the wrong person, it is still a data leak. An LLM can do the same thing through its answers.

Why it happens

Sensitive data reaches an LLM in a few common ways. It may be present in training data. It may arrive through live context from a database or retrieval system. It may also be typed directly by a user who pastes private material into chat.

01. Training data: Private documents, emails, or code can be memorized during training.
02. Live context: A RAG system or database can hand sensitive records to the model.
03. User input: A user can paste contracts, source code, or private notes into chat.

Once sensitive data is in the model's memorized knowledge or its context, a clever prompt can pull it back out. Sometimes no attacker is needed. A normal user can ask an innocent question and receive data they should never see.
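The live-context path is the easiest one to show in code. Below is a minimal sketch, assuming a retrieval step that returns documents tagged with the roles allowed to read them. The `Document` class, the role names, and the sample records are all hypothetical and not taken from any particular RAG library. The idea is only that permission checks run before the text reaches the model, so the model never holds data the requesting user cannot see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # roles permitted to read this document


def filter_for_user(docs, user_roles):
    """Keep only documents that share at least one role with the user."""
    roles = set(user_roles)
    return [d for d in docs if d.allowed_roles & roles]


def build_context(docs, user_roles):
    """Join the permitted documents into the context string sent to the model."""
    permitted = filter_for_user(docs, user_roles)
    return "\n\n".join(d.text for d in permitted)


# Hypothetical retrieval results for a customer's question.
retrieved = [
    Document("faq-1", "Refunds take 5 business days.",
             frozenset({"support", "customer"})),
    Document("hr-7", "Salary band for level 4 is confidential.",
             frozenset({"hr"})),
]

context = build_context(retrieved, user_roles=["customer"])
print(context)
```

Filtering before prompt assembly matters here: asking the model to withhold a record it has already been given is much weaker than never giving it the record.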

Data at risk

01. PII: Names, addresses, phone numbers, and medical records.
02. Financial details: Account numbers and transaction data.
03. Health records: Diagnoses, prescriptions, and care history.
04. Business secrets: Internal plans, product details, and customer lists.
05. Credentials: API keys, passwords, and tokens.
06. Proprietary methods: Algorithms, training methods, and model internals.
07. Legal documents: Contracts, settlements, and privileged material.

Common vulnerabilities

PII leakage

The model reveals one user's personal data to another user.

Proprietary algorithm exposure

The output reveals model behavior or training data that helps attackers clone or invert the model.

Business data disclosure

Confidential business information appears in a response.

Attack scenarios

01. Unintentional exposure: A normal question returns another user's data because sanitization failed.
02. Targeted prompt injection: An attacker writes prompts that bypass filters and extract confidential data.
03. Training data leak: Sensitive data included in training appears later in an answer.
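All three scenarios end with sensitive text in a response, so one backstop is to scan the output before it is returned. The sketch below is illustrative only: the regular expressions are deliberately simple, the `sk-` key format is a made-up example, and `scan_output` is a hypothetical helper, not part of any library. A production system would need broader and better-tested detectors.

```python
import re

# Illustrative patterns only; they will miss many real-world formats.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    # Hypothetical key format: "sk-" followed by 20 or more alphanumerics.
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}


def scan_output(text):
    """Redact matches in a model response and report what was found."""
    findings = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            findings.append((label, count))
    return text, findings


response = "Card 4111 1111 1111 1111 is on file for alice@example.com."
safe, findings = scan_output(response)
print(safe)
print(findings)
```

Returning the findings alongside the redacted text lets the caller log the event or block the response entirely instead of silently masking it.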

Real warning stories

Samsung is the easy warning story. Engineers reportedly pasted proprietary source code and internal notes into ChatGPT while trying to solve work problems. The lesson is simple. Private code should not enter tools that company policy does not control.

Researchers also showed that asking ChatGPT to repeat a word forever could make it output memorized training data. WIRED reported that this included names, email addresses, and phone numbers. OpenAI later appeared to block at least some versions of that behavior.

How to defend against it

01. Sanitization: Scrub training data and validate user input before it reaches the model.
02. Access controls: Use least privilege and restrict which data sources the model can read.
03. Privacy techniques: Use federated learning or differential privacy where the system needs it.
04. User education: Teach users not to paste private data into prompts.
05. Transparency: Explain retention and training rules, and offer an opt-out where possible.
06. Secure configuration: Hide system prompts and avoid leaks through errors or misconfigured APIs.
07. Advanced controls: Use tokenization, redaction, or encryption for high-value data.
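To make the tokenization item concrete, here is a minimal sketch that swaps email addresses for placeholders before a prompt is sent and restores them afterward. "Tokenization" here means data tokenization (replacing a value with a stand-in), not the way a model splits text into tokens. The `PlaceholderVault` class and the `<EMAIL_n>` placeholder format are assumptions made for illustration, and the single email pattern stands in for whatever detectors a real system would use.

```python
import re

# Simple illustrative pattern; real systems need stronger detection.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class PlaceholderVault:
    """Swap sensitive values for placeholders and restore them later."""

    def __init__(self):
        self._by_token = {}  # placeholder -> original value
        self._by_value = {}  # original value -> placeholder

    def tokenize(self, text):
        """Replace each email with a stable placeholder such as <EMAIL_1>."""
        def swap(match):
            value = match.group(0)
            if value not in self._by_value:
                token = f"<EMAIL_{len(self._by_token) + 1}>"
                self._by_token[token] = value
                self._by_value[value] = token
            return self._by_value[value]
        return EMAIL.sub(swap, text)

    def detokenize(self, text):
        """Put the original values back into text that contains placeholders."""
        for token, value in self._by_token.items():
            text = text.replace(token, value)
        return text


vault = PlaceholderVault()
prompt = "Email alice@example.com and bob@example.com, then alice@example.com again"
safe_prompt = vault.tokenize(prompt)
print(safe_prompt)                    # what the model would see
print(vault.detokenize(safe_prompt))  # restored on the trusted side
```

The mapping stays on the trusted side of the boundary, so the model only ever sees placeholders. Reusing the same placeholder for a repeated value keeps the prompt coherent for the model.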

Why this risk is number two

The damage is concrete and the attack surface is large. Every company that connects an LLM to customer records, internal documents, or codebases creates a new path for data to escape.

Prompt injection is often the technique. Sensitive information disclosure is often the result. A leak can also trigger legal and compliance obligations under GDPR, HIPAA, PCI DSS, and CCPA.

Related frameworks

OWASP maps this risk to MITRE ATLAS techniques for inferring training data membership, inverting a model, and extracting a model.

One sentence: Sensitive information disclosure is when an LLM outputs data it was never supposed to share, whether that data came from training, connected systems, or user prompts.

Copyright and source notes

No third-party images are embedded in this post. The diagrams above are original HTML/CSS illustrations made for promptexploit. The factual risk description and mitigation categories are based on the official OWASP LLM02 page.