Tatarinho · Apr 22, 2026
diff --git a/README.md b/README.md
@@ -1,5 +1,10 @@
 # llm-safe-pl
 
+[![PyPI version](https://img.shields.io/pypi/v/llm-safe-pl.svg)](https://pypi.org/project/llm-safe-pl/)
+[![Python versions](https://img.shields.io/pypi/pyversions/llm-safe-pl.svg)](https://pypi.org/project/llm-safe-pl/)
+[![Tests](https://github.com/Tatarinho/llm-safe-pl/actions/workflows/tests.yml/badge.svg)](https://github.com/Tatarinho/llm-safe-pl/actions/workflows/tests.yml)
+[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+
 Reversible PII anonymization for Polish documents, designed for LLM workflows.
 
 > **Status: alpha (v0.1.0).** Core regex + checksum detection, anonymization, deanonymization, and the CLI are implemented and tested (280+ tests, ~99% coverage). The optional spaCy NER recognizer for PERSON / ORGANIZATION / LOCATION is scheduled for v0.1.1. See [CHANGELOG.md](CHANGELOG.md) and [Roadmap](#roadmap).

diff --git a/docs/limitations.md b/docs/limitations.md
@@ -85,6 +85,31 @@ The CLI accepts UTF-8 (with or without BOM) and UTF-16 (with BOM). BOM-less non-
 
 Detectors are whitespace-sensitive for the phone, IBAN, and credit card formats. A PESEL split across a line break (`44051401\n359`) is not detected.
 
+## Threat model
+
+`llm-safe-pl` protects against one specific scenario: **a Polish-language document leaves your process and reaches a third party (typically an LLM vendor) along with the identifiers it contains.** The library rewrites structured identifiers into tokens before egress and restores them locally after the response returns.
+
+### What it defends against
+
+- A passive LLM vendor (or anyone reading prompt/response logs) learning the raw value of a PESEL, NIP, REGON, IBAN, card number, ID card, passport, phone, or email from the prompt text.
+- The same leak via a log aggregator, a debug dump, or an accidental commit of a processed document. If you ran `Shield.anonymize` first, the dumped text contains tokens, not the original values.
+
+### What it does NOT defend against
+
+- **An attacker who has both the anonymized text and the `Mapping` file.** The Mapping is a lookup table from tokens back to PII. Treat it with the same sensitivity as the original data — don't commit it, don't send it to the vendor, don't log it.
+- **Inference from residual context.** Dates, employment history, relationships, medical descriptions, rare diagnoses, or any cluster of small facts can re-identify an individual even with every PESEL and NIP tokenized. Redaction is one layer; linkability analysis is another.
+- **PII types the library does not detect.** Names, organizations, and locations are not detected without the `[ner]` extra. Street addresses, landline phones written with parentheses, dates of birth, legacy bank account formats, and non-Polish identifiers are not detected either. See the rest of this document for the full list.
+- **Active adversaries inside your process.** If a compromised dependency or malicious import runs before `Shield.anonymize`, the raw document is already in memory.
+- **Side channels outside the prompt body.** Request metadata, IP address, timing, response-size-based inference, retained billing records.
+
+### Assumptions
+
+- The Mapping never leaves the process boundary that owns the original PII.
+- The caller has validated that the document classes they run through `Shield` fall inside the scope of the nine built-in detectors (plus NER if `[ner]` is installed).
+- The LLM vendor is a passive adversary — it may log, cache, or train on prompts, but is not specifically targeting your workflow.
+
+If any of those assumptions is wrong for your deployment, the library alone does not close the gap.
+
 ## Concurrency and thread safety
 
 Neither `Mapping` nor `Shield` is thread-safe. `Mapping.token_for` mutates a shared counter and two dicts without synchronization, so concurrent `anonymize()` calls on the same `Shield` can corrupt state, drop tokens, or produce duplicate tokens for distinct values.
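
Reviewer note on the thread-safety paragraph: one way a caller might work around the lack of synchronization is to serialize every `anonymize()` call behind a single lock. The sketch below is an assumption-laden illustration, not part of `llm-safe-pl`: `LockedAnonymizer` and `FakeShield` are hypothetical names, and `FakeShield` only imitates the "shared counter plus dict" shape described above so the example runs without the library installed. The wrapper relies on nothing beyond the wrapped object exposing an `anonymize(text)` method.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedAnonymizer:
    """Serialize calls to a non-thread-safe object's anonymize() method."""

    def __init__(self, shield):
        self._shield = shield
        self._lock = threading.Lock()

    def anonymize(self, text):
        # Only one thread at a time may touch the wrapped object's state.
        with self._lock:
            return self._shield.anonymize(text)


class FakeShield:
    """Hypothetical stand-in, NOT the llm-safe-pl API.

    It mimics an unsynchronized counter and dict so the demo is runnable.
    """

    def __init__(self):
        self.counter = 0
        self.tokens = {}

    def anonymize(self, text):
        # Check-then-act on shared state: unsafe without external locking.
        if text not in self.tokens:
            self.counter += 1
            self.tokens[text] = f"[TOKEN_{self.counter}]"
        return self.tokens[text]


def run_demo(n_values=200, n_threads=8):
    """Anonymize each value three times from a thread pool."""
    guarded = LockedAnonymizer(FakeShield())
    values = [f"value-{i}" for i in range(n_values)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map returns results in input order.
        results = list(pool.map(guarded.anonymize, values * 3))
    return values, results


if __name__ == "__main__":
    values, results = run_demo()
    print(len(set(results)), "distinct tokens for", len(values), "values")
```

A single lock trades throughput for correctness. Giving each thread its own instance avoids contention, but then tokens are only meaningful relative to that instance's own mapping.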