Status: active reference doc. Last reviewed: 2026-05-09 against
src/journal/services/auth.py, src/journal/auth.py, src/journal/auth_api/, and prod ground
truth on media. Pair with security-roadmap.md for outstanding work
and auth.md for the auth flow walkthrough.
This document describes the security posture of journal-server: what it protects, what it does not, and how to deploy it safely. The server supports multiple users with per-user data isolation. The intended deployment is a single VM on a home LAN, fronted by a Cloudflare Tunnel — not an internet-exposed multi-tenant service.
The primary asset is the journal entry content: handwritten pages (via OCR), voice notes (via transcription), and any extracted text, chunks, mood scores, or derived data stored in SQLite and ChromaDB. This is personal, private, and sensitive.
Realistic adversaries:
- An attacker on the same network as the VM running the server (home LAN, shared Wi-Fi, compromised other device on the LAN).
- A malicious webpage open in the user's browser making cross-origin fetch() calls to the MCP endpoint via DNS rebinding.
- A stolen or lost VM disk image / backup tape containing the SQLite database and ChromaDB data directory.
- Another authenticated user accessing data belonging to a different user through the API.
Explicitly out of scope:
- Nation-state adversaries.
- Provider-side data retention at Anthropic / OpenAI (addressed by policy, not code — see "Provider data retention" below).
- Side-channel attacks on the host.
- Prompt injection via OCR content (a handwritten page that tries to manipulate downstream LLM analysis — worth knowing about but not something this server can mitigate).
Two authentication mechanisms, checked in order by the middleware:
- Cookie sessions — used by the web frontend (httpOnly, Secure, SameSite=Lax, 7-day expiry)
- API keys (bearer tokens) — used by MCP clients and external API consumers (format:
jnl_<random>)
See auth.md for the full authentication architecture (registration, login, password reset, account lockout).
Every read, update, and delete query filters by user_id. A user can never see, modify, or delete another user's data
through any API endpoint or MCP tool. Isolation is enforced at the repository layer (SQL WHERE user_id = ?) and the
vector store layer (ChromaDB where filter on user_id metadata).
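As a minimal sketch of what repository-layer scoping looks like, the snippet below filters a lookup by both the row id and the owner. The `entries` table, its columns, and the `get_entry` helper are hypothetical names chosen for illustration; they are not taken from the journal-server source.

```python
import sqlite3
from typing import Optional, Tuple


def get_entry(conn: sqlite3.Connection, user_id: int, entry_id: int) -> Optional[Tuple[int, str]]:
    """Fetch one entry scoped to its owner (hypothetical helper).

    Returns None when the row does not exist OR belongs to another user,
    so callers cannot distinguish the two cases.
    """
    return conn.execute(
        "SELECT id, body FROM entries WHERE id = ? AND user_id = ?",
        (entry_id, user_id),
    ).fetchone()


# Illustrative in-memory schema; the real schema is an assumption here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT)"
)
conn.executemany(
    "INSERT INTO entries (id, user_id, body) VALUES (?, ?, ?)",
    [(1, 10, "alice's page"), (2, 20, "bob's page")],
)
```

With this data, user 10 can read entry 1, while a request by user 10 for entry 2 comes back empty rather than returning the other user's row. The vector store applies the same idea through a metadata filter on user_id.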
Admin users (is_admin=true) can see all jobs via the jobs API, but cannot access other users' entries or entities.
Admins can also trigger live reloads of file-backed config — OCR glossary, transcription context, and mood dimensions
— via POST /api/admin/reload/{ocr-context,transcription-context,mood-dimensions} (see docs/configuration.md).
Non-admin sessions get 403; unauthenticated requests get 401.
- Session tokens are SHA-256 hashed before storage in user_sessions.id. The raw token is returned once (in the Set-Cookie header) and never stored. If the SQLite file is exposed, an attacker cannot impersonate active users.
- API keys follow the same pattern: SHA-256 hashed before storage in api_keys.key_hash. Only the 12-char prefix is stored for display purposes.
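The hash-before-store pattern can be sketched as follows. The function names, the 32-byte token length, and the choice to count the `jnl_` marker as part of the 12-char display prefix are assumptions made for this example; only the `jnl_` format, SHA-256, and the 12-char prefix come from the text above.

```python
import hashlib
import secrets
from typing import Tuple

KEY_PREFIX = "jnl_"
DISPLAY_PREFIX_LEN = 12  # assumption: the displayed prefix includes "jnl_"


def new_api_key() -> Tuple[str, str, str]:
    """Return (raw_key, key_hash, display_prefix).

    The raw key is shown to the user once; only the hash and the short
    display prefix would be persisted.
    """
    raw = KEY_PREFIX + secrets.token_urlsafe(32)  # token length is an assumption
    key_hash = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return raw, key_hash, raw[:DISPLAY_PREFIX_LEN]


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored hash in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return secrets.compare_digest(candidate, stored_hash)
```

A plain SHA-256 is reasonable here because the tokens are long random strings rather than human-chosen passwords, so there is no small search space for an attacker to brute-force.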
After 5 consecutive failed login attempts, the account is locked for 15 minutes. The counter resets on successful login. Lockout is enforced at the repository layer (thread-safe).
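The lockout rule can be sketched as a small state machine. This is an in-memory illustration under stated assumptions: `LoginState` and the three helpers are hypothetical names, and the persistence and locking that the repository layer provides are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_FAILURES = 5                  # from the text: 5 consecutive failures
LOCKOUT = timedelta(minutes=15)   # from the text: 15-minute lock


@dataclass
class LoginState:
    """Per-account counters (hypothetical shape, not the real schema)."""
    failed_attempts: int = 0
    locked_until: Optional[datetime] = None


def is_locked(state: LoginState, now: datetime) -> bool:
    """True while the lock window has not yet elapsed."""
    return state.locked_until is not None and now < state.locked_until


def record_failure(state: LoginState, now: datetime) -> None:
    """Count a failed attempt and start the lock window at the threshold."""
    state.failed_attempts += 1
    if state.failed_attempts >= MAX_FAILURES:
        state.locked_until = now + LOCKOUT


def record_success(state: LoginState) -> None:
    """A successful login resets the counter and clears any lock."""
    state.failed_attempts = 0
    state.locked_until = None
```

In a real repository the increment and the threshold check would need to happen atomically (for example inside one SQL transaction), which is what the "thread-safe" note above refers to.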
The repo's compose.yml binds the host-side of the journal port to 127.0.0.1:8400, not 0.0.0.0:8400, so when run
with the in-repo compose the port is only reachable from the VM itself. External access goes through a Cloudflare
Tunnel that fronts the webapp on :8402, which proxies /api/* to journal-server over the compose-internal
network — journal-server's host port doesn't need LAN exposure for normal use.
Drift note (2026-05-09): the production compose at /srv/media/docker-compose.yml on media currently exposes 8400:8400 (LAN-reachable, not loopback). Public access still goes via the Cloudflare Tunnel on :8402, but the defence-in-depth posture documented here only holds when the prod compose is brought back in line with the in-repo file. Corroborated by MCP_ALLOWED_HOSTS=192.168.2.105:*,localhost:* in prod env (the LAN host would not need to be in the allowlist if the bind were loopback-only). Tracked as security-roadmap.md Tier 2 item 2b.
The expected stance for any LAN-exposed deployment: a reverse proxy (Caddy, Traefik, nginx) on the same host that
(1) terminates TLS with a real certificate, (2) optionally adds a second layer of auth, and (3) forwards decrypted
traffic to 127.0.0.1:8400 locally.
This is belt-and-braces with the per-user API keys / sessions: losing either (key/session leak OR proxy misconfiguration) still leaves the other as a defence.
The journal-chromadb service has no ports: publish in docker-compose.yml. ChromaDB has no authentication of its own
and stores chunk text in cleartext alongside the vectors, so publishing its port would create a second unauthenticated
exfiltration path. The journal service reaches it via compose-internal DNS (journal-chromadb:8000) which is only
resolvable inside the compose network.
The MCP transport security layer is configured with enable_dns_rebinding_protection=True and MCP_ALLOWED_HOSTS set to
loopback by default. This blocks a malicious webpage in the user's browser from tricking the server into trusting a
rebound DNS name as if it were loopback. Add any externally-facing hostname to MCP_ALLOWED_HOSTS if you put the service
behind a reverse proxy.
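To illustrate what a Host allowlist check does, here is a rough sketch of matching a request's Host header against a comma-separated list with `host:*` port wildcards, the shape used by the MCP_ALLOWED_HOSTS values in this document. Both functions are hypothetical; this is not the MCP SDK's implementation, and the default value in the usage note is an assumption.

```python
from typing import List, Optional


def parse_allowed_hosts(raw: str) -> List[str]:
    """Split a comma-separated allowlist, dropping blanks and whitespace."""
    return [item.strip().lower() for item in raw.split(",") if item.strip()]


def host_allowed(host_header: Optional[str], allowed: List[str]) -> bool:
    """Return True only if the Host header matches an allowlist entry.

    An entry ending in ":*" matches that hostname on any port; any other
    entry must match the header exactly. A missing header is rejected.
    """
    if not host_header:
        return False
    host = host_header.strip().lower()
    for pattern in allowed:
        if pattern.endswith(":*"):
            base = pattern[:-2]
            if host.startswith(base + ":"):
                return True
        elif host == pattern:
            return True
    return False
```

For example, with an allowlist of `127.0.0.1:*,localhost:*`, a request carrying `Host: localhost:8400` passes, while one carrying a rebound name such as `evil.example:8400` is refused even though it resolves to the same socket.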
journal_ingest_media_from_url and journal_ingest_multi_page_from_url accept arbitrary URLs as parameters. Before any socket
is opened, the URL-source helper calls _validate_public_url (defined in
src/journal/services/ingestion/url_sources.py), which:
- Rejects any scheme that isn't http or https (blocks file://, gopher://, ftp://, etc.).
- Resolves the hostname via DNS.
- Refuses the request if ANY resolved address is loopback (127.0.0.0/8, ::1), private (RFC1918 / RFC4193), link-local (169.254.0.0/16, which includes cloud metadata endpoints like 169.254.169.254), multicast, reserved, or unspecified.
This does not defend against DNS rebinding between the resolution check and the socket connection — an attacker who controls authoritative DNS for a public-looking domain could return a public IP to the check and a private IP to the actual connect. Closing that gap requires patching the connection pathway to pin the resolved IP, which is out of scope for this tool. Loopback and RFC1918 are the realistic threat surface for a personal server and they are closed.
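The three checks listed above can be approximated with the standard library alone. This is a sketch under assumptions, not the body of _validate_public_url: the names `UnsafeURLError`, `is_public_address`, and `validate_public_url_sketch` are invented for the example, and like the real check it does not pin the resolved IP for the later connection.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


class UnsafeURLError(ValueError):
    """Raised when a URL must not be fetched (hypothetical exception type)."""


def is_public_address(addr: str) -> bool:
    """True if addr falls in none of the blocked ranges."""
    ip = ipaddress.ip_address(addr.split("%", 1)[0])  # drop any IPv6 zone id
    return not (
        ip.is_loopback or ip.is_private or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def validate_public_url_sketch(url: str) -> None:
    """Raise UnsafeURLError unless every resolved address is public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise UnsafeURLError(f"scheme not allowed: {parts.scheme!r}")
    if not parts.hostname:
        raise UnsafeURLError("URL has no hostname")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise UnsafeURLError(f"cannot resolve {parts.hostname!r}") from exc
    for info in infos:
        addr = info[4][0]  # sockaddr tuple: (address, port, ...)
        if not is_public_address(addr):
            raise UnsafeURLError(f"{parts.hostname!r} resolves to non-public {addr}")
```

Checking every resolved address, rather than only the first, matters because a hostname can return a mix of public and private records and the HTTP client may pick any of them.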
API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY, SLACK_BOT_TOKEN) and the server signing secret
(JOURNAL_SECRET_KEY) are loaded from environment variables only — never from command-line args, never from source
files. .env is in .gitignore. Logging is intentionally free of secrets, prompts, LLM responses, and entry
content — only counts and IDs are logged.
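A fail-closed startup check for the signing secret might look like the sketch below. `load_secret_key` and `MissingSecretError` are hypothetical names; the only details taken from this document are that JOURNAL_SECRET_KEY comes from the environment and that secret values should never reach logs or error messages.

```python
import os
from typing import Mapping


class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret is absent (hypothetical)."""


def load_secret_key(env: Mapping[str, str] = os.environ) -> str:
    """Read JOURNAL_SECRET_KEY from the environment or refuse to start.

    The error message names the variable but never echoes its value.
    """
    value = env.get("JOURNAL_SECRET_KEY", "").strip()
    if not value:
        raise MissingSecretError("JOURNAL_SECRET_KEY is not set; refusing to start")
    return value
```

Failing at startup is preferable to falling back to a default key, since a predictable signing secret would let anyone forge signed tokens.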
The SQLite database file is loaded from DB_PATH with default filesystem permissions (0644 on most Unixes), which
means any other local user on the host can read your journal. For a single-user workstation this is usually fine; on a
multi-user host, chmod the file and its parent directory to 0600 / 0700:
chmod 600 journal.db
chmod 700 /path/to/journal/data
In the Docker deployment the bind mount /srv/media/config/journal/ inherits the host's permissions; make sure that
directory is owned by the user running Docker and is not world-readable.
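If you prefer to apply and verify these permissions from a script, the following sketch does the same thing as the chmod commands. `tighten_permissions` and `is_private` are hypothetical helpers, and the code assumes a POSIX host.

```python
import os
import stat
from pathlib import Path


def tighten_permissions(db_path: Path) -> None:
    """Restrict the database file to 0600 and its parent directory to 0700."""
    os.chmod(db_path.parent, 0o700)
    os.chmod(db_path, 0o600)


def is_private(path: Path) -> bool:
    """True if neither group nor other has any permission bit set on path."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode & 0o077 == 0
```

A deployment script could call `is_private` on the data directory and abort when it returns False, which catches a bind mount that was recreated with default permissions.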
Journal content is sent to three third-party providers:
- Anthropic: handwritten page images for OCR via Claude Opus 4.6 (when OCR_PROVIDER=anthropic or as the second pass under OCR_DUAL_PASS=true); entity extraction (Opus); mood scoring (Sonnet 4.5); search reranking (Haiku 4.5).
- Google (Gemini): handwritten page images for OCR via Gemini 2.5 Pro (when OCR_PROVIDER=gemini, the prod default); voice audio for transcription via Gemini 2.5 Pro (when TRANSCRIPTION_PROVIDER=gemini or as a parallel shadow via TRANSCRIPTION_SHADOW_PROVIDER=gemini).
- OpenAI: voice audio for transcription (default gpt-4o-transcribe, fallback whisper-1), and chunk text for text-embedding-3-large embeddings.
Anthropic and OpenAI retain API inputs and outputs for approximately 30 days by default as part of abuse monitoring, and neither uses API data for model training on the commercial tier (OpenAI stopped in March 2023; Anthropic's commercial API terms exclude training). Google's Gemini API retention is governed by the paid Vertex AI / Gemini API terms — abuse-monitoring retention applies, and customer data is not used to improve Google models on the paid tier; specifics are in Google's data-handling docs. If your threat model includes the providers themselves, apply for Zero Data Retention (ZDR) with all three (where available) and use the organisation-level ZDR header from then on.
This is a policy decision, not a code change. The journal-server does not currently send any ZDR headers.
There is no built-in backup mechanism. If the bind mount /srv/media/config/journal/data is lost, the entire journal is
gone. Recommended: add an external host-level backup of that directory, encrypted at rest. A simple pattern is a nightly
cron that runs sqlite3 journal.db ".backup '/backup/journal-$(date +%F).db'", encrypts the resulting file with age or
gpg, and keeps only the encrypted copy on the backup volume.
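The snapshot step can also be done from Python with the standard library's online backup API, which, like the sqlite3 CLI's .backup command, produces a consistent copy even while the server holds the database open. `snapshot_database` and the staging directory are hypothetical; encrypting the resulting file with age or gpg is left as a separate step.

```python
import sqlite3
from datetime import date
from pathlib import Path


def snapshot_database(db_path: Path, staging_dir: Path) -> Path:
    """Copy db_path into staging_dir as journal-YYYY-MM-DD.db and return the new path.

    Uses sqlite3.Connection.backup, so the copy is consistent even if
    another connection is writing to the source at the same time.
    """
    target = staging_dir / f"journal-{date.today().isoformat()}.db"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target
```

Copying the file with cp while the server is running risks capturing a half-written page, which is why both this sketch and the cron pattern go through SQLite's own backup mechanism.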
Note: ChromaDB does not need separate backup. Vectors are recomputed on re-ingest from the source files in the bind
mount, so the SQLite database plus the bind-mount media files are the full restore set. Backup integrity verification
and a documented restore drill are tracked in security-roadmap.md Tier 3 item 15.
For the sake of informed consent:
- TLS is not terminated by the server itself. Run a reverse proxy (see above) in any deployment where the network between client and server is not trusted.
- Storage at rest is not encrypted. If you need encryption at rest, put the bind-mount parent directory on an encrypted volume (APFS encrypted, LUKS, ZFS native encryption) or migrate the SQLite file to SQLCipher.
- Backup encryption is the user's responsibility — see above.
- App-level rate limiting on auth endpoints is not yet implemented (Traefik handles network-level rate limiting). See security-roadmap.md Tier 2 for planned improvements.
- Password complexity is minimal (8–1024 chars, no other constraints). See security roadmap Tier 2.
- Password reset tokens are stateless (itsdangerous) and can be reused within the 30-minute window. See security roadmap Tier 2.
- src/journal/auth.py: SessionOrKeyBackend, RequireAuthMiddleware, contextvar propagation for MCP tools, set_session_cookie / clear_session_cookie (httpOnly, Secure, SameSite=Lax, 7-day max-age)
- src/journal/auth_api/: REST surface for auth (login/logout, registration, profile, API keys, admin); split into core.py, account.py, profile.py, api_keys.py, admin.py, _shared.py (2026-05-08)
- src/journal/services/auth.py: AuthService (password hashing, session management, API keys, token signing)
- src/journal/db/user_repository.py: user, session, and API key persistence
- src/journal/mcp_server/runserver.py: DNS rebinding config, middleware wiring, fail-closed JOURNAL_SECRET_KEY startup check
- src/journal/services/ingestion/url_sources.py: _validate_public_url SSRF check (called from IngestionService before fetching any user-supplied URL)
- src/journal/config.py: secret_key, mcp_allowed_hosts, registration/SMTP/session config
- compose.yml / docker-compose.yml: loopback port binding, ChromaDB isolation
- docs/security-roadmap.md: prioritised list of remaining security improvements