security: fix 4 confirmed vulns (regex leak, URL leak, symlink traversal, open-mode warning)#83
Merged
Merged
Conversation
…dexer, open-mode warning - sanitizer: remove matched_pattern from PromptInjectionError details; leaking the exact regex to the caller lets attackers enumerate the blocklist to craft bypass inputs - health: redact URLs from _ping exception messages before surfacing them in the public /health response; LM Studio base_url (potentially containing internal hostnames or embedded credentials) was returned verbatim to unauthenticated callers - indexer: reject files whose resolved path escapes the corpus directory; rglob() follows symlinks by default, so a symlink inside data/corpus/ pointing to /etc would pull arbitrary filesystem content into the RAG index - gate: log a WARNING at startup when CYCLAW_API_KEY is unset so operators know the server is running in open mode rather than silently accepting all requests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MnJf82LNiUZKSptdVXDgFj
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Code + Security Review — Main Branch Findings
Full codebase review across 8 analysis angles (line-by-line, removed-behavior, cross-file tracer, reuse, simplification, efficiency, altitude, conventions). 24 candidates surfaced; 8 CONFIRMED after verification. This PR fixes the 4 with clean, localized patches. The remaining 4 are documented below for the maintainer to address.
Fixes in this PR
1.
utils/sanitizer.py— Regex pattern leaked in 400 response (CONFIRMED)PromptInjectionErrorwas raised withdetails={"matched_pattern": pattern.pattern}, andgate.pyforwardede.detailsverbatim into the HTTP 400 body. An attacker who sends probe strings received back the exact regular expression that triggered, enabling systematic enumeration of the full blocklist to craft bypass inputs.Fix:
details={}— the error message "Potential prompt injection detected" is sufficient for the caller; the exact pattern is an implementation detail that must not be disclosed.2.
utils/health.py— LM Studio URL leaked in/healthresponse (CONFIRMED)_ping()storedstr(e)directly asHealthStatus.error. When LM Studio is unreachable,httpxraisesConnectErrorwith the full URL in its message. The/healthendpoint is unauthenticated and serializesHealthStatusto JSON, so the base URL (which could contain embedded credentials or internal hostnames) was returned to any caller.Fix:
re.sub(r'https?://\S+', '[URL REDACTED]', str(e))before storing inHealthStatus.3.
retrieval/indexer.py— Symlink traversal viarglob()(CONFIRMED)Path.rglob()follows symlinks by default. A symlink insidedata/corpus/pointing outside the directory (e.g.ln -s /etc data/corpus/etc) would cause the indexer to read and embed arbitrary filesystem content into ChromaDB and BM25, surfacing it as retrieved context in LLM prompts.Fix: Resolve each
file_pathand reject it if it escapescorpus_dir.resolve()usingPath.is_relative_to()(Python 3.9+).4.
gate.py— Silent open-mode whenCYCLAW_API_KEYunset (CONFIRMED)require_api_key()returned immediately (no exception, no logging) whenCYCLAW_API_KEYwas not set. This is intentional fail-open design for local use, but a deployment where the env var was accidentally omitted was indistinguishable from a configured keyless install — operators had no signal that auth was disabled.Fix:
logger.warning(...)at startup when the key is absent, making the open-mode posture explicit in logs.Findings NOT fixed in this PR (require architectural decisions)
These are CONFIRMED but touch design boundaries the maintainer should decide on.
5.
gate.py:182—/queryendpoint is unauthenticatedThe
/soul/*endpoints enforcerequire_api_key, but/query(the primary RAG pipeline endpoint) does not. This may be intentional (public query surface), but is undocumented and inconsistent with the protected endpoints' posture. If/queryshould be authenticated, adddependencies=[Depends(require_api_key)]to its decorator.6.
llm/client.py:17— SSRF via unvalidatedbase_urlLocalLLMClientreadsbase_urlfromconfig.yamland interpolates it directly intohttpx.post(f"{self.base_url}/chat/completions", ...)with no scheme or hostname validation. A tamperedconfig.yaml(reachable via a Dropbox sync misconfiguration or a write-path bug) could redirect LLM calls tohttp://169.254.169.254/latest/meta-data/or any internal host. Mitigation: validate thatbase_urlstarts withhttp://127.orhttp://localhostat startup.7.
sync/runner.py:354— TOCTOU in stale-lock reclaimThe lock reclaim sequence (
os.rmdirstale lock →os.mkdirnew lock) is not atomic. Two concurrent sync processes can bothrmdirand then bothmkdir, leaving them both believing they hold the lock and runningrcloneconcurrently against the same remote. Mitigation: useos.renameof a temp directory as an atomic replacement, or accept the race as tolerable given the single-machine deployment model.8.
sync/scheduler.py:190—local_pathinterpolated unescaped into crontab_our_line()builds the cron entry by embeddingcfg.local_path(unescaped) in an f-string shell command. A path containing a single-quote or newline can break crontab parsing or inject a command fragment. Mitigation: validatelocal_pathagainst[A-Za-z0-9_/.\-]at config load time, or useshlex.quote.Test plan
pytest tests/test_sanitizer.py tests/test_security.py -q --tb=short— all 9 tests pass/healthresponse no longer contains URL strings when LM Studio is offlinematched_patternln -s /etc data/corpus/etcand runningpython -m retrieval.indexer🤖 Generated with Claude Code
https://claude.ai/code/session_01MnJf82LNiUZKSptdVXDgFj
Generated by Claude Code