
security: fix 4 confirmed vulns (regex leak, URL leak, symlink traversal, open-mode warning)#83

Merged
CGFixIT merged 1 commit into main from claude/security-review-findings
Jun 20, 2026
@CGFixIT CGFixIT commented Jun 20, 2026

Owner

Code + Security Review — Main Branch Findings

Full codebase review across 8 analysis angles (line-by-line, removed-behavior, cross-file tracer, reuse, simplification, efficiency, altitude, conventions). 24 candidates surfaced; 8 were CONFIRMED after verification. This PR fixes 4 of them with clean, localized patches. The remaining 4 are documented below for the maintainer to address.


Fixes in this PR

1. utils/sanitizer.py — Regex pattern leaked in 400 response (CONFIRMED)

PromptInjectionError was raised with details={"matched_pattern": pattern.pattern}, and gate.py forwarded e.details verbatim into the HTTP 400 body. An attacker who sent probe strings received back the exact regular expression that matched, which allowed systematic enumeration of the full blocklist and the crafting of inputs that bypass it.

Fix: details={} — the error message "Potential prompt injection detected" is sufficient for the caller; the exact pattern is an implementation detail that must not be disclosed.
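The shape of this fix can be sketched as follows. This is a minimal illustration, not the project's code: the blocklist entry, the `check_prompt` helper, and the constructor signature of `PromptInjectionError` are assumptions. Logging the matched pattern server-side is also an assumption, shown only as one way to keep the diagnostic without returning it to the caller.

```python
import logging
import re

logger = logging.getLogger("sanitizer")

# Stand-in blocklist for illustration; the real patterns are not shown in this PR.
_PATTERNS = [re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE)]


class PromptInjectionError(Exception):
    """Hypothetical reconstruction: carries a message and a details dict."""

    def __init__(self, message, details=None):
        super().__init__(message)
        self.message = message
        self.details = details if details is not None else {}


def check_prompt(text):
    """Raise PromptInjectionError without disclosing which pattern matched."""
    for pattern in _PATTERNS:
        if pattern.search(text):
            # Assumption: the diagnostic is kept in server logs only.
            logger.info("prompt rejected by pattern %r", pattern.pattern)
            raise PromptInjectionError(
                "Potential prompt injection detected", details={}
            )
    return text
```

Because `details` is empty, forwarding `e.details` verbatim into the 400 body no longer discloses anything about the blocklist.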

2. utils/health.py — LM Studio URL leaked in /health response (CONFIRMED)

_ping() stored str(e) directly as HealthStatus.error. When LM Studio is unreachable, httpx raises ConnectError with the full URL in its message. The /health endpoint is unauthenticated and serializes HealthStatus to JSON, so the base URL (which could contain embedded credentials or internal hostnames) was returned to any caller.

Fix: re.sub(r'https?://\S+', '[URL REDACTED]', str(e)) before storing in HealthStatus.
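As a standalone illustration of the redaction step, the sketch below wraps the same `re.sub` call in a helper. The name `redact_urls` is hypothetical and is not taken from utils/health.py.

```python
import re

# Same pattern as the fix: an http(s) scheme followed by non-whitespace.
_URL_RE = re.compile(r"https?://\S+")


def redact_urls(message):
    """Replace every http(s) URL in message with a fixed placeholder."""
    return _URL_RE.sub("[URL REDACTED]", message)


if __name__ == "__main__":
    print(redact_urls("connect failed: http://user:pw@10.0.0.5:1234/v1 timed out"))
```

The pattern only matches text that begins with an http or https scheme, so a bare `host:port` appearing in an exception message would pass through unchanged.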

3. retrieval/indexer.py — Symlink traversal via rglob() (CONFIRMED)

Path.rglob() follows symlinks by default. A symlink inside data/corpus/ pointing outside the directory (e.g. ln -s /etc data/corpus/etc) would cause the indexer to read and embed arbitrary filesystem content into ChromaDB and BM25, surfacing it as retrieved context in LLM prompts.

Fix: Resolve each file_path and reject it if it escapes corpus_dir.resolve() using Path.is_relative_to() (Python 3.9+).
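The containment check can be sketched as below. The function name `iter_corpus_files` and the `"*"` glob pattern are assumptions for illustration; only the resolve-then-`is_relative_to` comparison comes from the fix described above.

```python
from pathlib import Path


def iter_corpus_files(corpus_dir):
    """Yield files under corpus_dir whose resolved path stays inside it."""
    root = Path(corpus_dir).resolve()
    for file_path in sorted(root.rglob("*")):
        if not file_path.is_file():
            continue
        resolved = file_path.resolve()
        # A symlink whose target lies outside the corpus resolves to a path
        # that is not relative to root, so it is skipped.
        if not resolved.is_relative_to(root):
            continue
        yield file_path
```

Resolving both sides before comparing matters: if `corpus_dir` itself is reached through a symlink, comparing an unresolved root against resolved file paths would reject every file.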

4. gate.py — Silent open-mode when CYCLAW_API_KEY unset (CONFIRMED)

require_api_key() returned immediately (no exception, no logging) when CYCLAW_API_KEY was not set. This is intentional fail-open design for local use, but a deployment where the env var was accidentally omitted was indistinguishable from a configured keyless install — operators had no signal that auth was disabled.

Fix: logger.warning(...) at startup when the key is absent, making the open-mode posture explicit in logs.
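A startup check of this kind might look like the sketch below. The helper name `warn_if_open_mode`, the logger name, and the log wording are hypothetical; treating an empty value the same as an unset one is also an assumption.

```python
import logging
import os

logger = logging.getLogger("gate")


def warn_if_open_mode(environ=os.environ):
    """Log a WARNING if CYCLAW_API_KEY is missing or empty.

    Returns True when the server would run in open mode.
    """
    if environ.get("CYCLAW_API_KEY"):
        return False
    logger.warning(
        "CYCLAW_API_KEY is not set: API key checks are disabled (open mode)"
    )
    return True
```

Calling this once during application startup, rather than inside require_api_key(), keeps the warning from being repeated on every request.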


Findings NOT fixed in this PR (require architectural decisions)

These are CONFIRMED but touch design boundaries the maintainer should decide on.

5. gate.py:182 — /query endpoint is unauthenticated

The /soul/* endpoints enforce require_api_key, but /query (the primary RAG pipeline endpoint) does not. This may be intentional (public query surface), but is undocumented and inconsistent with the protected endpoints' posture. If /query should be authenticated, add dependencies=[Depends(require_api_key)] to its decorator.

6. llm/client.py:17 — SSRF via unvalidated base_url

LocalLLMClient reads base_url from config.yaml and interpolates it directly into httpx.post(f"{self.base_url}/chat/completions", ...) with no scheme or hostname validation. A tampered config.yaml (reachable via a Dropbox sync misconfiguration or a write-path bug) could redirect LLM calls to http://169.254.169.254/latest/meta-data/ or any internal host. Mitigation: validate that base_url starts with http://127. or http://localhost at startup.
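One way to implement the suggested startup validation is sketched below. It parses the URL instead of comparing string prefixes, since a plain prefix test for `http://localhost` would also accept a host such as `localhost.evil.example`. The function name is hypothetical, and allowing `https` alongside `http` is an assumption.

```python
import ipaddress
from urllib.parse import urlsplit


def validate_loopback_base_url(base_url):
    """Return base_url if it is an http(s) URL whose host is loopback, else raise."""
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("base_url must use http or https")
    host = parts.hostname
    if host is None:
        raise ValueError("base_url has no host")
    if host == "localhost":
        return base_url
    try:
        is_loopback = ipaddress.ip_address(host).is_loopback
    except ValueError:
        is_loopback = False  # not an IP literal, and not "localhost"
    if not is_loopback:
        raise ValueError("base_url host must be localhost or a loopback address")
    return base_url
```

With this check, `http://127.0.0.1:1234/v1` and `http://localhost:1234/v1` are accepted, while `http://169.254.169.254/latest/meta-data/` raises `ValueError`.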

7. sync/runner.py:354 — TOCTOU in stale-lock reclaim

The lock reclaim sequence (os.rmdir stale lock → os.mkdir new lock) is not atomic. Two concurrent sync processes can both rmdir and then both mkdir, leaving them both believing they hold the lock and running rclone concurrently against the same remote. Mitigation: use os.rename of a temp directory as an atomic replacement, or accept the race as tolerable given the single-machine deployment model.
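A rename-based reclaim could be sketched as follows. This is one possible variant under stated assumptions, not the code in sync/runner.py: the function name and the tombstone naming are hypothetical, the lock directory is assumed to be empty (as the existing os.rmdir use implies), and POSIX rename semantics are assumed. Instead of removing the stale lock in place, the contender first renames it to a unique name, which only one process can do for a given source path.

```python
import os
import uuid


def reclaim_stale_lock(lock_dir):
    """Try to take over a lock directory that was already judged stale.

    Returns True if this process now holds the lock, False otherwise.
    """
    tombstone = f"{lock_dir}.stale-{uuid.uuid4().hex}"
    try:
        # Renaming one source path succeeds for a single caller; a concurrent
        # caller finds the source gone and backs off.
        os.rename(lock_dir, tombstone)
    except FileNotFoundError:
        return False
    os.rmdir(tombstone)
    try:
        os.mkdir(lock_dir)  # atomic create; fails if another process got in first
    except FileExistsError:
        return False
    return True
```

This closes the double-reclaim window described above, but the staleness decision still happens before the rename, so a process acting on an outdated staleness check could still move aside a lock that was freshly taken.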

8. sync/scheduler.py:190 — local_path interpolated unescaped into crontab

_our_line() builds the cron entry by embedding cfg.local_path (unescaped) in an f-string shell command. A path containing a single-quote or newline can break crontab parsing or inject a command fragment. Mitigation: validate local_path against [A-Za-z0-9_/.\-] at config load time, or use shlex.quote.
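Both mitigations can be combined, as in the sketch below. The names `validate_local_path` and `build_cron_line`, along with the `schedule` and `command` parameters, are hypothetical stand-ins for whatever _our_line() actually uses. Validation runs first because `shlex.quote` alone protects against shell metacharacters but not against a newline or an unescaped `%`, both of which cron itself treats specially.

```python
import re
import shlex

# Allowlist taken from the mitigation above.
_SAFE_PATH_RE = re.compile(r"[A-Za-z0-9_/.\-]+")


def validate_local_path(local_path):
    """Return local_path if it contains only allowlisted characters, else raise."""
    if not _SAFE_PATH_RE.fullmatch(local_path):
        raise ValueError(f"unsafe characters in local_path: {local_path!r}")
    return local_path


def build_cron_line(schedule, command, local_path):
    """Build a crontab entry with a validated, shell-quoted path argument."""
    safe_path = shlex.quote(validate_local_path(local_path))
    return f"{schedule} {command} {safe_path}"
```

Running the validation at config load time, as the finding suggests, surfaces a bad path as a configuration error instead of a broken crontab.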


Test plan

  • pytest tests/test_sanitizer.py tests/test_security.py -q --tb=short — all 9 tests pass
  • Diff is minimal (20 lines changed across 4 files); no logic changed, only guards added/tightened
  • Verify /health response no longer contains URL strings when LM Studio is offline
  • Verify HTTP 400 response for injection no longer contains matched_pattern
  • Optionally test symlink rejection by creating ln -s /etc data/corpus/etc and running python -m retrieval.indexer

🤖 Generated with Claude Code

https://claude.ai/code/session_01MnJf82LNiUZKSptdVXDgFj


…dexer, open-mode warning

- sanitizer: remove matched_pattern from PromptInjectionError details; leaking
  the exact regex to the caller lets attackers enumerate the blocklist to craft
  bypass inputs
- health: redact URLs from _ping exception messages before surfacing them in the
  public /health response; LM Studio base_url (potentially containing internal
  hostnames or embedded credentials) was returned verbatim to unauthenticated callers
- indexer: reject files whose resolved path escapes the corpus directory; rglob()
  follows symlinks by default, so a symlink inside data/corpus/ pointing to /etc
  would pull arbitrary filesystem content into the RAG index
- gate: log a WARNING at startup when CYCLAW_API_KEY is unset so operators know
  the server is running in open mode rather than silently accepting all requests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MnJf82LNiUZKSptdVXDgFj
@CGFixIT CGFixIT marked this pull request as ready for review June 20, 2026 14:59
@CGFixIT CGFixIT merged commit 71b319f into main Jun 20, 2026
14 checks passed