Releases: swoelffel/llmshell

v0.3.1 — quoting-aware shell lexer

13 May 14:39
v0.3.1

Policy — quoting-aware shell lexer

  • New internal shell_lex module replaces shlex::split inside classify_shell_payload. Tokens now carry their Quoting context (Bare, Single, Double, Mixed), so operators and literals are separated structurally instead of being inferred after the fact.
  • The defensive tok.contains('|') guard is removed. Regex alternation inside quoted arguments — e.g. df -h | grep -E '^/dev|^Filesystem' or sysctl -a | grep -E 'kern.maxfiles|kern.maxfilesperproc' — is now classified ReadOnly instead of Unknown, dropping spurious confirm prompts on a common diagnostic pattern.
  • Fail-closed posture preserved: the upstream metacharacter pre-filter ($, backtick, \n, (, {, <(, >(, $() is unchanged, and every lexer error (UnterminatedQuote, DanglingEscape, UnsupportedConstruct for heredocs) maps back to ClassificationReason::UnparsableShellPayload → Unknown → confirm.
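
To make the idea of tokens that carry their quoting context concrete, here is a minimal sketch of such a lexer. It is not the project's shell_lex module: the Token struct, the lex and is_pipe_operator functions, the treatment of a backslash escape as Mixed, and the simplified escape rules inside double quotes are all assumptions made for illustration. Only the Quoting variants and the two error names come from the notes above.

```rust
// Illustrative sketch only: not the project's shell_lex module.

/// Quoting context carried by each token (variant names from the release note).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Quoting { Bare, Single, Double, Mixed }

/// Lexer failures; a caller would map each one to "Unknown, ask the user".
#[derive(Debug, PartialEq, Eq)]
pub enum LexError { UnterminatedQuote, DanglingEscape }

#[derive(Debug, PartialEq, Eq)]
pub struct Token { pub text: String, pub quoting: Quoting }

/// Combine the quoting seen so far in a token with the quoting of its next piece.
fn merge(seen: Option<Quoting>, next: Quoting) -> Option<Quoting> {
    match seen {
        None => Some(next),
        Some(q) if q == next => Some(q),
        Some(_) => Some(Quoting::Mixed),
    }
}

/// Push the token in progress, if there is one.
fn flush(tokens: &mut Vec<Token>, text: &mut String, quoting: &mut Option<Quoting>) {
    if let Some(q) = quoting.take() {
        tokens.push(Token { text: std::mem::take(text), quoting: q });
    }
}

pub fn lex(input: &str) -> Result<Vec<Token>, LexError> {
    let mut tokens = Vec::new();
    let mut text = String::new();
    let mut quoting: Option<Quoting> = None; // None means no token in progress
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        match c {
            ' ' | '\t' => flush(&mut tokens, &mut text, &mut quoting),
            '|' => {
                // An unquoted pipe is an operator token of its own.
                flush(&mut tokens, &mut text, &mut quoting);
                tokens.push(Token { text: "|".to_string(), quoting: Quoting::Bare });
            }
            '\'' => {
                quoting = merge(quoting, Quoting::Single);
                loop {
                    match chars.next() {
                        Some('\'') => break,
                        Some(ch) => text.push(ch),
                        None => return Err(LexError::UnterminatedQuote),
                    }
                }
            }
            '"' => {
                quoting = merge(quoting, Quoting::Double);
                loop {
                    match chars.next() {
                        Some('"') => break,
                        // Simplification: a backslash keeps the next character literally.
                        Some('\\') => match chars.next() {
                            Some(ch) => text.push(ch),
                            None => return Err(LexError::DanglingEscape),
                        },
                        Some(ch) => text.push(ch),
                        None => return Err(LexError::UnterminatedQuote),
                    }
                }
            }
            '\\' => match chars.next() {
                // Assumption: an escaped character marks the token as Mixed,
                // so an escaped pipe is never mistaken for an operator.
                Some(ch) => { quoting = merge(quoting, Quoting::Mixed); text.push(ch); }
                None => return Err(LexError::DanglingEscape),
            },
            other => { quoting = merge(quoting, Quoting::Bare); text.push(other); }
        }
    }
    flush(&mut tokens, &mut text, &mut quoting);
    Ok(tokens)
}

/// Only a bare `|` is a pipe; a `|` inside quotes stays a literal.
pub fn is_pipe_operator(tok: &Token) -> bool {
    tok.quoting == Quoting::Bare && tok.text == "|"
}
```

With this shape, lex("grep -E '^/dev|^Filesystem'") yields a third token whose quoting is Single, so a classifier can tell the regex alternation apart from a real pipe without inspecting the token text.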

Commits:

  • 6e4ac73 release: v0.3.1 — quoting-aware shell lexer
  • 980e567 feat(policy): wire shell_lex into classify_shell_payload, remove pipe-glued heuristic
  • c8cedf9 feat(policy): add quoting-aware shell lexer (shell_lex)
  • 0331ed3 test(policy): document pipe-alternation false-negative as ignored regressions

v0.3.0 — Anthropic provider (Claude Haiku / Sonnet / Opus)

13 May 09:36
v0.3.0

Anthropic provider — Claude Haiku, Sonnet, Opus

LLMShell now speaks the Anthropic Messages API natively. Switch at runtime with /provider set anthropic or pin a model with /model use claude-sonnet-4-6. Haiku 4.5 is the Anthropic provider's default model.

Added

  • New crate llmsh-llm-anthropic — full native implementation of the Messages API behind the existing LlmProvider trait. No change to the agent loop, the policy gate, or the audit chain.
  • Three Claude 4.x models out of the box: claude-haiku-4-5 (default), claude-sonnet-4-6, claude-opus-4-7. Pricing and 200k context window registered in llmsh-llm.
  • Tool-use round-trip: assistant tool_use blocks ↔ neutral ToolCall; consecutive Tool messages are grouped into a single user turn carrying multiple tool_result blocks, per Anthropic's recommendation.
  • JSON-object response_format is emulated via the assistant-prefill { technique (cookbook). The compactor — which requires structured JSON — keeps working unmodified across providers.
  • HTTP error bodies are redacted: the existing llmsh-redact anthropic_key pattern catches sk-ant-… leaks in any 4xx/5xx surfaced to the user.
  • Capabilities advertised honestly: tool_calling=Native, supports_streaming=false, supports_json_mode=true (emulated), supports_parallel_tool_calls=true, max_context_tokens=Some(200_000).
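
The grouping of consecutive Tool messages into a single user turn can be sketched as follows. The Message and Turn enums and the group_tool_results function are hypothetical stand-ins, not the crate's neutral types or its wire format. The sketch shows only the grouping rule.

```rust
// Illustrative sketch only: hypothetical types, not the llmsh-llm-anthropic mapping code.

#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(String),
    Tool { call_id: String, output: String },
}

#[derive(Debug, PartialEq)]
enum Turn {
    User(String),
    Assistant(String),
    /// One user turn carrying several (tool_use_id, content) results.
    ToolResults(Vec<(String, String)>),
}

/// Fold runs of consecutive Tool messages into a single ToolResults turn.
fn group_tool_results(messages: &[Message]) -> Vec<Turn> {
    let mut turns: Vec<Turn> = Vec::new();
    for message in messages {
        match message {
            Message::User(text) => turns.push(Turn::User(text.clone())),
            Message::Assistant(text) => turns.push(Turn::Assistant(text.clone())),
            Message::Tool { call_id, output } => {
                let result = (call_id.clone(), output.clone());
                match turns.last_mut() {
                    // The previous turn is already a tool-result turn: extend it.
                    Some(Turn::ToolResults(results)) => results.push(result),
                    // Otherwise open a new tool-result turn.
                    _ => turns.push(Turn::ToolResults(vec![result])),
                }
            }
        }
    }
    turns
}
```

Two Tool messages that follow one assistant turn end up in the same ToolResults entry, while a Tool message that arrives after a later assistant turn starts a new one.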

Local install runbook

docs/runbooks/local-install.md codifies the deploy flow we kept re-deriving by hand:

  • Canonical install: cargo install --path crates/llmsh-cli --force.
  • Manual fallback is always the triplet cp + xattr -c + codesign --force --sign - — a bare cp over an existing binary hits macOS Sequoia's provenance xattr and dies silently with zsh: killed.
  • After every release that adds a provider or a top-level config key, append the missing block to the user config.toml (load_or_create_user does not merge new defaults into an existing file).
  • Verification gate: which llmsh, version matches Cargo.toml, /provider lists the new entry, one smoke turn writes one audit event.

CLAUDE.md now references the runbook so agents follow it systematically.

Not included (parked for a dedicated minor)

  • Anthropic streaming (SSE): not yet plumbed; will land in a focused minor.
  • Explicit cache_control: ephemeral prompt caching: deferred. The SystemPromptBuilder already orders sections stable→dynamic, so the switch will be wiring only.

Tests / CI

  • 415 tests pass on cargo test --workspace --locked (was 379 in v0.2.15). New coverage: 10 mapping unit tests, 5 wire serde tests, 5 provider tests, 4 wiremock end-to-end tests (tool_use round-trip, body shape with tool_choice=any, JSON-prefill reconstruction, error redaction).
  • cargo fmt --check + cargo clippy --workspace --all-targets -- -D warnings clean.

Backwards compatibility

  • default_model global default is unchanged (openai:gpt-4.1-mini). The Anthropic provider only activates when the user picks it via /provider set anthropic or pins default_model = "anthropic:claude-haiku-4-5".
  • No breaking changes to the neutral LlmProvider / LlmRequest / LlmResponse types.
  • Audit JSONL schema unchanged.

v0.2.15 — pipeline-aware classifier

13 May 07:46
v0.2.15

Pipeline-aware classifier + explainable Unknown + light confirm

Before this release, every bash -c "find . | wc -l" forced an interactive confirmation because the classifier rejected any payload containing a shell metacharacter. v0.2.15 teaches the deterministic classifier to recognise read-only pipelines and surfaces why a call was left unclassified.

Policy — pipeline parser

  • Pipelines are classified. bash -c "A | B | …", A && B, A || B are downgraded to ReadOnly when every segment is independently read-only. The user's original case find . -maxdepth 1 -type f | wc -l now passes without a prompt.
  • Safe output redirections accepted. >/dev/null, 2>/dev/null, 2>&1, and >/tmp/<simple-name> are recognised. Any other target keeps the call at Unknown.
  • Security preserved. ; sequence, & background, $VAR, $(…), backticks, globs, and redirections to other paths still block classification.
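
The three rules above can be sketched as a small whitespace-token classifier. This is not the project's parser. The Reason enum mirrors some of the documented reason names in CamelCase, while the Unsupported variant, the tiny read-only program list, the find flag list, and the definition of a "simple name" under /tmp are assumptions made for illustration. Quoting is deliberately rejected here rather than handled.

```rust
// Illustrative sketch only: not the project's classifier.

#[derive(Debug, PartialEq, Eq)]
enum Reason {
    UnsafePipelineSegment,
    CommandSubstitution,
    VariableExpansion,
    GlobNotResolved,
    SequenceOrBackground,
    UnsafeRedirectionTarget,
    Unsupported, // hypothetical catch-all for what this sketch does not parse
}

fn has_any(tok: &str, set: &[char]) -> bool {
    tok.chars().any(|c| set.contains(&c))
}

/// Accepts the redirections listed above; "simple name" is an assumed rule.
fn is_safe_redirection(tok: &str) -> bool {
    if matches!(tok, ">/dev/null" | "2>/dev/null" | "2>&1") {
        return true;
    }
    match tok.strip_prefix(">/tmp/") {
        Some(name) => {
            !name.is_empty()
                && !name.starts_with('.')
                && name.chars().all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-'))
        }
        None => false,
    }
}

/// Returns why a token blocks classification, if it does.
fn blocking_reason(tok: &str) -> Option<Reason> {
    if tok.contains("$(") || tok.contains('`') {
        Some(Reason::CommandSubstitution)
    } else if tok.contains('$') {
        Some(Reason::VariableExpansion)
    } else if has_any(tok, &['*', '?', '[']) {
        Some(Reason::GlobNotResolved)
    } else if has_any(tok, &[';', '&']) {
        Some(Reason::SequenceOrBackground)
    } else if has_any(tok, &['>', '<']) {
        Some(Reason::UnsafeRedirectionTarget)
    } else if has_any(tok, &['|', '\'', '"', '\\']) {
        Some(Reason::Unsupported)
    } else {
        None
    }
}

/// Hypothetical per-segment check: a tiny allowlist plus one argument-aware rule.
fn segment_is_read_only(argv: &[&str]) -> bool {
    let Some((program, rest)) = argv.split_first() else { return false };
    match *program {
        "find" => !rest.iter().any(|a| matches!(*a, "-delete" | "-exec" | "-execdir" | "-ok" | "-okdir")),
        "wc" | "grep" | "ls" | "df" => true,
        _ => false,
    }
}

fn classify_pipeline(payload: &str) -> Result<(), Reason> {
    // Fail closed on newlines, tabs and anything outside printable ASCII.
    if !payload.chars().all(|c| c == ' ' || c.is_ascii_graphic()) {
        return Err(Reason::Unsupported);
    }
    let mut segments: Vec<Vec<&str>> = Vec::new();
    let mut segment: Vec<&str> = Vec::new();
    for tok in payload.split(' ').filter(|t| !t.is_empty()) {
        if matches!(tok, "|" | "&&" | "||") {
            segments.push(std::mem::take(&mut segment));
        } else if is_safe_redirection(tok) {
            continue; // accepted and ignored for classification
        } else if let Some(reason) = blocking_reason(tok) {
            return Err(reason);
        } else {
            segment.push(tok);
        }
    }
    segments.push(segment);
    // Every segment must be independently read-only.
    if segments.iter().all(|argv| segment_is_read_only(argv)) {
        Ok(())
    } else {
        Err(Reason::UnsafePipelineSegment)
    }
}
```

An Ok(()) result corresponds to the ReadOnly downgrade, and any Err keeps the call at Unknown with a reason that can be shown to the user. The sketch only recognises operators separated by spaces, so ls|wc is rejected rather than split.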

UX — explainable Unknown

  • New ClassificationReason (snake_case JSON enum) is propagated from the deterministic classifier into PolicyDecision.classification_reason. The confirmation prompt now reads e.g. risk=Unknown — segment de pipeline non read-only or risk=Unknown — substitution de commande instead of an opaque Unknown.
  • Reason variants: unsafe_pipeline_segment, command_substitution, variable_expansion, glob_not_resolved, sequence_or_background, unsafe_redirection_target, nested_shell_wrapping, program_not_allowlisted, unsafe_argument, … (full list in crates/llmsh-policy/src/types.rs).

UX — light confirmation

  • PolicyAction::RequireConfirmation gains light: bool. When the classifier returns Unknown but the LLM declared claimed_risk = read_only or low, the prompt is downgraded to a default-yes single-keystroke [Y/n] (vs. the standard [y/N]). ConfirmStrong (phrase verbatim) is unchanged.
  • The model never gains authority over execution — a confirmation is still required, only the prompt is lighter. The audit log keeps the model_disagrees_on_risk flag for offline review.
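
The light-confirmation rule can be sketched as a pure decision function. The reduced RiskLevel and PolicyAction enums below are hypothetical (the real types have more variants, including ConfirmStrong), and the answer parsing in is_approved is an assumption. The point is that the model's claimed_risk only changes the default answer and never turns a confirmation into an allow.

```rust
// Illustrative sketch only: reduced, hypothetical versions of the policy types.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RiskLevel { ReadOnly, Low, High, Unknown }

#[derive(Debug, PartialEq, Eq)]
enum PolicyAction {
    Allow,
    RequireConfirmation { light: bool },
}

/// `classified` comes from the deterministic classifier, `claimed_risk` from the model.
fn decide(classified: RiskLevel, claimed_risk: RiskLevel) -> PolicyAction {
    match classified {
        RiskLevel::ReadOnly => PolicyAction::Allow,
        // The model's claim can only lighten the prompt, never skip it.
        RiskLevel::Unknown => PolicyAction::RequireConfirmation {
            light: matches!(claimed_risk, RiskLevel::ReadOnly | RiskLevel::Low),
        },
        RiskLevel::Low | RiskLevel::High => PolicyAction::RequireConfirmation { light: false },
    }
}

/// Prompt suffix: default-yes for a light confirmation, default-no otherwise.
fn prompt_suffix(action: &PolicyAction) -> Option<&'static str> {
    match action {
        PolicyAction::Allow => None,
        PolicyAction::RequireConfirmation { light: true } => Some("[Y/n]"),
        PolicyAction::RequireConfirmation { light: false } => Some("[y/N]"),
    }
}

/// Interpret the user's answer; an empty answer takes the prompt's default.
fn is_approved(action: &PolicyAction, answer: &str) -> bool {
    match action {
        PolicyAction::Allow => true,
        PolicyAction::RequireConfirmation { light } => {
            match answer.trim().to_ascii_lowercase().as_str() {
                "y" | "yes" => true,
                "" => *light,
                _ => false,
            }
        }
    }
}
```

Under these assumptions, an Unknown classification with a read_only claim yields [Y/n] and accepts an empty answer, while the same classification with a high claim yields [y/N] and refuses it.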

Agent prompt updated

  • The run_process tool description and the system persona now tell the agent that simple pipes and safe redirections in bash -c "…" are recognised by the classifier, so it can keep using legitimate shell forms without forcing the user through a confirmation.

Tests / CI

  • 379 → 382 tests (+3 in llmsh-core::pipeline covering the pipeline downgrade, the destructive-segment rejection, and the light-confirm path), plus 9 new unit tests in llmsh-policy::safe_commands covering A+B positive and negative cases.
  • cargo fmt --check, cargo clippy -D warnings, full cargo test --workspace --locked all green.
  • Fuzz target (crates/llmsh-policy/fuzz/fuzz_targets/deshell.rs) still asserts determinism on the broadened classifier.

Backwards compatibility

  • PolicyAction::RequireConfirmation gains light: bool with #[serde(default)] — old audit JSONL lines deserialise unchanged.
  • PolicyDecision.classification_reason is Option<…> with #[serde(default)] — old lines deserialise as None.
  • is_read_only_invocation(program, args) -> Option<RiskLevel> is preserved as a thin wrapper over the new classify_invocation -> Result<RiskLevel, ClassificationReason>.
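
The compatibility wrapper in the last bullet amounts to dropping the error side of the new result. A minimal sketch, assuming &str and &[&str] parameter types and a stand-in body for classify_invocation (the real classifier is far richer):

```rust
// Illustrative sketch only: parameter types and the classifier body are assumptions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RiskLevel { ReadOnly }

#[derive(Debug, PartialEq, Eq)]
enum ClassificationReason { ProgramNotAllowlisted, UnsafeArgument }

/// New-style API: says why a call could not be classified.
fn classify_invocation(program: &str, args: &[&str]) -> Result<RiskLevel, ClassificationReason> {
    match program {
        "ls" | "wc" => Ok(RiskLevel::ReadOnly),
        "find" if args.iter().any(|a| *a == "-delete") => Err(ClassificationReason::UnsafeArgument),
        "find" => Ok(RiskLevel::ReadOnly),
        _ => Err(ClassificationReason::ProgramNotAllowlisted),
    }
}

/// Old-style API kept as a thin wrapper: the reason is discarded.
fn is_read_only_invocation(program: &str, args: &[&str]) -> Option<RiskLevel> {
    classify_invocation(program, args).ok()
}
```

Existing callers keep receiving Some(RiskLevel::ReadOnly) or None, while new callers can surface the reason in the confirmation prompt.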

v0.2.13 — security hardening

11 May 13:44
v0.2.13

Security hardening pass

  • Unified redaction. New llmsh-redact crate is now the single source of truth for secret patterns. llmsh-audit::redact and llmsh-core::llm_redact are thin façades over it — no more three parallel pattern lists drifting apart.
  • Extended secret catalogue. OpenAI, Anthropic, GCP (API key + service-account JSON marker), AWS access/secret, GitHub (modern + classic), Databricks, HuggingFace, Replicate, Slack, JWT, Bearer, PEM private keys, and .env-style *_KEY=… / *_PASSWORD=… lines.
  • API key zeroized. OpenAIProvider stores the key inside secrecy::SecretString: it no longer leaks via Debug output and the underlying memory is zeroed on drop.
  • Error bodies redacted. OpenAI HTTP error responses pass through the redactor before being bubbled up to logs — some error payloads echo the request fragments containing the offending token.
  • Memory persistence redacted. Conversation messages are redacted before insertion into the SQLite memory DB. Previously, .env reads or token-bearing tool outputs were stored verbatim.
  • Deshelling gap closed. extract_shell_payload now accepts bash -c PAYLOAD pos1 pos2… (extra positional args become $0, $1, … inside the body). Previously the parser only matched argv.len() == 2, so appending a trailing arg let invocations silently skip read-only classification.
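
The deshelling fix in the last bullet comes down to matching on the shape of argv instead of its exact length. A minimal sketch, assuming a &[&str] argument list and a two-entry shell list (both assumptions; the real function also applies the metacharacter filter, which is omitted here):

```rust
// Illustrative sketch only: signature and shell list are assumptions.

/// Returns the `-c` payload and any trailing positional arguments.
/// A length-only check (`args.len() == 2`) would miss `bash -c PAYLOAD extra`.
fn extract_shell_payload<'a>(
    program: &str,
    args: &'a [&'a str],
) -> Option<(&'a str, &'a [&'a str])> {
    if !matches!(program, "bash" | "sh") {
        return None;
    }
    match args {
        // `positional` is empty for the classic two-argument form.
        ["-c", payload, positional @ ..] => Some((*payload, positional)),
        _ => None,
    }
}
```

Because the payload is extracted whether or not positional arguments follow, appending a trailing argument no longer changes which code path classifies the inner command.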

Audit chain — known limitation (documented, not fixed)

Investigation of the audit log surfaced that the crate-level README's "SHA-256 chained" claim is currently aspirational. Per-event digest fields (messages_digest, tool_calls_digest, args_digest, …) are content-addressable hashes of subfields and do not reference the prior event; the writer does a plain writeln!. Implementing a real inter-event chain (new field on every variant, writer state, verifier, format-version bump) is deferred to a dedicated plan.

Tests

322 → 338 (+16 new tests across llmsh-redact, llmsh-llm-openai, llmsh-core::memory, llmsh-policy).

v0.2.12 — softer tool-call cap + Linux read-only classifiers

11 May 11:19
v0.2.12

Two interactive-friction fixes informed by a Debian / ARM64 deployment test on VM01.

Highlights

Tool-call overflow becomes recoverable

Before: when the model emitted more than max_tool_calls_per_iteration tool calls in a single turn, the agent aborted silently — the REPL just redrew an empty prompt with no assistant text. Default cap was 5, which gpt-4.1-mini exceeded on the very first "audit complet" request.

After:

  • Default cap raised from 5 → 32 so legitimate batches (8–10 read-only system audits) pass without any prompt.
  • When the cap is exceeded, the user gets [Y/n] (empty = Y) instead of a blank line. Approving runs the plan as-is.
  • Audit log retains the too_many_tool_calls error event with the user verdict embedded (approved/denied).
  • New trait method ConfirmationGate::ask_overflow(requested, limit) -> bool with a default that refuses; StdinConfirmationGate prompts, AlwaysYesGate overrides to true for tests.
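
The "default that refuses" design of ask_overflow can be sketched with a provided trait method. The &self receiver, the usize parameter types, the RefusingGate type, and the may_run_batch helper are assumptions. The trait, method, and AlwaysYesGate names come from the bullet above.

```rust
// Illustrative sketch only: receiver, parameter types and helpers are assumptions.

trait ConfirmationGate {
    /// Asked when the model requests more tool calls than the cap allows.
    /// The provided default refuses, so a gate that forgets to override it fails closed.
    fn ask_overflow(&self, requested: usize, limit: usize) -> bool {
        let _ = (requested, limit);
        false
    }
}

/// Test gate: approves every overflow.
struct AlwaysYesGate;

impl ConfirmationGate for AlwaysYesGate {
    fn ask_overflow(&self, _requested: usize, _limit: usize) -> bool {
        true
    }
}

/// Hypothetical gate that keeps the default behaviour.
struct RefusingGate;

impl ConfirmationGate for RefusingGate {}

/// Hypothetical helper: a batch runs if it fits the cap or the gate approves it.
fn may_run_batch(gate: &dyn ConfirmationGate, requested: usize, limit: usize) -> bool {
    requested <= limit || gate.ask_overflow(requested, limit)
}
```

An interactive gate such as StdinConfirmationGate would override the method to print the [Y/n] prompt and read the answer, while any gate that does nothing inherits the refusal.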

Linux read-only classifier (less friction, same safety)

The deterministic safe_commands classifier was macOS-leaning. On Linux, common inspection tools returned RiskLevel::Unknown → ModelDisagreesOnRisk → Confirm, producing 5+ [y/N] prompts per audit turn.

  • ALWAYS_SAFE additions (no mutating mode): lscpu, lsmem, lsblk, lspci, lsusb, lshw, lsmod, lsipc, lsns, free, dpkg-query, apparmor_status, aa-status, chkrootkit, rkhunter, findmnt, mountpoint, nproc, arch, getent.
  • Subcommand-aware classifiers (read-only on specific verbs/flags, mutating otherwise):
    • ip <object> [show|list|get] — rejects add/del/set/change/replace/flush.
    • ss — read-only unless -K/--kill.
    • ufw — read-only on status/show; rejects enable/disable/allow/…
    • systemctl — read-only on status/show/cat/list-/is-/get-default/show-environment.
    • journalctl — read-only unless --vacuum-*/--rotate/--flush/--sync.
    • iptables/ip6tables — read-only when any listing flag (-L/-S/--list-rules/--check) is present and no mutating flag (-A/-D/-I/-R/-F/-X/-Z/-N/-P/-E) is.
    • nft — read-only on list.
    • firewall-cmd — read-only on --state/--list-*/--get-*/--query-*/--info-*; conservatively rejects --permanent.
    • mount — read-only on bare invocation or with -l/-v/--show-labels.
    • dmesg — read-only unless -c/-C/--clear/--read-clear.

A typical audit complet REPL session on Debian goes from 9 [y/N] confirms to 0 for hardware/inspection commands. Sensitive-path strong confirmations on /etc/passwd, /etc/sudoers, etc. remain by design.
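
Three of the subcommand-aware rules listed above can be sketched as plain predicates over argv. The function names are hypothetical and the sketch matches flags only as whole arguments, so combined short flags such as -nL are not recognised. The flag lists themselves are taken from the bullets.

```rust
// Illustrative sketch only: function names are hypothetical; flags match whole arguments.

/// iptables/ip6tables: needs a listing flag and no mutating flag.
fn iptables_is_read_only(args: &[&str]) -> bool {
    const LISTING: &[&str] = &["-L", "-S", "--list-rules", "--check"];
    const MUTATING: &[&str] = &["-A", "-D", "-I", "-R", "-F", "-X", "-Z", "-N", "-P", "-E"];
    let lists = args.iter().any(|a| LISTING.contains(a));
    let mutates = args.iter().any(|a| MUTATING.contains(a));
    lists && !mutates
}

/// journalctl: read-only unless a maintenance flag is present.
fn journalctl_is_read_only(args: &[&str]) -> bool {
    !args
        .iter()
        .any(|a| a.starts_with("--vacuum-") || matches!(*a, "--rotate" | "--flush" | "--sync"))
}

/// ss: read-only unless it is asked to kill sockets.
fn ss_is_read_only(args: &[&str]) -> bool {
    !args.iter().any(|a| matches!(*a, "-K" | "--kill"))
}
```

The iptables rule is allowlist-shaped (a listing flag must be present), whereas the journalctl and ss rules are denylist-shaped, which matches how the bullets phrase them ("read-only when" versus "read-only unless").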

Misc

  • model_cmd error path now surfaces the configured allowlist when no installed model matches (better operator hint when /model set <name> fails).

CI

  • cargo fmt --all -- --check — pass
  • cargo clippy --workspace --all-targets -- -D warnings — pass
  • cargo test --workspace --locked: 322 passing (up from 311 at v0.2.11; 11 new policy/agent tests).

Commits

  • 807c881 feat(policy): subcommand-aware read-only classifiers for ip/ss/ufw/systemctl/…
  • 20966ee feat(policy): add Linux hardware/inspection commands to ALWAYS_SAFE
  • e455c1d feat(agent): prompt on tool-call overflow + raise default cap to 32
  • 852f33f fix(model_cmd): surface allowlist when no installed model matches

v0.2.10 — classifier deshelling + macOS allowlist + LLM briefing

10 May 11:41
v0.2.10

Highlights

Less friction on read-only shell wrappers. The deterministic policy classifier now deshells bash -c "<single-cmd>" one level (with a conservative metachar + literal-glob filter via shlex) and re-classifies the inner command. Wrapping ls, dscl . list /Users, grep TODO src/main.rs etc. in bash -c no longer triggers spurious confirmation prompts.

macOS read-only system tools recognised. Universal allowlist additions (system_profiler, sw_vers, ioreg, nettop, scutil, csrutil, vm_stat, iostat, last, w, who, users) plus per-program predicates with cross-platform discipline for binaries that have a write mode somewhere (dscl, pfctl, defaults, launchctl, networksetup, sysctl).

Privilege escalation always reaches ConfirmStrong. The UsesPrivilegeEscalation flag (sudo / doas / su) is now set in the pipeline post-deshell rather than at the enrich layer, so bash -c "sudo …" correctly reaches ConfirmStrong with phrase — the previous detection only saw the outer bash program.
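
A minimal sketch of why detection has to look past the outer program: the function below checks the program itself and then, for one level of bash -c or sh -c, the words of the payload. The function name, the shell list, and the word-level scan are assumptions, and the scan is deliberately over-cautious (it also flags a payload such as echo su).

```rust
// Illustrative sketch only: names and the word-level scan are assumptions.

fn uses_privilege_escalation(program: &str, args: &[&str]) -> bool {
    fn is_escalator(word: &str) -> bool {
        matches!(word, "sudo" | "doas" | "su")
    }
    if is_escalator(program) {
        return true;
    }
    match (program, args) {
        // Looking only at `program` would see "bash" here and miss the inner sudo.
        ("bash" | "sh", ["-c", payload, ..]) => payload.split_whitespace().any(is_escalator),
        _ => false,
    }
}
```

A true result would correspond to setting the UsesPrivilegeEscalation flag, which in turn routes the call to ConfirmStrong.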

Audited default sensitive_path_patterns. SSH keys (~/.ssh/**, **/id_rsa, **/id_ed25519, **/id_ecdsa), cloud credentials (~/.aws/**, ~/.config/gcloud/**, ~/.config/gh/**, ~/.docker/config.json, ~/.kube/**), generic dotfiles (~/.netrc, ~/.pgpass), project secrets (.env, .env.*, **/.env, **/.env.*, **/credentials*, **/secrets.*, **/*.pem, **/*.key), and system-sensitive paths (/etc/sudoers, /etc/sudoers.d/**, /etc/shadow, /etc/passwd).

Better LLM briefing. The run_process tool description now teaches the model to prefer argv-direct (program=ls, args=["-la"]) over bash -c "ls -la" with ❌/✅ examples and a claimed_risk taxonomy. The persona block mirrors the nudge.

Tests

  • 6 unit tests for extract_shell_payload (positive deshell, metachar refusal, recursion bound, sudo wrapper non-deshell).
  • 12 entries verified in the universal allowlist.
  • 6 per-program predicate tests (dscl, pfctl, defaults, launchctl, networksetup, sysctl).
  • 3 pipeline tests for post-deshell privesc detection.
  • 4 e2e tests: e2e_classifier_bash_deshell, e2e_privilege_escalation, e2e_privesc_through_bash, e2e_persona_avoids_bash_wrapping.

Total: 300 tests passing across 46 suites (up from 258 in v0.2.9). cargo fmt --check clean, cargo clippy -D warnings clean.

Upgrade notes

  • Rust workspace, MSRV 1.78. No breaking API changes.
  • Default sensitive-path patterns expanded — if you have a custom ~/.config/llmsh/config.toml with an explicit policy.sensitive_paths.patterns array, your overrides still apply unchanged.
  • The new shell deshelling is opt-out via the existing auto_classify_run_process: false Pipeline flag (already there since v0.2.x).

🤖 Generated with Claude Code

v0.2.9 — slash autocomplete + multi-line input + compactor stage-B

09 May 20:40
v0.2.9

Highlights

REPL ergonomics

  • Slash autocomplete — Tab now opens a ColumnarMenu driven by a new SlashCompleter that suggests /commands and their subcommands (e.g. /memory list, /clear-context).
  • Multi-line input — Shift+Enter / Alt+Enter insert a newline. Shift+Enter requires a terminal that distinguishes it from Enter (kitty / iTerm with CSI-u); Alt+Enter works everywhere.

Thinking provider

  • New ThinkingProvider wraps the underlying LlmProvider to surface model reasoning in a uniform way for downstream UX.

Compactor stage-B observability

  • New StageBOutcome enum (NotAttempted / Skipped / Succeeded / Failed) is now part of CompactionReport.
  • Manual /compact prints the stage-B outcome ((summary stage: ok — facts updated) / failed — … / skipped — …).
  • Stage-B errors are no longer silently swallowed: they're logged and persisted to the audit log.

Audit schema v5

  • ContextCompacted events gain three optional fields: stage_b_outcome, stage_b_skip_reason, stage_b_error.
  • Emission moved from agent.rs / repl.rs into the compactor itself — single source of truth for the event.
  • Backwards-compatible read path: missing fields deserialize as None.

Install

cargo install --git https://github.com/swoelffel/llmshell --tag v0.2.9 llmsh-cli

Pre-built binaries land in v0.3 — see ROADMAP.md.

Verify

$ llmsh --version
llmsh-cli 0.2.9

v0.2.8 — tilde expansion + glob tool

09 May 19:16
v0.2.8

First public release since v0.2.1 — covers seven internal iterations (v0.2.2 → v0.2.8) shipped through May 2026.

Highlights since v0.2.1

v0.2.8 — Tilde expansion & glob tool (this release)

  • ~ and ~/… are now expanded by run_process, read_file, list_directory (no shell — $HOME substitution only).
  • New typed tool glob (read-only) — pattern → list of absolute paths, capped at 1000. Lets the agent compose glob + run_process for du -sh ~/Library/Caches/*-style requests.
  • Hardened run_process schema description and persona prompt to point the agent at glob first.
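
Tilde expansion without a shell reduces to a prefix substitution. A minimal sketch, assuming a hypothetical expand_tilde helper that receives the home directory as a parameter (a caller could obtain it from std::env::var_os("HOME")). Forms such as ~user are left untouched because the note only mentions ~ and ~/….

```rust
// Illustrative sketch only: `expand_tilde` is a hypothetical helper.
use std::path::{Path, PathBuf};

/// Expands a leading `~` or `~/` using `home`; everything else is returned as-is.
fn expand_tilde(arg: &str, home: &Path) -> PathBuf {
    if arg == "~" {
        home.to_path_buf()
    } else if let Some(rest) = arg.strip_prefix("~/") {
        home.join(rest)
    } else {
        // No shell is involved, so `~user`, `$HOME` and globs are not interpreted.
        PathBuf::from(arg)
    }
}
```

Passing the home directory in, rather than reading the environment inside the helper, keeps the function deterministic and easy to test.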

v0.2.7 — Policy & PWD overhaul

  • Dropped the workspace-boundary concept: the workspace is the host machine, scoped by the running user's filesystem rights.
  • Removed automatic Deny on sensitive paths and outside-workspace; everything reachable now flows through a strong confirmation prompt with N as default.
  • Unified PWD plumbing via Arc<RwLock<PathBuf>> shared across REPL, policy and tools — !cd /dir and run_process(cd, …) actually move PWD when the target exists.
  • One-shot recovery for orphan tool_calls left over from prior sessions (fixes OpenAI 400 on reload).
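
The shared-PWD plumbing can be sketched as a cloneable handle around Arc<RwLock<PathBuf>>. The SharedPwd wrapper and its method names are hypothetical. Only the Arc<RwLock<PathBuf>> type and the "move only when the target exists" behaviour come from the notes above.

```rust
// Illustrative sketch only: `SharedPwd` and its methods are hypothetical.
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};

/// Cheap-to-clone handle; every clone sees the same working directory.
#[derive(Clone)]
struct SharedPwd(Arc<RwLock<PathBuf>>);

impl SharedPwd {
    fn new(start: PathBuf) -> Self {
        SharedPwd(Arc::new(RwLock::new(start)))
    }

    /// Snapshot of the current directory.
    fn get(&self) -> PathBuf {
        self.0.read().expect("pwd lock poisoned").clone()
    }

    /// Moves PWD only if the target resolves to an existing directory.
    fn change_dir(&self, target: &Path) -> bool {
        let candidate = if target.is_absolute() {
            target.to_path_buf()
        } else {
            self.get().join(target)
        };
        if !candidate.is_dir() {
            return false;
        }
        *self.0.write().expect("pwd lock poisoned") = candidate;
        true
    }
}
```

The REPL, the policy engine, and the tools would each hold a clone of the handle, so a successful cd through any of them is visible to the others on their next read.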

v0.2.6 / v0.2.5 — Memory schema v2

  • Persisted conversation messages, curated long-term facts, soft-delete via cleared_at / cleared_source.
  • New REPL meta commands: /clear-context, /clear-memory, /clear-all, /memory list.

v0.2.4 / v0.2.3 — Context compaction

  • Deterministic truncate_tool_outputs stage + LLM summarize_prefix stage, with /compact meta command and auto-trigger from the agent loop.
  • JSON-structured compactor output feeding curated long-term facts.

v0.2.2 — Verbose mode & status line

  • -v / -vv flags, tier-1/2 stderr output, reedline status prompt.
  • Per-turn SessionStats, cached-input token accounting, model context-window & pricing tables.

Install

cargo install --git https://github.com/swoelffel/llmshell --tag v0.2.8 llmsh-cli

Pre-built binaries land in v0.3 — see ROADMAP.md.

Verify

$ llmsh --version
llmsh-cli 0.2.8

v0.2.1 — first public release

08 May 16:49
v0.2.1

LLMShell v0.2.1 — first public release

llmsh is an agentic terminal shell: a REPL takes natural-language input, an LLM agent plans and emits tool calls, a policy engine classifies risk and gates execution, tools run with timeout/cancellation, and every step is appended to a tamper-evident audit log.

Highlights since project start

  • Agent loop (llmsh-core) — bounded iterate-until-done with schema enrichment, policy classification, sensitive-path checks.
  • Policy engine (llmsh-policy) — Allow / Confirm / Deny classification with phrase + sensitive-path heuristics.
  • Tools (llmsh-tools) — read_file, list_directory, run_process with per-tool timeout and cancellation.
  • Audit (llmsh-audit) — append-only JSONL with hash-chained digest and redaction at the LLM boundary.
  • OpenAI-compatible provider (llmsh-llm-openai) — /v1/models discovery, runtime model switch via /model.
  • REPL (llmsh-core::repl) — reedline-backed input, slash commands, /init auto-bootstrap, system-prompt builder structured for OpenAI's automatic prompt cache.
  • Confirmation prompt — surfaces command details and policy flags before risky tool execution.

Build

cargo build --release
export OPENAI_API_KEY=sk-...
./target/release/llmsh

Requirements

  • Rust 1.78+ (stable toolchain pinned via rust-toolchain.toml)
  • An OpenAI-compatible API endpoint

🤖 Generated with Claude Code