Releases · swoelffel/llmshell

13 May 14:39

swoelffel

v0.3.1

6e4ac73

v0.3.1 — quoting-aware shell lexer (Latest)

Policy — quoting-aware shell lexer

New internal shell_lex module replaces shlex::split inside classify_shell_payload. Tokens now carry their Quoting context (Bare, Single, Double, Mixed), so operators and literals are separated structurally instead of being inferred after the fact.
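The defensive tok.contains('|') guard is removed. Because shlex::split discards quoting, a quoted regex alternation was indistinguishable from a pipe glued to its neighbours (ls|wc), so the guard had to reject both. Regex alternation inside quoted arguments, e.g. df -h | grep -E '^/dev|^Filesystem' or sysctl -a | grep -E 'kern.maxfiles|kern.maxfilesperproc', is now classified ReadOnly instead of Unknown, which removes spurious confirm prompts from a common diagnostic pattern.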
Fail-closed posture preserved: the upstream metacharacter pre-filter, which rejects $, backticks, \n, (, {, <(, >( and $(, is unchanged, and every lexer error (UnterminatedQuote, DanglingEscape, UnsupportedConstruct for heredocs) maps back to ClassificationReason::UnparsableShellPayload → Unknown → confirm.
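To make the Bare/Single/Double/Mixed distinction concrete, here is a minimal sketch of a quoting-aware tokenizer. It is not the real shell_lex module: the Token and LexError shapes, the lex function, and the choice to report a lone & as UnsupportedConstruct are assumptions for illustration. What it shows is that the | inside '^/dev|^Filesystem' ends up inside a Single-quoted Word, while an unquoted | becomes a separate Pipe token.

```rust
// Hypothetical sketch, not the real shell_lex module. It assumes the upstream
// metacharacter pre-filter has already rejected $, backticks, newlines, ( and {.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Quoting { Bare, Single, Double, Mixed }

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    Word { text: String, quoting: Quoting },
    Pipe,   // |
    AndAnd, // &&
    OrOr,   // ||
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LexError { UnterminatedQuote, DanglingEscape, UnsupportedConstruct }

/// A word built from pieces with different quoting becomes Mixed.
fn merge(current: Option<Quoting>, next: Quoting) -> Quoting {
    match current {
        None => next,
        Some(q) if q == next => q,
        Some(_) => Quoting::Mixed,
    }
}

/// Emits the word under construction, if there is one.
fn flush(tokens: &mut Vec<Token>, text: &mut String, quoting: &mut Option<Quoting>) {
    if let Some(q) = quoting.take() {
        tokens.push(Token::Word { text: std::mem::take(text), quoting: q });
    }
}

pub fn lex(input: &str) -> Result<Vec<Token>, LexError> {
    let mut tokens = Vec::new();
    let mut text = String::new();
    let mut quoting: Option<Quoting> = None; // None: no word in progress
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            ' ' | '\t' => flush(&mut tokens, &mut text, &mut quoting),
            '|' => {
                flush(&mut tokens, &mut text, &mut quoting);
                if chars.peek() == Some(&'|') {
                    chars.next();
                    tokens.push(Token::OrOr);
                } else {
                    tokens.push(Token::Pipe);
                }
            }
            '&' => {
                flush(&mut tokens, &mut text, &mut quoting);
                // A lone `&` (background job) is refused; only `&&` is an operator here.
                if chars.next() != Some('&') {
                    return Err(LexError::UnsupportedConstruct);
                }
                tokens.push(Token::AndAnd);
            }
            // Heredocs (`<<`) are not supported.
            '<' if chars.peek() == Some(&'<') => return Err(LexError::UnsupportedConstruct),
            '\'' => {
                // Single quotes: everything up to the closing quote is literal.
                loop {
                    match chars.next() {
                        Some('\'') => break,
                        Some(ch) => text.push(ch),
                        None => return Err(LexError::UnterminatedQuote),
                    }
                }
                quoting = Some(merge(quoting, Quoting::Single));
            }
            '"' => {
                // Double quotes, simplified: a backslash escapes any next character.
                loop {
                    match chars.next() {
                        Some('"') => break,
                        Some('\\') => text.push(chars.next().ok_or(LexError::UnterminatedQuote)?),
                        Some(ch) => text.push(ch),
                        None => return Err(LexError::UnterminatedQuote),
                    }
                }
                quoting = Some(merge(quoting, Quoting::Double));
            }
            '\\' => {
                text.push(chars.next().ok_or(LexError::DanglingEscape)?);
                quoting = Some(merge(quoting, Quoting::Bare));
            }
            other => {
                text.push(other);
                quoting = Some(merge(quoting, Quoting::Bare));
            }
        }
    }
    flush(&mut tokens, &mut text, &mut quoting);
    Ok(tokens)
}
```

Because operators are separate tokens, a classifier can split on Pipe/AndAnd/OrOr and inspect each word's text without any contains('|') guesswork. Real shells only let a backslash escape a few characters inside double quotes; the sketch does not model that.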

Commits:

6e4ac73 release: v0.3.1 — quoting-aware shell lexer
980e567 feat(policy): wire shell_lex into classify_shell_payload, remove pipe-glued heuristic
c8cedf9 feat(policy): add quoting-aware shell lexer (shell_lex)
0331ed3 test(policy): document pipe-alternation false-negative as ignored regressions

13 May 09:36

swoelffel

v0.3.0

c205f5f

v0.3.0 — Anthropic provider (Claude Haiku / Sonnet / Opus)

Anthropic provider — Claude Haiku, Sonnet, Opus

LLMShell now speaks the Anthropic Messages API natively. Switch at runtime with /provider set anthropic or pin a model with /model use claude-sonnet-4-6. Haiku 4.5 is the Anthropic provider's default model.

Added

New crate llmsh-llm-anthropic — full native implementation of the Messages API behind the existing LlmProvider trait. No change to the agent loop, the policy gate, or the audit chain.
Three Claude 4.x models out of the box: claude-haiku-4-5 (default), claude-sonnet-4-6, claude-opus-4-7. Pricing and 200k context window registered in llmsh-llm.
Tool-use round-trip: assistant tool_use blocks ↔ neutral ToolCall; consecutive Tool messages are grouped into a single user turn carrying multiple tool_result blocks, per Anthropic's recommendation.
JSON-object response_format is emulated with the assistant-prefill technique from Anthropic's cookbook: the assistant turn is prefilled with {, and the model completes the object from there. The compactor, which requires structured JSON, keeps working unmodified across providers.
HTTP error bodies are redacted: the existing llmsh-redact anthropic_key pattern catches sk-ant-… leaks in any 4xx/5xx surfaced to the user.
Capabilities advertised honestly: tool_calling=Native, supports_streaming=false, supports_json_mode=true (emulated), supports_parallel_tool_calls=true, max_context_tokens=Some(200_000).
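A rough sketch of the tool-result grouping and the prefill reconstruction described above. The Message, Block and Turn types are hypothetical stand-ins, not the real llmsh-llm or Anthropic wire structs, and assistant turns are reduced to plain text here (the real mapping emits tool_use blocks).

```rust
// Hypothetical neutral and wire-side types, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    User(String),
    Assistant(String),
    Tool { call_id: String, output: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Text(String),
    ToolResult { tool_use_id: String, content: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub role: &'static str, // "user" or "assistant"
    pub content: Vec<Block>,
}

/// True for a user turn made only of tool_result blocks.
fn is_tool_result_turn(turn: &Turn) -> bool {
    turn.role == "user" && turn.content.iter().all(|b| matches!(b, Block::ToolResult { .. }))
}

/// Maps neutral messages to Anthropic-style turns, folding consecutive Tool
/// messages into a single user turn carrying several tool_result blocks.
pub fn to_anthropic_turns(messages: &[Message]) -> Vec<Turn> {
    let mut turns: Vec<Turn> = Vec::new();
    for msg in messages {
        match msg {
            Message::User(text) => {
                turns.push(Turn { role: "user", content: vec![Block::Text(text.clone())] })
            }
            Message::Assistant(text) => {
                turns.push(Turn { role: "assistant", content: vec![Block::Text(text.clone())] })
            }
            Message::Tool { call_id, output } => {
                let block = Block::ToolResult { tool_use_id: call_id.clone(), content: output.clone() };
                if turns.last().map_or(false, is_tool_result_turn) {
                    // The check above guarantees a last turn exists.
                    turns.last_mut().unwrap().content.push(block);
                } else {
                    turns.push(Turn { role: "user", content: vec![block] });
                }
            }
        }
    }
    turns
}

/// The request prefilled the assistant turn with `{`, so the completion lacks it.
pub fn reconstruct_prefilled_json(completion: &str) -> String {
    format!("{{{}", completion)
}
```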

Local install runbook

docs/runbooks/local-install.md codifies the deploy flow we kept re-deriving by hand:

Canonical install: cargo install --path crates/llmsh-cli --force.
Manual fallback is always the triplet cp + xattr -c + codesign --force --sign -. A bare cp over an existing binary trips macOS Sequoia's provenance xattr, and the binary is then killed silently (zsh: killed).
After every release that adds a provider or a top-level config key, append the missing block to the user config.toml (load_or_create_user does not merge new defaults into an existing file).
Verification gate: which llmsh, version matches Cargo.toml, /provider lists the new entry, one smoke turn writes one audit event.
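One way to script the "version matches Cargo.toml" step of the verification gate, sketched under two assumptions: llmsh --version prints "llmsh-cli <version>" (as in the Verify sections of earlier releases), and the manifest you pass declares version = "…" inline. Both function names are hypothetical.

```rust
/// Extracts the first inline `version = "..."` value from a Cargo.toml.
/// Assumption: the manifest declares its version inline, not via `version.workspace = true`.
pub fn cargo_toml_version(manifest: &str) -> Option<&str> {
    manifest.lines().find_map(|line| {
        let rest = line.trim().strip_prefix("version")?.trim_start().strip_prefix('=')?;
        Some(rest.trim().trim_matches('"'))
    })
}

/// Compares `llmsh --version` output (e.g. "llmsh-cli 0.2.9") with the manifest version.
pub fn version_matches(cli_output: &str, manifest: &str) -> bool {
    match (cli_output.split_whitespace().nth(1), cargo_toml_version(manifest)) {
        (Some(cli), Some(declared)) => cli == declared,
        _ => false,
    }
}
```

The first matching line wins, so point it at the manifest that actually declares the package version (for a workspace-inherited version, that is the root Cargo.toml).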

CLAUDE.md now references the runbook so agents follow it systematically.

Not included (parked for a dedicated minor)

Anthropic streaming (SSE): not yet plumbed; will land in a focused minor.
Explicit prompt caching (cache_control: ephemeral): deferred. The SystemPromptBuilder already orders sections from stable to dynamic, so enabling it will be wiring only.

Tests / CI

415 tests pass on cargo test --workspace --locked (was 379 in v0.2.15). New coverage: 10 mapping unit tests, 5 wire serde tests, 5 provider tests, 4 wiremock end-to-end tests (tool_use round-trip, body shape with tool_choice=any, JSON-prefill reconstruction, error redaction).
cargo fmt --check + cargo clippy --workspace --all-targets -- -D warnings clean.

Backwards compatibility

The global default_model is unchanged (openai:gpt-4.1-mini). The Anthropic provider only activates when the user selects it with /provider set anthropic or pins default_model = "anthropic:claude-haiku-4-5".
No breaking changes to the neutral LlmProvider / LlmRequest / LlmResponse types.
Audit JSONL schema unchanged.

13 May 07:46

swoelffel

v0.2.15

491f3ff

v0.2.15 — pipeline-aware classifier

Pipeline-aware classifier + explainable Unknown + light confirm

Before this release, every bash -c "find . | wc -l" forced an interactive confirmation because the classifier rejected any payload containing a shell metacharacter. v0.2.15 teaches the deterministic classifier to recognise read-only pipelines and surfaces why a call was left unclassified.

Policy — pipeline parser

Pipelines are classified. bash -c "A | B | …", A && B, A || B are downgraded to ReadOnly when every segment is independently read-only. The user's original case find . -maxdepth 1 -type f | wc -l now passes without a prompt.
Safe output redirections accepted. >/dev/null, 2>/dev/null, 2>&1, and >/tmp/<simple-name> are recognised. Any other target keeps the call at Unknown.
Security preserved. ; sequences, & backgrounding, $VAR, $(…), backticks, globs, and redirections to other paths still block classification.
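The segment rule (every segment independently read-only, only a handful of redirection targets allowed) can be sketched as follows. The allowlist, the Verdict type, the find guard, and the whitespace-only tokenisation are simplifications invented for this example; the real classifier works on lexed tokens and per-program predicates.

```rust
// Hypothetical sketch of the v0.2.15 rules, not the real llmsh-policy code.
// Redirections must be glued to their target (">/dev/null", not "> /dev/null").

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict { ReadOnly, Unknown }

const READ_ONLY: &[&str] = &["find", "wc", "ls", "grep", "head", "df"];

fn is_simple_name(s: &str) -> bool {
    !s.is_empty()
        && s != "."
        && s != ".."
        && s.chars().all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-'))
}

fn is_safe_redirection(tok: &str) -> bool {
    match tok {
        ">/dev/null" | "2>/dev/null" | "2>&1" => true,
        _ => tok.strip_prefix(">/tmp/").map_or(false, is_simple_name),
    }
}

fn segment_is_read_only(segment: &[&str]) -> bool {
    let Some((program, args)) = segment.split_first() else { return false; };
    if !READ_ONLY.contains(program) {
        return false;
    }
    // find has mutating actions; real code needs per-program predicates.
    if *program == "find" && args.iter().any(|a| matches!(*a, "-delete" | "-exec" | "-execdir" | "-ok")) {
        return false;
    }
    // Any argument that looks like a redirection or a stray `&` must be a known-safe form.
    args.iter().all(|arg| {
        let redirect_like = arg.contains('>') || arg.contains('<') || arg.contains('&');
        !redirect_like || is_safe_redirection(arg)
    })
}

pub fn classify_pipeline(payload: &str) -> Verdict {
    // Sequences, expansions, substitutions and globs keep the call at Unknown.
    if payload.chars().any(|c| matches!(c, ';' | '$' | '`' | '*' | '?')) {
        return Verdict::Unknown;
    }
    let tokens: Vec<&str> = payload.split_whitespace().collect();
    let all_read_only = tokens
        .split(|t| matches!(*t, "|" | "&&" | "||"))
        .all(segment_is_read_only);
    if all_read_only { Verdict::ReadOnly } else { Verdict::Unknown }
}
```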

UX — explainable Unknown

New ClassificationReason (snake_case JSON enum) is propagated from the deterministic classifier into PolicyDecision.classification_reason. The confirmation prompt now reads, e.g., risk=Unknown — segment de pipeline non read-only ("non-read-only pipeline segment") or risk=Unknown — substitution de commande ("command substitution") instead of an opaque Unknown.
Reason variants: unsafe_pipeline_segment, command_substitution, variable_expansion, glob_not_resolved, sequence_or_background, unsafe_redirection_target, nested_shell_wrapping, program_not_allowlisted, unsafe_argument, … (full list in crates/llmsh-policy/src/types.rs).

UX — light confirmation

PolicyAction::RequireConfirmation gains light: bool. When the classifier returns Unknown but the LLM declared claimed_risk = read_only or low, the prompt is downgraded to a default-yes single-keystroke [Y/n] (vs. the standard [y/N]). ConfirmStrong (phrase verbatim) is unchanged.
The model never gains authority over execution — a confirmation is still required, only the prompt is lighter. The audit log keeps the model_disagrees_on_risk flag for offline review.
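The light-confirmation rule reduces to a small decision function. This sketch uses hypothetical Classification and ClaimedRisk enums and a simplified PolicyAction; the phrase-based ConfirmStrong path is omitted. The model's claim only changes the default answer, never whether a prompt is shown.

```rust
// Hypothetical types for illustration; the real PolicyAction has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Classification { ReadOnly, Unknown }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClaimedRisk { ReadOnly, Low, Medium, High }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PolicyAction { Allow, RequireConfirmation { light: bool } }

pub fn decide(classification: Classification, claimed: ClaimedRisk) -> PolicyAction {
    match classification {
        Classification::ReadOnly => PolicyAction::Allow,
        // A low claimed risk softens the prompt; it never skips it.
        Classification::Unknown => PolicyAction::RequireConfirmation {
            light: matches!(claimed, ClaimedRisk::ReadOnly | ClaimedRisk::Low),
        },
    }
}

pub fn prompt_suffix(action: PolicyAction) -> Option<&'static str> {
    match action {
        PolicyAction::Allow => None,
        PolicyAction::RequireConfirmation { light: true } => Some("[Y/n]"),
        PolicyAction::RequireConfirmation { light: false } => Some("[y/N]"),
    }
}

/// Interprets the user's answer; an empty line takes the prompt's default.
pub fn accepted(action: PolicyAction, answer: &str) -> bool {
    let answer = answer.trim().to_ascii_lowercase();
    match action {
        PolicyAction::Allow => true,
        PolicyAction::RequireConfirmation { light } if answer.is_empty() => light,
        PolicyAction::RequireConfirmation { .. } => answer == "y" || answer == "yes",
    }
}
```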

Agent prompt updated

The run_process tool description and the system persona now tell the agent that simple pipes and safe redirections in bash -c "…" are recognised by the classifier, so it can keep using legitimate shell forms without forcing the user through a confirmation.

Tests / CI

379 → 382 tests (+3 in llmsh-core::pipeline covering the pipeline downgrade, the destructive-segment rejection, and the light-confirm path), plus 9 new unit tests in llmsh-policy::safe_commands covering A+B positive and negative cases.
cargo fmt --check, cargo clippy -D warnings, full cargo test --workspace --locked all green.
Fuzz target (crates/llmsh-policy/fuzz/fuzz_targets/deshell.rs) still asserts determinism on the broadened classifier.

Backwards compatibility

PolicyAction::RequireConfirmation gains light: bool with #[serde(default)] — old audit JSONL lines deserialise unchanged.
PolicyDecision.classification_reason is Option<…> with #[serde(default)] — old lines deserialise as None.
is_read_only_invocation(program, args) -> Option<RiskLevel> is preserved as a thin wrapper over the new classify_invocation -> Result<RiskLevel, ClassificationReason>.
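A sketch of the compatibility shim: the new entry point reports why a call was not classified, and the old one simply discards that reason with Result::ok. The enum variants, the argument types, and the tiny rule set are placeholders, not the real policy rules.

```rust
// Hypothetical, trimmed-down types mirroring the names above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskLevel { ReadOnly }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClassificationReason { ProgramNotAllowlisted, UnsafeArgument }

/// New API: either a risk level or the reason classification failed.
pub fn classify_invocation(program: &str, args: &[&str]) -> Result<RiskLevel, ClassificationReason> {
    match program {
        "ls" | "pwd" | "whoami" => Ok(RiskLevel::ReadOnly),
        "find" if args.iter().any(|a| *a == "-delete") => Err(ClassificationReason::UnsafeArgument),
        "find" => Ok(RiskLevel::ReadOnly),
        _ => Err(ClassificationReason::ProgramNotAllowlisted),
    }
}

/// Old API, kept as a thin wrapper: the reason is dropped.
pub fn is_read_only_invocation(program: &str, args: &[&str]) -> Option<RiskLevel> {
    classify_invocation(program, args).ok()
}
```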

11 May 13:44

swoelffel

v0.2.13

438bfae

v0.2.13 — security hardening

Security hardening pass

Unified redaction. New llmsh-redact crate is now the single source of truth for secret patterns. llmsh-audit::redact and llmsh-core::llm_redact are thin façades over it — no more three parallel pattern lists drifting apart.
Extended secret catalogue. OpenAI, Anthropic, GCP (API key + service-account JSON marker), AWS access/secret, GitHub (modern + classic), Databricks, HuggingFace, Replicate, Slack, JWT, Bearer, PEM private keys, and .env-style *_KEY=… / *_PASSWORD=… lines.
API key zeroized. OpenAIProvider stores the key inside secrecy::SecretString: it no longer leaks via Debug output and the underlying memory is zeroed on drop.
Error bodies redacted. OpenAI HTTP error responses pass through the redactor before they reach the logs, because some error payloads echo request fragments containing the offending token.
Memory persistence redacted. Conversation messages are redacted before insertion into the SQLite memory DB. Previously, .env reads or token-bearing tool outputs were stored verbatim.
Deshelling gap closed. extract_shell_payload now accepts bash -c PAYLOAD pos1 pos2… (extra positional args become $0, $1, … inside the body). Previously the parser only matched argv.len() == 2, so appending a trailing arg let invocations silently skip read-only classification.
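A minimal sketch of the relaxed matching, assuming argv is already split into a program name and an argument slice (a hypothetical signature; the real extract_shell_payload may differ, and accepting sh alongside bash is an assumption). The slice pattern accepts -c followed by the payload and any number of positionals, whereas a length-2 check misses bash -c 'ls' x.

```rust
/// Returns the `-c` payload of a `bash`/`sh` invocation, ignoring trailing
/// positional arguments (inside the body they become $0, $1, ...).
pub fn extract_shell_payload<'a>(program: &str, args: &[&'a str]) -> Option<&'a str> {
    // Accept both "bash" and "/bin/bash".
    let name = program.rsplit('/').next().unwrap_or(program);
    if !matches!(name, "bash" | "sh") {
        return None;
    }
    match args {
        ["-c", payload, ..] => Some(*payload),
        _ => None,
    }
}
```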

Audit chain — known limitation (documented, not fixed)

Investigation of the audit log surfaced that the crate-level README's "SHA-256 chained" claim is currently aspirational. Per-event digest fields (messages_digest, tool_calls_digest, args_digest, …) are content-addressable hashes of subfields and do not reference the prior event; the writer does a plain writeln!. Implementing a real inter-event chain (new field on every variant, writer state, verifier, format-version bump) is deferred to a dedicated plan.
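For contrast with the per-event digests described above, this is roughly what an inter-event chain involves: writer state carrying the previous digest, a new field on every event, and a verifier. It is a sketch only. DefaultHasher stands in for SHA-256 purely to keep the example dependency-free and must not be used for tamper evidence; ChainWriter, ChainedEvent, and verify are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Placeholder digest over (previous digest, payload).
/// DefaultHasher is NOT cryptographic; a real chain would use SHA-256 here.
fn link_digest(prev: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChainedEvent {
    pub payload: String,  // serialised event body
    pub prev_digest: u64, // digest of the previous event (0 for the first one)
    pub digest: u64,      // link_digest(prev_digest, payload)
}

/// Writer state: the digest of the last event written.
#[derive(Default)]
pub struct ChainWriter {
    last: u64,
    pub events: Vec<ChainedEvent>,
}

impl ChainWriter {
    pub fn append(&mut self, payload: &str) {
        let d = link_digest(self.last, payload);
        self.events.push(ChainedEvent { payload: payload.to_string(), prev_digest: self.last, digest: d });
        self.last = d;
    }
}

/// Ok(()) if every link verifies, otherwise the index of the first broken event.
pub fn verify(events: &[ChainedEvent]) -> Result<(), usize> {
    let mut prev = 0u64;
    for (i, e) in events.iter().enumerate() {
        if e.prev_digest != prev || e.digest != link_digest(prev, &e.payload) {
            return Err(i);
        }
        prev = e.digest;
    }
    Ok(())
}
```

Editing or dropping any event breaks verification at that index, which per-field digests alone cannot detect.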

Tests

322 → 338 (+16 new tests across llmsh-redact, llmsh-llm-openai, llmsh-core::memory, llmsh-policy).

11 May 11:19

swoelffel

v0.2.12

b9d13ce

v0.2.12 — softer tool-call cap + Linux read-only classifiers

Two interactive-friction fixes informed by a Debian / ARM64 deployment test on VM01.

Highlights

Tool-call overflow becomes recoverable

Before: when the model emitted more than max_tool_calls_per_iteration tool calls in a single turn, the agent aborted silently: the REPL just redrew an empty prompt with no assistant text. The default cap was 5, which gpt-4.1-mini exceeded on the very first "audit complet" ("full audit") request.

After:

Default cap raised from 5 → 32 so legitimate batches (8–10 read-only system audits) pass without any prompt.
When the cap is exceeded, the user gets [Y/n] (empty = Y) instead of a blank line. Approving runs the plan as-is.
Audit log retains the too_many_tool_calls error event with the user verdict embedded (approved/denied).
New trait method ConfirmationGate::ask_overflow(requested, limit) -> bool with a default that refuses; StdinConfirmationGate prompts, AlwaysYesGate overrides to true for tests.
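The fail-closed default and the two overrides might look like this. The trait signature (including &mut self), PromptGate, DefaultGate, and admit_tool_calls are assumptions. PromptGate reads from any BufRead so the prompt can be exercised without a terminal, and it treats end of input as a refusal.

```rust
use std::io::BufRead;

/// Hypothetical shape of the gate; the real trait has other methods too.
pub trait ConfirmationGate {
    /// Fail-closed default: refuse to run more tool calls than the cap allows.
    fn ask_overflow(&mut self, requested: usize, limit: usize) -> bool {
        let _ = (requested, limit);
        false
    }
}

/// Relies on the refusing default.
pub struct DefaultGate;
impl ConfirmationGate for DefaultGate {}

/// Test double that approves everything.
pub struct AlwaysYesGate;
impl ConfirmationGate for AlwaysYesGate {
    fn ask_overflow(&mut self, _requested: usize, _limit: usize) -> bool {
        true
    }
}

/// Interactive gate: [Y/n], where an empty line means yes and EOF or a read error means no.
pub struct PromptGate<R: BufRead> {
    pub input: R,
}

impl<R: BufRead> ConfirmationGate for PromptGate<R> {
    fn ask_overflow(&mut self, requested: usize, limit: usize) -> bool {
        eprint!("The model requested {requested} tool calls (limit {limit}). Run them? [Y/n] ");
        let mut line = String::new();
        match self.input.read_line(&mut line) {
            Ok(0) | Err(_) => false,
            Ok(_) => matches!(line.trim().to_ascii_lowercase().as_str(), "" | "y" | "yes"),
        }
    }
}

/// How many of the requested calls may run: all of them, or none if refused.
pub fn admit_tool_calls(gate: &mut dyn ConfirmationGate, requested: usize, limit: usize) -> usize {
    if requested <= limit || gate.ask_overflow(requested, limit) { requested } else { 0 }
}
```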

Linux read-only classifier (less friction, same safety)

The deterministic safe_commands classifier was macOS-leaning. On Linux, common inspection tools returned RiskLevel::Unknown → ModelDisagreesOnRisk → Confirm, producing 5+ [y/N] prompts per audit turn.

ALWAYS_SAFE additions (no mutating mode): lscpu, lsmem, lsblk, lspci, lsusb, lshw, lsmod, lsipc, lsns, free, dpkg-query, apparmor_status, aa-status, chkrootkit, rkhunter, findmnt, mountpoint, nproc, arch, getent.
Subcommand-aware classifiers (read-only on specific verbs/flags, mutating otherwise):
- ip <object> [show|list|get] — rejects add/del/set/change/replace/flush.
- ss — read-only unless -K/--kill.
- ufw — read-only on status/show; rejects enable/disable/allow/…
- systemctl — read-only on status/show/cat/list-/is-/get-default/show-environment.
- journalctl — read-only unless --vacuum-*/--rotate/--flush/--sync.
- iptables/ip6tables — read-only when any listing flag (-L/-S/--list-rules/--check) is present and no mutating flag (-A/-D/-I/-R/-F/-X/-Z/-N/-P/-E) is.
- nft — read-only on list.
- firewall-cmd — read-only on --state/--list-*/--get-*/--query-*/--info-*; conservatively rejects --permanent.
- mount — read-only on bare invocation or with -l/-v/--show-labels.
- dmesg — read-only unless -c/-C/--clear/--read-clear.
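As an illustration of the flag-based predicates listed above, here is a sketch of the iptables, dmesg and journalctl rules, with the flag lists copied from these notes. Flags are compared exactly, so combined short options such as -nvL are not recognised as listing and stay unclassified; whether the real predicates handle that case is not stated here. The function names are hypothetical.

```rust
// Flag lists taken from the release notes; matching is exact (no -nvL splitting).
const IPTABLES_LISTING: &[&str] = &["-L", "-S", "--list-rules", "--check"];
const IPTABLES_MUTATING: &[&str] = &["-A", "-D", "-I", "-R", "-F", "-X", "-Z", "-N", "-P", "-E"];

/// Read-only when a listing flag is present and no mutating flag is.
pub fn iptables_is_read_only(args: &[&str]) -> bool {
    let lists = args.iter().any(|a| IPTABLES_LISTING.contains(a));
    let mutates = args.iter().any(|a| IPTABLES_MUTATING.contains(a));
    lists && !mutates
}

/// Read-only unless asked to clear the ring buffer.
pub fn dmesg_is_read_only(args: &[&str]) -> bool {
    !args.iter().any(|a| matches!(*a, "-c" | "-C" | "--clear" | "--read-clear"))
}

/// Read-only unless vacuuming, rotating, flushing or syncing the journal.
pub fn journalctl_is_read_only(args: &[&str]) -> bool {
    !args
        .iter()
        .any(|a| a.starts_with("--vacuum-") || matches!(*a, "--rotate" | "--flush" | "--sync"))
}
```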

A typical audit complet REPL session on Debian goes from 9 [y/N] confirms to 0 for hardware/inspection commands. Sensitive-path strong confirmations on /etc/passwd, /etc/sudoers, etc. remain by design.

Misc

model_cmd error path now surfaces the configured allowlist when no installed model matches (better operator hint when /model set <name> fails).

CI

cargo fmt --all -- --check — pass
cargo clippy --workspace --all-targets -- -D warnings — pass
cargo test --workspace --locked — 322 passing (up from 311 at v0.2.11; 11 new policy/agent tests).

Commits

807c881 feat(policy): subcommand-aware read-only classifiers for ip/ss/ufw/systemctl/…
20966ee feat(policy): add Linux hardware/inspection commands to ALWAYS_SAFE
e455c1d feat(agent): prompt on tool-call overflow + raise default cap to 32
852f33f fix(model_cmd): surface allowlist when no installed model matches

10 May 11:41

swoelffel

v0.2.10

6e7a36f

v0.2.10 — classifier deshelling + macOS allowlist + LLM briefing

Highlights

Less friction on read-only shell wrappers. The deterministic policy classifier now deshells bash -c "<single-cmd>" one level (with a conservative metachar + literal-glob filter via shlex) and re-classifies the inner command. Wrapping ls, dscl . list /Users, grep TODO src/main.rs etc. in bash -c no longer triggers spurious confirmation prompts.

macOS read-only system tools recognised. Universal allowlist additions (system_profiler, sw_vers, ioreg, nettop, scutil, csrutil, vm_stat, iostat, last, w, who, users) plus per-program predicates with cross-platform discipline for binaries that have a write mode somewhere (dscl, pfctl, defaults, launchctl, networksetup, sysctl).

Privilege escalation always reaches ConfirmStrong. The UsesPrivilegeEscalation flag (sudo / doas / su) is now set in the pipeline post-deshell rather than at the enrich layer, so bash -c "sudo …" correctly reaches ConfirmStrong with phrase — the previous detection only saw the outer bash program.
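Why the flag had to move after deshelling is easiest to see in code: a check on the outer program sees bash, not sudo. This sketch peels one bash -c layer and inspects the first word of the payload. The helper names are hypothetical, and a real implementation would have to inspect every pipeline segment, not just the first word.

```rust
/// Programs treated as privilege escalation, per the notes above.
const PRIVESC: &[&str] = &["sudo", "doas", "su"];

/// Returns the program that will actually run, peeling one `bash -c` / `sh -c` layer.
/// Only the first word of the payload is inspected.
fn effective_program<'a>(program: &'a str, args: &[&'a str]) -> &'a str {
    if matches!(program, "bash" | "sh") {
        if let ["-c", payload, ..] = args {
            let payload: &'a str = *payload;
            return payload.split_whitespace().next().unwrap_or(program);
        }
    }
    program
}

pub fn uses_privilege_escalation(program: &str, args: &[&str]) -> bool {
    PRIVESC.contains(&effective_program(program, args))
}
```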

Audited default sensitive_path_patterns. SSH keys (~/.ssh/**, **/id_rsa, **/id_ed25519, **/id_ecdsa), cloud credentials (~/.aws/**, ~/.config/gcloud/**, ~/.config/gh/**, ~/.docker/config.json, ~/.kube/**), generic dotfiles (~/.netrc, ~/.pgpass), project secrets (.env, .env.*, **/.env, **/.env.*, **/credentials*, **/secrets.*, **/*.pem, **/*.key), and system-sensitive paths (/etc/sudoers, /etc/sudoers.d/**, /etc/shadow, /etc/passwd).
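To show how patterns such as ~/.ssh/** and **/*.pem can be evaluated against a path, here is a small segment-based matcher: * matches within one path component and ** matches zero or more whole components. It is a sketch, not llmsh's implementation; in particular, treating bare relative patterns like .env as matching at any depth is an assumption.

```rust
/// Expands a leading `~`; bare relative patterns (assumption) match at any depth.
fn normalise_pattern(pattern: &str, home: &str) -> String {
    if pattern == "~" {
        home.to_string()
    } else if let Some(rest) = pattern.strip_prefix("~/") {
        format!("{}/{}", home.trim_end_matches('/'), rest)
    } else if pattern.starts_with('/') || pattern.starts_with("**") {
        pattern.to_string()
    } else {
        format!("**/{pattern}")
    }
}

/// `*` wildcard matching inside a single path component.
fn match_component(pattern: &str, text: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == text;
    }
    let (first, last) = (parts[0], parts[parts.len() - 1]);
    if text.len() < first.len() + last.len() || !text.starts_with(first) || !text.ends_with(last) {
        return false;
    }
    // Middle literals must appear in order between the fixed prefix and suffix.
    let mut rest = &text[first.len()..text.len() - last.len()];
    for mid in &parts[1..parts.len() - 1] {
        match rest.find(*mid) {
            Some(i) => rest = &rest[i + mid.len()..],
            None => return false,
        }
    }
    true
}

/// `**` consumes zero or more whole components; anything else matches one.
fn match_components(pat: &[&str], segs: &[&str]) -> bool {
    match pat.split_first() {
        None => segs.is_empty(),
        Some((&"**", rest)) => (0..=segs.len()).any(|i| match_components(rest, &segs[i..])),
        Some((p, rest)) => {
            !segs.is_empty() && match_component(p, segs[0]) && match_components(rest, &segs[1..])
        }
    }
}

pub fn matches_sensitive(pattern: &str, path: &str, home: &str) -> bool {
    let pattern = normalise_pattern(pattern, home);
    let pat: Vec<&str> = pattern.split('/').filter(|s| !s.is_empty()).collect();
    let segs: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    match_components(&pat, &segs)
}
```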

Better LLM briefing. The run_process tool description now teaches the model to prefer argv-direct (program=ls, args=["-la"]) over bash -c "ls -la" with ❌/✅ examples and a claimed_risk taxonomy. The persona block mirrors the nudge.

Tests

6 unit tests for extract_shell_payload (positive deshell, metachar refusal, recursion bound, sudo wrapper non-deshell).
12 entries verified in the universal allowlist.
6 per-program predicate tests (dscl, pfctl, defaults, launchctl, networksetup, sysctl).
3 pipeline tests for post-deshell privesc detection.
4 e2e tests: e2e_classifier_bash_deshell, e2e_privilege_escalation, e2e_privesc_through_bash, e2e_persona_avoids_bash_wrapping.

Total: 300 tests passing across 46 suites (up from 258 in v0.2.9). cargo fmt --check clean, cargo clippy -D warnings clean.

Upgrade notes

Rust workspace, MSRV 1.78. No breaking API changes.
Default sensitive-path patterns expanded — if you have a custom ~/.config/llmsh/config.toml with an explicit policy.sensitive_paths.patterns array, your overrides still apply unchanged.
The new shell deshelling is opt-out via the existing auto_classify_run_process: false Pipeline flag (already there since v0.2.x).

🤖 Generated with Claude Code

09 May 20:40

swoelffel

v0.2.9

7356749

v0.2.9 — slash autocomplete + multi-line input + compactor stage-B

Highlights

REPL ergonomics

Slash autocomplete — Tab now opens a ColumnarMenu driven by a new SlashCompleter that suggests /commands and their subcommands (e.g. /memory list, /clear-context).
Multi-line input — Shift+Enter / Alt+Enter insert a newline. Shift+Enter requires a terminal that distinguishes it from Enter (kitty / iTerm with CSI-u); Alt+Enter works everywhere.

Thinking provider

New ThinkingProvider wraps the underlying LlmProvider to surface model reasoning in a uniform way for downstream UX.

Compactor stage-B observability

New StageBOutcome enum (NotAttempted / Skipped / Succeeded / Failed) is now part of CompactionReport.
Manual /compact prints the stage-B outcome ((summary stage: ok — facts updated) / failed — … / skipped — …).
Stage-B errors are no longer silently swallowed: they're logged and persisted to the audit log.

Audit schema v5

ContextCompacted events gain three optional fields: stage_b_outcome, stage_b_skip_reason, stage_b_error.
Emission moved from agent.rs / repl.rs into the compactor itself — single source of truth for the event.
Backwards-compatible read path: missing fields deserialize as None.
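A possible shape for StageBOutcome and its projection onto the three optional audit fields. The variant payloads, the snake_case outcome strings, and the summary wording are assumptions; only the four variant names and the three field names come from these notes.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StageBOutcome {
    NotAttempted,
    Skipped { reason: String },
    Succeeded { facts_updated: bool },
    Failed { error: String },
}

impl StageBOutcome {
    /// Line a manual /compact might print (wording is illustrative).
    pub fn summary_line(&self) -> String {
        match self {
            Self::NotAttempted => "summary stage: not attempted".to_string(),
            Self::Skipped { reason } => format!("summary stage: skipped ({reason})"),
            Self::Succeeded { facts_updated: true } => "summary stage: ok (facts updated)".to_string(),
            Self::Succeeded { facts_updated: false } => "summary stage: ok".to_string(),
            Self::Failed { error } => format!("summary stage: failed ({error})"),
        }
    }

    /// Projects onto (stage_b_outcome, stage_b_skip_reason, stage_b_error).
    /// Pre-v5 lines simply lack these fields and read back as None.
    pub fn audit_fields(&self) -> (Option<&'static str>, Option<&str>, Option<&str>) {
        match self {
            Self::NotAttempted => (Some("not_attempted"), None, None),
            Self::Skipped { reason } => (Some("skipped"), Some(reason.as_str()), None),
            Self::Succeeded { .. } => (Some("succeeded"), None, None),
            Self::Failed { error } => (Some("failed"), None, Some(error.as_str())),
        }
    }
}
```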

Install

cargo install --git https://github.com/swoelffel/llmshell --tag v0.2.9 llmsh-cli

Pre-built binaries land in v0.3 — see ROADMAP.md.

Verify

$ llmsh --version
llmsh-cli 0.2.9

09 May 19:16

swoelffel

v0.2.8

5bf0ba0

v0.2.8 — tilde expansion + glob tool

First public release since v0.2.1 — covers seven internal iterations (v0.2.2 → v0.2.8) shipped through May 2026.

Highlights since v0.2.1

v0.2.8 — Tilde expansion & glob tool (this release)

~ and ~/… are now expanded by run_process, read_file, list_directory (no shell — $HOME substitution only).
New typed tool glob (read-only) — pattern → list of absolute paths, capped at 1000. Lets the agent compose glob + run_process for du -sh ~/Library/Caches/*-style requests.
Hardened run_process schema description and persona prompt to point the agent at glob first.
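The expansion rule (only ~ and ~/… are touched, by substituting $HOME, with no shell involved) fits in a few lines. expand_tilde is a hypothetical name, and the caller is assumed to pass the value of $HOME, e.g. from std::env::var("HOME").

```rust
/// Expands `~` and `~/...` using `home`. Other forms such as `~user`
/// or a `~` in the middle of a path are left untouched.
pub fn expand_tilde(arg: &str, home: &str) -> String {
    if arg == "~" {
        home.to_string()
    } else if let Some(rest) = arg.strip_prefix("~/") {
        format!("{}/{}", home.trim_end_matches('/'), rest)
    } else {
        arg.to_string()
    }
}
```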

v0.2.7 — Policy & PWD overhaul

Dropped the workspace-boundary concept: the workspace is the host machine, scoped by the running user's filesystem rights.
Removed automatic Deny on sensitive paths and outside-workspace; everything reachable now flows through a strong confirmation prompt with N as default.
Unified PWD plumbing via Arc<RwLock<PathBuf>> shared across REPL, policy and tools — !cd /dir and run_process(cd, …) actually move PWD when the target exists.
One-shot recovery for orphan tool_calls left over from prior sessions (fixes OpenAI 400 on reload).
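A sketch of the shared-PWD idea with a std-only Arc<RwLock<PathBuf>>: cd resolves the target against the current directory and only updates the shared value when the result is an existing directory. SharedPwd and change_dir are illustrative names, not the crate's API.

```rust
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};

/// One handle cloned into the REPL, the policy engine and the tools.
pub type SharedPwd = Arc<RwLock<PathBuf>>;

/// Moves PWD only when the target resolves to an existing directory.
pub fn change_dir(pwd: &SharedPwd, target: &str) -> Result<PathBuf, String> {
    let base = pwd.read().map_err(|_| "pwd lock poisoned".to_string())?.clone();
    let candidate = if Path::new(target).is_absolute() {
        PathBuf::from(target)
    } else {
        base.join(target)
    };
    // canonicalize fails for missing paths, which leaves PWD unchanged.
    let resolved = candidate.canonicalize().map_err(|e| format!("cd: {target}: {e}"))?;
    if !resolved.is_dir() {
        return Err(format!("cd: {target}: not a directory"));
    }
    *pwd.write().map_err(|_| "pwd lock poisoned".to_string())? = resolved.clone();
    Ok(resolved)
}
```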

v0.2.6 / v0.2.5 — Memory schema v2

Persisted conversation messages, curated long-term facts, soft-delete via cleared_at / cleared_source.
New REPL meta commands: /clear-context, /clear-memory, /clear-all, /memory list.

v0.2.4 / v0.2.3 — Context compaction

Deterministic truncate_tool_outputs stage + LLM summarize_prefix stage, with /compact meta command and auto-trigger from the agent loop.
JSON-structured compactor output feeding curated long-term facts.
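The deterministic stage can be as simple as keeping the head and tail of every oversized tool output. This is an assumption about the approach, not the real truncate_tool_outputs: the character budget, the marker text, and the head/tail split are all invented here.

```rust
/// Shortens an output longer than `max_chars` to its first and last
/// `max_chars / 2` characters, with a marker in between.
pub fn truncate_tool_output(output: &str, max_chars: usize) -> String {
    let total = output.chars().count();
    if total <= max_chars {
        return output.to_string();
    }
    let keep = max_chars / 2;
    let head: String = output.chars().take(keep).collect();
    let tail: String = output.chars().skip(total - keep).collect();
    let dropped = total - 2 * keep;
    format!("{head}\n[... {dropped} chars truncated ...]\n{tail}")
}

/// Applies the rule to every tool output; returns how many were shortened.
pub fn truncate_tool_outputs(outputs: &mut [String], max_chars: usize) -> usize {
    let mut changed = 0;
    for out in outputs.iter_mut() {
        let shortened = truncate_tool_output(out, max_chars);
        if shortened != *out {
            *out = shortened;
            changed += 1;
        }
    }
    changed
}
```

Because the result depends only on its inputs, the stage is reproducible. The sketch does not guard against re-truncating an output that was already shortened, which a real compactor would likely want to avoid.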

v0.2.2 — Verbose mode & status line

-v / -vv flags, tier-1/2 stderr output, reedline status prompt.
Per-turn SessionStats, cached-input token accounting, model context-window & pricing tables.

Install

cargo install --git https://github.com/swoelffel/llmshell --tag v0.2.8 llmsh-cli

Pre-built binaries land in v0.3 — see ROADMAP.md.

Verify

$ llmsh --version
llmsh-cli 0.2.8

08 May 16:49

swoelffel

v0.2.1

3e80ad0

v0.2.1 — first public release

LLMShell v0.2.1 — first public release

llmsh is an agentic terminal shell: a REPL takes natural-language input, an LLM agent plans and emits tool calls, a policy engine classifies risk and gates execution, tools run with timeout/cancellation, and every step is appended to a tamper-evident audit log.

Highlights since project start

Agent loop (llmsh-core) — bounded iterate-until-done with schema enrichment, policy classification, sensitive-path checks.
Policy engine (llmsh-policy) — Allow / Confirm / Deny classification with phrase + sensitive-path heuristics.
Tools (llmsh-tools) — read_file, list_directory, run_process with per-tool timeout and cancellation.
Audit (llmsh-audit) — append-only JSONL with hash-chained digest and redaction at the LLM boundary.
OpenAI-compatible provider (llmsh-llm-openai) — /v1/models discovery, runtime model switch via /model.
REPL (llmsh-core::repl) — reedline-backed input, slash commands, /init auto-bootstrap, system-prompt builder structured for OpenAI's automatic prompt cache.
Confirmation prompt — surfaces command details and policy flags before risky tool execution.

Build

cargo build --release
export OPENAI_API_KEY=sk-...
./target/release/llmsh

Requirements

Rust 1.78+ (stable toolchain pinned via rust-toolchain.toml)
An OpenAI-compatible API endpoint

🤖 Generated with Claude Code
