Releases: swoelffel/llmshell
v0.3.1 — quoting-aware shell lexer
Policy — quoting-aware shell lexer
- New internal `shell_lex` module replaces `shlex::split` inside `classify_shell_payload`. Tokens now carry their `Quoting` context (`Bare`, `Single`, `Double`, `Mixed`), so operators and literals are separated structurally instead of being inferred after the fact.
- The defensive `tok.contains('|')` guard is removed. Regex alternation inside quoted arguments, e.g. `df -h | grep -E '^/dev|^Filesystem'` or `sysctl -a | grep -E 'kern.maxfiles|kern.maxfilesperproc'`, is now classified `ReadOnly` instead of `Unknown`, dropping spurious confirm prompts on a common diagnostic pattern.
- Fail-closed posture preserved: the upstream metacharacter pre-filter (`$`, backtick, `\n`, `(`, `{`, `<(`, `>(`, `$(`) is unchanged, and every lexer error (`UnterminatedQuote`, `DanglingEscape`, `UnsupportedConstruct` for heredocs) maps back to `ClassificationReason::UnparsableShellPayload` → `Unknown` → confirm.
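To illustrate why carrying quoting context removes the need for a `contains('|')` guard, here is a minimal sketch of a quoting-aware lexer. It is not the `shell_lex` module itself: the `Token` type, the `lex` function and the simplified escape rules (a backslash escapes any character, only `|` is treated as an operator, no heredoc handling) are assumptions made for illustration; only the `Quoting` variants and the two error names come from the notes above.

```rust
/// How the characters of a word were quoted in the source string.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Quoting {
    Bare,
    Single,
    Double,
    Mixed,
}

/// Hypothetical token type: a word with its quoting, or a pipe operator.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    Word { text: String, quoting: Quoting },
    Pipe,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LexError {
    UnterminatedQuote,
    DanglingEscape,
}

/// Combine the quoting seen so far in a word with the next fragment's quoting.
fn merge(prev: Option<Quoting>, next: Quoting) -> Quoting {
    match prev {
        None => next,
        Some(p) if p == next => p,
        Some(_) => Quoting::Mixed,
    }
}

/// Emit the word in progress, if any.
fn flush(tokens: &mut Vec<Token>, text: &mut String, quoting: &mut Option<Quoting>) {
    if let Some(q) = quoting.take() {
        tokens.push(Token::Word { text: std::mem::take(text), quoting: q });
    }
}

pub fn lex(input: &str) -> Result<Vec<Token>, LexError> {
    let mut tokens = Vec::new();
    let mut text = String::new();
    let mut quoting: Option<Quoting> = None; // None means no word in progress
    let mut chars = input.chars();

    while let Some(c) = chars.next() {
        match c {
            '\'' => {
                quoting = Some(merge(quoting, Quoting::Single));
                loop {
                    match chars.next() {
                        Some('\'') => break,
                        Some(ch) => text.push(ch), // everything is literal here
                        None => return Err(LexError::UnterminatedQuote),
                    }
                }
            }
            '"' => {
                quoting = Some(merge(quoting, Quoting::Double));
                loop {
                    match chars.next() {
                        Some('"') => break,
                        // Simplification: backslash escapes any character.
                        Some('\\') => match chars.next() {
                            Some(ch) => text.push(ch),
                            None => return Err(LexError::DanglingEscape),
                        },
                        Some(ch) => text.push(ch),
                        None => return Err(LexError::UnterminatedQuote),
                    }
                }
            }
            '\\' => match chars.next() {
                Some(ch) => {
                    quoting = Some(merge(quoting, Quoting::Bare));
                    text.push(ch);
                }
                None => return Err(LexError::DanglingEscape),
            },
            // Only an unquoted `|` reaches this arm, so it is an operator.
            '|' => {
                flush(&mut tokens, &mut text, &mut quoting);
                tokens.push(Token::Pipe);
            }
            c if c.is_whitespace() => flush(&mut tokens, &mut text, &mut quoting),
            c => {
                quoting = Some(merge(quoting, Quoting::Bare));
                text.push(c);
            }
        }
    }
    flush(&mut tokens, &mut text, &mut quoting);
    Ok(tokens)
}
```

With this shape, `lex("df -h | grep -E '^/dev|^Filesystem'")` yields one `Pipe` token between the two commands, while the `|` inside the single-quoted regex stays part of a `Word` whose quoting is `Single`. A classifier can then split on `Token::Pipe` without inspecting word contents.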
Commits:
- `6e4ac73` release: v0.3.1 — quoting-aware shell lexer
- `980e567` feat(policy): wire shell_lex into classify_shell_payload, remove pipe-glued heuristic
- `c8cedf9` feat(policy): add quoting-aware shell lexer (shell_lex)
- `0331ed3` test(policy): document pipe-alternation false-negative as ignored regressions
v0.3.0 — Anthropic provider (Claude Haiku / Sonnet / Opus)
Anthropic provider — Claude Haiku, Sonnet, Opus
LLMShell now speaks the Anthropic Messages API natively. Switch at runtime with `/provider set anthropic` or pin a model with `/model use claude-sonnet-4-6`. Haiku 4.5 is the Anthropic provider's default model.
Added
- New crate `llmsh-llm-anthropic`: full native implementation of the Messages API behind the existing `LlmProvider` trait. No change to the agent loop, the policy gate, or the audit chain.
- Three Claude 4.x models out of the box: `claude-haiku-4-5` (default), `claude-sonnet-4-6`, `claude-opus-4-7`. Pricing and 200k context window registered in `llmsh-llm`.
- Tool-use round-trip: assistant `tool_use` blocks ↔ neutral `ToolCall`; consecutive `Tool` messages are grouped into a single `user` turn carrying multiple `tool_result` blocks, per Anthropic's recommendation.
- JSON-object `response_format` is emulated via the assistant-prefill `{` technique (cookbook). The compactor, which requires structured JSON, keeps working unmodified across providers.
- HTTP error bodies are redacted: the existing `llmsh-redact` `anthropic_key` pattern catches `sk-ant-…` leaks in any 4xx/5xx surfaced to the user.
- Capabilities advertised honestly: `tool_calling=Native`, `supports_streaming=false`, `supports_json_mode=true` (emulated), `supports_parallel_tool_calls=true`, `max_context_tokens=Some(200_000)`.
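The grouping of consecutive `Tool` messages into one `user` turn can be sketched as follows. The types here (`NeutralMessage`, `Block`, `Turn`) and the function `to_turns` are hypothetical stand-ins, not the crate's actual mapping code; only the `tool_result` block shape with a `tool_use_id` and the single-`user`-turn rule come from the notes above.

```rust
/// Hypothetical provider-neutral message, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum NeutralMessage {
    User(String),
    Assistant(String),
    Tool { call_id: String, output: String },
}

/// Hypothetical content block of an outgoing turn.
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Text(String),
    ToolResult { tool_use_id: String, content: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub role: &'static str,
    pub blocks: Vec<Block>,
}

/// Map neutral messages to turns, merging runs of tool results
/// into a single `user` turn with several `tool_result` blocks.
pub fn to_turns(messages: &[NeutralMessage]) -> Vec<Turn> {
    let mut turns: Vec<Turn> = Vec::new();
    let mut prev_was_tool = false;

    for message in messages {
        match message {
            NeutralMessage::User(text) => {
                turns.push(Turn { role: "user", blocks: vec![Block::Text(text.clone())] });
                prev_was_tool = false;
            }
            NeutralMessage::Assistant(text) => {
                turns.push(Turn { role: "assistant", blocks: vec![Block::Text(text.clone())] });
                prev_was_tool = false;
            }
            NeutralMessage::Tool { call_id, output } => {
                let block = Block::ToolResult {
                    tool_use_id: call_id.clone(),
                    content: output.clone(),
                };
                if prev_was_tool {
                    // The previous iteration pushed a tool turn, so extend it.
                    if let Some(last) = turns.last_mut() {
                        last.blocks.push(block);
                    }
                } else {
                    turns.push(Turn { role: "user", blocks: vec![block] });
                }
                prev_was_tool = true;
            }
        }
    }
    turns
}
```

Two tool outputs that follow one assistant message therefore produce one `user` turn with two blocks rather than two back-to-back `user` turns.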
Local install runbook
docs/runbooks/local-install.md codifies the deploy flow we kept re-deriving by hand:
- Canonical install: `cargo install --path crates/llmsh-cli --force`.
- Manual fallback is always the triplet `cp` + `xattr -c` + `codesign --force --sign -`: a bare `cp` over an existing binary hits macOS Sequoia's provenance xattr and dies silently with `zsh: killed`.
- After every release that adds a provider or a top-level config key, append the missing block to the user `config.toml` (`load_or_create_user` does not merge new defaults into an existing file).
- Verification gate: `which llmsh`, version matches `Cargo.toml`, `/provider` lists the new entry, one smoke turn writes one audit event.
CLAUDE.md now references the runbook so agents follow it systematically.
Not included (parked for a dedicated minor)
- Anthropic streaming (SSE): not yet plumbed; will land in a focused minor.
- Explicit `cache_control: ephemeral` prompt caching: deferred. The `SystemPromptBuilder` already orders sections stable→dynamic, so the switch will be wiring only.
Tests / CI
- 415 tests pass on `cargo test --workspace --locked` (was 379 in v0.2.15). New coverage: 10 mapping unit tests, 5 wire serde tests, 5 provider tests, 4 wiremock end-to-end tests (`tool_use` round-trip, body shape with `tool_choice=any`, JSON-prefill reconstruction, error redaction).
- `cargo fmt --check` + `cargo clippy --workspace --all-targets -- -D warnings` clean.
Backwards compatibility
- `default_model` global default is unchanged (`openai:gpt-4.1-mini`). The Anthropic provider only activates when the user picks it via `/provider set anthropic` or pins `default_model = "anthropic:claude-haiku-4-5"`.
- No breaking changes to the neutral `LlmProvider` / `LlmRequest` / `LlmResponse` types.
- Audit JSONL schema unchanged.
v0.2.15 — pipeline-aware classifier
Pipeline-aware classifier + explainable Unknown + light confirm
Before this release, every bash -c "find . | wc -l" forced an interactive confirmation because the classifier rejected any payload containing a shell metacharacter. v0.2.15 teaches the deterministic classifier to recognise read-only pipelines and surfaces why a call was left unclassified.
Policy — pipeline parser
- Pipelines are classified. `bash -c "A | B | …"`, `A && B`, `A || B` are downgraded to `ReadOnly` when every segment is independently read-only. The user's original case `find . -maxdepth 1 -type f | wc -l` now passes without a prompt.
- Safe output redirections accepted. `>/dev/null`, `2>/dev/null`, `2>&1`, and `>/tmp/<simple-name>` are recognised. Any other target keeps the call at `Unknown`.
- Security preserved. `;` sequence, `&` background, `$VAR`, `$(…)`, backticks, globs, and redirections to other paths still block classification.
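The "every segment must be independently read-only" rule can be sketched as below. This is a deliberately naive illustration, not the classifier in `llmsh-policy`: it splits on raw characters without any quoting awareness, rejects all redirections instead of recognising the safe ones, and uses a hypothetical five-entry allowlist (`READ_ONLY_PROGRAMS`). The names `Risk`, `classify_pipeline` and `segment_is_read_only` are assumptions.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Risk {
    ReadOnly,
    Unknown,
}

/// Hypothetical allowlist, for illustration only.
const READ_ONLY_PROGRAMS: &[&str] = &["find", "wc", "grep", "ls", "df"];

/// A segment is read-only when its first word is an allowlisted program.
fn segment_is_read_only(segment: &str) -> bool {
    match segment.split_whitespace().next() {
        Some(program) => READ_ONLY_PROGRAMS.contains(&program),
        None => false, // empty segment: fail closed
    }
}

pub fn classify_pipeline(payload: &str) -> Risk {
    // Constructs that block classification outright in this sketch.
    // Redirections are rejected wholesale here; the real classifier
    // accepts a small set of safe targets.
    let blocked = |c: char| matches!(c, ';' | '$' | '`' | '*' | '?' | '>' | '<');
    if payload.contains(blocked) {
        return Risk::Unknown;
    }

    // Treat `&&` and `||` like `|`: all of them separate segments.
    let normalised = payload.replace("&&", "|").replace("||", "|");

    // Any `&` left over is a background operator.
    if normalised.contains('&') {
        return Risk::Unknown;
    }

    if normalised.split('|').all(segment_is_read_only) {
        Risk::ReadOnly
    } else {
        Risk::Unknown
    }
}
```

For example, `classify_pipeline("find . -maxdepth 1 -type f | wc -l")` returns `Risk::ReadOnly`, while a pipeline with one non-allowlisted segment, a `;` or a lone `&` stays at `Risk::Unknown`.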
UX — explainable Unknown
- New `ClassificationReason` (snake_case JSON enum) is propagated from the deterministic classifier into `PolicyDecision.classification_reason`. The confirmation prompt now reads e.g. `risk=Unknown — segment de pipeline non read-only` ("pipeline segment not read-only") or `risk=Unknown — substitution de commande` ("command substitution") instead of an opaque `Unknown`.
- Reason variants: `unsafe_pipeline_segment`, `command_substitution`, `variable_expansion`, `glob_not_resolved`, `sequence_or_background`, `unsafe_redirection_target`, `nested_shell_wrapping`, `program_not_allowlisted`, `unsafe_argument`, … (full list in `crates/llmsh-policy/src/types.rs`).
UX — light confirmation
- `PolicyAction::RequireConfirmation` gains `light: bool`. When the classifier returns `Unknown` but the LLM declared `claimed_risk = read_only` or `low`, the prompt is downgraded to a default-yes single-keystroke `[Y/n]` (vs. the standard `[y/N]`).
- `ConfirmStrong` (phrase verbatim) is unchanged.
- The model never gains authority over execution: a confirmation is still required, only the prompt is lighter. The audit log keeps the `model_disagrees_on_risk` flag for offline review.
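A minimal sketch of the light-confirmation decision follows. The `RiskLevel` variants, the `Allow` variant, and the functions `decide`, `prompt_suffix` and `interpret_answer` are assumptions for illustration; only `RequireConfirmation { light }` and the `[Y/n]` versus `[y/N]` defaults are taken from the notes above.

```rust
/// Hypothetical risk levels, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskLevel {
    ReadOnly,
    Low,
    High,
    Unknown,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PolicyAction {
    Allow,
    RequireConfirmation { light: bool },
}

/// `classified` comes from the deterministic classifier,
/// `claimed` is what the model declared for the call.
pub fn decide(classified: RiskLevel, claimed: RiskLevel) -> PolicyAction {
    match classified {
        RiskLevel::ReadOnly => PolicyAction::Allow,
        // The model's claim only softens the prompt; it never skips it.
        RiskLevel::Unknown => PolicyAction::RequireConfirmation {
            light: matches!(claimed, RiskLevel::ReadOnly | RiskLevel::Low),
        },
        _ => PolicyAction::RequireConfirmation { light: false },
    }
}

pub fn prompt_suffix(light: bool) -> &'static str {
    if light { "[Y/n]" } else { "[y/N]" }
}

/// An empty answer takes the default: yes for a light prompt, no otherwise.
pub fn interpret_answer(light: bool, answer: &str) -> bool {
    match answer.trim().to_ascii_lowercase().as_str() {
        "y" | "yes" => true,
        "" => light,
        _ => false,
    }
}
```

The design point is that the model's `claimed_risk` changes only the default keystroke: `decide(RiskLevel::Unknown, RiskLevel::ReadOnly)` still returns a `RequireConfirmation`, never `Allow`.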
Agent prompt updated
- The `run_process` tool description and the system persona now tell the agent that simple pipes and safe redirections in `bash -c "…"` are recognised by the classifier, so it can keep using legitimate shell forms without forcing the user through a confirmation.
Tests / CI
- 379 → 382 tests (+3 in `llmsh-core::pipeline` covering the pipeline downgrade, the destructive-segment rejection, and the light-confirm path), plus 9 new unit tests in `llmsh-policy::safe_commands` covering A+B positive and negative cases.
- `cargo fmt --check`, `cargo clippy -D warnings`, full `cargo test --workspace --locked` all green.
- Fuzz target (`crates/llmsh-policy/fuzz/fuzz_targets/deshell.rs`) still asserts determinism on the broadened classifier.
Backwards compatibility
- `PolicyAction::RequireConfirmation` gains `light: bool` with `#[serde(default)]`: old audit JSONL lines deserialise unchanged.
- `PolicyDecision.classification_reason` is `Option<…>` with `#[serde(default)]`: old lines deserialise as `None`.
- `is_read_only_invocation(program, args) -> Option<RiskLevel>` is preserved as a thin wrapper over the new `classify_invocation -> Result<RiskLevel, ClassificationReason>`.
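The thin-wrapper relationship between the two functions can be sketched like this. The parameter types, the two-entry allowlist and the argument check inside `classify_invocation` are placeholders invented for the example; only the two function names and their return types come from the notes above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskLevel {
    ReadOnly,
}

/// Two of the reason variants listed above; the real enum has more.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ClassificationReason {
    ProgramNotAllowlisted,
    UnsafeArgument,
}

/// New-style API: says why classification failed.
/// The rules below are placeholders, not the real policy.
pub fn classify_invocation(
    program: &str,
    args: &[&str],
) -> Result<RiskLevel, ClassificationReason> {
    if !["ls", "wc"].contains(&program) {
        return Err(ClassificationReason::ProgramNotAllowlisted);
    }
    if args.iter().any(|arg| arg.starts_with('>')) {
        return Err(ClassificationReason::UnsafeArgument);
    }
    Ok(RiskLevel::ReadOnly)
}

/// Old-style API kept for existing callers: the reason is dropped.
pub fn is_read_only_invocation(program: &str, args: &[&str]) -> Option<RiskLevel> {
    classify_invocation(program, args).ok()
}
```

Callers that only need a yes/no answer keep using the `Option` form, while the confirmation prompt can read the `Err` side to explain an `Unknown`.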
v0.2.13 — security hardening
Security hardening pass
- Unified redaction. New `llmsh-redact` crate is now the single source of truth for secret patterns. `llmsh-audit::redact` and `llmsh-core::llm_redact` are thin façades over it, so there are no longer three parallel pattern lists drifting apart.
- Extended secret catalogue. OpenAI, Anthropic, GCP (API key + service-account JSON marker), AWS access/secret, GitHub (modern + classic), Databricks, HuggingFace, Replicate, Slack, JWT, Bearer, PEM private keys, and `.env`-style `*_KEY=…` / `*_PASSWORD=…` lines.
- API key zeroized. `OpenAIProvider` stores the key inside `secrecy::SecretString`: it no longer leaks via `Debug` output and the underlying memory is zeroed on drop.
- Error bodies redacted. OpenAI HTTP error responses pass through the redactor before being bubbled up to logs, because some error payloads echo request fragments that contain the offending token.
- Memory persistence redacted. Conversation messages are redacted before insertion into the SQLite memory DB. Previously, `.env` reads or token-bearing tool outputs were stored verbatim.
- Deshelling gap closed. `extract_shell_payload` now accepts `bash -c PAYLOAD pos1 pos2…` (extra positional args become `$0`, `$1`, … inside the body). Previously the parser only matched `argv.len() == 2`, so appending a trailing arg let invocations silently skip read-only classification.
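The deshelling gap can be illustrated with two slice patterns. Both functions below are sketches under assumptions: the signatures, the `bash`/`sh` program check and the name `extract_shell_payload_strict` (a reconstruction of the previous behaviour as described above) are hypothetical, and the metacharacter filtering that the real function applies is omitted.

```rust
/// Reconstruction of the earlier behaviour: only `-c PAYLOAD` with
/// exactly two arguments matched, so a trailing argument escaped it.
pub fn extract_shell_payload_strict<'a>(program: &str, args: &[&'a str]) -> Option<&'a str> {
    if !matches!(program, "bash" | "sh") {
        return None;
    }
    match args {
        [flag, payload] if *flag == "-c" => Some(*payload),
        _ => None,
    }
}

/// Fixed behaviour: trailing positional arguments are tolerated,
/// so the payload is still extracted and classified.
pub fn extract_shell_payload<'a>(program: &str, args: &[&'a str]) -> Option<&'a str> {
    if !matches!(program, "bash" | "sh") {
        return None;
    }
    match args {
        [flag, payload, ..] if *flag == "-c" => Some(*payload),
        _ => None,
    }
}
```

With `args = ["-c", "ls -la", "extra"]`, the strict variant returns `None` and the payload is never inspected, whereas the fixed variant returns `Some("ls -la")` and the inner command goes through classification as usual.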
Audit chain — known limitation (documented, not fixed)
Investigation of the audit log surfaced that the crate-level README's "SHA-256 chained" claim is currently aspirational. Per-event digest fields (messages_digest, tool_calls_digest, args_digest, …) are content-addressable hashes of subfields and do not reference the prior event; the writer does a plain writeln!. Implementing a real inter-event chain (new field on every variant, writer state, verifier, format-version bump) is deferred to a dedicated plan.
Tests
322 → 338 (+16 new tests across llmsh-redact, llmsh-llm-openai, llmsh-core::memory, llmsh-policy).
v0.2.12 — softer tool-call cap + Linux read-only classifiers
Two interactive-friction fixes informed by a Debian / ARM64 deployment test on VM01.
Highlights
Tool-call overflow becomes recoverable
Before: when the model emitted more than max_tool_calls_per_iteration tool calls in a single turn, the agent aborted silently — the REPL just redrew an empty prompt with no assistant text. Default cap was 5, which gpt-4.1-mini exceeded on the very first "audit complet" request.
After:
- Default cap raised from 5 → 32 so legitimate batches (8–10 read-only system audits) pass without any prompt.
- When the cap is exceeded, the user gets a `[Y/n]` prompt (empty = Y) instead of a blank line. Approving runs the plan as-is.
- Audit log retains the `too_many_tool_calls` error event with the user verdict embedded (approved/denied).
- New trait method `ConfirmationGate::ask_overflow(requested, limit) -> bool` with a default that refuses; `StdinConfirmationGate` prompts, `AlwaysYesGate` overrides to true for tests.
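The refusing default and the test override can be sketched as a trait with a provided method. The receiver, the `usize` parameter types, the `SilentGate` type and the `allow_batch` helper are assumptions for illustration; the trait name, method name and `AlwaysYesGate` come from the notes above.

```rust
pub trait ConfirmationGate {
    /// Asked when a turn requests more tool calls than the cap.
    /// The provided default refuses, so gates that do not opt in stay safe.
    fn ask_overflow(&self, requested: usize, limit: usize) -> bool {
        let _ = (requested, limit);
        false
    }
}

/// Hypothetical gate that relies on the refusing default.
pub struct SilentGate;
impl ConfirmationGate for SilentGate {}

/// Test gate that approves every overflow.
pub struct AlwaysYesGate;
impl ConfirmationGate for AlwaysYesGate {
    fn ask_overflow(&self, _requested: usize, _limit: usize) -> bool {
        true
    }
}

/// Hypothetical helper: batches within the cap pass without asking.
pub fn allow_batch(gate: &dyn ConfirmationGate, requested: usize, limit: usize) -> bool {
    requested <= limit || gate.ask_overflow(requested, limit)
}
```

Because the default refuses, adding the method does not change the behaviour of any existing gate implementation until it overrides `ask_overflow` explicitly.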
Linux read-only classifier (less friction, same safety)
The deterministic safe_commands classifier was macOS-leaning. On Linux, common inspection tools returned RiskLevel::Unknown → ModelDisagreesOnRisk → Confirm, producing 5+ [y/N] prompts per audit turn.
- ALWAYS_SAFE additions (no mutating mode): `lscpu`, `lsmem`, `lsblk`, `lspci`, `lsusb`, `lshw`, `lsmod`, `lsipc`, `lsns`, `free`, `dpkg-query`, `apparmor_status`, `aa-status`, `chkrootkit`, `rkhunter`, `findmnt`, `mountpoint`, `nproc`, `arch`, `getent`.
- Subcommand-aware classifiers (read-only on specific verbs/flags, mutating otherwise):
  - `ip <object> [show|list|get]`: rejects add/del/set/change/replace/flush.
  - `ss`: read-only unless `-K` / `--kill`.
  - `ufw`: read-only on `status` / `show`; rejects enable/disable/allow/…
  - `systemctl`: read-only on status/show/cat/list-*/is-*/get-default/show-environment.
  - `journalctl`: read-only unless `--vacuum-*` / `--rotate` / `--flush` / `--sync`.
  - `iptables` / `ip6tables`: read-only when any listing flag (`-L` / `-S` / `--list-rules` / `--check`) is present and no mutating flag (`-A` / `-D` / `-I` / `-R` / `-F` / `-X` / `-Z` / `-N` / `-P` / `-E`) is.
  - `nft`: read-only on `list`.
  - `firewall-cmd`: read-only on `--state` / `--list-*` / `--get-*` / `--query-*` / `--info-*`; conservatively rejects `--permanent`.
  - `mount`: read-only on bare invocation or with `-l` / `-v` / `--show-labels`.
  - `dmesg`: read-only unless `-c` / `-C` / `--clear` / `--read-clear`.
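As an illustration of a subcommand-aware predicate, the `iptables` rule above can be sketched as follows. The function name `iptables_is_read_only` and the exact-match handling of flags are assumptions; the flag lists are copied from the rule as stated.

```rust
/// Read-only when at least one listing flag is present and
/// no mutating flag appears anywhere in the arguments.
pub fn iptables_is_read_only(args: &[&str]) -> bool {
    const LISTING: &[&str] = &["-L", "-S", "--list-rules", "--check"];
    const MUTATING: &[&str] = &["-A", "-D", "-I", "-R", "-F", "-X", "-Z", "-N", "-P", "-E"];

    let has_listing = args.iter().any(|arg| LISTING.contains(arg));
    let has_mutating = args.iter().any(|arg| MUTATING.contains(arg));

    // Flags are compared verbatim: a combined short option such as `-nL`
    // is not recognised as a listing flag, so it is rejected (fail closed).
    has_listing && !has_mutating
}
```

A bare `iptables` call with no arguments is also rejected here, since the rule requires a listing flag to be present rather than merely the absence of mutating ones.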
A typical audit complet REPL session on Debian goes from 9 [y/N] confirms to 0 for hardware/inspection commands. Sensitive-path strong confirmations on /etc/passwd, /etc/sudoers, etc. remain by design.
Misc
- `model_cmd` error path now surfaces the configured allowlist when no installed model matches (better operator hint when `/model set <name>` fails).
CI
- `cargo fmt --all -- --check`: pass
- `cargo clippy --workspace --all-targets -- -D warnings`: pass
- `cargo test --workspace --locked`: 322 passing (up from 311 at v0.2.11; 11 new policy/agent tests).
Commits
- `807c881` feat(policy): subcommand-aware read-only classifiers for ip/ss/ufw/systemctl/…
- `20966ee` feat(policy): add Linux hardware/inspection commands to ALWAYS_SAFE
- `e455c1d` feat(agent): prompt on tool-call overflow + raise default cap to 32
- `852f33f` fix(model_cmd): surface allowlist when no installed model matches
v0.2.10 — classifier deshelling + macOS allowlist + LLM briefing
Highlights
Less friction on read-only shell wrappers. The deterministic policy classifier now deshells bash -c "<single-cmd>" one level (with a conservative metachar + literal-glob filter via shlex) and re-classifies the inner command. Wrapping ls, dscl . list /Users, grep TODO src/main.rs etc. in bash -c no longer triggers spurious confirmation prompts.
macOS read-only system tools recognised. Universal allowlist additions (system_profiler, sw_vers, ioreg, nettop, scutil, csrutil, vm_stat, iostat, last, w, who, users) plus per-program predicates with cross-platform discipline for binaries that have a write mode somewhere (dscl, pfctl, defaults, launchctl, networksetup, sysctl).
Privilege escalation always reaches ConfirmStrong. The UsesPrivilegeEscalation flag (sudo / doas / su) is now set in the pipeline post-deshell rather than at the enrich layer, so bash -c "sudo …" correctly reaches ConfirmStrong with phrase — the previous detection only saw the outer bash program.
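Why detection has to run after deshelling can be shown with a small sketch. The helper names (`inner_argv`, `uses_privilege_escalation`), the whitespace-based splitting of the payload and the single-level `bash -c` unwrap are simplifying assumptions; the real pipeline relies on its shell parser and metacharacter filter.

```rust
const ESCALATORS: &[&str] = &["sudo", "doas", "su"];

/// Return the argv that will effectively run: the inner command for
/// `bash -c "<cmd>"`, otherwise the invocation itself.
fn inner_argv<'a>(program: &'a str, args: &'a [&'a str]) -> Vec<&'a str> {
    if program == "bash" && args.len() >= 2 && args[0] == "-c" {
        // Naive split, for illustration only.
        args[1].split_whitespace().collect()
    } else {
        std::iter::once(program).chain(args.iter().copied()).collect()
    }
}

/// Check the effective program, not the outer wrapper.
pub fn uses_privilege_escalation(program: &str, args: &[&str]) -> bool {
    inner_argv(program, args)
        .first()
        .map_or(false, |effective| ESCALATORS.contains(effective))
}
```

Checking only the outer program would see `bash` for `bash -c "sudo …"` and miss the escalation; looking at the deshelled argv finds `sudo` as the effective program.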
Audited default sensitive_path_patterns. SSH keys (~/.ssh/**, **/id_rsa, **/id_ed25519, **/id_ecdsa), cloud credentials (~/.aws/**, ~/.config/gcloud/**, ~/.config/gh/**, ~/.docker/config.json, ~/.kube/**), generic dotfiles (~/.netrc, ~/.pgpass), project secrets (.env, .env.*, **/.env, **/.env.*, **/credentials*, **/secrets.*, **/*.pem, **/*.key), and system-sensitive paths (/etc/sudoers, /etc/sudoers.d/**, /etc/shadow, /etc/passwd).
Better LLM briefing. The run_process tool description now teaches the model to prefer argv-direct (program=ls, args=["-la"]) over bash -c "ls -la" with ❌/✅ examples and a claimed_risk taxonomy. The persona block mirrors the nudge.
Tests
- 6 unit tests for `extract_shell_payload` (positive deshell, metachar refusal, recursion bound, sudo wrapper non-deshell).
- 12 entries verified in the universal allowlist.
- 6 per-program predicate tests (`dscl`, `pfctl`, `defaults`, `launchctl`, `networksetup`, `sysctl`).
- 3 pipeline tests for post-deshell privesc detection.
- 4 e2e tests: `e2e_classifier_bash_deshell`, `e2e_privilege_escalation`, `e2e_privesc_through_bash`, `e2e_persona_avoids_bash_wrapping`.
Total: 300 tests passing across 46 suites (up from 258 in v0.2.9). cargo fmt --check clean, cargo clippy -D warnings clean.
Upgrade notes
- Rust workspace, MSRV 1.78. No breaking API changes.
- Default sensitive-path patterns expanded. If you have a custom `~/.config/llmsh/config.toml` with an explicit `policy.sensitive_paths.patterns` array, your overrides still apply unchanged.
- The new shell deshelling is opt-out via the existing `auto_classify_run_process: false` Pipeline flag (already there since v0.2.x).
🤖 Generated with Claude Code
v0.2.9 — slash autocomplete + multi-line input + compactor stage-B
Highlights
REPL ergonomics
- Slash autocomplete: Tab now opens a `ColumnarMenu` driven by a new `SlashCompleter` that suggests `/commands` and their subcommands (e.g. `/memory list`, `/clear-context`).
- Multi-line input: Shift+Enter / Alt+Enter insert a newline. Shift+Enter requires a terminal that distinguishes it from Enter (kitty / iTerm with CSI-u); Alt+Enter works everywhere.
Thinking provider
- New `ThinkingProvider` wraps the underlying `LlmProvider` to surface model reasoning in a uniform way for downstream UX.
Compactor stage-B observability
- New `StageBOutcome` enum (`NotAttempted` / `Skipped` / `Succeeded` / `Failed`) is now part of `CompactionReport`.
- Manual `/compact` prints the stage-B outcome (`(summary stage: ok — facts updated)` / `failed — …` / `skipped — …`).
- Stage-B errors are no longer silently swallowed: they're logged and persisted to the audit log.
Audit schema v5
- `ContextCompacted` events gain three optional fields: `stage_b_outcome`, `stage_b_skip_reason`, `stage_b_error`.
- Emission moved from `agent.rs` / `repl.rs` into the compactor itself: single source of truth for the event.
- Backwards-compatible read path: missing fields deserialize as `None`.
Install
cargo install --git https://github.com/swoelffel/llmshell --tag v0.2.9 llmsh-cli
Pre-built binaries land in v0.3; see ROADMAP.md.
Verify
$ llmsh --version
llmsh-cli 0.2.9
v0.2.8 — tilde expansion + glob tool
First public release since v0.2.1 — covers seven internal iterations (v0.2.2 → v0.2.8) shipped through May 2026.
Highlights since v0.2.1
v0.2.8 — Tilde expansion & glob tool (this release)
- `~` and `~/…` are now expanded by `run_process`, `read_file`, `list_directory` (no shell: `$HOME` substitution only).
- New typed tool `glob` (read-only): pattern → list of absolute paths, capped at 1000. Lets the agent compose `glob` + `run_process` for `du -sh ~/Library/Caches/*`-style requests.
- Hardened `run_process` schema description and persona prompt to point the agent at `glob` first.
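Shell-free tilde expansion amounts to a prefix substitution. The sketch below is an assumption about how such a helper could look, not the tool code itself: `expand_tilde` is a hypothetical name, and the home directory is passed in explicitly (a caller would typically read it from `$HOME`) so that the function stays pure and testable.

```rust
use std::path::{Path, PathBuf};

/// Expand a leading `~` or `~/…` against `home`.
/// Anything else, including `~user/…`, is returned unchanged.
pub fn expand_tilde(input: &str, home: &Path) -> PathBuf {
    if input == "~" {
        return home.to_path_buf();
    }
    match input.strip_prefix("~/") {
        Some(rest) => home.join(rest),
        None => PathBuf::from(input),
    }
}
```

Because no shell is involved, forms such as `~bob/x` or a `~` in the middle of a path are deliberately left untouched rather than guessed at.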
v0.2.7 — Policy & PWD overhaul
- Dropped the workspace-boundary concept: the workspace is the host machine, scoped by the running user's filesystem rights.
- Removed automatic `Deny` on sensitive paths and outside-workspace; everything reachable now flows through a strong confirmation prompt with `N` as default.
- Unified PWD plumbing via `Arc<RwLock<PathBuf>>` shared across REPL, policy and tools: `!cd /dir` and `run_process(cd, …)` actually move PWD when the target exists.
- One-shot recovery for orphan `tool_calls` left over from prior sessions (fixes OpenAI 400 on reload).
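The shared-PWD idea can be sketched with a single `Arc<RwLock<PathBuf>>` handle that every component clones. The `SharedPwd` alias and the `change_dir` function are hypothetical names; the sketch only shows the "move PWD when the target exists" rule and omits policy checks.

```rust
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};

/// One handle, cloned into the REPL, the policy engine and the tools.
pub type SharedPwd = Arc<RwLock<PathBuf>>;

/// Move the shared PWD to `target` (resolved against the current PWD)
/// only if it is an existing directory. Returns whether it moved.
pub fn change_dir(pwd: &SharedPwd, target: &Path) -> bool {
    // Resolve under a short read lock, then release it before writing.
    let candidate = {
        let current = pwd.read().unwrap();
        current.join(target)
    };
    if !candidate.is_dir() {
        return false;
    }
    *pwd.write().unwrap() = candidate;
    true
}
```

Every clone of the handle observes the new directory after a successful `change_dir`, which is what keeps the prompt, the policy checks and the tools in agreement about where commands run.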
v0.2.6 / v0.2.5 — Memory schema v2
- Persisted conversation messages, curated long-term facts, soft-delete via `cleared_at` / `cleared_source`.
- New REPL meta commands: `/clear-context`, `/clear-memory`, `/clear-all`, `/memory list`.
v0.2.4 / v0.2.3 — Context compaction
- Deterministic `truncate_tool_outputs` stage + LLM `summarize_prefix` stage, with `/compact` meta command and auto-trigger from the agent loop.
- JSON-structured compactor output feeding curated long-term facts.
v0.2.2 — Verbose mode & status line
- `-v` / `-vv` flags, tier-1/2 stderr output, reedline status prompt.
- Per-turn `SessionStats`, cached-input token accounting, model context-window & pricing tables.
Install
cargo install --git https://github.com/swoelffel/llmshell --tag v0.2.8 llmsh-cli
Pre-built binaries land in v0.3; see ROADMAP.md.
Verify
$ llmsh --version
llmsh-cli 0.2.8
v0.2.1 — first public release
LLMShell v0.2.1 — first public release
llmsh is an agentic terminal shell: a REPL takes natural-language input, an LLM agent plans and emits tool calls, a policy engine classifies risk and gates execution, tools run with timeout/cancellation, and every step is appended to a tamper-evident audit log.
Highlights since project start
- Agent loop (`llmsh-core`): bounded iterate-until-done with schema enrichment, policy classification, sensitive-path checks.
- Policy engine (`llmsh-policy`): `Allow` / `Confirm` / `Deny` classification with phrase + sensitive-path heuristics.
- Tools (`llmsh-tools`): `read_file`, `list_directory`, `run_process` with per-tool timeout and cancellation.
- Audit (`llmsh-audit`): append-only JSONL with hash-chained `digest` and redaction at the LLM boundary.
- OpenAI-compatible provider (`llmsh-llm-openai`): `/v1/models` discovery, runtime model switch via `/model`.
- REPL (`llmsh-core::repl`): reedline-backed input, slash commands, `/init` auto-bootstrap, system-prompt builder structured for OpenAI's automatic prompt cache.
- Confirmation prompt: surfaces command details and policy flags before risky tool execution.
Build
cargo build --release
export OPENAI_API_KEY=sk-...
./target/release/llmsh
Requirements
- Rust 1.78+ (stable toolchain pinned via `rust-toolchain.toml`)
- An OpenAI-compatible API endpoint