
Enhance distillation logic and optimize filter handling with passthrough#80

Merged
fajarhide merged 14 commits into main from feat/candidate-0.5.8
May 7, 2026

@fajarhide fajarhide commented May 7, 2026

PR Auto Describe

Summary
The 0.5.8‑rc1 release introduces a suite of safety and performance guardrails, a new passthrough mode, and a token‑based distillation threshold for small files. Core pipelines now use Rust 2024 idioms (LazyLock, let‑chains, Cow) and a stricter agent‑attribution model that defaults to “terminal.” An extensive test‑suite refactor, CI hardening, and updated i18n docs accompany the changes, along with new helper utilities for shell parsing and environment sanitization.


Key Changes

| Area | Highlight |
| --- | --- |
| Guardrails | Hot‑file detection, build‑failure preservation, cargo check pre‑build diagnostics. |
| Passthrough & Thresholding | OMNI_PASSTHROUGH env var, 2000‑token min‑distill threshold, MIN_REDUCTION_PCT guardrail. |
| Distiller Enhancements | Updated GenericDistiller noise‑omission marker, token‑hinted read‑file logic, config‑file passthrough for cat. |
| Performance | Filter fingerprint caching, thread‑safe registry, pruning of lazy_static. |
| Agent Attribution | Default agent now “terminal,” stats grouping by display name, updated agent_display_name. |
| Rust 2024 Idioms | Added style guide in CLAUDE.md; code now uses LazyLock, let‑chains, Cow, strict error handling. |
| Testing & CI | 300+ tests modernized, snapshot sync, CI build test re‑enabled, new tests for passthrough and token threshold. |
| Docs | i18n READMEs updated for high‑speed bypass, omission visibility, and passthrough. |
| Utilities | shell_split_tokens, extract_base_executable, should_passthrough_config_output. |
| Misc | is_passthrough() helper, InputCheck::Warn, and new MAX_OUTPUT_BYTES. |

Detailed Breakdown

1. Core Pipeline (src/hooks/pipe.rs)

  • Added MIN_REDUCTION_PCT (95 %) guardrail: if the distilled output is larger than 95 % of the input (that is, distillation saved less than 5 %), the raw input is returned instead.
  • MAX_OUTPUT_BYTES set to 50 kB; truncation now appends [OMNI: output truncated].
  • command_name now stripped of leading omni exec before routing.
  • Distillation route logic moved after potential passthrough; Route::Soft now always labels with “[Partial signal – omni learn recommended]”.
  • persist and emit_output updated to use the new command_to_use variable.
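
The two output guardrails above can be sketched roughly as follows. This is a minimal illustration, not the code in src/hooks/pipe.rs: the helper name apply_output_guardrails, the reading of "50 kB" as 50 * 1024 bytes, and the newline before the truncation marker are all assumptions.

```rust
// Values taken from the PR description; the exact definitions in pipe.rs may differ.
const MIN_REDUCTION_PCT: usize = 95; // output above 95 % of input means "no useful reduction"
const MAX_OUTPUT_BYTES: usize = 50 * 1024; // assumption: "50 kB" read as 50 * 1024 bytes
const TRUNCATION_MARKER: &str = "[OMNI: output truncated]";

/// Hypothetical helper: choose between raw and distilled text, then cap the size.
fn apply_output_guardrails(raw: &str, distilled: &str) -> String {
    // Integer comparison equivalent to distilled.len() / raw.len() > 0.95.
    let chosen = if distilled.len() * 100 > raw.len() * MIN_REDUCTION_PCT {
        raw
    } else {
        distilled
    };

    if chosen.len() <= MAX_OUTPUT_BYTES {
        return chosen.to_string();
    }

    // Back up to a UTF-8 character boundary so the slice below cannot panic.
    let mut cut = MAX_OUTPUT_BYTES;
    while !chosen.is_char_boundary(cut) {
        cut -= 1;
    }
    format!("{}\n{}", &chosen[..cut], TRUNCATION_MARKER)
}
```

With these assumed values, a 96-byte result for a 100-byte input falls back to the raw input, while a 95-byte result is kept.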

2. Distillers (src/distillers/mod.rs, generic.rs, readfile.rs, git.rs)

  • extract_base_executable() parses quoted/env‑prefixed commands to find the real binary.
  • should_passthrough_config_output() bypasses distillation for small config files (.env/.toml/.yaml/...).
  • GenericDistiller now excludes Noise segments unless all content is Noise, and appends [X noise lines omitted] when dropping noise.
  • readfile.rs introduces MIN_DISTILL_TOKENS (2000) and uses token_estimate with content hints to decide passthrough.
  • Replaced lazy_static! with std::sync::LazyLock for git‑hash regex.
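
To make the token threshold concrete, here is a rough sketch of how a hinted estimate could gate distillation. Only MIN_DISTILL_TOKENS (2000) and the name token_estimate come from this PR; the ContentHint enum, the extension mapping, the characters-per-token ratios, and the inclusive boundary are illustrative assumptions.

```rust
use std::path::Path;

const MIN_DISTILL_TOKENS: usize = 2000; // value stated in the PR description

/// Hypothetical hint type; the real hint representation is not shown in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ContentHint {
    Code,
    Data,
    Prose,
}

/// Assumed mapping from file extension to a content hint.
fn content_hint_for(path: &str) -> ContentHint {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(str::to_ascii_lowercase);
    match ext.as_deref() {
        Some("rs" | "py" | "ts" | "go") => ContentHint::Code,
        Some("json" | "yaml" | "yml" | "toml") => ContentHint::Data,
        _ => ContentHint::Prose,
    }
}

/// Character-count heuristic; the ratios are placeholders, not measured values.
fn token_estimate(content: &str, hint: ContentHint) -> usize {
    let chars_per_token = match hint {
        ContentHint::Code | ContentHint::Data => 3,
        ContentHint::Prose => 4,
    };
    content.chars().count().div_ceil(chars_per_token)
}

/// Content below the threshold is passed through untouched.
/// Whether exactly 2000 tokens distills or passes through is an assumption here.
fn should_distill(path: &str, content: &str) -> bool {
    token_estimate(content, content_hint_for(path)) >= MIN_DISTILL_TOKENS
}
```

The point of the hint is that dense formats such as code tend to produce more tokens per character than prose, so a single fixed ratio would misjudge which files are "small".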

3. Guard & Env (src/guard/env.rs, limits.rs, config.rs, trust.rs, update.rs)

  • is_passthrough() added; tests now cover value parsing.
  • check_input() now returns InputCheck::Warn for > 1 MB but < 16 MB, triggering warning logs.
  • Cleaned up environment‑mutating calls in tests: some were removed, and the rest are wrapped in unsafe blocks, as the Rust 2024 edition requires.
  • load_config() and route thresholds updated to handle backward compatibility.
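
A minimal sketch of the two guard checks described above. The accepted OMNI_PASSTHROUGH values, the Pass and TooLarge variant names, and the exact boundary behavior at 1 MB and 16 MB are assumptions; the PR only states that is_passthrough() exists and that InputCheck::Warn covers inputs above 1 MB and below 16 MB.

```rust
use std::env;

/// Pure parsing step, kept separate so it can be tested without touching the environment.
/// Assumption: "1", "true", "yes" and "on" (case-insensitive) enable passthrough.
fn parse_passthrough(value: Option<&str>) -> bool {
    match value {
        Some(v) => matches!(
            v.trim().to_ascii_lowercase().as_str(),
            "1" | "true" | "yes" | "on"
        ),
        None => false,
    }
}

/// Reads OMNI_PASSTHROUGH from the process environment.
fn is_passthrough() -> bool {
    parse_passthrough(env::var("OMNI_PASSTHROUGH").ok().as_deref())
}

/// Only `Warn` is named in the PR; the other variant names are hypothetical.
#[derive(Debug, PartialEq)]
enum InputCheck {
    Pass,
    Warn,
    TooLarge,
}

const WARN_BYTES: usize = 1024 * 1024; // 1 MB
const MAX_BYTES: usize = 16 * 1024 * 1024; // 16 MB

fn check_input(len: usize) -> InputCheck {
    if len >= MAX_BYTES {
        InputCheck::TooLarge
    } else if len > WARN_BYTES {
        InputCheck::Warn
    } else {
        InputCheck::Pass
    }
}
```

Splitting the parsing from the environment read also sidesteps the need for unsafe env mutation in most tests, since the parser can be exercised with plain string inputs.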

4. Agent Detection & Stats (src/agents/multiagent.rs, src/cli/stats.rs)

  • Default agent ID changed from claude_code to terminal.
  • agent_display_name now groups untagged agents under “Terminal” and omits “Claude Code” for terminal.
  • Stats grouping now aggregates by display name, sorting by command count, and displays percentage savings.
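
The grouping behavior can be illustrated with a small sketch. The Record and AgentStats types, the byte-based savings formula, and the alphabetical tie-break are hypothetical; only the function name agent_display_name, the "Terminal" label for untagged agents, and the sort by command count come from the description above.

```rust
use std::collections::HashMap;

/// Untagged or "terminal" agents are shown as "Terminal" (per the PR);
/// the "claude_code" label and the fallthrough case are assumptions.
fn agent_display_name(agent_id: Option<&str>) -> String {
    match agent_id {
        None | Some("") | Some("terminal") => "Terminal".to_string(),
        Some("claude_code") => "Claude Code".to_string(),
        Some(other) => other.to_string(),
    }
}

/// Hypothetical per-command record.
struct Record<'a> {
    agent_id: Option<&'a str>,
    input_bytes: u64,
    output_bytes: u64,
}

/// Hypothetical aggregated row.
#[derive(Debug, PartialEq)]
struct AgentStats {
    name: String,
    commands: u64,
    savings_pct: f64,
}

fn group_by_display_name(records: &[Record]) -> Vec<AgentStats> {
    // name -> (command count, total input bytes, total output bytes)
    let mut totals: HashMap<String, (u64, u64, u64)> = HashMap::new();
    for r in records {
        let entry = totals.entry(agent_display_name(r.agent_id)).or_default();
        entry.0 += 1;
        entry.1 += r.input_bytes;
        entry.2 += r.output_bytes;
    }

    let mut rows: Vec<AgentStats> = totals
        .into_iter()
        .map(|(name, (commands, input, output))| {
            let savings_pct = if input == 0 {
                0.0
            } else {
                100.0 * input.saturating_sub(output) as f64 / input as f64
            };
            AgentStats { name, commands, savings_pct }
        })
        .collect();

    // Most active agent first; ties broken by name for a stable order.
    rows.sort_by(|a, b| b.commands.cmp(&a.commands).then_with(|| a.name.cmp(&b.name)));
    rows
}
```

Grouping by display name rather than raw ID means records stored without an agent tag and records tagged "terminal" end up in the same row.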

5. CLI (src/cli/learn.rs, src/cli/session.rs, src/cli/stats.rs)

  • run_learn now receives a reference to the filter vector.
  • Session tests renamed to avoid “test_” prefix; behavior assertions updated.
  • Stats default and detail tests renamed and now skip errors on empty DB.

6. Hooks (src/hooks/dispatcher.rs, src/hooks/post_tool.rs)

  • dispatcher::run now wraps catch_unwind with AssertUnwindSafe.
  • post_tool::process_payload respects is_passthrough(), skipping distillation when enabled.
  • Updated command routing logic to use extract_base_executable.
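
For readers unfamiliar with the routing helper, the following sketch shows one way to find the real binary in a quoted or env‑prefixed command. The function names shell_split_tokens and extract_base_executable appear in this PR, but their signatures and parsing rules are not shown; everything below, including the is_env_assignment helper and the handling of a leading env, is an assumption.

```rust
/// Simplified tokenizer: splits on whitespace and honors single and double quotes.
/// It does not handle backslash escapes or nested quoting.
fn shell_split_tokens(command: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut quote: Option<char> = None;
    let mut in_token = false;

    for c in command.chars() {
        match quote {
            // Closing quote: leave quoted mode, keep building the same token.
            Some(q) if c == q => quote = None,
            Some(_) => current.push(c),
            None if c == '"' || c == '\'' => {
                quote = Some(c);
                in_token = true;
            }
            None if c.is_whitespace() => {
                if in_token {
                    tokens.push(std::mem::take(&mut current));
                    in_token = false;
                }
            }
            None => {
                current.push(c);
                in_token = true;
            }
        }
    }
    if in_token {
        tokens.push(current);
    }
    tokens
}

/// Hypothetical helper: true for tokens shaped like NAME=value.
fn is_env_assignment(token: &str) -> bool {
    match token.split_once('=') {
        Some((name, _)) => {
            !name.is_empty()
                && !name.starts_with(|c: char| c.is_ascii_digit())
                && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        None => false,
    }
}

/// Skips a leading `env` and any NAME=value prefixes, then strips the directory part.
fn extract_base_executable(command: &str) -> Option<String> {
    let first = shell_split_tokens(command)
        .into_iter()
        .find(|t| !is_env_assignment(t) && t.as_str() != "env")?;
    let base = first.rsplit('/').next().unwrap_or(first.as_str());
    Some(base.to_string())
}
```

Under these assumptions, "RUST_LOG=debug /usr/bin/cargo build" routes as cargo rather than as the environment assignment.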

7. Documentation & Standards (CLAUDE.md, CHANGELOG.md, README.md, i18n READMEs)

  • Added comprehensive Rust 2024 idioms section.
  • Updated i18n READMEs to reflect high‑speed bypass, omission visibility, and passthrough.
  • Updated CHANGELOG to list new guardrails and performance improvements.

8. Build Configuration

  • Removed lazy_static from Cargo.toml and Cargo.lock; replaced with LazyLock.
  • Updated tests to use cargo fmt, cargo clippy -D warnings, and cargo test.
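
As an illustration of the lazy_static to LazyLock migration, the sketch below initializes a static set with std::sync::LazyLock (stable since Rust 1.80) and uses it for a small‑config check. It uses a HashSet instead of the git‑hash regex so that it depends only on the standard library; the extension list beyond .env/.toml/.yaml, the size cutoff, and the function signature are assumptions.

```rust
use std::collections::HashSet;
use std::path::Path;
use std::sync::LazyLock;

// Built on first access, with no macro and no extra crate.
// "env", "toml" and "yaml" come from the PR description; "yml" is an assumed addition.
static CONFIG_EXTENSIONS: LazyLock<HashSet<&'static str>> =
    LazyLock::new(|| HashSet::from(["env", "toml", "yaml", "yml"]));

/// Hypothetical cutoff for what counts as a "small" config file.
const SMALL_CONFIG_MAX_BYTES: usize = 4 * 1024;

/// Assumed signature: returns true when the output should skip distillation.
fn should_passthrough_config_output(path: &str, output: &str) -> bool {
    let p = Path::new(path);
    // `Path::extension` returns None for dotfiles such as ".env",
    // so fall back to the file name without its leading dot.
    let ext = p.extension().and_then(|e| e.to_str()).or_else(|| {
        p.file_name()
            .and_then(|n| n.to_str())
            .and_then(|n| n.strip_prefix('.'))
    });
    match ext {
        Some(e) => {
            CONFIG_EXTENSIONS.contains(e.to_ascii_lowercase().as_str())
                && output.len() <= SMALL_CONFIG_MAX_BYTES
        }
        None => false,
    }
}
```

The equivalent lazy_static! block would wrap the same initializer in a macro; LazyLock expresses it as an ordinary static, which is what allows the dependency to be dropped from Cargo.toml.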

9. Tests

  • Refactored ~300 tests to modern naming (action‑verb style) and removed Indonesian terms.
  • Added tests for:
    • Base executable extraction
    • Small config passthrough
    • Noise omission labeling
    • Pass‑through guardrail
    • Token‑threshold distillation
    • is_passthrough() behavior
    • Agent ID default
    • Stats grouping

Notes

  • The new MIN_REDUCTION_PCT guardrail reduces unnecessary context injection, preserving prompt size.
  • Passthrough for small config files read with cat prevents inadvertent exposure of secrets via distillation.

Breaking Changes

| File | Change | Impact |
| --- | --- | --- |
| src/agents/multiagent.rs | Default agent ID changed from claude_code to terminal | Existing tooling that relies on the old default must adjust. |
| src/cli/stats.rs | agent_display_name mapping | |

Last updated: 2026-05-07 15:32:05

fajarhide added 14 commits May 7, 2026 15:29
… label noise omissions

- Updated the distillation logic to always prioritize Critical and Important signals while filling remaining lines with Context, explicitly avoiding Noise.
- Added a fallback mechanism to retain a small sample if all content is classified as Noise.
- Introduced a label in the output to indicate the number of omitted noise lines, improving clarity in the distillation results.
- Added tests to verify the correct labeling of omitted noise lines.
…ting

- Introduced a minimum token threshold for distillation, ensuring only content above 2000 tokens is processed.
- Added a function to provide content hints based on file extensions, improving token estimation accuracy.
- Updated tests to verify behavior for content below and above the token threshold.
- Introduced a fingerprinting system to cache filters, reducing redundant loading.
- Enhanced the `load_all_filters` function to utilize a mutex for thread-safe access.
- Added utility functions for extracting base executables from commands, improving command handling.
- Updated inline tests to ensure proper functionality of the new caching and extraction logic.
… handling

- Added `should_passthrough_config_output` function to determine when to bypass distillation for small config files.
- Introduced `is_passthrough` function to check for the `OMNI_PASSTHROUGH` environment variable, allowing raw output emission.
- Updated `PipelineResult` to enforce output guardrails, ensuring minimal reduction in output size.
- Enhanced `process_payload` to respect passthrough settings, returning raw input when enabled.
- Added tests to verify passthrough behavior for small config files and output guardrail logic.
…d for distillation, and update system environment snapshot output
…dernize the test suite with standardized English naming conventions
@fajarhide fajarhide merged commit 4924d8d into main May 7, 2026
4 checks passed
@fajarhide fajarhide deleted the feat/candidate-0.5.8 branch May 7, 2026 15:38