Enhance distillation logic and optimize filter handling with passthrough #80
Merged
… label noise omissions
- Updated the distillation logic to always prioritize Critical and Important signals while filling remaining lines with Context, explicitly avoiding Noise.
- Added a fallback mechanism to retain a small sample if all content is classified as Noise.
- Introduced a label in the output to indicate the number of omitted noise lines, improving clarity in the distillation results.
- Added tests to verify the correct labeling of omitted noise lines.
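The selection rule described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the `Tier` enum, the `distill` signature, and the `NOISE_SAMPLE` size are hypothetical, and only the tier names and the `[X noise lines omitted]` label come from the PR description.

```rust
/// Hypothetical signal tiers; the names mirror the PR text, not the real type.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Tier {
    Critical,
    Important,
    Context,
    Noise,
}

/// Assumed size of the fallback sample kept when every line is Noise.
const NOISE_SAMPLE: usize = 3;

/// Keep all Critical/Important lines, fill the remaining budget with Context,
/// and drop Noise while counting how many lines were omitted.
fn distill(lines: &[(Tier, &str)], max_lines: usize) -> Vec<String> {
    // Fallback: if everything is Noise, keep a small sample instead of nothing.
    if !lines.is_empty() && lines.iter().all(|(t, _)| *t == Tier::Noise) {
        return lines
            .iter()
            .take(NOISE_SAMPLE)
            .map(|(_, s)| s.to_string())
            .collect();
    }

    // Signal lines are always kept, so only the leftover budget goes to Context.
    let signal = lines
        .iter()
        .filter(|(t, _)| matches!(t, Tier::Critical | Tier::Important))
        .count();
    let mut context_budget = max_lines.saturating_sub(signal);

    let mut out = Vec::new();
    let mut omitted = 0usize;
    for (tier, text) in lines {
        match tier {
            Tier::Critical | Tier::Important => out.push(text.to_string()),
            Tier::Context if context_budget > 0 => {
                context_budget -= 1;
                out.push(text.to_string());
            }
            Tier::Context => {} // budget exhausted: Context is dropped silently
            Tier::Noise => omitted += 1,
        }
    }
    if omitted > 0 {
        out.push(format!("[{omitted} noise lines omitted]"));
    }
    out
}
```

With a budget of three lines, two signal lines leave room for one Context line; every Noise line is dropped and counted in the trailing label. Whether the real fallback sample also carries a label is not stated in the PR, so the sketch leaves it unlabeled.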
…ting
- Introduced a minimum token threshold for distillation, ensuring only content above 2000 tokens is processed.
- Added a function to provide content hints based on file extensions, improving token estimation accuracy.
- Updated tests to verify behavior for content below and above the token threshold.
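A rough sketch of how an extension-based hint might feed the threshold check. `MIN_DISTILL_TOKENS` (2000) and the name `token_estimate` appear in the PR description; the `ContentHint` enum, the extension table, the characters-per-token ratios, the function signatures, and the inclusive comparison are all assumptions made for illustration.

```rust
use std::path::Path;

/// Threshold named in the PR description.
const MIN_DISTILL_TOKENS: usize = 2000;

/// Hypothetical content classes used to tune the estimate.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ContentHint {
    Code,
    Structured,
    Prose,
}

/// Derive a hint from the file extension (illustrative extension list).
fn content_hint(path: &str) -> ContentHint {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("");
    match ext.to_ascii_lowercase().as_str() {
        "rs" | "py" | "ts" | "go" | "c" => ContentHint::Code,
        "json" | "toml" | "yaml" | "yml" => ContentHint::Structured,
        _ => ContentHint::Prose,
    }
}

/// Estimate tokens from character count; the ratios are illustrative guesses.
fn token_estimate(content: &str, hint: ContentHint) -> usize {
    let chars_per_token = match hint {
        ContentHint::Code | ContentHint::Structured => 3,
        ContentHint::Prose => 4,
    };
    // Ceiling division so a partial token still counts as one.
    (content.chars().count() + chars_per_token - 1) / chars_per_token
}

/// Content below the threshold is passed through unchanged (assumed inclusive bound).
fn should_distill(path: &str, content: &str) -> bool {
    token_estimate(content, content_hint(path)) >= MIN_DISTILL_TOKENS
}
```

Under these assumed ratios, a 9000-character text file estimates to 2250 tokens and is distilled, while a 4000-character one estimates to 1000 tokens and is passed through.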
- Introduced a fingerprinting system to cache filters, reducing redundant loading.
- Enhanced the `load_all_filters` function to utilize a mutex for thread-safe access.
- Added utility functions for extracting base executables from commands, improving command handling.
- Updated inline tests to ensure proper functionality of the new caching and extraction logic.
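The two ideas in this commit, a fingerprint-keyed cache behind a mutex and base-executable extraction, might look something like the sketch below. The function names `load_all_filters` and `extract_base_executable` come from the PR; their signatures, the `Filters` type, the `(path, mtime)` fingerprint inputs, and the whitespace-based tokenizing are assumptions, so treat this as an illustration rather than the merged code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};

/// Stand-in for the real filter type, which the PR text does not show.
type Filters = Arc<Vec<String>>;

/// Cached filters together with the fingerprint they were loaded under.
static FILTER_CACHE: Mutex<Option<(u64, Filters)>> = Mutex::new(None);

/// Hash the (path, mtime) pairs of the filter sources (assumed inputs).
fn fingerprint(sources: &[(&str, u64)]) -> u64 {
    let mut hasher = DefaultHasher::new();
    sources.hash(&mut hasher);
    hasher.finish()
}

/// Reload only when the fingerprint changes; otherwise hand back the cached Arc.
fn load_all_filters(sources: &[(&str, u64)], load: impl FnOnce() -> Vec<String>) -> Filters {
    let fp = fingerprint(sources);
    // Recover the guard even if another thread panicked while holding the lock.
    let mut cache = FILTER_CACHE.lock().unwrap_or_else(|e| e.into_inner());
    if let Some((cached_fp, filters)) = &*cache {
        if *cached_fp == fp {
            return Arc::clone(filters);
        }
    }
    let filters = Arc::new(load());
    *cache = Some((fp, Arc::clone(&filters)));
    filters
}

/// True for tokens shaped like `NAME=value` with a shell-style variable name.
fn is_env_assignment(tok: &str) -> bool {
    match tok.split_once('=') {
        Some((name, _)) => {
            !name.is_empty()
                && !name.starts_with(|c: char| c.is_ascii_digit())
                && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        None => false,
    }
}

/// Skip env prefixes, strip surrounding quotes, and return the binary's base name.
/// Simplification: quoted paths containing spaces are not handled here.
fn extract_base_executable(command: &str) -> Option<&str> {
    command
        .split_whitespace()
        .find(|tok| !is_env_assignment(tok))
        .map(|tok| tok.trim_matches(|c: char| c == '"' || c == '\''))
        .map(|tok| tok.rsplit('/').next().unwrap_or(tok))
        .filter(|tok| !tok.is_empty())
}
```

Holding the lock across `load()` keeps concurrent callers from loading the same filters twice, at the cost of serializing them during a reload. `DefaultHasher` is only stable within a single build, which is sufficient for an in-process cache key.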
… handling
- Added `should_passthrough_config_output` function to determine when to bypass distillation for small config files.
- Introduced `is_passthrough` function to check for the `OMNI_PASSTHROUGH` environment variable, allowing raw output emission.
- Updated `PipelineResult` to enforce output guardrails, ensuring minimal reduction in output size.
- Enhanced `process_payload` to respect passthrough settings, returning raw input when enabled.
- Added tests to verify passthrough behavior for small config files and output guardrail logic.
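A minimal sketch of how these three checks could fit together. The names `is_passthrough`, `should_passthrough_config_output`, `process_payload`, `OMNI_PASSTHROUGH`, and `MIN_REDUCTION_PCT` (95 %) come from the PR description; the signatures, the accepted truthy spellings, the `SMALL_CONFIG_BYTES` cutoff, and the helper names `parse_passthrough` and `apply_guardrail` are assumptions introduced here.

```rust
/// Guardrail from the PR description: output above 95 % of the input is not worth emitting.
const MIN_REDUCTION_PCT: usize = 95;

/// Assumed cutoff for what counts as a "small" config file.
const SMALL_CONFIG_BYTES: usize = 4096;

/// Parse the env value separately so it can be tested without touching the environment.
/// The accepted spellings are a guess.
fn parse_passthrough(value: Option<&str>) -> bool {
    matches!(
        value.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("1" | "true" | "yes" | "on")
    )
}

/// Check the `OMNI_PASSTHROUGH` environment variable.
fn is_passthrough() -> bool {
    parse_passthrough(std::env::var("OMNI_PASSTHROUGH").ok().as_deref())
}

/// Return the raw input when distillation saved less than 5 % of the bytes.
fn apply_guardrail<'a>(raw: &'a str, distilled: &'a str) -> &'a str {
    if distilled.len() * 100 > raw.len() * MIN_REDUCTION_PCT {
        raw
    } else {
        distilled
    }
}

/// Bypass distillation for `cat` on small config files (illustrative extension list).
fn should_passthrough_config_output(command: &str, output: &str) -> bool {
    let mut parts = command.split_whitespace();
    let is_cat = parts.next() == Some("cat");
    let is_config = parts.any(|arg| {
        [".env", ".toml", ".yaml", ".yml"]
            .iter()
            .any(|ext| arg.ends_with(*ext))
    });
    is_cat && is_config && output.len() <= SMALL_CONFIG_BYTES
}

/// Hypothetical pipeline entry point: passthrough checks first, guardrail last.
fn process_payload(command: &str, raw: &str, distill: impl Fn(&str) -> String) -> String {
    if is_passthrough() || should_passthrough_config_output(command, raw) {
        return raw.to_string();
    }
    let distilled = distill(raw);
    apply_guardrail(raw, &distilled).to_string()
}
```

For a 100-byte input, a 96-byte distilled result falls back to the raw input, while a 95-byte result is kept. Comparing scaled integers avoids floating-point rounding at the boundary.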
…d for distillation, and update system environment snapshot output
…pe display colors
…nt stats grouping in CLI output
…cording and logging
…nd define session retention constant
…dernize the test suite with standardized English naming conventions
…n synchronization
PR Auto Describe
Summary
The 0.5.8-rc1 release introduces a suite of safety and performance guardrails, a new passthrough mode, and a token-based minimum threshold so that small files skip distillation. Core pipelines now use Rust 2024 idioms (`LazyLock`, let-chains, `Cow`) and a stricter agent-attribution model that defaults to “terminal.” An extensive test-suite refactor, CI hardening, and updated i18n docs accompany the changes, along with new helper utilities for shell parsing and environment sanitization.
Key Changes
- `cargo check` pre-build diagnostics.
- `OMNI_PASSTHROUGH` env var, 2000-token min-distill threshold, `MIN_REDUCTION_PCT` guardrail.
- `GenericDistiller` noise-omission marker, token-hinted read-file logic, config-file passthrough for `cat`.
- … `lazy_static`.
- … `agent_display_name`.
- `LazyLock`, let-chains, `Cow`, strict error handling.
- `shell_split_tokens`, `extract_base_executable`, `should_passthrough_config_output`.
- `is_passthrough()` helper, `InputCheck::Warn`, and new `MAX_OUTPUT_BYTES`.
Detailed Breakdown
1. Core Pipeline (`src/hooks/pipe.rs`)
- `MIN_REDUCTION_PCT` (95 %) guardrail: if output is > 95 % of input, return raw input.
- `MAX_OUTPUT_BYTES` set to 50 kB; truncation now appends `[OMNI: output truncated]`.
- `command_name` now stripped of leading `omni exec` before routing.
- `Route::Soft` now always labels with “[Partial signal – omni learn recommended]”.
- `persist` and `emit_output` updated to use the new `command_to_use` variable.
2. Distillers (`src/distillers/mod.rs`, `generic.rs`, `readfile.rs`, `git.rs`)
- `extract_base_executable()` parses quoted/env-prefixed commands to find the real binary.
- `should_passthrough_config_output()` bypasses distillation for small config files (.env/.toml/.yaml/...).
- `GenericDistiller` now excludes Noise segments unless all content is Noise, and appends `[X noise lines omitted]` when dropping noise.
- `readfile.rs` introduces `MIN_DISTILL_TOKENS` (2000) and uses `token_estimate` with content hints to decide passthrough.
- … `lazy_static!` with `std::sync::LazyLock` for git-hash regex.
3. Guard & Env (`src/guard/env.rs`, `limits.rs`, `config.rs`, `trust.rs`, `update.rs`)
- `is_passthrough()` added; tests now cover value parsing.
- `check_input()` now returns `InputCheck::Warn` for > 1 MB but < 16 MB, triggering warning logs.
- … `unsafe` blocks per Rust-2024 edition.
- `load_config()` and route thresholds updated to handle backward compatibility.
4. Agent Detection & Stats (`src/agents/multiagent.rs`, `src/cli/stats.rs`)
- … `claude_code` → `terminal`.
- `agent_display_name` now groups untagged agents under “Terminal” and omits “Claude Code” for terminal.
5. CLI (`src/cli/learn.rs`, `src/cli/session.rs`, `src/cli/stats.rs`)
- `run_learn` now receives a reference to the filter vector.
6. Hooks (`src/hooks/dispatcher.rs`, `src/hooks/post_tool.rs`)
- `dispatcher::run` now wraps `catch_unwind` with `AssertUnwindSafe`.
- `post_tool::process_payload` respects `is_passthrough()`, skipping distillation when enabled.
- … `extract_base_executable`.
7. Documentation & Standards (`CLAUDE.md`, `CHANGELOG.md`, `README.md`, i18n READMEs)
8. Build Configuration
- … `lazy_static` from `Cargo.toml` and `Cargo.lock`; replaced with `LazyLock`.
- … `cargo fmt`, `cargo clippy -D warnings`, and `cargo test`.
9. Tests
- … `is_passthrough()` behavior
Notes
- `MIN_REDUCTION_PCT` guardrail reduces unnecessary context injection, preserving prompt size.
- `cat` small config passthrough prevents inadvertent exposure of secrets via distillation.
Breaking Changes
- `src/agents/multiagent.rs`: … `claude_code` to `terminal`
- `src/cli/stats.rs`: … `agent_display_name` mapping
Last updated: 2026-05-07 15:32:05