diff --git a/README.md b/README.md
index 507e897..07525f8 100644
--- a/README.md
+++ b/README.md
@@ -237,24 +237,28 @@ processor is in [`docs/processors/`](docs/processors/).
 | 1 | **Package List** | 15 | pip list/freeze, npm ls, conda list, gem list, brew list | [package_list.md](docs/processors/package_list.md) |
 | 2 | **Git** | 20 | status, diff, log, show, push/pull/fetch, branch, stash, reflog, blame, cherry-pick, rebase, merge | [git.md](docs/processors/git.md) |
 | 3 | **Test** | 21 | pytest, jest, vitest, mocha, cargo test, go test, rspec, phpunit, bun test, npm/yarn/pnpm test, dotnet test, swift test, mix test | [test_output.md](docs/processors/test_output.md) |
-| 4 | **Build** | 25 | npm/yarn/pnpm build/install, cargo build, make, cmake, gradle, mvn, pip install, tsc, webpack, vite, next build, turbo, nx, bazel, sbt, mix compile, docker build | [build_output.md](docs/processors/build_output.md) |
-| 5 | **Lint** | 27 | eslint, ruff, flake8, pylint, clippy, mypy, prettier, biome, shellcheck, hadolint, rubocop, golangci-lint | [lint_output.md](docs/processors/lint_output.md) |
-| 6 | **Network** | 30 | curl, wget, http/https (httpie) | [network.md](docs/processors/network.md) |
-| 7 | **Docker** | 31 | ps, images, logs, pull/push, inspect, stats, compose up/down/build/ps/logs | [docker.md](docs/processors/docker.md) |
-| 8 | **Kubernetes** | 32 | kubectl/oc get, describe, logs, top, apply, delete, create | [kubectl.md](docs/processors/kubectl.md) |
-| 9 | **Terraform** | 33 | terraform/tofu plan, apply, destroy, init, output, state list/show | [terraform.md](docs/processors/terraform.md) |
-| 10 | **Environment** | 34 | env, printenv (with secret redaction) | [env.md](docs/processors/env.md) |
-| 11 | **Search** | 35 | grep -r, rg, ag, fd, fdfind | [search.md](docs/processors/search.md) |
-| 12 | **System Info** | 36 | du, wc, df | [system_info.md](docs/processors/system_info.md) |
-| 13 | **GitHub CLI** | 37 | gh pr/issue/run list/view/diff/checks/status | [gh.md](docs/processors/gh.md) |
-| 14 | **Database Query** | 38 | psql, mysql, sqlite3, pgcli, mycli, litecli | [db_query.md](docs/processors/db_query.md) |
-| 15 | **Cloud CLI** | 39 | aws, gcloud, az (JSON/table/text output compression) | [cloud_cli.md](docs/processors/cloud_cli.md) |
-| 16 | **Ansible** | 40 | ansible-playbook, ansible (ok/skipped counting, error preservation) | [ansible.md](docs/processors/ansible.md) |
-| 17 | **Helm** | 41 | helm install/upgrade/list/template/status/history | [helm.md](docs/processors/helm.md) |
-| 18 | **Syslog** | 42 | journalctl, dmesg (head/tail with error extraction) | [syslog.md](docs/processors/syslog.md) |
-| 19 | **File Listing** | 50 | ls, find, tree, exa, eza, rsync | [file_listing.md](docs/processors/file_listing.md) |
-| 20 | **File Content** | 51 | cat, head, tail, bat, less, more (content-aware: code, config, log, CSV) | [file_content.md](docs/processors/file_content.md) |
-| 21 | **Generic** | 999 | Any command (fallback: ANSI strip, dedup, truncation) | [generic.md](docs/processors/generic.md) |
+| 4 | **Python Install** | 24 | pip install, poetry install/update/add, uv pip install, uv sync | [python_install.md](docs/processors/python_install.md) |
+| 5 | **Build** | 25 | npm/yarn/pnpm build/install, make, cmake, tsc, webpack, vite, next build, turbo, nx, bazel, sbt, mix compile, docker build | [build_output.md](docs/processors/build_output.md) |
+| 6 | **Cargo Clippy** | 26 | cargo clippy (multi-line block grouping with span/help preservation) | 
[cargo_clippy.md](docs/processors/cargo_clippy.md) | +| 7 | **Lint** | 27 | eslint, ruff, flake8, pylint, clippy, mypy, prettier, biome, shellcheck, hadolint, rubocop, golangci-lint | [lint_output.md](docs/processors/lint_output.md) | +| 8 | **Maven/Gradle** | 28 | mvn, ./mvnw, gradle, ./gradlew (download stripping, task noise removal) | [maven_gradle.md](docs/processors/maven_gradle.md) | +| 9 | **Network** | 30 | curl, wget, http/https (httpie) | [network.md](docs/processors/network.md) | +| 10 | **Docker** | 31 | ps, images, logs, pull/push, inspect, stats, compose up/down/build/ps/logs | [docker.md](docs/processors/docker.md) | +| 11 | **Kubernetes** | 32 | kubectl/oc get, describe, logs, top, apply, delete, create | [kubectl.md](docs/processors/kubectl.md) | +| 12 | **Terraform** | 33 | terraform/tofu plan, apply, destroy, init, output, state list/show | [terraform.md](docs/processors/terraform.md) | +| 13 | **Environment** | 34 | env, printenv (with secret redaction) | [env.md](docs/processors/env.md) | +| 14 | **Search** | 35 | grep -r, rg, ag, fd, fdfind | [search.md](docs/processors/search.md) | +| 15 | **System Info** | 36 | du, wc, df | [system_info.md](docs/processors/system_info.md) | +| 16 | **GitHub CLI** | 37 | gh pr/issue/run list/view/diff/checks/status | [gh.md](docs/processors/gh.md) | +| 17 | **Database Query** | 38 | psql, mysql, sqlite3, pgcli, mycli, litecli | [db_query.md](docs/processors/db_query.md) | +| 18 | **Cloud CLI** | 39 | aws, gcloud, az (JSON/table/text output compression) | [cloud_cli.md](docs/processors/cloud_cli.md) | +| 19 | **Ansible** | 40 | ansible-playbook, ansible (ok/skipped counting, error preservation) | [ansible.md](docs/processors/ansible.md) | +| 20 | **Helm** | 41 | helm install/upgrade/list/template/status/history | [helm.md](docs/processors/helm.md) | +| 21 | **Syslog** | 42 | journalctl, dmesg (head/tail with error extraction) | [syslog.md](docs/processors/syslog.md) | +| 22 | **Structured Log** | 45 | stern, kubetail (JSON Lines grouping by level) | [structured_log.md](docs/processors/structured_log.md) | +| 23 | **File Listing** | 50 | ls, find, tree, exa, eza, rsync | [file_listing.md](docs/processors/file_listing.md) | +| 24 | **File Content** | 51 | cat, head, tail, bat, less, more (content-aware: code, config, log, CSV) | [file_content.md](docs/processors/file_content.md) | +| 25 | **Generic** | 999 | Any command (fallback: ANSI strip, dedup, truncation) | [generic.md](docs/processors/generic.md) | ## Configuration @@ -341,11 +345,13 @@ Project settings are merged with global settings. Token-Saver walks up parent di | `max_traceback_lines` | 30 | Max traceback lines before truncation | | `db_prune_days` | 90 | Stats retention in days | | `user_processors_dir` | `~/.token-saver/processors/` | Directory for custom processors | +| `disabled_processors` | `[]` | List of processor names to disable (env: comma-separated) | +| `max_chain_depth` | 3 | Maximum processor chain depth | | `debug` | false | Enable debug logging | ## Custom Processors -You can extend Token-Saver with your own processors for commands not covered by the built-in 21. +You can extend Token-Saver with your own processors for commands not covered by the built-in 25. 1. Create a Python file with a class inheriting from `src.processors.base.Processor` 2. 
Implement `can_handle()`, `process()`, `name`, and set `priority` diff --git a/docs/processors/cargo.md b/docs/processors/cargo.md index c9a16ba..2236cb5 100644 --- a/docs/processors/cargo.md +++ b/docs/processors/cargo.md @@ -20,7 +20,7 @@ cargo build, cargo check, cargo doc, cargo update, cargo bench. ## Exclusions - `cargo test` is routed to `TestOutputProcessor` -- `cargo clippy` is routed to `LintOutputProcessor` +- `cargo clippy` is routed to `CargoClippyProcessor` ## Configuration diff --git a/docs/processors/cargo_clippy.md b/docs/processors/cargo_clippy.md new file mode 100644 index 0000000..d8483c4 --- /dev/null +++ b/docs/processors/cargo_clippy.md @@ -0,0 +1,39 @@ +# Cargo Clippy Processor + +**File:** `src/processors/cargo_clippy.py` | **Priority:** 26 | **Name:** `cargo_clippy` + +Dedicated processor for Rust clippy lint output with multi-line block awareness. + +## Supported Commands + +cargo clippy (with any flags like `--all-targets`, `-- -W clippy::all`). + +## Strategy + +Parses clippy's multi-line warning blocks (header + `-->` span + code + `= help:` annotations) as coherent units. Groups warnings by clippy lint rule. Shows N example blocks per rule with full context. Preserves all errors in full. + +| Output Type | Strategy | +|---|---| +| **Warnings** | Group by lint rule (e.g., `clippy::needless_return`). Show count + N example blocks per rule. Categorize as style/correctness/complexity/perf | +| **Errors** | Keep all error blocks in full with spans and context | +| **Checking/Compiling** | Collapse into count (e.g., `[12 checked, 3 compiled]`) | +| **Summary** | Keep `warning: X generated N warnings` summary line | + +## Key Difference from Lint Processor + +The generic `LintOutputProcessor` groups violations as single lines. Clippy output has multi-line blocks with `-->` spans, code snippets, and `= help:` annotations that need to be preserved as coherent units. This processor keeps the block structure intact. + +## Configuration + +| Parameter | Default | Description | +|---|---|---| +| cargo_warning_example_count | 2 | Number of example warning blocks to show per rule | +| cargo_warning_group_threshold | 3 | Minimum occurrences before warnings are grouped | + +## Chaining + +After clippy-specific processing, output is chained to the `lint` processor (`chain_to = ["lint"]`). This allows any non-clippy-specific warnings in the output to be grouped by the generic lint rule parser. + +## Fallback + +If this processor is disabled, `cargo clippy` falls back to the `LintOutputProcessor` which handles it at a line-by-line level. diff --git a/docs/processors/maven_gradle.md b/docs/processors/maven_gradle.md new file mode 100644 index 0000000..24bf25e --- /dev/null +++ b/docs/processors/maven_gradle.md @@ -0,0 +1,40 @@ +# Maven/Gradle Processor + +**File:** `src/processors/maven_gradle.py` | **Priority:** 28 | **Name:** `maven_gradle` + +Dedicated processor for Maven and Gradle build output. + +## Supported Commands + +mvn, ./mvnw, gradle, ./gradlew (all subcommands). + +## Strategy + +### Maven + +| Output Type | Strategy | +|---|---| +| **Download lines** | Strip `[INFO] Downloading from` and `[INFO] Downloaded from` lines. 
Show count | +| **Module lines** | Count `[INFO] Building module-name` lines | +| **Errors** | Keep all `[ERROR]` and `[FATAL]` lines | +| **Warnings** | Keep first 5 `[WARNING]` lines, summarize rest | +| **Test results** | Keep `Tests run: N, Failures: N` lines | +| **Reactor summary** | Keep reactor summary block | +| **Build result** | Keep `BUILD SUCCESS`/`BUILD FAILURE` and timing | + +### Gradle + +| Output Type | Strategy | +|---|---| +| **Task lines** | Strip `UP-TO-DATE`, `NO-SOURCE`, `SKIPPED`, `FROM-CACHE` tasks. Keep executed tasks. Show counts | +| **Errors** | Keep `FAILURE:` blocks, error details, `What went wrong` sections | +| **Test results** | Keep test result summary lines | +| **Build result** | Keep `BUILD SUCCESSFUL`/`BUILD FAILED` and actionable task summary | + +## Configuration + +No dedicated configuration keys. Uses default compression thresholds. + +## Removed Noise + +Maven: `[INFO] Downloading/Downloaded` lines, separator lines (`-----`), empty `[INFO]` lines. Gradle: `UP-TO-DATE`/`NO-SOURCE` task lines, progress indicators. diff --git a/docs/processors/python_install.md b/docs/processors/python_install.md new file mode 100644 index 0000000..d028b1c --- /dev/null +++ b/docs/processors/python_install.md @@ -0,0 +1,29 @@ +# Python Install Processor + +**File:** `src/processors/python_install.py` | **Priority:** 24 | **Name:** `python_install` + +Dedicated processor for Python package installation output. + +## Supported Commands + +pip install, pip3 install, poetry install/update/add, uv pip install, uv sync. + +## Strategy + +| Tool | Strategy | +|---|---| +| **pip install** | Strip `Collecting` and `Downloading` lines. Remove progress bars. Count packages installed. Show `already satisfied` count. Preserve all errors and warnings. Show installed package summary (first 10 + count) | +| **poetry install/update/add** | Strip `Resolving dependencies` progress. Count installed/updated/removed packages. Show package names with versions. Preserve errors | +| **uv pip install/sync** | Strip download progress. Keep `Resolved N packages` and `Installed N packages` summaries. Preserve errors | + +## Exclusions + +- `pip list` and `pip freeze` are routed to `PackageListProcessor` + +## Configuration + +No dedicated configuration keys. Uses default compression thresholds. + +## Removed Noise + +`Collecting X>=1.0` lines, `Downloading X-1.0.whl` lines, pip progress bars, `Installing collected packages:` line, `Using cached` lines, `Resolving dependencies...` output from poetry. diff --git a/docs/processors/structured_log.md b/docs/processors/structured_log.md new file mode 100644 index 0000000..34ff6fd --- /dev/null +++ b/docs/processors/structured_log.md @@ -0,0 +1,35 @@ +# Structured Log Processor + +**File:** `src/processors/structured_log.py` | **Priority:** 45 | **Name:** `structured_log` + +Processor for JSON Lines log output from log tailing tools. + +## Supported Commands + +stern, kubetail. + +## Strategy + +| Content Type | Strategy | +|---|---| +| **JSON Lines (>50% valid JSON)** | Parse each JSON object. Group entries by log level (error/warn/info/debug/trace). Show count per level. Extract and display error messages (up to 10). Detect level from common keys: `level`, `severity`, `log_level`, `lvl` | +| **Non-JSON output** | Fall back to log compression (head/tail with error preservation) | + +## Level Detection + +Checks these JSON keys in order: `level`, `severity`, `log_level`, `loglevel`, `lvl`, `log.level`. 
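The first key present wins, and its value is lowercased before comparison.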
Falls back to regex matching on message content for `ERROR`/`WARN` patterns. + +## Message Extraction + +Checks these JSON keys in order: `msg`, `message`, `text`, `log`, `body`. Truncates messages longer than 200 characters. + +## Configuration + +| Parameter | Default | Description | +|---|---|---| +| kubectl_keep_head | 5 | Lines to keep from start (non-JSON fallback) | +| kubectl_keep_tail | 10 | Lines to keep from end (non-JSON fallback) | + +## Future Use + +This processor can be activated via `chain_to` from other processors for outputs that contain embedded JSON Lines. diff --git a/src/config.py b/src/config.py index 633adb9..5eb3fda 100644 --- a/src/config.py +++ b/src/config.py @@ -51,6 +51,8 @@ "cargo_warning_example_count": 2, "cargo_warning_group_threshold": 3, "jq_passthrough_threshold": 50, + "disabled_processors": [], + "max_chain_depth": 3, "debug": False, } @@ -129,6 +131,8 @@ def _load_config() -> dict[str, Any]: elif isinstance(default_val, float): with contextlib.suppress(ValueError): config[key] = float(env_val) + elif isinstance(default_val, list): + config[key] = [s.strip() for s in env_val.split(",") if s.strip()] else: config[key] = env_val config.setdefault("_config_source", {})[key] = f"env:{env_key}" diff --git a/src/engine.py b/src/engine.py index abd8a79..eed63d5 100644 --- a/src/engine.py +++ b/src/engine.py @@ -23,7 +23,12 @@ class CompressionEngine: _by_name: dict[str, Processor] def __init__(self) -> None: - self.processors = discover_processors() + all_processors = discover_processors() + raw_disabled = config.get("disabled_processors") or [] + disabled = set(raw_disabled if isinstance(raw_disabled, list) else []) + # Never disable generic — it's the fallback and provides clean() + disabled.discard("generic") + self.processors = [p for p in all_processors if p.name not in disabled] self._generic = self.processors[-1] # Last = GenericProcessor (priority 999) self._by_name = {p.name: p for p in self.processors} @@ -51,16 +56,25 @@ def compress(self, command: str, output: str) -> tuple[str, str, bool]: if compressed is output or compressed == output: return output, processor.name, False - # Chain to secondary processor if declared (max depth = 1) - if ( - processor.chain_to - and processor.chain_to != processor.name - and processor.chain_to in self._by_name - ): - secondary = self._by_name[processor.chain_to] - chained = secondary.process(command, compressed) - if chained is not compressed and chained != compressed: - compressed = chained + # Chain to secondary processors if declared + chain_list = processor.chain_to + if chain_list: + if isinstance(chain_list, str): + chain_list = [chain_list] + max_depth = config.get("max_chain_depth") + visited = {processor.name} + depth = 0 + for chain_name in chain_list: + if depth >= max_depth: + break + if chain_name in visited or chain_name not in self._by_name: + continue + secondary = self._by_name[chain_name] + visited.add(chain_name) + chained = secondary.process(command, compressed) + if chained is not compressed and chained != compressed: + compressed = chained + depth += 1 # If a specialized processor handled it, also run generic # cleanup (ANSI strip, blank line collapse) but not truncation diff --git a/src/processors/__init__.py b/src/processors/__init__.py index b1e7eec..0a72645 100644 --- a/src/processors/__init__.py +++ b/src/processors/__init__.py @@ -108,8 +108,14 @@ def collect_hook_patterns() -> list[str]: """Collect all hook_patterns from discovered processors. 
Returns a flat list of regex pattern strings, used by hook_pretool.py. + Disabled processors are excluded so their commands are not intercepted. """ + from .. import config # noqa: PLC0415 + + raw_disabled = config.get("disabled_processors") or [] + disabled = set(raw_disabled if isinstance(raw_disabled, list) else []) patterns: list[str] = [] for processor in discover_processors(): - patterns.extend(processor.hook_patterns) + if processor.name not in disabled: + patterns.extend(processor.hook_patterns) return patterns diff --git a/src/processors/base.py b/src/processors/base.py index 3d84861..67c8f59 100644 --- a/src/processors/base.py +++ b/src/processors/base.py @@ -19,7 +19,7 @@ class Processor(ABC): priority: int = 50 hook_patterns: list[str] = [] - chain_to: str | None = None + chain_to: str | list[str] | None = None @abstractmethod def can_handle(self, command: str) -> bool: diff --git a/src/processors/build_output.py b/src/processors/build_output.py index be0fbb9..55ca72c 100644 --- a/src/processors/build_output.py +++ b/src/processors/build_output.py @@ -9,10 +9,9 @@ class BuildOutputProcessor(Processor): priority = 25 hook_patterns = [ r"^(npm\s+(run|install|build|ci|audit)|yarn\s+(run|install|build|add|audit)|pnpm\s+(run|install|build|add|audit))\b", - r"^(cargo\s+(build|check)|make\b|cmake\b|gradle\b|mvn\b|ant\b)", - r"^(pip3?\s+install|poetry\s+(install|update)|uv\s+(pip|sync))\b", + r"^(make|cmake|ant)\b", r"^(tsc|webpack|vite(\s+build)?|esbuild|rollup|next\s+build|nuxt\s+build)\b", - r"^(turbo\s+run|turbo\s+build|nx\s+(run|build)|bazel\s+build|sbt\s|mix\s+compile)\b", + r"^(turbo\s+run|turbo\s+build|nx\s+(run|build)|bazel\s+build|sbt\b|mix\s+compile)\b", r"^docker\s+(build|compose\s+build)\b", r"^bun\s+(install|build|run)\b", ] @@ -25,17 +24,19 @@ def can_handle(self, command: str) -> bool: # Exclude package listing commands (handled by PackageListProcessor) if re.search(r"\b(pip3?\s+(list|freeze)|npm\s+(ls|list)|conda\s+list)\b", command): return False - # Exclude cargo clippy (handled by LintOutputProcessor) - if re.search(r"\bcargo\s+clippy\b", command): + # Exclude Python install (handled by PythonInstallProcessor) + if re.search( + r"\b(pip3?\s+install|poetry\s+(install|update|add)|uv\s+(pip\s+install|sync))\b", + command, + ): return False - # Exclude cargo build/check (handled by CargoProcessor) - if re.search(r"\bcargo\s+(build|check)\b", command): + # Exclude Maven/Gradle (handled by MavenGradleProcessor) + if re.search(r"\b(mvn|mvnw|gradle|gradlew)\b", command): return False return bool( re.search( r"\b(npm\s+(run|install|ci|build|audit)|yarn\s+(run|install|build|add|audit)|pnpm\s+(run|install|build|add|audit)|" - r"cargo\s+(build|check)|make\b|cmake\b|gradle\b|mvn\b|ant\b|" - r"pip3?\s+install|poetry\s+(install|update)|uv\s+(pip|sync)|" + r"make\b|cmake\b|ant\b|" r"tsc\b|webpack\b|vite(\s+build)?|esbuild\b|rollup\b|next\s+build|nuxt\s+build|" r"docker\s+(build|compose\s+build)|" r"turbo\s+(run|build)|nx\s+(run|build)|bazel\s+build|sbt\b|mix\s+compile|" diff --git a/src/processors/cargo_clippy.py b/src/processors/cargo_clippy.py new file mode 100644 index 0000000..cfe83fd --- /dev/null +++ b/src/processors/cargo_clippy.py @@ -0,0 +1,189 @@ +"""Cargo clippy processor: dedicated Rust clippy lint handling.""" + +import re +from collections import defaultdict + +from .. 
import config +from .base import Processor + +_CLIPPY_CMD_RE = re.compile(r"\bcargo\s+clippy\b") +_WARNING_START_RE = re.compile(r"^warning(?:\[(\S+)\])?:\s+(.+)") +_ERROR_START_RE = re.compile(r"^error(?:\[(\S+)\])?:\s+(.+)") +_SPAN_LINE_RE = re.compile(r"^\s*(-->|\d+\s*\||=\s+)") +_WARNING_SUMMARY_RE = re.compile(r"^warning:\s+.+generated\s+\d+\s+warning") +_FINISHED_RE = re.compile(r"^\s*Finished\s+") +_CHECKING_RE = re.compile(r"^\s*Checking\s+\S+\s+v") +_COMPILING_RE = re.compile(r"^\s*Compiling\s+\S+\s+v") + +# Clippy lint categories +_CLIPPY_CATEGORIES = { + "needless_return": "style", + "redundant_closure": "style", + "len_zero": "style", + "manual_map": "style", + "single_match": "style", + "match_bool": "style", + "collapsible_if": "style", + "unused_imports": "correctness", + "unused_variables": "correctness", + "dead_code": "correctness", + "unreachable_code": "correctness", + "needless_borrow": "complexity", + "unnecessary_unwrap": "complexity", + "map_unwrap_or": "complexity", + "clone_on_copy": "perf", + "large_enum_variant": "perf", + "box_collection": "perf", +} + + +def _categorize_lint(rule: str) -> str: + """Categorize a clippy lint by its rule name.""" + # Strip clippy:: prefix if present + short = rule.replace("clippy::", "") + return _CLIPPY_CATEGORIES.get(short, "other") + + +class CargoClippyProcessor(Processor): + priority = 26 + chain_to = ["lint"] + hook_patterns = [ + r"^cargo\s+clippy\b", + ] + + @property + def name(self) -> str: + return "cargo_clippy" + + def can_handle(self, command: str) -> bool: + return bool(_CLIPPY_CMD_RE.search(command)) + + def process(self, command: str, output: str) -> str: + if not output or not output.strip(): + return output + + lines = output.splitlines() + result: list[str] = [] + checking_count = 0 + compiling_count = 0 + + # Parse warnings as multi-line blocks + warnings_by_rule: dict[str, list[list[str]]] = defaultdict(list) + error_blocks: list[list[str]] = [] + current_block: list[str] = [] + current_rule: str | None = None + in_error = False + current_error: list[str] = [] + finished_lines: list[str] = [] + summary_lines: list[str] = [] + + for line in lines: + stripped = line.strip() + + if _CHECKING_RE.match(stripped): + checking_count += 1 + continue + if _COMPILING_RE.match(stripped): + compiling_count += 1 + continue + + # Error start + if _ERROR_START_RE.match(stripped): + # Flush current warning block + if current_rule and current_block: + warnings_by_rule[current_rule].append(current_block) + current_block = [] + current_rule = None + # Start error block + if in_error and current_error: + error_blocks.append(current_error) + in_error = True + current_error = [line] + continue + + # Warning start + wm = _WARNING_START_RE.match(stripped) + if wm and not _WARNING_SUMMARY_RE.match(stripped): + # Flush previous + if in_error and current_error: + error_blocks.append(current_error) + in_error = False + current_error = [] + if current_rule and current_block: + warnings_by_rule[current_rule].append(current_block) + + rule = wm.group(1) or "other" + current_rule = rule + current_block = [line] + continue + + if _WARNING_SUMMARY_RE.match(stripped): + if current_rule and current_block: + warnings_by_rule[current_rule].append(current_block) + current_block = [] + current_rule = None + if in_error and current_error: + error_blocks.append(current_error) + in_error = False + current_error = [] + summary_lines.append(line) + continue + + if _FINISHED_RE.match(stripped): + if current_rule and current_block: + 
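+                    # a Finished line ends collection: file the still-open warning block under its rule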
warnings_by_rule[current_rule].append(current_block) + current_block = [] + current_rule = None + if in_error and current_error: + error_blocks.append(current_error) + in_error = False + current_error = [] + finished_lines.append(line) + continue + + # Context lines (spans, code, help annotations) + if in_error: + current_error.append(line) + elif current_rule: + current_block.append(line) + + # Flush remaining + if in_error and current_error: + error_blocks.append(current_error) + if current_rule and current_block: + warnings_by_rule[current_rule].append(current_block) + + # Build compressed output + prep = [] + if checking_count: + prep.append(f"{checking_count} checked") + if compiling_count: + prep.append(f"{compiling_count} compiled") + if prep: + result.append(f"[{', '.join(prep)}]") + + # All errors (kept in full) + for block in error_blocks: + result.extend(block) + + # Grouped warnings by rule + example_count = config.get("cargo_warning_example_count") + group_threshold = config.get("cargo_warning_group_threshold") + + for rule, blocks in sorted(warnings_by_rule.items(), key=lambda x: -len(x[1])): + count = len(blocks) + category = _categorize_lint(rule) + if count >= group_threshold: + result.append(f"warning[{rule}] ({category}): {count} occurrences") + for block in blocks[:example_count]: + result.extend(f" {bline}" for bline in block) + if count > example_count: + result.append(f" ... ({count - example_count} more)") + else: + for block in blocks: + result.extend(block) + + result.extend(summary_lines) + result.extend(finished_lines) + + return "\n".join(result) if result else output diff --git a/src/processors/lint_output.py b/src/processors/lint_output.py index 96ed45a..8577ffb 100644 --- a/src/processors/lint_output.py +++ b/src/processors/lint_output.py @@ -10,9 +10,9 @@ class LintOutputProcessor(Processor): priority = 27 hook_patterns = [ - r"^(eslint|ruff(\s+check)?|flake8|pylint|clippy|rubocop|golangci-lint|stylelint|biome\s+(check|lint))\b", + r"^(eslint|ruff(\s+check)?|flake8|pylint|rubocop|golangci-lint|stylelint|biome\s+(check|lint))\b", r"^python3?\s+-m\s+(flake8|pylint|ruff|mypy)\b", - r"^(mypy|prettier\s+--check|shellcheck|hadolint|tflint|ktlint|swiftlint|cargo\s+clippy)\b", + r"^(mypy|prettier\s+--check|shellcheck|hadolint|tflint|ktlint|swiftlint)\b", r"^(oxlint|deno\s+lint)\b", ] diff --git a/src/processors/maven_gradle.py b/src/processors/maven_gradle.py new file mode 100644 index 0000000..22992fb --- /dev/null +++ b/src/processors/maven_gradle.py @@ -0,0 +1,224 @@ +"""Maven/Gradle processor: mvn, gradle, gradlew, mvnw builds.""" + +import re + +from .base import Processor + +_MVN_RE = re.compile(r"\b(mvn|\.?/?mvnw)\b") +_GRADLE_RE = re.compile(r"\b(gradle|\.?/?gradlew)\b") + +# Maven patterns +_MVN_DOWNLOAD_RE = re.compile(r"^\[INFO\]\s+(Downloading|Downloaded)\s+from\s+") +_MVN_MODULE_RE = re.compile(r"^\[INFO\]\s+Building\s+(.+?)\s+\[") +_MVN_SEPARATOR_RE = re.compile(r"^\[INFO\]\s+-{10,}") +_MVN_ERROR_RE = re.compile(r"^\[(ERROR|FATAL)\]") +_MVN_WARNING_RE = re.compile(r"^\[WARNING\]") +_MVN_BUILD_RESULT_RE = re.compile(r"^\[INFO\]\s+(BUILD\s+(SUCCESS|FAILURE))") +_MVN_TEST_RESULT_RE = re.compile(r"^\[INFO\]\s+Tests run:\s+(\d+)") +_MVN_REACTOR_RE = re.compile(r"^\[INFO\]\s+Reactor Summary") +_MVN_TOTAL_TIME_RE = re.compile(r"^\[INFO\]\s+Total time:") +_MVN_EMPTY_INFO_RE = re.compile(r"^\[INFO\]\s*$") + +# Gradle patterns +_GRADLE_TASK_RE = re.compile(r"^>\s+Task\s+:(\S+)") +_GRADLE_UPTODATE_RE = 
re.compile(r"\b(UP-TO-DATE|NO-SOURCE|SKIPPED|FROM-CACHE)\s*$") +_GRADLE_BUILD_RESULT_RE = re.compile(r"^(BUILD\s+(SUCCESSFUL|FAILED))") +_GRADLE_ACTIONABLE_RE = re.compile(r"^\d+\s+actionable\s+task") +_GRADLE_ERROR_RE = re.compile( + r"^(FAILURE:|>\s+.*[Ee]rror|e:\s+|" + r"\s+What went wrong|\s+Execution failed)" +) +_GRADLE_TEST_RESULT_RE = re.compile(r"^\d+\s+tests?\s+(completed|passed|failed)") + + +class MavenGradleProcessor(Processor): + priority = 28 + hook_patterns = [ + r"^(\.?/?mvnw?|\.?/?gradlew?)\b", + ] + + @property + def name(self) -> str: + return "maven_gradle" + + def can_handle(self, command: str) -> bool: + return bool(_MVN_RE.search(command) or _GRADLE_RE.search(command)) + + def process(self, command: str, output: str) -> str: + if not output or not output.strip(): + return output + + if _GRADLE_RE.search(command): + return self._process_gradle(output) + return self._process_maven(output) + + def _process_maven(self, output: str) -> str: + lines = output.splitlines() + result: list[str] = [] + download_count = 0 + module_count = 0 + errors: list[str] = [] + warnings: list[str] = [] + test_results: list[str] = [] + in_reactor = False + reactor_lines: list[str] = [] + build_result = "" + timing_line = "" + separator_count = 0 + + for line in lines: + stripped = line.strip() + + if _MVN_DOWNLOAD_RE.match(stripped): + download_count += 1 + continue + + if _MVN_MODULE_RE.match(stripped): + module_count += 1 + continue + + if _MVN_SEPARATOR_RE.match(stripped): + separator_count += 1 + continue + + if _MVN_EMPTY_INFO_RE.match(stripped): + continue + + if _MVN_REACTOR_RE.match(stripped): + in_reactor = True + reactor_lines.append(line) + continue + + if in_reactor: + if _MVN_BUILD_RESULT_RE.match(stripped) or _MVN_TOTAL_TIME_RE.match(stripped): + in_reactor = False + else: + reactor_lines.append(line) + continue + + if _MVN_BUILD_RESULT_RE.match(stripped): + build_result = line + continue + + if _MVN_TOTAL_TIME_RE.match(stripped): + timing_line = line + continue + + if _MVN_TEST_RESULT_RE.match(stripped): + test_results.append(line) + continue + + if _MVN_ERROR_RE.match(stripped): + errors.append(line) + continue + + if _MVN_WARNING_RE.match(stripped): + warnings.append(line) + continue + + # Build compressed output + summary_parts = [] + if module_count: + summary_parts.append(f"{module_count} modules") + if download_count: + summary_parts.append(f"{download_count} downloads") + if summary_parts: + result.append(f"[{', '.join(summary_parts)}]") + + if errors: + result.extend(errors) + + if warnings: + if len(warnings) > 5: + result.extend(warnings[:5]) + result.append(f"... 
({len(warnings) - 5} more warnings)") + else: + result.extend(warnings) + + if test_results: + result.extend(test_results) + + if reactor_lines: + result.extend(reactor_lines) + + if build_result: + result.append(build_result) + if timing_line: + result.append(timing_line) + + return "\n".join(result) if result else output + + def _process_gradle(self, output: str) -> str: + lines = output.splitlines() + result: list[str] = [] + skipped_tasks = 0 + executed_tasks: list[str] = [] + errors: list[str] = [] + test_results: list[str] = [] + build_result = "" + actionable_line = "" + in_error_block = False + + for line in lines: + stripped = line.strip() + + m = _GRADLE_TASK_RE.match(stripped) + if m: + if _GRADLE_UPTODATE_RE.search(stripped): + skipped_tasks += 1 + else: + executed_tasks.append(m.group(1)) + in_error_block = False + continue + + if _GRADLE_BUILD_RESULT_RE.match(stripped): + build_result = line + in_error_block = False + continue + + if _GRADLE_ACTIONABLE_RE.match(stripped): + actionable_line = line + continue + + if _GRADLE_TEST_RESULT_RE.match(stripped): + test_results.append(line) + continue + + if _GRADLE_ERROR_RE.match(stripped): + in_error_block = True + errors.append(line) + continue + + if in_error_block and stripped: + errors.append(line) + continue + + # Build compressed output + summary_parts = [] + if executed_tasks: + summary_parts.append(f"{len(executed_tasks)} executed") + if skipped_tasks: + summary_parts.append(f"{skipped_tasks} up-to-date") + if summary_parts: + result.append(f"Tasks: {', '.join(summary_parts)}") + + if executed_tasks and len(executed_tasks) <= 10: + for task in executed_tasks: + result.append(f" :{task}") + elif executed_tasks: + for task in executed_tasks[:5]: + result.append(f" :{task}") + result.append(f" ... 
({len(executed_tasks) - 5} more)") + + if errors: + result.extend(errors) + + if test_results: + result.extend(test_results) + + if build_result: + result.append(build_result) + if actionable_line: + result.append(actionable_line) + + return "\n".join(result) if result else output diff --git a/src/processors/python_install.py b/src/processors/python_install.py new file mode 100644 index 0000000..739b19c --- /dev/null +++ b/src/processors/python_install.py @@ -0,0 +1,216 @@ +"""Python install processor: pip install, poetry install/update/add, uv pip install/sync.""" + +import re + +from .base import Processor + +_PIP_INSTALL_RE = re.compile(r"\bpip3?\s+install\b") +_POETRY_RE = re.compile(r"\bpoetry\s+(install|update|add)\b") +_UV_RE = re.compile(r"\buv\s+(pip\s+install|sync)\b") + +_COLLECTING_RE = re.compile(r"^\s*Collecting\s+") +_DOWNLOADING_RE = re.compile(r"^\s*(Downloading|Using cached)\s+") +_PROGRESS_RE = re.compile(r"^\s*━|^\s*\[.*\]\s+\d+%|^\s*\d+\.\d+\s*(kB|MB|GB)") +_ALREADY_RE = re.compile(r"^\s*Requirement already satisfied") +_INSTALLING_RE = re.compile(r"^\s*Installing collected packages:") +_SUCCESS_RE = re.compile(r"^\s*Successfully installed\s+(.+)") +_RESOLVING_RE = re.compile(r"^\s*(Resolving dependencies|Updating dependencies)") +_POETRY_INSTALL_RE = re.compile(r"^\s*(Installing|Updating|Removing)\s+(\S+)\s+\((.+?)\)") +_UV_RESOLVED_RE = re.compile(r"^\s*Resolved\s+(\d+)\s+packages?") +_UV_INSTALLED_RE = re.compile(r"^\s*(Installed|Uninstalled)\s+(\d+)\s+packages?") +_ERROR_RE = re.compile( + r"\b(error|Error|ERROR|exception|Exception|" + r"Could not|cannot|Cannot|FAILED|failed|" + r"conflict|Conflict|incompatible)\b" +) +_WARNING_RE = re.compile(r"\b(warning|Warning|WARNING|DEPRECATION)\b") + + +class PythonInstallProcessor(Processor): + priority = 24 + hook_patterns = [ + r"^(pip3?\s+install|poetry\s+(install|update|add)|uv\s+(pip\s+install|sync))\b", + ] + + @property + def name(self) -> str: + return "python_install" + + def can_handle(self, command: str) -> bool: + if re.search(r"\bpip3?\s+(list|freeze)\b", command): + return False + return bool( + _PIP_INSTALL_RE.search(command) or _POETRY_RE.search(command) or _UV_RE.search(command) + ) + + def process(self, command: str, output: str) -> str: + if not output or not output.strip(): + return output + + if _POETRY_RE.search(command): + return self._process_poetry(output) + if _UV_RE.search(command): + return self._process_uv(output) + return self._process_pip(output) + + def _process_pip(self, output: str) -> str: + lines = output.splitlines() + result: list[str] = [] + collecting_count = 0 + downloading_count = 0 + already_count = 0 + installed_packages: list[str] = [] + errors: list[str] = [] + warnings: list[str] = [] + + for line in lines: + stripped = line.strip() + if not stripped: + continue + + if _COLLECTING_RE.match(stripped): + collecting_count += 1 + elif _DOWNLOADING_RE.match(stripped) or _PROGRESS_RE.match(stripped): + downloading_count += 1 + elif _ALREADY_RE.match(stripped): + already_count += 1 + elif _INSTALLING_RE.match(stripped): + continue + elif m := _SUCCESS_RE.match(stripped): + pkgs = m.group(1).split() + installed_packages.extend(pkgs) + elif _ERROR_RE.search(stripped): + errors.append(line) + elif _WARNING_RE.search(stripped): + warnings.append(line) + + if collecting_count: + result.append(f"[{collecting_count} packages collected]") + if downloading_count: + result.append(f"[{downloading_count} downloads]") + if already_count: + result.append(f"[{already_count} already satisfied]") + + 
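+        # output order: errors first, then capped warnings, then the installed-package summary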
if errors: + result.extend(errors) + + if warnings: + result.extend(warnings[:5]) + if len(warnings) > 5: + result.append(f"... ({len(warnings) - 5} more warnings)") + + if installed_packages: + result.append(f"Successfully installed {len(installed_packages)} packages:") + # Show first 10 packages, summarize rest + for pkg in installed_packages[:10]: + result.append(f" {pkg}") + if len(installed_packages) > 10: + result.append(f" ... ({len(installed_packages) - 10} more)") + + return "\n".join(result) if result else output + + def _process_poetry(self, output: str) -> str: + lines = output.splitlines() + result: list[str] = [] + installed: list[str] = [] + updated: list[str] = [] + removed: list[str] = [] + errors: list[str] = [] + resolving_skipped = 0 + + for line in lines: + stripped = line.strip() + if not stripped: + continue + + if _RESOLVING_RE.match(stripped): + resolving_skipped += 1 + continue + + m = _POETRY_INSTALL_RE.match(stripped) + if m: + action, pkg, version = m.groups() + if action == "Installing": + installed.append(f"{pkg} ({version})") + elif action == "Updating": + updated.append(f"{pkg} ({version})") + elif action == "Removing": + removed.append(pkg) + continue + + if _ERROR_RE.search(stripped): + errors.append(line) + + if resolving_skipped: + result.append(f"[dependency resolution: {resolving_skipped} steps]") + + if errors: + result.extend(errors) + + if installed: + result.append(f"Installed {len(installed)} packages:") + for pkg in installed[:10]: + result.append(f" {pkg}") + if len(installed) > 10: + result.append(f" ... ({len(installed) - 10} more)") + + if updated: + result.append(f"Updated {len(updated)} packages:") + for pkg in updated[:5]: + result.append(f" {pkg}") + if len(updated) > 5: + result.append(f" ... ({len(updated) - 5} more)") + + if removed: + result.append(f"Removed {len(removed)} packages") + + return "\n".join(result) if result else output + + def _process_uv(self, output: str) -> str: + lines = output.splitlines() + result: list[str] = [] + errors: list[str] = [] + resolved = 0 + installed = 0 + uninstalled = 0 + downloading_count = 0 + + for line in lines: + stripped = line.strip() + if not stripped: + continue + + m = _UV_RESOLVED_RE.match(stripped) + if m: + resolved = int(m.group(1)) + continue + + m = _UV_INSTALLED_RE.match(stripped) + if m: + action = m.group(1) + count = int(m.group(2)) + if action == "Installed": + installed = count + else: + uninstalled = count + continue + + if _DOWNLOADING_RE.match(stripped) or _PROGRESS_RE.match(stripped): + downloading_count += 1 + continue + + if _ERROR_RE.search(stripped): + errors.append(line) + + if resolved: + result.append(f"Resolved {resolved} packages") + if downloading_count: + result.append(f"[{downloading_count} downloads]") + if errors: + result.extend(errors) + if installed: + result.append(f"Installed {installed} packages") + if uninstalled: + result.append(f"Uninstalled {uninstalled} packages") + + return "\n".join(result) if result else output diff --git a/src/processors/structured_log.py b/src/processors/structured_log.py new file mode 100644 index 0000000..a7f5012 --- /dev/null +++ b/src/processors/structured_log.py @@ -0,0 +1,159 @@ +"""Structured log processor: JSON Lines output from stern, kubetail, and similar tools.""" + +import json +import re +from collections import defaultdict + +from .. 
import config +from .base import Processor +from .utils import compress_log_lines + +_STERN_RE = re.compile(r"\b(stern|kubetail)\b") + +# Common JSON log level keys +_LEVEL_KEYS = ("level", "severity", "log_level", "loglevel", "lvl", "log.level") +_MESSAGE_KEYS = ("msg", "message", "text", "log", "body") +_TIMESTAMP_KEYS = ("timestamp", "time", "ts", "@timestamp", "datetime", "date") + +_ERROR_LEVELS = {"error", "fatal", "critical", "panic", "err", "crit", "emerg", "alert"} +_WARN_LEVELS = {"warn", "warning"} + + +class StructuredLogProcessor(Processor): + priority = 45 + hook_patterns = [ + r"^(stern|kubetail)\b", + ] + + @property + def name(self) -> str: + return "structured_log" + + def can_handle(self, command: str) -> bool: + return bool(_STERN_RE.search(command)) + + def process(self, command: str, output: str) -> str: + if not output or not output.strip(): + return output + + lines = output.splitlines() + if len(lines) < 5: + return output + + # Try to parse as JSON lines + parsed_lines: list[dict | None] = [] + json_count = 0 + for line in lines: + stripped = line.strip() + if not stripped: + parsed_lines.append(None) + continue + try: + obj = json.loads(stripped) + if isinstance(obj, dict): + parsed_lines.append(obj) + json_count += 1 + else: + parsed_lines.append(None) + except (json.JSONDecodeError, ValueError): + parsed_lines.append(None) + + non_empty = sum(1 for line in lines if line.strip()) + # If less than 50% lines are JSON objects, fall back to log compression + if non_empty == 0 or json_count / non_empty < 0.5: + keep_head = config.get("kubectl_keep_head") + keep_tail = config.get("kubectl_keep_tail") + return compress_log_lines(lines, keep_head=keep_head, keep_tail=keep_tail) + + return self._process_json_lines(lines, parsed_lines) + + def _process_json_lines(self, raw_lines: list[str], parsed: list[dict | None]) -> str: + # Group by level + level_counts: dict[str, int] = defaultdict(int) + error_lines: list[str] = [] + total = 0 + + for i, obj in enumerate(parsed): + if obj is None: + continue + total += 1 + level = self._extract_level(obj) + level_counts[level] += 1 + + if level in _ERROR_LEVELS: + msg = self._extract_message(obj) + if msg: + error_lines.append(f" [{level.upper()}] {msg}") + else: + # Keep raw line but truncate + raw = raw_lines[i].strip() + if len(raw) > 200: + raw = raw[:197] + "..." + error_lines.append(f" {raw}") + + result = [f"{total} log entries:"] + + # Level summary + for level in ( + "error", + "fatal", + "critical", + "panic", + "warn", + "warning", + "info", + "debug", + "trace", + ): + if level in level_counts: + result.append(f" {level}: {level_counts[level]}") + + # Other levels not in the standard list + for level, count in sorted(level_counts.items(), key=lambda x: -x[1]): + if level not in ( + "error", + "fatal", + "critical", + "panic", + "warn", + "warning", + "info", + "debug", + "trace", + ): + result.append(f" {level}: {count}") + + # Show error messages + if error_lines: + result.append(f"\nErrors ({len(error_lines)}):") + max_errors = 10 + result.extend(error_lines[:max_errors]) + if len(error_lines) > max_errors: + result.append(f" ... 
({len(error_lines) - max_errors} more)") + + return "\n".join(result) + + def _extract_level(self, obj: dict) -> str: + """Extract log level from a JSON log entry.""" + for key in _LEVEL_KEYS: + if key in obj: + val = str(obj[key]).lower().strip() + return val + # Fallback: look for common patterns in message + msg = self._extract_message(obj) + if msg: + if re.search(r"\b(ERROR|FATAL|PANIC)\b", msg): + return "error" + if re.search(r"\bWARN(ING)?\b", msg): + return "warn" + return "unknown" + + def _extract_message(self, obj: dict) -> str: + """Extract message from a JSON log entry.""" + for key in _MESSAGE_KEYS: + if key in obj: + val = str(obj[key]) + if len(val) > 200: + val = val[:197] + "..." + return val + return "" diff --git a/tests/test_config.py b/tests/test_config.py index 49dc511..d8b30e5 100644 --- a/tests/test_config.py +++ b/tests/test_config.py @@ -54,6 +54,47 @@ def test_env_override_bool(self): del os.environ["TOKEN_SAVER_DEBUG"] config.reload() + def test_default_disabled_processors(self, monkeypatch): + for key in list(os.environ): + if key.startswith("TOKEN_SAVER_"): + monkeypatch.delenv(key) + config.reload() + assert config.get("disabled_processors") == [] + + def test_env_override_list(self): + os.environ["TOKEN_SAVER_DISABLED_PROCESSORS"] = "git,docker" # noqa: S105 + config.reload() + try: + assert config.get("disabled_processors") == ["git", "docker"] + finally: + del os.environ["TOKEN_SAVER_DISABLED_PROCESSORS"] + config.reload() + + def test_env_override_list_single_value(self): + os.environ["TOKEN_SAVER_DISABLED_PROCESSORS"] = "git" # noqa: S105 + config.reload() + try: + assert config.get("disabled_processors") == ["git"] + finally: + del os.environ["TOKEN_SAVER_DISABLED_PROCESSORS"] + config.reload() + + def test_default_max_chain_depth(self, monkeypatch): + for key in list(os.environ): + if key.startswith("TOKEN_SAVER_"): + monkeypatch.delenv(key) + config.reload() + assert config.get("max_chain_depth") == 3 + + def test_env_override_list_empty_string(self): + os.environ["TOKEN_SAVER_DISABLED_PROCESSORS"] = "" + config.reload() + try: + assert config.get("disabled_processors") == [] + finally: + del os.environ["TOKEN_SAVER_DISABLED_PROCESSORS"] + config.reload() + def test_invalid_env_value_ignored(self): os.environ["TOKEN_SAVER_MIN_INPUT_LENGTH"] = "not_a_number" # noqa: S105 config.reload() diff --git a/tests/test_engine.py b/tests/test_engine.py index 765a2b1..f59fc9d 100644 --- a/tests/test_engine.py +++ b/tests/test_engine.py @@ -210,9 +210,9 @@ class TestProcessorRegistry: """Tests for auto-discovery and the processor registry.""" def test_discover_processors_finds_all(self): - """Auto-discovery should find all 25 processors.""" + """Auto-discovery should find all 29 processors.""" processors = discover_processors() - assert len(processors) == 25 + assert len(processors) == 29 def test_discover_processors_sorted_by_priority(self): """Processors must be returned in ascending priority order.""" @@ -247,8 +247,11 @@ def test_expected_priority_order(self): assert name_to_priority["test"] == 21 assert name_to_priority["cargo"] == 22 assert name_to_priority["go"] == 23 + assert name_to_priority["python_install"] == 24 assert name_to_priority["build"] == 25 + assert name_to_priority["cargo_clippy"] == 26 assert name_to_priority["lint"] == 27 + assert name_to_priority["maven_gradle"] == 28 assert name_to_priority["network"] == 30 assert name_to_priority["docker"] == 31 assert name_to_priority["kubectl"] == 32 @@ -264,6 +267,7 @@ def 
test_expected_priority_order(self): assert name_to_priority["syslog"] == 42 assert name_to_priority["ssh"] == 43 assert name_to_priority["jq_yq"] == 44 + assert name_to_priority["structured_log"] == 45 assert name_to_priority["file_listing"] == 50 assert name_to_priority["file_content"] == 51 assert name_to_priority["generic"] == 999 @@ -417,6 +421,25 @@ def test_collect_hook_patterns_covers_key_commands(self): "yq . config.yaml", "ssh host 'ls -la'", "scp file.txt host:/tmp/", + # Python install (dedicated processor) + "pip install flask", + "pip3 install -r requirements.txt", + "poetry install", + "poetry update", + "poetry add requests", + "uv pip install flask", + "uv sync", + # Cargo clippy (dedicated processor) + "cargo clippy", + # Maven/Gradle (dedicated processor) + "mvn clean install", + "mvn package", + "./mvnw verify", + "gradle build", + "./gradlew assemble", + # Structured log + "stern my-pod", + "kubetail my-service", ] for cmd in test_commands: @@ -433,6 +456,78 @@ def test_engine_uses_discovered_processors(self): assert ep.priority == dp.priority +class TestDisabledProcessors: + """Tests for per-processor enable/disable.""" + + def test_disabled_processor_excluded(self, monkeypatch): + monkeypatch.setenv("TOKEN_SAVER_DISABLED_PROCESSORS", "git") + from src import config + + config.reload() + engine = CompressionEngine() + names = [p.name for p in engine.processors] + assert "git" not in names + assert "build" in names # Other processors still present + monkeypatch.delenv("TOKEN_SAVER_DISABLED_PROCESSORS") + config.reload() + + def test_disabled_generic_ignored(self, monkeypatch): + """Generic processor cannot be disabled.""" + monkeypatch.setenv("TOKEN_SAVER_DISABLED_PROCESSORS", "generic") + from src import config + + config.reload() + engine = CompressionEngine() + names = [p.name for p in engine.processors] + assert "generic" in names + monkeypatch.delenv("TOKEN_SAVER_DISABLED_PROCESSORS") + config.reload() + + def test_disabled_multiple_processors(self, monkeypatch): + monkeypatch.setenv("TOKEN_SAVER_DISABLED_PROCESSORS", "git,docker,lint") + from src import config + + config.reload() + engine = CompressionEngine() + names = [p.name for p in engine.processors] + assert "git" not in names + assert "docker" not in names + assert "lint" not in names + assert "build" in names + monkeypatch.delenv("TOKEN_SAVER_DISABLED_PROCESSORS") + config.reload() + + def test_disabled_processors_string_in_json_ignored(self, monkeypatch): + """If disabled_processors is a string (wrong type from JSON), treat as empty.""" + from src import config + + # Simulate a JSON config with wrong type: "lint" instead of ["lint"] + cfg = {**config._load_config(), "disabled_processors": "lint"} + monkeypatch.setattr(config, "_config", cfg) + engine = CompressionEngine() + names = [p.name for p in engine.processors] + # "lint" as string should NOT disable any processor (would be {"l","i","n","t"} otherwise) + assert "lint" in names + config.reload() + + def test_disabled_processors_hook_patterns(self, monkeypatch): + """Disabled processors should not contribute hook patterns.""" + import re + + monkeypatch.setenv("TOKEN_SAVER_DISABLED_PROCESSORS", "git") + from src import config + + config.reload() + patterns = collect_hook_patterns() + compiled = [re.compile(p) for p in patterns] + # git status should NOT match any pattern + assert not any(p.search("git status") for p in compiled) + # Other commands should still match + assert any(p.search("pytest tests/") for p in compiled) + 
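+        # restore the default processor set so later tests are unaffected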
monkeypatch.delenv("TOKEN_SAVER_DISABLED_PROCESSORS") + config.reload() + + class TestProcessorChaining: """Tests for multi-processor chaining infrastructure.""" @@ -441,7 +536,10 @@ def setup_method(self): def test_chain_to_attribute_default_none(self): for p in self.engine.processors: - assert p.chain_to is None + if p.name == "cargo_clippy": + assert p.chain_to == ["lint"] + else: + assert p.chain_to is None def test_processor_by_name_lookup(self): assert "git" in self.engine._by_name @@ -450,3 +548,226 @@ def test_processor_by_name_lookup(self): assert "go" in self.engine._by_name assert "ssh" in self.engine._by_name assert "jq_yq" in self.engine._by_name + assert "python_install" in self.engine._by_name + assert "cargo_clippy" in self.engine._by_name + assert "maven_gradle" in self.engine._by_name + assert "structured_log" in self.engine._by_name + + def test_chain_to_string_backward_compat(self): + """String chain_to should work (normalized to single-element list).""" + from src.processors.base import Processor + + class FakeA(Processor): + priority = 1 + hook_patterns = [] + chain_to = "generic" + + @property + def name(self): + return "fake_a" + + def can_handle(self, command): + return command == "fake_chain" + + def process(self, command, output): + return output.replace("AAA", "BBB") + + engine = self.engine + # Inject fake processor + engine.processors.insert(0, FakeA()) + engine._by_name["fake_a"] = engine.processors[0] + + output = "AAA\n" * 300 + _compressed, proc, _was = engine.compress("fake_chain", output) + # FakeA transforms AAA->BBB, then chains to generic + assert proc in ("fake_a", "generic") + + def test_chain_to_list(self): + """List chain_to should apply processors in sequence.""" + from src.processors.base import Processor + + class ProcA(Processor): + priority = 1 + hook_patterns = [] + chain_to = ["proc_b"] + + @property + def name(self): + return "proc_a" + + def can_handle(self, command): + return command == "chain_list_test" + + def process(self, command, output): + return output.replace("STEP1", "STEP2") + + class ProcB(Processor): + priority = 2 + hook_patterns = [] + + @property + def name(self): + return "proc_b" + + def can_handle(self, command): + return False + + def process(self, command, output): + return output.replace("STEP2", "STEP3") + + engine = self.engine + a, b = ProcA(), ProcB() + engine.processors.insert(0, a) + engine.processors.insert(1, b) + engine._by_name["proc_a"] = a + engine._by_name["proc_b"] = b + + output = "STEP1\n" * 100 + compressed, _proc, was = engine.compress("chain_list_test", output) + if was: + assert "STEP3" in compressed + + def test_chain_cycle_detection(self): + """Cycle in chain_to should not cause infinite loop.""" + from src.processors.base import Processor + + class CycleA(Processor): + priority = 1 + hook_patterns = [] + chain_to = ["cycle_b"] + + @property + def name(self): + return "cycle_a" + + def can_handle(self, command): + return command == "cycle_test" + + def process(self, command, output): + return output + "\nA" + + class CycleB(Processor): + priority = 2 + hook_patterns = [] + chain_to = ["cycle_a"] + + @property + def name(self): + return "cycle_b" + + def can_handle(self, command): + return False + + def process(self, command, output): + return output + "\nB" + + engine = self.engine + a, b = CycleA(), CycleB() + engine.processors.insert(0, a) + engine.processors.insert(1, b) + engine._by_name["cycle_a"] = a + engine._by_name["cycle_b"] = b + + output = "start\n" * 100 + # Should not hang + 
_compressed, proc, _was = engine.compress("cycle_test", output) + assert proc in ("cycle_a", "generic", "none") + + def test_chain_unknown_name_skipped(self): + """Unknown processor name in chain_to should be silently skipped.""" + from src.processors.base import Processor + + class UnknownChain(Processor): + priority = 1 + hook_patterns = [] + chain_to = ["nonexistent_processor"] + + @property + def name(self): + return "unknown_chain" + + def can_handle(self, command): + return command == "unknown_chain_test" + + def process(self, command, output): + return output.replace("X", "Y") + + engine = self.engine + p = UnknownChain() + engine.processors.insert(0, p) + engine._by_name["unknown_chain"] = p + + output = "X\n" * 100 + # Should not raise + _compressed, proc, _was = engine.compress("unknown_chain_test", output) + assert proc in ("unknown_chain", "generic", "none") + + def test_chain_max_depth(self, monkeypatch): + """max_chain_depth config should limit chaining.""" + from src import config + from src.processors.base import Processor + + monkeypatch.setenv("TOKEN_SAVER_MAX_CHAIN_DEPTH", "1") + config.reload() + + class DepthA(Processor): + priority = 1 + hook_patterns = [] + chain_to = ["depth_b", "depth_c"] + + @property + def name(self): + return "depth_a" + + def can_handle(self, command): + return command == "depth_test" + + def process(self, command, output): + return output.replace("D0", "D1") + + class DepthB(Processor): + priority = 2 + hook_patterns = [] + + @property + def name(self): + return "depth_b" + + def can_handle(self, command): + return False + + def process(self, command, output): + return output.replace("D1", "D2") + + class DepthC(Processor): + priority = 3 + hook_patterns = [] + + @property + def name(self): + return "depth_c" + + def can_handle(self, command): + return False + + def process(self, command, output): + return output.replace("D2", "D3") + + engine = CompressionEngine() + a, b, c = DepthA(), DepthB(), DepthC() + engine.processors.insert(0, a) + engine.processors.insert(1, b) + engine.processors.insert(2, c) + engine._by_name["depth_a"] = a + engine._by_name["depth_b"] = b + engine._by_name["depth_c"] = c + + output = "D0\n" * 100 + compressed, _proc, was = engine.compress("depth_test", output) + if was: + # With max_depth=1, only depth_b should run (not depth_c) + assert "D2" in compressed + assert "D3" not in compressed + + monkeypatch.delenv("TOKEN_SAVER_MAX_CHAIN_DEPTH") + config.reload() diff --git a/tests/test_hooks.py b/tests/test_hooks.py index dc070b8..0284ff2 100644 --- a/tests/test_hooks.py +++ b/tests/test_hooks.py @@ -49,6 +49,25 @@ def test_build_commands_compressible(self): assert is_compressible("webpack") assert is_compressible("next build") + def test_python_install_commands_compressible(self): + assert is_compressible("pip install flask") + assert is_compressible("pip3 install -r requirements.txt") + assert is_compressible("poetry install") + assert is_compressible("poetry update") + assert is_compressible("poetry add requests") + assert is_compressible("uv pip install flask") + assert is_compressible("uv sync") + + def test_maven_gradle_commands_compressible(self): + assert is_compressible("mvn clean install") + assert is_compressible("mvn package") + assert is_compressible("gradle build") + assert is_compressible("./gradlew assemble") + + def test_structured_log_commands_compressible(self): + assert is_compressible("stern my-pod") + assert is_compressible("kubetail my-service") + def test_lint_commands_compressible(self): assert 
is_compressible("eslint src/") assert is_compressible("ruff check .") diff --git a/tests/test_processors.py b/tests/test_processors.py index 90dfc9b..e9d64a3 100644 --- a/tests/test_processors.py +++ b/tests/test_processors.py @@ -9,6 +9,7 @@ from src.processors.ansible import AnsibleProcessor from src.processors.build_output import BuildOutputProcessor from src.processors.cargo import CargoProcessor +from src.processors.cargo_clippy import CargoClippyProcessor from src.processors.cloud_cli import CloudCliProcessor from src.processors.db_query import DbQueryProcessor from src.processors.docker import DockerProcessor @@ -23,10 +24,13 @@ from src.processors.jq_yq import JqYqProcessor from src.processors.kubectl import KubectlProcessor from src.processors.lint_output import LintOutputProcessor +from src.processors.maven_gradle import MavenGradleProcessor from src.processors.network import NetworkProcessor from src.processors.package_list import PackageListProcessor +from src.processors.python_install import PythonInstallProcessor from src.processors.search import SearchProcessor from src.processors.ssh import SshProcessor +from src.processors.structured_log import StructuredLogProcessor from src.processors.syslog import SyslogProcessor from src.processors.system_info import SystemInfoProcessor from src.processors.terraform import TerraformProcessor @@ -684,10 +688,13 @@ def test_can_handle_build_commands(self): assert self.p.can_handle("npm run build") assert not self.p.can_handle("cargo build") # handled by CargoProcessor assert self.p.can_handle("make") - assert self.p.can_handle("pip install -r requirements.txt") + assert not self.p.can_handle("pip install -r requirements.txt") # PythonInstallProcessor assert self.p.can_handle("yarn add lodash") assert self.p.can_handle("next build") assert not self.p.can_handle("git status") + assert not self.p.can_handle("mvn clean install") # MavenGradleProcessor + assert not self.p.can_handle("gradle build") # MavenGradleProcessor + assert not self.p.can_handle("./gradlew assemble") # MavenGradleProcessor def test_empty_output(self): assert self.p.process("npm run build", "") == "" @@ -821,20 +828,10 @@ def test_npm_audit_groups_by_severity(self): assert "high" in result assert "vulnerabilities" in result.lower() or "found" in result.lower() - def test_pip_progress_skipped(self): - output = "\n".join( - [ - "Collecting requests", - " Downloading requests-2.31.0-py3-none-any.whl", - " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.6/62.6 kB 1.2 MB/s", - "Installing collected packages: requests", - "Successfully installed requests-2.31.0", - ] - ) - result = self.p.process("pip install requests", output) - assert "━" not in result - assert "Collecting" not in result - assert "Build succeeded" in result + def test_pip_install_not_handled(self): + """pip install is now handled by PythonInstallProcessor.""" + assert not self.p.can_handle("pip install requests") + assert not self.p.can_handle("pip3 install flask") def test_yarn_berry_step_progress_skipped(self): """Yarn Berry (v2+) outputs step lines prefixed with ➤ YN0000: ┌/└.""" @@ -3908,3 +3905,359 @@ def test_yq_large_output_summarized(self): output = "\n".join(lines) result = self.p.process("yq . 
config.yaml", output) assert len(result) < len(output) + + +# ─────────────────────────────────────────────────────────────── +# PythonInstallProcessor +# ─────────────────────────────────────────────────────────────── + + +class TestPythonInstallProcessor: + def setup_method(self): + self.p = PythonInstallProcessor() + + def test_can_handle_pip_install(self): + assert self.p.can_handle("pip install flask") + assert self.p.can_handle("pip3 install -r requirements.txt") + + def test_can_handle_poetry(self): + assert self.p.can_handle("poetry install") + assert self.p.can_handle("poetry update") + assert self.p.can_handle("poetry add requests") + + def test_can_handle_uv(self): + assert self.p.can_handle("uv pip install flask") + assert self.p.can_handle("uv sync") + + def test_not_handle_pip_list(self): + assert not self.p.can_handle("pip list") + assert not self.p.can_handle("pip freeze") + + def test_not_handle_unrelated(self): + assert not self.p.can_handle("npm install") + assert not self.p.can_handle("git status") + + def test_empty_output(self): + assert self.p.process("pip install flask", "") == "" + + def test_pip_install_compressed(self): + lines = [] + for i in range(30): + lines.append(f"Collecting package-{i}>=1.0") + for i in range(30): + lines.append(f" Downloading package_{i}-1.2.3-py3-none-any.whl (10 kB)") + lines.append("Installing collected packages: " + ", ".join(f"p{i}" for i in range(30))) + lines.append("Successfully installed " + " ".join(f"package-{i}-1.2.3" for i in range(30))) + output = "\n".join(lines) + + result = self.p.process("pip install -r requirements.txt", output) + assert len(result) < len(output) + assert "30 packages collected" in result + assert "30 downloads" in result + assert "Successfully installed 30 packages" in result + + def test_pip_errors_preserved(self): + output = ( + "Collecting nonexistent-pkg\n" + " ERROR: Could not find a version that satisfies the requirement\n" + "ERROR: No matching distribution found for nonexistent-pkg" + ) + result = self.p.process("pip install nonexistent-pkg", output) + assert "ERROR" in result + assert "Could not find" in result + + def test_pip_already_satisfied(self): + lines = [f"Requirement already satisfied: pkg-{i} in /usr/lib" for i in range(20)] + output = "\n".join(lines) + result = self.p.process("pip install flask", output) + assert "20 already satisfied" in result + + def test_poetry_install_compressed(self): + lines = ["Resolving dependencies..."] + for i in range(20): + lines.append(f" Installing package-{i} (1.{i}.0)") + output = "\n".join(lines) + + result = self.p.process("poetry install", output) + assert len(result) < len(output) + assert "Installed 20 packages" in result + assert "dependency resolution" in result + + def test_uv_sync_compressed(self): + output = ( + "Resolved 42 packages in 1.2s\n" + "Downloading flask-2.0.0\n" + "Downloading requests-2.28.0\n" + "Installed 5 packages in 0.5s" + ) + result = self.p.process("uv sync", output) + assert "Resolved 42 packages" in result + assert "Installed 5 packages" in result + + +# ─────────────────────────────────────────────────────────────── +# CargoClippyProcessor +# ─────────────────────────────────────────────────────────────── + + +class TestCargoClippyProcessor: + def setup_method(self): + self.p = CargoClippyProcessor() + + def test_can_handle_cargo_clippy(self): + assert self.p.can_handle("cargo clippy") + assert self.p.can_handle("cargo clippy --all-targets") + assert self.p.can_handle("cargo clippy -- -W clippy::all") + + def 
+
+
+# ───────────────────────────────────────────────────────────────
+# CargoClippyProcessor
+# ───────────────────────────────────────────────────────────────
+
+
+class TestCargoClippyProcessor:
+    def setup_method(self):
+        self.p = CargoClippyProcessor()
+
+    def test_can_handle_cargo_clippy(self):
+        assert self.p.can_handle("cargo clippy")
+        assert self.p.can_handle("cargo clippy --all-targets")
+        assert self.p.can_handle("cargo clippy -- -W clippy::all")
+
+    def test_not_handle_cargo_build(self):
+        assert not self.p.can_handle("cargo build")
+        assert not self.p.can_handle("cargo test")
+
+    def test_empty_output(self):
+        assert self.p.process("cargo clippy", "") == ""
+
+    def test_warning_blocks_grouped(self):
+        lines = []
+        # 5 warnings of same rule
+        for i in range(5):
+            lines.append("warning[clippy::needless_return]: unneeded `return` statement")
+            lines.append(f"  --> src/file{i}.rs:10:5")
+            lines.append("   |")
+            lines.append("10 |     return x;")
+            lines.append("   |     ^^^^^^^^^ help: remove `return`")
+            lines.append("   |")
+        lines.append("warning: `my_crate` (bin) generated 5 warnings")
+        output = "\n".join(lines)
+
+        result = self.p.process("cargo clippy", output)
+        assert "clippy::needless_return" in result
+        assert "5 occurrences" in result
+        assert len(result) < len(output)
+
+    def test_errors_preserved(self):
+        output = (
+            "error[E0308]: mismatched types\n"
+            "  --> src/main.rs:5:5\n"
+            "   |\n"
+            '5  |     let x: i32 = "hello";\n'
+            "   |                  ^^^^^^^ expected `i32`, found `&str`"
+        )
+        result = self.p.process("cargo clippy", output)
+        assert "error[E0308]" in result
+        assert "mismatched types" in result
+
+    def test_checking_count(self):
+        output = (
+            "    Checking serde v1.0.0\n"
+            "    Checking tokio v1.0.0\n"
+            "    Checking my-crate v0.1.0\n"
+            "    Finished `dev` profile\n"
+        )
+        result = self.p.process("cargo clippy", output)
+        assert "3 checked" in result
+
+    def test_mixed_warnings_and_errors(self):
+        output = (
+            "    Checking my-crate v0.1.0\n"
+            "warning[clippy::unused_imports]: unused import\n"
+            "  --> src/lib.rs:1:5\n"
+            "   |\n"
+            "1  | use std::io;\n"
+            "   |     ^^^^^^^\n"
+            "error[E0599]: method not found\n"
+            "  --> src/main.rs:10:5\n"
+            "   |\n"
+            "10 |     x.foo();\n"
+            "   |       ^^^ method not found\n"
+        )
+        result = self.p.process("cargo clippy", output)
+        assert "error[E0599]" in result
+        assert "method not found" in result
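+
+    def test_runs_before_generic_lint(self):
+        # Hedged sketch, not in the original diff: the README's processor
+        # table lists cargo clippy at priority 26, ahead of the generic lint
+        # processor at 27; assumes the class exposes this as `priority`.
+        assert self.p.priority == 26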
"[INFO] Total time: 5.1 s" + ) + result = self.p.process("mvn compile", output) + assert "ERROR" in result + assert "cannot find symbol" in result + assert "BUILD FAILURE" in result + + def test_maven_test_results_preserved(self): + output = ( + "[INFO] Building my-project 1.0.0\n" + "[INFO] Tests run: 42, Failures: 1, Errors: 0, Skipped: 2\n" + "[INFO] BUILD FAILURE" + ) + result = self.p.process("mvn test", output) + assert "Tests run: 42" in result + + def test_gradle_tasks_compressed(self): + lines = [] + for i in range(20): + lines.append(f"> Task :sub{i}:compileJava UP-TO-DATE") + lines.append("> Task :app:compileJava") + lines.append("> Task :app:processResources NO-SOURCE") + lines.append("> Task :app:jar") + lines.append("") + lines.append("BUILD SUCCESSFUL in 12s") + lines.append("23 actionable tasks: 2 executed, 21 up-to-date") + output = "\n".join(lines) + + result = self.p.process("gradle build", output) + assert len(result) < len(output) + assert "2 executed" in result + assert "21 up-to-date" in result + assert "BUILD SUCCESSFUL" in result + + def test_gradle_errors_preserved(self): + output = ( + "> Task :app:compileJava\n" + "FAILURE: Build failed with an exception.\n" + "\n" + "* What went wrong:\n" + "Execution failed for task ':app:compileJava'.\n" + "> Compilation failed\n" + "\n" + "BUILD FAILED in 5s" + ) + result = self.p.process("./gradlew build", output) + assert "FAILURE" in result + assert "Compilation failed" in result + assert "BUILD FAILED" in result + + def test_gradle_test_results(self): + output = "> Task :test\n10 tests completed, 2 failed\n\nBUILD FAILED in 8s" + result = self.p.process("gradle test", output) + assert "10 tests completed, 2 failed" in result + + +# ─────────────────────────────────────────────────────────────── +# StructuredLogProcessor +# ─────────────────────────────────────────────────────────────── + + +class TestStructuredLogProcessor: + def setup_method(self): + self.p = StructuredLogProcessor() + + def test_can_handle_stern(self): + assert self.p.can_handle("stern my-pod") + assert self.p.can_handle("stern -n default my-pod") + + def test_can_handle_kubetail(self): + assert self.p.can_handle("kubetail my-service") + + def test_not_handle_unrelated(self): + assert not self.p.can_handle("kubectl logs my-pod") + assert not self.p.can_handle("docker logs container") + + def test_empty_output(self): + assert self.p.process("stern my-pod", "") == "" + + def test_json_lines_compressed(self): + import json + + lines = [] + for i in range(30): + entry = { + "level": "info", + "msg": f"processing item {i}", + "ts": f"2024-01-01T00:00:{i:02d}Z", + } + lines.append(json.dumps(entry)) + for i in range(5): + entry = { + "level": "error", + "msg": f"failed to process item {i}", + "ts": f"2024-01-01T00:01:{i:02d}Z", + } + lines.append(json.dumps(entry)) + output = "\n".join(lines) + + result = self.p.process("stern my-pod", output) + assert len(result) < len(output) + assert "35 log entries" in result + assert "info: 30" in result + assert "error: 5" in result + assert "Errors (5)" in result + + def test_non_json_fallback(self): + lines = [f"plain log line {i}" for i in range(50)] + output = "\n".join(lines) + result = self.p.process("stern my-pod", output) + # Should still compress via log compression fallback + assert len(result) < len(output) + + def test_mixed_json_non_json(self): + import json + + lines = ["plain text line"] + for i in range(20): + lines.append(json.dumps({"level": "info", "msg": f"msg {i}"})) + lines.append("another plain 
line") + output = "\n".join(lines) + result = self.p.process("stern my-pod", output) + assert "log entries" in result + + def test_error_messages_shown(self): + import json + + lines = [] + for i in range(10): + lines.append(json.dumps({"level": "info", "msg": f"ok {i}"})) + lines.append(json.dumps({"level": "error", "msg": "database connection failed"})) + output = "\n".join(lines) + + result = self.p.process("stern my-pod", output) + assert "database connection failed" in result