Skip to content

add LocalProcessIntrospectTool for incident response#2518

Open
X1Vi wants to merge 4 commits into
Tracer-Cloud:mainfrom
X1Vi:feature/#1506-local-process-introspect-tool
Open

add LocalProcessIntrospectTool for incident response#2518
X1Vi wants to merge 4 commits into
Tracer-Cloud:mainfrom
X1Vi:feature/#1506-local-process-introspect-tool

Conversation

@X1Vi
Copy link
Copy Markdown
Contributor

@X1Vi X1Vi commented May 25, 2026

Wraps app/agents/probe.py + tail buffer + error signals as a first-class OpenSRE investigation tool. The planner calls this to diagnose stuck or misbehaving local agents from the opensre interactive shell.

  • @tool decorator with surfaces=('investigation',)
  • Returns psutil snapshot + last 50 stdout lines + error/retry rates
  • Registered in telemetry allowlist (_TOOLS_WITHOUT_DELIBERATE_CATCH)
  • 13 tests including contract, edge cases (missing PID, no stdout, etc.)

Fixes #1506

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • [] No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

What I have made is an inspection tool that works with PID. We pass the PID into the function and the tail the logs then we check the logs for error signals. We pass everything as an output the snapshot of the process, error signal and tail logs.

We can run this via functions which I will provide in the demo.

I have added the tests for this in telemetry as well.


Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

Demo
Screencast from 2026-05-25 19-02-32.webm

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

Wraps app/agents/probe.py + tail buffer + error signals as a first-class
OpenSRE investigation tool. The planner calls this to diagnose stuck or
misbehaving local agents from the opensre interactive shell.

- @tool decorator with surfaces=('investigation',)
- Returns psutil snapshot + last 50 stdout lines + error/retry rates
- Registered in telemetry allowlist (_TOOLS_WITHOUT_DELIBERATE_CATCH)
- 13 tests including contract, edge cases (missing PID, no stdout, etc.)
@github-actions
Copy link
Copy Markdown
Contributor

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 25, 2026

Greptile Summary

This PR adds LocalProcessIntrospectTool, a new investigation-surface tool that wraps probe(), a bounded stdout tail read, and ErrorSignals classification behind a single @tool-decorated function callable from the OpenSRE interactive shell. It also promotes _resolve_target to the public API (resolve_target) so the tool can use it without coupling to a private symbol.

  • New tool (app/tools/LocalProcessIntrospectTool/__init__.py): returns a psutil snapshot, last 50 stdout lines (bounded to DEFAULT_MAX_BYTES via seek-before-read), and per-category error counts for a given PID; gracefully returns null fields when the process is inaccessible.
  • tail.py API promotion: _resolve_target renamed to resolve_target and exported via __all__; all existing internal call-sites and tests updated accordingly.
  • Tests (tests/tools/test_local_process_introspect.py): 13 tests covering the tool contract, bounded-read behaviour for large log files, error-signal extraction, and each null-returning edge case.

Confidence Score: 5/5

Safe to merge; the tool is additive, the bounded seek-before-read correctly prevents OOM on large log files, and all edge cases have test coverage.

The change is purely additive — no existing behaviour is modified beyond a symbol rename that is fully reflected across all call-sites and tests. The previously identified unbounded-read risk is addressed by the seek-offset approach, and the 13 new tests (including the large-file bound test) give good confidence in the implementation.

No files require special attention beyond the minor inconsistency in LocalProcessIntrospectTool/init.py noted in the review comments.

Important Files Changed

Filename Overview
app/agents/tail.py Renames _resolve_target to resolve_target and exports it via __all__; a clean, minimal change to promote the function to public API. However, the companion _ResolvedTarget dataclass it returns is still private and absent from __all__, creating an incomplete public surface.
app/tools/LocalProcessIntrospectTool/init.py New investigation tool wrapping probe + bounded stdout tail + ErrorSignals. The bounded-seek read correctly prevents OOM on large log files. Minor inconsistency: _read_stdout_tail returns "" (not None) for an accessible but empty file, diverging from the documented null-when-unavailable contract.
tests/tools/test_local_process_introspect.py Comprehensive tests covering happy path, missing PID, no stdout, bounded large-file read, error signal extraction, and no-stdout signal behaviour. Contract test via BaseToolContract is also present.
tests/agents/test_tail.py Updated all internal references from _resolve_target to the now-public resolve_target; mechanically correct rename with no logic changes.
tests/tools/test_telemetry.py Adds local_process_introspect to _TOOLS_WITHOUT_DELIBERATE_CATCH allowlist; straightforward one-liner addition consistent with the existing pattern.
docs/agents.mdx Adds documentation for local_process_introspect including output field table and example. Content is accurate and consistent with the implementation.

Sequence Diagram

sequenceDiagram
    participant Planner as Investigation Planner
    participant Tool as local_process_introspect
    participant Probe as probe()
    participant Tail as _read_stdout_tail()
    participant RT as resolve_target()
    participant FS as Filesystem (stdout file)
    participant ES as ErrorSignals

    Planner->>Tool: "local_process_introspect(pid=1234)"
    Tool->>Probe: probe(pid) [blocks 100ms for cpu%]
    Probe-->>Tool: "ProcessSnapshot | None"
    Tool->>Tail: _read_stdout_tail(pid)
    Tail->>RT: resolve_target(pid)
    RT-->>Tail: ResolvedTarget(path) or raises AttachUnsupported/OSError
    alt resolve succeeds
        Tail->>FS: fstat → seek(size - DEFAULT_MAX_BYTES) → read()
        FS-->>Tail: ≤4 MiB bytes
        Tail-->>Tool: "last 50 lines as str | None"
    else resolve fails
        Tail-->>Tool: None
    end
    alt stdout_tail is truthy
        Tool->>ES: ErrorSignals().observe(stdout_tail)
        ES-->>Tool: rate_per_minute() → error_counts dict
    end
    Tool-->>Planner: "{snapshot, stdout_tail, error_counts}"
Loading

Reviews (4): Last reviewed commit: "docs: merge local_process_introspect int..." | Re-trigger Greptile

Comment thread app/tools/LocalProcessIntrospectTool/__init__.py
Comment thread app/tools/LocalProcessIntrospectTool/__init__.py
Comment thread app/tools/LocalProcessIntrospectTool/__init__.py
…vent OOM

- Rename _resolve_target -> resolve_target (public API, added to __all__)
- Replace unbounded f.read() with seek-based bounded read in _read_stdout_tail
- Use os.fstat + DEFAULT_MAX_BYTES to read only the last 4 MB of stdout
@X1Vi X1Vi force-pushed the feature/#1506-local-process-introspect-tool branch from c5e537e to 17f9416 Compare May 25, 2026 13:09
@X1Vi
Copy link
Copy Markdown
Contributor Author

X1Vi commented May 25, 2026

@greptile review

Copy link
Copy Markdown
Contributor

@kespineira kespineira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found a couple of things:

  1. looks like the new tool is missing its docs page and docs/docs.json entry.

    AGENTS.md says new tools should ship with docs/<tool_name>.mdx plus a matching docs/docs.json entry in the same pr.

    this pr adds the tool source, tail rename, and tests, but nothing under docs/, and i don't see a local_process_introspect entry in docs/docs.json.

    since the other tools in app/tools/ have matching docs pages, the @tool decorator text is currently the only user-visible documentation for this one.

  2. the 4 mib seek-based bounded read path doesn't seem to be tested.

    the commit says it replaced the unbounded f.read() with a bounded read using os.fstat + DEFAULT_MAX_BYTES, but the current _read_stdout_tail test only writes a small sample file. because of that, offset is always 0 and this branch never runs:

    if offset > 0:
        f.seek(offset)

    so the oom regression this was meant to prevent is not really covered yet.

    could we add a test with a stdout file larger than 4 mib, assert that the read starts at st_size - DEFAULT_MAX_BYTES, and also check that the returned tail still includes the file's last line?

Comment thread app/tools/LocalProcessIntrospectTool/__init__.py
…ad test

- error_signals key renamed to error_counts since rate_per_minute()
  returns raw counts when all events share a single timestamp
- add docs/local_process_introspect.mdx and docs.json entry
- add test_read_stdout_tail_bounded_read_large_file covering the
  offset > 0 / f.seek() path for files exceeding DEFAULT_MAX_BYTES
@X1Vi
Copy link
Copy Markdown
Contributor Author

X1Vi commented May 26, 2026

@greptile review

@X1Vi X1Vi requested a review from kespineira May 26, 2026 05:44
@X1Vi
Copy link
Copy Markdown
Contributor Author

X1Vi commented May 26, 2026

@kespineira I have implemented the changes you wanted, a review could help.

Copy link
Copy Markdown
Contributor

@kespineira kespineira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@X1Vi small docs follow-up (non-blocking): since local_process_introspect is conceptually part of the agents subsystem, i think the cleanest home for this content would be a section inside docs/agents.mdx.

it imports resolve_target, probe, and ErrorSignals from app/agents/, and exposes that same machinery as a planner tool. docs/agents.mdx is also where /agents trace is already described, so it feels like the most natural place for this content.

as a smaller change, moving the standalone page from Observability and incidents to Integrations > Overview, next to agents, would already fix the current SaaS-vs-local grouping mismatch.

@X1Vi
Copy link
Copy Markdown
Contributor Author

X1Vi commented May 26, 2026

@kespineira Done

@X1Vi small docs follow-up (non-blocking): since local_process_introspect is conceptually part of the agents subsystem, i think the cleanest home for this content would be a section inside docs/agents.mdx.

it imports resolve_target, probe, and ErrorSignals from app/agents/, and exposes that same machinery as a planner tool. docs/agents.mdx is also where /agents trace is already described, so it feels like the most natural place for this content.

as a smaller change, moving the standalone page from Observability and incidents to Integrations > Overview, next to agents, would already fix the current SaaS-vs-local grouping mismatch.

Done @kespineira

@X1Vi X1Vi requested a review from kespineira May 26, 2026 08:06
@cerencamkiran
Copy link
Copy Markdown
Collaborator

Thanks for your effort. Just two small long-term thoughts:
resolve_target being public is okay here, but if more tools use it later, changing it could be harder.
stdout_tail + ErrorSignals inline is fine for now since only this tool uses it, but maybe later it can be shared if other tools need the same flow too.

@X1Vi
Copy link
Copy Markdown
Contributor Author

X1Vi commented May 26, 2026

@cerencamkiran I am assuming these are the points of concerns rather than instructions for next steps.
If you have any definite steps or changes that needs to be taken.
Do ping me.
Thanks

@X1Vi
Copy link
Copy Markdown
Contributor Author

X1Vi commented May 26, 2026

@greptile review

Copy link
Copy Markdown
Contributor

@kespineira kespineira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@muddlebee
Copy link
Copy Markdown
Collaborator

@X1Vi demo is missing, it doesnt make sense. please run a proper investigate scenario with LocalProcessIntrospectTool

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LocalProcessIntrospectTool registration

4 participants