add LocalProcessIntrospectTool for incident response by X1Vi · Pull Request #2518 · Tracer-Cloud/opensre

X1Vi · 2026-05-25T12:35:48Z

Wraps app/agents/probe.py + tail buffer + error signals as a first-class OpenSRE investigation tool. The planner calls this to diagnose stuck or misbehaving local agents from the opensre interactive shell.

@tool decorator with surfaces=('investigation',)
Returns psutil snapshot + last 50 stdout lines + error/retry rates
Registered in telemetry allowlist (_TOOLS_WITHOUT_DELIBERATE_CATCH)
13 tests including contract, edge cases (missing PID, no stdout, etc.)

Fixes #1506

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

[] No, I wrote all the code myself
Yes, I used AI assistance (continue below)

If you used AI assistance:

I have reviewed every single line of the AI-generated code
I can explain the purpose and logic of each function/component I added
I have tested edge cases and understand how the code handles them
I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

What I have made is an inspection tool that works with PID. We pass the PID into the function and the tail the logs then we check the logs for error signals. We pass everything as an output the snapshot of the process, error signal and tail logs.

We can run this via functions which I will provide in the demo.

I have added the tests for this in telemetry as well.

Checklist before requesting a review

I have added proper PR title and linked to the issue
I have performed a self-review of my code
I can explain the purpose of every function, class, and logic block I added
I understand why my changes work and have tested them thoroughly
I have considered potential edge cases and how my code handles them
If it is a core feature, I have added thorough tests
My code follows the project's style guidelines and conventions

Demo
Screencast from 2026-05-25 19-02-32.webm

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

@tool

Wraps app/agents/probe.py + tail buffer + error signals as a first-class OpenSRE investigation tool. The planner calls this to diagnose stuck or misbehaving local agents from the opensre interactive shell. - @tool decorator with surfaces=('investigation',) - Returns psutil snapshot + last 50 stdout lines + error/retry rates - Registered in telemetry allowlist (_TOOLS_WITHOUT_DELIBERATE_CATCH) - 13 tests including contract, edge cases (missing PID, no stdout, etc.)

github-actions · 2026-05-25T12:35:57Z

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

greptile-apps · 2026-05-25T12:39:03Z

Greptile Summary

This PR adds LocalProcessIntrospectTool, a new investigation-surface tool that wraps probe(), a bounded stdout tail read, and ErrorSignals classification behind a single @tool-decorated function callable from the OpenSRE interactive shell. It also promotes _resolve_target to the public API (resolve_target) so the tool can use it without coupling to a private symbol.

New tool (app/tools/LocalProcessIntrospectTool/__init__.py): returns a psutil snapshot, last 50 stdout lines (bounded to DEFAULT_MAX_BYTES via seek-before-read), and per-category error counts for a given PID; gracefully returns null fields when the process is inaccessible.
tail.py API promotion: _resolve_target renamed to resolve_target and exported via __all__; all existing internal call-sites and tests updated accordingly.
Tests (tests/tools/test_local_process_introspect.py): 13 tests covering the tool contract, bounded-read behaviour for large log files, error-signal extraction, and each null-returning edge case.

Confidence Score: 5/5

Safe to merge; the tool is additive, the bounded seek-before-read correctly prevents OOM on large log files, and all edge cases have test coverage.

The change is purely additive — no existing behaviour is modified beyond a symbol rename that is fully reflected across all call-sites and tests. The previously identified unbounded-read risk is addressed by the seek-offset approach, and the 13 new tests (including the large-file bound test) give good confidence in the implementation.

No files require special attention beyond the minor inconsistency in LocalProcessIntrospectTool/init.py noted in the review comments.

Important Files Changed

Filename	Overview
app/agents/tail.py	Renames `_resolve_target` to `resolve_target` and exports it via `__all__`; a clean, minimal change to promote the function to public API. However, the companion `_ResolvedTarget` dataclass it returns is still private and absent from `__all__`, creating an incomplete public surface.
app/tools/LocalProcessIntrospectTool/init.py	New investigation tool wrapping probe + bounded stdout tail + ErrorSignals. The bounded-seek read correctly prevents OOM on large log files. Minor inconsistency: `_read_stdout_tail` returns `""` (not `None`) for an accessible but empty file, diverging from the documented null-when-unavailable contract.
tests/tools/test_local_process_introspect.py	Comprehensive tests covering happy path, missing PID, no stdout, bounded large-file read, error signal extraction, and no-stdout signal behaviour. Contract test via BaseToolContract is also present.
tests/agents/test_tail.py	Updated all internal references from `_resolve_target` to the now-public `resolve_target`; mechanically correct rename with no logic changes.
tests/tools/test_telemetry.py	Adds `local_process_introspect` to `_TOOLS_WITHOUT_DELIBERATE_CATCH` allowlist; straightforward one-liner addition consistent with the existing pattern.
docs/agents.mdx	Adds documentation for `local_process_introspect` including output field table and example. Content is accurate and consistent with the implementation.

Sequence Diagram

sequenceDiagram
    participant Planner as Investigation Planner
    participant Tool as local_process_introspect
    participant Probe as probe()
    participant Tail as _read_stdout_tail()
    participant RT as resolve_target()
    participant FS as Filesystem (stdout file)
    participant ES as ErrorSignals

    Planner->>Tool: "local_process_introspect(pid=1234)"
    Tool->>Probe: probe(pid) [blocks 100ms for cpu%]
    Probe-->>Tool: "ProcessSnapshot | None"
    Tool->>Tail: _read_stdout_tail(pid)
    Tail->>RT: resolve_target(pid)
    RT-->>Tail: ResolvedTarget(path) or raises AttachUnsupported/OSError
    alt resolve succeeds
        Tail->>FS: fstat → seek(size - DEFAULT_MAX_BYTES) → read()
        FS-->>Tail: ≤4 MiB bytes
        Tail-->>Tool: "last 50 lines as str | None"
    else resolve fails
        Tail-->>Tool: None
    end
    alt stdout_tail is truthy
        Tool->>ES: ErrorSignals().observe(stdout_tail)
        ES-->>Tool: rate_per_minute() → error_counts dict
    end
    Tool-->>Planner: "{snapshot, stdout_tail, error_counts}"

_{Reviews (4): Last reviewed commit: "docs: merge local_process_introspect int..." | Re-trigger Greptile}

…vent OOM - Rename _resolve_target -> resolve_target (public API, added to __all__) - Replace unbounded f.read() with seek-based bounded read in _read_stdout_tail - Use os.fstat + DEFAULT_MAX_BYTES to read only the last 4 MB of stdout

X1Vi · 2026-05-25T13:11:33Z

@greptile review

kespineira

found a couple of things:

looks like the new tool is missing its docs page and docs/docs.json entry.

AGENTS.md says new tools should ship with docs/<tool_name>.mdx plus a matching docs/docs.json entry in the same pr.

this pr adds the tool source, tail rename, and tests, but nothing under docs/, and i don't see a local_process_introspect entry in docs/docs.json.

since the other tools in app/tools/ have matching docs pages, the @tool decorator text is currently the only user-visible documentation for this one.
the 4 mib seek-based bounded read path doesn't seem to be tested.

the commit says it replaced the unbounded f.read() with a bounded read using os.fstat + DEFAULT_MAX_BYTES, but the current _read_stdout_tail test only writes a small sample file. because of that, offset is always 0 and this branch never runs:
```
if offset > 0:
    f.seek(offset)
```
so the oom regression this was meant to prevent is not really covered yet.

could we add a test with a stdout file larger than 4 mib, assert that the read starts at st_size - DEFAULT_MAX_BYTES, and also check that the returned tail still includes the file's last line?

…ad test - error_signals key renamed to error_counts since rate_per_minute() returns raw counts when all events share a single timestamp - add docs/local_process_introspect.mdx and docs.json entry - add test_read_stdout_tail_bounded_read_large_file covering the offset > 0 / f.seek() path for files exceeding DEFAULT_MAX_BYTES

X1Vi · 2026-05-26T05:34:27Z

@greptile review

X1Vi · 2026-05-26T05:45:22Z

@kespineira I have implemented the changes you wanted, a review could help.

kespineira

@X1Vi small docs follow-up (non-blocking): since local_process_introspect is conceptually part of the agents subsystem, i think the cleanest home for this content would be a section inside docs/agents.mdx.

it imports resolve_target, probe, and ErrorSignals from app/agents/, and exposes that same machinery as a planner tool. docs/agents.mdx is also where /agents trace is already described, so it feels like the most natural place for this content.

as a smaller change, moving the standalone page from Observability and incidents to Integrations > Overview, next to agents, would already fix the current SaaS-vs-local grouping mismatch.

X1Vi · 2026-05-26T08:06:25Z

@kespineira Done

@X1Vi small docs follow-up (non-blocking): since local_process_introspect is conceptually part of the agents subsystem, i think the cleanest home for this content would be a section inside docs/agents.mdx.

it imports resolve_target, probe, and ErrorSignals from app/agents/, and exposes that same machinery as a planner tool. docs/agents.mdx is also where /agents trace is already described, so it feels like the most natural place for this content.

as a smaller change, moving the standalone page from Observability and incidents to Integrations > Overview, next to agents, would already fix the current SaaS-vs-local grouping mismatch.

Done @kespineira

cerencamkiran · 2026-05-26T08:41:50Z

Thanks for your effort. Just two small long-term thoughts:
resolve_target being public is okay here, but if more tools use it later, changing it could be harder.
stdout_tail + ErrorSignals inline is fine for now since only this tool uses it, but maybe later it can be shared if other tools need the same flow too.

X1Vi · 2026-05-26T09:02:06Z

@cerencamkiran I am assuming these are the points of concerns rather than instructions for next steps.
If you have any definite steps or changes that needs to be taken.
Do ping me.
Thanks

X1Vi · 2026-05-26T09:02:55Z

@greptile review

kespineira

LGTM 👍

muddlebee · 2026-05-29T21:33:52Z

@X1Vi demo is missing, it doesnt make sense. please run a proper investigate scenario with LocalProcessIntrospectTool

greptile-apps Bot reviewed May 25, 2026

View reviewed changes

Comment thread app/tools/LocalProcessIntrospectTool/__init__.py

Comment thread app/tools/LocalProcessIntrospectTool/__init__.py

Comment thread app/tools/LocalProcessIntrospectTool/__init__.py

X1Vi force-pushed the feature/#1506-local-process-introspect-tool branch from c5e537e to 17f9416 Compare May 25, 2026 13:09

kespineira suggested changes May 25, 2026

View reviewed changes

kespineira reviewed May 25, 2026

View reviewed changes

Comment thread app/tools/LocalProcessIntrospectTool/__init__.py

X1Vi requested a review from kespineira May 26, 2026 05:44

kespineira reviewed May 26, 2026

View reviewed changes

docs: merge local_process_introspect into agents.mdx

00b2173

X1Vi requested a review from kespineira May 26, 2026 08:06

kespineira approved these changes May 26, 2026

View reviewed changes

X1Vi mentioned this pull request May 26, 2026

Looking for Maintainers – $300-$500/month Symbolic Sponsorship #344

Open

muddlebee added the requirement-not-met label May 29, 2026

Conversation

X1Vi commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Understanding and AI Usage

Checklist before requesting a review

Uh oh!

github-actions Bot commented May 25, 2026

Greptile code review

Uh oh!

greptile-apps Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

X1Vi commented May 25, 2026

Uh oh!

kespineira left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

X1Vi commented May 26, 2026

Uh oh!

X1Vi commented May 26, 2026

Uh oh!

kespineira left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

X1Vi commented May 26, 2026

Uh oh!

cerencamkiran commented May 26, 2026

Uh oh!

X1Vi commented May 26, 2026

Uh oh!

X1Vi commented May 26, 2026

Uh oh!

kespineira left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

muddlebee commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

X1Vi commented May 25, 2026 •

edited

Loading

greptile-apps Bot commented May 25, 2026 •

edited

Loading

kespineira left a comment •

edited

Loading

kespineira left a comment •

edited

Loading

kespineira left a comment •

edited

Loading