
feat(cli) : background RCA email notifications via SMTP #2657

Draft
muddlebee wants to merge 3 commits into Tracer-Cloud:main from muddlebee:background

Conversation

@muddlebee
Collaborator

Fixes #2654

Describe the changes you have made in this PR -

  • add REPL background investigation mode with /background on|off|status|list|show|use|notify ...
  • route free-text alerts and /investigate launches into session-local background tasks when background mode is enabled
  • collect completed RCA summaries without taking over the active follow-up context until /background use <task_id>
  • add a new smtp integration with CLI setup, env bootstrap, strict config validation, and opensre integrations verify smtp
  • send background RCA completion notifications through SMTP using a plain-text summary focused on root cause, top analysis, and next steps
  • add docs for background investigations and SMTP setup, plus targeted CLI/integration/unit coverage

Demo/Screenshot for feature changes and bug fixes -

Terminal + local Mailpit verification:

$ SMTP_HOST=127.0.0.1 SMTP_PORT=1025 SMTP_SECURITY=none SMTP_FROM_ADDRESS=opensre@example.com SMTP_DEFAULT_TO=team@example.com uv run opensre integrations verify smtp

  SERVICE    SOURCE       STATUS      DETAIL
  smtp      local env    passed      Connected to SMTP server successfully.
$ uv run python ... send_smtp_report(...)
{'ok': True, 'error': '', 'subject': 'OpenSRE RCA complete: bg-local-1'}
{"total":1,"messages":[{"Subject":"OpenSRE RCA complete: bg-local-1","To":[{"Address":"team@example.com"}],"From":{"Address":"opensre@example.com"}}]}

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

  • This PR implements the main scoped issue only: session-local background investigations plus email-first RCA completion notifications.
  • I kept background execution inside the existing interactive-shell session instead of adding durable workers/queues, because that is the smallest change that preserves current behavior while enabling async RCA runs.
  • SMTP is implemented as a provider-agnostic client integration using Python stdlib (smtplib + email.message) so users can plug in any existing relay instead of OpenSRE owning a mail server.
  • The main pieces are:
    • background_cmds.py for the REPL command surface
    • background_runner.py for launching non-rendering background investigations and storing summaries
    • smtp_delivery.py for formatting/sending plain-text RCA emails
    • integration catalog/verify/CLI wiring for the new smtp service
  • I explicitly kept notification handling email-only in this PR; the broader communication-channel work remains split into the follow-up issue.
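The stdlib-only approach described above can be sketched roughly as follows. This is a minimal illustration and not the contents of smtp_delivery.py: the names build_rca_email, connect, and send_plain_text, and the 10-second timeout, are assumptions made for the example. Only smtplib, ssl, and email.message from the standard library are used.

```python
import smtplib
import ssl
from contextlib import suppress
from email.message import EmailMessage

# Security modes named in the PR discussion.
SECURITY_MODES = ("none", "starttls", "ssl")


def build_rca_email(subject: str, body: str, from_addr: str, to_addr: str) -> EmailMessage:
    """Build a plain-text message; no HTML part is attached."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(body)  # produces a text/plain body
    return msg


def connect(host: str, port: int, security: str, timeout: float = 10.0) -> smtplib.SMTP:
    """Open a connection for the given security mode (hypothetical helper)."""
    if security not in SECURITY_MODES:
        raise ValueError(f"unsupported security mode: {security!r}")
    if security == "ssl":
        return smtplib.SMTP_SSL(host, port, timeout=timeout, context=ssl.create_default_context())
    client = smtplib.SMTP(host, port, timeout=timeout)
    if security == "starttls":
        try:
            client.starttls(context=ssl.create_default_context())
        except Exception:
            # Do not leak the socket if the TLS upgrade fails.
            with suppress(Exception):
                client.close()
            raise
    return client


def send_plain_text(msg: EmailMessage, host: str, port: int, security: str) -> None:
    """Send one message and always attempt to close the connection."""
    client = connect(host, port, security)
    try:
        client.send_message(msg)
    finally:
        with suppress(Exception):
            client.quit()
```

Against a local relay such as the Mailpit instance used in the demo, a call like send_plain_text(msg, "127.0.0.1", 1025, "none") would be the natural way to exercise this sketch.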

Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

Validation run summary:

  • make lint
  • make format-check
  • make typecheck
  • make verify-integrations SERVICE=smtp
  • targeted SMTP/background tests ✅
  • make test-cov ⚠️ blocked by missing live LLM credentials in this environment (ANTHROPIC_API_KEY required by existing live routing tests); no change-specific failures were observed before those environment errors

@github-actions
Contributor

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

@muddlebee
Collaborator Author

@greptile review

@greptile-apps
Contributor

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR introduces session-local background investigations in the interactive REPL and SMTP-based RCA completion notifications. The monkey-patch approach for render_report is cleanly replaced with explicit keyword parameters (render_to_terminal, open_report_in_editor), removing the _render_report_patch_lock that previously serialized concurrent pipeline runs.

  • Background mode adds /background on|off|status|list|show|use|notify commands, a daemon thread per investigation, and closed-over record objects so the previous KeyError race is avoided; completed RCA state is promoted into the session via /background use <task_id>.
  • SMTP integration adds a new SMTPIntegrationConfig with strict field validation, an env-variable bootstrap path, a verify smtp adapter, and plain-text email delivery for RCA completions.
  • Non-render pipeline path (render=False) reuses _run_session_alert_payload and consumes the same synthetic event stream as the streaming renderer, collecting final state from on_chain_end events; the _events() generator's finally block ensures the asyncio pump is always cancelled.
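The closed-over record pattern mentioned in the first bullet can be illustrated with a small sketch. It is not the code in background_runner.py: the Record dataclass, start_background, and the run_fn callable are hypothetical stand-ins. The point is only that the worker holds a direct reference to its record, so clearing the session dictionary while the thread runs cannot raise KeyError inside the worker.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for a background investigation record."""
    task_id: str
    status: str = "running"
    final_state: dict = field(default_factory=dict)
    error: str = ""


def start_background(
    records: Dict[str, Record], task_id: str, run_fn: Callable[[], dict]
) -> Tuple[Record, threading.Thread]:
    record = Record(task_id)
    records[task_id] = record

    def _worker() -> None:
        # `record` is captured by the closure; the worker never looks it up
        # in `records`, so a concurrent records.clear() is harmless here.
        try:
            record.final_state = run_fn()
            record.status = "completed"
        except Exception as exc:  # report failure instead of dying silently
            record.status = "failed"
            record.error = str(exc)

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    return record, thread
```

Keeping all of the worker body inside the try block also means a failure is recorded on the task rather than terminating the thread without feedback.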

Confidence Score: 5/5

The change is safe to merge. The architectural shift from global monkeypatching to explicit parameters is well-contained, and the background thread model correctly avoids the dict-lookup race that existed in earlier iterations.

All three subsystems (background runner, SMTP delivery, pipeline refactor) are well-isolated and have corresponding tests. The removal of _render_report_patch_lock solves the serialization problem without introducing new shared state. Error handling in _connect_client, send_smtp_report, and the worker thread is symmetric and complete. The only inaccuracy found is that the email stats section may report 0 for tool-call count and loop count because those fields aren't forwarded through the publish_findings event, but this is cosmetic and doesn't affect RCA delivery.

app/cli/investigation/investigate.py: the render=False event-collection loop is new and is covered only by integration-level LLM tests, not by unit tests, so any future change to the synthetic event schema in runners.py could silently break background state collection.

Important Files Changed

| Filename | Overview |
| --- | --- |
| app/pipeline/runners.py | Removes _render_report_patch_lock and replaces module-level monkeypatching with explicit render_to_terminal/open_report_in_editor kwargs; _traced_node correctly passes **kwargs through to generate_report. |
| app/cli/interactive_shell/runtime/background_runner.py | New background investigation runner using daemon threads; record is closed over (not looked up by dict key) so the previous KeyError race is fixed; _safe_console_print wraps patch_stdout for partial thread-safety. |
| app/cli/investigation/investigate.py | Adds render=False path that builds final_state by consuming synthetic on_chain_end events; publish_findings event provides root_cause/validated_claims/remediation_steps correctly, but evidence_entries and investigation_loop_count are absent from that event. |
| app/utils/smtp_delivery.py | New SMTP delivery helper; _connect_client properly cleans up on STARTTLS/login failure via try/except with suppress; send_smtp_report and verify_smtp_connection both quit/close in finally blocks. |
| app/integrations/config_models.py | Adds SMTPIntegrationConfig with port range validation, security mode whitelist, and auth-pair consistency check; all field normalizers follow the existing StrictConfigModel pattern. |
| app/cli/interactive_shell/runtime/background_notifications.py | Thin dispatch layer; resolves effective integrations on the worker thread and delegates to smtp_delivery; gracefully marks unsupported channels and missing SMTP config without raising. |
| app/cli/interactive_shell/runtime/session.py | Adds background_mode_enabled, background_investigations dict, and background_notification_preferences fields; reset() now clears all three, resolving the prior stale-preferences concern. |
| app/delivery/publish_findings/node.py | generate_report gains render_to_terminal and open_report_in_editor keyword-only params with True defaults; existing callers without these kwargs continue to render and open the editor unchanged. |
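The keyword-only flags with True defaults described for generate_report can be sketched as below. The function generate_report_sketch and its return value are hypothetical and exist only to show the calling convention; the real function's body and return type are not shown in this conversation.

```python
from typing import List


def generate_report_sketch(
    state: dict,
    *,
    render_to_terminal: bool = True,
    open_report_in_editor: bool = True,
) -> List[str]:
    """Return the side effects that would run (illustrative only)."""
    actions: List[str] = []
    if render_to_terminal:
        actions.append("render")
    if open_report_in_editor:
        actions.append("open_editor")
    return actions


# Existing callers pass no flags and keep the old behavior.
foreground_actions = generate_report_sketch({"root_cause": "example"})

# A background caller opts out of both side effects explicitly.
background_actions = generate_report_sketch(
    {"root_cause": "example"},
    render_to_terminal=False,
    open_report_in_editor=False,
)
```

Because the flags sit after the bare `*`, they cannot be passed positionally, which keeps call sites self-describing and avoids patching module state to change behavior.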

Sequence Diagram

sequenceDiagram
    participant User as REPL User
    participant REPL as Interactive Shell
    participant Runner as background_runner
    participant Worker as Worker Thread
    participant Pipeline as astream_investigation
    participant Notify as deliver_background_notifications
    participant SMTP as SMTP Server

    User->>REPL: /background on
    REPL->>REPL: "session.background_mode_enabled = True"

    User->>REPL: free-text alert or /investigate
    REPL->>Runner: start_background_text_investigation()
    Runner->>Runner: create TaskRecord + BackgroundInvestigationRecord
    Runner->>Worker: thread.start() (daemon)
    Runner-->>REPL: task_id (immediate return)
    REPL-->>User: background investigation started

    Worker->>Pipeline: "run_fn(render=False, suppress_editor=True)"
    Pipeline->>Pipeline: astream_investigation events
    Pipeline-->>Worker: final_state

    Worker->>Worker: "record.status = completed"
    Worker->>Notify: deliver_background_notifications(record, channels)
    Notify->>Notify: resolve_effective_integrations()
    Notify->>SMTP: send_smtp_report(subject, body)
    SMTP-->>Notify: ok/error
    Notify-->>Worker: email result

    Worker->>Worker: task.mark_completed()
    Worker->>REPL: console print complete

    User->>REPL: /background use bg-xxx
    REPL->>REPL: "session.last_state = record.final_state"
    REPL-->>User: background RCA active

Reviews (3): Last reviewed commit: "Fix background runner startup race"

Comment thread app/utils/smtp_delivery.py Outdated
Comment thread app/cli/interactive_shell/runtime/session.py
Comment thread app/cli/interactive_shell/runtime/background_runner.py
@greptile-apps
Contributor

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR adds session-local background RCA investigations to the interactive REPL and wires up SMTP as the first notification channel, so users can run investigations asynchronously and receive an email summary when they complete.

  • Background mode (/background on|off|status|list|show|use|notify) launches investigations in daemon threads, stores structured summaries (root_cause, top_analysis, next_steps, stats), and supports promoting a completed RCA into the active follow-up context via /background use <task_id>.
  • SMTP integration adds SMTPIntegrationConfig with strict validation, env bootstrap (SMTP_HOST / SMTP_PORT / SMTP_SECURITY / SMTP_FROM_ADDRESS / SMTP_DEFAULT_TO), a CLI setup wizard, and a verify smtp adapter using stdlib smtplib; send_smtp_report sends a plain-text email on investigation completion.
  • Non-rendering investigation path (render=False) suppresses the terminal stream renderer and editor open while still driving the full LangGraph pipeline, with a new suppress_editor flag in astream_investigation.

Confidence Score: 3/5

Background investigations share a global threading lock with foreground investigations, so a running background job will block the REPL. In addition, a small race window on /reset can silently kill a worker thread.

Two concrete defects in the concurrency model affect the core value of background mode. The global _render_report_patch_lock is held for the full duration of LLM inference inside every investigation — background or foreground — so a user who starts a background investigation and then tries to investigate something in the foreground will see the REPL hang until the background job finishes. Separately, the record dict lookup inside _worker sits outside the try/except block, so a /reset call during thread startup can terminate the worker with no feedback, no mark_failed, and no cleanup of the in-flight call. The SMTP integration itself, the config validation, the CLI wiring, and the test coverage are all clean.

app/pipeline/runners.py (lock scope) and app/cli/interactive_shell/runtime/background_runner.py (record lookup placement) need attention before merge.
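The serialization concern raised in this review can be reproduced in isolation. The sketch below is not the code in runners.py: pipeline_lock stands in for a module-level lock, and the Event wait stands in for slow LLM inference. It shows that while one thread holds the lock for its whole run, a second caller cannot acquire it.

```python
import threading

pipeline_lock = threading.Lock()  # stand-in for a module-level lock


def run_background(started: threading.Event, release: threading.Event) -> None:
    """Hold the lock for the entire run, as the review describes."""
    with pipeline_lock:
        started.set()
        release.wait(timeout=5)  # stands in for a long-running investigation


def foreground_can_start() -> bool:
    """Probe the lock without blocking; True means a foreground run could proceed."""
    acquired = pipeline_lock.acquire(blocking=False)
    if acquired:
        pipeline_lock.release()
    return acquired
```

A real foreground run would use a blocking acquire and therefore wait until the background job finished, which is the hang the review warns about.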

Important Files Changed

| Filename | Overview |
| --- | --- |
| app/cli/interactive_shell/runtime/background_runner.py | Launches background investigation threads; record lookup is outside the try block, creating a silent KeyError failure path on concurrent /reset. |
| app/pipeline/runners.py | Adds suppress_editor flag; _render_report_patch_lock is held for the entire _merge() call, which serializes background and foreground investigations under the same lock. |
| app/utils/smtp_delivery.py | New SMTP delivery helper with correct connection lifecycle (starttls, ssl, none), proper finally-cleanup, and plain-text email construction via stdlib only. |
| app/integrations/config_models.py | Adds SMTPIntegrationConfig with strict validation for port range, security mode, auth pair, and email address format. |
| app/cli/interactive_shell/runtime/background_notifications.py | Notification dispatcher; gracefully handles missing SMTP config with "missing smtp integration" result, but sends email by default for all SMTP-configured users. |
| app/cli/interactive_shell/runtime/background.py | Defines BackgroundInvestigationRecord and BackgroundNotificationPreferences dataclasses; default channel ("email",) opts users into email without explicit consent. |
| app/cli/interactive_shell/command_registry/background_cmds.py | Clean REPL surface for background commands; all subcommands validated, unknown subcommands handled with errors and mark_latest(ok=False). |
| app/cli/investigation/investigate.py | Adds render=False non-rendering path for background investigations; event-based state collection may produce sparser final_state than the StreamRenderer path. |
| app/integrations/_catalog_impl.py | Adds SMTP env bootstrap (SMTP_HOST, SMTP_PORT, etc.) and catalog classification; follows existing patterns for other integrations. |
| tests/utils/test_smtp_delivery.py | Good unit coverage with a faithful fake SMTP client; covers starttls, ssl, send, and missing-recipient failure paths. |
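The validation rules listed for SMTPIntegrationConfig (port range, security mode, auth pair, address format) can be sketched with a plain dataclass. This is an assumption-laden illustration: the class SMTPConfigSketch, its field names, and the very loose "@" address check are hypothetical, and the real model follows the project's StrictConfigModel pattern instead.

```python
from dataclasses import dataclass

ALLOWED_SECURITY = ("none", "starttls", "ssl")


@dataclass(frozen=True)
class SMTPConfigSketch:
    """Hypothetical stand-in that validates on construction."""
    host: str
    port: int
    security: str
    from_address: str
    username: str = ""
    password: str = ""

    def __post_init__(self) -> None:
        if not self.host.strip():
            raise ValueError("host must not be empty")
        if not 1 <= self.port <= 65535:
            raise ValueError("port must be between 1 and 65535")
        if self.security not in ALLOWED_SECURITY:
            raise ValueError(f"security must be one of {ALLOWED_SECURITY}")
        # Auth pair: username and password are set together or not at all.
        if bool(self.username) != bool(self.password):
            raise ValueError("username and password must be provided together")
        # Deliberately loose address check for the sketch.
        if "@" not in self.from_address:
            raise ValueError("from_address must look like an email address")


valid_config = SMTPConfigSketch(
    host="127.0.0.1", port=1025, security="none", from_address="opensre@example.com"
)
```

Validating at construction time means a bad configuration fails during setup or verification rather than when a background investigation finishes and tries to send mail.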

Sequence Diagram

sequenceDiagram
    participant User as User (REPL)
    participant Exec as execution.py
    participant Runner as background_runner.py
    participant Thread as Daemon Thread
    participant Pipeline as astream_investigation
    participant Lock as _render_report_patch_lock
    participant Notify as background_notifications.py
    participant SMTP as smtp_delivery.py

    User->>Exec: free-text alert (background mode on)
    Exec->>Runner: start_background_text_investigation()
    Runner->>Thread: "threading.Thread(target=_worker).start()"
    Runner-->>User: background investigation started — task bg-xxx

    Thread->>Pipeline: run_investigation_for_session_background()
    Pipeline->>Lock: acquire (held for full _merge() duration)
    Note over Lock: Foreground investigations block here
    Pipeline-->>Thread: final_state dict
    Lock-->>Pipeline: release

    Thread->>Notify: deliver_background_notifications(record, channels)
    Notify->>SMTP: send_smtp_report(body, subject, smtp_ctx)
    SMTP-->>Notify: (ok, error)
    Notify-->>Thread: results dict

    Thread->>User: console.print(background investigation complete — task bg-xxx ready)

Reviews (2): Last reviewed commit: "Add background RCA email notifications v..."

Comment thread app/cli/interactive_shell/runtime/background_runner.py
Comment thread app/pipeline/runners.py Outdated
Comment thread app/cli/interactive_shell/runtime/background.py
@muddlebee
Collaborator Author

@greptile review

muddlebee marked this pull request as draft on May 29, 2026, 11:20
@Devesh36
Collaborator

🚀

muddlebee changed the title from "[FEATURE] Background RCA email notifications via SMTP" to "feat(cli) : Background RCA email notifications via SMTP" on May 29, 2026
muddlebee changed the title from "feat(cli) : Background RCA email notifications via SMTP" to "feat(cli) : background RCA email notifications via SMTP" on May 29, 2026

Development

Successfully merging this pull request may close these issues.

[FEATURE] Background investigation mode with email-first RCA notifications

3 participants