feat(integrations): add Temporal workflow platform integration#2664

Open
Adityuh1 wants to merge 2 commits into Tracer-Cloud:main from Adityuh1:main

@Adityuh1

  • Integration config under app/integrations/temporal.py
  • HTTP service client under app/services/temporal/client.py
  • Four tools under app/tools/TemporalTool/ with typed input schemas
  • 25 unit/mock tests under tests/integrations/test_temporal.py
  • Docs page at docs/integrations/temporal.mdx registered in docs/docs.json
  • Registered in config_models, effective_models, and registry

Fixes #2572

Describe the changes you have made in this PR -

Adds full Temporal workflow platform integration enabling OpenSRE to investigate
failed workflows, activity timeouts, worker failures, and namespace health issues.

Four tools added:

  • temporal_list_workflows: list recent executions with status and failure reason
  • temporal_workflow_history: fetch event history for a specific run
  • temporal_task_queue: list task queue pollers and worker health
  • temporal_namespace_metrics: namespace-level metrics and cluster info
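A stdlib sketch of what typed inputs for these tools could look like. The PR validates with Pydantic; plain dataclasses with `__post_init__` checks are swapped in here so the example runs without extra packages. Field names follow parameters that appear elsewhere in this PR (`query`, `page_size`, `workflow_id`, `run_id`, `max_event_count`, `task_queue`); the defaults and bounds are hypothetical. `temporal_namespace_metrics` is assumed to take no input and is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListWorkflowsInput:
    # Visibility query, e.g. 'ExecutionStatus="Failed"'; empty lists everything.
    query: str = ""
    page_size: int = 20  # hypothetical default

    def __post_init__(self) -> None:
        if not 1 <= self.page_size <= 1000:  # hypothetical bound
            raise ValueError("page_size must be between 1 and 1000")


@dataclass(frozen=True)
class WorkflowHistoryInput:
    workflow_id: str
    run_id: str
    max_event_count: int = 200  # hypothetical default

    def __post_init__(self) -> None:
        # Reject blank IDs before any request path is built from them.
        if not self.workflow_id.strip() or not self.run_id.strip():
            raise ValueError("workflow_id and run_id are required")
        if self.max_event_count < 1:
            raise ValueError("max_event_count must be positive")


@dataclass(frozen=True)
class TaskQueueInput:
    task_queue: str

    def __post_init__(self) -> None:
        if not self.task_queue.strip():
            raise ValueError("task_queue is required")
```

Rejecting bad inputs at construction time means the client never builds a request from an empty ID.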

Demo/Screenshot for feature changes and bug fixes -

openSre-Temporal-demo.mp4

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

Describe in your own words:

  • What problem does your code solve?
    Teams using Temporal for workflow orchestration had no way to investigate failures through OpenSRE. When a workflow times out or workers stop polling, engineers had to manually dig through the Temporal UI to find the root cause. This integration brings that data directly into OpenSRE's investigation pipeline so failures can be surfaced automatically during an RCA.

  • What alternative approaches did you consider?
    Temporal exposes both a gRPC API and an HTTP API. I initially considered the official Temporal Python SDK, which uses gRPC, but that would add a heavy dependency. The HTTP API achieves the same result with just httpx, which is already a common dependency, so I went with that instead.

  • Why did you choose this specific implementation?
    I looked at how the Prefect integration was structured and followed the same pattern: a config model, a service client, and tool classes on top. This keeps things consistent with the rest of the codebase and makes it easier for maintainers to review. Keeping the client thin and putting error handling in the tools means each layer has a single responsibility.

  • What are the key functions/components and what do they do?
    There are four main components:
    1] TemporalConfig: holds connection settings and loads them from environment variables
    2] TemporalClient: thin HTTP wrapper with four methods matching the four investigation use cases
    3] Four LangChain tools: each wraps one client method, validates inputs with Pydantic, and returns a clean JSON string the agent can reason over
    4] get_temporal_tools(): factory function that wires everything together and returns all four tools ready to use


Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

@github-actions
Contributor

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

@Adityuh1
Author

@greptile review

@greptile-apps
Contributor

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR adds a Temporal workflow platform integration to OpenSRE, enabling investigation of failed workflows, activity timeouts, worker health, and namespace metrics through four new LangChain tools backed by a thin httpx HTTP client.

  • Client layer (app/services/temporal/client.py): workflow_id and run_id are interpolated into URL paths without URL-encoding, which breaks for any workflow ID containing / or other special characters. Both _get and _post also pass verify=self.config.tls, meaning HTTPS connections where tls=False skip SSL certificate verification entirely.
  • Tool layer (app/tools/TemporalTool/tool.py): temporal_list_workflows attempts to surface failure reasons from memo.fields, but Temporal's memo is user-defined metadata — actual failure details only appear in workflow history events, so this feature silently produces nothing for most failed workflows.
  • Config layer (app/integrations/config_models.py + app/integrations/temporal.py): Two disconnected config models exist for the same integration (TemporalIntegrationConfig.api_key is declared as str = "" while TemporalConfig.api_key is str | None = None), and the tools never consume the framework-level config, so settings configured through the integration UI are silently ignored.

Confidence Score: 3/5

Not safe to merge without fixes — multiple real defects in the new client and tool layer that would cause silent failures or incorrect behaviour in production.

The client constructs URL paths by interpolating user-supplied workflow IDs and run IDs without encoding, so any slash in a workflow ID silently 404s. SSL certificate verification is disabled whenever TLS is set to false, which is the default, leaving any HTTPS-only deployment open to cert bypass. The temporal_list_workflows failure-reason feature is a documented capability that produces nothing in practice because it reads from the wrong field in the API response. And the two config models for the same integration are wired to different defaults and are never connected, so framework-level config is silently ignored at runtime.

app/services/temporal/client.py and app/tools/TemporalTool/tool.py need the most attention; app/integrations/config_models.py and app/integrations/temporal.py need alignment on the config model.

Important Files Changed

  • app/services/temporal/client.py: Thin httpx wrapper for Temporal REST API; has URL path injection (unencoded workflow_id/run_id) and inverted SSL-verification logic (verify=self.config.tls).
  • app/tools/TemporalTool/tool.py: Four LangChain tools wrapping the client; failure-reason surfacing in temporal_list_workflows silently no-ops because memo is user metadata, not system failure data.
  • app/integrations/temporal.py: Standalone TemporalConfig (BaseModel) with env-var loader; diverges from TemporalIntegrationConfig in config_models.py (api_key type mismatch, no shared wiring).
  • app/integrations/config_models.py: Adds TemporalIntegrationConfig with api_key: str = "", inconsistent with `TemporalConfig.api_key: str
  • app/integrations/effective_models.py: Adds `temporal: EffectiveIntegrationEntry
  • app/integrations/registry.py: Registers temporal integration spec; no issues.
  • tests/integrations/test_temporal.py: 25 mock-based unit tests covering config, client, and all four tools; good coverage of happy paths and error cases, but no test for the URL encoding scenario.
  • docs/integrations/temporal.mdx: Well-written docs page covering setup, tool descriptions, and an example investigation flow.

Sequence Diagram

sequenceDiagram
    participant Agent
    participant Tool as TemporalTool (_run)
    participant _make_client
    participant EnvVars as Environment Variables
    participant Client as TemporalClient
    participant Temporal as Temporal HTTP API

    Agent->>Tool: invoke(query, page_size, ...)
    Tool->>_make_client: _make_client(self.config)
    _make_client->>EnvVars: load_temporal_config_from_env() [if no config]
    EnvVars-->>_make_client: TemporalConfig
    _make_client-->>Tool: TemporalClient

    Tool->>Client: list_workflows / get_workflow_history / ...
    Client->>Temporal: "POST/GET /api/v1/namespaces/{ns}/..."
    Temporal-->>Client: JSON response
    Client-->>Tool: list[dict] / dict
    Tool-->>Agent: JSON string

Comments Outside Diff (1)

  1. app/tools/TemporalTool/tool.py, lines 385-390

    P1 Failure reason extraction is a no-op for most workflows

    The code surfaces a failure reason from ex.get("memo", {}).get("fields", {}) for FAILED/TIMED_OUT/TERMINATED executions. However, memo in the list workflows response is user-defined metadata set by the application at workflow start — it is not populated by Temporal with system failure information. Actual failure details (failure message, cause, stack trace) live only in the workflow history events, which requires a separate get_workflow_history call. As written, entry["memo"] will be empty for the vast majority of failed workflows, making the "failure reason" feature described in the tool description and docs a no-op in practice.
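A sketch of reading failure details from history instead of memo, assuming the HTTP API returns `{"history": {"events": [...]}}` with camelCase `...EventAttributes` objects that may carry a `failure` (with `message` and an optional nested `cause`). The function name and the exact response shape are assumptions. Matching on the attributes key rather than on `eventType` sidesteps the question of how the enum is spelled; terminations record a `reason` rather than a `failure` object, so they would need separate handling.

```python
from typing import Any


def extract_failures(history: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect failure messages from a workflow-history response (hypothetical helper)."""
    found: list[dict[str, Any]] = []
    for event in history.get("history", {}).get("events", []):
        for key, attrs in event.items():
            # Only the per-event attribute objects can carry a "failure".
            if not key.endswith("EventAttributes") or not isinstance(attrs, dict):
                continue
            failure = attrs.get("failure")
            # Walk the cause chain; the innermost message is usually the root cause.
            chain: list[str] = []
            while isinstance(failure, dict):
                if failure.get("message"):
                    chain.append(failure["message"])
                failure = failure.get("cause")
            if chain:
                found.append({
                    "event_id": event.get("eventId"),
                    "event_type": event.get("eventType"),
                    "messages": chain,
                })
    return found
```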

Reviews (1): Last reviewed commit: "feat(integrations): add Temporal workflo..."

Comment thread app/services/temporal/client.py Outdated
Comment on lines +96 to +101
ns = self.config.namespace
path = (
    f"/api/v1/namespaces/{ns}/workflows/{workflow_id}"
    f"/runs/{run_id}/history"
)
data = self._get(path, params={"maximumPageSize": max_event_count})
Contributor


P1 URL path injection via unencoded workflow_id and run_id

workflow_id and run_id are interpolated directly into the URL path without URL-encoding. Temporal workflow IDs are user-defined strings and can legally contain / (e.g., order-service/txn-42). A slash would split the path into extra segments, causing the server to return 404 with no obvious error message. The same applies to task_queue in list_task_queues and namespace in every path. Use urllib.parse.quote(workflow_id, safe="") (and similarly for run_id) before inserting into the f-string.
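A minimal sketch of the encoding fix with `urllib.parse.quote` from the standard library. `history_path` is a hypothetical helper; the path layout mirrors the excerpt above. Whether the server's HTTP gateway routes `%2F`-encoded segments as intended is worth confirming with an integration test against a real Temporal instance.

```python
from urllib.parse import quote


def history_path(namespace: str, workflow_id: str, run_id: str) -> str:
    """Build the history path with every user-supplied segment percent-encoded."""
    # safe="" makes quote() encode "/" too; the default safe="/" would leave
    # slashes intact and reintroduce the path-splitting bug.
    ns = quote(namespace, safe="")
    wid = quote(workflow_id, safe="")
    rid = quote(run_id, safe="")
    return f"/api/v1/namespaces/{ns}/workflows/{wid}/runs/{rid}/history"
```

For example, the workflow ID `order-service/txn-42` becomes the single segment `order-service%2Ftxn-42` instead of two path segments.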

Comment on lines +39 to +43
def _get(self, path: str, params: dict[str, Any] | None = None) -> dict[str, Any]:
    url = f"{self.config.base_url}{path}"
    try:
        with httpx.Client(timeout=DEFAULT_TIMEOUT, verify=self.config.tls) as client:
            response = client.get(url, headers=self._headers, params=params or {})
Contributor


P1 verify=self.config.tls disables SSL verification for non-TLS connections

httpx.Client(verify=False) disables certificate verification for HTTPS connections. Here it is False whenever tls=False. For plain-HTTP deployments this has no effect, but any self-hosted Temporal behind a TLS reverse-proxy (where the operator forgot to set tls=True) would silently skip cert verification and accept invalid/expired certificates. The two concepts — does this endpoint speak TLS? (which controls the URL scheme) and should we verify the certificate? — are semantically independent and should not be coupled.

Suggested change
 def _get(self, path: str, params: dict[str, Any] | None = None) -> dict[str, Any]:
     url = f"{self.config.base_url}{path}"
     try:
-        with httpx.Client(timeout=DEFAULT_TIMEOUT, verify=self.config.tls) as client:
+        with httpx.Client(timeout=DEFAULT_TIMEOUT, verify=True) as client:
             response = client.get(url, headers=self._headers, params=params or {})

Comment on lines +55 to +59
def _post(self, path: str, body: dict[str, Any] | None = None) -> dict[str, Any]:
    url = f"{self.config.base_url}{path}"
    try:
        with httpx.Client(timeout=DEFAULT_TIMEOUT, verify=self.config.tls) as client:
            response = client.post(url, headers=self._headers, json=body or {})
Contributor


P1 Same verify=self.config.tls coupling in _post. Should be verify=True so that any HTTPS connection (regardless of how tls is set) always validates the server certificate.

Suggested change
 def _post(self, path: str, body: dict[str, Any] | None = None) -> dict[str, Any]:
     url = f"{self.config.base_url}{path}"
     try:
-        with httpx.Client(timeout=DEFAULT_TIMEOUT, verify=self.config.tls) as client:
+        with httpx.Client(timeout=DEFAULT_TIMEOUT, verify=True) as client:
             response = client.post(url, headers=self._headers, json=body or {})

Comment on lines 891 to +899
    )

class TemporalIntegrationConfig(StrictConfigModel):
    host: str = "localhost"
    port: int = 7233
    namespace: str = "default"
    api_key: str = ""
    tls: bool = False
    _normalize_strs = field_validator("host", "namespace", "api_key", mode="before")(
Contributor


P1 TemporalIntegrationConfig.api_key type differs from TemporalConfig.api_key

TemporalIntegrationConfig (here) declares api_key: str = "", but TemporalConfig in app/integrations/temporal.py declares api_key: str | None = None. The two models represent the same concept — the API key for Temporal Cloud — but with different defaults and types. The tools and client are wired to TemporalConfig, so configuration persisted through the integration framework (which uses TemporalIntegrationConfig) is never consumed by the client. Any operator who configures Temporal via the integration framework expecting the key to be picked up will get silent auth failures.

@greptile-apps
Contributor

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR adds a Temporal workflow platform integration, including a config model, an HTTP service client, four LangChain tools (list workflows, workflow history, task queue, namespace metrics), 25 unit tests, and docs. It follows the same layered structure as the existing Prefect integration.

  • The HTTP client in app/services/temporal/client.py embeds workflow_id, run_id, and task_queue directly into URL path strings without urllib.parse.quote, which would corrupt requests for any Temporal workflow ID containing / (a common pattern).
  • TemporalIntegrationConfig is registered in the integration framework, but the tools always fall back to load_temporal_config_from_env() and never read config from the registry entry, so any values stored through the integration system are silently ignored at runtime.

Confidence Score: 3/5

Not safe to merge as-is — workflow IDs with slashes will silently hit wrong API endpoints, and stored integration config is never applied.

The HTTP client builds URL paths by string-interpolating user-supplied IDs without encoding them. Temporal workflow IDs that contain / would corrupt the path, causing requests to resolve to the wrong endpoint. Additionally, the four tools always call load_temporal_config_from_env() as their fallback, meaning any configuration saved through the integration registry is ignored at runtime.

app/services/temporal/client.py (URL encoding on all path segments) and app/tools/TemporalTool/tool.py (wiring registry config into the tool factory)

Important Files Changed

  • app/services/temporal/client.py: Thin HTTP client for the Temporal REST API; missing URL encoding on path segments (workflow_id, run_id, task_queue) and misuses verify=self.config.tls.
  • app/tools/TemporalTool/tool.py: Four LangChain tools wrapping the Temporal client; integration registry config is never wired in, so tools always fall back to env-var loading regardless of stored integration config.
  • app/integrations/temporal.py: Standalone TemporalConfig (BaseModel) and env-var loader; duplicates fields already in TemporalIntegrationConfig (StrictConfigModel) in config_models.py, but the separation mirrors how Prefect is structured.
  • app/integrations/config_models.py: Adds TemporalIntegrationConfig to the registry model; missing blank line before class definition is the only issue.
  • tests/integrations/test_temporal.py: 25 unit tests covering config, client, and all four tools with mock HTTP; good coverage but mocked status values may not reflect actual Temporal REST API enum format.

Comments Outside Diff (2)

  1. app/services/temporal/client.py, lines 218-224

    P1 Missing URL encoding on path segments

    workflow_id, run_id, and task_queue are embedded into URL path strings without urllib.parse.quote. Temporal workflow IDs routinely contain / (e.g. order/v2/abc-123), which would silently corrupt the path into an entirely different API endpoint, returning a 404 or wrong data. The sibling Prefect client already uses quote(work_pool_name, safe='') for the same reason. All three path segments need encoding: workflow_id, run_id, and task_queue.

  2. app/tools/TemporalTool/tool.py, lines 342-343

    P1 Integration registry config is silently bypassed

    TemporalIntegrationConfig is registered in config_models.py, effective_models.py, and registry.py, giving it a proper place in the integration system. However, _make_client falls back to load_temporal_config_from_env() whenever self.config is None — and none of the tool-construction paths pass a config derived from the registry entry. The result is that any configuration stored through the integration UI/API is ignored at runtime; tools always read environment variables. The get_temporal_tools() factory should accept and forward the registry config instead of relying on env-var fallback.

Reviews (2): Last reviewed commit: "feat(integrations): add Temporal workflo..."

Comment thread app/services/temporal/client.py Outdated
Comment on lines +42 to +43
with httpx.Client(timeout=DEFAULT_TIMEOUT, verify=self.config.tls) as client:
    response = client.get(url, headers=self._headers, params=params or {})
Contributor


P2 verify=self.config.tls passes the TLS enable flag as httpx's certificate-verification flag. When tls=False (plain HTTP), this silently sets verify=False — a no-op on HTTP but bad practice because it would also suppress SSL warnings on any accidental HTTPS call. The verify parameter controls certificate validation; the URL scheme (http/https) already controls whether TLS is used.

Suggested change
-with httpx.Client(timeout=DEFAULT_TIMEOUT, verify=self.config.tls) as client:
+with httpx.Client(timeout=DEFAULT_TIMEOUT) as client:
     response = client.get(url, headers=self._headers, params=params or {})

Comment thread app/services/temporal/client.py Outdated
Comment on lines +58 to +59
with httpx.Client(timeout=DEFAULT_TIMEOUT, verify=self.config.tls) as client:
    response = client.post(url, headers=self._headers, json=body or {})
Contributor


P2 Same verify=self.config.tls issue on the POST path — should not conflate the TLS enable flag with certificate verification.

Suggested change
-with httpx.Client(timeout=DEFAULT_TIMEOUT, verify=self.config.tls) as client:
+with httpx.Client(timeout=DEFAULT_TIMEOUT) as client:
     response = client.post(url, headers=self._headers, json=body or {})

Comment on lines 889 to +893
    _normalize_strs = field_validator("api_key", "account_id", "workspace_id", mode="before")(
        normalize_str()
    )

class TemporalIntegrationConfig(StrictConfigModel):
Contributor


P2 A blank line is missing before the class definition. All other StrictConfigModel subclasses in this file follow a two-blank-line separation per PEP 8.

Suggested change
     _normalize_strs = field_validator("api_key", "account_id", "workspace_id", mode="before")(
         normalize_str()
     )

+
 class TemporalIntegrationConfig(StrictConfigModel):


@cerencamkiran
Collaborator

Thanks for your effort

In "app/integrations/temporal.py" and "app/integrations/config_models.py", there are currently two separate Temporal config models ("TemporalConfig" and "TemporalIntegrationConfig") and they don't seem to be wired together, so integration-level config may be ignored at runtime.

In "app/services/temporal/client.py", "workflow_id", "run_id", "task_queue", and similar path parameters should probably be URL-encoded before being added to API paths.

In "app/services/temporal/client.py", "verify=self.config.tls" mixes TLS enablement with certificate verification. Those are usually separate concerns.

In "app/integrations/temporal.py", the config mentions the default gRPC port (7233), but the implementation uses an HTTP client, so the expected deployment path is a bit unclear.

In "app/tools/TemporalTool/tool.py", failure reasons appear to be read from "memo", but actual Temporal failure details are usually available through workflow history events.

Could you take another look at these?

@Adityuh1
Author

Got it, will be back with the fixes ASAP!

- URL-encode workflow_id, run_id, task_queue, namespace in all API paths
- Remove verify=self.config.tls, decouple TLS from cert verification
- Align api_key type between TemporalConfig and TemporalIntegrationConfig
- Wire TemporalIntegrationConfig to TemporalConfig via load_temporal_config_from_integration
- Replace memo-based failure reason with hint to use temporal_workflow_history
- Clarified HTTP API usage in docstring and port field description
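A sketch of the behaviour that commit describes: instead of reading `memo`, each failed, timed-out, or terminated execution gets a pointer to `temporal_workflow_history`. The response field names (`execution.workflowId`, `execution.runId`, `type.name`, `status`) are assumptions about the list response, and both enum spellings are accepted because the exact format is unconfirmed (the second review raised the same doubt about the mocked status values). `summarize_execution` is a hypothetical helper, not the PR's code.

```python
from typing import Any

# Statuses whose cause lives in the history events, not in the list response.
_FAILURE_STATUSES = {"FAILED", "TIMEDOUT", "TERMINATED"}


def _normalize_status(raw: str) -> str:
    # Accept both "WORKFLOW_EXECUTION_STATUS_TIMED_OUT" and "TimedOut" spellings
    # by upper-casing, dropping underscores, and stripping the enum prefix.
    return raw.upper().replace("_", "").removeprefix("WORKFLOWEXECUTIONSTATUS")


def summarize_execution(ex: dict[str, Any]) -> dict[str, Any]:
    """Compact one list-response entry for the agent."""
    execution = ex.get("execution", {})
    raw_status = ex.get("status", "")
    entry: dict[str, Any] = {
        "workflow_id": execution.get("workflowId"),
        "run_id": execution.get("runId"),
        "type": ex.get("type", {}).get("name"),
        "status": raw_status,
    }
    if _normalize_status(raw_status) in _FAILURE_STATUSES:
        # Point the agent at the tool that can actually read the failure.
        entry["hint"] = (
            "Call temporal_workflow_history with this workflow_id and run_id "
            "to read the failure message and cause."
        )
    return entry
```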
@Adityuh1
Author

Thanks for the detailed review, @cerencamkiran!
All issues have been addressed:

1] URL encoding: workflow_id, run_id, task_queue, and namespace are now URL-encoded using urllib.parse.quote before being inserted into API paths.
2] TLS/verify decoupling: removed verify=self.config.tls; httpx now always validates certificates, and the URL scheme controls TLS usage.
3] Config wiring: added load_temporal_config_from_integration(), which converts TemporalIntegrationConfig (registry) into TemporalConfig (client); get_temporal_tools() now accepts and forwards registry config, with env-var fallback.
4] api_key type alignment: TemporalIntegrationConfig.api_key changed to str | None = None, matching TemporalConfig.
5] Port/HTTP clarity: module docstring and port field description updated to clarify this uses the HTTP API gateway on port 7233.

Lint, typecheck and all 25 tests passing.

Attached the demo for the same

OpenSre-temporal.integration.fixes.1.mp4

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: add Temporal workflow orchestration integration

2 participants