Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Reviewer's GuideCentralizes tracing/logging configuration via a new telemetry crate, wires OTEL export into API and engine binaries, and enriches workflow/run lifecycle tracing (including NATS correlation) while updating docs to describe the new internals and tracing model. Sequence diagram for create_run to engine workflow_run with trace correlationsequenceDiagram
actor User
participant Api as metis_api
participant RunService
participant Nats as NATS
participant Engine as Engine_runtime
User->>Api: HTTP POST /runs
Api->>RunService: create_run(run_id, user_id, req)
note over RunService: #[tracing::instrument] creates span create_run
RunService->>RunService: EngineRequestValidator::new(engine_config)
RunService->>RunService: validate(req)
RunService->>RunService: span_id = Span::current().id()
RunService->>RunService: trace_id = format(span_id.into_u64())
RunService->>Nats: publish_run(topic, RunRequestMessage{run_id, request, user_id, trace_id})
RunService->>RunService: info!("Run created and dispatched")
RunService-->>Api: Ok(())
Nats-->>Engine: RunRequestMessage{run_id, request, user_id, trace_id}
Engine->>Engine: EngineRuntime::handle_run_message(...)
Engine->>Engine: run(run_id, user_id, request, trace_id)
note over Engine: info_span workflow_run created with api_trace_id field
Engine->>Engine: run_inner(run_id, user_id, request)
Engine->>Engine: state=initializing → building → executing → running → parsing → updating_db → cleanup
Engine->>Engine: duration_ms recorded
Engine->>Engine: on error: record error field and log "Workflow execution failed"
Engine-->>Nats: completion/heartbeat events (unchanged)
Engine-->>Api: DB state updated, visible via existing endpoints
Sequence diagram for run cancellation tracingsequenceDiagram
actor User
participant Api as metis_api
participant RunService
participant Nats as NATS
participant Engine as Engine_runtime
User->>Api: HTTP POST /runs/{id}/cancel
Api->>RunService: request_cancel(id)
note over RunService: #[tracing::instrument] span request_cancel with run_id
RunService->>RunService: run = repo.find_by_id(id)
alt run.state == Queued
RunService->>RunService: update_state_if(Queued -> Canceled)
RunService->>RunService: info!("Queued run canceled immediately")
RunService-->>Api: Ok(None)
else run.state == Running
RunService->>RunService: engine_id = run.engine_id
RunService->>Nats: publish_cancel(engine_id, run_id)
RunService->>RunService: info!("Cancel signal dispatched to engine")
RunService-->>Api: Ok(Some(engine_id))
Nats-->>Engine: cancel message
Engine->>Engine: EngineRuntime::cancel(run_id)
note over Engine: info_span workflow_cancel
Engine->>Engine: mark run_id as cancelled
Engine->>Engine: pid = pid_store.get(run_id)
alt pid exists
Engine->>Engine: ProcessExecutor::cancel(pid)
Engine->>Engine: pid_store.remove(run_id)
Engine->>Engine: info!("Workflow cancellation completed")
else pid missing
Engine->>Engine: warn!("No PID found for run_id...")
end
else
RunService-->>Api: Ok(None) %% no-op for terminal states
end
Class diagram for telemetry initialization and engine runtime changesclassDiagram
class Telemetry {
+init_tracing(service_name: &'static str) void
+tracing
+tracing_subscriber
}
class ApiTracingModule {
+init_tracing() void
}
class ApiMain {
+main() Result
}
class EngineServer {
+bootstrap(engine: Engine, name: &'static str) EngineResult
}
class EngineRuntime {
+config() &Arc_FullEngineConfig
+run(run_id: Uuid, user_id: String, req: ValidatedRunRequest, trace_id: Option_String) Result_RunSummary_EngineError
+run_inner(run_id: Uuid, user_id: String, req: ValidatedRunRequest) Result_RunSummary_EngineError
+cancel(run_id: &str) Result_void_EngineError
}
class RunService {
+create_run(run_id: &str, user_id: &str, req: CreateRunRequest) ServiceResult_void
+request_cancel(id: &RunId) ServiceResult_Option_String
}
class RunRequestMessage {
+run_id: String
+request: ValidatedRunRequest
+user_id: String
+trace_id: Option_String
}
class Engine {
<<trait>>
+new() Self
+get_workflow_results() Result_HashMap_Category_Files
+get_task_logs() Result_Vec_TaskLog
}
Telemetry <|.. ApiTracingModule : uses
ApiTracingModule <|.. ApiMain : calls
Telemetry <|.. EngineServer : uses
EngineServer o--> EngineRuntime
EngineRuntime ..> RunRequestMessage
RunService ..> RunRequestMessage
Engine <|.. GenericEngine
class GenericEngine {
+new() Self
}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- The
telemetrycrate is declared withedition = "2024", which is not yet a stable Rust edition; consider aligning it with the rest of the workspace (likelyedition = "2021") to avoid toolchain issues. - The
enginecrate still usestracingmacros (e.g.,info!,error!) but itsCargo.tomlno longer declares a directtracingdependency; either re-addtracingas a dependency or import it viatelemetry::tracingand adjust theusestatements accordingly. - In
telemetry::init_tracing, failing OTEL exporter construction currentlyexpects and panics; you might want to downgrade this to a logged warning and fall back to local logging so a misconfigured OTEL endpoint does not crash services at startup.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `telemetry` crate is declared with `edition = "2024"`, which is not yet a stable Rust edition; consider aligning it with the rest of the workspace (likely `edition = "2021"`) to avoid toolchain issues.
- The `engine` crate still uses `tracing` macros (e.g., `info!`, `error!`) but its `Cargo.toml` no longer declares a direct `tracing` dependency; either re-add `tracing` as a dependency or import it via `telemetry::tracing` and adjust the `use` statements accordingly.
- In `telemetry::init_tracing`, failing OTEL exporter construction currently `expect`s and panics; you might want to downgrade this to a logged warning and fall back to local logging so a misconfigured OTEL endpoint does not crash services at startup.
## Individual Comments
### Comment 1
<location path="crates/telemetry/Cargo.toml" line_range="10" />
<code_context>
+tracing-subscriber = { features = [ "env-filter", "json" ], workspace = true }
+
+[package]
+edition = "2024"
+name = "telemetry"
+version = "0.1.0"
</code_context>
<issue_to_address>
**issue (bug_risk):** The `edition = "2024"` setting will not compile on current stable Rust toolchains.
Rust 2021 is the latest stable edition; `2024` is not yet supported and will cause Cargo to error on all stable toolchains. Unless you’re targeting a specific nightly with 2024 support, this should be set to `"2021"` to avoid build failures for consumers.
</issue_to_address>
### Comment 2
<location path="crates/telemetry/src/lib.rs" line_range="18-25" />
<code_context>
+ .with_endpoint(endpoint)
+ .build()
+ .expect("Failed to build OTEL exporter");
+ let tracer_provider = opentelemetry_sdk::trace::TracerProvider::builder()
+ .with_batch_exporter(exporter, opentelemetry_sdk::runtime::Tokio)
+ .build();
</code_context>
<issue_to_address>
**suggestion:** The tracer provider is not configured with a resource, so spans may lack standard service metadata.
Consider building the `TracerProvider` with a `Resource` that sets `service.name` (and any other relevant attributes) from `service_name`, so traces from different binaries are distinguishable in backends. For example:
```rust
let resource = opentelemetry_sdk::Resource::new(vec![
opentelemetry::KeyValue::new("service.name", service_name),
]);
let tracer_provider = opentelemetry_sdk::trace::TracerProvider::builder()
.with_resource(resource)
.with_batch_exporter(exporter, opentelemetry_sdk::runtime::Tokio)
.build();
```
```suggestion
let exporter = opentelemetry_otlp::SpanExporter::builder()
.with_tonic()
.with_endpoint(endpoint)
.build()
.expect("Failed to build OTEL exporter");
let resource = opentelemetry_sdk::Resource::new(vec![
opentelemetry::KeyValue::new("service.name", service_name),
]);
let tracer_provider = opentelemetry_sdk::trace::TracerProvider::builder()
.with_resource(resource)
.with_batch_exporter(exporter, opentelemetry_sdk::runtime::Tokio)
.build();
```
</issue_to_address>
### Comment 3
<location path="docs/dev/engine.md" line_range="15" />
<code_context>
+
+## Engine Trait
+
+New engines implement two methods:
+
+```rust
</code_context>
<issue_to_address>
**issue (typo):** The text says "two methods" but the trait example shows three methods.
This wording is inconsistent with the `Engine` trait example, which defines three methods (`new`, `get_workflow_results`, `get_task_logs`). Please update the sentence to match the actual number of methods or refer to “the following methods.”
```suggestion
New engines implement the following methods:
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Comments
Summary by Sourcery
Centralize tracing and logging setup across API and engine services, enabling optional OpenTelemetry export and improved workflow run observability.
New Features:
Enhancements:
Build:
Documentation: