Feat/opentelemetry #7

Open
kudroma404 wants to merge 33 commits into main from feat/opentelemetry

Conversation

@kudroma404

No description provided.

teshaTe and others added 27 commits January 26, 2026 13:26
* exposed option for controlling face amount for decimation
Log lines for task received, parameters, generation timing, and task
completion use the format: task_id: message.

Co-authored-by: Cursor <cursoragent@cursor.com>
Loki and VictoriaMetrics labels use generator_id instead of worker_id.
SERVICE_NAME is set to 404-generator-mesh-v1 for metrics and log streams.

Co-authored-by: Cursor <cursoragent@cursor.com>
GENERATION_SYNTHETIC_FAILURE_RATE (default 0) triggers a random failure
before GPU work so generation_error_count increments without wasting
compute. Log startup when the rate is positive.

Co-authored-by: Cursor <cursoragent@cursor.com>
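
A minimal sketch of the failure-injection idea described in the commit above, not the PR's actual code. Only the `GENERATION_SYNTHETIC_FAILURE_RATE` variable and its default of 0 come from the commit message; `SyntheticGenerationError`, `read_failure_rate`, and `maybe_synthetic_failure` are hypothetical names, and clamping the rate to the range 0 to 1 is an assumption.

```python
import os
import random


class SyntheticGenerationError(RuntimeError):
    """Hypothetical error type for injected failures."""


def read_failure_rate(env=None) -> float:
    """Read the rate from the environment, defaulting to 0 (disabled)."""
    env = os.environ if env is None else env
    rate = float(env.get("GENERATION_SYNTHETIC_FAILURE_RATE", "0"))
    # Clamping to [0, 1] is an assumption, not something the commit states.
    return min(max(rate, 0.0), 1.0)


def maybe_synthetic_failure(rate: float, rng: random.Random) -> None:
    """Raise before any expensive work with probability `rate`."""
    # rng.random() returns a value in [0, 1), so rate=1.0 always fails
    # and rate=0 never does.
    if rate > 0 and rng.random() < rate:
        raise SyntheticGenerationError("synthetic failure injected before GPU work")
```

Calling `maybe_synthetic_failure` as the first step of request handling would let the error path (and any error counter wrapped around it) run without starting GPU work.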
Loki stream labels and VictoriaMetrics counter/histogram labels use
generator_mesh_v1_id instead of generator_id. Update serve wiring and
VictoriaMetricsManager call sites.

Co-authored-by: Cursor <cursoragent@cursor.com>
Wire task_id into Loki stream labels via record extra; contextualize
/generate and generation_block so executor-thread logs include it.

Co-authored-by: Cursor <cursoragent@cursor.com>
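
The commit above relies on Loguru's `logger.contextualize`. As a rough illustration of the same idea using only the standard library (a swapped-in technique, not the PR's code), the sketch below carries a task ID in a `contextvars.ContextVar`, stamps it onto each record with a `logging.Filter`, and copies the caller's context into the executor thread so the ID survives the thread hop. `task_id_var`, `TaskIdFilter`, `build_logger`, and `submit_in_context` are hypothetical names.

```python
import contextvars
import logging
from concurrent.futures import Executor, Future, ThreadPoolExecutor

# Hypothetical context variable holding the current request's task ID.
task_id_var = contextvars.ContextVar("task_id", default="-")


class TaskIdFilter(logging.Filter):
    """Attach the current task ID to every record passing the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.task_id = task_id_var.get()
        return True


def build_logger(stream) -> logging.Logger:
    """Build a logger that writes 'task_id: message' lines to `stream`."""
    logger = logging.getLogger("mesh_sketch")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.addFilter(TaskIdFilter())
    handler.setFormatter(logging.Formatter("%(task_id)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger


def submit_in_context(executor: Executor, fn, *args) -> Future:
    """Run `fn` in the executor with a copy of the caller's context."""
    ctx = contextvars.copy_context()
    return executor.submit(ctx.run, fn, *args)


if __name__ == "__main__":
    import sys

    log = build_logger(sys.stdout)
    task_id_var.set("task-123")
    with ThreadPoolExecutor(max_workers=1) as pool:
        submit_in_context(pool, log.info, "generation started").result()
```

Without the `copy_context()` step, a plain `executor.submit` would run the function in the worker thread's own context, where the variable still holds its default.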
…ontext

Replace sys.stderr with a Loguru-backed writer for the duration of
generation_block so that tqdm progress bars and any other stderr output
from third-party libraries (trellis, o_voxel) are captured as structured
DEBUG log lines. Because the redirect runs inside logger.contextualize(),
the task_id label is automatically included and the lines are shipped to
Loki alongside every other structured log from that task.

Co-authored-by: Cursor <cursoragent@cursor.com>
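
A hedged sketch of the stderr capture described above, using the standard `logging` module in place of Loguru so that it stays self-contained. `LoggerWriter` and `capture_stderr` are hypothetical names, not the PR's. Carriage returns are treated as line breaks on the assumption that progress bars redraw themselves with `\r`.

```python
import contextlib
import logging


class LoggerWriter:
    """File-like object that forwards each written line to a logger."""

    def __init__(self, logger: logging.Logger, level: int = logging.DEBUG):
        self._logger = logger
        self._level = level
        self._buffer = ""

    def write(self, text: str) -> int:
        # Treat '\r' like '\n' so each progress-bar redraw becomes one line.
        self._buffer += text.replace("\r", "\n")
        *complete, self._buffer = self._buffer.split("\n")
        for line in complete:
            if line.strip():
                self._logger.log(self._level, line.rstrip())
        return len(text)

    def flush(self) -> None:
        # Emit whatever is left without a trailing newline.
        if self._buffer.strip():
            self._logger.log(self._level, self._buffer.rstrip())
        self._buffer = ""


@contextlib.contextmanager
def capture_stderr(logger: logging.Logger):
    """Route sys.stderr into `logger` for the duration of the block."""
    writer = LoggerWriter(logger)
    with contextlib.redirect_stderr(writer):
        try:
            yield writer
        finally:
            writer.flush()
```

Note that `contextlib.redirect_stderr` swaps `sys.stderr` for the whole process, so in this sketch output from concurrently running tasks would pass through the same writer.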
The synthetic failure mechanism was used for Victoria metrics testing
and is no longer needed. Remove _maybe_synthetic_generation_failure,
its call site, the startup warning log, and the now-unused random import.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace unused generation_synthetic_failure_rate with test_run bool.
When TEST_RUN=true the /generate endpoint skips real GPU work and
rotates through HTTP 500 / 503 / 429 errors and a mock success on each
successive request, recording metrics for each step. Mirrors the
worker's existing MockGenerator pattern.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
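
One way the rotation described above could look, as a sketch under stated assumptions rather than the PR's implementation. The `TEST_RUN` variable and the 500 / 503 / 429 / success sequence come from the commit message; `MockOutcomeRotation`, `is_test_run`, the use of 200 for the mock success, and the exact parsing of the flag are assumptions.

```python
import itertools
import os
import threading

# Order taken from the commit message; 200 stands in for the mock success.
MOCK_OUTCOMES = (500, 503, 429, 200)


class MockOutcomeRotation:
    """Hand out the next mock HTTP status on each request."""

    def __init__(self, outcomes=MOCK_OUTCOMES):
        self._cycle = itertools.cycle(outcomes)
        # Guard the iterator so concurrent requests each get one step.
        self._lock = threading.Lock()

    def next_status(self) -> int:
        with self._lock:
            return next(self._cycle)


def is_test_run(env=None) -> bool:
    """Return True when TEST_RUN is set to 'true' (case-insensitive)."""
    env = os.environ if env is None else env
    return env.get("TEST_RUN", "false").strip().lower() == "true"
```

A handler could check `is_test_run()` once at startup and, when it is set, return `rotation.next_status()` instead of running the generation pipeline.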
Restore TEST_RUN as the mesh v1 error-injection switch and fail roughly half of real generation requests.

Co-authored-by: Cursor <cursoragent@cursor.com>
Prefix xatlas and tqdm stderr progress lines with task_id so plain container logs can be correlated to requests.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kudroma404 requested a review from rkosti June 3, 2026 13:30
kudroma404 and others added 2 commits June 5, 2026 13:20
… metric

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kudroma404 and others added 4 commits June 6, 2026 00:27
…/failure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A 1x1 degenerate image causes preprocess_image to produce an empty
bounding box, crashing with a numpy zero-size reduction error deep
in the pipeline. Fail early with a clear message instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
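
The early check described above might be sketched as follows. This is illustrative only: `InvalidInputImageError`, `validate_image_size`, and the minimum side length of 2 pixels are assumptions, since the commit message only says that a 1x1 image should fail early with a clear message.

```python
class InvalidInputImageError(ValueError):
    """Hypothetical error raised for images too small to process."""


def validate_image_size(width: int, height: int, min_side: int = 2) -> None:
    """Reject degenerate images before they reach preprocessing."""
    # min_side=2 is an assumed threshold; it rejects the 1x1 case
    # mentioned in the commit message.
    if width < min_side or height < min_side:
        raise InvalidInputImageError(
            f"input image is {width}x{height}; "
            f"each side must be at least {min_side} px"
        )
```

Raising a `ValueError` subclass at the request boundary lets the caller turn it into a client error instead of surfacing a numpy reduction error from deep in the pipeline.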
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
4 participants