Route company tasks by GPU worker tags #86

Open: Deniskore wants to merge 1 commit into main from sn-378-assign-workers

Conversation

Deniskore (Collaborator) commented May 29, 2026

  1. Add company worker-tag routing.
  2. Replace queue internals with ordered SkipMap scan-pick logic.
  3. Prevent queue reshuffling for skipped tasks.
  4. Add routing-aware exhaustion cache and scan cursors.
  5. Improve queue retire, rollback, and cleanup edge cases.
  6. Add multi-threaded correctness and ordering tests for the queue.
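
For readers unfamiliar with the scan-pick idea in items 2 and 3, here is a minimal single-threaded sketch. It is not the PR's implementation: `ScanQueue`, `QueuedTask`, `pick`, and the `scan_cap` parameter are hypothetical names, a `BTreeMap` keyed by sequence number stands in for the SkipMap, and treating an empty tag set as "any worker" is an assumption.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical queued task. `required_tags` lists the tags a worker must
/// advertise at least one of; empty is assumed to mean "any worker".
#[derive(Debug, Clone)]
pub struct QueuedTask {
    pub name: String,
    pub required_tags: BTreeSet<String>,
}

/// Stand-in for the ordered queue: a BTreeMap keyed by a monotonically
/// increasing sequence number, so iteration follows insertion order.
#[derive(Default)]
pub struct ScanQueue {
    next_seq: u64,
    tasks: BTreeMap<u64, QueuedTask>,
}

impl ScanQueue {
    /// Append a task and return its sequence number.
    pub fn push(&mut self, name: &str, tags: &[&str]) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        let task = QueuedTask {
            name: name.to_string(),
            required_tags: tags.iter().map(|t| t.to_string()).collect(),
        };
        self.tasks.insert(seq, task);
        seq
    }

    /// Scan in order, inspecting at most `scan_cap` entries, and remove the
    /// first task this worker can take. Skipped tasks are never moved, so
    /// their relative order is unchanged.
    pub fn pick(
        &mut self,
        worker_tags: &BTreeSet<String>,
        scan_cap: usize,
    ) -> Option<(u64, QueuedTask)> {
        let seq = self
            .tasks
            .iter()
            .take(scan_cap)
            .find(|(_, t)| {
                t.required_tags.is_empty() || !t.required_tags.is_disjoint(worker_tags)
            })
            .map(|(seq, _)| *seq)?;
        self.tasks.remove(&seq).map(|t| (seq, t))
    }

    /// Sequence numbers currently queued, in scan order.
    pub fn order(&self) -> Vec<u64> {
        self.tasks.keys().copied().collect()
    }
}
```

A worker that cannot take the head of the queue simply leaves it where it is and picks a later entry, which is the property item 3 describes. The exhaustion cache and scan cursors from item 4 are not modeled here.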

Deniskore force-pushed the sn-378-assign-workers branch from bb6af3f to 0a86557 on May 29, 2026 05:05
Deniskore force-pushed the sn-378-assign-workers branch 5 times, most recently from 862c9e4 to d1fb8be, on June 10, 2026 04:53
Deniskore requested a review from Copilot on June 10, 2026 05:12

Copilot AI left a comment

Pull request overview

Adds worker-tag-aware routing so company tasks can be constrained to GPU workers advertising matching tags, and refactors the internal task queue to use an ordered scan/pick approach that avoids reshuffling skipped tasks while tracking routing-aware exhaustion state.

Changes:

  • Introduce worker_tags for companies and workers, normalize/validate them, and route queued company tasks based on tag intersection.
  • Replace queue internals with ordered SkipMap scanning, routing-aware exhausted cache, scan cursors, and improved retire/rollback/cleanup behavior.
  • Update metrics and tests (including multi-threaded queue stress tests) to cover routing and the new queue semantics.
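
As a rough illustration of the first bullet, the sketch below normalizes and validates a tag list and then checks tag intersection. The names `normalize_worker_tags`, `worker_matches`, and `TagError`, the limits, the allowed character set, and the "empty company tags match any worker" rule are all assumptions; the PR's actual helper in src/api/request.rs may differ.

```rust
use std::collections::BTreeSet;

// Hypothetical limits; the real values are not stated in the PR.
const MAX_TAGS: usize = 16;
const MAX_TAG_LEN: usize = 32;

#[derive(Debug, PartialEq, Eq)]
pub enum TagError {
    TooMany,
    Empty,
    TooLong,
    InvalidChar(char),
}

/// Trim, lowercase, validate, and de-duplicate a raw tag list.
pub fn normalize_worker_tags(raw: &[&str]) -> Result<BTreeSet<String>, TagError> {
    let mut out = BTreeSet::new();
    for tag in raw {
        let tag = tag.trim().to_ascii_lowercase();
        if tag.is_empty() {
            return Err(TagError::Empty);
        }
        if tag.len() > MAX_TAG_LEN {
            return Err(TagError::TooLong);
        }
        // Assumed character set: ASCII letters, digits, '-' and '_'.
        if let Some(bad) = tag
            .chars()
            .find(|c| !(c.is_ascii_alphanumeric() || *c == '-' || *c == '_'))
        {
            return Err(TagError::InvalidChar(bad));
        }
        out.insert(tag);
    }
    // The count limit applies after de-duplication.
    if out.len() > MAX_TAGS {
        return Err(TagError::TooMany);
    }
    Ok(out)
}

/// A worker may take a company task when the tag sets intersect.
/// Assumption: a company with no tags is unconstrained.
pub fn worker_matches(company_tags: &BTreeSet<String>, worker_tags: &BTreeSet<String>) -> bool {
    company_tags.is_empty() || !company_tags.is_disjoint(worker_tags)
}
```

Normalizing both sides with the same helper before comparing keeps `" H100 "` and `"h100"` from being treated as different tags.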

Reviewed changes

Copilot reviewed 30 out of 31 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/event_tracker/activity.rs | Updates test fixtures to include new worker_tags field. |
| tests/client_http_api/support.rs | Adds DB helper to update company worker_tags in tests. |
| tests/client_http_api/get_tasks.rs | Adds end-to-end tests for worker tag validation and routing behavior. |
| tests/client_http_api/add_task.rs | Updates company info assertions to named fields. |
| src/test_support.rs | Wires queue scan cap and queue-length gauge into test harness queue builder. |
| src/task/tests.rs | Adjusts staged-result timing tests and adds bounded-grace stale staged-result test. |
| src/task/mod.rs | Adds bounded grace tracking for pending staged results to prevent indefinite assignment. |
| src/raft/tests/cross_gateway_add_task.rs | Improves HTTP bind race avoidance and updates queue/client configuration in raft tests. |
| src/raft/server/mod.rs | Updates Protocol initialization after protocol buffer handling changes. |
| src/raft/mod.rs | Constructs metrics earlier and passes queue scan cap + gauge into TaskQueue builder. |
| src/raft/client/mod.rs | Updates Protocol initialization after protocol buffer handling changes. |
| src/protocol/mod.rs | Switches recv path to read_to_end with size cap and updated error handling. |
| src/metrics/mod.rs | Reworks queue length metric to IntGauge, exposes gauge handle, removes best-result metric. |
| src/http3/rate_limits.rs | Adds company worker_tags to runtime rate-limit context. |
| src/http3/handlers/task/get_tasks.rs | Normalizes worker tags and reserves tasks using routing-aware queue reservation. |
| src/http3/handlers/task/add_task.rs | Enqueues company tasks with routing metadata derived from company worker tags. |
| src/http3/handlers/result/read.rs | Removes “best result” per-worker metric updates. |
| src/http3/handlers/result/add_result.rs | Records completion time only on successful completion path. |
| src/db/key_validator.rs | Caches company worker tags in the API key validator company metadata. |
| src/db/data_access.rs | Extends company meta queries/decoding to include worker_tags with normalization + tests. |
| src/common/queue/tests/routing.rs | Adds focused routing + scan-cap + cursor behavior tests. |
| src/common/queue/tests/mod.rs | Introduces shared queue test utilities and splits tests into modules. |
| src/common/queue/tests/lifecycle.rs | Refactors lifecycle tests and adds coverage for new retire/rollback ordering behaviors. |
| src/common/queue/tests/expiration.rs | Moves expiration/cleanup-related tests into a dedicated module. |
| src/common/queue/tests/concurrency.rs | Adds multi-threaded correctness and ordering stress tests for scan/pick queue. |
| src/common/queue/tests/capacity.rs | Adds capacity + queue length gauge tests. |
| src/common/queue/mod.rs | Major queue refactor: ordered SkipMap scanning, routing keys, exhaustion/cursor caches, gauge integration. |
| src/api/request.rs | Adds worker_tags to GetTasksRequest and shared tag normalization/validation helper. |
| dev-env/init-scripts/init-schema.sql | Adds worker_tags column to companies table in dev schema. |
| Cargo.toml | Adds crossbeam-skiplist dependency; bumps a few crate versions. |
| Cargo.lock | Updates lockfile for dependency changes. |
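
The src/protocol/mod.rs entry mentions reading to end under a size cap. A std-only analogue of that pattern might look like the sketch below. `read_capped` is a hypothetical name, the PR's recv path most likely uses an async stream API rather than `std::io::Read`, and the error kind chosen here is an assumption.

```rust
use std::io::{self, Read};

/// Read the whole stream into memory, failing if it exceeds `cap` bytes.
pub fn read_capped<R: Read>(reader: R, cap: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one byte past the cap so an oversized stream can be told apart
    // from one that is exactly `cap` bytes long.
    reader.take(cap.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > cap {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "message exceeds size cap",
        ));
    }
    Ok(buf)
}
```

Bounding the read with `take` means a peer that never stops sending cannot grow the buffer beyond `cap + 1` bytes before the error is returned.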

Comment thread src/api/request.rs
Comment thread src/common/queue/mod.rs
Comment thread src/common/queue/mod.rs Outdated
Comment thread src/common/queue/mod.rs

Copilot AI left a comment

Pull request overview

Copilot reviewed 31 out of 32 changed files in this pull request and generated 1 comment.

Comment thread src/common/queue/mod.rs Outdated
Deniskore force-pushed the sn-378-assign-workers branch from 9fd4f6c to 2c2e5ac on June 10, 2026 05:55
Deniskore force-pushed the sn-378-assign-workers branch from 2c2e5ac to 40d8c27 on June 10, 2026 11:29