Scheduler

A Redis-backed HTTP service for managing recurring scrape jobs. Workers long-poll for jobs, execute them, then report success or failure. The scheduler handles timing, jitter, and rescheduling.

Architecture

┌─────────────┐   GET /jobs/next     ┌───────────┐
│   Workers   │ ◄──────────────────► │  HTTP API │
│  (scrapers) │   POST /jobs/*/      └─────┬─────┘
└─────────────┘   complete|error           │
                                     ┌─────▼─────┐
                                     │ Scheduler │
                                     │   (Go)    │
                                     └─────┬─────┘
                                     ┌─────▼─────┐
                                     │   Redis   │
                                     │  (3 keys) │
                                     └───────────┘

A background goroutine polls Redis every poll_interval_s seconds and promotes due jobs from the schedule (sorted set) to the ready queue (list). Workers block on GET /jobs/next and receive jobs as they become available.
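The promotion step can be sketched with an in-memory stand-in for the two Redis structures (the real service operates on the `scraper:schedule` sorted set and `scraper:ready` list; the `promoter` type and method names here are illustrative, not the actual implementation):

```go
package main

import (
	"fmt"
	"sort"
)

// In-memory stand-in for the two Redis structures (illustration only):
// schedule maps job ID -> next-run Unix timestamp (the sorted set);
// ready is the FIFO queue workers pop from (the list).
type promoter struct {
	schedule map[string]int64
	ready    []string
}

// promoteDue moves every job whose timestamp is <= now from the schedule
// to the ready queue, capped at batchSize jobs per tick (promote_batch_size).
func (p *promoter) promoteDue(now int64, batchSize int) int {
	var due []string
	for id, ts := range p.schedule {
		if ts <= now {
			due = append(due, id)
		}
	}
	sort.Strings(due) // map iteration is unordered; sort for a stable result
	if len(due) > batchSize {
		due = due[:batchSize]
	}
	for _, id := range due {
		delete(p.schedule, id)
		p.ready = append(p.ready, id)
	}
	return len(due)
}

func main() {
	p := &promoter{schedule: map[string]int64{"job-a": 100, "job-b": 900}}
	promoted := p.promoteDue(500, 100)
	fmt.Println(promoted, p.ready) // prints: 1 [job-a]
}
```

The batch cap bounds the work done per tick; jobs left over simply remain due and are picked up on the next poll.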

Requirements

  • Go 1.22+
  • Redis 6+

Installation

go build -o scheduler .

Configuration

Configuration is loaded from config.yaml in the current directory or $HOME/.scheduler/config.yaml. Every value can be overridden with an environment variable using the SCHED_ prefix (dots become underscores).

server:
  addr: ":8081"
  read_timeout_s: 5
  write_timeout_s: 35   # must exceed scheduler.ready_pop_timeout_s

redis:
  addr: "localhost:6379"
  password: ""
  db: 0

scheduler:
  poll_interval_s: 5        # how often the promoter checks for due jobs
  max_jitter_s: 30          # random offset added to every scheduled time
  default_interval_s: 300
  ready_pop_timeout_s: 30   # how long GET /jobs/next blocks
  promote_batch_size: 100   # max jobs promoted per poll tick

logging:
  format: "text"
  log_poll_stats: true      # log when jobs are promoted

Environment variable examples:

SCHED_REDIS_ADDR=redis:6379
SCHED_SERVER_ADDR=:9000
SCHED_SCHEDULER_MAX_JITTER_S=60
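The key-to-variable mapping can be sketched as a one-liner (the `envName` helper is hypothetical; the real binary performs this translation internally):

```go
package main

import (
	"fmt"
	"strings"
)

// envName maps a dotted config key to its SCHED_-prefixed environment
// variable, per the rule above: uppercase, dots become underscores.
// (Hypothetical helper mirroring the documented behaviour.)
func envName(key string) string {
	return "SCHED_" + strings.ToUpper(strings.ReplaceAll(key, ".", "_"))
}

func main() {
	fmt.Println(envName("redis.addr"))             // prints: SCHED_REDIS_ADDR
	fmt.Println(envName("scheduler.max_jitter_s")) // prints: SCHED_SCHEDULER_MAX_JITTER_S
}
```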

Usage

Start the server

./scheduler serve

Manage jobs

Add a single job:

./scheduler job add --url https://example.com/feed --interval 300
./scheduler job add --url https://example.com/feed --interval 60 --meta region=us --meta priority=high

Import jobs from a JSON file:

./scheduler job import --file jobs.json

jobs.json format:

[
  { "url": "https://example.com/a", "interval_s": 300 },
  { "url": "https://example.com/b", "interval_s": 60, "meta": { "region": "eu" } }
]

Bulk imports stagger first-run times evenly across the interval to avoid a thundering herd.
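The staggering can be sketched as a pure function (the name `firstRunOffset` is hypothetical; this mirrors the behaviour described above, not the shipped code):

```go
package main

import "fmt"

// firstRunOffset spreads n imported jobs evenly across one interval, so a
// bulk import of jobs sharing an interval does not fire all at once.
func firstRunOffset(i, n, intervalS int) int {
	return i * intervalS / n
}

func main() {
	// Four jobs with a 300 s interval get first runs at +0, +75, +150, +225 s.
	for i := 0; i < 4; i++ {
		fmt.Println(firstRunOffset(i, 4, 300))
	}
}
```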

List jobs:

./scheduler job list
./scheduler job list --limit 100

Delete a job:

./scheduler job delete a1b2c3d4e5f6

HTTP API

GET /jobs/next

Long-polls for the next ready job. Blocks up to ready_pop_timeout_s seconds.

| Status | Meaning |
| --- | --- |
| 200 OK | Job returned as JSON |
| 204 No Content | No job became ready within the timeout |
| 503 Service Unavailable | Server is shutting down |

Response body (200):

{
  "id": "a1b2c3d4e5f6",
  "url": "https://example.com/feed",
  "interval_s": 300,
  "meta": { "region": "us" }
}

POST /jobs/{id}/complete

Report successful execution. The job is rescheduled at now + interval + jitter.

Request body:

{
  "scheduled_at": 1710423000,
  "picked_up_at": 1710423005
}

Both fields are Unix seconds and are optional (use 0 if unavailable). They are used only for telemetry: the scheduler derives duration_ms and wait_ms from them for its logs.

Response: 204 No Content

POST /jobs/{id}/error

Report a failed execution. The job is rescheduled using the same formula as /complete.

Request body:

{
  "error": "connection timeout",
  "scheduled_at": 1710423000,
  "picked_up_at": 1710423005
}

Response: 204 No Content


Both POST endpoints reject bodies larger than 4 KB.

Job IDs

A job's ID is derived deterministically from its URL (truncated SHA-1, 12 hex characters). Importing the same URL twice is safe: the schedule entry is written with ZADD NX, so an existing next-run time is not overwritten.
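The derivation can be sketched as below (`jobID` is an illustrative helper; the real code's exact slicing may differ, but the ID is the first 12 hex characters of the URL's SHA-1):

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// jobID derives the deterministic job ID described above: the first
// 12 hex characters (6 bytes) of the SHA-1 digest of the URL.
func jobID(url string) string {
	sum := sha1.Sum([]byte(url))
	return hex.EncodeToString(sum[:])[:12]
}

func main() {
	fmt.Println(jobID("https://example.com/feed"))
	fmt.Println(len(jobID("https://example.com/feed"))) // prints: 12
}
```

Deterministic IDs make imports idempotent: re-importing a URL always targets the same Redis fields.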

Redis Data Model

| Key | Type | Contents |
| --- | --- | --- |
| scraper:jobs | Hash | job_id → JSON; all job definitions |
| scraper:schedule | Sorted Set | job_id, scored by next-run Unix timestamp |
| scraper:ready | List | Job IDs ready for immediate execution |

Running Tests

go test ./...

Logging

Structured logs are emitted via log/slog. Key log events:

| Event | Level | Fields |
| --- | --- | --- |
| job_complete | INFO | job_id, duration_ms, wait_ms, next_run_at |
| job_error | WARN | job_id, error, duration_ms, wait_ms, will_retry_at |
| poll_stats | INFO | promoted_count, poll_duration_ms (only when jobs are promoted) |
| server starting | INFO | addr |
| promoter error | ERROR | error |

TODO

  • The ready queue can grow without bound if workers stop consuming jobs (e.g. all scrapers are down); consider a queue-length cap or alert.
