A Redis-backed HTTP service for managing recurring scrape jobs. Workers long-poll for jobs, execute them, then report success or failure. The scheduler handles timing, jitter, and rescheduling.
```
┌─────────────┐  GET /jobs/next     ┌───────────┐
│   Workers   │ ◄─────────────────► │  HTTP API │
│ (scrapers)  │  POST /jobs/*/      └─────┬─────┘
└─────────────┘  complete|error           │
                                    ┌─────▼─────┐
                                    │ Scheduler │
                                    │   (Go)    │
                                    └─────┬─────┘
                                    ┌─────▼─────┐
                                    │   Redis   │
                                    │  3 keys   │
                                    └───────────┘
```
A background goroutine polls Redis every poll_interval_s seconds and promotes due jobs from the schedule (sorted set) to the ready queue (list). Workers block on GET /jobs/next and receive jobs as they become available.
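The promotion step can be modeled in plain Go. The sketch below is an in-memory stand-in for what the promoter does against Redis (selecting due members of the `scraper:schedule` sorted set and pushing them onto the `scraper:ready` list); the function name `promoteDue` and the map-based model are illustrative, not the real implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// promoteDue models one promoter tick: job IDs whose next-run timestamp
// is <= now move from the schedule (sorted set) to the ready queue
// (list), capped at batch entries per tick.
func promoteDue(schedule map[string]int64, now int64, batch int) []string {
	type entry struct {
		id    string
		score int64
	}
	var due []entry
	for id, ts := range schedule {
		if ts <= now {
			due = append(due, entry{id, ts})
		}
	}
	// Sorted-set ordering: earliest next-run time first.
	sort.Slice(due, func(i, j int) bool { return due[i].score < due[j].score })
	if len(due) > batch {
		due = due[:batch]
	}
	ready := make([]string, 0, len(due))
	for _, e := range due {
		delete(schedule, e.id) // a promoted job leaves the schedule
		ready = append(ready, e.id)
	}
	return ready
}

func main() {
	schedule := map[string]int64{"a": 100, "b": 200, "c": 300}
	fmt.Println(promoteDue(schedule, 250, 100)) // [a b]
}
```

The batch cap corresponds to `promote_batch_size` in the configuration.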
- Go 1.22+
- Redis 6+
```
go build -o scheduler .
```

Configuration is loaded from `config.yaml` in the current directory or `$HOME/.scheduler/config.yaml`. Every value can be overridden with an environment variable using the `SCHED_` prefix (dots become underscores).
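The override naming rule can be expressed as a one-liner; `envKey` is an illustrative helper, not a function from the codebase:

```go
package main

import (
	"fmt"
	"strings"
)

// envKey maps a dotted config path to its override variable:
// SCHED_ prefix, dots become underscores, upper-cased.
func envKey(path string) string {
	return "SCHED_" + strings.ToUpper(strings.ReplaceAll(path, ".", "_"))
}

func main() {
	fmt.Println(envKey("redis.addr"))             // SCHED_REDIS_ADDR
	fmt.Println(envKey("scheduler.max_jitter_s")) // SCHED_SCHEDULER_MAX_JITTER_S
}
```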
```yaml
server:
  addr: ":8081"
  read_timeout_s: 5
  write_timeout_s: 35     # must exceed scheduler.ready_pop_timeout_s

redis:
  addr: "localhost:6379"
  password: ""
  db: 0

scheduler:
  poll_interval_s: 5      # how often the promoter checks for due jobs
  max_jitter_s: 30        # random offset added to every scheduled time
  default_interval_s: 300
  ready_pop_timeout_s: 30 # how long GET /jobs/next blocks
  promote_batch_size: 100 # max jobs promoted per poll tick

logging:
  format: "text"
  log_poll_stats: true    # log when jobs are promoted
```

Environment variable examples:
```
SCHED_REDIS_ADDR=redis:6379
SCHED_SERVER_ADDR=:9000
SCHED_SCHEDULER_MAX_JITTER_S=60
```

Run the server:

```
./scheduler serve
```

Add a single job:
```
./scheduler job add --url https://example.com/feed --interval 300
./scheduler job add --url https://example.com/feed --interval 60 --meta region=us --meta priority=high
```

Import jobs from a JSON file:
```
./scheduler job import --file jobs.json
```

`jobs.json` format:
```json
[
  { "url": "https://example.com/a", "interval_s": 300 },
  { "url": "https://example.com/b", "interval_s": 60, "meta": { "region": "eu" } }
]
```

Bulk imports stagger first-run times evenly across the interval to avoid a thundering herd.
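One plausible reading of "stagger first-run times evenly across the interval" is to give job *i* of *n* a first-run offset of `i * interval / n`. A sketch under that assumption (`staggerOffsets` is illustrative, not the real function):

```go
package main

import "fmt"

// staggerOffsets spreads n first runs evenly across one interval.
// Job i would first run at now + offsets[i], so no two imported jobs
// fire at the same instant.
func staggerOffsets(n int, intervalS int64) []int64 {
	offsets := make([]int64, n)
	for i := 0; i < n; i++ {
		offsets[i] = int64(i) * intervalS / int64(n)
	}
	return offsets
}

func main() {
	fmt.Println(staggerOffsets(4, 300)) // [0 75 150 225]
}
```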
List jobs:

```
./scheduler job list
./scheduler job list --limit 100
```

Delete a job:

```
./scheduler job delete a1b2c3d4e5f6
```

`GET /jobs/next` long-polls for the next ready job, blocking for up to `ready_pop_timeout_s` seconds.
| Status | Meaning |
|---|---|
| `200 OK` | Job returned as JSON |
| `204 No Content` | No job became ready within the timeout |
| `503 Service Unavailable` | Server is shutting down |
Response body (200):

```json
{
  "id": "a1b2c3d4e5f6",
  "url": "https://example.com/feed",
  "interval_s": 300,
  "meta": { "region": "us" }
}
```

`POST /jobs/{id}/complete` reports a successful execution. The job is rescheduled at `now + interval + jitter`.
Request body:
```json
{
  "scheduled_at": 1710423000,
  "picked_up_at": 1710423005
}
```

Both fields are Unix-second timestamps and are optional (use 0 if unavailable). They are used for telemetry only: the scheduler logs `duration_ms` and `wait_ms`.
Response: 204 No Content
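The rescheduling formula `now + interval + jitter` can be sketched as below. The assumption that jitter is drawn uniformly from `[0, max_jitter_s]` is mine; the config only says it is a random offset bounded by `max_jitter_s`.

```go
package main

import (
	"fmt"
	"math/rand"
)

// nextRun computes the next scheduled run: now + interval plus a
// random jitter in [0, maxJitterS] seconds (all values Unix seconds).
func nextRun(now, intervalS, maxJitterS int64) int64 {
	jitter := rand.Int63n(maxJitterS + 1)
	return now + intervalS + jitter
}

func main() {
	next := nextRun(1710423000, 300, 30)
	fmt.Println(next >= 1710423300 && next <= 1710423330) // true
}
```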
`POST /jobs/{id}/error` reports a failed execution. The job is rescheduled using the same formula as `/complete`.
Request body:
```json
{
  "error": "connection timeout",
  "scheduled_at": 1710423000,
  "picked_up_at": 1710423005
}
```

Response: `204 No Content`
Both POST endpoints reject bodies larger than 4 KB.
A job's ID is derived deterministically from its URL (truncated SHA-1, 12 hex characters). Importing the same URL twice is safe — ZADD NX ensures the schedule entry is not overwritten.
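The derivation described above can be sketched as follows; whether the real code hashes the raw URL bytes exactly like this is an assumption, and `jobID` is an illustrative name:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// jobID derives a deterministic job ID from a URL: hex-encoded SHA-1
// of the URL, truncated to 12 characters.
func jobID(url string) string {
	sum := sha1.Sum([]byte(url))
	return hex.EncodeToString(sum[:])[:12]
}

func main() {
	id := jobID("https://example.com/feed")
	fmt.Println(len(id))                                 // 12
	fmt.Println(id == jobID("https://example.com/feed")) // true
}
```

Because the ID is a pure function of the URL, re-importing a URL maps to the same schedule entry, which is what makes `ZADD NX` a safe idempotent upsert.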
| Key | Type | Contents |
|---|---|---|
| `scraper:jobs` | Hash | `job_id` → JSON (all job definitions) |
| `scraper:schedule` | Sorted Set | `job_id`, scored by next-run Unix timestamp |
| `scraper:ready` | List | Job IDs ready for immediate execution |
Run the tests:

```
go test ./...
```

Structured logs are emitted via `log/slog`. Key log events:
| Event | Level | Fields |
|---|---|---|
| `job_complete` | INFO | `job_id`, `duration_ms`, `wait_ms`, `next_run_at` |
| `job_error` | WARN | `job_id`, `error`, `duration_ms`, `wait_ms`, `will_retry_at` |
| `poll_stats` | INFO | `promoted_count`, `poll_duration_ms` (only when jobs are promoted) |
| `server starting` | INFO | `addr` |
| `promoter error` | ERROR | `error` |
- The ready queue can grow without bound if workers stop consuming jobs.