From cb73aeff2553ec37216fb04fb7d4a21377f5c669 Mon Sep 17 00:00:00 2001
From: Barak Korren
Date: Thu, 18 Jun 2026 13:03:24 +0300
Subject: [PATCH] docs(adr): add polling-based work discovery ADR

Record the per-repo polling subsystem: pluggable poll and invocation
drivers, Jira optimistic locking, work item context handoff to agents,
driver-scoped trigger definitions, and parity with event-driven dispatch
routing.

Update architecture.md and roadmap.md cross-references.

Fixes fullsend-ai/fullsend#2263

Signed-off-by: Barak Korren
Co-authored-by: Cursor
---
 ...sed-work-discovery-and-agent-invocation.md | 789 ++++++++++++++++++
 docs/architecture.md                          |   3 +-
 docs/roadmap.md                               |   2 +-
 3 files changed, 792 insertions(+), 2 deletions(-)
 create mode 100644 docs/ADRs/0049-polling-based-work-discovery-and-agent-invocation.md

diff --git a/docs/ADRs/0049-polling-based-work-discovery-and-agent-invocation.md b/docs/ADRs/0049-polling-based-work-discovery-and-agent-invocation.md
new file mode 100644
index 000000000..538c52e86
--- /dev/null
+++ b/docs/ADRs/0049-polling-based-work-discovery-and-agent-invocation.md
@@ -0,0 +1,789 @@
+---
+title: "49. Polling-based work discovery and agent invocation"
+status: Accepted
+relates_to:
+  - agent-architecture
+  - agent-infrastructure
+  - operational-observability
+topics:
+  - polling
+  - jira
+  - dispatch
+  - drivers
+  - cli
+  - per-repo
+---
+
+# 49. Polling-based work discovery and agent invocation
+
+Date: 2026-06-18
+
+## Status
+
+Accepted
+
+## Context
+
+Fullsend's primary dispatch path is **event-driven**: the target repo's shim
+workflow reacts to issue comments, label changes, and pull-request events and
+routes them through `reusable-dispatch.yml` to stage workflows
+([ADR 0002](0002-initial-fullsend-design.md),
+[ADR 0033](0033-per-repo-installation-mode.md),
+[ADR 0041](0041-synchronous-workflow-call-event-dispatch.md)).
That model works +well when the git forge delivers timely, reliable webhooks to the repo where +Fullsend is installed. + +Many teams using **per-repo installation mode** track work in systems that are +not the git forge — Jira is the most common example. Issues may live in Jira +while code and Fullsend configuration live in a single GitHub or GitLab repo. +Even on the forge itself, webhook delivery can be delayed, dropped, or +misconfigured; polling provides a **pull-based complement** that does not +depend on inbound webhook infrastructure. + +We need a mechanism scoped to **per-repo mode** that: + +1. **Discovers** candidate work items from one or more remote systems on a + schedule. +2. **Decides** whether each item warrants an agent invocation (slash command, + invoke label, or equivalent signal). +3. **Invokes** agents through the target repo's existing GitHub Actions + workflows (`fullsend.yml` → `reusable-dispatch.yml` → stage workflows). +4. **Coordinates** safely when multiple poll processes run in parallel or when + a poller crashes mid-cycle. + +The design must be **extensible**: Jira is the first poll source; GitHub and +GitLab issue queries may follow as additional poll drivers. Agent invocation +initially targets GitHub Actions workflows in the **target repo**; GitLab CI +pipelines and local agent runs may be added later. + +**Out of scope:** Per-org installation mode — no `.fullsend` config repo, +enrolled-repo shims, cross-repo `workflow_call` dispatch, or org-level polling +across multiple repos. Polling is a per-repo concern only. + +**Initial delivery vs extensibility:** The first implementation targets **Jira +polling in per-repo mode**. Poll and invocation driver interfaces are designed +so GitHub, GitLab, and additional sources can be added later without redesign, +but those drivers are not part of the initial scope. 
+ +## Options + +### Option A: Extend webhook-only dispatch + +Add Jira (and other) webhooks that translate remote events into the target +repo's forge dispatch path. + +- Pro: Reuses the event-driven stack; near-real-time when webhooks work. +- Con: Requires webhook infrastructure per source system; Jira webhook setup is + org-specific and brittle; does not help when webhooks are unavailable or when + work items are not forge-native issues. + +### Option B: Central orchestrator with a shared work queue + +A single long-lived service polls all sources, enqueues work in a database, and +dispatches agents. + +- Pro: Strong locking and deduplication; one place to observe queue depth. +- Con: New operational component (database, HA, deployment); single point of + failure unless clustered; diverges from Fullsend's repo-as-coordination-layer + theme ([ADR 0002](0002-initial-fullsend-design.md)). + +### Option C: Pluggable poll drivers with source-native optimistic locking (recommended) + +Stateless `fullsend poll` invocations (scheduled externally) run pluggable +**poll drivers** that discover work and **invocation drivers** that trigger +agents in the target repo. Coordination state lives on the work items +themselves (Jira entity properties) rather than in a central queue. + +- Pro: No new long-lived service required initially; parallel poll cycles are + safe; crash recovery via stale-lock expiry; drivers compose (Jira + GitHub + + GitLab in one repo config). +- Con: Lock semantics are driver-specific; Jira entity-property locking adds + API calls and requires careful tuning of stale thresholds. + +## Decision + +Adopt **Option C**: a **driver-based polling subsystem** exposed as a CLI +command, scoped to **per-repo installation mode**, with Jira as the first poll +driver and GitHub Actions in the target repo as the first invocation driver. + +### Scope + +Polling is implemented **only for per-repo mode** +([ADR 0033](0033-per-repo-installation-mode.md)). 
A single target repository +owns its poll configuration, credentials references, and agent invocation path. +Per-org installations continue to rely on event-driven dispatch only; extending +polling to per-org mode is explicitly deferred and not part of this design. + +### Architecture overview + +``` +┌──────────────────────────────────────────────────────────────────┐ +│ Target repo (per-repo install) │ +│ │ +│ Scheduler (GHA schedule job, cron, k8s CronJob, etc.) │ +│ │ │ +│ ▼ │ +│ fullsend poll ──► Poll orchestrator │ +│ │ │ +│ ┌────────────┼────────────┐ │ +│ ▼ ▼ ▼ │ +│ Poll driver Poll driver Poll driver │ +│ (Jira) (GitHub) (GitLab) [future] │ +│ │ │ │ │ +│ └────────────┼────────────┘ │ +│ ▼ │ +│ Per-item: condition check │ +│ │ │ +│ ▼ │ +│ Invocation driver ──► poll-trigger workflow → stage workflows │ +└──────────────────────────────────────────────────────────────────┘ +``` + +Control flow remains **unidirectional** per [ADR 0016](0016-unidirectional-control-flow.md): +the poll orchestrator discovers work and invokes infrastructure; agents do not +drive the poll loop. + +### CLI entry points + +- **`fullsend poll`** — runs one poll cycle against the configured target repo + and its poll drivers, then exits. Typical schedulers: + - A **scheduled job** in the target repo's `.github/workflows/` that runs + `fullsend poll` (same repo context as the shim). + - External cron or Kubernetes CronJob with credentials to reach Jira and + trigger the target repo's workflows. +- **`fullsend watch`** — **deferred**. A future long-lived command may run poll + cycles on an internal timer. Not part of the initial implementation; the + one-shot `poll` command is sufficient to prove the design. + +The `poll` command operates in per-repo context: it reads configuration from +the target repo's `.fullsend/` directory and invokes workflows in that same +repo. 
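+
+A minimal Go sketch of the one-shot contract follows. It is illustrative only:
+the single-method `PollDriver` interface, the per-driver timeout, running
+drivers sequentially, and continuing past a failed driver while reporting every
+failure are assumptions made for this example, not settled CLI behavior
+(`errors.Join` needs Go 1.20 or later).
+
+```go
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"time"
+)
+
+// PollDriver is a deliberately simplified, hypothetical contract: one call
+// runs a complete discover/lock/check/invoke pass for that driver.
+type PollDriver interface {
+	Name() string
+	RunCycle(ctx context.Context) error
+}
+
+// runPollOnce runs every configured driver exactly once and returns, which is
+// the one-shot behavior of `fullsend poll`. A failing driver does not stop the
+// remaining ones; all failures are joined into the returned error.
+func runPollOnce(ctx context.Context, drivers []PollDriver, perDriver time.Duration) error {
+	var errs []error
+	for _, d := range drivers {
+		dctx, cancel := context.WithTimeout(ctx, perDriver)
+		if err := d.RunCycle(dctx); err != nil {
+			errs = append(errs, fmt.Errorf("driver %s: %w", d.Name(), err))
+		}
+		cancel()
+	}
+	return errors.Join(errs...) // nil when every driver succeeded
+}
+
+// fakeDriver is a stand-in used only to exercise runPollOnce.
+type fakeDriver struct {
+	name  string
+	fail  bool
+	calls int
+}
+
+func (f *fakeDriver) Name() string { return f.name }
+
+func (f *fakeDriver) RunCycle(ctx context.Context) error {
+	f.calls++
+	if f.fail {
+		return errors.New("search failed")
+	}
+	return nil
+}
+
+func main() {
+	jira := &fakeDriver{name: "jira"}
+	gh := &fakeDriver{name: "github", fail: true}
+	err := runPollOnce(context.Background(), []PollDriver{jira, gh}, 30*time.Second)
+	fmt.Println(jira.calls, gh.calls, err)
+}
+```
+
+Under these assumptions a scheduler still observes a failure (a non-nil error)
+while the healthy drivers complete their cycle, so one misconfigured source
+does not starve the others.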
+ +### Configuration + +Poll drivers are configured in the target repo's `.fullsend/config.yaml` +([ADR 0033](0033-per-repo-installation-mode.md)). The schema is nested: each +**poll driver entry** owns its connection, queries, optional defaults, and +**agent mappings** (including triggers). Triggers are **not** top-level — they +always live under a specific poll driver's `agent_mappings`. + +```yaml +poll_drivers: + - type: jira + connection: { ... } + queries: [ ... ] + default_actor: { ... } # optional; Jira-shaped (see below) + agent_mappings: + - role: triage + triggers: [ ... ] + + - type: github + connection: { ... } + queries: [ ... ] + default_actor: { ... } # optional; GitHub-shaped + agent_mappings: + - role: code + triggers: [ ... ] +``` + +Each poll driver entry specifies at minimum: + +- **Driver type** (`jira`, and later `github`, `gitlab`, …). +- **Connection** (base URL, credentials reference). +- **Queries** — one or more issue search expressions (JQL for Jira; equivalent + filters for other forges when added). +- **Agent mappings** — nested list of mappings (each keyed by harness `role`), + each with one or more **triggers**. + +Until a system-agnostic trigger design exists on harness files +([ADR 0024](0024-harness-definitions.md)), **invocation conditions live on the +poll driver**. This is an interim placement: poll drivers declare per-mapping +triggers in `.fullsend/config.yaml`; drivers translate those signals to +source-system fields (Jira comments, Jira labels, GitHub comments, etc.). A +future harness trigger schema (not covered by [ADR 0045](0045-forge-portable-harness-schema.md)) +should supersede poll-driver mappings; until then, poll configuration owns +trigger definitions. + +Each agent mapping entry specifies at minimum: + +- **`role`** — pipeline stage to invoke (e.g. `triage`, `code`, `review`), + matching the harness `role` and dispatch routing stage names. 
+- **`triggers`** — one or more invocation conditions (see **Parity with + event-driven dispatch** below). At least one trigger is required per mapping. + +Trigger types use **system-agnostic names**; the enclosing poll driver maps +them to source-system APIs. **`when`** guards are also system-agnostic (labels, +bot exclusion). **`actor`** authorization is **driver-specific** — its fields +depend on the parent poll driver's `type` and MUST NOT include keys meant for +other drivers (config validation rejects a Jira driver entry that declares +`associations`, or a GitHub entry that declares `groups`). + +| Trigger type | Meaning | +|--------------|---------| +| `slash_command` | A comment whose first line starts with the given command (e.g. `/fs-triage`). | +| `label_added` | The given label was applied to the work item since the last check. | +| `issue_created` | The work item was created since the last check (or since baseline). | +| `issue_updated` | Summary/description (or forge-equivalent fields) changed since the last check. | +| `comment_added` | A new comment was added, subject to optional `when` guards (see below). | + +`comment_added` and `slash_command` triggers support optional **`when`** guards: + +- `when.has_labels` — all listed labels must be present on the issue. +- `when.not_labels` — none of the listed labels may be present. +- `when.exclude_bots` — comment author must not be an automation/bot account. + +**Actor authorization** (who may cause a trigger to fire) uses a driver-shaped +**`actor`** block on the trigger, or the poll driver's **`default_actor`**. +See **Actor authorization and trust boundaries** below. + +Example — **Jira** poll driver, auto-triage on human reply to a `needs-info` +issue: + +```yaml +poll_drivers: + - type: jira + connection: { ... } + queries: [ ... 
] + agent_mappings: + - role: triage + triggers: + - type: comment_added + when: + has_labels: [needs-info] + not_labels: [feature] + exclude_bots: true + actor: + groups: [fullsend-operators] # optional tighten; omit for default + or_reporter: true +``` + +Example — **GitHub** poll driver, same pattern: + +```yaml +poll_drivers: + - type: github + connection: { ... } + queries: [ ... ] + agent_mappings: + - role: triage + triggers: + - type: comment_added + when: + has_labels: [needs-info] + not_labels: [feature] + exclude_bots: true + actor: + associations: [owner, member, collaborator, contributor] + or_reporter: true +``` + +At least one trigger is required per mapping. A mapping may combine multiple +triggers (OR semantics — any matching trigger since the last check is positive). + +Trigger-level **`actor`** overrides poll-driver defaults. Omitted `actor` uses +the driver default (see below). + +### Actor authorization and trust boundaries + +#### GitHub (event-driven today) + +`reusable-dispatch.yml` defines `is_authorized()` as comment author association +`OWNER`, `MEMBER`, or `COLLABORATOR` — users explicitly invited to the org or +repo. That is a **trust boundary** between invited collaborators and drive-by +contributors (`CONTRIBUTOR`), one-off interactors (`NONE`), or outsiders. + +Authorization is **inconsistent across stages today**: + +| Trigger | `is_authorized` required? | +|---------|---------------------------| +| `issues.opened` / `edited` / `labeled` | No (system events) | +| `/fs-triage`, `/fs-code`, `/fs-review` | **No** (today) | +| `/fs-fix`, `/fs-retro`, `/fs-prioritize` | **Yes** | +| needs-info auto-triage comment | Partial — non-bot; `association != NONE` **or** issue reporter | + +A **pending ADR** will require the invited-collaborator check (or stricter +policy) on all human-initiated triggers. 
Poll drivers for GitHub MUST follow +the same rules once that ADR lands; until then, mappings SHOULD document the +intended boundary per trigger. + +For **GitHub poll drivers** (`type: github`), **`actor`** on a trigger (or +`default_actor` on the driver) uses GitHub-only fields: + +```yaml +actor: + associations: [owner, member, collaborator] # maps to author_association + or_reporter: true # also allow issue creator +``` + +When `actor` is omitted on a human-initiated trigger, the GitHub poll driver +defaults to **`associations: [owner, member, collaborator]`** (the +`is_authorized` set). System triggers (`issue_created`, `issue_updated`, +`label_added` from automation) are not gated on comment author. + +#### Jira (no native equivalent) + +Jira has **no direct equivalent** to GitHub `author_association`. Only +authenticated users with project permission can comment; anonymous users cannot. +That is a **weaker default boundary** than GitHub's invited-collaborator check — +any Jira user who can comment on the issue can invoke slash commands unless +tightened. + +Projects that need a stronger boundary MUST restrict triggers to specific Jira +**groups** and/or project **roles**. Under a **`type: jira`** poll driver, +**`actor`** uses Jira-only fields: + +```yaml +actor: + groups: [fullsend-operators, platform-team] # user in any listed group + project_roles: [Administrators, Developers] # user's project role in any + or_reporter: true # also allow issue reporter +``` + +Rules (Jira poll driver only): + +- **`groups`** — comment/changelog actor must belong to at least one named Jira + group (global or project-scoped, per Jira configuration). +- **`project_roles`** — actor must hold at least one listed project role on the + issue's project. +- **`or_reporter`** — issue creator satisfies the actor check even if group/role + checks fail (mirrors GitHub needs-info / reporter escape hatch). 
+- **Omitted `actor`** on a human-initiated trigger — any authenticated user + who can comment on the issue (Jira default permission model). + +Poll drivers resolve group/role membership via Jira REST API (user lookup + +group/role membership endpoints) at condition-check time. + +#### Poll-driver defaults + +A poll driver entry may set **`default_actor`** (same schema as trigger-level +`actor` for that driver's `type`) applied to all mappings unless overridden per +trigger: + +```yaml +poll_drivers: + - type: jira + connection: ... + queries: [...] + default_actor: + groups: [fullsend-trusted] + agent_mappings: + - role: triage + triggers: + - type: slash_command + command: /fs-triage + # inherits default_actor.groups + - type: issue_created + # no actor gate — not human-initiated +``` + +Driver-level defaults let operators tighten an entire Jira project without +repeating group/role lists on every mapping. + +### Parity with event-driven dispatch + +Today, the shim listens for GitHub webhooks +(`shim-per-repo.yaml` / `fullsend.yaml`): + +| Webhook | Actions | +|---------|---------| +| `issues` | `opened`, `edited`, `labeled` | +| `issue_comment` | `created` | +| `pull_request_target` | `opened`, `synchronize`, `ready_for_review`, `closed` | +| `pull_request_review` | `submitted` | + +`reusable-dispatch.yml` routes those events to agent stages as follows. 
+ +**Issue events** (in scope for Jira poll drivers): + +| GitHub event | Dispatch route | Poll trigger equivalent | Jira equivalent | +|--------------|----------------|-------------------------|-----------------| +| `issues.opened` | `triage` | `issue_created` → triage | Issue created | +| `issues.edited` | `triage` | `issue_updated` → triage | Summary/description (or other configured fields) updated | +| `issues.labeled` + `ready-to-code` | `code` | `label_added: ready-to-code` | Label added | +| `issues.labeled` + `ready-for-review` | `review` | `label_added: ready-for-review` | Label added | +| `issue_comment` + `/fs-triage` | `triage` | `slash_command: /fs-triage` | Comment added with command | +| `issue_comment` + `/fs-code` *(no linked PR)* | `code` | `slash_command: /fs-code` + work item has no linked change proposal | Comment added; gate on no linked dev/PR if applicable | +| `issue_comment` + `/fs-review` | `review` | `slash_command: /fs-review` | Comment added with command | +| `issue_comment` + `/fs-fix` *(linked PR, authorized human, non-bot)* | `fix` | `slash_command` + linked PR + non-bot + GitHub `actor.associations` | Comment + linked dev item + optional Jira `actor` | +| `issue_comment` + `/fs-retro` or `/fullsend retro` *(authorized, non-bot)* | `retro` | `slash_command` + non-bot + GitHub `actor.associations` | Comment + optional Jira `actor` tighten | +| `issue_comment` + `/fs-prioritize` *(authorized, non-bot)* | `prioritize` | `slash_command` + non-bot + GitHub `actor.associations` | Comment + optional Jira `actor` tighten | +| `issue_comment` *(non-command, `needs-info`, not `feature`, non-bot, member or reporter)* | `triage` | `comment_added` guards + non-bot + GitHub `actor` + `or_reporter` | Comment on labelled issue + optional Jira `actor` | + +**Pull-request events** (not native to Jira issues — poll only when the driver +can observe linked forge PRs, e.g. 
via Jira development panel or a GitHub poll +driver on the code repo): + +| GitHub event | Dispatch route | Notes for Jira | +|--------------|----------------|----------------| +| `pull_request_target` opened / synchronize / ready_for_review | `review` | No Jira native equivalent; use GitHub/GitLab poll driver or dev-link integration | +| `pull_request_target` closed (merged) | `retro` | Same | +| `pull_request_review` changes_requested from review bot | `fix` | Same; includes `fullsend-no-fix` / `fullsend-fix` label gates | + +Initial Jira poll-driver **default agent mappings** should mirror the issue +rows above (triage on create/edit, label-driven code/review, slash commands, +needs-info auto-triage). PR-linked stages (`fix`, `retro` from merge, review +from PR push) remain the responsibility of forge poll drivers unless Jira +issues carry resolvable links to change proposals. + +Poll drivers evaluate **changes since the last check** (not full event replay): +a mapping fires when its trigger condition became true due to a change in that +window. Label and comment triggers use changelog/comment APIs; `issue_updated` +compares field revision timestamps or changelog entries. + +Multiple poll drivers may be active in the same repo configuration (e.g. Jira +for planning issues and GitHub for repo-native issues). The orchestrator runs +all enabled poll drivers during each cycle. + +### Poll driver interface + +Each poll driver implements a common contract: + +1. **Discover** — return candidate work items matching configured queries. +2. **Lock** — attempt to acquire an exclusive lock on a work item (driver-specific). +3. **Check condition** — determine whether an agent should run for a locked item. +4. **Unlock** — release the lock (on failure, cancellation, or agent completion). +5. **Hand off lock** — pass lock metadata to the invocation driver for the + agent runner to maintain during execution (driver-specific). 
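+
+The five operations above can be sketched as a Go interface plus a small
+orchestrator loop. Method names, the `WorkItem` / `LockHandoff` shapes, and the
+`Invoker` callback are hypothetical. The sketch also processes items
+sequentially, whereas the design runs condition checks and invocations in
+bounded parallel goroutines.
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+)
+
+// WorkItem identifies a candidate item in its source system.
+type WorkItem struct {
+	Source string // e.g. "jira"
+	Key    string // e.g. "PROJ-123"
+	URL    string
+}
+
+// LockHandoff is what the invocation driver forwards to the agent runner.
+type LockHandoff struct {
+	LockID, Driver, Property string
+}
+
+// PollDriver mirrors the five listed operations (names are hypothetical).
+type PollDriver interface {
+	Discover(ctx context.Context) ([]WorkItem, error)
+	Lock(ctx context.Context, it WorkItem) (bool, error)
+	CheckCondition(ctx context.Context, it WorkItem) ([]string, error) // matched roles
+	Unlock(ctx context.Context, it WorkItem) error
+	HandOffLock(it WorkItem) LockHandoff
+}
+
+// Invoker stands in for an invocation driver such as github-actions.
+type Invoker func(it WorkItem, roles []string, h LockHandoff) error
+
+// runDriver walks one driver through discover -> lock -> check -> hand off.
+// The lock is released when nothing matched or when invocation fails; on
+// success it is left in place for the agent runner to maintain.
+func runDriver(ctx context.Context, d PollDriver, invoke Invoker) ([]string, error) {
+	items, err := d.Discover(ctx)
+	if err != nil {
+		return nil, err
+	}
+	var invoked []string
+	for _, it := range items {
+		ok, err := d.Lock(ctx, it)
+		if err != nil || !ok {
+			continue // lock write failed or another poller owns the item
+		}
+		roles, err := d.CheckCondition(ctx, it)
+		if err != nil || len(roles) == 0 {
+			_ = d.Unlock(ctx, it) // nothing to run: release immediately
+			continue
+		}
+		if err := invoke(it, roles, d.HandOffLock(it)); err != nil {
+			_ = d.Unlock(ctx, it) // let a later cycle retry
+			continue
+		}
+		invoked = append(invoked, it.Key)
+	}
+	return invoked, nil
+}
+
+// memDriver is an in-memory stand-in used to exercise runDriver.
+type memDriver struct {
+	items   []WorkItem
+	locked  map[string]bool
+	matches map[string][]string
+}
+
+func (m *memDriver) Discover(context.Context) ([]WorkItem, error) { return m.items, nil }
+
+func (m *memDriver) Lock(_ context.Context, it WorkItem) (bool, error) {
+	if m.locked[it.Key] {
+		return false, nil
+	}
+	m.locked[it.Key] = true
+	return true, nil
+}
+
+func (m *memDriver) CheckCondition(_ context.Context, it WorkItem) ([]string, error) {
+	return m.matches[it.Key], nil
+}
+
+func (m *memDriver) Unlock(_ context.Context, it WorkItem) error {
+	delete(m.locked, it.Key)
+	return nil
+}
+
+func (m *memDriver) HandOffLock(WorkItem) LockHandoff {
+	return LockHandoff{LockID: "poller-1", Driver: "mem", Property: "fullsend.poll.lock"}
+}
+
+func main() {
+	d := &memDriver{
+		items:   []WorkItem{{Key: "A-1"}, {Key: "A-2"}, {Key: "A-3"}},
+		locked:  map[string]bool{"A-3": true}, // already held by another poller
+		matches: map[string][]string{"A-1": {"triage"}},
+	}
+	invoked, err := runDriver(context.Background(), d, func(it WorkItem, roles []string, h LockHandoff) error {
+		fmt.Println("dispatch", it.Key, roles, "lock", h.LockID)
+		return nil
+	})
+	fmt.Println(invoked, err)
+}
+```
+
+A successful invocation deliberately leaves the lock in place: per the handoff
+step, the agent runner, not the poller, refreshes and eventually removes it.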
+ +Drivers are responsible for mapping configured triggers to source-system +changelog fields (comments, labels, field updates, etc.). The orchestrator +handles concurrency, retries, and invocation dispatch. + +All outbound API calls (poll driver and invocation driver) use **retry with +backoff** on transient failures (rate limits, 5xx, network errors). + +### Invocation driver interface + +Each invocation driver triggers agent execution in the **target repo** for a +matched work item: + +- **Initial driver: `github-actions`** — dispatches a dedicated + **poll-trigger workflow** in the target repo (e.g. + `.github/workflows/fullsend-poll.yml`) via `workflow_dispatch`. The per-repo + event shim (`.github/workflows/fullsend.yaml`) does **not** listen on + `workflow_dispatch`; poll invocation uses a separate entry point per + [ADR 0041](0041-synchronous-workflow-call-event-dispatch.md) (non-event + `workflow_dispatch` is allowed). The poll-trigger workflow calls the + appropriate stage reusable workflow directly with explicit `stage`, + `work_item`, and `poll` inputs — it does not re-run `reusable-dispatch.yml` + routing because the poller already determined the stage + ([ADR 0033](0033-per-repo-installation-mode.md), + [ADR 0041](0041-synchronous-workflow-call-event-dispatch.md)). +- **Future drivers:** GitLab CI pipeline trigger in the target repo + ([ADR 0028](0028-gitlab-support.md)), local OpenShell/agent run on the host + running `fullsend poll`. + +Invocation is **asynchronous** from the poll cycle: the poll driver locks the +item, schedules invocation, and **hands off lock maintenance to the agent +runner** (see Work item context below). The one-shot `fullsend poll` process +does not stay alive for the duration of the agent run. 
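+
+As a sketch of what the `github-actions` invocation driver might send, the Go
+function below builds (without sending) a request to GitHub's REST endpoint for
+creating a workflow dispatch event. The `stage`, `work_item`, and `poll` input
+names come from the text above; JSON-encoding the two structured values into
+string inputs, the struct field tags, the `ref`, and the function name are
+assumptions for illustration.
+
+```go
+package main
+
+import (
+	"bytes"
+	"encoding/json"
+	"fmt"
+	"net/http"
+)
+
+// WorkItem and PollLock mirror the work_item and poll blocks (assumed tags).
+type WorkItem struct {
+	Source string `json:"source"`
+	URL    string `json:"url"`
+	Key    string `json:"key"`
+}
+
+type PollLock struct {
+	LockID       string `json:"lock_id"`
+	LockDriver   string `json:"lock_driver"`
+	LockProperty string `json:"lock_property"`
+}
+
+// buildDispatchRequest prepares a workflow_dispatch call for the poll-trigger
+// workflow. Structured values are JSON-encoded into single string inputs.
+func buildDispatchRequest(owner, repo, workflowFile, ref, token, stage string, wi WorkItem, lock PollLock) (*http.Request, error) {
+	wiJSON, err := json.Marshal(wi)
+	if err != nil {
+		return nil, err
+	}
+	lockJSON, err := json.Marshal(lock)
+	if err != nil {
+		return nil, err
+	}
+	body, err := json.Marshal(map[string]any{
+		"ref": ref,
+		"inputs": map[string]string{
+			"stage":     stage,
+			"work_item": string(wiJSON),
+			"poll":      string(lockJSON),
+		},
+	})
+	if err != nil {
+		return nil, err
+	}
+	url := fmt.Sprintf("https://api.github.com/repos/%s/%s/actions/workflows/%s/dispatches", owner, repo, workflowFile)
+	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
+	if err != nil {
+		return nil, err
+	}
+	req.Header.Set("Accept", "application/vnd.github+json")
+	req.Header.Set("Authorization", "Bearer "+token)
+	req.Header.Set("X-GitHub-Api-Version", "2022-11-28")
+	return req, nil
+}
+
+func main() {
+	req, err := buildDispatchRequest("org", "repo", "fullsend-poll.yml", "main", "TOKEN", "triage",
+		WorkItem{Source: "jira", URL: "https://example.atlassian.net/browse/PROJ-123", Key: "PROJ-123"},
+		PollLock{LockID: "example-uuid", LockDriver: "jira", LockProperty: "fullsend.poll.lock"})
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(req.Method, req.URL.String())
+}
+```
+
+Because `workflow_dispatch` inputs are declared per workflow, the poll-trigger
+workflow would need matching `stage`, `work_item`, and `poll` input
+declarations for this request to be accepted.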
+ +### Work item context + +#### Current mechanism (event-driven, GitHub-only) + +Today, issue identity flows through the dispatch pipeline as a minimal +**`event_payload`** JSON blob built from `GITHUB_EVENT_PATH` in +`reusable-dispatch.yml`: + +```json +{ + "issue": { "number": 42, "html_url": "https://github.com/org/repo/issues/42" }, + "pull_request": { ... }, + "comment": { "body": "..." } +} +``` + +Stage workflows (`reusable-triage.yml`, `reusable-code.yml`, …) pass fields +from this payload into the job environment before `fullsend run`: + +| Variable | Source (today) | Consumers | +|----------|----------------|-----------| +| `GITHUB_ISSUE_URL` | `event_payload.issue.html_url` | Harness `runner_env`, pre/post scripts, agent prompts | +| `ISSUE_NUMBER` | `event_payload.issue.number` | `pre-code.sh`, code harness | +| `REPO_FULL_NAME` | `inputs.source_repo` | `pre-code.sh`, code harness | + +Pre-scripts (e.g. `pre-triage.sh`, `pre-code.sh`) **validate GitHub URL +format** and derive `REPO` / `ISSUE_NUMBER` from `GITHUB_ISSUE_URL`. Agent +definitions and skills assume a GitHub issue URL. This path does not carry +poll-lock metadata. + +#### Expanded mechanism (poll and non-GitHub sources) + +Polling and future non-forge work sources require a **system-agnostic work +item reference** alongside the existing GitHub-shaped fields where applicable. 
+ +The invocation driver extends `event_payload` (or an equivalent input to a +poll-trigger workflow) with: + +```json +{ + "work_item": { + "source": "jira", + "url": "https://example.atlassian.net/browse/PROJ-123", + "key": "PROJ-123" + }, + "poll": { + "lock_id": "", + "lock_driver": "jira", + "lock_property": "fullsend.poll.lock" + }, + "issue": null, + "pull_request": null, + "comment": null +} +``` + +For GitHub-native poll sources, `work_item.source` is `github` and `url` / +`key` mirror `issue.html_url` / issue number; the `issue` block may remain +populated for backward compatibility with existing stage workflows. + +Stage workflows and `fullsend run` receive these as **runner environment +variables** (set by the workflow step or `setup-agent-env.sh` prefix mapping): + +| Variable | Purpose | +|----------|---------| +| `FULLSEND_WORK_ITEM_URL` | Canonical URL for the work item in its source system | +| `FULLSEND_WORK_ITEM_SOURCE` | Source identifier (`github`, `jira`, `gitlab`, …) | +| `FULLSEND_WORK_ITEM_KEY` | Stable key within the source (`42`, `PROJ-123`, …) | +| `FULLSEND_POLL_LOCK_ID` | Poller UUID that owns the optimistic lock | +| `FULLSEND_POLL_LOCK_DRIVER` | Poll driver that wrote the lock (`jira`, …) | +| `FULLSEND_POLL_LOCK_PROPERTY` | Entity-property key used for the lock | + +`GITHUB_ISSUE_URL`, `ISSUE_NUMBER`, and `REPO_FULL_NAME` remain set when +`work_item.source` is `github` so existing harnesses and scripts continue to +work without change. Agents operating on Jira (or other) items use +`FULLSEND_WORK_ITEM_*` and source-appropriate API servers / skills; pre-scripts +gain source-aware branches or parallel `pre-*-jira.sh` scripts. + +Harness `runner_env` should reference `FULLSEND_WORK_ITEM_URL` for new +source-agnostic agents; forge-specific blocks +([ADR 0045](0045-forge-portable-harness-schema.md)) continue to carry tokens +and URLs for the APIs each agent calls. 
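+
+On the runner side, resolving these variables could look like the Go sketch
+below. It assumes that `FULLSEND_WORK_ITEM_*` takes precedence, that an
+event-driven GitHub run with only `GITHUB_ISSUE_URL` / `ISSUE_NUMBER` set is
+treated as a `github` work item, and that the three `FULLSEND_POLL_LOCK_*`
+variables are either all present or all absent. `resolveWorkItem` and
+`WorkItemRef` are hypothetical names.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"os"
+)
+
+// WorkItemRef is the runner-side view of the work item and optional poll lock.
+type WorkItemRef struct {
+	Source, Key, URL                 string
+	LockID, LockDriver, LockProperty string // empty for event-driven runs
+}
+
+// PollTriggered reports whether the run was dispatched by the poller.
+func (r WorkItemRef) PollTriggered() bool { return r.LockID != "" }
+
+// resolveWorkItem reads FULLSEND_WORK_ITEM_* and FULLSEND_POLL_LOCK_*, falling
+// back to the legacy GITHUB_ISSUE_URL / ISSUE_NUMBER pair. lookup has the same
+// signature as os.LookupEnv so tests can inject a map.
+func resolveWorkItem(lookup func(string) (string, bool)) (WorkItemRef, error) {
+	get := func(k string) string { v, _ := lookup(k); return v }
+	ref := WorkItemRef{
+		Source:       get("FULLSEND_WORK_ITEM_SOURCE"),
+		Key:          get("FULLSEND_WORK_ITEM_KEY"),
+		URL:          get("FULLSEND_WORK_ITEM_URL"),
+		LockID:       get("FULLSEND_POLL_LOCK_ID"),
+		LockDriver:   get("FULLSEND_POLL_LOCK_DRIVER"),
+		LockProperty: get("FULLSEND_POLL_LOCK_PROPERTY"),
+	}
+	if ref.URL == "" {
+		// Legacy event-driven GitHub path.
+		if u := get("GITHUB_ISSUE_URL"); u != "" {
+			ref.Source, ref.URL, ref.Key = "github", u, get("ISSUE_NUMBER")
+		}
+	}
+	if ref.Source == "" || ref.URL == "" {
+		return WorkItemRef{}, errors.New("no work item reference in the environment")
+	}
+	// Assumed rule: the three lock variables travel together or not at all.
+	set := 0
+	for _, v := range []string{ref.LockID, ref.LockDriver, ref.LockProperty} {
+		if v != "" {
+			set++
+		}
+	}
+	if set != 0 && set != 3 {
+		return WorkItemRef{}, errors.New("incomplete FULLSEND_POLL_LOCK_* variables")
+	}
+	return ref, nil
+}
+
+func main() {
+	ref, err := resolveWorkItem(os.LookupEnv)
+	if err != nil {
+		fmt.Println("skipping:", err)
+		return
+	}
+	fmt.Printf("%s %s (%s) poll-triggered=%v\n", ref.Source, ref.Key, ref.URL, ref.PollTriggered())
+}
+```
+
+Injecting `lookup` instead of calling `os.Getenv` directly keeps the
+resolution rules testable without mutating the process environment.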
+ +#### Poll lock handoff + +Because `fullsend poll` is one-shot, **lock maintenance moves to the agent +runner** once invocation is scheduled successfully: + +1. **Poller** writes the lock (UUID + timestamp) before dispatch. +2. **Pre-invoke verification** — poller re-reads the lock; aborts if lost. +3. **Invocation driver** passes `poll.lock_id` (and related fields) in + `event_payload` / runner env. +4. **Agent runner** (`fullsend run` host process) starts a background routine + that refreshes the lock timestamp on the work item for the duration of the + run, using `FULLSEND_POLL_LOCK_*` to locate and update the correct entity + property. Refresh interval is configurable; default aligns with half the + stale-lock threshold. +5. **Lock removal** — the runner removes the lock in a `defer` / teardown path + after the agent exits (success or failure), or the harness `post_script` + clears it when post-processing completes. If the runner cannot reach the + source API, it logs and relies on stale-lock expiry. + +The runner must verify `FULLSEND_POLL_LOCK_ID` still matches the work item's +lock property **before** starting the agent (mirroring pre-invoke verification +in the poller). If the lock was lost, the run aborts without invoking the LLM. + +Poll-triggered workflow dispatch passes `stage` explicitly to the poll-trigger +workflow; it does not depend on `reusable-dispatch.yml` event routing. + +### Agent invocation condition check + +Before invoking an agent for a work item, the poll driver evaluates whether a +**new** triggering change occurred since the last check. **Initially**, trigger +definitions come from the poll driver's **agent mappings** in +`.fullsend/config.yaml` (see Configuration above). + +For each locked work item, the orchestrator evaluates the poll driver's +configured agent mappings against changes on the item since the last poll +check. 
When multiple mappings match, each match schedules a separate invocation +(subject to repo policy and role enablement in `config.yaml`). + +Per work item: + +1. Read an **entity property** storing the timestamp of the last condition + check. If absent, treat the baseline as **issue creation time**. On first + deployment against an existing backlog, operators SHOULD seed `lastCheck` + (or narrow JQL to recently changed issues) to avoid a one-time thundering + herd of `issue_created` / `issue_updated` triggers. +2. Inspect changes since that timestamp against **each agent mapping's + triggers** on the poll driver. The check is **positive for a mapping** when + any trigger matches a change in that window (see **Parity with event-driven + dispatch**): e.g. `issue_created`, `issue_updated`, `label_added`, + `slash_command` comment, or guarded `comment_added`. +3. Apply **dispatch-equivalent gates** before scheduling: e.g. `/fs-code` only + when no linked change proposal exists; `/fs-fix` only when one exists; + **actor authorization** using the enclosing poll driver's actor schema; + bot exclusion on comments. +4. Write an entity property with the timestamp of the **latest relevant change + on the issue at the time of the check** (not merely "now"), so repeated polls + without new signals do not re-trigger. + +Condition checks run **concurrently** with the poll-cycle loop and with one +another on a per-issue basis. + +**Future direction:** Harness YAML does not yet declare system-agnostic +triggers; event-driven dispatch hardcodes equivalent routing in +`reusable-dispatch.yml`. A follow-on design should add triggers to harness +files and migrate both dispatch routing and poll drivers to read them, removing +duplicated agent mappings from poll configuration. + +### Jira poll driver — optimistic locking + +Jira coordination uses **issue entity properties** and **optimistic locking**. +Property keys are namespaced (e.g. 
`fullsend.poll.lock`, +`fullsend.poll.lastCheck`) to avoid collisions. + +Each poll process: + +1. **Assigns a UUID** to itself at startup (one UUID per `fullsend poll` + invocation). +2. **Queries** for the top **M** unlocked issues matching configured JQL + queries. "Unlocked" means no active lock property, or a lock property whose + timestamp exceeds the stale threshold (see below). +3. **Randomly selects N** issues from the M candidates (`N < M`). Random + selection spreads load across concurrent poll invocations and reduces + thundering-herd contention when multiple pollers query the same JQL. +4. **Attempts to lock** each selected issue by writing a property containing the + poller UUID and a timestamp. +5. **Waits** a random interval between **500 ms and 1500 ms** (jitter). +6. **Re-queries** all issues that were lock candidates (the N issues). +7. For each re-queried issue: + - **7.1.** If the lock property timestamp is **stale**, remove the lock. + - **7.2.** If the lock property still contains **this poller's UUID** and the + invocation condition check is positive, schedule an agent invocation. + +**Recommended defaults** (overridable in `.fullsend/config.yaml`): + +| Parameter | Default | Rationale | +|-----------|---------|-----------| +| **M** | `50` | Matches Jira Cloud's default page size; one search API call per query under typical rate limits (~100 req/min). | +| **N** | `5` | With `N << M`, ~10 concurrent pollers can run without routinely selecting the same issues; keeps per-cycle API write volume modest. | +| **Stale lock threshold** | `300s` | Covers GHA queue latency under load plus time for the agent runner to start lock refresh; configurable per deployment. The runner refresh interval SHOULD be ≤ half this value. | + +JQL queries should include a filter excluding issues with a **fresh** lock +property where possible, so the search API returns mostly unlocked candidates. 
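+
+Steps 3 through 7 can be sketched in Go against an in-memory stand-in for Jira
+issue entity properties. The `Store` interface and `memStore` type are
+hypothetical (a real driver would read and write the `fullsend.poll.lock`
+property through Jira's REST API), the candidate list is assumed to already be
+the M unlocked issues from step 2, and the clock, sleep, and random source are
+injected so a competing poller can be simulated deterministically.
+
+```go
+package main
+
+import (
+	"fmt"
+	"math/rand"
+	"time"
+)
+
+// Lock mirrors the value stored in the fullsend.poll.lock property.
+type Lock struct {
+	PollerID string
+	At       time.Time
+}
+
+// Store stands in for Jira's issue entity-property API (hypothetical).
+type Store interface {
+	GetLock(key string) (Lock, bool)
+	SetLock(key string, l Lock)
+	DeleteLock(key string)
+}
+
+type memStore map[string]Lock
+
+func (m memStore) GetLock(k string) (Lock, bool) { l, ok := m[k]; return l, ok }
+func (m memStore) SetLock(k string, l Lock)       { m[k] = l }
+func (m memStore) DeleteLock(k string)            { delete(m, k) }
+
+func isStale(l Lock, now time.Time, threshold time.Duration) bool {
+	return now.Sub(l.At) > threshold
+}
+
+// acquire runs steps 3-7 and returns the keys this poller still owns.
+func acquire(s Store, candidates []string, n int, pollerID string, threshold time.Duration,
+	now func() time.Time, sleep func(time.Duration), rng *rand.Rand) []string {
+	// Step 3: randomly pick N of the M candidates.
+	picked := append([]string(nil), candidates...)
+	rng.Shuffle(len(picked), func(i, j int) { picked[i], picked[j] = picked[j], picked[i] })
+	if n < len(picked) {
+		picked = picked[:n]
+	}
+	// Step 4: write this poller's UUID and a timestamp.
+	for _, k := range picked {
+		s.SetLock(k, Lock{PollerID: pollerID, At: now()})
+	}
+	// Step 5: jitter in [500 ms, 1500 ms).
+	sleep(500*time.Millisecond + time.Duration(rng.Int63n(int64(time.Second))))
+	// Steps 6-7: re-read each picked issue.
+	var owned []string
+	for _, k := range picked {
+		l, ok := s.GetLock(k)
+		switch {
+		case !ok:
+			// property vanished: skip
+		case isStale(l, now(), threshold):
+			s.DeleteLock(k) // step 7.1
+		case l.PollerID == pollerID:
+			owned = append(owned, k) // step 7.2: condition check comes next
+		}
+	}
+	return owned
+}
+
+func main() {
+	t0 := time.Now()
+	keys := []string{"PROJ-1", "PROJ-2", "PROJ-3", "PROJ-4"}
+	owned := acquire(memStore{}, keys, 2, "poller-a", 300*time.Second,
+		func() time.Time { return t0 }, func(time.Duration) {}, rand.New(rand.NewSource(1)))
+	fmt.Println(len(owned), "issues locked by poller-a:", owned)
+}
+```
+
+In this sketch the jitter between the lock write and the re-read gives a
+competing poller's write time to land, so the last writer wins and the other
+poller sees a foreign UUID on re-read; only the returned keys go on to the
+condition check.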
+
+### Lock lifecycle during agent execution
+
+Once an invocation is scheduled for a locked issue:
+
+1. **Pre-invoke verification** — immediately before calling the invocation
+   driver, re-read the lock property. If the UUID no longer matches or the lock
+   is stale, **abort** invocation and do not dispatch the agent.
+2. **Lock handoff** — include `poll.lock_id` and related fields in the
+   invocation payload so the **agent runner** can maintain and release the
+   lock (see Work item context).
+3. **Lock removal on invocation failure** — if the invocation driver fails to
+   schedule the agent (API error after retries), the poller removes the lock so
+   another poll cycle can retry.
+
+Lock refresh during the agent run and lock removal on successful completion are
+the **agent runner's** responsibility, not the poller's.
+
+### Concurrency model
+
+Within a single `fullsend poll` invocation:
+
+- The **poll-cycle loop** (discover → lock → verify → schedule) runs for each
+  configured driver.
+- **Condition checks** and **invocation attempts** for individual issues run in
+  parallel goroutines, bounded by a configurable concurrency limit.
+- **Lock maintenance** during agent execution runs in the agent runner process,
+  not in the poller.
+
+Multiple `fullsend poll` processes may run concurrently (e.g. overlapping cron
+schedules, manual runs during scheduled polls). Jira optimistic locking, backed
+by pre-invoke verification, is designed so that at most one poller owns a given
+issue at dispatch time. A competing poller that selected the issue before the
+lock was written can still overwrite it after the first poller's checks have
+passed, so polled stages must remain idempotent (see **Negative / risks**).
+
+## Consequences
+
+### Positive
+
+- Per-repo installations can trigger Fullsend agents from Jira (and future
+  non-forge sources) without webhook infrastructure on those systems.
+- **Driver composition** lets one repo poll Jira for planning issues and
+  GitHub for repo-native issues in the same `.fullsend/config.yaml`.
+- **Parallel poll cycles** are safe via source-native locking; crashed pollers
+  recover automatically when locks go stale.
+- **External scheduling** keeps the initial implementation simple — no long-lived
+  daemon required.
+- Retries on API calls improve robustness against transient rate limits and
+  network failures.
+- Polling reuses the existing per-repo workflow chain; no new cross-repo
+  dispatch path is introduced.
+
+### Negative / risks
+
+- **Duplicated trigger config (interim)** — until harness-level triggers exist,
+  invocation conditions must be maintained in poll-driver agent mappings
+  separately from `reusable-dispatch.yml` routing; drift between the two paths
+  is possible.
+- **Per-org gap** — organizations using per-org installation mode cannot use
+  polling until a separate design extends it; this is an intentional scope
+  reduction.
+- **Polling latency** — work is discovered at scheduler granularity, not in
+  real time. Shorter intervals increase API load.
+- **Jira API cost** — each cycle consumes search, read, and property-write
+  quota; M, N, and the schedule interval must be tuned per repo.
+- **Optimistic-lock races** — the 500–1500 ms wait and re-query add latency per
+  cycle; mis-tuned stale thresholds can cause duplicate invocations (too short)
+  or stuck issues (too long). **Agent stages invoked via polling MUST be
+  idempotent** (safe to run twice for the same work item) as defense in depth
+  in case duplicate dispatch occurs despite locking.
+- **Driver-specific locking** — GitHub/GitLab poll drivers will need their own
+  lock primitives (labels, issue comments, or forge-specific metadata); the
+  orchestrator contract is shared but implementations differ.
+- **Jira trust boundary gap** — Jira's default trust boundary (any
+  comment-capable user) is weaker than GitHub's invited-collaborator model;
+  teams needing parity must configure `actor.groups` / `project_roles` on the
+  Jira poll driver explicitly.
+- **Authorization drift** — a pending GitHub dispatch ADR will tighten stages
+  that today skip `is_authorized`; poll mappings must track that ADR.
+- **Jira changelog fidelity** — mapping `issue_updated`, label, and comment
+  triggers requires reliable Jira changelog/comment APIs; field-level edits,
+  author identity, and group/role membership lookups must match dispatch semantics.
+- **Work item abstraction gap** — harnesses, pre-scripts, and agent prompts are
+  GitHub-centric today; Jira and other sources require `FULLSEND_WORK_ITEM_*`
+  plumbing and source-aware scripts or API servers.
+
+## Open questions
+
+- **Per-driver actor schema** — validation rules for `actor` / `default_actor`
+  fields per poll driver `type`, and alignment with the pending GitHub
+  authorization ADR.
+- Exact **`.fullsend/config.yaml` schema** for poll driver entries and agent
+  mappings (fields, credential references, per-driver M/N/stale overrides).
+- **Harness trigger schema (future)** — system-agnostic invocation conditions on
+  harness files; migration of `reusable-dispatch.yml` and poll drivers to
+  consume the same definitions, superseding poll-driver agent mappings.
+- **Credential placement** — the repo running `fullsend poll` (or its scheduled
+  workflow) needs credentials for both Jira and workflow dispatch; these live
+  alongside the repo's existing Fullsend secrets
+  ([ADR 0033](0033-per-repo-installation-mode.md)).
+- Whether poll-triggered invocations pass **synthetic event context** into
+  `reusable-dispatch.yml` or use a dedicated `workflow_dispatch` input shape
+  with explicit `stage` and `work_item` fields.
+- **`event_payload` / env schema** — exact field names and backward-compat rules
+  for `work_item` and `poll` blocks; migration of stage workflows away from
+  hardcoded `GITHUB_ISSUE_URL` extraction.
+- **Runner lock refresh implementation** — interval, Jira API calls from
+  `fullsend run`, and interaction with harness `post_script` for lock release.
+- Whether the **GitHub poll driver** locks via issue labels, assignees, or a
+  dedicated bot comment — deferred until that driver is implemented.
+- How the invocation driver **learns agent completion** for lock release when
+  dispatch is asynchronous (poll GHA run status, webhook callback, or agent
+  self-report via entity property).
+- **Metrics and observability** — poll cycle duration, lock contention rate,
+  invocation success/failure counters ([#896](https://github.com/fullsend-ai/fullsend/issues/896)).
+- **`fullsend watch`** scheduling semantics (internal timer vs long poll) when
+  implemented.
+
+## References
+
+- [ADR 0002 — Initial Fullsend Design](0002-initial-fullsend-design.md)
+- [ADR 0024 — Harness definitions](0024-harness-definitions.md)
+- [ADR 0045 — Forge-portable harness schema](0045-forge-portable-harness-schema.md)
+- [ADR 0016 — Unidirectional control flow](0016-unidirectional-control-flow.md)
+- [ADR 0028 — GitLab Support Architecture](0028-gitlab-support.md)
+- [ADR 0033 — Per-repo installation mode](0033-per-repo-installation-mode.md)
+- [ADR 0041 — Synchronous workflow_call for event-driven dispatch](0041-synchronous-workflow-call-event-dispatch.md)
+- [ADR 0042 — Use /fs- prefix for all slash commands](0042-fs-prefix-for-slash-commands.md)
diff --git a/docs/architecture.md b/docs/architecture.md
index 92b92aed8..ab7d9b4b1 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -145,10 +145,11 @@ The existing design principle is that [the repo is the coordinator](problems/age
 **Decided:**
 
 - Event-driven stage dispatch runs synchronously via `workflow_call` to preserve run correlation in the GitHub Actions UI (see [ADR 0041](ADRs/0041-synchronous-workflow-call-event-dispatch.md)).
+- Per-repo **polling** complements webhook dispatch: `fullsend poll` discovers work from remote systems (Jira first) on a schedule, uses source-native optimistic locking, and hands off to agent runners in the target repo (see [ADR 0049](ADRs/0049-polling-based-work-discovery-and-agent-invocation.md)). Initial scope is per-repo mode only.
 
 **Open questions:**
 
-- Is GitHub's event system sufficient, or do we need additional coordination logic (e.g. to prevent two code agents from picking up the same issue)?
+- ~~Is GitHub's event system sufficient, or do we need additional coordination logic (e.g. to prevent two code agents from picking up the same issue)?~~ Partially decided for per-repo Jira polling: entity-property locks and runner lock refresh ([ADR 0049](ADRs/0049-polling-based-work-discovery-and-agent-invocation.md)). Event-driven GitHub dispatch remains the primary path for forge-native triggers; duplicate protection still relies on label/state conventions plus agent idempotency.
 - How does work assignment interact with the backlog/priority agent described in [agent-architecture.md](problems/agent-architecture.md)?
 - What happens when work needs to be cancelled, retried, or reassigned?
 - Does the coordinator need state (a queue, a lock, a claim system), or can it be stateless and event-driven?
diff --git a/docs/roadmap.md b/docs/roadmap.md
index 9f85c06af..df67d3061 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -132,7 +132,7 @@ With feature refinement establishing the pattern, extend agent capabilities deep
 
 Examples of work that could move this forward:
 
-- JIRA trigger model ([#2263](https://github.com/fullsend-ai/fullsend/issues/2263))
+- JIRA trigger model ([#2263](https://github.com/fullsend-ai/fullsend/issues/2263)) — design in [ADR 0049](ADRs/0049-polling-based-work-discovery-and-agent-invocation.md)
 - Per-agent JIRA support: triage ([#2264](https://github.com/fullsend-ai/fullsend/issues/2264)), code ([#2265](https://github.com/fullsend-ai/fullsend/issues/2265)), prioritize ([#2266](https://github.com/fullsend-ai/fullsend/issues/2266)), retro ([#2267](https://github.com/fullsend-ai/fullsend/issues/2267)), review ([#2268](https://github.com/fullsend-ai/fullsend/issues/2268)), refine ([#1341](https://github.com/fullsend-ai/fullsend/issues/1341))
 - JIRA identity and credential management ([#2269](https://github.com/fullsend-ai/fullsend/issues/2269))