[FEATURE] Bump + publish the llmkube-runtimes coder image as part of the release process #932

Description

@Defilan

Feature Description

Make bumping and publishing the Foreman coder image (ghcr.io/defilantech/llmkube-foreman-agent-coder) a codified part of the LLMKube release process, instead of a manual, easy-to-forget step in a separate repo.

Problem Statement

The in-cluster foreman-agent runs the coder image, which is built in the separate defilantech/llmkube-runtimes repo. That image pins the release it was built from via ARG LLMKUBE_REF in coder/Dockerfile, and is published by pushing a coder-v<version> tag. Today, after every LLMKube release, someone must manually (a) bump ARG LLMKUBE_REF to the new tag and (b) push a matching coder-v<version> tag in llmkube-runtimes.
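The two manual steps above can be sketched as a pair of small helpers. This is a minimal illustration, not the repo's actual tooling: it assumes the pin in coder/Dockerfile has the shape `ARG LLMKUBE_REF=<value>` (the issue only names the ARG) and that release tags look like `v<version>`. The function names are hypothetical.

```python
import re

# Assumption: the pin is written as "ARG LLMKUBE_REF=<value>" on its own line.
_REF_LINE = re.compile(r"^(ARG[ \t]+LLMKUBE_REF=).*$", re.MULTILINE)


def bump_llmkube_ref(dockerfile_text: str, release_tag: str) -> str:
    """Return dockerfile_text with ARG LLMKUBE_REF set to release_tag."""
    # A function replacement avoids backslash handling in the new value.
    new_text, count = _REF_LINE.subn(
        lambda m: m.group(1) + release_tag, dockerfile_text
    )
    if count != 1:
        # Refuse to guess if the pin is missing or duplicated.
        raise ValueError(f"expected exactly one ARG LLMKUBE_REF line, found {count}")
    return new_text


def coder_tag_for(release_tag: str) -> str:
    """Map a release tag (v<version>) to the matching coder-v<version> tag."""
    if not release_tag.startswith("v"):
        raise ValueError(f"release tag should look like v<version>: {release_tag!r}")
    return f"coder-{release_tag}"
```

For example, `coder_tag_for("v0.8.23")` returns `"coder-v0.8.23"`, the tag that would then be pushed in llmkube-runtimes.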

Because this step lives in another repo and is not on the release checklist, it is easy to forget or to let lag. When it lags, deploying the new release fails: the in-cluster foreman-agent hits ImagePullBackOff because the coder image tag was never published (this happened on the 0.8.23 rollout).

As a maintainer cutting a release, I want the coder image for that version to be built and published automatically (or via a single documented step), so a fresh deploy of the release never lands on a missing coder image.

Proposed Solution

Preferred: on LLMKube release publish, trigger the llmkube-runtimes coder build for the new tag automatically. Options:

  1. Cross-repo dispatch (recommended): the LLMKube release workflow sends a repository_dispatch (or gh workflow run) to llmkube-runtimes with the new v<version>; a workflow there bumps ARG LLMKUBE_REF, commits, and pushes coder-v<version> (which its existing build-coder workflow already turns into the published image). Requires a cross-repo token.
  2. release-please post-release hook: wire the bump into the release automation that already cuts the tag.
  3. Documented release-checklist step (minimum): add a RELEASING.md entry (2-step: bump ARG LLMKUBE_REF, push coder-v<version>), and/or a release-workflow check that fails/warns if coder-v<version> does not exist in llmkube-runtimes after a release.
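For option 1, the dispatch itself could look roughly like the sketch below. It builds a request for GitHub's `POST /repos/{owner}/{repo}/dispatches` endpoint without sending it. The event type `llmkube-release` and the payload key `llmkube_ref` are hypothetical names chosen for illustration; the receiving workflow in llmkube-runtimes would need to listen for whatever names are actually agreed on.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_dispatch_request(owner, repo, release_tag, token,
                           event_type="llmkube-release"):
    """Build (but do not send) a repository_dispatch request for release_tag."""
    body = {
        "event_type": event_type,  # hypothetical event name
        "client_payload": {"llmkube_ref": release_tag},  # hypothetical key
    }
    return urllib.request.Request(
        url=f"{GITHUB_API}/repos/{owner}/{repo}/dispatches",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payload easy to inspect in a dry run, and the token is the cross-repo credential the option already calls for.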

At minimum, ship option 3 so the step is never silently skipped; option 1 is the real fix.
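The fail/warn check in option 3 could be as small as the sketch below. It assumes the release workflow has already captured the text output of `git ls-remote --tags` for llmkube-runtimes, so the function itself needs no network access. The function name is hypothetical.

```python
def has_coder_tag(ls_remote_output: str, version: str) -> bool:
    """Return True if refs/tags/coder-v<version> is listed in the output."""
    wanted = f"refs/tags/coder-v{version}"
    for line in ls_remote_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        ref = parts[1]
        # Annotated tags are also listed with a "^{}" suffix (the peeled commit).
        if ref == wanted or ref == wanted + "^{}":
            return True
    return False
```

A release job could exit non-zero, or just emit a warning, when this returns False for the version that was just published.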

Alternatives Considered

  • Continuing to do it by hand each release (current state, error-prone).
  • Folding the coder image build into the LLMKube repo itself (rejected: keeping the toolchain/runtime image in llmkube-runtimes keeps the operator repo air-gap-clean and its build fast).

Additional Context

  • Coder image repo: defilantech/llmkube-runtimes, coder/Dockerfile (ARG LLMKUBE_REF), tag pattern coder-v<version>, workflow build-coder.
  • Related failure mode: in-cluster foreman-agent ImagePullBackOff on a coder tag that was never published (0.8.23).
  • The llmkubelab Ansible deploy pins the in-cluster agent image tag to the release version, so the coder image must exist at :<version> before a deploy.

Priority

  • High - Would significantly improve my workflow

Willingness to Contribute

  • Yes, I can submit a PR

Metadata

  • Labels: enhancement (New feature or request)