Feature Description
Make bumping and publishing the Foreman coder image (ghcr.io/defilantech/llmkube-foreman-agent-coder) a codified part of the LLMKube release process, instead of a manual, easy-to-forget step in a separate repo.
Problem Statement
The in-cluster foreman-agent runs the coder image, which is built in the separate defilantech/llmkube-runtimes repo. That image pins the release it was built from via ARG LLMKUBE_REF in coder/Dockerfile, and is published by pushing a coder-v<version> tag. Today, after every LLMKube release, someone must manually (a) bump ARG LLMKUBE_REF to the new tag and (b) push a matching coder-v<version> tag in llmkube-runtimes.
Because this step lives in another repo and is not part of the release checklist, it is easy to forget or to let lag. When it lags, deploying the new release fails: the in-cluster foreman-agent hits ImagePullBackOff on a coder image tag that was never published (this happened on the 0.8.23 rollout).
As a maintainer cutting a release, I want the coder image for that version to be built and published automatically (or via a single documented step), so a fresh deploy of the release never lands on a missing coder image.
Proposed Solution
Preferred: on LLMKube release publish, trigger the llmkube-runtimes coder build for the new tag automatically. Options:
- Cross-repo dispatch (recommended): the LLMKube release workflow sends a repository_dispatch (or gh workflow run) to llmkube-runtimes with the new v<version>; a workflow there bumps ARG LLMKUBE_REF, commits, and pushes coder-v<version> (which its existing build-coder workflow already turns into the published image). Requires a cross-repo token.
- release-please post-release hook: wire the bump into the release automation that already cuts the tag.
- Documented release-checklist step (minimum): add a RELEASING.md entry (2-step: bump ARG LLMKUBE_REF, push coder-v<version>), and/or a release-workflow check that fails/warns if coder-v<version> does not exist in llmkube-runtimes after a release.
At minimum, ship option 3 (the documented checklist step and/or check) so the step is never silently skipped; option 1 (cross-repo dispatch) is the real fix.
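The bump that the llmkube-runtimes workflow would perform in option 1 can be sketched in a few lines of Python. This is a minimal sketch, not the actual workflow: it assumes the pin in coder/Dockerfile is a single line of the form ARG LLMKUBE_REF=<ref>, that release tags look like v<major>.<minor>.<patch>, and the function names (coder_tag, bump_llmkube_ref) and the v1.2.3 value are hypothetical.

```python
import re

# Assumption: coder/Dockerfile contains exactly one line `ARG LLMKUBE_REF=<ref>`.
_ARG_LINE = re.compile(r"^(ARG[ \t]+LLMKUBE_REF=).*$", re.MULTILINE)


def coder_tag(release_tag: str) -> str:
    """Map a release tag such as 'v1.2.3' to the coder tag 'coder-v1.2.3'."""
    # Assumption: release tags are plain v<major>.<minor>.<patch>.
    if not re.fullmatch(r"v\d+\.\d+\.\d+", release_tag):
        raise ValueError(f"unexpected release tag: {release_tag!r}")
    return f"coder-{release_tag}"


def bump_llmkube_ref(dockerfile_text: str, release_tag: str) -> str:
    """Return the Dockerfile text with ARG LLMKUBE_REF pinned to release_tag."""
    new_text, count = _ARG_LINE.subn(
        lambda m: m.group(1) + release_tag, dockerfile_text
    )
    # Refuse to guess if the pin is missing or duplicated.
    if count != 1:
        raise ValueError(f"expected exactly one ARG LLMKUBE_REF line, found {count}")
    return new_text


if __name__ == "__main__":
    before = "FROM scratch\nARG LLMKUBE_REF=v1.2.2\n"
    print(bump_llmkube_ref(before, "v1.2.3"), end="")
    print(coder_tag("v1.2.3"))
```

Failing loudly when the ARG line is missing or duplicated keeps the automation from pushing a coder-v<version> tag whose image was built from a stale ref.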
Alternatives Considered
Continuing to do it by hand each release (current state, error-prone). Folding the coder image build into the LLMKube repo itself (rejected: keeping the toolchain/runtime image in llmkube-runtimes keeps the operator repo air-gap-clean and its build fast).
Additional Context
- Coder image repo: defilantech/llmkube-runtimes, coder/Dockerfile (ARG LLMKUBE_REF), tag pattern coder-v<version>, workflow build-coder.
- Related failure mode: in-cluster foreman-agent ImagePullBackOff on a coder tag that was never published (0.8.23).
- The llmkubelab Ansible deploy pins the in-cluster agent image tag to the release version, so the coder image must exist at :<version> before a deploy.
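A release-workflow or pre-deploy check for the missing tag could look roughly like the following. This is a sketch under stated assumptions: it checks only that the coder-v<version> git tag exists in llmkube-runtimes (a proxy, since it does not prove the image itself was pushed), it assumes the repo is reachable at the URL passed in, and the function names are hypothetical. It relies on git ls-remote --tags, which prints one "<sha><TAB><ref>" line per matching ref.

```python
import subprocess


def tag_in_ls_remote(ls_remote_output: str, tag: str) -> bool:
    """Return True if `git ls-remote --tags` output lists the given tag."""
    wanted = f"refs/tags/{tag}"
    for line in ls_remote_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue
        ref = parts[1]
        # Annotated tags also appear as a peeled entry ending in '^{}'.
        if ref.endswith("^{}"):
            ref = ref[: -len("^{}")]
        if ref == wanted:
            return True
    return False


def coder_tag_exists(version: str, remote_url: str) -> bool:
    """Query the remote for coder-v<version>; remote_url is supplied by the caller."""
    tag = f"coder-v{version}"
    result = subprocess.run(
        ["git", "ls-remote", "--tags", remote_url, f"refs/tags/{tag}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return tag_in_ls_remote(result.stdout, tag)
```

A release workflow could call coder_tag_exists with the new version and fail (or warn) when it returns False, which surfaces the gap before a deploy reaches ImagePullBackOff.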
Priority
Willingness to Contribute