CI: llvm-lto.yml docker-pull ENOSPC on the large ci image (pre-existing, intermittent) #73

Description

@avrabe

Split out from #71 (which fixed the zephyr-tests ENOSPC + the west sdk install rate-limit).

Symptom

llvm-lto-build and llvm-lto-test (*) jobs intermittently fail with:

Docker pull failed with exit code 1
... failed to register layer: ... no space left on device

This is pre-existing on main (e.g. runs on bc1bff2, 40b2742 are red) and reproduced on #71's runs.

Root cause

llvm-lto.yml uses the full ghcr.io/zephyrproject-rtos/ci image (larger than ci-base used by zephyr-tests). The image unpacks to /var/lib/docker on the runner's ~14 GB root fs before the container starts, so the /mnt workspace relocation and --personal-access-token fixes from #71 cannot reach it — those address build-time disk + the SDK API rate-limit, not the image pull.
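One way to confirm which filesystem the image layers land on is to compare free space under /var/lib/docker and /mnt. This is a minimal sketch, assuming a Linux runner with bash, df and awk; the helper names nearest_existing and fs_of are hypothetical. It would have to run in a job without job-level container:, since the pull happens before any step executes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the nearest existing ancestor of $1 (the path itself if it exists).
nearest_existing() {
  local p=$1
  while [ ! -e "$p" ] && [ "$p" != "/" ]; do
    p=$(dirname "$p")
  done
  echo "$p"
}

# Print "<mount point> <available KiB>" for the filesystem holding $1.
fs_of() {
  df -Pk "$(nearest_existing "$1")" | awk 'NR==2 {print $6, $4}'
}

# /var/lib/docker is where the image unpacks; /mnt is the workspace from #71.
for path in /var/lib/docker /mnt; do
  echo "$path -> $(fs_of "$path")"
done
```

If both paths report the same mount point, relocating the workspace to /mnt cannot help the pull at all; if they differ, the available figure for /var/lib/docker is the one the image has to fit into.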

Fix options (infra decision)

  1. De-containerize the llvm-lto jobs: drop job-level container:, free host disk first (rm -rf /opt/hostedtoolcache /usr/share/dotnet /usr/local/lib/android …, ~25 GB), then install the toolchain on the host or run the build via docker run. Most robust; biggest change.
  2. Smaller image: switch ci to ci-base (llvm-lto installs its own Rust/clang/west anyway). Smaller pull; risk it lacks a dep llvm-lto needs.
  3. Bigger runner with more disk. Simplest; cost.
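The disk-freeing step in option 1 could look roughly like the sketch below. It is not a tested workflow step: the function names and the DRY_RUN and SUDO variables are hypothetical, and only the three paths named above are listed (the rest of the list is elided in this issue). It defaults to a dry run so it can be tried without deleting anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Paths named in option 1. The original list ends in "…", so this is
# deliberately incomplete; extend it once the full list is decided.
DEFAULT_PATHS=(/opt/hostedtoolcache /usr/share/dotnet /usr/local/lib/android)

# Print available KiB on the filesystem holding $1.
avail_kib() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Remove each given path if it exists.
# DRY_RUN=1 (the default, hypothetical variable) only reports what would go.
# SUDO (hypothetical variable) can be set to "sudo" for root-owned paths.
free_host_disk() {
  local p
  for p in "$@"; do
    [ -e "$p" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would remove $p"
    else
      ${SUDO:-} rm -rf -- "$p"
      echo "removed $p"
    fi
  done
}

echo "available on / before: $(avail_kib /) KiB"
free_host_disk "${DEFAULT_PATHS[@]}"
echo "available on / after:  $(avail_kib /) KiB"
```

On a runner this would be invoked with DRY_RUN=0 SUDO=sudo as the first step of the de-containerized job, before the toolchain install or any docker run, so the image pull sees the reclaimed space.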

Notes

  • Not a test or code bug — pure CI infra.
  • smp_spinlock/SMP jobs are continue-on-error: true (non-blocking) and are a separate known x86_64-SMP-boot issue.

🤖 Generated with Claude Code
