feat: chunked image push via OCI compatible push #2760

Open
markphelps wants to merge 19 commits into main from mphelps/chunked-image-push
@markphelps
Contributor

@markphelps markphelps commented Feb 23, 2026

Summary

Replace Docker's monolithic ImagePush with chunked layer uploads, gated behind COG_PUSH_OCI=1. The image is exported from Docker via ImageSave, loaded into memory using go-containerregistry's tarball.ImageFromPath, then each layer is pushed through the existing RegistryClient.WriteLayer chunked upload path — the same infrastructure weight artifacts already use. This bypasses request body limits that block Docker's native push for large layers.

Off by default; set COG_PUSH_OCI=1 to enable. When enabled, any OCI push failure other than context cancellation/timeout or an auth error (401/403) falls back to Docker push, so the regression risk for existing users is minimal.

What changed

Chunked image push

  • ImagePusher tries OCI chunked push first when COG_PUSH_OCI=1, falls back to Docker push
  • Exports image via docker.ImageSave() → tar → in-memory v1.Image → concurrent layer upload
  • shouldFallbackToDocker() blocks fallback on context cancellation/timeout and authentication errors (401/403) — auth errors won't be fixed by Docker fallback, so the original error is surfaced directly
  • Added ImageSave(ctx, imageRef) to the Docker Command interface

Unified push infrastructure

  • Single PushProgress type replaces separate image/weight progress types
  • writeLayerWithProgress() helper deduplicates progress channel boilerplate
  • GetPushConcurrency() (default 4, overridable via COG_PUSH_CONCURRENCY) shared by image layers, weight pushes, and CLI progress
  • BundlePusher.pushWeights() now has a concurrency limit (was unlimited)

Progress rendering

  • Replaced the mpb progress bar library with Docker's jsonmessage.DisplayJSONMessagesStream — the same rendering code used by docker push
  • Progress output now matches docker push exactly: layer ID, status, and progress bar on each line
  • Handles terminal resizing correctly: each line is erased and rewritten individually (ESC[2K + per-line cursor up/down), rather than relying on a bulk cursor-up count that desyncs when lines wrap after a resize
  • Terminal width is re-queried via ioctl(TIOCGWINSZ) on every render, so progress bars adapt dynamically
  • progressWriter adapter in pkg/cli/progress.go converts PushProgress callbacks into JSON-encoded JSONMessage structs fed to DisplayJSONMessagesStream via an io.Pipe
  • Retry messages are only logged via console.Warnf in non-TTY mode — in TTY mode the progress writer handles display inline, avoiding duplicate output
  • Removed mpb dependency entirely

HTTP/1.1 transport for chunked uploads

  • The registry client's chunked upload transport now forces HTTP/1.1 instead of allowing HTTP/2 negotiation
  • Problem: Go's default http.Transport negotiates HTTP/2 via TLS ALPN. When pushing large layers through certain CDN/proxy edges, the edge can send RST_STREAM INTERNAL_ERROR on subsequent PATCH chunks — killing the upload before it reaches the origin.
  • Fix: http1OnlyTransport() clones http.DefaultTransport, disables ForceAttemptHTTP2, and sets NextProtos: ["http/1.1"] in the TLS config. Each PATCH gets a clean request/response cycle without stream multiplexing issues.
  • HTTP/2 stream errors (RST_STREAM) are also now recognized by isRetryableError() as a belt-and-suspenders measure, so they trigger the existing retry-from-scratch logic in WriteLayer.

Retry and resource management

  • Retry jitter is now properly randomized (rand.Float64()) to avoid thundering herd when multiple clients retry simultaneously
  • Chunk buffers (up to 95 MB each) are pooled via sync.Pool to reduce memory pressure when pushing multiple layers concurrently

Server-driven chunk sizing (OCI-Chunk-Min/Max-Length)

  • initiateUpload() now parses OCI-Chunk-Min-Length and OCI-Chunk-Max-Length headers from the registry's 202 response
  • The server-advertised maximum always takes precedence over client defaults — chunk size is set to max - 64KB margin to stay safely under the limit
  • The result is clamped to be at least the server-advertised minimum
  • COG_PUSH_DEFAULT_CHUNK_SIZE is only used as a fallback when the registry doesn't advertise limits
  • New uploadSession struct carries location + chunk constraints from initiation through to upload

Registry config

  • Extracted getDefaultChunkSize() and getMultipartThreshold() env var helpers into pkg/registry/config.go
  • Reduced default chunk size from 256 MB to 95 MB to stay under common CDN/proxy request body limits (used only when registry doesn't advertise OCI-Chunk-Max-Length)

Cleanup

  • Deleted pkg/oci/ package — OCI image loading inlined into ImagePusher
  • Deleted tools/uploader/ — unused S3 multipart uploader (363 lines, zero imports)
  • Removed Pusher interface, OCIImagePusher, pushImageWithFallback() — consolidated into single ImagePusher
  • Unexported DefaultFactory → defaultFactory, NewImagePusher → newImagePusher

Environment variables

| Variable | Default | Description |
|---|---|---|
| COG_PUSH_OCI | unset (off) | Set to 1 to enable OCI chunked image push |
| COG_PUSH_CONCURRENCY | 4 | Max concurrent layer/weight uploads |
| COG_PUSH_DEFAULT_CHUNK_SIZE | 99614720 (95 MiB) | Fallback chunk size when registry doesn't advertise OCI-Chunk-Max-Length |
| COG_PUSH_MULTIPART_THRESHOLD | 52428800 (50 MiB) | Blobs above this size use chunked upload |
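To make the interaction between the two size defaults concrete, here is a small worked example. It assumes the registry advertises no chunk limits, so the 95 MiB fallback applies, and `chunkCount` is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

const (
	mib = 1024 * 1024

	multipartThreshold int64 = 50 * mib // COG_PUSH_MULTIPART_THRESHOLD default
	chunkSize          int64 = 95 * mib // COG_PUSH_DEFAULT_CHUNK_SIZE default
)

// chunkCount returns 0 for blobs at or below the threshold (sent without
// chunking) and the number of chunks otherwise, rounding up.
func chunkCount(blobSize int64) int64 {
	if blobSize <= multipartThreshold {
		return 0
	}
	return (blobSize + chunkSize - 1) / chunkSize
}

func main() {
	for _, size := range []int64{40 * mib, 60 * mib, 96 * mib, 1024 * mib} {
		fmt.Printf("%4d MiB -> %d chunk(s)\n", size/mib, chunkCount(size))
	}
}
```

With these defaults, any blob between 50 MiB and 95 MiB takes the chunked path but is sent as a single chunk, while a 96 MiB blob needs two.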

@markphelps markphelps force-pushed the mphelps/chunked-image-push branch 2 times, most recently from 8b01c53 to 88ff363 on February 23, 2026 20:45

Replace Docker's monolithic ImagePush with a chunked push path for
container image layers. Images are exported from the Docker daemon to
OCI layout via ImageSave, then pushed through the registry client's
existing chunked upload infrastructure (WriteLayer with 256MB chunks).

This bypasses the ~500MB Cloudflare Workers request body limit that
blocks Docker's native push for large layers.

Key changes:
- Add OCIImagePusher to pkg/registry/ with concurrent layer uploads
- Export images from Docker daemon to OCI layout via ImageSave + tarball
- Integrate into Resolver.Push and BundlePusher with Docker push fallback
- Add ImageSave method to command.Command interface
- Delete unused tools/uploader/ S3 multipart code (363 lines)

…pkg/model

Move OCI layout utilities to pkg/oci/, extract registry transport config
(chunk size, multipart threshold env vars) to pkg/registry/config.go, and
relocate OCIImagePusher to pkg/model/ alongside ImagePusher and WeightPusher.

- pkg/oci/: pure OCI format utilities (Docker tar <-> OCI layout), no registry deps
- pkg/registry/config.go: configurable chunk size and multipart threshold
- pkg/model/oci_image_pusher.go: push orchestration with shared pushImageWithFallback()
- Deduplicate fallback logic between resolver.go and pusher.go
- Add error discrimination: no fallback on auth errors or context cancellation
- Create OCIImagePusher once in NewResolver, not per-push call

…weight pushers

- Unify ImagePushProgress and WeightPushProgress into shared PushProgress type
- Extract writeLayerWithProgress() helper to deduplicate progress channel
  boilerplate between OCIImagePusher and WeightPusher
- Unify push concurrency: both image layer pushes and weight pushes use
  GetPushConcurrency() (default 4, overridable via COG_PUSH_CONCURRENCY)
- Fix BundlePusher.pushWeights() which had no concurrency limit (launched
  all goroutines at once); now uses errgroup.SetLimit
- Implement auth error detection in shouldFallbackToDocker() to match its
  documented behavior (don't fall back on UNAUTHORIZED/DENIED errors)

String-based error detection is fragile. Fall back to Docker push on any
error except context cancellation/timeout.

Replace ~200 lines of custom ANSI escape progress rendering with the mpb
(multi-progress-bar) library, which was already a dependency but unused.

mpb handles TTY detection, cursor management, concurrent bar updates, and
size formatting natively. Retry status is shown via a dynamic decorator.

…etTotal completion

When bars are created with total > 0, mpb sets triggerComplete=true
internally. This causes SetTotal(n, true) to early-return without
triggering completion, so bars never finish and p.Wait() deadlocks.

Creating bars with total=0 leaves triggerComplete=false, allowing
explicit completion via SetTotal(current, true) after push finishes.
The real total is still set dynamically via ProgressFn callbacks.

Merge OCIImagePusher (OCI chunked push) and the old ImagePusher (Docker
push) into a single ImagePusher type that tries OCI first and falls back
to Docker push on non-fatal errors.

- ImagePusher.Push() handles OCI→Docker fallback internally
- Delete OCIImagePusher type and oci_image_pusher.go
- BundlePusher takes *ImagePusher directly instead of separate oci/docker pushers
- Resolver stores single imagePusher field instead of ociPusher
- Remove dead Pusher interface
- Consolidate tests into image_pusher_test.go

ImagePusher now calls p.docker.ImageSave() directly instead of going
through the oci.ImageSaveFunc indirection. The OCI layout export logic
is inlined into ImagePusher.ociPush(). The pkg/oci package is deleted
entirely since it had no other consumers.

OCI push is now opt-in rather than always-on when a registry client
is present. Requires COG_PUSH_OCI=1 to activate.
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>

- Add dynamic mpb progress bars for per-layer upload progress during OCI push
- Wire ImageProgressFn through PushOptions → Resolver → ImagePusher
- Force HTTP/1.1 for registry chunked uploads to avoid HTTP/2 RST_STREAM errors
- Add HTTP/2 stream errors to isRetryableError for retry resilience
- Reduce default chunk size from 256MB to 95MB to stay under CDN body limits

@markphelps markphelps force-pushed the mphelps/chunked-image-push branch from 9930c18 to cd606d8 on February 24, 2026 17:47
@markphelps markphelps marked this pull request as ready for review February 24, 2026 19:01
@markphelps markphelps requested a review from a team as a code owner February 24, 2026 19:01
@markphelps markphelps marked this pull request as draft February 25, 2026 16:54

Parse OCI-Chunk-Min-Length and OCI-Chunk-Max-Length headers from the
registry's upload initiation response (POST /v2/.../blobs/uploads/).
The server-advertised maximum always takes precedence over client
defaults, and the result is clamped to be at least the server minimum.

Rename COG_PUSH_CHUNK_SIZE to COG_PUSH_DEFAULT_CHUNK_SIZE to clarify
that it is only a fallback for registries that don't advertise limits.

Replace the mpb progress bar library with Docker's jsonmessage rendering
(the same code used by `docker push`) for OCI layer and weight upload
progress. This fixes terminal corruption when the terminal is resized
during a push.

Root cause: mpb writes all bars then uses a bulk cursor-up (CUU N) to
reposition. When the terminal shrinks, previously rendered lines wrap to
occupy more visual lines, but the cursor-up count stays at the logical
count, leaving ghost copies of progress bars on screen.

Docker's jsonmessage avoids this by erasing and rewriting each line
individually (ESC[2K + per-line cursor up/down), and re-querying
terminal width on every render via ioctl(TIOCGWINSZ).

Also removes the mpb dependency entirely from go.mod.

…back

Strip HTML response bodies from transport errors (e.g., Cloudflare 413
pages) before displaying to user. Add OnFallback callback to close the
progress writer before Docker push starts, preventing stale OCI progress
bars from lingering above Docker's output.

@markphelps
Contributor Author

Latest: sanitize error output and clear progress on Docker fallback

Two fixes for terminal output quality when OCI push fails and falls back to Docker push:

  1. HTML response bodies no longer leak into error messages — When a registry/CDN returns an HTML error page (e.g., Cloudflare 413), transport.Error includes the entire raw body in its Error() string. Added sanitizeError() that extracts just the HTTP status code and text (e.g., HTTP 413 Request Entity Too Large) instead of dumping a full HTML page.

  2. OCI progress bars are cleared before Docker fallback starts — Added OnFallback callback that closes the progressWriter before Docker push begins its own jsonmessage output. Without this, stale OCI progress lines remained on screen above Docker's progress bars.

Both fixes are covered by new tests (TestSanitizeError, TestImagePusher_OnFallback).

@markphelps markphelps marked this pull request as ready for review February 25, 2026 20:27
mfainberg-cf previously approved these changes Feb 25, 2026
DefaultPushConcurrency = 4

// envPushConcurrency is the environment variable that overrides DefaultPushConcurrency.
envPushConcurrency = "COG_PUSH_CONCURRENCY"
Contributor

Thanks for having this!

Comment on lines +11 to +18
DefaultMultipartThreshold = 50 * 1024 * 1024 // 50 MB

// DefaultChunkSize is the size (in bytes) of each chunk in a multipart upload.
// This is used as a fallback when the registry does not advertise chunk size
// limits via OCI-Chunk-Min-Length / OCI-Chunk-Max-Length headers.
// 95 MB stays under common CDN/proxy request body limits while still being
// large enough to reduce HTTP round-trips for multi-GB files.
DefaultChunkSize = 95 * 1024 * 1024 // 95 MB
Contributor

Shouldn't these be inverted? Typically we want the threshold to be higher than the chunk size by a comfortable margin, which prevents MPU for blobs only slightly larger than N. It's fine as is, but with the defaults we're going to MPU with a single chunk.

Comment on lines +356 to +362
func http1OnlyTransport() *http.Transport {
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.TLSClientConfig = tlsConfigHTTP1Only(t.TLSClientConfig)
	// ForceAttemptHTTP2 is true by default on cloned transports; disable it.
	t.ForceAttemptHTTP2 = false
	return t
}
Contributor

I hate how broken HTTP2 is. If we move to 1.26 we have some additional knobs with adjusting new connections per stream. Additionally, we should really consider if we want HTTP2 at all vs multiple H1 connections only. High throughput http2 often has issues with the internal muxing of the streams on a single connection. This isn't tiny files a browser is downloading.

Contributor Author

right, head of line blocking and all that. good point. we can switch to http1 always

@bfirsh
Member

bfirsh commented Feb 25, 2026

This is incredibly exciting.

- Remove unused OCI layout directory creation that doubled disk I/O (C1)
- Randomize retry jitter to avoid thundering herd (W2)
- Skip Docker fallback on 401/403 auth errors since they'd fail identically (W3)
- Pool chunk buffers via sync.Pool to reduce memory pressure (W4)
- Suppress duplicate retry log messages in TTY mode (S5)
@markphelps markphelps changed the title from "feat: chunked image push via OCI layout" to "feat: chunked image push via OCI registry API" on Feb 25, 2026
@markphelps markphelps changed the title from "feat: chunked image push via OCI registry API" to "feat: chunked image push via OCI compatible push" on Feb 25, 2026
Member

@michaeldwan michaeldwan left a comment

Strong overall, but a couple of design quirks that would be easy now to correct so the spirit of the OCI artifact & model resolver refactor isn't lost.

// which handles terminal resizing correctly: each line is erased and rewritten
// individually (ESC[2K + cursor up/down per line), rather than relying on a
// bulk cursor-up count that can desync when lines wrap after a terminal resize.
type progressWriter struct {
Member

This is really a docker thing and I wonder if the type should live in the docker package. we'll soon need a standalone progress reporter for non-docker things, like fetching/processing weights, which feels more appropriate for a CLI UI concern.


-const maxConcurrency = 4
-sem := make(chan struct{}, maxConcurrency)
+sem := make(chan struct{}, model.GetPushConcurrency())
Member

This should probably be an error group with a concurrent limit set

// canOCIPush returns true if OCI chunked push is enabled.
// Requires COG_PUSH_OCI=1 and a registry client.
func (p *ImagePusher) canOCIPush() bool {
return os.Getenv("COG_PUSH_OCI") == "1" && p.registry != nil
Member

What would cause the registry to be nil? curious if we could get into a place where a wonky code path was expecting to push a model bundle but failed because we didn't set a registry so a standard push happened instead, which is not a model bundle.


// ociPush exports the image from Docker daemon as a tar, then pushes all layers,
// config, and manifest to the registry using chunked uploads.
func (p *ImagePusher) ociPush(ctx context.Context, imageRef string, opt ImagePushOptions) error {
Member

this is a regression. the pusher originally took a fully resolved Model type, which already has a parsed ref and all that. going back to a string sidesteps that recent improvement and opens the door to issues like the ref we thought we were pushing resolved to something else immediately before hitting docker. basically the only place we're dealing with a string reference is in resolve, the returned Model is either fully resolved and correct, or not, and we use that everywhere.

func NewBundlePusher(imagePusher *ImagePusher, reg registry.Client) *BundlePusher {
return &BundlePusher{
-imagePusher: NewImagePusher(docker),
+imagePusher: imagePusher,
Member

This is exposing some of the leaky design recently replaced. Why are we creating a weight pusher in this function but the image pusher in another? Part of the recent refactor was to unify the way we create these things so there's only one way to do it. ie given docker and registry clients, we can create a BundlePusher or ImagePusher and both satisfy the interface the CLI/model package operates on. nothing knows or cares what we're dealing with, only the code that determined which pusher to return. for example, we're going to probably add the json schema file as another bundle artifact, which might force knowledge of that (creation and consumption) onto the CLI and all this other code.

results := make(chan indexedResult, len(weights))
-var wg sync.WaitGroup
+g, ctx := errgroup.WithContext(ctx)
+g.SetLimit(GetPushConcurrency())
Member

nice

Comment on lines +138 to +144
img, err := tarball.ImageFromPath(tmpTar.Name(), &tag)
if err != nil {
return fmt.Errorf("load image from tar: %w", err)
}

return p.pushImage(ctx, imageRef, img, opt)
}
Contributor Author

need to refactor this so we don't load the entire image into memory to push


4 participants