feat: chunked image push via OCI compatible push #2760
markphelps wants to merge 19 commits into main from
Conversation
8b01c53 to 88ff363
Replace Docker's monolithic ImagePush with a chunked push path for container image layers. Images are exported from the Docker daemon to OCI layout via ImageSave, then pushed through the registry client's existing chunked upload infrastructure (WriteLayer with 256MB chunks). This bypasses the ~500MB Cloudflare Workers request body limit that blocks Docker's native push for large layers.

Key changes:
- Add OCIImagePusher to pkg/registry/ with concurrent layer uploads
- Export images from Docker daemon to OCI layout via ImageSave + tarball
- Integrate into Resolver.Push and BundlePusher with Docker push fallback
- Add ImageSave method to command.Command interface
- Delete unused tools/uploader/ S3 multipart code (363 lines)
…pkg/model

Move OCI layout utilities to pkg/oci/, extract registry transport config (chunk size, multipart threshold env vars) to pkg/registry/config.go, and relocate OCIImagePusher to pkg/model/ alongside ImagePusher and WeightPusher.

- pkg/oci/: pure OCI format utilities (Docker tar <-> OCI layout), no registry deps
- pkg/registry/config.go: configurable chunk size and multipart threshold
- pkg/model/oci_image_pusher.go: push orchestration with shared pushImageWithFallback()
- Deduplicate fallback logic between resolver.go and pusher.go
- Add error discrimination: no fallback on auth errors or context cancellation
- Create OCIImagePusher once in NewResolver, not per-push call
…weight pushers

- Unify ImagePushProgress and WeightPushProgress into shared PushProgress type
- Extract writeLayerWithProgress() helper to deduplicate progress channel boilerplate between OCIImagePusher and WeightPusher
- Unify push concurrency: both image layer pushes and weight pushes use GetPushConcurrency() (default 4, overridable via COG_PUSH_CONCURRENCY)
- Fix BundlePusher.pushWeights(), which had no concurrency limit (it launched all goroutines at once); now uses errgroup.SetLimit
- Implement auth error detection in shouldFallbackToDocker() to match its documented behavior (don't fall back on UNAUTHORIZED/DENIED errors)
String-based error detection is fragile. Fall back to Docker push on any error except context cancellation/timeout.
Replace ~200 lines of custom ANSI escape progress rendering with the mpb (multi-progress-bar) library, which was already a dependency but unused. mpb handles TTY detection, cursor management, concurrent bar updates, and size formatting natively. Retry status is shown via a dynamic decorator.
…etTotal completion

When bars are created with total > 0, mpb sets triggerComplete=true internally. This causes SetTotal(n, true) to early-return without triggering completion, so bars never finish and p.Wait() deadlocks. Creating bars with total=0 leaves triggerComplete=false, allowing explicit completion via SetTotal(current, true) after the push finishes. The real total is still set dynamically via ProgressFn callbacks.
Merge OCIImagePusher (OCI chunked push) and the old ImagePusher (Docker push) into a single ImagePusher type that tries OCI first and falls back to Docker push on non-fatal errors.

- ImagePusher.Push() handles OCI→Docker fallback internally
- Delete OCIImagePusher type and oci_image_pusher.go
- BundlePusher takes *ImagePusher directly instead of separate oci/docker pushers
- Resolver stores single imagePusher field instead of ociPusher
- Remove dead Pusher interface
- Consolidate tests into image_pusher_test.go
ImagePusher now calls p.docker.ImageSave() directly instead of going through the oci.ImageSaveFunc indirection. The OCI layout export logic is inlined into ImagePusher.ociPush(). The pkg/oci package is deleted entirely since it had no other consumers.
OCI push is now opt-in rather than always-on when a registry client is present. Requires COG_PUSH_OCI=1 to activate.
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
- Add dynamic mpb progress bars for per-layer upload progress during OCI push
- Wire ImageProgressFn through PushOptions → Resolver → ImagePusher
- Force HTTP/1.1 for registry chunked uploads to avoid HTTP/2 RST_STREAM errors
- Add HTTP/2 stream errors to isRetryableError for retry resilience
- Reduce default chunk size from 256MB to 95MB to stay under CDN body limits
9930c18 to cd606d8
Parse OCI-Chunk-Min-Length and OCI-Chunk-Max-Length headers from the registry's upload initiation response (POST /v2/.../blobs/uploads/). The server-advertised maximum always takes precedence over client defaults, and the result is clamped to be at least the server minimum. Rename COG_PUSH_CHUNK_SIZE to COG_PUSH_DEFAULT_CHUNK_SIZE to clarify that it is only a fallback for registries that don't advertise limits.
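The sizing rule described here can be sketched as a pure function (a sketch under the stated precedence rules; the real code parses the headers off the 202 response first):

```go
package main

import "fmt"

// effectiveChunkSize applies the precedence described in the commit: the
// server-advertised maximum (when present) overrides the client default,
// and the result is then clamped up to the server-advertised minimum.
// Sizes are in bytes; 0 means "not advertised".
func effectiveChunkSize(serverMin, serverMax, clientDefault int64) int64 {
	size := clientDefault
	if serverMax > 0 {
		size = serverMax // server max always takes precedence
	}
	if serverMin > 0 && size < serverMin {
		size = serverMin // never go below the server minimum
	}
	return size
}

func main() {
	const mib = 1024 * 1024
	fmt.Println(effectiveChunkSize(0, 0, 95*mib) / mib)      // 95 (no server limits)
	fmt.Println(effectiveChunkSize(0, 64*mib, 95*mib) / mib) // 64 (server max wins)
	fmt.Println(effectiveChunkSize(5*mib, 0, 1*mib) / mib)   // 5  (clamped to min)
}
```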
Replace the mpb progress bar library with Docker's jsonmessage rendering (the same code used by `docker push`) for OCI layer and weight upload progress. This fixes terminal corruption when the terminal is resized during a push.

Root cause: mpb writes all bars then uses a bulk cursor-up (CUU N) to reposition. When the terminal shrinks, previously rendered lines wrap to occupy more visual lines, but the cursor-up count stays at the logical count, leaving ghost copies of progress bars on screen. Docker's jsonmessage avoids this by erasing and rewriting each line individually (ESC[2K + per-line cursor up/down), and re-querying terminal width on every render via ioctl(TIOCGWINSZ).

Also removes the mpb dependency entirely from go.mod.
…back

Strip HTML response bodies from transport errors (e.g., Cloudflare 413 pages) before displaying to the user. Add OnFallback callback to close the progress writer before Docker push starts, preventing stale OCI progress bars from lingering above Docker's output.
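A hypothetical version of the HTML-stripping idea (function name and heuristic are mine, not the PR's): if a transport error body looks like an HTML error page, replace it with a short summary instead of dumping markup at the user.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeErrorBody replaces HTML error pages (e.g. a CDN 413 page) with a
// terse status summary; registry JSON error bodies pass through untouched.
func sanitizeErrorBody(status int, body string) string {
	trimmed := strings.TrimSpace(strings.ToLower(body))
	if strings.HasPrefix(trimmed, "<!doctype html") || strings.HasPrefix(trimmed, "<html") {
		return fmt.Sprintf("HTTP %d (HTML error page omitted)", status)
	}
	return body
}

func main() {
	fmt.Println(sanitizeErrorBody(413, "<html><body>Request Entity Too Large</body></html>"))
	fmt.Println(sanitizeErrorBody(400, "BLOB_UPLOAD_INVALID"))
}
```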
Latest: sanitize error output and clear progress on Docker fallback

Two fixes for terminal output quality when OCI push fails and falls back to Docker push. Both fixes are covered by new tests.
```go
DefaultPushConcurrency = 4

// envPushConcurrency is the environment variable that overrides DefaultPushConcurrency.
envPushConcurrency = "COG_PUSH_CONCURRENCY"
```
Thanks for having this!
```go
DefaultMultipartThreshold = 50 * 1024 * 1024 // 50 MB

// DefaultChunkSize is the size (in bytes) of each chunk in a multipart upload.
// This is used as a fallback when the registry does not advertise chunk size
// limits via OCI-Chunk-Min-Length / OCI-Chunk-Max-Length headers.
// 95 MB stays under common CDN/proxy request body limits while still being
// large enough to reduce HTTP round-trips for multi-GB files.
DefaultChunkSize = 95 * 1024 * 1024 // 95 MB
```
Shouldn't these be inverted? Typically we want the threshold to be higher than the chunk size by no small margin; this prevents MPU for files only slightly larger than N. It's fine as is, but with these defaults we're going to MPU with a single chunk.
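The reviewer's point about threshold vs. chunk size, as arithmetic (a sketch using the defaults from the diff above; `chunkCount` is a hypothetical helper, not repo code):

```go
package main

import "fmt"

const (
	multipartThreshold = 50 * 1024 * 1024 // 50 MB default
	chunkSize          = 95 * 1024 * 1024 // 95 MB default
)

// chunkCount returns how many chunks a multipart upload of size n would use,
// or 0 if n is below the threshold (plain single-request upload).
func chunkCount(n int64) int64 {
	if n < multipartThreshold {
		return 0
	}
	return (n + chunkSize - 1) / chunkSize // ceiling division
}

func main() {
	// A 60 MB file is over the threshold but under the chunk size, so it
	// takes the multipart path with exactly one chunk — the case the
	// reviewer flags as pointless overhead.
	fmt.Println(chunkCount(60 * 1024 * 1024))  // 1
	fmt.Println(chunkCount(40 * 1024 * 1024))  // 0 (plain upload)
	fmt.Println(chunkCount(200 * 1024 * 1024)) // 3
}
```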
```go
func http1OnlyTransport() *http.Transport {
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.TLSClientConfig = tlsConfigHTTP1Only(t.TLSClientConfig)
	// ForceAttemptHTTP2 is true by default on cloned transports; disable it.
	t.ForceAttemptHTTP2 = false
	return t
}
```
I hate how broken HTTP/2 is. If we move to 1.26 we have some additional knobs for adjusting new connections per stream. Additionally, we should really consider whether we want HTTP/2 at all vs. multiple HTTP/1 connections only. High-throughput HTTP/2 often has issues with the internal muxing of the streams on a single connection. This isn't tiny files a browser is downloading.
Right, head-of-line blocking and all that. Good point. We can switch to HTTP/1 always.
This is incredibly exciting.
- Remove unused OCI layout directory creation that doubled disk I/O (C1)
- Randomize retry jitter to avoid thundering herd (W2)
- Skip Docker fallback on 401/403 auth errors since they'd fail identically (W3)
- Pool chunk buffers via sync.Pool to reduce memory pressure (W4)
- Suppress duplicate retry log messages in TTY mode (S5)
michaeldwan left a comment
Strong overall, but there are a couple of design quirks that would be easy to correct now so the spirit of the OCI artifact & model resolver refactor isn't lost.
```go
// which handles terminal resizing correctly: each line is erased and rewritten
// individually (ESC[2K + cursor up/down per line), rather than relying on a
// bulk cursor-up count that can desync when lines wrap after a terminal resize.
type progressWriter struct {
```
This is really a Docker thing, and I wonder if the type should live in the docker package. We'll soon need a standalone progress reporter for non-Docker things, like fetching/processing weights, which feels like more of a CLI UI concern.
```diff
-const maxConcurrency = 4
-sem := make(chan struct{}, maxConcurrency)
+sem := make(chan struct{}, model.GetPushConcurrency())
```
This should probably be an errgroup with a concurrency limit set.
```go
// canOCIPush returns true if OCI chunked push is enabled.
// Requires COG_PUSH_OCI=1 and a registry client.
func (p *ImagePusher) canOCIPush() bool {
	return os.Getenv("COG_PUSH_OCI") == "1" && p.registry != nil
```
What would cause the registry to be nil? Curious if we could get into a place where a wonky code path was expecting to push a model bundle but failed because we didn't set a registry, so a standard push happened instead, which is not a model bundle.
```go
// ociPush exports the image from Docker daemon as a tar, then pushes all layers,
// config, and manifest to the registry using chunked uploads.
func (p *ImagePusher) ociPush(ctx context.Context, imageRef string, opt ImagePushOptions) error {
```
This is a regression. The pusher originally took a fully resolved Model type, which already has a parsed ref and all that. Going back to a string sidesteps that recent improvement and opens the door to issues like the ref we thought we were pushing resolving to something else immediately before hitting Docker. Basically, the only place we're dealing with a string reference is in resolve; the returned Model is either fully resolved and correct, or not, and we use that everywhere.
```diff
 func NewBundlePusher(imagePusher *ImagePusher, reg registry.Client) *BundlePusher {
 	return &BundlePusher{
-		imagePusher: NewImagePusher(docker),
+		imagePusher: imagePusher,
```
This is exposing some of the leaky design recently replaced. Why are we creating a weight pusher in this function but the image pusher in another? Part of the recent refactor was to unify the way we create these things so there's only one way to do it. I.e., given docker and registry clients, we can create a BundlePusher or ImagePusher and both satisfy the interface the CLI/model package operates on. Nothing knows or cares what we're dealing with, only the code that determined which pusher to return. For example, we're probably going to add the JSON schema file as another bundle artifact, which might force knowledge of that (creation and consumption) onto the CLI and all this other code.
```diff
 results := make(chan indexedResult, len(weights))
-var wg sync.WaitGroup
+g, ctx := errgroup.WithContext(ctx)
+g.SetLimit(GetPushConcurrency())
```
```go
img, err := tarball.ImageFromPath(tmpTar.Name(), &tag)
if err != nil {
	return fmt.Errorf("load image from tar: %w", err)
}

return p.pushImage(ctx, imageRef, img, opt)
```
Need to refactor this so we don't load the entire image into memory to push.
Summary

Replace Docker's monolithic `ImagePush` with chunked layer uploads, gated behind `COG_PUSH_OCI=1`. The image is exported from Docker via `ImageSave`, loaded into memory using go-containerregistry's `tarball.ImageFromPath`, then each layer is pushed through the existing `RegistryClient.WriteLayer` chunked upload path — the same infrastructure weight artifacts already use. This bypasses request body limits that block Docker's native push for large layers.

Off by default. Set `COG_PUSH_OCI=1` to enable. Falls back to Docker push on any non-fatal error (except context cancellation/timeout and auth errors), so there's zero regression risk.

What changed
Chunked image push

- `ImagePusher` tries OCI chunked push first when `COG_PUSH_OCI=1`, falls back to Docker push
- `docker.ImageSave()` → tar → in-memory v1.Image → concurrent layer upload
- `shouldFallbackToDocker()` blocks fallback on context cancellation/timeout and authentication errors (401/403) — auth errors won't be fixed by Docker fallback, so the original error is surfaced directly
- Add `ImageSave(ctx, imageRef)` to the Docker `Command` interface

Unified push infrastructure
- Shared `PushProgress` type replaces separate image/weight progress types
- `writeLayerWithProgress()` helper deduplicates progress channel boilerplate
- `GetPushConcurrency()` (default 5, `COG_PUSH_CONCURRENCY`) shared by image layers, weight pushes, and CLI progress
- `BundlePusher.pushWeights()` now has a concurrency limit (was unlimited)

Progress rendering
- Replace `mpb` progress bar library with Docker's `jsonmessage.DisplayJSONMessagesStream` — the same rendering code used by `docker push`
- Matches `docker push` exactly: layer ID, status, and progress bar on each line
- Each line is erased and rewritten individually (`ESC[2K` + per-line cursor up/down), rather than relying on a bulk cursor-up count that desyncs when lines wrap after a resize
- Terminal width is re-queried via `ioctl(TIOCGWINSZ)` on every render, so progress bars adapt dynamically
- `progressWriter` adapter in `pkg/cli/progress.go` converts `PushProgress` callbacks into JSON-encoded `JSONMessage` structs fed to `DisplayJSONMessagesStream` via an `io.Pipe`
- Retry warnings go through `console.Warnf` in non-TTY mode — in TTY mode the progress writer handles display inline, avoiding duplicate output
- Removes the `mpb` dependency entirely

HTTP/1.1 transport for chunked uploads
- Go's default `http.Transport` negotiates HTTP/2 via TLS ALPN. When pushing large layers through certain CDN/proxy edges, the edge can send `RST_STREAM INTERNAL_ERROR` on subsequent PATCH chunks — killing the upload before it reaches the origin.
- `http1OnlyTransport()` clones `http.DefaultTransport`, disables `ForceAttemptHTTP2`, and sets `NextProtos: ["http/1.1"]` in the TLS config. Each PATCH gets a clean request/response cycle without stream multiplexing issues.
- HTTP/2 stream errors (`RST_STREAM`) are also now recognized by `isRetryableError()` as a belt-and-suspenders measure, so they trigger the existing retry-from-scratch logic in `WriteLayer`.

Retry and resource management
- Retry jitter randomized (`rand.Float64()`) to avoid thundering herd when multiple clients retry simultaneously
- Chunk buffers pooled via `sync.Pool` to reduce memory pressure when pushing multiple layers concurrently

Server-driven chunk sizing (OCI-Chunk-Min/Max-Length)
- `initiateUpload()` now parses `OCI-Chunk-Min-Length` and `OCI-Chunk-Max-Length` headers from the registry's 202 response
- The server-advertised maximum is applied with a `max - 64KB` margin to stay safely under the limit
- `COG_PUSH_DEFAULT_CHUNK_SIZE` is only used as a fallback when the registry doesn't advertise limits
- `uploadSession` struct carries location + chunk constraints from initiation through to upload

Registry config
- Extracted `getDefaultChunkSize()` and `getMultipartThreshold()` env var helpers into `pkg/registry/config.go`
- Default chunk size is capped by the server-advertised `OCI-Chunk-Max-Length`

Cleanup
- Deleted `pkg/oci/` package — OCI image loading inlined into `ImagePusher`
- Deleted `tools/uploader/` — unused S3 multipart uploader (363 lines, zero imports)
- Removed dead `Pusher` interface, `OCIImagePusher`, `pushImageWithFallback()` — consolidated into single `ImagePusher`
- `DefaultFactory` → `defaultFactory`, `NewImagePusher` → `newImagePusher`

Environment variables
| Variable | Default | Notes |
| --- | --- | --- |
| `COG_PUSH_OCI` | | `1` to enable OCI chunked image push |
| `COG_PUSH_CONCURRENCY` | `5` | |
| `COG_PUSH_DEFAULT_CHUNK_SIZE` | `99614720` (95 MiB) | Fallback when the registry doesn't advertise `OCI-Chunk-Max-Length` |
| `COG_PUSH_MULTIPART_THRESHOLD` | `52428800` (50 MiB) | |