Bundle cache grows unbounded — no dep dedup, no eviction policy #95

@mgoldsborough

Problem

The bundle cache at ~/.mpak/cache/ accumulates without bound and duplicates vendored Python dependencies across every cached bundle. Real-world example from one developer's machine after a few months of use:

  • Total cache: 3.7 GB
  • _local/: 2.1 GB across 33 entries (local .mcpb installs from mpak install <path>)
  • 16 of those entries are 76 MB each and represent the same bundle rebuilt during dev iteration
  • Each entry's deps/ dominates: pandas (48 MB), numpy (23 MB), cryptography (22 MB), pydantic, fastmcp, etc., copied per-bundle

Two distinct contributors:

  1. Local-rebuild churn. prepareLocalBundle() keys cache entries by a hash of the bundle's absolute path (packages/sdk-typescript/src/mpakSDK.ts:280). Bumping a versioned filename, e.g. dist/mcp-foo-v0.1.0.mcpb → v0.1.1.mcpb, changes the path, so each rebuild produces a new cache entry; the old one is never reclaimed. This is the dominant cause of the 2.1 GB.
  2. Cross-bundle duplication. Every bundle's deps/ is extracted as its own copy. Pandas/numpy/cryptography exist N times for N bundles, even though the wheels are usually byte-identical. Grows linearly with bundle count.
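
To make the churn in item 1 concrete, here is a minimal sketch of path-keyed cache keys. The `fnv1a` and `pathKey` helpers and the example paths are hypothetical stand-ins; the real hashBundlePath may use a different hash algorithm and path normalization.

```typescript
// Hypothetical stand-in for the SDK's path hashing. The real hashBundlePath
// may use another algorithm (for example SHA-256) and a different key length.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash.toString(16);
}

// Path-keyed: the cache directory name depends only on the absolute path,
// never on the bundle's contents.
function pathKey(absolutePath: string): string {
  return fnv1a(absolutePath);
}

// Illustrative paths only.
const rebuilds: string[] = [
  "/home/dev/proj/dist/mcp-foo-v0.1.0.mcpb",
  "/home/dev/proj/dist/mcp-foo-v0.1.1.mcpb",
  "/home/dev/proj/dist/mcp-foo-v0.1.1.mcpb", // reinstall of the same file
];

// Every distinct path maps to its own _local/<hash> directory, so a version
// bump in the filename creates a new entry and orphans the previous one.
const cacheDirs = new Set(rebuilds.map((p) => `_local/${pathKey(p)}`));
console.log(`${rebuilds.length} installs -> ${cacheDirs.size} cache entries`);
```

Reinstalling from an unchanged path reuses its entry; only the renamed build adds a new one, and nothing removes the directory left behind by the previous filename.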

There is no eviction policy, no size cap, and no GC. The only cleanup path is manual rm -rf.

Options considered

| Approach | Fixes churn | Fixes cross-bundle dup | Cost | Notes |
| --- | --- | --- | --- | --- |
| A. Retention policy + size cap | yes | no | trivial (~50 lines) | Per-localPath keep-latest-N in _local/; global LRU/size cap across cacheHome. Stopgap. |
| B. APFS clonefile / Linux reflinks during extract | yes | yes (partial) | small | Identical file contents share blocks at the FS layer. Needs a content-hash index to detect "seen this file before." macOS-only natively (APFS); Linux needs btrfs/xfs reflinks. |
| C. Shared venv (one env, all bundles) | yes | yes | medium | Disqualified: version conflicts across bundles are exactly why deps are vendored. |
| D. Content-addressable storage + hardlinks | yes | yes | significant rewrite | General, correct, prior art (pnpm, Nix, Bazel). Hardlink edge cases (__pycache__, cross-FS). |
| E. Delegate to uv (lockfile in bundle, uv resolves on install) | yes | yes | medium | Changes bundle contract; requires online install or pre-warmed wheel cache. uv already implements CAS+hardlinks for wheels. |
| F. Symlink farms | yes | yes | medium | Python + symlinks + __pycache__ is a known footgun. |
| G. Overlay/FUSE | yes | yes | high | Way too heavy for a single-machine disk cache. |
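
Options B and D both depend on recognizing file contents the cache has already stored. A minimal sketch of that content-hash index follows, under these assumptions: `IncomingFile`, `ExtractAction`, and `planExtract` are hypothetical names, `contentHash` is computed elsewhere (for example a SHA-256 of the file bytes), and the "clone" action stands for whichever mechanism is chosen (clonefile, reflink, or hardlink). The sketch only plans actions and does no I/O.

```typescript
// Hypothetical shapes; none of these exist in the SDK today.
interface IncomingFile {
  destPath: string;    // where the extracted file should land
  contentHash: string; // digest of the file bytes, computed by the caller
}

type ExtractAction =
  | { kind: "copy"; destPath: string }
  | { kind: "clone"; destPath: string; sourcePath: string };

// index maps content hash -> path of a cached file that already has that content.
function planExtract(
  files: IncomingFile[],
  index: Map<string, string>,
): ExtractAction[] {
  const actions: ExtractAction[] = [];
  for (const file of files) {
    const seenAt = index.get(file.contentHash);
    if (seenAt !== undefined) {
      // Seen before: share storage with the existing copy.
      actions.push({ kind: "clone", destPath: file.destPath, sourcePath: seenAt });
    } else {
      // First sighting: write a real copy and remember where it lives.
      actions.push({ kind: "copy", destPath: file.destPath });
      index.set(file.contentHash, file.destPath);
    }
  }
  return actions;
}

// Two bundles vendoring the same file (hashes are placeholders).
const index = new Map<string, string>();
const firstBundle = planExtract(
  [{ destPath: "bundle-a/deps/numpy/core.so", contentHash: "h1" }],
  index,
);
const secondBundle = planExtract(
  [
    { destPath: "bundle-b/deps/numpy/core.so", contentHash: "h1" },
    { destPath: "bundle-b/deps/app.py", contentHash: "h2" },
  ],
  index,
);
console.log(firstBundle, secondBundle);
```

A real index would also have to drop entries whose source path has been evicted, otherwise a later clone would point at a file that no longer exists.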

Recommendation

Phased:

  1. Ship A first (low-risk, kills the dominant cause):

    • In _local/, retain only the most recent extraction per localPath from .mpak-local-meta.json. Evict older entries.
    • Add an opt-in size cap on cacheHome with LRU eviction across both _local/ and named registry caches.
    • Expose mpak cache prune and mpak cache info.
  2. Then evaluate B (FS clones) for cross-bundle dedup. Lowest-effort path to dedup that doesn't require restructuring storage. macOS gets it via clonefile(2); Linux via FICLONE ioctl on btrfs/xfs; ext4 falls back to regular copy.

  3. Reach for D or E only if data justifies it. mpak is a single-machine, single-user disk cache — it does not need the generality CAS protocols provide for multi-host or build-system contexts. Delegating to uv (E) is attractive but changes the bundle contract and deserves its own design discussion.
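
The two phase 1 eviction passes can be sketched as a pure planning function. This is one possible shape, not the SDK's implementation: `CacheEntry`, `EvictionPolicy`, `planEviction`, and the `lastUsedMs` field are hypothetical, and the sketch assumes a single timestamp per entry serves both for "most recent extraction" and for LRU ordering.

```typescript
// Hypothetical entry shape, built by scanning cacheHome.
interface CacheEntry {
  dir: string;        // e.g. "_local/<hash>" or a registry cache directory
  localPath?: string; // set for _local/ entries (from .mpak-local-meta.json)
  sizeBytes: number;
  lastUsedMs: number; // assumed timestamp; higher means more recent
}

interface EvictionPolicy {
  keepPerLocalPath: number; // N most recent entries kept per localPath
  maxTotalBytes?: number;   // opt-in cap; undefined disables the LRU pass
}

// Returns the directories to delete, in eviction order. Performs no I/O.
function planEviction(entries: CacheEntry[], policy: EvictionPolicy): string[] {
  const evict = new Set<string>();

  // Pass 1: per-localPath retention inside _local/.
  const byPath = new Map<string, CacheEntry[]>();
  for (const e of entries) {
    if (e.localPath === undefined) continue;
    const group = byPath.get(e.localPath) ?? [];
    group.push(e);
    byPath.set(e.localPath, group);
  }
  byPath.forEach((group) => {
    group.sort((a, b) => b.lastUsedMs - a.lastUsedMs); // newest first
    group.slice(policy.keepPerLocalPath).forEach((stale) => evict.add(stale.dir));
  });

  // Pass 2: global LRU across all remaining entries until under the cap.
  if (policy.maxTotalBytes !== undefined) {
    const survivors = entries
      .filter((e) => !evict.has(e.dir))
      .sort((a, b) => a.lastUsedMs - b.lastUsedMs); // oldest first
    let total = survivors.reduce((sum, e) => sum + e.sizeBytes, 0);
    for (const e of survivors) {
      if (total <= policy.maxTotalBytes) break;
      evict.add(e.dir);
      total -= e.sizeBytes;
    }
  }
  return Array.from(evict);
}

// Example: two rebuilds of one local bundle plus two registry bundles.
const sample: CacheEntry[] = [
  { dir: "_local/a1", localPath: "/p/foo.mcpb", sizeBytes: 76, lastUsedMs: 1 },
  { dir: "_local/a2", localPath: "/p/foo.mcpb", sizeBytes: 76, lastUsedMs: 2 },
  { dir: "registry/x", sizeBytes: 100, lastUsedMs: 0 },
  { dir: "registry/y", sizeBytes: 50, lastUsedMs: 3 },
];
const retentionOnly = planEviction(sample, { keepPerLocalPath: 1 });
const withCap = planEviction(sample, { keepPerLocalPath: 1, maxTotalBytes: 130 });
console.log(retentionOnly, withCap);
```

Keeping the planner separate from deletion would let mpak cache prune and an automatic post-install pass share the same logic, and it leaves a cache that is under the cap untouched.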

Files

  • packages/sdk-typescript/src/cache.ts: MpakBundleCache, registry-bundle cache layout
  • packages/sdk-typescript/src/mpakSDK.ts:275: prepareLocalBundle, _local/<hash> entry creation
  • packages/sdk-typescript/src/helpers.ts:267: hashBundlePath (path-keyed, not content-keyed)

Acceptance for phase 1

  • _local/ retains at most N entries per localPath (config, default 1)
  • Cache total size capped via LRU eviction (config, default off or generous)
  • mpak cache info reports total size, per-bundle size, oldest entry
  • mpak cache prune runs the eviction passes manually
  • No behavior change when cache is small / under cap
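
As a rough illustration of what mpak cache info might compute, the sketch below aggregates total size, per-bundle size, and the oldest entry from a list of scanned entries. `EntryStat`, `CacheInfo`, `summarizeCache`, and the `createdAtMs` field are hypothetical; the actual report format is not specified here.

```typescript
// Hypothetical per-entry stats gathered by walking the cache directory.
interface EntryStat {
  bundle: string;      // bundle name or localPath the entry belongs to
  sizeBytes: number;
  createdAtMs: number; // assumed creation timestamp
}

interface CacheInfo {
  totalBytes: number;
  perBundle: Record<string, number>; // bundle -> summed size of its entries
  oldest: EntryStat | null;          // null when the cache is empty
}

function summarizeCache(entries: EntryStat[]): CacheInfo {
  const perBundle: Record<string, number> = {};
  let totalBytes = 0;
  let oldest: EntryStat | null = null;
  for (const e of entries) {
    totalBytes += e.sizeBytes;
    perBundle[e.bundle] = (perBundle[e.bundle] ?? 0) + e.sizeBytes;
    if (oldest === null || e.createdAtMs < oldest.createdAtMs) oldest = e;
  }
  return { totalBytes, perBundle, oldest };
}

const info = summarizeCache([
  { bundle: "mcp-foo", sizeBytes: 10, createdAtMs: 5 },
  { bundle: "mcp-foo", sizeBytes: 20, createdAtMs: 3 },
  { bundle: "mcp-bar", sizeBytes: 5, createdAtMs: 9 },
]);
console.log(info);
```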
