feat(ext-api): OrphanTempFileCleanupHook — boot sweep of orphan gzip-chunk-*.tmp (#1296)#1309

Merged: zbnerd merged 7 commits into develop from feature/ext-api-orphan-tmp-cleanup-1296 on Jun 19, 2026

@zbnerd zbnerd commented Jun 19, 2026

Summary

  • Adds OrphanTempFileCleanupHook (Spring ApplicationRunner) that deletes orphan gzip-chunk-*.tmp files older than 1h from java.io.tmpdir on boot.
  • Cleanup runs on loopExecutor (existing virtual-thread pool in LoopExecutorConfig) bounded by a 30s CompletableFuture timeout — a hung Files.list() (NFS) cannot block boot indefinitely.
  • Wrapped in LogicExecutor.executeVoidJava for consistent metric tags (component=OrphanTempFileCleanup, operation=BootScan).
  • Fail-soft per-file delete with INFO summary log.
  • Best-effort: a Files.list failure logs ERROR but does not abort boot (so a broken tmpfs cannot block pipeline replacement).
  • 7 unit tests via @TempDir + injected Clock + Executor.
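The sweep described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class name `OrphanSweepSketch`, the `Result` record, and the meaning of `scanned` (every file matching the glob) are assumptions, and the `LogicExecutor` wrapping, logging, and Spring wiring are omitted.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the boot sweep; names are illustrative only.
public final class OrphanSweepSketch {

    /** Counters mirroring the fields of the summary log line. */
    public record Result(int scanned, int deleted, long bytesFreed, int failed) {}

    /**
     * Deletes gzip-chunk-*.tmp files in dir whose mtime is older than maxAge.
     * A failure to list the directory propagates to the caller; a failure on
     * a single file is counted and the sweep continues (fail-soft).
     */
    public static Result sweep(Path dir, Clock clock, Duration maxAge) throws IOException {
        Instant cutoff = clock.instant().minus(maxAge);
        int scanned = 0;
        int deleted = 0;
        int failed = 0;
        long bytesFreed = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "gzip-chunk-*.tmp")) {
            for (Path file : stream) {
                scanned++; // assumption: "scanned" counts every glob match
                try {
                    if (!Files.isRegularFile(file)) {
                        continue; // skip directories and other non-regular entries
                    }
                    if (!Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)) {
                        continue; // younger than the cutoff: may belong to an active writer
                    }
                    long size = Files.size(file);
                    Files.delete(file);
                    deleted++;
                    bytesFreed += size;
                } catch (IOException e) {
                    failed++; // one bad file must not stop the rest of the sweep
                }
            }
        }
        return new Result(scanned, deleted, bytesFreed, failed);
    }
}
```

Passing the directory and `Clock` as parameters is what makes this shape testable with a temporary directory and a fixed clock; a caller at boot would presumably pass `Path.of(System.getProperty("java.io.tmpdir"))`, `Clock.systemUTC()`, and `Duration.ofHours(1)`.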

Why

Disk leak observed 2026-06-16: 293 orphan files, 2.4 GB, mtime 1–7 days. /tmp is tmpfs sized at RAM/2 (~30 GB). At roughly 8 MB per chunk (2.4 GB / 293 files), the quoted 240 MB/day leak rate corresponds to about 30 orphaned chunks per day. This hook self-heals on every unclean reboot.

First boot on this branch: scanned=139 deleted=139 bytes_freed=29616 — real orphans cleaned.

Design notes

  • 1h cutoff: active writers' temp files are always under 1h old, because the cutoff leaves a 6× safety margin over the 10-min S3 transfer manager timeout.
  • 30s deadline: bounds boot time even if /tmp hangs (NFS). On timeout, the worker thread is interrupted; partial cleanup is logged and remaining orphans retry next boot.
  • java.io.tmpdir (not /tmp) keeps the hook portable and test-injectable.
  • loopExecutor chosen over Spring Boot's default applicationTaskExecutor because TaskExecutionAutoConfiguration is suppressed when any Executor bean exists — LoopExecutorConfig.loopExecutor (a virtual-thread AsyncTaskExecutor) satisfies the same role.
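The 30s deadline can be illustrated with the sketch below. It makes several assumptions: `BoundedBootTaskSketch`, `runBounded`, and `Outcome` are hypothetical names, and a plain `ExecutorService` stands in for `loopExecutor`. It also swaps the PR's `CompletableFuture` for a plain `Future` from `ExecutorService.submit`, because `Future.cancel(true)` on such a future delivers an interrupt to the worker, whereas `CompletableFuture.cancel(true)` is documented not to interrupt the running thread. How the merged hook arranges the interrupt is not shown in this description.

```java
import java.time.Duration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a deadline-bounded boot task; names are illustrative only.
public final class BoundedBootTaskSketch {

    public enum Outcome { COMPLETED, TIMED_OUT, FAILED }

    /**
     * Runs task on executor and waits at most deadline for it to finish.
     * On timeout the worker is interrupted and the caller continues, so a
     * hung directory listing cannot hold up startup indefinitely.
     */
    public static Outcome runBounded(ExecutorService executor, Runnable task, Duration deadline) {
        Future<?> future = executor.submit(task);
        try {
            future.get(deadline.toMillis(), TimeUnit.MILLISECONDS);
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread if it is still running
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return Outcome.FAILED; // best-effort: the task failed, but boot is not aborted
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Outcome.FAILED;
        }
    }
}
```

An interrupt only helps if the blocked call responds to it. Whether a stalled NFS listing does depends on the platform, so leftover orphans retrying on the next boot remains the fallback either way.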

Out of scope

  • Other snapshot temp files (urgent-chunk, manifest tmp)
  • MinIO / S3 partial-upload cleanup
  • SIGTERM-side cleanup (a JVM shutdown hook is a possible follow-up)
  • Scheduled periodic cleanup
  • Multi-instance coordination

Verification

  • ./gradlew :module-external-api:test --tests OrphanTempFileCleanupHookTest → 7/7 pass
  • ./gradlew :module-external-api:compileKotlin compileJava --continue → success
  • bootRun (port 8181) → [OrphanTempFileCleanup] scanned=139 deleted=139 bytes_freed=29616 failed=0 log present on first boot
  • Synthetic orphan (24 bytes, 2h mtime) staged in /tmp → second boot → deleted=1 bytes_freed=24 + file gone

Closes #1296

🤖 Generated with Claude Code

@zbnerd zbnerd merged commit 74a65d5 into develop Jun 19, 2026
@zbnerd zbnerd deleted the feature/ext-api-orphan-tmp-cleanup-1296 branch June 19, 2026 05:52