
🤖 fix: stop idle auto-compaction loop on repeated/model-not-found failures#3567

Merged
ammario merged 4 commits into main from fix/idle-compaction-stop-on-repeated-failure on Jun 15, 2026


@ammar-agent

Collaborator

Summary

Stops the hourly idle auto-compaction loop from re-attempting a workspace that keeps failing: it now gives up immediately when an attempt fails with a model-not-found error, and otherwise after two consecutive failures of any kind. Previously, a persistently failing workspace (e.g. one configured with a misspelled or otherwise invalid compaction model) was re-queued every hour indefinitely: a failed compaction neither marks the workspace as compacted nor refreshes its recency, so it remained eligible on every check.

Background

IdleCompactionService re-checks eligibility every hour and re-queues any idle workspace. There was no failure tracking, so a workspace whose compaction always fails would be retried on every cycle with no cap. The most common cause is a compaction model the provider rejects.

A key subtlety: the failure can surface in two different places.

  • Pre-stream failures (invalid model string, model unavailable, provider/runtime issues) come back synchronously as the sendMessage Result that executeIdleCompaction already inspects.
  • Mid-stream failures (the common model_not_found case for a well-formed but non-existent Anthropic/OpenAI model) are not returned by sendMessage — they arrive later as an async "error" stream event and are only persisted transiently to partial.json, so they can't be recovered from chat history on the next check.
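A minimal sketch of how both failure surfaces could feed a single report, assuming a Node EventEmitter stands in for aiService and that its "error" event carries a workspaceId and an errorType string. SendResult, StreamErrorEvent, IdleCompactionReporter, begin, and afterSend are hypothetical names, not the PR's code; only the { success, modelNotFound } outcome shape comes from the PR discussion.

```typescript
import { EventEmitter } from "node:events";

// Outcome shape taken from the PR discussion.
interface IdleCompactionOutcome {
  success: boolean;
  modelNotFound: boolean;
}

// Hypothetical: the real sendMessage Result type is not shown in the PR.
type SendResult = { ok: true } | { ok: false; errorType: string };

// Hypothetical payload of the async "error" stream event.
interface StreamErrorEvent {
  workspaceId: string;
  errorType: string;
}

type ReportFn = (workspaceId: string, outcome: IdleCompactionOutcome) => void;

class IdleCompactionReporter {
  // Marker: workspaces that currently have an idle compaction in flight.
  private readonly idleCompacting = new Set<string>();
  private readonly report: ReportFn;

  constructor(ai: EventEmitter, report: ReportFn) {
    this.report = report;
    // Mid-stream failures: sendMessage already returned ok, so only this event sees them.
    ai.on("error", (event: StreamErrorEvent) => {
      // delete() both checks and clears the marker; errors from
      // non-idle (interactive) streams fall through and are ignored.
      if (!this.idleCompacting.delete(event.workspaceId)) return;
      this.report(event.workspaceId, {
        success: false,
        modelNotFound: event.errorType === "model_not_found",
      });
    });
  }

  // Set the marker before calling sendMessage.
  begin(workspaceId: string): void {
    this.idleCompacting.add(workspaceId);
  }

  // Pre-stream failures: visible synchronously in the send result.
  afterSend(workspaceId: string, result: SendResult): void {
    if (result.ok) return; // stream started; a failure would arrive later as an event
    this.idleCompacting.delete(workspaceId);
    this.report(workspaceId, {
      success: false,
      modelNotFound: result.errorType === "model_not_found",
    });
  }
}
```

One long-lived listener gated by the marker avoids attaching and detaching a listener per attempt, and it naturally ignores errors from ordinary interactive streams.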

Implementation

  • IdleCompactionService now tracks per-workspace consecutive failures and a sticky suppressedWorkspaceIds set. checkEligibility short-circuits suppressed workspaces (reason: "suppressed_after_failures"). A new recordOutcome(workspaceId, outcome) applies the policy: success resets the streak; model_not_found suppresses immediately; any other failure suppresses on the second consecutive occurrence.
  • WorkspaceService reports terminal idle-compaction outcomes through a listener:
    • pre-stream failure → reported from executeIdleCompaction (model-not-found classified via formatSendMessageError),
    • mid-stream failure → reported from the existing aiService.on("error") handler when the workspace is mid idle-compaction (read before stopStreamingStatus clears the marker),
    • success → reported from handleStreamCompletion.
    • error / stream-end / stream-abort are mutually exclusive per turn, so a failed turn never also reports success. Aborts are intentionally neutral (not a failure).
  • ServiceContainer wires the listener to IdleCompactionService.recordOutcome.
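The failure policy applied by recordOutcome can be sketched as a small standalone class. This is a hedged reconstruction from the description above, not the PR's code: MAX_CONSECUTIVE_FAILURES, IdleCompactionFailurePolicy, and the Eligibility type are hypothetical, and checkEligibility here models only the suppression short-circuit, not the real idle and recency checks.

```typescript
// Outcome shape taken from the PR discussion.
interface IdleCompactionOutcome {
  success: boolean;
  modelNotFound: boolean;
}

type Eligibility =
  | { eligible: true }
  | { eligible: false; reason: "suppressed_after_failures" };

// Hypothetical constant; the PR states "two consecutive failures".
const MAX_CONSECUTIVE_FAILURES = 2;

class IdleCompactionFailurePolicy {
  private readonly consecutiveFailures = new Map<string, number>();
  private readonly suppressedWorkspaceIds = new Set<string>();

  recordOutcome(workspaceId: string, outcome: IdleCompactionOutcome): void {
    if (outcome.success) {
      // Success resets the streak (suppression, once set, stays sticky here).
      this.consecutiveFailures.delete(workspaceId);
      return;
    }
    if (outcome.modelNotFound) {
      // A rejected model will not fix itself between cycles: suppress now.
      this.suppressedWorkspaceIds.add(workspaceId);
      return;
    }
    const failures = (this.consecutiveFailures.get(workspaceId) ?? 0) + 1;
    this.consecutiveFailures.set(workspaceId, failures);
    if (failures >= MAX_CONSECUTIVE_FAILURES) {
      this.suppressedWorkspaceIds.add(workspaceId);
    }
  }

  // Only the suppression short-circuit is modeled.
  checkEligibility(workspaceId: string): Eligibility {
    if (this.suppressedWorkspaceIds.has(workspaceId)) {
      return { eligible: false, reason: "suppressed_after_failures" };
    }
    return { eligible: true };
  }
}
```

Because both collections live on the instance, constructing a fresh service (an app restart) is the only thing that clears suppression in this sketch, matching the in-memory behavior described next.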

Suppression is in-memory and sticky for the service lifetime; restarting the app (e.g. after fixing the configured compaction model) clears it.

Validation

  • New unit tests in idleCompactionService.test.ts: suppress after two consecutive failures, suppress immediately on model_not_found, success resets the streak, and checkAllWorkspaces no longer queues a suppressed workspace.
  • New workspaceService.test.ts tests: executeIdleCompaction reports modelNotFound: true for an invalid compaction model and false for a generic pre-stream failure.
  • make static-check and targeted tests pass.

Risks

Low. The change is additive and scoped to the background idle-compaction loop. The outcome listener is optional (a no-op until wired), and normal interactive sends are untouched. In the worst case, a wrong suppression stops a workspace from auto-compacting on idle until the app restarts; users can still trigger compaction manually.


Generated with mux • Model: anthropic:claude-opus-4-8 • Thinking: xhigh • Cost: $1.48

@ammar-agent

Collaborator Author

@codex review

@chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a65f175c2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/node/services/workspaceService.ts Outdated
Address Codex review: a clean provider stream-end does not guarantee the post-stream history compaction succeeded. Drive the idle success/failure outcome from CompactionHandler.handleCompletion (after the summary is persisted, or on a post-stream persistence failure) via a new onIdleCompactionOutcome callback, instead of reporting success on stream-end.
@ammar-agent

Collaborator Author

Addressed in 083e71d.

The idle-compaction outcome is no longer reported from the provider stream-end. Instead, CompactionHandler.handleCompletion() now reports the terminal outcome via a new onIdleCompactionOutcome(success) callback:

  • success is reported only after the compacted summary is actually persisted (right after onCompactionComplete),
  • failure is reported on the post-stream persistence paths (performCompaction returns Err, empty summary, raw-JSON summary).

WorkspaceService.createSession forwards that to reportIdleCompactionOutcome (a persistence failure → { success: false, modelNotFound: false }), so these post-stream failures now count toward the suppression threshold. The unconditional success report in handleStreamCompletion was removed. Mid-stream provider errors (no stream-end/no persistence) are still reported from the "error" handler, and the three terminal events are mutually exclusive per turn, so each idle attempt reports exactly one outcome.
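A rough sketch of the reporting order in handleCompletion, under stated assumptions: the CompletionContext shape, the synchronous performCompaction callback (real persistence is presumably async), and the looksLikeRawJson heuristic are all hypothetical, since the PR does not show how a raw-JSON summary is detected. What it illustrates is the ordering: failure is reported on each post-stream failure path, and success only after the summary is persisted and onCompactionComplete has run.

```typescript
// Hypothetical result of persisting the compacted summary into history.
type PersistResult = { ok: true } | { ok: false; error: string };

interface CompletionContext {
  isIdleCompaction: boolean;
  summary: string;
  performCompaction: (summary: string) => PersistResult;
  onCompactionComplete: () => void;
  onIdleCompactionOutcome?: (success: boolean) => void;
}

// Assumed heuristic: a summary that parses as a JSON object/array counts as "raw JSON".
function looksLikeRawJson(text: string): boolean {
  const trimmed = text.trim();
  if (!trimmed.startsWith("{") && !trimmed.startsWith("[")) return false;
  try {
    JSON.parse(trimmed);
    return true;
  } catch {
    return false;
  }
}

function handleCompletion(ctx: CompletionContext): boolean {
  // Manual (non-idle) compactions never invoke the outcome callback.
  const report = (success: boolean): void => {
    if (ctx.isIdleCompaction) ctx.onIdleCompactionOutcome?.(success);
  };

  if (ctx.summary.trim() === "" || looksLikeRawJson(ctx.summary)) {
    report(false); // unusable summary: a post-stream failure
    return false;
  }

  const persisted = ctx.performCompaction(ctx.summary);
  if (!persisted.ok) {
    report(false); // persistence failed even though the stream ended cleanly
    return false;
  }

  ctx.onCompactionComplete();
  report(true); // success only once the summary is actually persisted
  return true;
}
```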

Added CompactionHandler tests: success → [true], empty summary → [false], manual (non-idle) compaction → not called.

@ammar-agent

Collaborator Author

@codex review

Please take another look — addressed the post-stream persistence concern.

@chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 083e71d465


Comment thread src/node/services/workspaceService.ts Outdated
Address Codex review: the requireIdle path returns a busy-skip when the workspace becomes active after eligibility but before sendMessage. That is an expected race, not a failure. Skip reporting it as an outcome so two normal user-interaction races cannot suppress idle compaction for a healthy workspace. Factor the busy-skip message into a shared constant.
@ammar-agent

Copy link
Copy Markdown
Collaborator Author

Addressed in c1a1b76.

The requireIdle busy-skip ("Workspace is busy; idle-only send was skipped.") is now treated as neutral — executeIdleCompaction no longer calls recordOutcome for it, so normal user-interaction races can't accumulate toward suppression. I factored the message into a shared IDLE_ONLY_BUSY_SKIP_MESSAGE constant (used by the requireIdle return, the early busy throw, and the skip check) to keep the contract DRY. Extended the busy-skip test to assert no outcome is reported.
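The neutral busy-skip can be illustrated with a small classifier. The constant's name and message text are quoted from the comment above; PreStreamResult and preStreamOutcome are hypothetical, and the modelNotFound flag is assumed to be computed elsewhere (the PR classifies it via formatSendMessageError, whose signature is not shown).

```typescript
// Name and text as quoted in the PR comment.
const IDLE_ONLY_BUSY_SKIP_MESSAGE = "Workspace is busy; idle-only send was skipped.";

interface IdleCompactionOutcome {
  success: boolean;
  modelNotFound: boolean;
}

// Hypothetical shape of a pre-stream send result.
type PreStreamResult =
  | { ok: true }
  | { ok: false; message: string; modelNotFound: boolean };

// Returns the outcome to record, or null when nothing should be recorded.
function preStreamOutcome(result: PreStreamResult): IdleCompactionOutcome | null {
  // Stream started: the terminal outcome is reported later, not here.
  if (result.ok) return null;
  // The user became active between eligibility and send: an expected race, not a failure.
  if (result.message === IDLE_ONLY_BUSY_SKIP_MESSAGE) return null;
  return { success: false, modelNotFound: result.modelNotFound };
}
```

Comparing against the shared constant rather than a repeated string literal keeps the code that produces the skip and the code that recognizes it from drifting apart.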

@ammar-agent

Collaborator Author

@codex review

Please take another look — the requireIdle busy-skip is now treated as neutral.

@chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1a1b76bc4


Comment thread src/node/services/idleCompactionService.ts
Address Codex review: an in-flight retry can persist a successful compaction after suppression was set. recordOutcome(success) now also clears the suppressed set, so the workspace self-heals and becomes eligible again.
@ammar-agent

Collaborator Author

Addressed in 866a615.

recordOutcome now also clears suppressedWorkspaceIds on success, so a later in-flight retry that actually persists a compaction lifts the suppression and the workspace becomes eligible again (self-healing). Added a test: two failures suppress → a subsequent success makes the workspace eligible again.
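The self-healing recordOutcome can be sketched as a function over plain state, assuming the { success, modelNotFound } outcome shape used in this PR. FailureState and the inline threshold of 2 are illustrative choices, not the PR's actual code.

```typescript
interface IdleCompactionOutcome {
  success: boolean;
  modelNotFound: boolean;
}

// Hypothetical state container mirroring the fields named in the PR.
interface FailureState {
  consecutiveFailures: Map<string, number>;
  suppressedWorkspaceIds: Set<string>;
}

function recordOutcome(state: FailureState, workspaceId: string, outcome: IdleCompactionOutcome): void {
  if (outcome.success) {
    // A persisted compaction shows the workspace is healthy:
    // reset the streak and lift any suppression (self-healing).
    state.consecutiveFailures.delete(workspaceId);
    state.suppressedWorkspaceIds.delete(workspaceId);
    return;
  }
  const failures = (state.consecutiveFailures.get(workspaceId) ?? 0) + 1;
  state.consecutiveFailures.set(workspaceId, failures);
  // model_not_found suppresses immediately; other failures on the second in a row.
  if (outcome.modelNotFound || failures >= 2) {
    state.suppressedWorkspaceIds.add(workspaceId);
  }
}
```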

@ammar-agent

Collaborator Author

@codex review

Please take another look — suppression is now lifted on a later success.

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Breezy!

Reviewed commit: 866a615c91


@ammario merged commit 9ec890d into main on Jun 15, 2026
23 checks passed
@ammario deleted the fix/idle-compaction-stop-on-repeated-failure branch on June 15, 2026 at 21:52