Skip to content

text_editor: fix corrupted-history failure mode and cap history size#4010

Merged
epatey merged 12 commits into
UKGovernmentBEIS:mainfrom
tadamcz:codex-bound-text-editor-history
May 26, 2026
Merged

text_editor: fix corrupted-history failure mode and cap history size#4010
epatey merged 12 commits into
UKGovernmentBEIS:mainfrom
tadamcz:codex-bound-text-editor-history

Conversation

@tadamcz
Copy link
Copy Markdown
Contributor

@tadamcz tadamcz commented May 22, 2026

Summary

Fixes text_editor undo-history persistence so corrupted history files do not cause the tool to return an error output. Also bounds undo history to the last 10 edits per file, so the undo-history file cannot grow huge.

Background

This came up during a long-running eval where text_editor str_replace eventually started returning Failed to load history from /tmp/inspect_editor_history.pkl: pickle data was truncated on every edit. The edits themselves appeared to apply, but this tool output is confusing.

The model had made 873 text_editor str_replace calls against the same file. That file had grown to roughly 4.1 MB / ~80,000 lines. The previous str_replace call before the breakage returned [ERROR] Command timed out before completing. Every str_replace call thereafter returned the error.

Likely mechanism: The pickle held ~870 snapshots of an ever-growing file, so each save was a multi-megabyte rewrite. That made the pickle dump long enough to be interrupted: the tool timeout fired between the open(..., "wb") truncate and pickle.dump() completing, and the on-disk pickle was left half-written.

Changes

In src/inspect_sandbox_tools/.../text_editor.py:

  • Atomic history writes. _save_history() now writes via a new _atomic_pickle_dump() helper: dump to a NamedTemporaryFile in the same directory, flush() + fsync(), then os.replace() over the target.
  • Recover from a corrupt history file instead of poisoning every later edit. Both _load_history() and _save_history() now treat any exception other than FileNotFoundError as "history is unusable": log a warning, delete the file via _discard_history(), and return/continue with empty history. The next edit starts a fresh history.
  • Bounded per-file history. New MAX_HISTORY_ENTRIES_PER_FILE = 10 constant and _trim_history() helper. _save_history() trims before writing, so the pickle never grows beyond ~10 snapshots per file.

Design choices

  • N=10 seems sufficient
  • Decided against a broader rewrite (e.g. append-only log, or keeping the history in memory). Not worth it.
  • On unexpected additional exceptions in _load_history()/_save_history(), discard the history and start from scratch, rather than returning a tool error. The edit history is less important than normal editing functionality being robust.

Breaking Changes

None expected.

The pickle format is unchanged (still dict[Path, list[str | -1]]), so an existing /tmp/inspect_editor_history.pkl from a previous version would load fine; if it's larger than 10 entries per file, the next save trims it down silently.

@jjallaire jjallaire requested a review from epatey May 22, 2026 14:55
epatey added 2 commits May 26, 2026 08:49
Cap undo history to 10 and ignore failures in undo history file operations.
Copy link
Copy Markdown
Collaborator

@epatey epatey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. I'm going to make the minor changes I suggested and then work on getting the new injectables deployed and this PR merged.

pickle.dump(history, f)
f.flush()
os.fsync(f.fileno())
os.replace(f.name, file_path)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though this isn't actually a bug since f.name still exists, it reads a little funny to reference f outside of the with. If it were me, I'd indent the os.replace just to avoid the question.

@epatey epatey merged commit 033745d into UKGovernmentBEIS:main May 26, 2026
26 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants