text_editor: fix corrupted-history failure mode and cap history size #4010
Merged
epatey merged 12 commits on May 26, 2026
Cap undo history to 10 and ignore failures in undo history file operations.
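As a rough illustration of what "ignore failures" can mean on the load path, here is a minimal sketch. It is not the merged code: `load_history_or_empty` is a hypothetical name, and the truncated file is simulated by writing only half of a valid pickle.

```python
import logging
import pickle
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def load_history_or_empty(history_path: Path) -> dict[Path, list[str]]:
    """Hypothetical helper: return saved history, or {} if missing or unusable."""
    try:
        with open(history_path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        # No history file yet is the normal first-run case, not a failure.
        return {}
    except Exception as exc:
        # Anything else means the file is unusable: warn, delete it, start fresh.
        logger.warning("Discarding unusable history %s: %s", history_path, exc)
        history_path.unlink(missing_ok=True)
        return {}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "history.pkl"
        payload = pickle.dumps({Path("a.txt"): ["v1", "v2"]})
        # Simulate a half-written file by keeping only the first half of the bytes.
        path.write_bytes(payload[: len(payload) // 2])
        print(load_history_or_empty(path))  # empty history instead of an error
        print(path.exists())  # the unusable file has been removed
```

The point of the sketch is only that the caller always gets a usable dict back, so an edit never fails because of the history file.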
epatey (Collaborator) approved these changes on May 26, 2026 and left a comment:
Thanks for the fix. I'm going to make the minor changes I suggested and then work on getting the new injectables deployed and this PR merged.
        pickle.dump(history, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(f.name, file_path)
Though this isn't actually a bug since f.name still exists, it reads a little funny to reference f outside of the with. If it were me, I'd indent the os.replace just to avoid the question.
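For reference, a minimal sketch of what the indented variant could look like. This is not the merged code: `atomic_pickle_dump` is a stand-in name, the cleanup on failure is an added assumption, and the sketch assumes a POSIX filesystem (renaming a still-open file may fail on Windows).

```python
import contextlib
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any


def atomic_pickle_dump(obj: Any, file_path: Path) -> None:
    """Hypothetical stand-in for the PR's helper: write obj to file_path atomically."""
    # Create the temp file next to the target so os.replace stays on one filesystem.
    with tempfile.NamedTemporaryFile("wb", dir=file_path.parent, delete=False) as f:
        try:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())
            # Inside the with block, as suggested, so f is visibly still in scope.
            os.replace(f.name, file_path)
        except BaseException:
            # Assumed cleanup: do not leave a stray temp file behind on failure.
            with contextlib.suppress(FileNotFoundError):
                os.unlink(f.name)
            raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "history.pkl"
        atomic_pickle_dump({"a.txt": ["v1", "v2"]}, target)
        with open(target, "rb") as fh:
            print(pickle.load(fh))
```

Either placement leaves readers of the target path with the old complete file or the new complete file, never a partial one; the indentation only changes how the code reads.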
epatey reviewed on May 26, 2026
Summary
Fixes text_editor undo-history persistence so that corrupted history files do not cause the tool to return an error output. Also bounds undo history to the last 10 edits per file, so the undo-history file cannot grow without limit.
Background
This came up during a long-running eval where text_editor str_replace eventually started returning "Failed to load history from /tmp/inspect_editor_history.pkl: pickle data was truncated" on every edit. The edits themselves appeared to apply, but the tool output was confusing.
The model had made 873 text_editor str_replace calls against the same file. That file had grown to roughly 4.1 MB / ~80,000 lines. The last str_replace call before the breakage returned "[ERROR] Command timed out before completing." Every str_replace call thereafter returned the load error.
Likely mechanism: The pickle held ~870 snapshots of an ever-growing file, so each save was a multi-megabyte rewrite. That made the pickle dump slow enough to be interrupted: the tool timeout fired between the open(..., "wb") truncate and pickle.dump() completing, and the on-disk pickle was left half-written.
Changes
In src/inspect_sandbox_tools/.../text_editor.py:
- _save_history() now writes via a new _atomic_pickle_dump() helper: dump to a NamedTemporaryFile in the same directory, flush() + fsync(), then os.replace() over the target.
- _load_history() and _save_history() now treat any exception other than FileNotFoundError as "history is unusable": log a warning, delete the file via _discard_history(), and return/continue with empty history. The next edit starts a fresh history.
- New MAX_HISTORY_ENTRIES_PER_FILE = 10 constant and _trim_history() helper. _save_history() trims before writing, so the pickle never grows beyond ~10 snapshots per file.
Design choices
- On failure in _load_history()/_save_history(), discard the history and start from scratch, rather than returning a tool error. The edit history is less important than normal editing functionality being robust.
Breaking Changes
None expected.
The pickle format is unchanged (still dict[Path, list[str | -1]]), so an existing /tmp/inspect_editor_history.pkl from a previous version would load fine; if it holds more than 10 entries per file, the next save trims it down silently.
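To make the trim-on-save behaviour concrete, here is a minimal sketch rather than the PR's actual helper. `trim_history` is a hypothetical name, entries are simplified to plain strings, and the sketch assumes the newest snapshot is appended at the end of each list.

```python
from pathlib import Path

MAX_HISTORY_ENTRIES_PER_FILE = 10  # name and value taken from the PR description


def trim_history(history: dict[Path, list[str]]) -> dict[Path, list[str]]:
    """Hypothetical sketch: keep only the newest entries for each file."""
    # Assumes newer snapshots sit at the end of each list, so slice from the tail.
    return {
        path: entries[-MAX_HISTORY_ENTRIES_PER_FILE:]
        for path, entries in history.items()
    }


if __name__ == "__main__":
    # An oversized history like the one from the eval: 873 snapshots of one file.
    legacy = {Path("big.py"): [f"snapshot {i}" for i in range(873)]}
    trimmed = trim_history(legacy)
    print(len(trimmed[Path("big.py")]))  # 10
    print(trimmed[Path("big.py")][0])  # snapshot 863
```

Under these assumptions, an old oversized history loads as before and simply shrinks to the last 10 snapshots per file the first time it is saved.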