feat: add fuzzy text matching fallbacks for edit_file tool #2295

Closed
trungutt wants to merge 3 commits into docker:main from trungutt:feat-edit-file-fuzzy-matching

@trungutt trungutt commented Mar 31, 2026

Summary

  • Add fuzzy text matching to edit_file so common LLM mistakes (whitespace, line continuations, escaped quotes) no longer cause "old text not found" failures
  • Uses a candidate-based approach inspired by OpenCode: each replacer yields verbatim original-content substrings, eliminating all reverse position mapping

Replaces #2145, which had critical bugs in the binary-search-based boundary mapping (review).

Motivation

When a file contains escaped quotes (e.g., echo \"brew install failed\"), the LLM tends to drop the backslashes: it writes echo "hello" in the edit_file arguments while the file has echo \"hello\". The backslash is lost during JSON round-tripping, so the exact-match check (strings.Contains) does not find the old text and the edit is rejected. The same failure occurs systematically for files with line continuations or indentation differences.

Approach

When exact match fails, try 5 progressively looser strategies:

| # | Strategy | Handles |
|---|----------|---------|
| 1 | Line-trimmed | Extra/missing whitespace per line |
| 2 | Indentation-flexible | Different indent levels |
| 3 | Line-continuation | Collapsed \ + newline in Dockerfiles/shell |
| 4 | Escape-normalized | \" vs " mismatch from JSON round-tripping |
| 5 | Whitespace-collapsed | Multiple spaces/tabs collapsed to one |

Each replacer returns verbatim substrings of the original content, verified with strings.Index. No position mapping, no binary search, no boundary bugs.
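To illustrate the candidate-based idea, here is a minimal sketch, not the code from this PR. The `replacer` type, the lowercase `findAndReplace`, and both replacer functions are hypothetical names chosen for the example; only the exact and line-trimmed strategies are shown, and handling of ambiguous (multiple) matches is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// replacer is a hypothetical type: given the file content and the text the
// model asked to replace, it returns candidate substrings copied verbatim
// from content.
type replacer func(content, find string) []string

// exactReplacer yields find itself when it occurs in content.
func exactReplacer(content, find string) []string {
	if strings.Contains(content, find) {
		return []string{find}
	}
	return nil
}

// lineTrimmedReplacer compares line by line after trimming whitespace and
// returns the original (untrimmed) block of lines for every match.
func lineTrimmedReplacer(content, find string) []string {
	if strings.TrimSpace(find) == "" {
		return nil
	}
	contentLines := strings.SplitAfter(content, "\n")
	findLines := strings.Split(strings.TrimRight(find, "\n"), "\n")
	var out []string
	for i := 0; i+len(findLines) <= len(contentLines); i++ {
		match := true
		for j, fl := range findLines {
			if strings.TrimSpace(contentLines[i+j]) != strings.TrimSpace(fl) {
				match = false
				break
			}
		}
		if match {
			// Joining consecutive original lines gives a verbatim substring.
			block := strings.Join(contentLines[i:i+len(findLines)], "")
			out = append(out, strings.TrimSuffix(block, "\n"))
		}
	}
	return out
}

// findAndReplace tries each replacer in order and replaces the first
// candidate that strings.Index can locate in the original content.
func findAndReplace(content, oldText, newText string) (string, error) {
	for _, r := range []replacer{exactReplacer, lineTrimmedReplacer} {
		for _, cand := range r(content, oldText) {
			idx := strings.Index(content, cand)
			if idx < 0 {
				continue
			}
			return content[:idx] + newText + content[idx+len(cand):], nil
		}
	}
	return "", fmt.Errorf("old text not found")
}

func main() {
	content := "RUN apt-get update   \n    && apt-get install -y curl\n"
	old := "RUN apt-get update\n&& apt-get install -y curl"
	out, err := findAndReplace(content, old, "RUN apk add curl")
	fmt.Printf("%q %v\n", out, err)
}
```

Because every candidate is a slice of the original content, the replacement step needs only `strings.Index` and two slice expressions; there is no normalized-to-original offset to compute.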

Both the builtin and ACP filesystem handlers use the shared FindAndReplace function.

Gordon and others added 2 commits March 31, 2026 11:41
Inspired by the resilient edit matching in OpenCode, add fuzzy text
matching to edit_file. When an exact oldText match fails, try
progressively looser normalization strategies before giving up:

1. Line-trimmed (strip trailing whitespace per line)
2. Indentation-flexible (strip leading whitespace per line)
3. Line-continuation normalization (collapse \ + newline + spaces)
4. Escape-normalized (\" ↔ ")
5. Whitespace-collapsed (collapse runs of whitespace to single space)

These handle common LLM mistakes such as adding spurious indentation,
collapsing shell line continuations into single lines, or unescaping
quotes. The replacement is always applied to the original content so
surrounding text is preserved.

Index mapping between normalized and original strings uses binary
search with rune-aware boundary snapping to preserve UTF-8 integrity.

Both the builtin and ACP filesystem handlers now use the shared
FindAndReplace function.
…oach

Address reviewer feedback on the fuzzy matching implementation. The
previous approach used binary search with mapNormToOrig/mapNormToOrigEnd
to reverse-map normalized positions back to original positions. This had
critical bugs:

1. Binary search assumed monotonically non-decreasing normalized lengths,
   which doesn't hold for collapseLineContinuations (adding a newline
   after a backslash can decrease the normalized length).

2. mapNormToOrigEnd's consumed-character detection was incorrect for
   character-consuming normalizations like whitespace collapsing.

Replace with an approach inspired by OpenCode (github.com/anomalyco/opencode):
each fuzzy replacer finds matching regions and returns the verbatim
original text at those regions. The caller verifies with strings.Index
and replaces directly. This eliminates all position mapping entirely.

Also add comprehensive edge-case tests covering UTF-8 content with
collapsing normalizations, multi-byte boundaries, escaped quotes in
Dockerfiles, and multiple line continuations with surrounding content.
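The first bug described in this commit message can be reproduced with a small sketch. The `collapse` function below is an assumed stand-in for collapseLineContinuations (its real definition is not shown in this PR); it replaces optional spaces, a backslash, a newline, and optional spaces with a single space. Printing the normalized length of every prefix shows that the sequence is not monotonic, which is the property a binary search over prefix lengths would rely on.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineContRE is an assumed definition of a line continuation: optional
// spaces/tabs, a backslash, a newline, then optional spaces/tabs.
var lineContRE = regexp.MustCompile(`[ \t]*\\\n[ \t]*`)

// collapse is a hypothetical stand-in for collapseLineContinuations.
func collapse(s string) string {
	return lineContRE.ReplaceAllString(s, " ")
}

// normLens returns len(collapse(s[:i])) for every prefix length i.
// The input is assumed to be ASCII so byte slicing is safe.
func normLens(s string) []int {
	lens := make([]int, 0, len(s)+1)
	for i := 0; i <= len(s); i++ {
		lens = append(lens, len(collapse(s[:i])))
	}
	return lens
}

func main() {
	// Prefix "a   \" normalizes to 5 bytes; adding the newline completes
	// the continuation and the normalized prefix shrinks to "a " (2 bytes).
	fmt.Println(normLens("a   \\\nb")) // prints [0 1 2 3 4 5 2 3]
}
```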
@trungutt trungutt requested a review from a team as a code owner March 31, 2026 10:03
@trungutt trungutt marked this pull request as draft March 31, 2026 12:10
Fix 5 lint issues from CI:
- unused whitespaceRE variable
- strings.Index → strings.Cut
- classic for loops → integer range (Go 1.22+)
- len(s) == 0 → s == "" (gocritic)

Fix mixed escaping bug: when a file has both escaped and unescaped
quotes (e.g. "${DIR}" with plain quotes and echo \"hello\" with
escaped ones), the global strings.ReplaceAll approach fails. Add a
regex fallback (Case 3) that matches each " as optionally preceded
by a backslash.
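A sketch of how such a regex fallback could look, under the assumption that the pattern is built from the search text with `regexp.QuoteMeta` and that every `"` is then allowed an optional preceding backslash. The helper name `mixedQuoteCandidate` is hypothetical and the PR's actual Case 3 may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// mixedQuoteCandidate is a hypothetical helper: it returns the verbatim
// substring of content that matches find when each double quote in find
// may or may not be preceded by a backslash in content.
func mixedQuoteCandidate(content, find string) (string, bool) {
	// Start from a form of find with no escaped quotes.
	plain := strings.ReplaceAll(find, `\"`, `"`)
	// Quote regex metacharacters, then let each quote match `"` or `\"`.
	pattern := strings.ReplaceAll(regexp.QuoteMeta(plain), `"`, `\\?"`)
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", false
	}
	loc := re.FindStringIndex(content)
	if loc == nil {
		return "", false
	}
	return content[loc[0]:loc[1]], true
}

func main() {
	// The file mixes plain quotes ("${DIR}") and escaped ones (\"hello\").
	content := `cd "${DIR}" && echo \"hello\"` + "\n"
	find := `cd "${DIR}" && echo "hello"`
	cand, ok := mixedQuoteCandidate(content, find)
	fmt.Printf("%q %v\n", cand, ok)
}
```

A global `strings.ReplaceAll` in either direction cannot handle this input: escaping every quote in the search text breaks the `"${DIR}"` part, and leaving them all unescaped breaks the `\"hello\"` part. The per-quote optional backslash handles both in one pass.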
@trungutt trungutt force-pushed the feat-edit-file-fuzzy-matching branch from 7ebbc9d to 843d75c Compare March 31, 2026 12:28
@trungutt trungutt closed this Mar 31, 2026