fix: regenerate prompt tarball when an automation's prompt is edited#145
Conversation
|
🚀 Deploy Preview PR Created/Updated A deploy preview has been created/updated for this PR. Deploy PR: https://github.com/OpenHands/deploy/pull/4470 Once the deploy PR's CI passes, the automation service will be deployed to the feature environment. |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
The fix is correct and well-motivated: the root cause (tarball baked at creation time, prompt column is display-only) is accurately identified, the patch is minimal and scoped to the reported bug, and the tests cover the two key paths (prompt-changed → new tarball, non-prompt field changed → tarball unchanged). Two issues worth addressing before merge:
- Orphaned uploads — each prompt edit mints a new
TarballUploadrow and file without ever touching the old one. Repeated edits will quietly accumulate unreferenced records and files. HTTPExceptionraised inside a domain helper —regenerate_preset_prompt_tarballis imported byrouter.pybut independently raises an HTTP 500. This couples the domain logic to the HTTP layer and makes the helper harder to reuse (e.g., a background task caller would get an unhandledHTTPException).
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable — Well-scoped bug fix with the right approach. A few things worth addressing before merge.
Summary
The root cause analysis is correct and the "swap prompt.txt in place" strategy is the right call: it's non-invasive, preserves plugin/repo configuration, and eliminates template-drift risk. The test coverage is solid for the happy path. A couple of implementation details need attention.
Issues
[IMPROVEMENT OPPORTUNITIES]
preset_router.py, line 326 — Dead-code flush in error path
The await session.flush() inside the except block (before raise HTTPException(...)) is never committed. FastAPI's exception handler rolls back the transaction when an HTTPException is raised, discarding the upload.status = UploadStatus.FAILED and upload.error_message mutations along with the flush itself. This looks like it persists the failure state, but it doesn't — remove the line to avoid misleading future readers.
preset_router.py, line 345 — deleted_at set unconditionally after a potentially-failed file delete
try:
file_store.delete(source_upload.storage_path)
except FileNotFoundError:
pass
except Exception as e:
logger.exception(...) # logs, then falls through
source_upload.deleted_at = utcnow() # ← always executedIf file_store.delete raises any non-FileNotFoundError exception, execution falls through to source_upload.deleted_at = utcnow(). The DB record is soft-deleted even though the physical file may still exist on disk — a silent orphan that storage-accounting queries will miss.
If the intent is "always retire the DB record regardless of file deletion outcome" (reasonable, assuming a scrubber handles orphaned files), that should be documented with a comment. If the intent is "only soft-delete when the file is confirmed gone", move deleted_at inside the try block.
[TESTING GAPS]
No test for the upload-failure path
The except Exception as e branch in regenerate_preset_prompt_tarball (lines 322–330) raises an HTTPException. There is currently no test verifying:
- A 500 is returned when
write_streamfails. - The original
tarball_pathis unchanged (no half-committed state leaks through).
A test along the lines of:
async def test_update_prompt_upload_failure_returns_500(self, async_client, async_session, preset_store):
automation = await _seed_prompt_preset_automation(async_session, preset_store, "Original")
original_path = automation.tarball_path
preset_store.write_stream = AsyncMock(side_effect=RuntimeError("storage down"))
response = await async_client.patch(
f"/api/automation/v1/{automation.id}", json={"prompt": "New"}
)
assert response.status_code == 500
await async_session.refresh(automation)
assert automation.tarball_path == original_pathWhat's good
_replace_prompt_in_tarballis a clean, pure function — easy to test and reason about in isolation.- Lazy
get_file_store()resolution is the right call: non-preset PATCH paths never pay the cost. - The guard
auto.prompt != original_promptinrouter.pycorrectly skips regeneration on no-op edits. - Tests for the happy path, name-only edit, and same-prompt no-op are exactly what's needed.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
The change touches the PATCH handler for automations (a frequently-exercised path) and introduces a new file-store read + write on every prompt edit. The scope is deliberately narrow and non-preset automations are unaffected, which limits blast radius. The dead-code flush and unconditionaldeleted_atare correctness concerns but not show-stoppers. CI coverage for the failure path is missing.
VERDICT:
✅ Worth merging with the flush removed and the deleted_at behavior either fixed or explicitly documented.
KEY INSIGHT:
The await session.flush() in the error path is a silent no-op that misleads readers into thinking the failure state is persisted — remove it.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing.- Re-request a review — the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
Was this review helpful? React with 👍 or 👎 to give feedback.
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review — fix: regenerate prompt tarball when an automation's prompt is edited
🟢 Good taste — Elegant, well-scoped fix with comprehensive tests.
Summary
The root cause analysis in the PR description is accurate and thorough. A prompt-preset automation bakes its prompt into prompt.txt inside a tarball at creation time; PATCH /v1/{id} was updating only the prompt column (display metadata) while the dispatcher kept executing the original baked tarball. The fix is minimal and correctly targeted.
What is well done
_replace_prompt_in_tarball (pure helper)
- Side-effect free, easy to unit test, and well-isolated from I/O concerns.
- Correctly handles non-regular-file members (directories, symlinks) in the
else: dst.addfile(member)branch — these are absent from current automation tarballs but the code is forward-compatible. - Returns
Nonewhen noprompt.txtis found, letting callers opt out gracefully without exceptions.
regenerate_preset_prompt_tarball
- Handles all real failure modes: external tarball path, missing upload record, missing file, archive without
prompt.txt— each returnsNonecleanly. - The soft-delete gate
if file_removed: source_upload.deleted_at = utcnow()is exactly right: the DB record stays live (and the file stays discoverable) if deletion fails, preventing a hidden orphan on disk. - The comment explaining why the flushed-but-not-committed
FAILEDstatus is intentionally ephemeral is a useful maintenance hint.
router.py — update logic
- Capturing
original_prompt = auto.promptbefore thesetattrloop and comparingauto.prompt != original_promptcorrectly prevents tarball rebuilds on no-op edits. - The
isinstance(auto.prompt, str)guard is appropriate defensive coding against aprompt: nullPATCH (schema validation should catch it, but belt-and-suspenders is right here). - Non-preset automations and non-prompt field edits are left completely untouched.
Tests
TestReplacePromptInTarballtests the pure helper with a real tarball (no mocks) — correctly verifies that sibling files are byte-for-byte identical and thatsetup.shmode bits are preserved.test_update_prompt_regenerates_preset_tarballexercises the full DB-backed path and verifies the new tarball'sprompt.txtcontent directly.test_update_prompt_upload_failure_returns_500proves that a storage failure returns 500 and leaves the automation pointing at its original tarball — no half-committed state can leak through.test_update_unchanged_prompt_does_not_regenerate_tarballconfirms the no-op guard works.
Minor observation (non-blocking)
TarInfo field preservation in _replace_prompt_in_tarball: when copying non-prompt files, the new TarInfo objects preserve name, size, mode, and mtime but drop uid, gid, uname, and gname. For automation tarballs executed in a controlled sandbox environment this is benign, but it is a subtle behavioral difference from the original archive. A one-line comment acknowledging this intentional choice (e.g. # uid/gid omitted — not meaningful in the sandbox environment) would help future maintainers who inspect the archives.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW- The change is purely additive to the
PATCHcode path; the dispatch path is untouched. - No DB schema changes; the fix reuses the existing
TarballUploadflow. - Non-preset automations are unaffected (guarded by
parse_internal_upload_idreturningNone). - Failure mode is a 500 with transaction rollback — no partial state can persist.
- The change is purely additive to the
VERDICT: ✅ Worth merging — the fix is correct, the tests are real and cover the failure path, and all previously raised review concerns have been addressed.
KEY INSIGHT: The "swap in place" approach (rewriting only prompt.txt) is the right call — it avoids template-drift risk and preserves plugin/repo configuration without needing to re-derive any other tarball state.
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better by adding a
.agents/skills/custom-codereview-guide.mdfile to your branch. See the customization docs for the required frontmatter format.Was this review helpful? React with 👍 or 👎 to give feedback.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
Summary
After editing a prompt-based automation's prompt and dispatching it ("Run now"), the newly created conversation still runs the previous prompt. The UI shows the updated prompt, so the edit appears to have worked, but execution does not reflect it.
Steps to reproduce
Simply output "Hello World" and nothing else.).Simply output "Hello OpenHands" and nothing else., save.Expected
The conversation runs the updated prompt (
… "Hello OpenHands" …).Actual
The conversation runs the original prompt (
… "Hello World" …).Root cause
A prompt-based automation bakes its prompt into a generated tarball as a static
prompt.txtfile at creation time (POST /v1/preset/prompt→_generate_tarball). The automation row also storesprompt, but that column is display metadata only.PATCH /v1/{automation_id}(update_automation) updated only thepromptcolumn via a plainsetattrloop — it never rebuilt the tarball.POST /v1/{automation_id}/dispatchexecutes the tarball + entrypoint (dispatcher→executionextracts the tarball and runsmain.py, which readsprompt.txt). Thepromptcolumn is never read at dispatch.So after an edit, the tarball still contained the original
prompt.txt, and every dispatch re-ran the stale prompt.Fix (backend-only)
When
update_automationreceives a changed prompt for an internal-upload preset, rebuild the tarball by swappingprompt.txtin place (preserving all other files) and repointtarball_pathat the new upload. See PR description below.Acceptance criteria
tarball_path) are unaffected by a prompt edit.agent-canvasrequires no change.Notes / rollout dependency
The fix must be released and the
agent-canvasautomation version pin (config/defaults.json→versions.automation) bumped to a build that includes it. The frontend behavior is already correct.Summary
Editing a prompt-based automation's prompt now takes effect on the next run. Previously the edit updated only the stored
promptfield while the dispatcher kept executing the original prompt baked into the automation's tarball.Problem
prompt.txtinside a tarball at creation (_generate_tarball);dispatchexecutes that tarball (main.pyreadsprompt.txt).PATCH /v1/{automation_id}updated only thepromptcolumn and left the tarball untouched, soRun nowre-ran the stale prompt even though the UI showed the new one.Solution
openhands/automation/preset_router.py_replace_prompt_in_tarball(tarball_bytes, new_prompt)— pure helper that returns a copy of a preset tarball withprompt.txtswapped and every other member copied through unchanged (modes preserved); returnsNoneif there's noprompt.txt.regenerate_preset_prompt_tarball(automation, new_prompt, session)— for an internal-upload preset, reads the current tarball, rewrites the prompt, uploads the result as a new internal upload, and returns the newoh-internal://URL. The file store is resolved lazily (only after confirming it's an internal-upload preset), and the function returnsNonefor external tarballs / missing uploads / archives withoutprompt.txt.openhands/automation/router.pyupdate_automationcalls the helper when the prompt changed ("prompt" in update_data and isinstance(auto.prompt, str)) and repointstarball_pathto the rebuilt tarball.Why swap in place (vs. regenerate from the current template)
Rewriting only
prompt.txtpreserves plugin and repository configuration and the exact, already-working template — so it's correct for both prompt and plugin presets and carries no template-drift risk. It keeps the change scoped to the reported bug (it does not attempt the separate "refresh stale SDK templates" concern from the previously-closed PR #127).Compatibility
tarball_path) are unaffected.TarballUploadflow); no migration.agent-canvaschange required.Testing
tests/test_preset_router.py::TestReplacePromptInTarball(no Docker)main.py,setup.sh,plugins_config.json,repos_config.json) preserved,setup.shstays executable.Nonewhen the tarball has noprompt.txt.tests/test_router.py::TestUpdateAutomation(DB-backed)test_update_prompt_regenerates_preset_tarball: PATCH prompt →tarball_pathchanges and the regenerated tarball'sprompt.txtholds the new prompt.test_update_name_does_not_regenerate_preset_tarball: PATCH name → name updates,tarball_pathunchanged.Local verification:
ruff check,ruff format,pyrightclean; the pure helper tests pass. The DB-backed endpoint tests run in CI (they require the Postgres testcontainer / Docker, which was unavailable in the local environment).Rollout
After merge, release the package and bump
agent-canvas'sconfig/defaults.json→versions.automationto a build containing this change.Issue Number
Resolves OpenHands/agent-canvas#869