Skip to content

fix: regenerate prompt tarball when an automation's prompt is edited#145

Merged
hieptl merged 5 commits into
mainfrom
fix/regenerate-prompt-tarball-on-edit
May 29, 2026
Merged

fix: regenerate prompt tarball when an automation's prompt is edited#145
hieptl merged 5 commits into
mainfrom
fix/regenerate-prompt-tarball-on-edit

Conversation

@hieptl
Copy link
Copy Markdown
Contributor

@hieptl hieptl commented May 29, 2026

Summary

After editing a prompt-based automation's prompt and dispatching it ("Run now"), the newly created conversation still runs the previous prompt. The UI shows the updated prompt, so the edit appears to have worked, but execution does not reflect it.

Steps to reproduce

  1. Create a prompt-based automation (e.g. prompt = Simply output "Hello World" and nothing else.).
  2. Open it, edit the prompt to Simply output "Hello OpenHands" and nothing else., save.
  3. Confirm the detail page shows the updated prompt.
  4. Click Run now.
  5. Open the conversation that was created.

Expected

The conversation runs the updated prompt (… "Hello OpenHands" …).

Actual

The conversation runs the original prompt (… "Hello World" …).

Root cause

A prompt-based automation bakes its prompt into a generated tarball as a static prompt.txt file at creation time (POST /v1/preset/prompt_generate_tarball). The automation row also stores prompt, but that column is display metadata only.

  • PATCH /v1/{automation_id} (update_automation) updated only the prompt column via a plain setattr loop — it never rebuilt the tarball.
  • POST /v1/{automation_id}/dispatch executes the tarball + entrypoint (dispatcherexecution extracts the tarball and runs main.py, which reads prompt.txt). The prompt column is never read at dispatch.

So after an edit, the tarball still contained the original prompt.txt, and every dispatch re-ran the stale prompt.

Fix (backend-only)

When update_automation receives a changed prompt for an internal-upload preset, rebuild the tarball by swapping prompt.txt in place (preserving all other files) and repoint tarball_path at the new upload. See PR description below.

Acceptance criteria

  • Editing a prompt-based automation's prompt and dispatching it runs the new prompt.
  • Editing non-prompt fields (name, schedule, model, enabled) does not rebuild the tarball.
  • Plugin presets and repository configuration are preserved across a prompt edit.
  • Non-preset automations (external tarball_path) are unaffected by a prompt edit.
  • agent-canvas requires no change.

Notes / rollout dependency

The fix must be released and the agent-canvas automation version pin (config/defaults.jsonversions.automation) bumped to a build that includes it. The frontend behavior is already correct.

Summary

Editing a prompt-based automation's prompt now takes effect on the next run. Previously the edit updated only the stored prompt field while the dispatcher kept executing the original prompt baked into the automation's tarball.

Problem

  • Prompt presets bake the prompt into prompt.txt inside a tarball at creation (_generate_tarball); dispatch executes that tarball (main.py reads prompt.txt).
  • PATCH /v1/{automation_id} updated only the prompt column and left the tarball untouched, so Run now re-ran the stale prompt even though the UI showed the new one.

Solution

  • openhands/automation/preset_router.py

    • _replace_prompt_in_tarball(tarball_bytes, new_prompt) — pure helper that returns a copy of a preset tarball with prompt.txt swapped and every other member copied through unchanged (modes preserved); returns None if there's no prompt.txt.
    • regenerate_preset_prompt_tarball(automation, new_prompt, session) — for an internal-upload preset, reads the current tarball, rewrites the prompt, uploads the result as a new internal upload, and returns the new oh-internal:// URL. The file store is resolved lazily (only after confirming it's an internal-upload preset), and the function returns None for external tarballs / missing uploads / archives without prompt.txt.
  • openhands/automation/router.py

    • update_automation calls the helper when the prompt changed ("prompt" in update_data and isinstance(auto.prompt, str)) and repoints tarball_path to the rebuilt tarball.

Why swap in place (vs. regenerate from the current template)

Rewriting only prompt.txt preserves plugin and repository configuration and the exact, already-working template — so it's correct for both prompt and plugin presets and carries no template-drift risk. It keeps the change scoped to the reported bug (it does not attempt the separate "refresh stale SDK templates" concern from the previously-closed PR #127).

Compatibility

  • Non-prompt edits (name/schedule/model/enabled) never touch the tarball.
  • Non-preset automations (external tarball_path) are unaffected.
  • No DB schema change (reuses the existing TarballUpload flow); no migration.
  • No agent-canvas change required.

Testing

  • tests/test_preset_router.py::TestReplacePromptInTarball (no Docker)

    • prompt swapped + sibling files (main.py, setup.sh, plugins_config.json, repos_config.json) preserved, setup.sh stays executable.
    • returns None when the tarball has no prompt.txt.
  • tests/test_router.py::TestUpdateAutomation (DB-backed)

    • test_update_prompt_regenerates_preset_tarball: PATCH prompt → tarball_path changes and the regenerated tarball's prompt.txt holds the new prompt.
    • test_update_name_does_not_regenerate_preset_tarball: PATCH name → name updates, tarball_path unchanged.

Local verification: ruff check, ruff format, pyright clean; the pure helper tests pass. The DB-backed endpoint tests run in CI (they require the Postgres testcontainer / Docker, which was unavailable in the local environment).

Rollout

After merge, release the package and bump agent-canvas's config/defaults.jsonversions.automation to a build containing this change.

Issue Number

Resolves OpenHands/agent-canvas#869

@hieptl hieptl self-assigned this May 29, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 29, 2026

Coverage

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 29, 2026

🚀 Deploy Preview PR Created/Updated

A deploy preview has been created/updated for this PR.

Deploy PR: https://github.com/OpenHands/deploy/pull/4470
Automation SHA: d5e7001b0e9984f5228726d7185eb0394ea003d7
Last updated: May 29, 2026, 07:51:36 AM ET

Once the deploy PR's CI passes, the automation service will be deployed to the feature environment.

@hieptl hieptl requested a review from all-hands-bot May 29, 2026 09:44
Copy link
Copy Markdown
Contributor

all-hands-bot commented May 29, 2026

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is correct and well-motivated: the root cause (tarball baked at creation time, prompt column is display-only) is accurately identified, the patch is minimal and scoped to the reported bug, and the tests cover the two key paths (prompt-changed → new tarball, non-prompt field changed → tarball unchanged). Two issues worth addressing before merge:

  1. Orphaned uploads — each prompt edit mints a new TarballUpload row and file without ever touching the old one. Repeated edits will quietly accumulate unreferenced records and files.
  2. HTTPException raised inside a domain helperregenerate_preset_prompt_tarball is imported by router.py but independently raises an HTTP 500. This couples the domain logic to the HTTP layer and makes the helper harder to reuse (e.g., a background task caller would get an unhandled HTTPException).

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Comment thread openhands/automation/preset_router.py
Comment thread openhands/automation/preset_router.py
Comment thread openhands/automation/router.py Outdated
@hieptl hieptl requested a review from all-hands-bot May 29, 2026 11:16
Copy link
Copy Markdown
Contributor

all-hands-bot commented May 29, 2026

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable — Well-scoped bug fix with the right approach. A few things worth addressing before merge.


Summary

The root cause analysis is correct and the "swap prompt.txt in place" strategy is the right call: it's non-invasive, preserves plugin/repo configuration, and eliminates template-drift risk. The test coverage is solid for the happy path. A couple of implementation details need attention.


Issues

[IMPROVEMENT OPPORTUNITIES]

preset_router.py, line 326 — Dead-code flush in error path

The await session.flush() inside the except block (before raise HTTPException(...)) is never committed. FastAPI's exception handler rolls back the transaction when an HTTPException is raised, discarding the upload.status = UploadStatus.FAILED and upload.error_message mutations along with the flush itself. This looks like it persists the failure state, but it doesn't — remove the line to avoid misleading future readers.


preset_router.py, line 345 — deleted_at set unconditionally after a potentially-failed file delete

    try:
        file_store.delete(source_upload.storage_path)
    except FileNotFoundError:
        pass
    except Exception as e:
        logger.exception(...)        # logs, then falls through
    source_upload.deleted_at = utcnow()   # ← always executed

If file_store.delete raises any non-FileNotFoundError exception, execution falls through to source_upload.deleted_at = utcnow(). The DB record is soft-deleted even though the physical file may still exist on disk — a silent orphan that storage-accounting queries will miss.

If the intent is "always retire the DB record regardless of file deletion outcome" (reasonable, assuming a scrubber handles orphaned files), that should be documented with a comment. If the intent is "only soft-delete when the file is confirmed gone", move deleted_at inside the try block.


[TESTING GAPS]

No test for the upload-failure path

The except Exception as e branch in regenerate_preset_prompt_tarball (lines 322–330) raises an HTTPException. There is currently no test verifying:

  1. A 500 is returned when write_stream fails.
  2. The original tarball_path is unchanged (no half-committed state leaks through).

A test along the lines of:

async def test_update_prompt_upload_failure_returns_500(self, async_client, async_session, preset_store):
    automation = await _seed_prompt_preset_automation(async_session, preset_store, "Original")
    original_path = automation.tarball_path
    preset_store.write_stream = AsyncMock(side_effect=RuntimeError("storage down"))
    response = await async_client.patch(
        f"/api/automation/v1/{automation.id}", json={"prompt": "New"}
    )
    assert response.status_code == 500
    await async_session.refresh(automation)
    assert automation.tarball_path == original_path

What's good

  • _replace_prompt_in_tarball is a clean, pure function — easy to test and reason about in isolation.
  • Lazy get_file_store() resolution is the right call: non-preset PATCH paths never pay the cost.
  • The guard auto.prompt != original_prompt in router.py correctly skips regeneration on no-op edits.
  • Tests for the happy path, name-only edit, and same-prompt no-op are exactly what's needed.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM
    The change touches the PATCH handler for automations (a frequently-exercised path) and introduces a new file-store read + write on every prompt edit. The scope is deliberately narrow and non-preset automations are unaffected, which limits blast radius. The dead-code flush and unconditional deleted_at are correctness concerns but not show-stoppers. CI coverage for the failure path is missing.

VERDICT:
Worth merging with the flush removed and the deleted_at behavior either fixed or explicitly documented.

KEY INSIGHT:
The await session.flush() in the error path is a silent no-op that misleads readers into thinking the failure state is persisted — remove it.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing.
  2. Re-request a review — the reviewer reads guidelines from the PR branch, so your changes take effect immediately.

Was this review helpful? React with 👍 or 👎 to give feedback.

Comment thread openhands/automation/preset_router.py Outdated
Comment thread openhands/automation/preset_router.py Outdated
@hieptl hieptl requested a review from all-hands-bot May 29, 2026 11:50
Copy link
Copy Markdown
Contributor

all-hands-bot commented May 29, 2026

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — fix: regenerate prompt tarball when an automation's prompt is edited

🟢 Good taste — Elegant, well-scoped fix with comprehensive tests.


Summary

The root cause analysis in the PR description is accurate and thorough. A prompt-preset automation bakes its prompt into prompt.txt inside a tarball at creation time; PATCH /v1/{id} was updating only the prompt column (display metadata) while the dispatcher kept executing the original baked tarball. The fix is minimal and correctly targeted.


What is well done

_replace_prompt_in_tarball (pure helper)

  • Side-effect free, easy to unit test, and well-isolated from I/O concerns.
  • Correctly handles non-regular-file members (directories, symlinks) in the else: dst.addfile(member) branch — these are absent from current automation tarballs but the code is forward-compatible.
  • Returns None when no prompt.txt is found, letting callers opt out gracefully without exceptions.

regenerate_preset_prompt_tarball

  • Handles all real failure modes: external tarball path, missing upload record, missing file, archive without prompt.txt — each returns None cleanly.
  • The soft-delete gate if file_removed: source_upload.deleted_at = utcnow() is exactly right: the DB record stays live (and the file stays discoverable) if deletion fails, preventing a hidden orphan on disk.
  • The comment explaining why the flushed-but-not-committed FAILED status is intentionally ephemeral is a useful maintenance hint.

router.py — update logic

  • Capturing original_prompt = auto.prompt before the setattr loop and comparing auto.prompt != original_prompt correctly prevents tarball rebuilds on no-op edits.
  • The isinstance(auto.prompt, str) guard is appropriate defensive coding against a prompt: null PATCH (schema validation should catch it, but belt-and-suspenders is right here).
  • Non-preset automations and non-prompt field edits are left completely untouched.

Tests

  • TestReplacePromptInTarball tests the pure helper with a real tarball (no mocks) — correctly verifies that sibling files are byte-for-byte identical and that setup.sh mode bits are preserved.
  • test_update_prompt_regenerates_preset_tarball exercises the full DB-backed path and verifies the new tarball's prompt.txt content directly.
  • test_update_prompt_upload_failure_returns_500 proves that a storage failure returns 500 and leaves the automation pointing at its original tarball — no half-committed state can leak through.
  • test_update_unchanged_prompt_does_not_regenerate_tarball confirms the no-op guard works.

Minor observation (non-blocking)

TarInfo field preservation in _replace_prompt_in_tarball: when copying non-prompt files, the new TarInfo objects preserve name, size, mode, and mtime but drop uid, gid, uname, and gname. For automation tarballs executed in a controlled sandbox environment this is benign, but it is a subtle behavioral difference from the original archive. A one-line comment acknowledging this intentional choice (e.g. # uid/gid omitted — not meaningful in the sandbox environment) would help future maintainers who inspect the archives.


[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW
    • The change is purely additive to the PATCH code path; the dispatch path is untouched.
    • No DB schema changes; the fix reuses the existing TarballUpload flow.
    • Non-preset automations are unaffected (guarded by parse_internal_upload_id returning None).
    • Failure mode is a 500 with transaction rollback — no partial state can persist.

VERDICT:Worth merging — the fix is correct, the tests are real and cover the failure path, and all previously raised review concerns have been addressed.

KEY INSIGHT: The "swap in place" approach (rewriting only prompt.txt) is the right call — it avoids template-drift risk and preserves plugin/repo configuration without needing to re-derive any other tarball state.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better by adding a .agents/skills/custom-codereview-guide.md file to your branch. See the customization docs for the required frontmatter format.

Was this review helpful? React with 👍 or 👎 to give feedback.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

@hieptl hieptl merged commit 4cba245 into main May 29, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Automation Executes With Stale Prompt After Prompt Update

2 participants