Fix runtime install wait race #24

Merged
decriptor merged 4 commits into RoonLabs:main from alceops:alce/fix-runtime-install-race
May 13, 2026

Conversation

@alceops
Contributor

alceops commented Apr 30, 2026

Summary

  • wait for the actual Server/RoonServer launcher and RoonDotnet runtime directory before the fresh-install runtime assertions run
  • keep VERSION as part of the readiness check, but no longer treat it as the sole install-complete sentinel
  • add timeout diagnostics that show which install artifact is still missing

Fixes #23.
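
For readers following along, the artifact-based readiness check described in this summary might look roughly like the sketch below. It is not the code from this PR: the function names `missing_artifacts` and `wait_for_artifacts` are hypothetical, and the assumption that `RoonDotnet` sits directly under the install directory is a guess made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an artifact-based readiness check.
# All function names and the RoonDotnet location are assumptions.

# Print every expected install artifact under "$1" that does not exist yet.
missing_artifacts() {
    local dir="$1"
    [ -f "$dir/VERSION" ]           || echo "VERSION"
    [ -x "$dir/Server/RoonServer" ] || echo "Server/RoonServer"
    [ -d "$dir/RoonDotnet" ]        || echo "RoonDotnet/"
}

# Poll once per second until nothing is missing. On timeout, list the
# artifacts that are still absent and return non-zero.
wait_for_artifacts() {
    local dir="$1" timeout="${2:-180}" elapsed=0 missing
    while missing="$(missing_artifacts "$dir")"; [ -n "$missing" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timed out after ${timeout}s; still missing:" >&2
            echo "$missing" | sed 's/^/  - /' >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

Note that this shape encodes knowledge of what a finished install contains inside the wait step itself, which is the coupling the review discussion in this thread goes on to question.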

Verification

  • bash -n tests/runtime.sh
  • git diff --check

I did not run the full Docker runtime workflow in this worker; the change is limited to the shell readiness predicate used by that workflow.
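
As a small illustration of why these two checks are weaker than running the workflow: `bash -n` only parses a script and never executes it, so a script that calls a misspelled function still passes. The script content below is made up for the demonstration and is not taken from `tests/runtime.sh`.

```shell
#!/usr/bin/env bash
# Demonstration: a script can pass `bash -n` and still fail when executed.
script="$(mktemp)"
cat > "$script" <<'EOF'
wait_for_instal /tmp/example   # typo: this command does not exist
EOF

# Syntax check only: the parser accepts the line above.
syntax_status=0
bash -n "$script" || syntax_status=$?

# Real execution: the shell cannot find the command.
run_status=0
bash "$script" 2>/dev/null || run_status=$?

echo "bash -n exit status: $syntax_status"
echo "runtime exit status: $run_status"
rm -f "$script"
```

Similarly, `git diff --check` reports whitespace errors and leftover conflict markers; it says nothing about behavior.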

@gtunes-dev
Collaborator

I don't think this is the right fix. It essentially bakes the smoke tests into the wait function (adds all the file tests to the wait) instead of maintaining the separation of concerns between:

  1. wait for the installation to complete
  2. verify the post installation state

I prefer the cleanliness of my proposed fix: wait for the log signal (which is already a function of the smoke harness) that indicates that the installation completed. Make no assumptions about what was installed. Then allow the smoke tests to validate the installation itself.

Just my opinion.

@alceops
Contributor Author

alceops commented Apr 30, 2026

Thanks — agreed on keeping install-complete signaling separate from the artifact assertions. I updated the PR in 77947bd to wait on the entrypoint's final Branch: log signal before the runtime assertions/pre-switch stops, and removed the artifact-specific wait logic.

Verification here: bash -n tests/runtime.sh and git diff --check passed; I still did not run the full Docker runtime flow locally.

@gtunes-dev
Collaborator

gtunes-dev commented Apr 30, 2026

There is a very simple approach to this, which is what I proposed in the issue I created. It's just to change wait_for_install to this. That's the entire fix. It's not branch specific, it has no opinion about what the installation completed state is. It just simply waits until the installation is done and returns. If the installation is already done, it sees that and returns immediately.

This is up to the maintainers but I think you should consider modifying your PR to do just this. And test it, too, please. I've tested it pretty extensively and it works.

wait_for_install() {
    local dir="$1"
    local timeout="${2:-180}"
    echo "    Waiting for RoonServer install to complete..."
    wait_for_log "$CONTAINER" "RoonServer installed successfully" "$timeout"
}
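
The harness's `wait_for_log` helper is not shown in this thread. A minimal sketch of how such a helper could work is given below, assuming it polls the container's accumulated output through the `docker logs` CLI command; the `container_logs` wrapper, the one-second poll interval, and the timeout message are all assumptions, not the repository's code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a log-polling helper; not the harness's actual code.

# Assumed log source: the container's full stdout/stderr history so far.
container_logs() {
    docker logs "$1" 2>&1
}

# Return 0 once a line containing the fixed string "$2" appears in the logs
# of container "$1"; return 1 if "$3" seconds (default 180) pass first.
wait_for_log() {
    local container="$1" pattern="$2" timeout="${3:-180}" elapsed=0
    until container_logs "$container" | grep -qF -- "$pattern"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timed out after ${timeout}s waiting for: $pattern" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

Because each poll reads the whole log history rather than only new lines, a message that was emitted before the wait started is found on the first check, which matches the "returns immediately if the installation is already done" behavior described above.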

@alceops
Contributor Author

alceops commented Apr 30, 2026

Updated again in fd2c3a0 to match the simpler issue proposal: wait_for_install now just waits for the existing RoonServer installed successfully log signal, and I restored the call sites rather than moving branch-specific waits into the runtime checks.

Verification now completed locally:

  • docker build -t roon-docker-runtime-test:alce-pr24 .
  • ./tests/runtime.sh roon-docker-runtime-test:alce-pr24 (41 passed, 0 failed)
  • git diff --check
  • bash -n tests/runtime.sh

@gtunes-dev
Collaborator

Thanks for adapting this @alceops! Very much appreciated!

@decriptor
Collaborator

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.


Copilot AI left a comment

Pull request overview

Fixes a race condition in tests/runtime.sh where wait_for_install returned as soon as the VERSION file appeared in the tarball extraction, but later artifacts (e.g. Server/RoonServer) might still be missing, causing spurious test failures (#23). The function now waits for the entrypoint's RoonServer installed successfully log line, which is emitted only after tar finishes extracting.

Changes:

  • Replace filesystem polling for VERSION with wait_for_log on the RoonServer installed successfully message.
  • Remove the per-iteration sleep/elapsed loop and its progress output.
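
The race described in this overview can be reproduced in miniature without Docker. The sketch below is a simulation only: `fake_install` stands in for the entrypoint's tarball extraction, and the file names, the one-second delay, and the helper names are assumptions chosen to mirror the description. The wait loops have no timeout to keep the example short, and the fractional `sleep 0.1` relies on a `sleep` that accepts fractions (GNU and BSD do).

```shell
#!/usr/bin/env bash
# Simulated installer: VERSION lands first, the launcher later, and the
# success line is logged only after everything has been written.
fake_install() {
    local dir="$1" log="$2"
    mkdir -p "$dir/Server"
    echo "1.0" > "$dir/VERSION"
    sleep 1                                   # extraction still in progress
    touch "$dir/Server/RoonServer"
    echo "RoonServer installed successfully" >> "$log"
}

# Old-style sentinel: return as soon as VERSION exists.
wait_for_version_file() {
    until [ -f "$1/VERSION" ]; do sleep 0.1; done
}

# Log-based sentinel: return once the success line has been written.
wait_for_success_log() {
    until grep -qF "RoonServer installed successfully" "$1" 2>/dev/null; do
        sleep 0.1
    done
}

launcher_state() {
    if [ -e "$1/Server/RoonServer" ]; then echo present; else echo missing; fi
}

work="$(mktemp -d)"
log="$work/install.log"
: > "$log"
fake_install "$work/app" "$log" &

wait_for_version_file "$work/app"
after_file_wait="$(launcher_state "$work/app")"   # checked mid-extraction

wait_for_success_log "$log"
after_log_wait="$(launcher_state "$work/app")"    # checked after completion

wait
echo "after VERSION wait: launcher $after_file_wait"
echo "after log wait:     launcher $after_log_wait"
rm -rf "$work"
```

The file-based wait returns while the simulated launcher is still absent, which is the window in which the runtime assertions could fail spuriously; the log-based wait cannot return until the installer has finished.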

Comment thread tests/runtime.sh
Comment thread tests/runtime.sh Outdated
Address Copilot review feedback. After switching the body to a log-based
wait, the dir positional was bound but never read, and the function
silently relied on the outer-scope CONTAINER. Removing the parameter
makes the signature truthful and updates all seven call sites.
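
The merged function body is not reproduced in this thread. Going only by the commit message above, the signature after dropping `dir` might look something like the following sketch, in which the timeout becomes the first positional argument. That argument order, the `roon-test` container name, and the stubbed `wait_for_log` are assumptions made so the example runs on its own.

```shell
#!/usr/bin/env bash
CONTAINER="${CONTAINER:-roon-test}"   # hypothetical container name

# Stub so the sketch is runnable; the real harness provides wait_for_log.
wait_for_log() {
    echo "wait_for_log container=$1 pattern='$2' timeout=$3"
}

# Assumed post-change shape: no unused dir parameter, timeout is "$1".
wait_for_install() {
    local timeout="${1:-180}"
    echo "    Waiting for RoonServer install to complete..."
    wait_for_log "$CONTAINER" "RoonServer installed successfully" "$timeout"
}

wait_for_install        # uses the default 180 second timeout
wait_for_install 60     # explicit timeout
```

Call sites that previously passed a directory as the first argument would need updating under this shape, since a leftover path would otherwise be read as the timeout.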
decriptor merged commit 4b98ccd into RoonLabs:main on May 13, 2026
2 checks passed
@decriptor
Collaborator

Thanks @alceops and @gtunes-dev

Development

Successfully merging this pull request may close these issues.

Race condition in runtime test causes spurious build workflow failure

4 participants