feat: automatically release board locks on Cancel and add usage docs #13
Open
yoinspiration wants to merge 7 commits into arceos-hypervisor:main from
Introduce lock-watcher helper and docs so cancelled workflows can safely release board file locks across organizations.
Made-with: Cursor
Contributor
Could this be integrated directly into runner.sh to simplify usage, so that it is transparent to users in the same way the lock is?
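- runner.sh: add a watcher subcommand that reuses .env and requires RUNNER_LOCK_MONITOR_TOKEN
- Usage guide: deployment flow overview, keeping the watcher resident via tmux/screen/systemd, host lock directory permissions (2.1)
Made-with: Cursor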
9c6147a to
c8c9cc6
Compare
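- runner.sh watcher: with no arguments, export RUNNER_RESOURCE_IDS (roc + phytiumpi); with an argument, still watch a single board
- lock-watcher.sh: accept multiple space-separated resources in RUNNER_RESOURCE_IDS and check every board on each loop iteration
- Docs: now recommend one watcher per organization that monitors all boards; remove .env.watcher and the "option 2" instructions
Made-with: Cursor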
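- When generating the compose file, add a lock-watcher service (alpine + jq) if RUNNER_LOCK_MONITOR_TOKEN is configured
- start/stop/restart with no arguments operate on all services (including the watcher), so the watcher is as transparent to use as the lock
- Docs update: the watcher starts automatically with start; tmux/systemd is no longer needed
Made-with: Cursor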
Contributor
Author
Implemented in this PR as suggested:
Contributor
LGTM
Document pre-job lock permission failure recovery in both READMEs and update .env.example to recommend a persistent host lock path under /var/tmp while keeping the container lock path unchanged.
Made-with: Cursor
- Dockerfile: install cmake/clang/libclang-dev and symlink libclang.so for bindgen
- pre-job-lock: exit immediately with a repair hint when the lock directory cannot be created or is not writable
Made-with: Cursor
- Add docs/多组织部署指南.md; the README points to the full document
- pre-job-lock: print wait progress every 10s while flock is blocked; let chmod fail silently to avoid noise
Made-with: Cursor
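Background
When multiple organizations share the same development board, a file lock ensures that Jobs on that board run serially. However, if a workflow that currently holds the lock is cancelled manually, it can leave behind a "ghost lock": later Jobs waiting for the same board stay stuck at Waiting for lock for a long time, and cancelling them does not actually interrupt the wait.
Fixes #12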
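Changes
Add a lock monitoring script
- runner-wrapper/lock-watcher.sh runs on the host and periodically checks the state of Actions Runs in the specified repository:
  - Reads the run_id of the current lock holder from ${RESOURCE_ID}.holder.
  - Checks that run's status and conclusion.
  - When status=completed && conclusion=cancelled, force-removes ${RESOURCE_ID}.holder and the related .release files, which releases the board file lock.
Add usage documentation
- docs/board-lock-watcher.md:
  - How to enable board-level file locks in .env.
  - Covers .env.watcher, installing jq, and starting lock-watcher.sh.
- docs/多组织共享Runner使用说明.md now notes that only boards with RUNNER_RESOURCE_ID_PHYTIUMPI / RUNNER_RESOURCE_ID_ROC_RK3568_PC set enable runner-wrapper and the lock directory mount; boards without this setting keep using the default run.sh and do not take part in lock coordination.
Behavior
- pre-job-lock.sh / post-job-lock.sh enforce serial, per-Job access.
- Once a run is found to be completed + cancelled, the lock files are proactively removed to unlock the board, so Jobs waiting behind it are not blocked forever.
Usage and deployment recommendations
- Use a .env.watcher and start one corresponding lock-watcher.sh instance.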
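For reviewers who want the watcher's decision logic in one place, the following is a minimal sketch of a single watcher pass and is not the actual contents of lock-watcher.sh. It assumes that the .holder file contains only the run_id and that the related release files match ${RESOURCE_ID}*.release. WATCH_REPO and the function names are hypothetical. The only names taken from this PR are RUNNER_LOCK_MONITOR_TOKEN, RUNNER_RESOURCE_IDS, the .holder/.release suffixes, and the use of jq.

```shell
# Hypothetical sketch of one watcher pass; not the actual lock-watcher.sh.

# Print "<status> <conclusion>" for a run. WATCH_REPO ("owner/repo") is an
# assumed variable name; the URL is GitHub's "get a workflow run" endpoint.
fetch_run_state() {
  curl -fsS \
    -H "Authorization: Bearer ${RUNNER_LOCK_MONITOR_TOKEN}" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/${WATCH_REPO}/actions/runs/$1" |
    jq -r '"\(.status) \(.conclusion)"'
}

# Succeed only for the one state this PR treats as a ghost lock.
should_release() {
  [ "$1" = "completed" ] && [ "$2" = "cancelled" ]
}

# check_resource LOCK_DIR RESOURCE_ID: inspect one board's holder file and
# remove the lock files if the holding run was cancelled.
check_resource() {
  local lock_dir="$1" resource_id="$2"
  local holder="$lock_dir/$resource_id.holder"
  local run_id state
  [ -f "$holder" ] || return 0            # nobody holds this board
  run_id="$(cat "$holder")"               # assumption: file holds only run_id
  state="$(fetch_run_state "$run_id")" || return 0   # API error: retry later
  if should_release "${state%% *}" "${state#* }"; then
    # Assumption: release files are named "<resource_id>*.release".
    rm -f -- "$holder" "$lock_dir/$resource_id"*.release
    echo "released $resource_id (run $run_id cancelled)"
  fi
}

# watch_once LOCK_DIR: check every board listed in RUNNER_RESOURCE_IDS
# (space-separated, so the variable is deliberately left unquoted).
watch_once() {
  local id
  for id in $RUNNER_RESOURCE_IDS; do
    check_resource "$1" "$id"
  done
}
```

A resident watcher would call watch_once with the host lock directory and then sleep for a polling interval before repeating; the directory path and interval are deployment choices not specified here. Keeping should_release separate makes it easy to extend the set of run states that trigger cleanup later without touching the file handling.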