Symptom
Running bash ~/ultimate-utils/py_src/uutils/job_scheduler_uu/start_watcher.sh on a SNAP node prints [OK] Watcher running, but the tmux session dies seconds later. tmux ls then shows no job_watcher session.
Root cause
start_watcher.sh launches the daemon with ${PYTHON:-python3} and does not:
- Activate a venv that has the package deps (dill, etc.)
- Set PYTHONPATH=~/ultimate-utils/py_src so uutils is importable
With the default system /usr/bin/python3 on SNAP nodes, the scheduler crashes immediately:
ModuleNotFoundError: No module named 'dill'
(confirmed on skampere2 with Python 3.10.12). The tmux session exits with no visible error because the script itself had already returned 0 before the Python process failed, so the [OK] is misleading.
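To check whether a given interpreter is affected, something like the following could be run before launching the watcher. This is a minimal sketch: check_imports is a hypothetical helper name, not something that exists in start_watcher.sh, and the module list in the usage comment is an assumption based on the error above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of start_watcher.sh): returns 0 only if the
# given interpreter can import every listed module, otherwise prints the
# first failing module to stderr and returns 1.
check_imports() {
    local py="$1"
    shift
    local mod
    for mod in "$@"; do
        if ! "$py" -c "import ${mod}" 2>/dev/null; then
            echo "[ERR] ${py} cannot import '${mod}'" >&2
            return 1
        fi
    done
    return 0
}

# Example usage (assumed module names, taken from the error and paths above):
#   export PYTHONPATH=~/ultimate-utils/py_src
#   check_imports "${PYTHON:-python3}" dill uutils.job_scheduler_uu.scheduler || exit 1
```

On an affected node this would be expected to fail at dill with the system python3 and pass once the venv is activated and PYTHONPATH is set.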
Repro
cd # default system python
bash ~/ultimate-utils/py_src/uutils/job_scheduler_uu/start_watcher.sh
# stdout: "[OK] Watcher running."
tmux ls # <no job_watcher session>
Workaround
Launch the tmux session manually with proper env:
tmux new-session -d -s job_watcher "source ~/uv_envs/veribench/bin/activate && export PYTHONPATH=~/ultimate-utils/py_src && python -m uutils.job_scheduler_uu.scheduler --job-dir \$HOME/dfs/job_queue --poll 15"
Suggested fix
Inside start_watcher.sh, either:
- Activate a known-good venv (e.g. ~/uv_envs/veribench/bin/activate) and export PYTHONPATH=~/ultimate-utils/py_src before tmux new-session, or
- Validate the Python env upfront (python -c "import uutils.job_scheduler_uu.scheduler") and abort with a clear error if imports fail, or
- After tmux new-session, run sleep 2 && tmux has-session -t job_watcher to confirm the session didn't die, and fail loudly if it did.
The third option (liveness check) is the most robust — it catches any future env issue, not just this one.
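The liveness check could look roughly like this. It is a sketch under stated assumptions, not the actual patch: start_and_verify is a hypothetical function name, and the 2-second grace period simply mirrors the sleep 2 suggested above and may need tuning.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a detached tmux session, wait briefly, then
# confirm the session still exists. Only print [OK] once it has survived.
start_and_verify() {
    local session="$1"
    local cmd="$2"
    local grace="${3:-2}"   # seconds to wait before checking (assumed default)

    tmux new-session -d -s "$session" "$cmd" || return 1
    sleep "$grace"

    if ! tmux has-session -t "$session" 2>/dev/null; then
        echo "[ERR] tmux session '${session}' exited within ${grace}s; the watcher likely crashed on startup" >&2
        return 1
    fi
    echo "[OK] Watcher running in tmux session '${session}'."
}

# Example usage (command string taken from the workaround above):
#   start_and_verify job_watcher "python -m uutils.job_scheduler_uu.scheduler --job-dir \$HOME/dfs/job_queue --poll 15" || exit 1
```

Because the [OK] message is printed only after tmux has-session succeeds, a startup crash such as the missing dill import would surface as a nonzero exit instead of a false success.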
Context
Discovered during SNAP watcher setup on skampere2 (2026-04-16). The workaround above is running live on skampere2. See agents-config commit bda6e7f for the related ~/dfs symlink requirement and commit 594a3cb for the two-layer job-checking protocol that surfaced this.