Symptom
When the job watcher daemon starts on skampere1 (smart mode with clauded as the agent), it fails to dispatch the startup lifecycle email:
2026-04-16 12:26:25 [skampere1.stanford.edu] INFO Smart-job agent: clauded (/afs/cs.stanford.edu/u/brando9/bin/clauded)
2026-04-16 12:26:25 [skampere1.stanford.edu] WARNING Failed to dispatch lifecycle email: [Errno 8] Exec format error: 'clauded'
The watcher itself keeps running after the warning; only the email step fails.
Diagnosis
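Errno 8 is ENOEXEC, "Exec format error." The kernel returns it when exec is asked to run a file it cannot interpret as a program: typically a script whose shebang line is missing or malformed, or a binary built for another architecture. (A shebang that points to a non-existent interpreter normally fails with ENOENT instead.) Both subprocess with shell=False and os.execvp(...) call exec directly, with no shell fallback for shebang-less scripts, so a wrapper that runs fine from an interactive shell can still fail here. Given the scheduler is Python, likely one of: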
- A subprocess call uses executable='clauded' with shell=False and a string that needs shell parsing.
- The email-dispatch path piping through clauded as a transport hits the same thing.
The name "lifecycle email" suggests the watcher is trying to use clauded to send the email via the coding-agent path rather than via SMTP/sendmail directly. If so, that's an odd coupling — consider separating transport (SMTP / mail / sendmail) from agent invocation.
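To narrow down which failure mode applies, a small probe like the following may help. It is a sketch, not the watcher's code: the helper name exec_errno_for_script and the throwaway "fake_agent" file are hypothetical, and it assumes a Linux node whose temp directory is not mounted noexec. It writes an executable file with a given body, execs it directly (no shell), and reports the errno raised, if any.

```python
import errno
import os
import stat
import subprocess
import tempfile
from typing import Optional


def exec_errno_for_script(body: str) -> Optional[int]:
    """Exec a temp file containing `body` without a shell.

    Returns the errno of the OSError raised by exec, or None if exec succeeded.
    Hypothetical helper for diagnosis only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fake_agent")  # stand-in for the real agent wrapper
        with open(path, "w") as fh:
            fh.write(body)
        os.chmod(path, stat.S_IRWXU)  # rwx for the owner so exec is permitted
        try:
            # shell=False: the kernel must be able to run the file itself.
            subprocess.run([path], shell=False, check=False,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        except OSError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    cases = {
        "no shebang": "echo hi\n",
        "valid shebang": "#!/bin/sh\necho hi\n",
        "missing interpreter": "#!/nonexistent/interp\necho hi\n",
    }
    for label, body in cases.items():
        code = exec_errno_for_script(body)
        name = errno.errorcode.get(code, "?") if code is not None else "ok"
        print(f"{label}: {name}")
```

If the real clauded file behaves like the "no shebang" case, checking its first line (and whether it is a script or a binary for the right architecture) is probably the quickest next step.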
Repro
Start the watcher on a SNAP node that has clauded in $PATH. Check the first few lines of ~/dfs/job_queue/logs/watcher_daemon.log for the WARNING.
Suggested fix
- Decouple lifecycle email from the coding-agent binary: use smtplib / system mail for lifecycle notifications.
- If the coupling is intentional, make the exec path robust: subprocess.run([executable, *args], shell=False) with the executable path resolved via shutil.which("clauded") and a fallback error that names the file being exec'd.
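Both suggestions can be sketched roughly as follows. This is illustrative only: the function names (build_lifecycle_message, send_via_smtp, run_agent), the subject format, and the use of a local SMTP relay on "localhost" are assumptions, not the watcher's actual interface.

```python
import shutil
import smtplib
import subprocess
from email.message import EmailMessage
from typing import Sequence


def build_lifecycle_message(host: str, event: str,
                            sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text lifecycle notification (hypothetical format)."""
    msg = EmailMessage()
    msg["Subject"] = f"[job watcher] {event} on {host}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Watcher lifecycle event: {event}\nHost: {host}\n")
    return msg


def send_via_smtp(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Send without touching the agent binary. Assumes a reachable SMTP relay."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


def run_agent(name: str, args: Sequence[str] = ()) -> subprocess.CompletedProcess:
    """Resolve the agent on PATH, exec it without a shell, and name the file on failure."""
    resolved = shutil.which(name)
    if resolved is None:
        raise FileNotFoundError(f"agent {name!r} not found on PATH")
    try:
        return subprocess.run([resolved, *args], shell=False,
                              capture_output=True, text=True, check=False)
    except OSError as exc:
        # Covers ENOEXEC, EACCES, etc.; include the resolved path in the error.
        raise RuntimeError(
            f"could not exec {resolved!r} (requested as {name!r}): {exc}"
        ) from exc
```

With something like this in place, an ENOEXEC from the agent would surface as an error naming the full resolved path rather than the bare 'clauded', and a broken agent wrapper would no longer block lifecycle notifications.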
Context
Discovered during SNAP watcher audit on 2026-04-16. See agents-config commit 594a3cb for the broader two-layer job-checking context.