fix(controller): drain process output to avoid deadlock/silent failures (D12) #16

Merged: luisguzman-adfa merged 1 commit into main from fix/phase1-security-d12-process-drain on Jun 19, 2026

@luisguzman-adfa

Collaborator

Summary

Phase 1 (security hardening). Several Runtime.exec(...).waitFor() calls in DeployFragment never read the child's stdout/stderr. Once the ~64 KB OS pipe buffer fills, the child blocks on write while the parent blocks in waitFor(): a deadlock, most plausibly on the backup tar | gzip pipe when tar emits many warnings on a large rootfs. Some calls also swallowed failures in an empty catch (Exception ignored) {}. This is tech-debt item D12.

Changes

  • util/ProcessRunner.run(cmd) (new, shared): sets redirectErrorStream(true) to merge stderr into stdout, drains the merged stream fully, then waits and returns {exitCode, output}. With only one stream to read, a single drain cannot deadlock, and callers get the exit code instead of ignoring it. Follows the same util/ pattern as ByteFormatter / LocalVarsYamlParser.
  • Migrated the raw exec().waitFor() sites in DeployFragment — the backup pipe, chmod -R, and the three rm -rf wipes — to ProcessRunner; the empty catches now log the failure.
  • Left unchanged the extraction path (already drains stderr and checks the exit code) and the getprop read (reads stdout).
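
For readers without the diff at hand, the drain-then-wait pattern described above could look roughly like the following. This is a minimal sketch, not the merged code: the varargs signature, the nested Result class, and the UTF-8 decoding are assumptions made for illustration. Only redirectErrorStream(true), the full drain, and the {exitCode, output} return shape come from the description.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public final class ProcessRunner {

    /** Hypothetical holder for the {exitCode, output} pair. */
    public static final class Result {
        public final int exitCode;
        public final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    private ProcessRunner() {}

    /** Assumed signature; the PR only names run(cmd). */
    public static Result run(String... cmd) throws IOException, InterruptedException {
        // Merge stderr into stdout so there is exactly one pipe to drain.
        Process process = new ProcessBuilder(cmd)
                .redirectErrorStream(true)
                .start();

        // Read until EOF *before* waiting, so the child never blocks
        // on a full pipe buffer while the parent sits in waitFor().
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        }

        int exitCode = process.waitFor();
        String output = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        return new Result(exitCode, output);
    }
}
```

The order matters: reading to EOF first means waitFor() is only reached once the child has closed its output, which is what removes the deadlock window. Buffering all output in memory is acceptable for short commands such as chmod or rm but would deserve a cap for very chatty children.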

Why no new test

ProcessRunner is process glue, not pure logic, and running a shell from a unit test would be fragile on a non-Linux dev box (consistent with M4/S3/D6). The checked exceptions it throws are exactly those exec()/waitFor() already threw, so the existing try/catch blocks cover them. Verified by inspection + the CI compile/test gate.

How to identify it in the build / behaviour

No user-visible change on success. A failing chmod/rm/backup now logs … failed (exit N): … instead of failing silently, and a backup with verbose tar stderr no longer risks hanging. Independent of the in-flight D11 PR, although both touch the same backup block region, so a trivial rebase may be needed once one of them merges.
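
As an illustration of the failure log line mentioned above, a caller might turn a non-zero exit into a message along these lines. This is a sketch under stated assumptions: the FailureMessages class, the describe helper, its label parameter, and the 500-character cap are all hypothetical and not taken from the PR. Only the "failed (exit N):" shape is.

```java
public final class FailureMessages {

    // Assumed cap so a noisy child cannot flood the log; not from the PR.
    private static final int MAX_OUTPUT_CHARS = 500;

    private FailureMessages() {}

    /**
     * Returns "<label> failed (exit N): <output>" for a non-zero exit code,
     * or null when the command succeeded and nothing needs logging.
     */
    public static String describe(String label, int exitCode, String output) {
        if (exitCode == 0) {
            return null;
        }
        String trimmed = output == null ? "" : output.trim();
        if (trimmed.length() > MAX_OUTPUT_CHARS) {
            trimmed = trimmed.substring(0, MAX_OUTPUT_CHARS) + "...";
        }
        return label + " failed (exit " + exitCode + "): " + trimmed;
    }
}
```

A call site would pass the exit code and merged output returned by the runner and hand any non-null result to the logger, which keeps the success path silent.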

…es (D12)

Several Runtime.exec(...).waitFor() calls in DeployFragment never read the
child's stdout/stderr. Once the ~64 KB OS pipe buffer fills the child blocks
writing while the parent blocks in waitFor() -- a deadlock, most plausibly on
the backup `tar | gzip` pipe when tar emits many warnings on a large rootfs.
Some calls also swallowed failures in empty `catch (Exception ignored) {}`
(tech-debt D12).

- New shared util/ProcessRunner.run(cmd): redirectErrorStream(true) so stderr is
  merged into stdout, drains it fully, waits, and returns {exitCode, output}.
  A single drain cannot deadlock, and callers get the exit code instead of
  ignoring it.
- Migrated the raw exec().waitFor() sites in DeployFragment (backup pipe,
  chmod -R, and the three rm -rf wipes) to ProcessRunner; the empty catches now
  log the failure.
- Left the extraction path (already drains stderr and checks the exit code) and
  the getprop read (reads stdout) unchanged.

No new unit test: ProcessRunner is process glue, not pure logic, and running a
shell from a unit test would be fragile on a non-Linux dev box (consistent with
M4/S3/D6). Verified by inspection + the CI compile/test gate.
@luisguzman-adfa luisguzman-adfa merged commit 3bf4f1b into main Jun 19, 2026
1 check passed
@luisguzman-adfa luisguzman-adfa deleted the fix/phase1-security-d12-process-drain branch June 23, 2026 15:42