Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 9 additions & 3 deletions controller/docs/TECH_DEBT_PLAN.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,13 @@

_Last updated: 2026-06-23. Tracks remediation work against the findings below. IDs map to the register in this file (F/D/S/M) and to `FORK_DELTA_ANALYSIS.md` (K)._

> **Current status (2026-06-23):** Phase 0 (guardrails) ✅ and Phase 1 (security) ✅ core-complete. 26 JVM unit-test files; 18 clean-architecture slices; lint is a hard gate (baseline 468 grandfathered issues). **Not yet started:** Phase 2 (concurrency — still ~37 raw `new Thread(`). **Partial:** Phase 3 (slices land, but the god classes GREW — `DeployFragment` ~3,022 LOC, `MainActivity` ~2,384, `SyncFragment` ~811; `D1`/`F1`/`S14` not carved). **Phase 4:** `M15` done; `M8` (jcenter ×2) and `M9` (targetSdk 28) still open. Sections 1–6 below are the original 2026-06-16 audit snapshot.

**Lint — `androidx.fragment` lifecycle detector disabled (ACTIVE DEBT)** (PR `fix/lint-fragment-detector-hang`)
- `androidx.fragment`'s `UnsafeFragmentLifecycleObserverDetector` **hangs** `:app:lintDebug` for hours — its recursive call-graph walk degrades to near-exponential cost on the oversized `DeployFragment`/`MainActivity` methods (CI lint ~50 min, up to ~3 h on heavy branches).
- **Workaround:** `lintOptions` disables that detector's 3 issue IDs — `FragmentBackPressedCallback`, `FragmentLiveDataObserve`, `FragmentAddMenuProvider` (the only IDs it emits). All other 450+ checks (and `abortOnError true` + baseline) stay active.
- **Re-enable condition:** once `DeployFragment`/`MainActivity` are carved into smaller collaborators (Phase 3 — `D1`/`F1`/`S14`), remove the `disable` and confirm lint completes. This is the trigger to retire the workaround.
- **Root-cause analysis (2026-06-23 — concrete carve targets):** the hang scales with **method size**, not file count — the recursive call-graph walk explodes on a few oversized methods reachable from the Fragment/Activity entry points. Measured: `MainActivity.addNewTerminalSession()` **~727 LOC**, `MainActivity.onCreate()` ~329; in `DeployFragment` the `bind*ButtonLogic` methods (`bindResetButtonLogic` ~184, `bindInstallButtonLogic` ~175, `bindBackupButtonLogic` ~174, `bindBackupMenuLogic` ~130) plus `onRootfsSizeResolved` ~136 and `updateDynamicButtons` ~127 — ~19 methods ≥40 LOC summing ~1,844 LOC (61% of the file). Extracting these into per-area collaborators (install/backup/reset button controllers; a terminal-session builder) shrinks the graph reachable from `onViewCreated`/`onCreate` and is the concrete first step of `D1`/`F1` that retires this workaround.

**Phase 0 — Guardrails: DONE** (PR `chore/phase0-guardrails`, merged as #4)
- Extracted `SystemStatsUtil` and added the first JVM unit tests (`SystemStatsUtilTest`, `SyncHandshakeHelperTest`); added unit-test infra (`returnDefaultValues` + real `org.json`). Addresses **M10**.
Expand Down Expand Up @@ -93,10 +96,13 @@ _Last updated: 2026-06-23. Tracks remediation work against the findings below. I
- The import flow accepted **any** file as a backup, with no check that it is a rootfs or the right architecture. Per the ABI-separation policy (ARM64↔ARM64, 32↔32), a 32-bit rootfs must not be importable/restorable into a 64-bit app (and vice-versa), and a non-rootfs (e.g. a ZIM) must not be treated as one.
- New pure domain: `deploy/domain/ElfClass` (read 32/64 from an ELF header) + `RootfsArchive` (structural "looks like a rootfs" + pick a probe binary). Unit-tested.
- `deploy/data/RootfsArchiveValidator` lists the archive, runs the structural check, and probes one internal binary's ELF class vs the app's ABI (`Process.is64Bit()`). **Two gates, hard-block, fail-closed:** at **import** (reject + delete) and at **restore** (`TarExtractor` gains a `validateRootfs` overload that reuses its D11 listing). A *definite* wrong-arch is blocked; if arch can't be determined we don't block on arch (the structural check still applies).
- **Identity manifest (soft):** also reads the build's `installed-rootfs/iiab/.iiab-rootfs.json` (per `docs/ROOTFS_MANIFEST.md`) — when present it authoritatively gates `kind` + `arch`; when **absent** it shows a non-blocking "manifest not found" alert and falls back to the ELF/structure heuristic. (Integrity `iiab-tree-sha256-v1` / `Result.CORRUPT` is the next PR.)
- **Pending (documented):** the **integrity** treehash verification (Java ustar/pax reader mirroring `tools/iiab_tree_hash.py`), the in-app backup-writer emitting both members, and the arbitrary-file attack-vector analysis. Verify on a real device with a 32-bit and a 64-bit backup.
- **Identity manifest (soft):** also reads the build's `installed-rootfs/iiab/.iiab-rootfs.json` (per `docs/ROOTFS_MANIFEST.md`) — when present it authoritatively gates `kind` + `arch`; when **absent** it shows a non-blocking "manifest not found" alert and falls back to the ELF/structure heuristic. (Integrity `iiab-tree-sha256-v1` / `Result.CORRUPT` shipped separately in **#37** — see next entry.)
- **Integrity verification + writer: DONE** (PR **#37** `feat/rootfs-import-restore-integrity`, merged). `deploy/domain/RootfsTreeHash` (pure-JVM `iiab-tree-sha256-v1`, byte-parity with `tools/iiab_tree_hash.py`, proven by `RootfsTreeHashTest`) + `deploy/data/RootfsIntegrity` (one-pass, dependency-free ustar/GNU/pax reader; **no Apache Commons Compress** — minSdk 24 has no core-library desugaring and CC reaches into `java.nio.file` API 26+). Matrix: absent→`OK_NO_MANIFEST`; `origin:device-backup`/`algo:none`→`OK_NO_CHECKSUM` (proceed + transparency); match→`OK`; **mismatch/unreadable→`CORRUPT` (blocks at import like `WRONG_ARCH`)**. Integrity runs only at the import gate and rides identity (device backups recognized from the first header — no full pass). The backup **writer** stamps an identity manifest with `origin:device-backup` packed first (no device-side treehash). Tests + checked-in fixtures (ustar/GNU/pax/mismatch/none/absent).
- **Pending (documented):** on-device round-trip with a real 32-bit and 64-bit backup; the arbitrary-file attack-vector analysis; folding the rare builder-rootfs-manual-import verify into the import copy (one extra pass for that path only); relay the `origin`/`algo:none` addendum to `docs/ROOTFS_MANIFEST.md`.

**Phase 1 — Security hardening: COMPLETE (core).** Done: **S1** (#9), **M4** (#10), **D6** (#12), **D2** (#13), **D12** (#16), **D11** (#15), **S4** (#28), and **F15** — PR A security redesign (#30) + PR B in-app progress UX (#32). **S3 reverted** (#21, misdiagnosis). Beyond the original register: rootfs import/restore **ABI + identity validation** (#31) and **integrity `iiab-tree-sha256-v1` + device-backup writer** (#37). Remaining (lower-priority follow-ups): `S18` network-security-config (deferred), `D17` `ApkServer` HTTP/auth, `S11` rsync plaintext secrets, and the lower-risk `sh -c` pipe inputs (D2 follow-up).

**Phase 1 — Security hardening: IN PROGRESS.** Done so far: **S1** (PR #9), **M4** (PR #10), **D6** (PR #12), **D2** (PR #13), **D12** (PR #16), **D11** (PR #15), **S4** (PR #28). **S3 reverted** (see above). Remaining: **F15** (PR A in review).## 1. Executive summary
**F4 + M7 — confirmed mitigated (2026-06-23, verified by inspection):** **F4** — `MainActivity` has no leaking recurring runnables: `sizeUpdateHandler` (10s) and `serverCheckHandler` are stopped in `onPause` and re-armed in `onResume` with remove-before-post; `timeoutHandler` is one-shot. **M7** — `DashboardFragment.onResume` posts the 5s refresh loop and `onPause` calls `removeCallbacks`, so it cannot stack (Android guarantees `onPause` between `onResume` calls). Both can be marked closed in the register (section 3); optional polish: add a defensive remove-before-post in `DashboardFragment.onResume`.## 1. Executive summary

The Controller is functional and shows real security intent (it SHA256-audits native binaries at build time, scrubs the keystore in CI, and scopes most broadcasts). But it carries debt on four fronts that scale badly toward the README's "millions of users" goal:

Expand Down