220 changes: 194 additions & 26 deletions README.md

Large diffs are not rendered by default.

301 changes: 301 additions & 0 deletions docs/DECISIONS.md

Large diffs are not rendered by default.

125 changes: 125 additions & 0 deletions docs/NOTE-TO-REPO-AGENT-2026-05-02.md
@@ -0,0 +1,125 @@
# Note to the repo-frontdoor agent (2026-05-02)

**You are invited** to update the repository's external-facing surface area to reflect the current state. This note is written by the executor agent that just closed Phases 0G + 1A. It tells you exactly what changed, where the canonical sources are, and what the front-door reader should see.

## What's currently outdated

The following files are the project's "front door" and currently reflect a pre-2026-05-02 state. They predate the Phase 0G unblock, the Phase 1A on-device proof, and the Phase 1A.0 + 1A.B overnight characterisation:

- `README.md` — top-level project description. Likely still says Phase 0G is pending or QAIRT-blocked.
- `PRD.md` — product requirements doc. Phase 1A status / acceptance criteria are pre-D-030.
- `MODUS-OPERANDI.md` — methodology doc. Should be cross-referenced from the new comprehensive report.

## What to bring forward

### Headline (one-line update)

Phase 0 is **closed**. Phase 1A on-device QNN inference is **proven** on actual Snapdragon 8 Elite Gen 4 hardware with **22,850 successful inferences over 6h15m at 100% success rate**. Phase 1A.B steady-state benchmark is **closed**. Phase 1A.A (real-data ELO Stage-1 training experiment) is the next step.

### Numbers to put in the README (measured, except the battery-life extrapolation)

| Metric | Value | Source |
|---|---|---|
| Snapdragon SM8750 AOT compile success rate | 5 / 5 scopes | `runtime/reports/export_probe/2026-05-02T014031Z_litert214_qairt244_FULL/summary.json` |
| Qwen2.5-1.5B 26-layer ELO frozen middle, on-device steady-state per-inference latency | **576 ms (p50), 811 ms (p95), 817 ms (max)** | `runtime/reports/phase1a/2026-05-02T1802Z-overnight-v2/summary.json` |
| Qwen2.5-1.5B single transformer block, on-device steady-state per-inference latency | **19 ms (p50), 22 ms (p95)** | same |
| Sustained-load reliability across 22,850 inferences over 6h15m | **100% success rate** (rc=0, out_size=98304 every time) | same |
| Unplugged battery life of REDMAGIC SM8750 at the proven duty cycle | **~25 hours from 100% → 15% halt** | extrapolation from observed 3.2%/hour drain over 2.5h unplugged segment |
| Thermal envelope (room ambient, no fridge needed) | battery peaked 32 °C, CPU0 peaked 58 °C startup → 28–36 °C steady state | same |
| Unit test suite | 127 / 127 pass | `pytest tests/` |
| Cumulative engineering decisions logged | 32 (D-001 through D-032) | `docs/DECISIONS.md` |
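
The battery-life row is an extrapolation rather than a direct measurement. A minimal sketch of that arithmetic, assuming the drain rate stays constant over the whole discharge (the function name is illustrative and not taken from the repository):

```python
def hours_until_halt(start_pct: float, halt_pct: float, drain_pct_per_hour: float) -> float:
    """Linear estimate of runtime while the battery falls from start_pct to halt_pct."""
    if drain_pct_per_hour <= 0:
        raise ValueError("drain rate must be positive")
    return (start_pct - halt_pct) / drain_pct_per_hour


# Figures from the table: 3.2 %/hour observed drain, halt threshold at 15 %.
estimate = hours_until_halt(100.0, 15.0, 3.2)
print(f"{estimate:.1f} h")
```

On these inputs the linear estimate lands a little above 26 hours, which suggests the ~25 hours quoted in the table is a conservative rounding.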

### Status badges (suggested for README)

```markdown
[![Phase 0G](https://img.shields.io/badge/Phase_0G-AOT_compile-green)]()
[![Phase 1A](https://img.shields.io/badge/Phase_1A-on--device_proven-green)]()
[![Phase 1A.B](https://img.shields.io/badge/Phase_1A.B-22850_inferences_100%25-green)]()
[![Phase 1A.A](https://img.shields.io/badge/Phase_1A.A-next-yellow)]()
[![Tests](https://img.shields.io/badge/tests-127%2F127-brightgreen)]()
```

### Authoritative documents to link from README

These are the documents that should be visible from the front door, in this priority order:

1. **`docs/REPORT-2026-05-02-comprehensive.md`** — the 8.6k-word zero-context-friendly technical report. This is what an external ML engineer or OEM platform engineer should land on. Self-contained; no other reading required.
2. **`docs/REPORT-2026-05-02-phase-0-1a-progress.md`** — shorter (~3.5k word) executive companion, for readers already familiar with edge ML.
3. **`docs/PHONE-OVERNIGHT-RUNBOOK.md`** — operator runbook for the proven overnight chain.
4. **`docs/DECISIONS.md`** — full 32-row decision log; canonical source of truth for every claim.
5. **`docs/NOTE-TO-REPO-AGENT-2026-05-02.md`** — this file (for context if the agent re-runs in a future session).
6. **`PR #4`** on GitHub — every change in this period is in one PR for review/merge.

### What to edit in README.md

I'd suggest the README look something like this:

```markdown
# Polymath AI

Research infrastructure for **continuous pretraining of an LLM directly on a consumer Android phone**.

Target model: `Qwen/Qwen2.5-1.5B`. Target hardware: Snapdragon 8 Elite Gen 4 (SM8750).
Reference handset: REDMAGIC 10 Pro+. Method: ELO continuous pretraining (train layer 0 + LM head, freeze middle layers, run frozen middle on phone NPU).

## Status (2026-05-02)

Phase 0G AOT compile: **closed** (5/5 scopes ok, QAIRT 2.44 + LiteRT 2.1.4 matching pair).
Phase 1A on-device inference: **closed** (22,850 inferences, 100% success rate, 576 ms p50 for full 26-layer Qwen frozen middle on Hexagon NPU).
Phase 1A.A (real-data ELO Stage-1 training): **next**.

See `docs/REPORT-2026-05-02-comprehensive.md` for the full technical report.

## Headline number

A 1.5-billion-parameter transformer's frozen middle (26 of 28 Qwen2.5-1.5B layers) AOT-compiles to a 2.3 GB Qualcomm SM8750 context binary and runs on a consumer phone's Hexagon NPU at **0.576 seconds per forward pass, sustained for 6+ hours at 100% reliability, room-temperature ambient, no thermal throttling**.

## Quick links

- [Comprehensive technical report](docs/REPORT-2026-05-02-comprehensive.md)
- [Decision log (32 rows)](docs/DECISIONS.md)
- [Phone overnight runbook](docs/PHONE-OVERNIGHT-RUNBOOK.md)
- [Live PR](https://github.com/Zer0pa/Polymath-AI/pull/4)

## Boundary

(Verbatim self-imposed scope; sha256-anchored across artifacts.)

> Research infrastructure for in silico on-device LLM training and multilingual / multi-domain knowledge model construction. Outputs are research artifacts — model checkpoints, training telemetry, evaluation reports, throughput measurements. No regulatory certification claims. No clinical or human-subject use. No surveillance, biometric profiling, or identity inference. No model weights distributed without explicit license attestation. No training on copyrighted material without explicit corpus-license decomposition. No deployment to production without a falsifier-traced acceptance gate.

## Reproducer

90-minute clean-slate reproducer: see `docs/REPORT-2026-05-02-comprehensive.md` §11 + §15.
```

### Things you should NOT change

- **The boundary block.** Verbatim, sha256-anchored. Already correct in `polymath_ai/boundary/text.py`.
- **The decision log entries.** D-001 through D-032 are immutable historical record. Add D-033+ if you make new decisions; never edit prior rows.
- **The phase numbers.** 0A through 3A are stable. Don't renumber.
- **Anything in `runtime/reports/`.** Those are dated artifacts; they belong to the run that produced them.
- **The `polymath_ai/` source code, unless you are adding actual new functionality.** This note asks for a docs-only update.

### What to merge first

PR #4 (`linux/phase0g-qairt-v2.43`) carries every change this note references. The PR has been kept in a coherent state, with a passing test suite (127/127) and a working overnight chain (closed cleanly with `stop_signal_received`). After PR #4 merges, your README + PRD updates can go in a separate PR with a clean diff, or be appended to PR #4 if your workflow prefers a single roll-up.

### Verification before you publish

1. `pytest tests/` → expect 127/127 pass.
2. `grep BOUNDARY_SHA256 polymath_ai/boundary/text.py` → confirm the sha256 anchor is present and matches what the boundary scanner expects.
3. `python -c "from polymath_ai.scheduler.registry import default_registry; r = default_registry(); print(r.get('litert_qnn_sm8750').confirmed_for_socs)"` → expect `(('SM8750', 1.0),)` (the Phase 0G promotion).
4. Visit <https://huggingface.co/datasets/Architect-Prime/polymath-telemetry/tree/main/phase1a> → confirm the latest run's audit.jsonl exists (the live-monitoring proof).

## Why this matters externally

The work in this period unlocks two reusable patterns for the broader on-device-ML community, beyond Polymath's own roadmap:

1. **The "matching-pair" SDK pinning insight** — LiteRT's `third_party/qairt/workspace.bzl` hard-pins QAIRT version with a public CDN URL. Reusable by any team hitting QnnSystem-version-mismatch errors.
2. **The "extract embedded QNN context binary" pattern** — saves a multi-week NDK build for production QNN-delegated models. Tooling in `scripts/host/extract_qnn_context.py` (~80 lines, two dependencies).

Both are documented in §7 of the comprehensive report; consider linking from the README for community discoverability.

---

*Written 2026-05-02 by the executor agent. The next agent can append to this note or supersede with a fresh `NOTE-TO-REPO-AGENT-<date>.md`.*
127 changes: 127 additions & 0 deletions docs/PHONE-OVERNIGHT-RUNBOOK.md
@@ -0,0 +1,127 @@
# Phase 1A overnight chain — runbook for fridge mode

**Audience:** zero-coder operator with a REDMAGIC 10 Pro / SM8750. You connect the phone over USB, type one command, disconnect, put it in the fridge.

**What runs:** an inference loop on the phone's Hexagon NPU using the Phase 0G AOT artifacts (D-030 / D-031). Each iteration runs either a single Qwen2.5-1.5B transformer block (fast) or all 26 layers of the ELO frozen middle (slow). Telemetry — battery, thermal, memory, disk, per-inference timing — is appended to a hash-chained JSONL audit log on `/sdcard/Polymath/phase1a/`. Every 10 iterations the audit log is pushed to a private HF dataset so you can monitor live from any browser without reconnecting the phone.

## What the operator sees during execution

**Live monitoring (any browser, any device):**

```
https://huggingface.co/datasets/Architect-Prime/polymath-telemetry/blob/main/phase1a/<run_id>/audit.jsonl
```

The `<run_id>` is printed at startup (format `YYYYMMDDTHHmmSSZ_phase1a_overnight`). HF auto-renders JSONL — you'll see one row per inference batch with timing + battery + thermal data. Files refresh every ~2 minutes (every 10 iterations).
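
To make the `<run_id>` format concrete, here is a small sketch that builds an identifier of that shape and the matching monitoring URL. It assumes the trailing `Z` means UTC; the helper names are hypothetical, and the real runner is a shell script that may build the identifier differently.

```python
from datetime import datetime, timezone
from typing import Optional

DATASET_URL = "https://huggingface.co/datasets/Architect-Prime/polymath-telemetry"


def make_run_id(now: Optional[datetime] = None) -> str:
    """Format a timestamp as YYYYMMDDTHHmmSSZ_phase1a_overnight (UTC assumed)."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return moment.strftime("%Y%m%dT%H%M%SZ") + "_phase1a_overnight"


def audit_url(run_id: str) -> str:
    """Browser URL of the audit log for one run."""
    return f"{DATASET_URL}/blob/main/phase1a/{run_id}/audit.jsonl"


example = make_run_id(datetime(2026, 5, 2, 18, 2, 0, tzinfo=timezone.utc))
print(example)
print(audit_url(example))
```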

## Pre-conditions (already in place from this session)

- `/data/local/tmp/qairt-2.44/` — QAIRT 2.44.0.260225 aarch64-android (579 MB)
- `/data/local/tmp/phase1a/qwen_block.qnn.bin` (90 MB) and `qwen_frozen_subgraph.qnn.bin` (2.3 GB)
- `/data/local/tmp/phase1a/input.bin` + `input_list.txt` (synthetic FP32 zeros — 1×16×1536)
- `/sdcard/Polymath/.hf-token` (HF token for live telemetry push)
- `/sdcard/Polymath/phase1a/overnight_inference.sh` (the runner)
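
For reference, the synthetic input can be regenerated on the host. This is a sketch under two assumptions: the runner expects raw little-endian FP32 with no header, and the tensor shape is the 1×16×1536 given above. It writes to a temporary directory rather than the device, and it does not cover `input_list.txt`.

```python
import struct
import tempfile
from pathlib import Path

INPUT_SHAPE = (1, 16, 1536)  # shape stated in the runbook


def zero_tensor_bytes(shape) -> bytes:
    """Serialise an all-zero FP32 tensor as raw little-endian bytes (assumed layout)."""
    count = 1
    for dim in shape:
        count *= dim
    return struct.pack(f"<{count}f", *([0.0] * count))


def write_zero_input(path: Path, shape=INPUT_SHAPE) -> int:
    """Write the tensor to path and return the number of bytes written."""
    payload = zero_tensor_bytes(shape)
    path.write_bytes(payload)
    return len(payload)


with tempfile.TemporaryDirectory() as tmp:
    size = write_zero_input(Path(tmp) / "input.bin")
print(size)  # 98304
```

The byte count equals 1 × 16 × 1536 × 4, the same 98304 that the audit log records as a healthy `out_size`.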

## Start the chain

From the host (Mac, with `adb` connected):

```bash
adb shell '
rm -f /sdcard/Polymath/phase1a/STOP /sdcard/Polymath/phase1a/audit.jsonl /sdcard/Polymath/phase1a/hf_push.log
nohup setsid sh /sdcard/Polymath/phase1a/overnight_inference.sh \
> /sdcard/Polymath/phase1a/runner.log 2>&1 &
echo "PID=$!"
sleep 3
svc power stayon ac
'
```

Verify it's running and detached:

```bash
adb shell 'ps -ef | grep overnight_inference | grep -v grep'
# PPID column should be 1 (init) — that means adb disconnect won't kill it
```

## Disconnect + put in fridge

Once `ps -ef` shows PPID=1, **you can unplug the USB cable**. The loop keeps running:
- The phone stays awake because `svc power stayon ac` is set (keeps CPU running while AC powered).
- The Hexagon NPU is reachable via `qnn-net-run` from adb-shell context, even with the screen off and ADB disconnected.

For fridge mode:
- Put the phone in **REDMAGIC Game Zone** before unplugging if available — Game Zone disables Doze for foreground processes.
- Plug the phone into a power outlet IN the fridge (charge bypass mode if the phone supports it; otherwise battery will charge to full and then trickle-charge).
- Close the fridge.

## Live monitoring (no reconnection needed)

**Quick status check** — visit this URL in any browser:
```
https://huggingface.co/datasets/Architect-Prime/polymath-telemetry/tree/main/phase1a
```

The newest directory matches your current run. Click into `audit.jsonl` to see per-iteration rows. Each row carries:
- `iter` — iteration number
- `scope` — `qwen_block` (1 layer) or `qwen_frozen_subgraph` (26 layers)
- `wall_ms` + `per_inf_ms` — wall-clock for the batch + per-inference latency
- `rc` — exit code from `qnn-net-run` (0 = ok)
- `out_size` — output bytes (98304 = 1×16×1536 FP32; anything else = problem)
- `battery.{level,temp_dC,ac_powered}` — phone health
- `thermal.{cpu-N-N-N,battery,skin-msm-therm}` — every available thermal zone
- `memory.{avail_kb,total_kb}` — RAM headroom
- `disk.{data_free_kb,sdcard_free_kb}` — storage headroom
- `prev_event_hash` — sha256 of the previous row, for tamper-detection

If the row count stops growing for >5 minutes, something stalled. If it hasn't pushed to HF in >10 minutes, network or HF API is down.
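
The checks described above can be scripted against a downloaded copy of the log. This is a minimal sketch: it assumes inference rows expose `rc`, `out_size` and `scope` as top-level JSON keys, and the function names are illustrative rather than part of the runner.

```python
import json
from typing import List

EXPECTED_OUT_SIZE = 1 * 16 * 1536 * 4  # 98304 bytes of FP32 output
KNOWN_SCOPES = {"qwen_block", "qwen_frozen_subgraph"}


def row_problems(row: dict) -> List[str]:
    """Return human-readable problems for one inference row (empty list = healthy)."""
    problems = []
    if row.get("rc") != 0:
        problems.append(f"rc={row.get('rc')!r}")
    if row.get("out_size") != EXPECTED_OUT_SIZE:
        problems.append(f"out_size={row.get('out_size')!r}")
    if row.get("scope") not in KNOWN_SCOPES:
        problems.append(f"unknown scope {row.get('scope')!r}")
    return problems


def is_stalled(seconds_since_last_row: float, limit_s: float = 300.0) -> bool:
    """True when no new row has appeared for longer than the 5-minute rule of thumb."""
    return seconds_since_last_row > limit_s


good = json.loads('{"iter": 7, "scope": "qwen_block", "rc": 0, "out_size": 98304}')
bad = json.loads('{"iter": 8, "scope": "qwen_block", "rc": 1, "out_size": 0}')
print(row_problems(good))
print(row_problems(bad))
```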

## Auto-stop conditions (graceful)

The loop monitors itself and halts on any of:
- **`/sdcard/Polymath/phase1a/STOP` file exists** (your kill switch)
- **Battery temperature > 45.0°C** (records `thermal_halt` event then exits)
- **Battery level < 15%** (records `low_battery_halt` event then exits)
- **Required QNN binary missing** (records `fatal_missing_artifact` event then exits)

A graceful halt always writes a `phase1a_overnight_end` event as the last row, so you can tell "still running but slow" apart from "stopped on its own".
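
The halt rules can be summarised as one decision function. This is an illustrative sketch, not the runner's code (the runner is a shell script): the order of the checks is an assumption, and so is reading `temp_dC` as tenths of a degree Celsius.

```python
from typing import Optional

MAX_BATTERY_TEMP_DC = 450  # 45.0 °C, assuming temp_dC is in tenths of a degree
MIN_BATTERY_LEVEL = 15     # percent


def halt_reason(stop_file_exists: bool, artifacts_present: bool,
                battery_temp_dC: int, battery_level: int) -> Optional[str]:
    """Return the event name to record before exiting, or None to keep running."""
    if stop_file_exists:
        return "stop_signal_received"
    if not artifacts_present:
        return "fatal_missing_artifact"
    if battery_temp_dC > MAX_BATTERY_TEMP_DC:
        return "thermal_halt"
    if battery_level < MIN_BATTERY_LEVEL:
        return "low_battery_halt"
    return None


print(halt_reason(False, True, 320, 80))  # None
print(halt_reason(False, True, 455, 80))  # thermal_halt
```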

## Stopping it manually (kill switch)

From the host (after re-connecting USB):
```bash
adb shell 'touch /sdcard/Polymath/phase1a/STOP'
```

The loop checks for `STOP` once per iteration (~12 s cycle). It will halt within one cycle, write the `stop_signal_received` event, and exit.

## Reconnecting in the morning

```bash
adb shell 'tail -3 /sdcard/Polymath/phase1a/audit.jsonl | tr "," "\n" | grep -E "ts|event_type|iter|wall_ms|per_inf|level|temp_dC"'
adb pull /sdcard/Polymath/phase1a/audit.jsonl /tmp/overnight_audit.jsonl
adb pull /sdcard/Polymath/phase1a/runner.log /tmp/overnight_runner.log
```

Then summarise locally with:
```bash
wc -l /tmp/overnight_audit.jsonl # total events
grep -c inference_batch /tmp/overnight_audit.jsonl # inference iterations
grep -E "thermal_halt|low_battery_halt|stop_signal" /tmp/overnight_audit.jsonl
```
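
For latency percentiles rather than event counts, a short script can group rows by scope. This sketch assumes `per_inf_ms` and `scope` are top-level keys in each inference row, and it uses a nearest-rank percentile; the method behind the published `summary.json` figures is not stated here, so results may differ slightly.

```python
import json
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarise(lines):
    """Map each scope to its row count and p50/p95 per-inference latency in ms."""
    by_scope = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        if "per_inf_ms" in row and "scope" in row:  # skips halt and start/end events
            by_scope[row["scope"]].append(float(row["per_inf_ms"]))
    return {
        scope: {"count": len(v), "p50": percentile(v, 50), "p95": percentile(v, 95)}
        for scope, v in by_scope.items()
    }


sample = [
    '{"scope": "qwen_block", "per_inf_ms": 18}',
    '{"scope": "qwen_block", "per_inf_ms": 19}',
    '{"scope": "qwen_block", "per_inf_ms": 20}',
    '{"scope": "qwen_block", "per_inf_ms": 22}',
    '{"event_type": "stop_signal_received"}',
]
report = summarise(sample)
print(report)
```

Against a pulled log, the same function takes an open file handle, for example `summarise(open("/tmp/overnight_audit.jsonl"))`.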

## What this run actually proves overnight

- **Steady-state per-inference latency** on Hexagon for both qwen_block and qwen_frozen_subgraph (the 10x wall-clock time seen in the smoke test was dominated by mapping the 2.3 GB context binary; averaging over thousands of iterations factors that out).
- **Thermal sustainability** of continuous Hexagon-NPU inference — does the SM8750 throttle under sustained load, especially in cool fridge ambient?
- **Battery / charge-bypass behavior** — if the phone is plugged in inside the fridge, does the AC supply keep the battery at full without thermal stress, or does it cycle?
- **Reliability of the inference primitive** — across thousands of inferences, do we ever see `rc != 0` or `out_size != 98304` (i.e. silent corruption)?
- **End-to-end auditable record** — the hash-chained JSONL gives a tamper-evident log of every inference call we made through the night.

These five data points are the foundation for the Phase 1A.A ELO experiment that's queued next.
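
The tamper-evidence claim can be checked offline by walking the chain. The sketch below assumes `prev_event_hash` is the hex sha256 of the previous row's exact line text without the trailing newline, and it does not check the first row; the runner's actual hashing convention should be confirmed before relying on this. `append_row` exists only to build a demo chain.

```python
import hashlib
import json


def line_digest(line: str) -> str:
    """sha256 hex digest of one log line (assumed convention)."""
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


def verify_chain(lines):
    """Return the index of the first row whose prev_event_hash mismatches, else None."""
    prev_digest = None
    for index, line in enumerate(lines):
        row = json.loads(line)
        if prev_digest is not None and row.get("prev_event_hash") != prev_digest:
            return index
        prev_digest = line_digest(line)
    return None


def append_row(lines, payload: dict) -> None:
    """Demo helper: append payload with a link to the previous line."""
    prev = line_digest(lines[-1]) if lines else ""
    lines.append(json.dumps({**payload, "prev_event_hash": prev}, sort_keys=True))


chain = []
for i in range(3):
    append_row(chain, {"iter": i})
print(verify_chain(chain))  # None: chain is intact

tampered = list(chain)
tampered[1] = tampered[1].replace('"iter": 1', '"iter": 99')
print(verify_chain(tampered))  # 2: the row after the edited one no longer links
```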

## Known constraints / caveats

- **No real tokens yet.** The input is FP32 zeros. So the outputs are `f(0)` for the random-init weights of each scope; numerically they're the layer-norm bias / projection patterns of the network. They DON'T mean anything semantically. The point of this overnight run is the system-level proof, not language modelling.
- **Termux is unused.** The original blueprint relied on Termux for telemetry, but Termux SSH was unresponsive in this session (suspected aggressive power-management of the Termux app process). The pure adb-shell + curl path is more reliable.
- **No Android NDK / LiteRT app.** Running the QNN context binary directly via `qnn-net-run --retrieve_context` works for our case because every op in our compiled subgraphs is QNN-delegated by construction. A model with mixed delegate coverage would need a different runtime.