V4: Claude Code agent patterns + FlexAttention + proposal pipeline fixes#10
Conversation
Proposal generation fixes:
- Add _strip_thinking() to handle Qwen's &lt;think&gt;...&lt;/think&gt; tags
- Add _parse_proposal() with 3 fallback strategies (exact, markdown, assignment)
- Restructure prompt with XML sections, few-shot examples, and explicit constraints
- Lower temperature 0.9→0.7 for more reliable structured output
- Add diagnostic logging: every rejection reason is printed with a counts summary

FlexAttention (PyTorch 2.5+):
- Replace SDPA fallback with FlexAttention for sliding window + GQA support
- Create cached BlockMask per window size (short=1024, long=2048)
- Falls back to SDPA only if FlexAttention unavailable
- Result: val_bpb 1.680 (was 1.739 with SDPA), 18% faster tok/sec

Add 16 new tests (64 total): 5 for _strip_thinking, 11 for _parse_proposal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The original train.py selected FA3 repo by compute capability:
- varunneal/flash-attention-3 for Hopper (SM 9.0)
- kernels-community/flash-attn3 for all other GPUs

This was lost during the V4 rewrite. Restored now.

Note: neither repo supports Blackwell (SM 12.0) yet — falls through to FlexAttention. See #1, #4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4f3fbf721b
y = _flex_attention(q, k, v, block_mask=block_mask,
                    enable_gqa=use_gqa)
Add SDPA fallback when FlexAttention execution errors
On non-FA3 GPUs this path is always selected when flex_attention imports, but _flex_attention(...) is called without any runtime fallback. If FlexAttention is present yet unsupported for the current device/dtype/shape (a common partial-support case), forward will raise and abort training instead of using the SDPA fallback that previously kept runs alive. Wrapping this call in a guarded fallback to scaled_dot_product_attention preserves training continuity across environments.
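The guarded fallback the review suggests could look like the sketch below. `flex_attention` and `scaled_dot_product_attention` are real PyTorch 2.5+ APIs; the wrapper name and the broad `except` (chosen so any partial-support failure falls back rather than aborting a run, per the review's rationale) are assumptions, not the PR's actual code.

```python
import torch
import torch.nn.functional as F

try:
    from torch.nn.attention.flex_attention import flex_attention as _flex_attention
except ImportError:  # pre-2.5 PyTorch: FlexAttention not available at all
    _flex_attention = None

def attention(q, k, v, block_mask=None, use_gqa=False):
    """FlexAttention with a guarded runtime fallback to SDPA.

    Catching broadly here is deliberate: the goal is training
    continuity on devices/dtypes/shapes FlexAttention only
    partially supports, not precise error classification.
    """
    if _flex_attention is not None:
        try:
            return _flex_attention(q, k, v, block_mask=block_mask,
                                   enable_gqa=use_gqa)
        except Exception:
            pass  # partial-support case: fall back instead of aborting
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Note the fallback loses the sliding-window BlockMask semantics (plain causal SDPA), which matches the pre-FlexAttention behaviour the review says "previously kept runs alive".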
…2 dep
- Mark V3 as stable (1.1507 BPB), V4 as WIP in README
- Add attention backend status table (SDPA/FlexAttention/FA3/FA4)
- Document Blackwell FA status (FA3 Hopper-only, FA4 beta crashes)
- Fix .gitignore: exclude *.lock but keep uv.lock tracked
- Add missing jinja2 dependency to pyproject.toml
- Update run commands to show V3, V4, and baseline options

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
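The .gitignore fix described above (exclude `*.lock` but keep `uv.lock` tracked) relies on gitignore's negation syntax; a sketch of the relevant two lines:

```gitignore
# Ignore lock files generally, but re-include uv.lock so it stays tracked.
*.lock
!uv.lock
```

Order matters: the `!uv.lock` re-include must come after the `*.lock` exclusion, and a file already tracked before the pattern was added stays tracked regardless.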
Summary
Results
Issues
Test plan
🤖 Generated with Claude Code