
V4: Claude Code agent patterns + FlexAttention + proposal pipeline fixes#10

Open
2imi9 wants to merge 3 commits into main from experiments/baseline

Conversation


@2imi9 2imi9 commented Apr 1, 2026

Summary

  • 8 Claude Code agent patterns for V4 runner: batch scoring, history compaction, adaptive thinking, validation hooks, circuit breaker, exponential backoff, FileStateCache, graceful interrupt
  • FlexAttention sliding window on RTX 5090 — val_bpb 1.739 → 1.680, +18% tok/sec
  • Proposal pipeline fixes — thinking tag strip, 3-strategy parser, few-shot examples, diagnostic logging
  • Dual-repo FA3 logic restored — capability-based kernel selection
  • 64 unit tests (no GPU required)
  • File reorganization — results/, localpilot/, experiments/
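The retry-related agent patterns above (circuit breaker plus exponential backoff) can be sketched roughly as follows; names like `CircuitBreaker` and `call_with_backoff` are illustrative, not the V4 runner's actual API:

```python
import time

class CircuitBreaker:
    """Trips open after `max_failures` consecutive failures; callers stop retrying."""
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.max_failures

    def record(self, ok):
        # Any success resets the consecutive-failure count.
        self.failures = 0 if ok else self.failures + 1

def call_with_backoff(fn, breaker, retries=4, base_delay=1.0, sleep=time.sleep):
    """Retry fn with exponential backoff (1s, 2s, 4s, ...); abort early if the breaker opens."""
    for attempt in range(retries):
        if breaker.open:
            raise RuntimeError("circuit open: too many consecutive failures")
        try:
            result = fn()
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Injecting `sleep` keeps the backoff testable without real delays, which is also how the "no GPU required" unit tests can cover this path.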

Results

| Metric | SDPA (before) | FlexAttention (after) |
| --- | --- | --- |
| val_bpb | 1.739 | 1.680 |
| tok/sec | ~70k | ~83k |
| Sliding window | No | Yes |
| Tests | 48 | 64 |

Test plan

  • 64 unit tests pass
  • FlexAttention smoke test — sliding window + GQA on RTX 5090
  • Full training run — 59 steps, val_bpb=1.680, no OOM
  • 1-experiment V4 pipeline test
  • Full 64-experiment campaign

🤖 Generated with Claude Code

2imi9 and others added 2 commits April 1, 2026 00:32
Proposal generation fixes:
- Add _strip_thinking() to handle Qwen's <think>...</think> tags
- Add _parse_proposal() with 3 fallback strategies (exact, markdown, assignment)
- Restructure prompt with XML sections, few-shot examples, and explicit constraints
- Lower temperature 0.9→0.7 for more reliable structured output
- Add diagnostic logging: every rejection reason is printed with counts summary
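A minimal sketch of the thinking-tag strip described above (the actual `_strip_thinking` may differ; this assumes well-formed `<think>...</think>` pairs):

```python
import re

# Qwen-style reasoning blocks; DOTALL so multi-line thinking is matched,
# non-greedy so multiple blocks are each removed separately.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_thinking(text: str) -> str:
    """Remove <think>...</think> blocks so the proposal parser sees only the answer."""
    return _THINK_RE.sub("", text).strip()
```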

FlexAttention (PyTorch 2.5+):
- Replace SDPA fallback with FlexAttention for sliding window + GQA support
- Create cached BlockMask per window size (short=1024, long=2048)
- Falls back to SDPA only if FlexAttention unavailable
- Result: val_bpb 1.680 (was 1.739 with SDPA), 18% faster tok/sec
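The sliding-window masking can be illustrated with a pure-Python predicate in the shape FlexAttention's `create_block_mask` expects from a `mask_mod`; the real version operates on index tensors (using `&` rather than `and`), but the logic is the same:

```python
def sliding_window_causal(window: int):
    """Return a mask_mod-style predicate: query q_idx may attend to kv_idx
    iff it is causal (no future tokens) and within `window` positions back."""
    def mask_mod(b, h, q_idx, kv_idx):
        causal = q_idx >= kv_idx
        in_window = q_idx - kv_idx < window
        return causal and in_window
    return mask_mod
```

Building one cached BlockMask per window size (1024 and 2048 here) amortizes the mask-construction cost across steps.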

Add 16 new tests (64 total): 5 for _strip_thinking, 11 for _parse_proposal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The original train.py selected FA3 repo by compute capability:
- varunneal/flash-attention-3 for Hopper (SM 9.0)
- kernels-community/flash-attn3 for all other GPUs

This was lost during the V4 rewrite. Restored now.

Note: neither repo supports Blackwell (SM 12.0) yet — falls through
to FlexAttention. See #1, #4.
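The restored selection logic, sketched (function name and signature are illustrative, not the exact train.py code):

```python
def select_fa3_repo(major: int, minor: int):
    """Pick an FA3 kernel repo by CUDA compute capability.
    Hopper (SM 9.0) -> varunneal/flash-attention-3; Blackwell (SM 12.x) is
    supported by neither repo, so return None and let the caller fall
    through to FlexAttention; everything else -> kernels-community/flash-attn3."""
    if (major, minor) == (9, 0):
        return "varunneal/flash-attention-3"
    if major >= 12:  # Blackwell: no FA3 repo supports it yet
        return None
    return "kernels-community/flash-attn3"
```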

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f3fbf721b


Comment on lines +136 to +137
y = _flex_attention(q, k, v, block_mask=block_mask,
                    enable_gqa=use_gqa)

P1: Add SDPA fallback when FlexAttention execution errors

On non-FA3 GPUs this path is always selected when flex_attention imports, but _flex_attention(...) is called without any runtime fallback. If FlexAttention is present yet unsupported for the current device/dtype/shape (a common partial-support case), forward will raise and abort training instead of using the SDPA fallback that previously kept runs alive. Wrapping this call in a guarded fallback to scaled_dot_product_attention preserves training continuity across environments.
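The suggested guard could look roughly like this, with `flex_fn` / `sdpa_fn` standing in for torch's `flex_attention` and `scaled_dot_product_attention` (the wrapper itself is backend-agnostic, which also keeps it unit-testable without a GPU):

```python
def attention_with_fallback(flex_fn, sdpa_fn, *args, **kwargs):
    """Try FlexAttention first; if it fails at runtime (unsupported
    device/dtype/shape), fall back to SDPA so training continues
    instead of aborting."""
    try:
        return flex_fn(*args, **kwargs)
    except (RuntimeError, NotImplementedError):
        # Drop FlexAttention-only arguments SDPA would reject; the SDPA
        # path then runs without the sliding-window mask, as before.
        kwargs.pop("block_mask", None)
        return sdpa_fn(*args, **kwargs)
```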


…2 dep

- Mark V3 as stable (1.1507 BPB), V4 as WIP in README
- Add attention backend status table (SDPA/FlexAttention/FA3/FA4)
- Document Blackwell FA status (FA3 Hopper-only, FA4 beta crashes)
- Fix .gitignore: exclude *.lock but keep uv.lock tracked
- Add missing jinja2 dependency to pyproject.toml
- Update run commands to show V3, V4, and baseline options

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
