Move multi-step training into TrainingConfig with per-step IS correction (#39)
## Summary
- move multi-step training controls (`steps_per_batch`,
`feedback_repetitions`) from eval-owned settings into `TrainingConfig`
- remove eval-side sub-step loop and pass typed `training` config
through `FeedbackItem` in each `/v1/feedback` request
- execute multi-step updates inside training engines (local/modal +
tinker)
- recompute behavior-policy logprobs after each optimizer step for
off-policy importance reweighting
- include engine metadata (`steps_per_batch_applied`, per-step metrics)
and wire eval `sub_step_count` to that metadata
- update eval Hydra schema/config/docs and related tests
## Key Implementation Notes
- added strict `TrainingConfig` fields:
  - `steps_per_batch`
  - `feedback_repetitions`
- introduced Hydra-safe `EvalTrainingConfig` and convert to runtime
`TrainingConfig` in `build_harness_config`
- tinker engine now refreshes student logprobs between steps using
`save_weights_and_get_sampling_client_async`
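
The split between a Hydra-safe config and a strict runtime config might look roughly like the following. This is a sketch under assumptions, using only standard-library dataclasses: the real `TrainingConfig` and `EvalTrainingConfig` may be defined differently, the defaults and validation rules are guesses, and `to_runtime_training_config` is a hypothetical stand-in for the conversion that the PR performs inside `build_harness_config`.

```python
from dataclasses import asdict, dataclass


@dataclass
class EvalTrainingConfig:
    """Plain dataclass with simple field types, the shape Hydra schemas expect."""
    steps_per_batch: int = 1        # assumed default
    feedback_repetitions: int = 1   # assumed default


@dataclass(frozen=True)
class TrainingConfig:
    """Runtime config; rejects invalid values at construction time."""
    steps_per_batch: int = 1
    feedback_repetitions: int = 1

    def __post_init__(self) -> None:
        for name in ("steps_per_batch", "feedback_repetitions"):
            value = getattr(self, name)
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")


def to_runtime_training_config(cfg: EvalTrainingConfig) -> TrainingConfig:
    """Hypothetical helper: convert the eval-side schema to the runtime type."""
    return TrainingConfig(**asdict(cfg))
```

Keeping validation on the runtime type means the eval schema stays a plain container that a config loader can populate, while bad values still fail before a request is built.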
## Validation
- `uv run ruff check claas/ tests/ --fix`
- `uv run pytest tests/ -q -m "not integration"`
  - result: `109 passed, 26 skipped, 5 deselected`
- `uv run ty check`
  - unresolved-import diagnostics for heavy runtime deps (`torch`,
    `tinker`, `transformers`) are expected in this environment
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **New Features**
  * Added support for multi-step training per batch with configurable `steps_per_batch` parameter
  * Added `feedback_repetitions` configuration option for enhanced training control
  * New metric `steps_per_batch_applied` tracks actual steps executed per batch
* **Documentation**
  * Updated configuration structure to use nested training block for training-specific parameters
* **Refactor**
  * Reorganized configuration hierarchy to consolidate training settings under dedicated training section
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kion <kion@onepiece.localdomain>
.claude/skills/setup-local/SKILL.md
Lines changed: 21 additions & 7 deletions
@@ -1,6 +1,6 @@
 ---
 name: setup-local
-description: Set up the full CLaaS stack (vLLM + API + OpenClaw/Telegram) directly on the host without Docker. Use when Docker is unavailable or you want a native setup.
+description: Set up the full CLaaS stack (vLLM + API + OpenClaw/Telegram) locally. Uses Docker if available, falls back to native setup otherwise.