Trajectory-balance normalization experiments #14

Open
josephdviviano wants to merge 34 commits into main from tb_normalization
@josephdviviano

Collaborator

Summary

  • Implements and benchmarks trajectory-length normalization for GFlowNet TB and VarGrad losses on HyperGrid environments, testing whether dividing squared scores by trajectory length T improves convergence when trajectory lengths vary.
  • Adds a full experiment pipeline: 4 custom HyperGrid reward environments (original, cosine, bitwise_xor, multiplicative_coprime), Optuna HP search (50 trials per algo×env), and confirmation runs (top-3 configs × 5 seeds × 4000 iterations).
  • Restructures the repo: renames code/ to lir/, adds conftest.py, environment.yaml, and a test suite for LIR gradient correctness, convergence, and pruning.
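For readers unfamiliar with the normalization being tested, here is a minimal sketch, assuming the standard trajectory-balance residual (log Z plus forward log-probs, minus log reward and backward log-probs). The function names `tb_score`, `tb_loss`, and `length_normalized_tb_loss` are illustrative only and are not taken from `lir/gflownet/tb_normalize.py`.

```python
import math
from typing import Sequence


def tb_score(
    log_z: float,
    log_pf: Sequence[float],
    log_pb: Sequence[float],
    log_reward: float,
) -> float:
    """TB residual for one trajectory: log Z + sum(log P_F) - log R - sum(log P_B)."""
    return log_z + sum(log_pf) - log_reward - sum(log_pb)


def tb_loss(score: float) -> float:
    """Plain TB loss: the squared residual."""
    return score ** 2


def length_normalized_tb_loss(score: float, traj_len: int) -> float:
    """Squared residual divided by trajectory length T (hypothetical helper)."""
    if traj_len < 1:
        raise ValueError("traj_len must be at least 1")
    return score ** 2 / traj_len


# Two toy trajectories with the same per-step error (0.1) but different lengths.
short_score = tb_score(0.0, [0.1] * 2, [0.0] * 2, 0.0)
long_score = tb_score(0.0, [0.1] * 8, [0.0] * 8, 0.0)

plain_ratio = tb_loss(long_score) / tb_loss(short_score)
normalized_ratio = (
    length_normalized_tb_loss(long_score, 8) / length_normalized_tb_loss(short_score, 2)
)
print(f"plain loss ratio (long/short):      {plain_ratio:.1f}")
print(f"normalized loss ratio (long/short): {normalized_ratio:.1f}")
```

In this toy case the long trajectory dominates the plain loss by a factor of about 16, and by about 4 once the squared score is divided by T.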

What changed

  • lir/gflownet/: tb_normalize.py (training loop, 4 algorithms, CLI), hypergrid.py (ModifiedHyperGrid with 4 reward functions, mode analysis, GF(2) feasibility checks), checkpoint.py (atomic run bookkeeping).
  • experiments/: optuna_sweep.py (two-phase Optuna driver), SLURM launch scripts, gflownet_results.ipynb (all figures and tables for the paper).
  • lir/test/: Gradient sign verification, fixed-point convergence, attention-mask pruning tests.
  • lir/lir__simpler.py: Core LIR training loop (moved from code/, expanded).
  • WRITEUP.md: Synthesized experiment notes with key findings on gradient clipping and logZ learning rate behavior.

Key results

  • ModLPV removes the need for extreme gradient clipping: LPV requires grad_clip ~0.01–0.04 (search boundary), while ModLPV works at ~0.2–1.2 (10–65× less aggressive).
  • ModTB requires faster logZ learning on complex environments: effective logZ lr is ~46× higher for ModTB vs TB on multiplicative_coprime (20k modes), but similar on original (256 modes).

Test plan

  • pytest passes (core LIR tests + checkpoint tests)
  • python -m lir.gflownet.tb_normalize --envs original --algos TBGFlowNet --n_iterations 10 --n-seeds 1 runs without error
  • Notebook experiments/gflownet_results.ipynb renders against existing optuna_results/

🤖 Generated with Claude Code

josephdviviano and others added 30 commits November 28, 2025 04:15
- Fix Modified TB and VarGrad losses: normalize score by trajectory
  length *before* squaring (per-step average error) instead of after
  (was effectively O(T) bias toward long trajectories)
- Fix traj_len computation: count actions via ~actions.is_dummy instead
  of counting non-sink states (semantically correct, same values)
- Add replay buffer support: sample batch_size fresh trajectories, keep
  (1-frac) on-policy, replace the rest with buffer samples, compute
  single loss with recalculated log-probs
- Add optimizer selection (adamw/adam/sgd), beta2 tuning, cosine LR
  schedule, loss clamping, algorithm filtering (--algos)
- Add --output-dir flag for per-job result isolation
- Add SLURM scripts: run_single.sh (one job), launch_sweep.sh (192-job
  grid over algos x envs x lr x beta2 x grad_clip)
- Add aggregate_results.py to merge per-job CSVs, rank HP configs, and
  plot best-per-algo comparisons
- Move tb.plan.md into .claude/, add IDEAS_TO_TRY.md for deferred
  stabilization ideas

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- _store_all_states_tensor() -> _enumerate_all_states_tensor() (upstream rename)
- check_action_validity -> debug param in DiscreteEnv.__init__
- trajectories.conditioning -> trajectories.states.conditions[0]
- EPS_REWARD_CMP 1e-12 -> 1e-6 (float32 vs float64 precision mismatch)
- Disable validate_modes in _build_env (quick-check heuristics have false negatives)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep fixed loss formulations (per-step normalization) and API compat
fixes; discard duplicate old class definitions from remote.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use conda shell hook instead of source activate (args leak fix)
- Use SLURM_SUBMIT_DIR for reliable repo path resolution
- Reduce cpus-per-task from 4 to 1
- Fix _enumerate_all_states_tensor -> _store_all_states_tensor
- Add missing _calculate_log_partition() call
- Fix DiscreteEnv debug -> check_action_validity kwarg
- Enable performance_mode in set_seed to skip deterministic algos

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- run_single.sh: Save/restore $@ around conda init to prevent arg leakage
  into conda activate (fixed 90 jobs); set CUBLAS_WORKSPACE_CONFIG for
  deterministic mode; switch partition from main to long
- tb_normalize.py: Fix torchgfn 2.3.1 API — trajectories.states.conditions
  → trajectories.conditioning (fixed all 48 ModifiedTBGFlowNet jobs)
- checkpoint.py: Add fsync + retry with exponential backoff to
  _write_json_atomic for shared filesystem resilience (fixed 23 jobs)
- launch_sweep.sh: Fix job name collision — Modified{TB,LogPart} both
  truncated to "Modifi"; now uses unique short tags (TB/ModTB/LogPV/ModLP)
- relaunch_failed.sh: Script to relaunch exactly the 79 incomplete combos

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
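The fsync-plus-retry pattern mentioned for `_write_json_atomic` can be sketched roughly as follows. This is an assumption about the general technique, not the code in `checkpoint.py`: the function name `write_json_atomic`, its parameters, and the default retry counts are all hypothetical.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Any


def write_json_atomic(
    path: Path, payload: Any, retries: int = 5, base_delay: float = 0.1
) -> None:
    """Write JSON to a temp file, fsync it, then rename over the target.

    OSError is retried with exponential backoff (base_delay * 2**attempt),
    which is the kind of transient failure a shared filesystem can produce.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    for attempt in range(retries):
        tmp_name = None
        try:
            # Temp file lives in the target directory so os.replace stays
            # on one filesystem and the rename is atomic.
            fd, tmp_name = tempfile.mkstemp(
                dir=path.parent, prefix=path.name, suffix=".tmp"
            )
            with os.fdopen(fd, "w") as handle:
                json.dump(payload, handle)
                handle.flush()
                os.fsync(handle.fileno())  # force bytes to disk before renaming
            os.replace(tmp_name, path)
            return
        except OSError:
            # Clean up the partial temp file, then back off or give up.
            if tmp_name is not None and os.path.exists(tmp_name):
                os.unlink(tmp_name)
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Readers of the target path see either the old file or the complete new one, never a partially written file.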
- Replace upstream gfn.utils.training.validate with local version that
  uses .sum() instead of .mean() for proper L1 distance between
  probability distributions (was off by factor of n_terminating_states)
- Sample fresh from current policy instead of reusing biased training
  states (visited_terminating_states[-n:])
- Bump validation_samples from 20k to 100k for reliable L1 estimates
  across the 331k-state HyperGrid
- Add results notebook for sweep analysis

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
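The `.sum()` versus `.mean()` difference is easy to see on a toy example. The sketch below uses plain Python lists rather than tensors, and the helper names `l1_distance` and `mean_abs_difference` are illustrative, not functions from the repo or from `gfn.utils.training`.

```python
from typing import Sequence


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """L1 distance between two distributions over the same states (sum of |p - q|)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of states")
    return sum(abs(a - b) for a, b in zip(p, q))


def mean_abs_difference(p: Sequence[float], q: Sequence[float]) -> float:
    """Mean of |p - q|: the L1 distance shrunk by the number of states."""
    return l1_distance(p, q) / len(p)


# Two distributions with disjoint support over 4 terminating states.
p = [0.5, 0.5, 0.0, 0.0]
q = [0.0, 0.0, 0.5, 0.5]

print(l1_distance(p, q))          # maximum possible L1 distance between distributions
print(mean_abs_difference(p, q))  # smaller by a factor of len(p)
```

With the mean, the reported value shrinks as the number of terminating states grows, so numbers are not comparable across grid sizes; the sum always lies in [0, 2].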
Expanded LR/schedule grid (4 cosine + 2 linear), added smoke_test.sh
for local pre-flight validation, hardened aggregate_results.py against
corrupt configs, and removed obsolete relaunch script and old results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- checkpoint.py: strip all resume machinery (auto-discover, partial CSV,
  hydrate completed); RunState is now a simple path container; add
  write_completed_checkpoint() as one-shot completion marker; include
  algo_names + timestamp in build_effective_config to prevent hash
  collisions across algorithms
- tb_normalize.py: collect all records in memory and write CSV once at
  end instead of incremental append; remove --resume-from CLI arg; fix
  set_seed() API (remove stale performance_mode kwarg); switch from
  cosine_schedule bool to lr_schedule enum (cosine/linear/none) with
  lr_end_factor; change normalization to divide after squaring to avoid
  1/T² gradient attenuation; add --lr-logz-multiplier CLI option
- aggregate_results.py: filter glob with timestamp regex to skip
  aggregated_results.csv; add _deduplicate() to detect and remove
  duplicate config rows; fix n_mode_states_found → n_modes_found
- launch_sweep.sh: add lr_logz_multiplier sweep dimension
- Remove tests/test_checkpointing.py (no more checkpointing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
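The 1/T² attenuation mentioned in this commit can be checked numerically. The sketch below treats the trajectory score as a single scalar and compares the gradient with respect to it under both orderings; the function names are hypothetical and it uses a central finite difference rather than autograd.

```python
import math
from typing import Callable


def loss_normalize_before(score: float, traj_len: int) -> float:
    """Divide by T, then square: (s / T)^2. Gradient w.r.t. s is 2s / T^2."""
    return (score / traj_len) ** 2


def loss_normalize_after(score: float, traj_len: int) -> float:
    """Square, then divide by T: s^2 / T. Gradient w.r.t. s is 2s / T."""
    return score ** 2 / traj_len


def numerical_grad(
    fn: Callable[[float, int], float], score: float, traj_len: int, eps: float = 1e-6
) -> float:
    """Central finite-difference estimate of d(loss)/d(score)."""
    return (fn(score + eps, traj_len) - fn(score - eps, traj_len)) / (2 * eps)


score, traj_len = 1.0, 10
grad_before = numerical_grad(loss_normalize_before, score, traj_len)
grad_after = numerical_grad(loss_normalize_after, score, traj_len)

print(f"grad, normalize before squaring: {grad_before:.4f}")  # close to 2s / T^2
print(f"grad, normalize after squaring:  {grad_after:.4f}")   # close to 2s / T
```

For the same score, the two gradients differ by a factor of T, so long trajectories contribute much less signal when the division happens before squaring.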
- aggregate_results.py: add lr_logz, lr_logz_multiplier, lr_schedule to
  augmented HP columns; drop stale cosine_schedule
- tb_normalize.py: persist lr_logz_multiplier in CONFIG so it appears in
  the config snapshot
- results.ipynb: updated analysis notebook

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix n_modes_found → n_mode_states_found in aggregate_results.py
- Update results.ipynb with latest analysis
- Add .gitignore entries for artifacts (.DS_Store, sweep_results, etc.)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rastructure

- Log unnormalized (base-equation) loss for modified algorithms alongside
  the normalized loss, enabling cross-algorithm loss comparison on the same
  scale after training.
- Add JSD metric computation with chunked sampling to avoid OOM.
- Add high-quality final validation (10M samples) for reliable comparison.
- Add Optuna HP sweep infrastructure (search + confirmation phases).
- Add parallelized SLURM launch scripts for confirmation runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
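One way to compute a JSD metric with chunked sampling is sketched below. This is an assumption about the approach, not the implementation in this PR: samples are drawn in fixed-size chunks and only per-state counts are kept, so memory scales with the number of distinct states rather than the number of samples. The names `empirical_distribution_chunked` and `jensen_shannon_divergence` are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable


def empirical_distribution_chunked(
    sample_fn: Callable[[int], Iterable[Hashable]], n_samples: int, chunk_size: int
) -> Dict[Hashable, float]:
    """Estimate a distribution from n_samples draws, taken chunk_size at a time."""
    if n_samples < 1 or chunk_size < 1:
        raise ValueError("n_samples and chunk_size must be positive")
    counts: Counter = Counter()
    remaining = n_samples
    while remaining > 0:
        n = min(chunk_size, remaining)
        counts.update(sample_fn(n))  # only the counts are kept, not the samples
        remaining -= n
    return {state: c / n_samples for state, c in counts.items()}


def jensen_shannon_divergence(
    p: Dict[Hashable, float], q: Dict[Hashable, float]
) -> float:
    """JSD in nats: 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    jsd = 0.0
    for state in set(p) | set(q):
        p_x = p.get(state, 0.0)
        q_x = q.get(state, 0.0)
        m_x = 0.5 * (p_x + q_x)
        if p_x > 0:
            jsd += 0.5 * p_x * math.log(p_x / m_x)
        if q_x > 0:
            jsd += 0.5 * q_x * math.log(q_x / m_x)
    return jsd


# Toy usage: a "sampler" that always returns the same state.
estimate = empirical_distribution_chunked(lambda n: ["a"] * n, n_samples=10, chunk_size=3)
print(jensen_shannon_divergence(estimate, {"a": 1.0}))  # identical distributions
print(jensen_shannon_divergence(estimate, {"b": 1.0}))  # disjoint support
```

With natural logarithms the value is bounded above by ln 2, which is reached when the two distributions have disjoint support.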