feat(profiling): add memory theory comparison and Mosaic analysis #4
Pull request overview
Adds a reusable “memory experiment” module that formalizes theoretical GPU-memory accounting and explicit theory-vs-measurement comparisons, then refactors the LoRA profiling script to use it and emit a dedicated comparison report.
Changes:
- Introduces `stellatscale.memory_experiment` with shared config, theoretical summaries, measured-summary parsing, and comparison report objects.
- Adds a profiling script (`scripts/lora_memory_analysis.py`) that runs dense vs frozen-LoRA profiling, parses Mosaic output, and writes comparison + theory comparison reports.
- Adds focused tests covering dense/frozen-LoRA accounting and comparison-report behavior; updates dependency groups/lockfile for Mosaic + profiling.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| uv.lock | Adds Mosaic/profiling dependency resolutions and markers. |
| pyproject.toml | Pins Mosaic source and defines mosaic / profiling dependency groups. |
| src/stellatscale/memory_experiment.py | New reusable theory + measurement parsing + comparison-report module. |
| src/stellatscale/__init__.py | Re-exports memory experiment public API at package top-level. |
| scripts/lora_memory_analysis.py | New end-to-end profiling + Mosaic analysis + theory comparison report generator. |
| tests/test_memory_experiment.py | New tests for accounting correctness, frozen optimizer scoping, and comparison output behavior. |
    schedule_factory = cast("Any", schedule)
    schedule_phase_key = "warm" + chr(117) + "p"
    profile_schedule = schedule_factory(
        wait=0, active=EXPERIMENT_CONFIG.steps, repeat=1, **{schedule_phase_key: 0}
The profiler schedule setup is intentionally obfuscated (schedule_phase_key = "warm" + chr(117) + "p") and schedule is cast to Any to accept that key. This makes the profiling behavior hard to audit and removes type safety. Prefer calling torch.profiler.schedule directly with the explicit warmup=0 argument (and remove the Any cast) so future readers can understand the schedule and static checks still apply.
Suggested change:

    - schedule_factory = cast("Any", schedule)
    - schedule_phase_key = "warm" + chr(117) + "p"
    - profile_schedule = schedule_factory(
    -     wait=0, active=EXPERIMENT_CONFIG.steps, repeat=1, **{schedule_phase_key: 0}
    + profile_schedule = schedule(
    +     wait=0,
    +     warmup=0,
    +     active=EXPERIMENT_CONFIG.steps,
    +     repeat=1,
    for _ in range(EXPERIMENT_CONFIG.steps):
        profiler.step()
profiler.step() is called at the start of each iteration. In typical torch.profiler.profile usage, step() should be called at the end of the iteration to delimit the just-recorded work. As written, the first step may be empty and the last forward/backward/optimizer block may never be closed/recorded, skewing traces and memory attribution.
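The off-by-one can be seen in a toy model. The sketch below is not `torch.profiler`; `ToyStepRecorder` and `run` are hypothetical names, and the only assumption carried over is that a recording window opens when the profiler starts (`wait=0`, `warmup=0`) and each `step()` call closes the current window and opens the next until `active` windows have been used.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStepRecorder:
    """Toy stand-in for a step-scheduled profiler (NOT torch.profiler itself)."""

    active: int  # number of step windows that get recorded
    step_num: int = 0
    windows: list[list[str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Window 0 opens immediately, mirroring wait=0 and warmup=0.
        self.windows.append([])

    def record(self, event: str) -> None:
        # Work is captured only while an active window is open.
        if self.step_num < self.active:
            self.windows[self.step_num].append(event)

    def step(self) -> None:
        # Close the current window; open the next one if still active.
        self.step_num += 1
        if self.step_num < self.active:
            self.windows.append([])


def run(steps: int, step_first: bool) -> list[list[str]]:
    """Run `steps` iterations, calling step() before or after the work."""
    recorder = ToyStepRecorder(active=steps)
    for i in range(steps):
        if step_first:
            recorder.step()
        recorder.record(f"iter{i}")  # stands in for forward/backward/optimizer
        if not step_first:
            recorder.step()
    return recorder.windows


print(run(3, step_first=True))   # first window empty, last iteration dropped
print(run(3, step_first=False))  # one iteration per window
```

In this model, stepping first leaves window 0 empty and loses the final iteration, while stepping last gives one iteration per window. Whether the real profiler drops or merely mislabels the last iteration should be confirmed against an actual trace.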
Summary
This PR adds a reproducible workflow for comparing theoretical memory costs with measured runtime behavior for dense and frozen-LoRA linear layers.
It introduces a shared memory experiment module, wires that logic into the profiling script, and adds Mosaic-based visualization output plus report updates so the results can be inspected and documented end to end.
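As a rough illustration of the theory side of such a comparison, the sketch below counts parameter, gradient, and optimizer-state bytes for one dense linear layer and one frozen-LoRA layer. It does not reproduce the `stellatscale.memory_experiment` API: every name here is hypothetical, and it assumes no bias term, fp32 storage (4 bytes per element), an Adam-style optimizer with two state slots per trainable parameter, and no activation or allocator overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerBytes:
    """Theoretical persistent memory for one layer, in bytes."""

    weights: int
    gradients: int
    optimizer_state: int

    @property
    def total(self) -> int:
        return self.weights + self.gradients + self.optimizer_state


def dense_linear_bytes(
    d_in: int, d_out: int, bytes_per_elem: int = 4, optimizer_slots: int = 2
) -> LayerBytes:
    # Every weight is trainable: it has a gradient and optimizer state.
    n = d_in * d_out
    return LayerBytes(
        weights=n * bytes_per_elem,
        gradients=n * bytes_per_elem,
        optimizer_state=optimizer_slots * n * bytes_per_elem,
    )


def frozen_lora_bytes(
    d_in: int, d_out: int, rank: int, bytes_per_elem: int = 4, optimizer_slots: int = 2
) -> LayerBytes:
    # The base weight is frozen (stored only); adapters A and B are trainable.
    frozen = d_in * d_out
    trainable = rank * (d_in + d_out)
    return LayerBytes(
        weights=(frozen + trainable) * bytes_per_elem,
        gradients=trainable * bytes_per_elem,
        optimizer_state=optimizer_slots * trainable * bytes_per_elem,
    )


def within_tolerance(theory: int, measured: int, rel_tol: float = 0.1) -> bool:
    # Relative check; the 10% default is an arbitrary illustrative choice.
    return abs(measured - theory) <= rel_tol * theory


dense = dense_linear_bytes(1024, 1024)
lora = frozen_lora_bytes(1024, 1024, rank=8)
print(dense.total, lora.total)
```

Under these assumptions the savings come almost entirely from gradients and optimizer state, since the frozen base weight is still resident. A measured peak will normally sit above these totals because activations and allocator fragmentation are not modeled.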
Changes
- `stellatscale.memory_experiment` for experiment configuration, theoretical summaries, tolerance checks, and comparison utilities

Why
The branch moves the memory analysis from one-off profiling code to a codified experiment pipeline. That makes the comparisons easier to reproduce, validate, and reuse in the report.
Validation