Conversation
📝 Walkthrough

The ESM2 model architecture is refactored to rename the base model namespace from "esm" to "model" across core components, model classes, and related tests. A new vLLM recipe is added under `bionemo-recipes/vllm` (Dockerfile, `launch.sh`, README, and a golden-value test script).
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
@gagank1: Are you working on a readme?
/ok to test

@gagank1, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/
/ok to test

@gagank1, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/
/ok to test c34c09b

@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 5
♻️ Duplicate comments (3)

`bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py` (1)

405-407: Same `_tied_weights_keys` dict type concern. See comment on `modeling_esm_te.py`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py` around lines 405-407: The class variable _tied_weights_keys uses the modern built-in generic dict[...] typing which may be incompatible with the rest of the codebase; change its annotation to use typing.Dict[str, str] (and ensure Dict is imported) or use typing.Mapping if immutability is desired, mirroring the fix applied in modeling_esm_te.py so the declaration becomes ClassVar[Dict[str, str]] with the same key/value entries retained.

`bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py` (1)

405-407: Same `_tied_weights_keys` dict type concern as in `modeling_esm_te.py`. See comment on the canonical file — HF expects `list[str]`, not `dict[str, str]`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py` around lines 405-407: The _tied_weights_keys typed as ClassVar[dict[str, str]] should be changed to ClassVar[list[str]] to match HF expectations (same fix as in modeling_esm_te.py); replace the dict literal with a list of the relevant parameter names (e.g. ["lm_head.decoder.weight", "model.embeddings.word_embeddings.weight"]) and update any usages that assume dict semantics to use the list order or explicit pairing where needed.

`bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py` (1)

405-407: Same `_tied_weights_keys` dict type concern. See comment on `modeling_esm_te.py`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py` around lines 405-407: _tied_weights_keys is annotated as a mutable dict ClassVar which repeats the same typing concern as in modeling_esm_te.py; change the annotation to an immutable mapping type (e.g., ClassVar[Mapping[str, str]] from typing) and, to avoid accidental mutation, assign a read-only view (e.g., types.MappingProxyType({"lm_head.decoder.weight": "model.embeddings.word_embeddings.weight"})); update the import list to include typing.Mapping and types if not present and mirror the same pattern used/fixed in modeling_esm_te.py.
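For illustration, the list-style declaration the second comment asks for might look like the sketch below. The class name is a stand-in and the parameter names are the ones quoted in the prompts above; this assumes the transformers 4.x convention where `_tied_weights_keys` is a list of parameter names.

```python
from typing import ClassVar


class NVEsmForMaskedLM:  # stand-in for the real PreTrainedModel subclass
    # HF's weight-tying machinery (in the transformers 4.x line) walks
    # _tied_weights_keys as a list of parameter names, so a plain list
    # avoids the dict-vs-list ambiguity flagged above.
    _tied_weights_keys: ClassVar[list[str]] = [
        "lm_head.decoder.weight",
        "model.embeddings.word_embeddings.weight",
    ]
```

If the codebase instead standardizes on the mapping form (as the `esm2_accelerate_te` prompt suggests), the same `ClassVar` annotation applies with `Mapping[str, str]` in place of `list[str]`.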
🧹 Nitpick comments (4)
bionemo-recipes/vllm/launch.sh (1)
50-50: `exec $DOCKER_CMD` is unquoted — word splitting will break paths with spaces. If `PROJECT_ROOT` contains spaces (e.g., `/home/user/my projects/...`), the `-v` argument will be incorrectly split into multiple tokens. Use a Bash array to avoid this:

🔧 Proposed fix (array-based approach)

Replace the string-based `DOCKER_CMD` with an array throughout the script:

```diff
-DOCKER_CMD="docker run -itd ..."
+DOCKER_CMD=("docker" "run" "-itd" "--gpus" "all" "--network" "host" "--ipc=host" "-e" "HF_TOKEN" "--rm" "--name" "${CONTAINER}_dev")
 if [ "$MOUNT_DIR" = true ]; then
     PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
-    DOCKER_CMD="$DOCKER_CMD -v ${PROJECT_ROOT}:/workspace/bionemo-framework"
+    DOCKER_CMD+=("-v" "${PROJECT_ROOT}:/workspace/bionemo-framework")
 fi
-DOCKER_CMD="$DOCKER_CMD $CONTAINER /bin/bash"
+DOCKER_CMD+=("$CONTAINER" "/bin/bash")
-exec $DOCKER_CMD
+exec "${DOCKER_CMD[@]}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/launch.sh` at line 50, The exec $DOCKER_CMD call uses an unquoted string which allows word-splitting and breaks mount paths with spaces (e.g., PROJECT_ROOT used in -v). Convert the string DOCKER_CMD into a Bash array (e.g., DOCKER_CMD=(docker run ...)) and update all places that build/append to DOCKER_CMD so they push elements into the array, then replace exec $DOCKER_CMD with exec "${DOCKER_CMD[@]}" so each argument (including the -v PROJECT_ROOT value) is preserved; update any helper code that concatenates DOCKER_CMD to use array operations instead.

`bionemo-recipes/vllm/Dockerfile` (2)
2-3: Base image is hosted on an internal NVIDIA GitLab registry — not pullable outside NVIDIA. `gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel` requires internal network/credentials access. The commented-out `nvcr.io/nvidia/vllm:26.01-py3` alternative on line 1 is the publicly accessible equivalent. Once an NGC release with vLLM ≥ 0.14 becomes available, switching to the public image will make this recipe usable by external contributors without additional setup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/Dockerfile` around lines 2 - 3, The Dockerfile's FROM line currently uses an internal image tag "gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel" which is not pullable externally; replace that base image with the public equivalent "nvcr.io/nvidia/vllm:26.01-py3" (or parametrize the base via a build ARG) in the FROM instruction so external contributors can build without internal credentials, and retain a short comment noting the vLLM>=0.14 requirement and to switch back when an official public image with the needed version is available.
30-30: Pin the `transformer_engine` version for reproducible builds. `pip install --no-build-isolation transformer_engine[pytorch]` with no version specifier will install whichever version is latest at build time. TE releases frequently and has had breaking API changes between major versions (e.g., 1.x → 2.x). A silent version bump can break the integration without any change to this file.

🔧 Proposed fix

```diff
-RUN pip install --no-build-isolation transformer_engine[pytorch]
+RUN pip install --no-build-isolation "transformer_engine[pytorch]==<verified_version>"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/Dockerfile` at line 30, The Dockerfile currently installs transformer_engine without a version pin (the RUN pip install --no-build-isolation transformer_engine[pytorch] line); change that to install a specific, tested TE release by updating that RUN to include an exact version specifier (for example: RUN pip install --no-build-isolation transformer_engine==<MAJOR.MINOR.PATCH>[pytorch]) so builds are reproducible and won’t break on upstream major/minor bumps; optionally add a short comment noting the chosen compatible version.

`bionemo-recipes/vllm/test_esm2_golden_values.py` (1)
46-63: `sys.path.insert` + `os.chdir` is fragile for test infrastructure. `sys.path.insert(0, ...)` at module level (line 46) and `os.chdir` inside `fresh_export` make this script sensitive to working directory and import order. This is acceptable for a standalone validation script run manually inside a container, but consider adding a note that this is not designed to run as part of a pytest suite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/test_esm2_golden_values.py` around lines 46 - 63, The test module mutates import paths and working directory (sys.path.insert(0, ...), os.chdir(...) inside fresh_export) which is fragile for pytest; update the file to document this by adding a clear module-level comment or docstring near sys.path.insert and a brief note on fresh_export explaining it intentionally changes cwd for export_hf_checkpoint and that the script is not intended to be run under pytest/parallel test runners (referencing sys.path.insert, ESM2_MODEL_DIR, and fresh_export by name), or alternatively guard execution with a main-check so pytest won't import/run it implicitly.
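A minimal sketch of the main-guard variant suggested above (the directory handling and return value here are placeholders, not the script's real logic):

```python
import os
import sys

# Illustrative: the real script points at its model sources; here we fall
# back to the current directory so the sketch runs anywhere.
ESM2_MODEL_DIR = os.environ.get("ESM2_MODEL_DIR", os.getcwd())


def fresh_export() -> str:
    """Re-export the checkpoint; intentionally chdirs so relative paths resolve."""
    os.chdir(ESM2_MODEL_DIR)
    return "exported-model-dir"  # placeholder for the real export call


if __name__ == "__main__":
    # sys.path / cwd mutation only happens when run as a script, so pytest
    # collection (which merely imports the module) stays side-effect free.
    sys.path.insert(0, ESM2_MODEL_DIR)
    print(fresh_export())
```

Guarding the mutations this way addresses the "not intended for pytest" concern without restructuring the script.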
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bionemo-recipes/vllm/Dockerfile`:
- Around line 1-36: Add a non-root user and switch to it in the Dockerfile so
the container does not run as root; create a user/group (e.g., bionemo), chown
the application directory (/workspace/bionemo) and any cache/venv paths to that
user, and add a final USER bionemo line. Locate the Dockerfile sections around
WORKDIR /workspace/bionemo and COPY . . and insert user creation
(useradd/addgroup or groupadd) and chown before switching context, then add USER
bionemo at the end so subsequent runtime processes run unprivileged.
In `@bionemo-recipes/vllm/launch.sh`:
- Around line 36-40: The script hardcodes "--name vllm_dev" in DOCKER_CMD which
prevents multiple instances and misrepresents the positional $CONTAINER
argument; update the usage comment to clarify that the positional argument is
the image name (or image_name) and change the DOCKER_CMD assignments in the
HEADLESS branch to use a container name derived from $CONTAINER (for example
--name "$CONTAINER" or --name "${CONTAINER}_dev") instead of "vllm_dev" so the
image argument is also used as the container name and avoids name collisions.
In `@bionemo-recipes/vllm/README.md`:
- Around line 21-23: Change the fenced code block that currently uses the wrong
language tag; locate the block containing the shell command "python
test_esm2_golden_values.py" in the README and replace the opening fence language
identifier from ```python to ```bash so the command is treated as a shell
snippet and highlighted correctly.
- Line 17: The README's "or use `launch.sh`" is misleading because `launch.sh
--mount_dir` mounts the repo to `/workspace/bionemo-framework` while the manual
`docker run` mounts to `/workspace/bionemo`, causing `python
test_esm2_golden_values.py` to fail when run from WORKDIR; fix by either (A)
update `launch.sh` to mount the project root into `/workspace/bionemo` instead
of `/workspace/bionemo-framework` (adjust mount target and any downstream path
references in launch.sh), or (B) update README.md to explicitly document that
`launch.sh --mount_dir` mounts at `/workspace/bionemo-framework` and instruct
users to cd into the correct subdirectory (where `test_esm2_golden_values.py`
lives) before running the test; mention the affected files `launch.sh`,
`README.md`, and `test_esm2_golden_values.py` so reviewers can locate the
changes.
In `@bionemo-recipes/vllm/test_esm2_golden_values.py`:
- Around line 147-210: The script currently only prints comparisons and never
fails; add assertions that enforce the golden-value tolerances using RTOL and
ATOL: for each pair in pairs (refer to the pairs list and variables a, b),
assert np.allclose(a, b, rtol=RTOL, atol=ATOL) (or equivalently assert
(np.abs(a-b) <= ATOL + RTOL * np.abs(b)).all()) and fail the test if not, and
also assert cosine_sim(a, b) exceeds a sensible threshold or that exact is True
when ATOL/RTOL are zero; add per-sequence assertions inside the per-sequence
loop to ensure each sequence max-diff <= ATOL + RTOL * max(|b_i|) so the test
fails on unacceptable drift.
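The tolerance check this prompt describes can be sketched as a small helper; the `RTOL`/`ATOL` values and the `assert_close` name below are illustrative, not the script's actual settings:

```python
import numpy as np

RTOL, ATOL = 1e-3, 1e-5  # illustrative tolerances


def assert_close(a: np.ndarray, b: np.ndarray, label: str) -> None:
    """Raise if two embedding matrices drift beyond the golden-value tolerance."""
    diffs = np.abs(a.astype(np.float64) - b.astype(np.float64))
    if not np.allclose(a, b, rtol=RTOL, atol=ATOL):
        raise AssertionError(
            f"{label}: max |diff| {diffs.max():.3e} exceeds rtol={RTOL}, atol={ATOL}"
        )


# An in-tolerance pair passes silently; a large perturbation would raise.
assert_close(np.ones((2, 3)), np.ones((2, 3)) + 1e-6, "sanity check")
```

Calling such a helper once per pair in the comparison loop turns the printed table into a real pass/fail gate.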
---
Duplicate comments:
In `@bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py`:
- Around line 405-407: _tied_weights_keys is annotated as a mutable dict
ClassVar which repeats the same typing concern as in modeling_esm_te.py; change
the annotation to an immutable mapping type (e.g., ClassVar[Mapping[str, str]]
from typing) and, to avoid accidental mutation, assign a read-only view (e.g.,
types.MappingProxyType({"lm_head.decoder.weight":
"model.embeddings.word_embeddings.weight"})); update the import list to include
typing.Mapping and types if not present and mirror the same pattern used/fixed
in modeling_esm_te.py.
In `@bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py`:
- Around line 405-407: The _tied_weights_keys typed as ClassVar[dict[str, str]]
should be changed to ClassVar[list[str]] to match HF expectations (same fix as
in modeling_esm_te.py); replace the dict literal with a list of the relevant
parameter names (e.g. ["lm_head.decoder.weight",
"model.embeddings.word_embeddings.weight"]) and update any usages that assume
dict semantics to use the list order or explicit pairing where needed.
In `@bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py`:
- Around line 405-407: The class variable _tied_weights_keys uses the modern
built-in generic dict[...] typing which may be incompatible with the rest of the
codebase; change its annotation to use typing.Dict[str, str] (and ensure Dict is
imported) or use typing.Mapping if immutability is desired, mirroring the fix
applied in modeling_esm_te.py so the declaration becomes ClassVar[Dict[str,
str]] with the same key/value entries retained.
---
Nitpick comments:
In `@bionemo-recipes/vllm/Dockerfile`:
- Around line 2-3: The Dockerfile's FROM line currently uses an internal image
tag "gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel" which is
not pullable externally; replace that base image with the public equivalent
"nvcr.io/nvidia/vllm:26.01-py3" (or parametrize the base via a build ARG) in the
FROM instruction so external contributors can build without internal
credentials, and retain a short comment noting the vLLM>=0.14 requirement and to
switch back when an official public image with the needed version is available.
- Line 30: The Dockerfile currently installs transformer_engine without a
version pin (the RUN pip install --no-build-isolation
transformer_engine[pytorch] line); change that to install a specific, tested TE
release by updating that RUN to include an exact version specifier (for example:
RUN pip install --no-build-isolation
transformer_engine==<MAJOR.MINOR.PATCH>[pytorch]) so builds are reproducible and
won’t break on upstream major/minor bumps; optionally add a short comment noting
the chosen compatible version.
In `@bionemo-recipes/vllm/launch.sh`:
- Line 50: The exec $DOCKER_CMD call uses an unquoted string which allows
word-splitting and breaks mount paths with spaces (e.g., PROJECT_ROOT used in
-v). Convert the string DOCKER_CMD into a Bash array (e.g., DOCKER_CMD=(docker
run ...)) and update all places that build/append to DOCKER_CMD so they push
elements into the array, then replace exec $DOCKER_CMD with exec
"${DOCKER_CMD[@]}" so each argument (including the -v PROJECT_ROOT value) is
preserved; update any helper code that concatenates DOCKER_CMD to use array
operations instead.
In `@bionemo-recipes/vllm/test_esm2_golden_values.py`:
- Around line 46-63: The test module mutates import paths and working directory
(sys.path.insert(0, ...), os.chdir(...) inside fresh_export) which is fragile
for pytest; update the file to document this by adding a clear module-level
comment or docstring near sys.path.insert and a brief note on fresh_export
explaining it intentionally changes cwd for export_hf_checkpoint and that the
script is not intended to be run under pytest/parallel test runners (referencing
sys.path.insert, ESM2_MODEL_DIR, and fresh_export by name), or alternatively
guard execution with a main-check so pytest won't import/run it implicitly.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)

- `bionemo-recipes/models/esm2/convert.py`
- `bionemo-recipes/models/esm2/export.py`
- `bionemo-recipes/models/esm2/modeling_esm_te.py`
- `bionemo-recipes/models/esm2/tests/test_cp_bshd.py`
- `bionemo-recipes/models/esm2/tests/test_cp_thd.py`
- `bionemo-recipes/models/esm2/tests/test_distributed_fp8.py`
- `bionemo-recipes/models/esm2/tests/test_distributed_strategies.py`
- `bionemo-recipes/models/esm2/tests/test_modeling_esm_te.py`
- `bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py`
- `bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py`
- `bionemo-recipes/recipes/esm2_native_te/tests/test_stop_and_go.py`
- `bionemo-recipes/recipes/esm2_native_te/train_ddp.py`
- `bionemo-recipes/recipes/esm2_native_te/train_ddp_cp.py`
- `bionemo-recipes/recipes/esm2_native_te/train_fsdp2.py`
- `bionemo-recipes/recipes/esm2_native_te/train_fsdp2_cp.py`
- `bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py`
- `bionemo-recipes/vllm/Dockerfile`
- `bionemo-recipes/vllm/README.md`
- `bionemo-recipes/vllm/launch.sh`
- `bionemo-recipes/vllm/test_esm2_golden_values.py`
`bionemo-recipes/vllm/Dockerfile` (outdated)
```dockerfile
# FROM nvcr.io/nvidia/vllm:26.01-py3
FROM gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel
# using this because we need vllm >= 0.14 to work with Transformers v5. no released nvidia version with this yet.

# The vLLM image has CUDA 13.1 runtime and nvcc, but missing dev headers (cusparse.h, nvtx, etc.)
# Install cuda-keyring to add NVIDIA's apt repo, then install the dev headers for transformer_engine
RUN apt-get update && apt-get install -y --no-install-recommends wget && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i cuda-keyring_1.1-1_all.deb && \
    rm cuda-keyring_1.1-1_all.deb && \
    apt-get update && apt-get install -y --no-install-recommends \
        cuda-nvtx-13-1 \
        cuda-cupti-dev-13-1 \
        cuda-nvml-dev-13-1 \
        libcusparse-dev-13-1 \
        libcusolver-dev-13-1 \
        libcufft-dev-13-1 \
        libnvjitlink-dev-13-1 \
        libnvjpeg-dev-13-1 \
        libcublasmp0-dev-cuda-13 \
        libcudnn9-cuda-13 \
    && rm -rf /var/lib/apt/lists/*

# Install remaining dependencies
RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=bind,source=requirements.txt,target=/requirements.txt \
    pip install -r /requirements.txt

# Install transformer_engine from source (force build for CUDA 13.1, not pre-built cu12 wheel)
RUN pip install --no-build-isolation transformer_engine[pytorch]

RUN pip install transformers[torch]==5.0.0

WORKDIR /workspace/bionemo
COPY . .
```
Container runs as root — add a USER directive.
No USER command is present, so all processes run as root inside the container. This is flagged by Trivy (DS-0002) and violates the principle of least privilege.
🔒 Proposed fix

```diff
 WORKDIR /workspace/bionemo
 COPY . .
+
+RUN useradd -m -u 1000 appuser && chown -R appuser /workspace/bionemo
+USER appuser
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```dockerfile
# FROM nvcr.io/nvidia/vllm:26.01-py3
FROM gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel
# using this because we need vllm >= 0.14 to work with Transformers v5. no released nvidia version with this yet.

# The vLLM image has CUDA 13.1 runtime and nvcc, but missing dev headers (cusparse.h, nvtx, etc.)
# Install cuda-keyring to add NVIDIA's apt repo, then install the dev headers for transformer_engine
RUN apt-get update && apt-get install -y --no-install-recommends wget && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i cuda-keyring_1.1-1_all.deb && \
    rm cuda-keyring_1.1-1_all.deb && \
    apt-get update && apt-get install -y --no-install-recommends \
        cuda-nvtx-13-1 \
        cuda-cupti-dev-13-1 \
        cuda-nvml-dev-13-1 \
        libcusparse-dev-13-1 \
        libcusolver-dev-13-1 \
        libcufft-dev-13-1 \
        libnvjitlink-dev-13-1 \
        libnvjpeg-dev-13-1 \
        libcublasmp0-dev-cuda-13 \
        libcudnn9-cuda-13 \
    && rm -rf /var/lib/apt/lists/*

# Install remaining dependencies
RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=bind,source=requirements.txt,target=/requirements.txt \
    pip install -r /requirements.txt

# Install transformer_engine from source (force build for CUDA 13.1, not pre-built cu12 wheel)
RUN pip install --no-build-isolation transformer_engine[pytorch]

RUN pip install transformers[torch]==5.0.0

WORKDIR /workspace/bionemo
COPY . .

RUN useradd -m -u 1000 appuser && chown -R appuser /workspace/bionemo
USER appuser
```
🧰 Tools
🪛 Trivy (0.69.1)
[error] 1-1: Image user should not be 'root'
Specify at least 1 USER command in Dockerfile with non-root user as argument
Rule: DS-0002
(IaC/Dockerfile)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bionemo-recipes/vllm/Dockerfile` around lines 1 - 36, Add a non-root user and
switch to it in the Dockerfile so the container does not run as root; create a
user/group (e.g., bionemo), chown the application directory (/workspace/bionemo)
and any cache/venv paths to that user, and add a final USER bionemo line. Locate
the Dockerfile sections around WORKDIR /workspace/bionemo and COPY . . and
insert user creation (useradd/addgroup or groupadd) and chown before switching
context, then add USER bionemo at the end so subsequent runtime processes run
unprivileged.
`bionemo-recipes/vllm/launch.sh` (outdated)
```bash
if [ "$HEADLESS" = true ]; then
    DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
else
    DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
fi
```
Hardcoded `--name vllm_dev` prevents running multiple containers and shadows the `$CONTAINER` argument's meaning.

The script's positional argument (`$CONTAINER`) is the Docker image name, but the running container name is always `vllm_dev`. This has two problems:

- Starting a second instance (e.g., a second headless container) will fail with `docker: Error response from daemon: Conflict. The container name "/vllm_dev" is already in use.`
- The usage comment says `<container_name>` but it's really `<image_name>`, which is confusing.

Consider using `$CONTAINER` as both the image name and the container name (or derive the container name from it):

🔧 Proposed fix

```diff
-DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
+DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
-DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
+DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
```

Also update the usage comment on line 4:

```diff
-# Usage: ./launch.sh <container_name> [--mount_dir] [--headless]
+# Usage: ./launch.sh <image_name> [--mount_dir] [--headless]
```

📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```bash
if [ "$HEADLESS" = true ]; then
    DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
else
    DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
fi
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bionemo-recipes/vllm/launch.sh` around lines 36 - 40, The script hardcodes
"--name vllm_dev" in DOCKER_CMD which prevents multiple instances and
misrepresents the positional $CONTAINER argument; update the usage comment to
clarify that the positional argument is the image name (or image_name) and
change the DOCKER_CMD assignments in the HEADLESS branch to use a container name
derived from $CONTAINER (for example --name "$CONTAINER" or --name
"${CONTAINER}_dev") instead of "vllm_dev" so the image argument is also used as
the container name and avoids name collisions.
`bionemo-recipes/vllm/README.md` (outdated)
```bash
docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm -v ${PWD}:/workspace/bionemo vllm /bin/bash
```

or use `launch.sh`.
`launch.sh` mounts to a different container path than the manual `docker run` command — the README's test instruction will break after using `launch.sh --mount_dir`.

The manual command (line 14) mounts `${PWD}` → `/workspace/bionemo`, which is the container's WORKDIR, so `python test_esm2_golden_values.py` works directly. However, `launch.sh --mount_dir` mounts the project root (two levels up) to `/workspace/bionemo-framework`, leaving `/workspace/bionemo` populated only by the image's `COPY . .` layer. Running `python test_esm2_golden_values.py` from WORKDIR after using `--mount_dir` will fail unless the user navigates to the correct subdirectory. The "or use `launch.sh`" phrasing implies equivalence — either clarify the path difference or update `launch.sh` to mount to `/workspace/bionemo` instead of `/workspace/bionemo-framework`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bionemo-recipes/vllm/README.md` at line 17, The README's "or use `launch.sh`"
is misleading because `launch.sh --mount_dir` mounts the repo to
`/workspace/bionemo-framework` while the manual `docker run` mounts to
`/workspace/bionemo`, causing `python test_esm2_golden_values.py` to fail when
run from WORKDIR; fix by either (A) update `launch.sh` to mount the project root
into `/workspace/bionemo` instead of `/workspace/bionemo-framework` (adjust
mount target and any downstream path references in launch.sh), or (B) update
README.md to explicitly document that `launch.sh --mount_dir` mounts at
`/workspace/bionemo-framework` and instruct users to cd into the correct
subdirectory (where `test_esm2_golden_values.py` lives) before running the test;
mention the affected files `launch.sh`, `README.md`, and
`test_esm2_golden_values.py` so reviewers can locate the changes.
`bionemo-recipes/vllm/README.md` (outdated)
```python
python test_esm2_golden_values.py
```
Wrong code fence language identifier — should be `bash`, not `python`.

The `python test_esm2_golden_values.py` command is a shell invocation, not Python source code. Using a `python` fence causes syntax highlighters to misparse it.

📝 Proposed fix

````diff
-```python
+```bash
 python test_esm2_golden_values.py
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/README.md` around lines 21-23, Change the fenced code block that currently uses the wrong language tag; locate the block containing the shell command "python test_esm2_golden_values.py" in the README and replace the opening fence language identifier from ```python to ```bash so the command is treated as a shell snippet and highlighted correctly.
```python
if __name__ == "__main__":
    print(f"GPUs: {torch.cuda.device_count()}")

    # Step 0: fresh export (facebook HF -> our TE format)
    print("\n[0/3] Exporting checkpoint ...")
    MODEL_ID = fresh_export()

    print(f"MODEL_ID: {MODEL_ID}")
    print(f"REFERENCE_MODEL_ID: {REFERENCE_MODEL_ID}")
    print(f"Sequences: {len(SEQUENCES)}")

    # 1) vLLM on exported model
    print("\n[1/3] vLLM inference (exported model) ...")
    emb_vllm = vllm_embed(MODEL_ID, SEQUENCES)

    # 2) HuggingFace on exported model
    print("\n[2/3] HuggingFace inference (exported model) ...")
    emb_hf_exported = hf_embed(MODEL_ID, SEQUENCES)

    # 3) HuggingFace on reference Hub model
    print("\n[3/3] HuggingFace inference (reference model) ...")
    emb_hf_reference = hf_embed(REFERENCE_MODEL_ID, SEQUENCES)

    # ---- Pairwise comparisons ----
    pairs = [
        ("vLLM (exported)", "HF (exported)", emb_vllm, emb_hf_exported),
        ("vLLM (exported)", "HF (reference)", emb_vllm, emb_hf_reference),
        ("HF (exported)", "HF (reference)", emb_hf_exported, emb_hf_reference),
    ]

    # ---- Summary table ----
    header = f"{'Pair':<35} {'max |diff|':>14} {'mean |diff|':>14} {'cos sim':>12} {'exact':>7}"
    sep = "-" * len(header)
    print(f"\n{sep}")
    print(header)
    print(sep)

    for name_a, name_b, a, b in pairs:
        diffs = np.abs(a.astype(np.float64) - b.astype(np.float64))
        label = f"{name_a} vs {name_b}"
        exact = np.array_equal(a, b)
        print(
            f"{label:<35} {diffs.max():>14.8e} {diffs.mean():>14.8e} "
            f"{cosine_sim(a, b):>12.10f} {'YES' if exact else 'NO':>7}"
        )

    print(sep)
    print(f"Tolerance: rtol={RTOL}, atol={ATOL} (0 = exact match required)")

    # Per-sequence breakdown
    short = {"vLLM (exported)": "vllm", "HF (exported)": "hf_exp", "HF (reference)": "hf_ref"}
    print("\nPer-sequence max |diff|:")
    for i in range(len(SEQUENCES)):
        row = f"  seq {i}:"
        for name_a, name_b, a, b in pairs:
            d = float(np.abs(a[i].astype(np.float64) - b[i].astype(np.float64)).max())
            row += f" {short[name_a]}_vs_{short[name_b]}={d:.8e}"
        print(row)

    print(sep)

    # Cleanup
    if torch.distributed.is_initialized():
        torch.distributed.destroy_process_group()
```
Test script has no assertions — it will never fail.
This "test" prints a comparison table but never asserts that results are within tolerance. RTOL and ATOL (line 74) are defined but unused. A silent pass regardless of output drift defeats the purpose of a golden-value test.
Add assertions after the comparison loop, for example:
Suggested assertion block

```diff
+    all_passed = True
     for name_a, name_b, a, b in pairs:
         diffs = np.abs(a.astype(np.float64) - b.astype(np.float64))
         label = f"{name_a} vs {name_b}"
         exact = np.array_equal(a, b)
         print(
             f"{label:<35} {diffs.max():>14.8e} {diffs.mean():>14.8e} "
             f"{cosine_sim(a, b):>12.10f} {'YES' if exact else 'NO':>7}"
         )
+        if not np.allclose(a, b, rtol=RTOL, atol=ATOL):
+            all_passed = False
     print(sep)
-    print(f"Tolerance: rtol={RTOL}, atol={ATOL} (0 = exact match required)")
+    print(f"Tolerance: rtol={RTOL}, atol={ATOL}")
+
+    assert all_passed, "Golden value comparison failed — see table above for details."
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bionemo-recipes/vllm/test_esm2_golden_values.py` around lines 147 - 210, The
script currently only prints comparisons and never fails; add assertions that
enforce the golden-value tolerances using RTOL and ATOL: for each pair in pairs
(refer to the pairs list and variables a, b), assert np.allclose(a, b,
rtol=RTOL, atol=ATOL) (or equivalently assert (np.abs(a-b) <= ATOL + RTOL *
np.abs(b)).all()) and fail the test if not, and also assert cosine_sim(a, b)
exceeds a sensible threshold or that exact is True when ATOL/RTOL are zero; add
per-sequence assertions inside the per-sequence loop to ensure each sequence
max-diff <= ATOL + RTOL * max(|b_i|) so the test fails on unacceptable drift.
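The tolerance formula the prompt spells out can be exercised on its own. A minimal numpy sketch (the tolerance values and toy arrays here are placeholders, not the script's real embeddings):

```python
import numpy as np

RTOL, ATOL = 1e-5, 1e-6  # hypothetical tolerances; the script under review defines its own

# Toy embeddings standing in for emb_vllm / emb_hf; real values come from the models.
a = np.array([[0.1, 0.2], [0.3, 0.4]])
b = a + 5e-7  # small drift, within tolerance

# np.allclose implements: |a - b| <= ATOL + RTOL * |b|, elementwise.
assert np.allclose(a, b, rtol=RTOL, atol=ATOL), "golden value comparison failed"

# The same check written out by hand, as in the review prompt:
ok = bool((np.abs(a - b) <= ATOL + RTOL * np.abs(b)).all())
print(ok)  # True
```

With an `assert` like this in place, unacceptable drift fails the script instead of only printing a table.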
```python
# only needed at runtime for FP8 training efficiency; users who train with FP8 pass
# padded_vocab_size explicitly. Keeping vocab_size-sized weights in the checkpoint
```
> users who train with FP8 pass padded_vocab_size explicitly
i'm not sure that's a great assumption 😆. CC @jomitchellnv
yea that's not true, users who train with FP8 don't pass this
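For context on the padding under discussion: FP8 GEMMs are most efficient when the matrix dimensions are multiples of a small power of two (commonly 16), so an unpadded vocab size gets rounded up. A hedged sketch of the usual round-up logic (the helper name and the multiple of 16 are illustrative, not the recipe's actual API):

```python
def pad_vocab_size(vocab_size: int, multiple: int = 16) -> int:
    """Round vocab_size up to the next multiple (illustrative helper only)."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

print(pad_vocab_size(33))  # 48
print(pad_vocab_size(64))  # 64 (already aligned, unchanged)
```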
```python
add_pooling_layer: Whether the base model should include a pooling layer. Set to
    ``False`` for exported checkpoints that are saved from ``NVEsmForMaskedLM``
    (which does not use a pooler). This avoids missing-weight errors in vLLM.
```
why don't we just make the default false and omit the config edit in export.py?
| """NVEsmForMaskedLM is a TransformerEngine-optimized ESM model for masked language modeling.""" | ||
|
|
||
| _tied_weights_keys: ClassVar[dict[str, str]] = {"lm_head.decoder.weight": "esm.embeddings.word_embeddings.weight"} | ||
| _do_not_quantize = ("lm_head.dense", "lm_head.decoder") # Flag for testing that these layers are not quantized. |
you're deleting _do_not_quantize? we need that
```diff
-with transformer_engine.pytorch.autocast(enabled=False):
+with transformer_engine.pytorch.fp8_autocast(enabled=False):
```
wait, why -- fp8_autocast is deprecated
```diff
-with transformer_engine.pytorch.quantized_model_init(enabled=False):
-    self.dense = transformer_engine.pytorch.Linear(
-        config.hidden_size,
-        config.hidden_size,
-        params_dtype=config.dtype,
-        device="meta" if torch.get_default_device() == torch.device("meta") else "cuda",
-        init_method=lambda x: torch.nn.init.normal_(x, mean=0.0, std=config.initializer_range),
-    )
+self.dense = transformer_engine.pytorch.Linear(
+    config.hidden_size,
+    config.hidden_size,
+    params_dtype=config.dtype,
+    device="meta" if torch.get_default_device() == torch.device("meta") else "cuda",
+    init_method=lambda x: torch.nn.init.normal_(x, mean=0.0, std=config.initializer_range),
+)

+with transformer_engine.pytorch.fp8_model_init(enabled=False):
```
why? this also reverts back to a deprecated context manager, and i think we want that dense layer not to be quantized IIUC
```diff
 sample_layers = [
-    model.esm.encoder.layers[0].self_attention.core_attention,
-    model.esm.encoder.layers[0].self_attention.layernorm_qkv,
+    model.model.encoder.layers[0].self_attention.core_attention,
```
why the rename here from esm -> models?
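Per the PR description, the rename exists because vLLM resolves weights under a `model.` prefix while the HF checkpoint uses `esm.`. One way to see the mapping is as a key rewrite over the state dict; this is an illustrative sketch, not the actual export code:

```python
def remap_prefix(state_dict: dict, old: str = "esm.", new: str = "model.") -> dict:
    """Rewrite 'esm.*' keys to 'model.*', leaving other keys (e.g. lm_head.*) untouched."""
    return {
        (new + k[len(old):]) if k.startswith(old) else k: v
        for k, v in state_dict.items()
    }

sd = {
    "esm.embeddings.word_embeddings.weight": "W",
    "lm_head.decoder.weight": "D",
}
print(remap_prefix(sd))
# {'model.embeddings.word_embeddings.weight': 'W', 'lm_head.decoder.weight': 'D'}
```

Whether vLLM strictly requires the rename (versus a loader-side remap like this) is exactly the open question in this thread.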
```diff
-with transformer_engine.pytorch.autocast(enabled=False):
-    prediction_scores = self.lm_head(sequence_output)
+prediction_scores = self.lm_head(sequence_output)
```
we can't just remove this, this is important
bionemo-recipes/vllm/Dockerfile
Outdated
i don't think we want this folder at all. At the very least it wont work with our current CI
```diff
 mask_ratio_observed = n_masked_per_seq.float() / src_lengths
 scale_factor = (1 - mask_ratio_train) / (1 - mask_ratio_observed)
-reshaped_scale_factor = torch.repeat_interleave(scale_factor, src_lengths_padded, dim=0)
+reshaped_scale_factor = torch.repeat_interleave(scale_factor, src_lengths, dim=0)
```
There was a problem hiding this comment.
why the change here?
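On the `repeat_interleave` line itself: the call broadcasts one scale factor per sequence out to one value per token, using each sequence's length as its repeat count. A numpy equivalent (`np.repeat`) shows the shape logic, assuming two toy sequences of lengths 3 and 2; the question in this thread is whether the padded or unpadded lengths are the right repeat counts:

```python
import numpy as np

scale_factor = np.array([1.1, 0.9])  # one scale per sequence
src_lengths = np.array([3, 2])       # tokens per sequence

# torch.repeat_interleave(scale_factor, src_lengths, dim=0) behaves like np.repeat here:
per_token_scale = np.repeat(scale_factor, src_lengths)
print(per_token_scale)  # [1.1 1.1 1.1 0.9 0.9]
```

If the downstream loss is computed over padded positions, the repeat counts must sum to the padded length instead, which is presumably why `src_lengths_padded` existed before this change.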
```diff
 # Keep the last layers of the network in higher precision to avoid numerical instability.
 # Please see recipes/fp8_analysis/README.md for more details.
-with transformer_engine.pytorch.autocast(enabled=False):
+with transformer_engine.pytorch.fp8_autocast(enabled=False):
```
this is incorrect. `autocast` is the API now; it covers FP8/FP4 autocast.
```python
self.micro_batch_size = micro_batch_size
self.max_seq_length = max_seq_length
self.attn_mask_type = attn_mask_type
self.add_pooling_layer = add_pooling_layer
```
why does adding an inference script require us to add a pooling layer?
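For context on what `add_pooling_layer` controls: the HF ESM/BERT-style pooler passes the first-token hidden state through a dense layer and tanh. A minimal numpy sketch of that shape logic (random weights, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 4
W = rng.normal(size=(hidden_size, hidden_size))  # pooler dense weight (illustrative)
b = np.zeros(hidden_size)

def pool(hidden_states: np.ndarray) -> np.ndarray:
    """BERT-style pooler: dense + tanh over the first token of each sequence."""
    first_token = hidden_states[:, 0, :]  # (batch, hidden)
    return np.tanh(first_token @ W + b)

batch = rng.normal(size=(2, 5, hidden_size))  # (batch, seq_len, hidden)
pooled = pool(batch)
print(pooled.shape)  # (2, 4)
```

Because `NVEsmForMaskedLM` never uses this output, its exported checkpoint has no pooler weights, hence the flag to skip constructing the layer when loading for inference.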
| """NVEsmForMaskedLM is a TransformerEngine-optimized ESM model for masked language modeling.""" | ||
|
|
||
| _tied_weights_keys: ClassVar[dict[str, str]] = {"lm_head.decoder.weight": "esm.embeddings.word_embeddings.weight"} | ||
| _do_not_quantize = ("lm_head.dense", "lm_head.decoder") # Flag for testing that these layers are not quantized. |
```diff
 sequence_output = outputs[0]
-with transformer_engine.pytorch.autocast(enabled=False):
-    prediction_scores = self.lm_head(sequence_output)
+prediction_scores = self.lm_head(sequence_output)
```
we still need `with transformer_engine.pytorch.autocast(enabled=False):` here
Otherwise, when we're doing FP8, it's going to look up a context block, see that we're in FP8, and cast these projections to FP8
| device="meta" if torch.get_default_device() == torch.device("meta") else "cuda", | ||
| init_method=lambda x: torch.nn.init.normal_(x, mean=0.0, std=config.initializer_range), | ||
| ) | ||
| self.dense = transformer_engine.pytorch.Linear( |
We still need `with transformer_engine.pytorch.quantized_model_init(enabled=False):`
```diff
 # Keep the last layers of the network in higher precision to avoid numerical instability.
 # Please see recipes/fp8_analysis/README.md for more details.
-with transformer_engine.pytorch.autocast(enabled=False):
+with transformer_engine.pytorch.fp8_autocast(enabled=False):
```
not correct. need to keep the original `autocast` version of this line, not `fp8_autocast`
jomitchellnv left a comment
I am wondering if we can not modify existing files inside bionemo-recipes/models/esm2 and instead just make a new recipe for VLLM stuff and place it in there. Is there some requirement from VLLM that it requires us to rename ModuleLists from ESM -> Model?
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>

/ok to test a67df14

Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>

/ok to test 36cdbb2
Description

This PR makes the ESM2 model compatible with vLLM. The primary issues were a naming incompatibility (vLLM expects a `model.` prefix, while ESM2 uses `esm.`) and that `NVEsmModel` defaults to `add_pooling_layer=True` when loading the checkpoint even though it is exported without pooler weights.

Usage

Run `python test_esm2_golden_values.py` from inside the container; instructions to build and run the container are provided.

Type of changes
Triggering Code Rabbit AI Review
To trigger a code review from code rabbit, comment on a pull request with one of these commands:
See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.
Pre-submit Checklist