
ESM2 changes to work with vLLM #1473

Open

gagank1 wants to merge 10 commits into main from gkaushik/esm2-vllm

Conversation

@gagank1
Collaborator

@gagank1 gagank1 commented Feb 18, 2026

Description

This PR makes the ESM2 model compatible with vLLM. The primary issues were a naming incompatibility (vLLM expects a model. prefix while ESM2 uses esm.) and that NVEsmModel defaults to add_pooling_layer=True when loading the checkpoint, even though the checkpoint is exported without pooler weights.
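As a rough illustration of the naming fix (the actual change lives in convert.py/export.py; the helper and key names below are hypothetical), the prefix remapping amounts to:

```python
def remap_esm_keys(state_dict):
    """Rename 'esm.'-prefixed keys to 'model.' so vLLM's weight loader,
    which resolves weights under the 'model.' prefix, can find them."""
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith("esm."):
            key = "model." + key[len("esm."):]
        remapped[key] = value
    return remapped

# Illustrative keys, not the real ESM2 checkpoint contents:
old = {"esm.encoder.layers.0.weight": 1.0, "lm_head.decoder.weight": 2.0}
new = remap_esm_keys(old)
# 'esm.'-prefixed keys are renamed; all other keys pass through unchanged.
```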

Usage

Run python test_esm2_golden_values.py from inside the container; instructions to build and run the container are provided.

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

Triggering Code Rabbit AI Review

To trigger a code review from CodeRabbit, comment on the pull request with one of its review commands.

See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

@gagank1 gagank1 self-assigned this Feb 18, 2026
@copy-pr-bot

copy-pr-bot bot commented Feb 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

The ESM2 model architecture was refactored to rename the base model namespace from "esm" to "model" across core components, model classes, and related tests. A new add_pooling_layer configuration option was introduced to control pooler presence. Additionally, vLLM integration infrastructure was added, with Docker setup and validation tooling.

Changes

Cohort / File(s) — Summary

  • Core Model Architecture Refactoring — bionemo-recipes/models/esm2/convert.py, export.py, modeling_esm_te.py
    Namespace renaming from "esm" to "model" in weight mappings, base model references, and state dict handling. Added an add_pooling_layer configuration option (defaults to True) wired through NVEsmConfig and NVEsmModel. Updated base_model_prefix and tied weights references. Removed the quantized_model_init wrapper in favor of the fp8_model_init context. Changed state_dict filtering to exclude both _extra_state and .inv_freq keys for vLLM compatibility.
  • ESM2 Test Suite Updates — bionemo-recipes/models/esm2/tests/test_cp_bshd.py, test_cp_thd.py, test_distributed_fp8.py, test_distributed_strategies.py, test_modeling_esm_te.py
    Updated model layer access paths from model.esm.encoder.layers to model.model.encoder.layers across distributed testing, context-parallel gradient validation, and state_dict verification. No functional logic changes, only attribute path corrections.
  • Recipe Implementation Propagation — bionemo-recipes/recipes/esm2_accelerate_te/..., esm2_native_te/..., esm2_peft_te/.../esm_nv.py
    Applied the namespace refactoring, pooling layer configuration, and weight initialization updates from the core model to the recipe implementations. Updated NVEsmPreTrainedModel, NVEsmForMaskedLM, NVEsmForTokenClassification, and NVEsmLMHead across all recipe variants. Replaced direct "esm" references with "model" and removed legacy TE quantization contexts.
  • Training Script Updates — bionemo-recipes/recipes/esm2_native_te/tests/test_stop_and_go.py, train_ddp.py, train_ddp_cp.py, train_fsdp2.py, train_fsdp2_cp.py
    Updated contact_head deletion and encoder layer access paths from model.esm.* to model.model.* to align with the new model structure. No logic changes, only path corrections for distributed training configurations (DDP, FSDP2, context parallelism).
  • vLLM Integration — bionemo-recipes/vllm/Dockerfile, README.md, launch.sh, test_esm2_golden_values.py
    New vLLM inference infrastructure: the Dockerfile extends the vLLM base image with CUDA 13.1 tooling and transformer_engine built from source; launch.sh provides a Docker run wrapper with optional mounts and a headless mode; test_esm2_golden_values.py validates ESM2 embedding consistency across vLLM, HuggingFace, and hub models with numerical comparison metrics.
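The state_dict filtering described in the core refactoring above (excluding both _extra_state and .inv_freq keys) can be sketched as follows; the helper and key names are illustrative, not the actual export.py code:

```python
def filter_for_vllm(state_dict):
    """Drop Transformer Engine '_extra_state' serialization blobs and
    rotary '.inv_freq' buffers, which vLLM's weight loader does not expect."""
    return {
        k: v
        for k, v in state_dict.items()
        if not k.endswith("_extra_state") and ".inv_freq" not in k
    }

# Illustrative state dict mixing real weights with serialization-only keys:
sd = {
    "model.encoder.layers.0.self_attention._extra_state": b"...",
    "model.encoder.layers.0.rotary_emb.inv_freq": [0.1],
    "model.encoder.layers.0.weight": [1.0],
}
filtered = filter_for_vllm(sd)
# Only the real weight tensor survives filtering.
```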

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Hop, hop! The model's now refined,
From "esm" paths, we've left behind,
A "model" namespace, clean and bright,
With pooling layers—flexible might!
And vLLM joins the warren's flight, 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Description check — ❓ Inconclusive. The description covers the basic what (making ESM2 compatible with vLLM) and usage instructions, but is missing a detailed explanation of the changes, their purpose, and how they achieve vLLM compatibility. Resolution: add more comprehensive details about the specific changes made, the rationale behind the namespace refactoring from 'esm' to 'model', the pooling layer changes, and how these changes enable vLLM compatibility.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed. The title 'ESM2 changes to work with vLLM' clearly describes the main objective of the PR: making ESM2 compatible with vLLM.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 96.30%, which is sufficient; the required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@broland-hat
Collaborator

@gagank1 : Are you working on a readme?

@gagank1
Collaborator Author

gagank1 commented Feb 23, 2026

/ok to test

@copy-pr-bot

copy-pr-bot bot commented Feb 23, 2026

/ok to test

@gagank1, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@gagank1 gagank1 marked this pull request as ready for review February 23, 2026 17:06
@gagank1
Collaborator Author

gagank1 commented Feb 23, 2026

/ok to test

@copy-pr-bot

copy-pr-bot bot commented Feb 23, 2026

/ok to test

@gagank1, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@gagank1
Collaborator Author

gagank1 commented Feb 23, 2026

/ok to test c34c09b

@gagank1
Collaborator Author

gagank1 commented Feb 23, 2026

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Feb 23, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@gagank1 gagank1 requested a review from broland-hat February 23, 2026 17:17
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (3)
bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py (1)

405-407: Same _tied_weights_keys dict type concern.

See comment on modeling_esm_te.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py` around
lines 405 - 407, The class variable _tied_weights_keys uses the modern built-in
generic dict[...] typing which may be incompatible with the rest of the
codebase; change its annotation to use typing.Dict[str, str] (and ensure Dict is
imported) or use typing.Mapping if immutability is desired, mirroring the fix
applied in modeling_esm_te.py so the declaration becomes ClassVar[Dict[str,
str]] with the same key/value entries retained.
bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py (1)

405-407: Same _tied_weights_keys dict type concern as in modeling_esm_te.py.

See comment on the canonical file — HF expects list[str], not dict[str, str].

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py`
around lines 405 - 407, The _tied_weights_keys typed as ClassVar[dict[str, str]]
should be changed to ClassVar[list[str]] to match HF expectations (same fix as
in modeling_esm_te.py); replace the dict literal with a list of the relevant
parameter names (e.g. ["lm_head.decoder.weight",
"model.embeddings.word_embeddings.weight"]) and update any usages that assume
dict semantics to use the list order or explicit pairing where needed.
bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py (1)

405-407: Same _tied_weights_keys dict type concern.

See comment on modeling_esm_te.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py`
around lines 405 - 407, _tied_weights_keys is annotated as a mutable dict
ClassVar which repeats the same typing concern as in modeling_esm_te.py; change
the annotation to an immutable mapping type (e.g., ClassVar[Mapping[str, str]]
from typing) and, to avoid accidental mutation, assign a read-only view (e.g.,
types.MappingProxyType({"lm_head.decoder.weight":
"model.embeddings.word_embeddings.weight"})); update the import list to include
typing.Mapping and types if not present and mirror the same pattern used/fixed
in modeling_esm_te.py.
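The three duplicate comments above propose slightly different fixes (typing.Dict, list[str], or an immutable Mapping). As a sketch of the list-based form one of them suggests, with a stand-in class and a single illustrative key; which annotation HF actually expects depends on the installed transformers version:

```python
from typing import ClassVar

class NVEsmForMaskedLM:
    """Stand-in for the real PreTrainedModel subclass, shown only to
    illustrate the list[str] annotation proposed in the review comment."""

    # One of the suggested forms: a list of tied parameter names rather
    # than a dict mapping. Key name follows the renamed 'model.' namespace.
    _tied_weights_keys: ClassVar[list[str]] = ["lm_head.decoder.weight"]
```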
🧹 Nitpick comments (4)
bionemo-recipes/vllm/launch.sh (1)

50-50: exec $DOCKER_CMD is unquoted — word splitting will break paths with spaces.

If PROJECT_ROOT contains spaces (e.g., /home/user/my projects/...), the -v argument will be incorrectly split into multiple tokens. Use a Bash array to avoid this:

🔧 Proposed fix (array-based approach)

Replace the string-based DOCKER_CMD with an array throughout the script:

-DOCKER_CMD="docker run -itd ..."
+DOCKER_CMD=("docker" "run" "-itd" "--gpus" "all" "--network" "host" "--ipc=host" "-e" "HF_TOKEN" "--rm" "--name" "${CONTAINER}_dev")
 
 if [ "$MOUNT_DIR" = true ]; then
     PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
-    DOCKER_CMD="$DOCKER_CMD -v ${PROJECT_ROOT}:/workspace/bionemo-framework"
+    DOCKER_CMD+=("-v" "${PROJECT_ROOT}:/workspace/bionemo-framework")
 fi
 
-DOCKER_CMD="$DOCKER_CMD $CONTAINER /bin/bash"
+DOCKER_CMD+=("$CONTAINER" "/bin/bash")
 
-exec $DOCKER_CMD
+exec "${DOCKER_CMD[@]}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/launch.sh` at line 50, The exec $DOCKER_CMD call uses an
unquoted string which allows word-splitting and breaks mount paths with spaces
(e.g., PROJECT_ROOT used in -v). Convert the string DOCKER_CMD into a Bash array
(e.g., DOCKER_CMD=(docker run ...)) and update all places that build/append to
DOCKER_CMD so they push elements into the array, then replace exec $DOCKER_CMD
with exec "${DOCKER_CMD[@]}" so each argument (including the -v PROJECT_ROOT
value) is preserved; update any helper code that concatenates DOCKER_CMD to use
array operations instead.
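The word-splitting failure mode called out above can be demonstrated in a few self-contained lines (the path and helper are illustrative, not launch.sh's real values):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Helper that reports how many arguments it received.
count_args() { echo "$#"; }

# A mount path containing a space, as in the review comment's example.
PROJECT_ROOT="/home/user/my projects"

# String form: unquoted expansion splits the path into two words,
# so the single "-v path:target" pair becomes three arguments.
CMD="count_args -v ${PROJECT_ROOT}:/workspace/bionemo-framework"
$CMD

# Array form: each element stays a single argument, space and all.
CMD_ARR=(count_args -v "${PROJECT_ROOT}:/workspace/bionemo-framework")
"${CMD_ARR[@]}"
```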
bionemo-recipes/vllm/Dockerfile (2)

2-3: Base image is hosted on an internal NVIDIA GitLab registry — not pullable outside NVIDIA.

gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel requires internal network/credentials access. The commented-out nvcr.io/nvidia/vllm:26.01-py3 alternative on line 1 is the publicly accessible equivalent. Once an NGC release with vLLM ≥ 0.14 becomes available, switching to the public image will make this recipe usable by external contributors without additional setup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/Dockerfile` around lines 2 - 3, The Dockerfile's FROM
line currently uses an internal image tag
"gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel" which is not
pullable externally; replace that base image with the public equivalent
"nvcr.io/nvidia/vllm:26.01-py3" (or parametrize the base via a build ARG) in the
FROM instruction so external contributors can build without internal
credentials, and retain a short comment noting the vLLM>=0.14 requirement and to
switch back when an official public image with the needed version is available.

30-30: Pin the transformer_engine version for reproducible builds.

pip install --no-build-isolation transformer_engine[pytorch] with no version specifier will install whichever version is latest at build time. TE releases frequently and has had breaking API changes between major versions (e.g., 1.x → 2.x). A silent version bump can break the integration without any change to this file.

🔧 Proposed fix
-RUN pip install --no-build-isolation transformer_engine[pytorch]
+RUN pip install --no-build-isolation "transformer_engine[pytorch]==<verified_version>"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/Dockerfile` at line 30, The Dockerfile currently
installs transformer_engine without a version pin (the RUN pip install
--no-build-isolation transformer_engine[pytorch] line); change that to install a
specific, tested TE release by updating that RUN to include an exact version
specifier (for example: RUN pip install --no-build-isolation
transformer_engine==<MAJOR.MINOR.PATCH>[pytorch]) so builds are reproducible and
won’t break on upstream major/minor bumps; optionally add a short comment noting
the chosen compatible version.
bionemo-recipes/vllm/test_esm2_golden_values.py (1)

46-63: sys.path.insert + os.chdir is fragile for test infrastructure.

sys.path.insert(0, ...) at module level (line 46) and os.chdir inside fresh_export make this script sensitive to working directory and import order. This is acceptable for a standalone validation script run manually inside a container, but consider adding a note that this is not designed to run as part of a pytest suite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/test_esm2_golden_values.py` around lines 46 - 63, The
test module mutates import paths and working directory (sys.path.insert(0, ...),
os.chdir(...) inside fresh_export) which is fragile for pytest; update the file
to document this by adding a clear module-level comment or docstring near
sys.path.insert and a brief note on fresh_export explaining it intentionally
changes cwd for export_hf_checkpoint and that the script is not intended to be
run under pytest/parallel test runners (referencing sys.path.insert,
ESM2_MODEL_DIR, and fresh_export by name), or alternatively guard execution with
a main-check so pytest won't import/run it implicitly.
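The main-guard alternative mentioned in the prompt above could look like this sketch; ESM2_MODEL_DIR and the path arithmetic are illustrative stand-ins for the script's real values:

```python
import os
import sys

# Stand-in for the script's real model-directory constant.
ESM2_MODEL_DIR = os.path.abspath(os.path.join(os.getcwd(), "..", "models", "esm2"))

def main() -> None:
    # Side effects (sys.path edits, os.chdir) stay inside main(), so merely
    # importing this module, e.g. during pytest collection, is harmless.
    sys.path.insert(0, ESM2_MODEL_DIR)
    print("running standalone golden-value validation")

if __name__ == "__main__":
    main()
```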
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bionemo-recipes/vllm/Dockerfile`:
- Around line 1-36: Add a non-root user and switch to it in the Dockerfile so
the container does not run as root; create a user/group (e.g., bionemo), chown
the application directory (/workspace/bionemo) and any cache/venv paths to that
user, and add a final USER bionemo line. Locate the Dockerfile sections around
WORKDIR /workspace/bionemo and COPY . . and insert user creation
(useradd/addgroup or groupadd) and chown before switching context, then add USER
bionemo at the end so subsequent runtime processes run unprivileged.

In `@bionemo-recipes/vllm/launch.sh`:
- Around line 36-40: The script hardcodes "--name vllm_dev" in DOCKER_CMD which
prevents multiple instances and misrepresents the positional $CONTAINER
argument; update the usage comment to clarify that the positional argument is
the image name (or image_name) and change the DOCKER_CMD assignments in the
HEADLESS branch to use a container name derived from $CONTAINER (for example
--name "$CONTAINER" or --name "${CONTAINER}_dev") instead of "vllm_dev" so the
image argument is also used as the container name and avoids name collisions.

In `@bionemo-recipes/vllm/README.md`:
- Around line 21-23: Change the fenced code block that currently uses the wrong
language tag; locate the block containing the shell command "python
test_esm2_golden_values.py" in the README and replace the opening fence language
identifier from ```python to ```bash so the command is treated as a shell
snippet and highlighted correctly.
- Line 17: The README's "or use `launch.sh`" is misleading because `launch.sh
--mount_dir` mounts the repo to `/workspace/bionemo-framework` while the manual
`docker run` mounts to `/workspace/bionemo`, causing `python
test_esm2_golden_values.py` to fail when run from WORKDIR; fix by either (A)
update `launch.sh` to mount the project root into `/workspace/bionemo` instead
of `/workspace/bionemo-framework` (adjust mount target and any downstream path
references in launch.sh), or (B) update README.md to explicitly document that
`launch.sh --mount_dir` mounts at `/workspace/bionemo-framework` and instruct
users to cd into the correct subdirectory (where `test_esm2_golden_values.py`
lives) before running the test; mention the affected files `launch.sh`,
`README.md`, and `test_esm2_golden_values.py` so reviewers can locate the
changes.

In `@bionemo-recipes/vllm/test_esm2_golden_values.py`:
- Around line 147-210: The script currently only prints comparisons and never
fails; add assertions that enforce the golden-value tolerances using RTOL and
ATOL: for each pair in pairs (refer to the pairs list and variables a, b),
assert np.allclose(a, b, rtol=RTOL, atol=ATOL) (or equivalently assert
(np.abs(a-b) <= ATOL + RTOL * np.abs(b)).all()) and fail the test if not, and
also assert cosine_sim(a, b) exceeds a sensible threshold or that exact is True
when ATOL/RTOL are zero; add per-sequence assertions inside the per-sequence
loop to ensure each sequence max-diff <= ATOL + RTOL * max(|b_i|) so the test
fails on unacceptable drift.
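The enforcement described above could be sketched as follows; the RTOL/ATOL values and array contents are illustrative, not the script's actual tolerances:

```python
import numpy as np

RTOL, ATOL = 1e-3, 1e-5  # illustrative tolerances, not the PR's values

def assert_close(a, b, name):
    """Fail loudly on drift instead of only printing a comparison."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    max_diff = np.max(np.abs(a - b))
    bound = ATOL + RTOL * np.max(np.abs(b))
    assert np.allclose(a, b, rtol=RTOL, atol=ATOL), (
        f"{name}: max diff {max_diff:.3e} exceeds bound {bound:.3e}"
    )

# e.g. comparing a vLLM embedding against a HuggingFace reference:
ref = np.array([0.10, -0.20, 0.30])
out = ref + 1e-6  # within tolerance, so no assertion fires
assert_close(out, ref, "vllm_vs_hf")
```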


ℹ️ Review info

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73cd13d and c34c09b.

📒 Files selected for processing (20)
  • bionemo-recipes/models/esm2/convert.py
  • bionemo-recipes/models/esm2/export.py
  • bionemo-recipes/models/esm2/modeling_esm_te.py
  • bionemo-recipes/models/esm2/tests/test_cp_bshd.py
  • bionemo-recipes/models/esm2/tests/test_cp_thd.py
  • bionemo-recipes/models/esm2/tests/test_distributed_fp8.py
  • bionemo-recipes/models/esm2/tests/test_distributed_strategies.py
  • bionemo-recipes/models/esm2/tests/test_modeling_esm_te.py
  • bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py
  • bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py
  • bionemo-recipes/recipes/esm2_native_te/tests/test_stop_and_go.py
  • bionemo-recipes/recipes/esm2_native_te/train_ddp.py
  • bionemo-recipes/recipes/esm2_native_te/train_ddp_cp.py
  • bionemo-recipes/recipes/esm2_native_te/train_fsdp2.py
  • bionemo-recipes/recipes/esm2_native_te/train_fsdp2_cp.py
  • bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py
  • bionemo-recipes/vllm/Dockerfile
  • bionemo-recipes/vllm/README.md
  • bionemo-recipes/vllm/launch.sh
  • bionemo-recipes/vllm/test_esm2_golden_values.py

Comment on lines 1 to 36
# FROM nvcr.io/nvidia/vllm:26.01-py3
FROM gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel
# using this because we need vllm >= 0.14 to work with Transformers v5. no released nvidia version with this yet.

# The vLLM image has CUDA 13.1 runtime and nvcc, but missing dev headers (cusparse.h, nvtx, etc.)
# Install cuda-keyring to add NVIDIA's apt repo, then install the dev headers for transformer_engine
RUN apt-get update && apt-get install -y --no-install-recommends wget && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb && \
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm cuda-keyring_1.1-1_all.deb && \
apt-get update && apt-get install -y --no-install-recommends \
cuda-nvtx-13-1 \
cuda-cupti-dev-13-1 \
cuda-nvml-dev-13-1 \
libcusparse-dev-13-1 \
libcusolver-dev-13-1 \
libcufft-dev-13-1 \
libnvjitlink-dev-13-1 \
libnvjpeg-dev-13-1 \
libcublasmp0-dev-cuda-13 \
libcudnn9-cuda-13 \
&& rm -rf /var/lib/apt/lists/*

# Install remaining dependencies
RUN --mount=type=cache,target=/root/.cache/pip \
--mount=type=bind,source=requirements.txt,target=/requirements.txt \
pip install -r /requirements.txt

# Install transformer_engine from source (force build for CUDA 13.1, not pre-built cu12 wheel)
RUN pip install --no-build-isolation transformer_engine[pytorch]

RUN pip install transformers[torch]==5.0.0


WORKDIR /workspace/bionemo
COPY . .
Contributor


⚠️ Potential issue | 🟠 Major

Container runs as root — add a USER directive.

No USER command is present, so all processes run as root inside the container. This is flagged by Trivy (DS-0002) and violates the principle of least privilege.

🔒 Proposed fix
 WORKDIR /workspace/bionemo
 COPY . .
+
+RUN useradd -m -u 1000 appuser && chown -R appuser /workspace/bionemo
+USER appuser
🧰 Tools
🪛 Trivy (0.69.1)

[error] 1-1: Image user should not be 'root'

Specify at least 1 USER command in Dockerfile with non-root user as argument

Rule: DS-0002

Learn more

(IaC/Dockerfile)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/Dockerfile` around lines 1 - 36, Add a non-root user and
switch to it in the Dockerfile so the container does not run as root; create a
user/group (e.g., bionemo), chown the application directory (/workspace/bionemo)
and any cache/venv paths to that user, and add a final USER bionemo line. Locate
the Dockerfile sections around WORKDIR /workspace/bionemo and COPY . . and
insert user creation (useradd/addgroup or groupadd) and chown before switching
context, then add USER bionemo at the end so subsequent runtime processes run
unprivileged.

Comment on lines 36 to 40
if [ "$HEADLESS" = true ]; then
DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
else
DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
fi
Contributor


⚠️ Potential issue | 🟡 Minor

Hardcoded --name vllm_dev prevents running multiple containers and shadows the $CONTAINER argument's meaning.

The script's positional argument ($CONTAINER) is the Docker image name, but the running container name is always vllm_dev. This has two problems:

  1. Starting a second instance (e.g., a second headless container) will fail with docker: Error response from daemon: Conflict. The container name "/vllm_dev" is already in use.
  2. The usage comment says <container_name> but it's really <image_name>, which is confusing.

Consider using $CONTAINER as both the image name and the container name (or derive the container name from it):

🔧 Proposed fix
-DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
+DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
-DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
+DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"

Also update the usage comment on line 4:

-# Usage: ./launch.sh <container_name> [--mount_dir] [--headless]
+# Usage: ./launch.sh <image_name> [--mount_dir] [--headless]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 if [ "$HEADLESS" = true ]; then
-DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
+DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
 else
-DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
+DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/launch.sh` around lines 36 - 40, The script hardcodes
"--name vllm_dev" in DOCKER_CMD which prevents multiple instances and
misrepresents the positional $CONTAINER argument; update the usage comment to
clarify that the positional argument is the image name (or image_name) and
change the DOCKER_CMD assignments in the HEADLESS branch to use a container name
derived from $CONTAINER (for example --name "$CONTAINER" or --name
"${CONTAINER}_dev") instead of "vllm_dev" so the image argument is also used as
the container name and avoids name collisions.

docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm -v ${PWD}:/workspace/bionemo vllm /bin/bash
```

or use `launch.sh`.

⚠️ Potential issue | 🟡 Minor

launch.sh mounts to a different container path than the manual docker run command — the README's test instruction will break after using launch.sh --mount_dir.

The manual command (line 14) mounts ${PWD} to /workspace/bionemo, which is the container's WORKDIR, so python test_esm2_golden_values.py works directly. However, launch.sh --mount_dir mounts the project root (two levels up) to /workspace/bionemo-framework, leaving /workspace/bionemo populated only by the image's COPY . . layer. Running python test_esm2_golden_values.py from WORKDIR after using --mount_dir will fail unless the user navigates to the correct subdirectory. The "or use launch.sh" phrasing implies equivalence — either clarify the path difference or update launch.sh to mount to /workspace/bionemo instead of /workspace/bionemo-framework.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/README.md` at line 17, The README's "or use `launch.sh`"
is misleading because `launch.sh --mount_dir` mounts the repo to
`/workspace/bionemo-framework` while the manual `docker run` mounts to
`/workspace/bionemo`, causing `python test_esm2_golden_values.py` to fail when
run from WORKDIR; fix by either (A) update `launch.sh` to mount the project root
into `/workspace/bionemo` instead of `/workspace/bionemo-framework` (adjust
mount target and any downstream path references in launch.sh), or (B) update
README.md to explicitly document that `launch.sh --mount_dir` mounts at
`/workspace/bionemo-framework` and instruct users to cd into the correct
subdirectory (where `test_esm2_golden_values.py` lives) before running the test;
mention the affected files `launch.sh`, `README.md`, and
`test_esm2_golden_values.py` so reviewers can locate the changes.
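Option (A) might look like the following inside launch.sh. This is a hypothetical sketch: the $MOUNT_DIR flag and $DOCKER_CMD variable are assumptions about the script's structure, not its actual contents.

```shell
#!/bin/sh
# Hypothetical sketch only -- variable names are assumed, not taken from launch.sh.
MOUNT_DIR=true
DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm"
if [ "$MOUNT_DIR" = true ]; then
  # Mount the recipe directory at the image's WORKDIR so that
  # `python test_esm2_golden_values.py` works from the container's default cwd.
  DOCKER_CMD="$DOCKER_CMD -v ${PWD}:/workspace/bionemo"
fi
echo "$DOCKER_CMD"
```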

Comment on lines 21 to 23
```python
python test_esm2_golden_values.py
```

⚠️ Potential issue | 🟡 Minor

Wrong code fence language identifier — should be bash, not python.

The python test_esm2_golden_values.py command is a shell invocation, not Python source code. Using ```python causes syntax highlighters to misparse it.

📝 Proposed fix

-```python
+```bash
 python test_esm2_golden_values.py
 ```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/README.md` around lines 21 - 23, Change the fenced code
block that currently uses the wrong language tag; locate the block containing
the shell command "python test_esm2_golden_values.py" in the README and replace
the opening fence language identifier from python to bash so the command
is treated as a shell snippet and highlighted correctly.


Comment on lines 147 to 210
if __name__ == "__main__":
print(f"GPUs: {torch.cuda.device_count()}")

# Step 0: fresh export (facebook HF -> our TE format)
print("\n[0/3] Exporting checkpoint ...")
MODEL_ID = fresh_export()

print(f"MODEL_ID: {MODEL_ID}")
print(f"REFERENCE_MODEL_ID: {REFERENCE_MODEL_ID}")
print(f"Sequences: {len(SEQUENCES)}")

# 1) vLLM on exported model
print("\n[1/3] vLLM inference (exported model) ...")
emb_vllm = vllm_embed(MODEL_ID, SEQUENCES)

# 2) HuggingFace on exported model
print("\n[2/3] HuggingFace inference (exported model) ...")
emb_hf_exported = hf_embed(MODEL_ID, SEQUENCES)

# 3) HuggingFace on reference Hub model
print("\n[3/3] HuggingFace inference (reference model) ...")
emb_hf_reference = hf_embed(REFERENCE_MODEL_ID, SEQUENCES)

# ---- Pairwise comparisons ----
pairs = [
("vLLM (exported)", "HF (exported)", emb_vllm, emb_hf_exported),
("vLLM (exported)", "HF (reference)", emb_vllm, emb_hf_reference),
("HF (exported)", "HF (reference)", emb_hf_exported, emb_hf_reference),
]

# ---- Summary table ----
header = f"{'Pair':<35} {'max |diff|':>14} {'mean |diff|':>14} {'cos sim':>12} {'exact':>7}"
sep = "-" * len(header)
print(f"\n{sep}")
print(header)
print(sep)

for name_a, name_b, a, b in pairs:
diffs = np.abs(a.astype(np.float64) - b.astype(np.float64))
label = f"{name_a} vs {name_b}"
exact = np.array_equal(a, b)
print(
f"{label:<35} {diffs.max():>14.8e} {diffs.mean():>14.8e} "
f"{cosine_sim(a, b):>12.10f} {'YES' if exact else 'NO':>7}"
)

print(sep)
print(f"Tolerance: rtol={RTOL}, atol={ATOL} (0 = exact match required)")

# Per-sequence breakdown
short = {"vLLM (exported)": "vllm", "HF (exported)": "hf_exp", "HF (reference)": "hf_ref"}
print("\nPer-sequence max |diff|:")
for i in range(len(SEQUENCES)):
row = f" seq {i}:"
for name_a, name_b, a, b in pairs:
d = float(np.abs(a[i].astype(np.float64) - b[i].astype(np.float64)).max())
row += f" {short[name_a]}_vs_{short[name_b]}={d:.8e}"
print(row)

print(sep)

# Cleanup
if torch.distributed.is_initialized():
torch.distributed.destroy_process_group()

⚠️ Potential issue | 🟠 Major

Test script has no assertions — it will never fail.

This "test" prints a comparison table but never asserts that results are within tolerance. RTOL and ATOL (line 74) are defined but unused. A silent pass regardless of output drift defeats the purpose of a golden-value test.

Add assertions after the comparison loop, for example:

Suggested assertion block
+    all_passed = True
     for name_a, name_b, a, b in pairs:
         diffs = np.abs(a.astype(np.float64) - b.astype(np.float64))
         label = f"{name_a}  vs  {name_b}"
         exact = np.array_equal(a, b)
         print(
             f"{label:<35} {diffs.max():>14.8e} {diffs.mean():>14.8e} "
             f"{cosine_sim(a, b):>12.10f} {'YES' if exact else 'NO':>7}"
         )
+        if not np.allclose(a, b, rtol=RTOL, atol=ATOL):
+            all_passed = False
 
     print(sep)
-    print(f"Tolerance: rtol={RTOL}, atol={ATOL} (0 = exact match required)")
+    print(f"Tolerance: rtol={RTOL}, atol={ATOL}")
+
+    assert all_passed, "Golden value comparison failed — see table above for details."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/test_esm2_golden_values.py` around lines 147 - 210, The
script currently only prints comparisons and never fails; add assertions that
enforce the golden-value tolerances using RTOL and ATOL: for each pair in pairs
(refer to the pairs list and variables a, b), assert np.allclose(a, b,
rtol=RTOL, atol=ATOL) (or equivalently assert (np.abs(a-b) <= ATOL + RTOL *
np.abs(b)).all()) and fail the test if not, and also assert cosine_sim(a, b)
exceeds a sensible threshold or that exact is True when ATOL/RTOL are zero; add
per-sequence assertions inside the per-sequence loop to ensure each sequence
max-diff <= ATOL + RTOL * max(|b_i|) so the test fails on unacceptable drift.
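The tolerance gate the prompt asks for can be sketched independently of the script. The RTOL/ATOL values below are illustrative, not the ones defined in test_esm2_golden_values.py:

```python
import numpy as np

# Illustrative tolerances -- not the script's actual RTOL/ATOL values.
RTOL, ATOL = 1e-5, 1e-6

def failing_pairs(pairs):
    """Return labels of embedding pairs that exceed tolerance, so the caller
    can assert the list is empty instead of passing silently."""
    return [
        f"{name_a} vs {name_b}"
        for name_a, name_b, a, b in pairs
        if not np.allclose(a, b, rtol=RTOL, atol=ATOL)
    ]

a = np.zeros((2, 4))
b = a + 1e-7  # within atol
c = a + 1e-3  # well outside tolerance
assert failing_pairs([("vLLM", "HF", a, b)]) == []
assert failing_pairs([("vLLM", "HF", a, c)]) == ["vLLM vs HF"]
```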

Comment on lines 77 to 78
# only needed at runtime for FP8 training efficiency; users who train with FP8 pass
# padded_vocab_size explicitly. Keeping vocab_size-sized weights in the checkpoint

users who train with FP8 pass padded_vocab_size explicitly

i'm not sure that's a great assumption 😆. CC @jomitchellnv


yea that's not true, users who train with FP8 don't pass this

Comment on lines 104 to 106
add_pooling_layer: Whether the base model should include a pooling layer. Set to
``False`` for exported checkpoints that are saved from ``NVEsmForMaskedLM``
(which does not use a pooler). This avoids missing-weight errors in vLLM.

why don't we just make the default false and omit the config edit in export.py?

"""NVEsmForMaskedLM is a TransformerEngine-optimized ESM model for masked language modeling."""

_tied_weights_keys: ClassVar[dict[str, str]] = {"lm_head.decoder.weight": "esm.embeddings.word_embeddings.weight"}
_do_not_quantize = ("lm_head.dense", "lm_head.decoder") # Flag for testing that these layers are not quantized.

you're deleting _do_not_quantize? we need that

Comment on lines 514 to 525
with transformer_engine.pytorch.autocast(enabled=False):
with transformer_engine.pytorch.fp8_autocast(enabled=False):

wait, why -- fp8_autocast is deprecated

Comment on lines 486 to 505
with transformer_engine.pytorch.quantized_model_init(enabled=False):
self.dense = transformer_engine.pytorch.Linear(
config.hidden_size,
config.hidden_size,
params_dtype=config.dtype,
device="meta" if torch.get_default_device() == torch.device("meta") else "cuda",
init_method=lambda x: torch.nn.init.normal_(x, mean=0.0, std=config.initializer_range),
)
self.dense = transformer_engine.pytorch.Linear(
config.hidden_size,
config.hidden_size,
params_dtype=config.dtype,
device="meta" if torch.get_default_device() == torch.device("meta") else "cuda",
init_method=lambda x: torch.nn.init.normal_(x, mean=0.0, std=config.initializer_range),
)

with transformer_engine.pytorch.fp8_model_init(enabled=False):

why? this also reverts back to a deprecated context manager, and i think we want that dense layer not to be quantized IIUC

sample_layers = [
model.esm.encoder.layers[0].self_attention.core_attention,
model.esm.encoder.layers[0].self_attention.layernorm_qkv,
model.model.encoder.layers[0].self_attention.core_attention,

why the rename here from esm -> models?

Comment on lines 454 to 466
with transformer_engine.pytorch.autocast(enabled=False):
prediction_scores = self.lm_head(sequence_output)
prediction_scores = self.lm_head(sequence_output)

we can't just remove this, this is important


i don't think we want this folder at all. At the very least it won't work with our current CI

mask_ratio_observed = n_masked_per_seq.float() / src_lengths
scale_factor = (1 - mask_ratio_train) / (1 - mask_ratio_observed)
reshaped_scale_factor = torch.repeat_interleave(scale_factor, src_lengths_padded, dim=0)
reshaped_scale_factor = torch.repeat_interleave(scale_factor, src_lengths, dim=0)

why the change here?
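For context on what this diff changes, here is a numpy analogue of the expansion step (np.repeat mirrors torch.repeat_interleave along dim 0); whether the padded or unpadded lengths are correct depends on whether the token stream being scaled still contains padding:

```python
import numpy as np

# One scale factor per sequence (illustrative values, not from the model).
scale_factor = np.array([1.25, 0.9, 1.1])
# Unpadded token counts per sequence, as in src_lengths.
src_lengths = np.array([3, 2, 4])

# Expand to one scale value per token of a packed, unpadded token stream;
# using padded lengths here would misalign scales with tokens.
per_token_scale = np.repeat(scale_factor, src_lengths)

assert per_token_scale.shape == (src_lengths.sum(),)  # 9 tokens total
assert np.allclose(per_token_scale[:3], 1.25)         # first sequence's tokens
```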

# Keep the last layers of the network in higher precision to avoid numerical instability.
# Please see recipes/fp8_analysis/README.md for more details.
with transformer_engine.pytorch.autocast(enabled=False):
with transformer_engine.pytorch.fp8_autocast(enabled=False):

this is incorrect. autocast is the API now. It covers FP8/FP4 autocast.

self.micro_batch_size = micro_batch_size
self.max_seq_length = max_seq_length
self.attn_mask_type = attn_mask_type
self.add_pooling_layer = add_pooling_layer

why does adding an inference script require us to add a pooling layer?

"""NVEsmForMaskedLM is a TransformerEngine-optimized ESM model for masked language modeling."""

_tied_weights_keys: ClassVar[dict[str, str]] = {"lm_head.decoder.weight": "esm.embeddings.word_embeddings.weight"}
_do_not_quantize = ("lm_head.dense", "lm_head.decoder") # Flag for testing that these layers are not quantized.

we need this

sequence_output = outputs[0]
with transformer_engine.pytorch.autocast(enabled=False):
prediction_scores = self.lm_head(sequence_output)
prediction_scores = self.lm_head(sequence_output)

we still need `with transformer_engine.pytorch.autocast(enabled=False):`


Otherwise, when we're doing FP8, it's going to look up a context block and see that we're in FP8 and it's going to cast these projections to FP8

device="meta" if torch.get_default_device() == torch.device("meta") else "cuda",
init_method=lambda x: torch.nn.init.normal_(x, mean=0.0, std=config.initializer_range),
)
self.dense = transformer_engine.pytorch.Linear(

We still need `with transformer_engine.pytorch.quantized_model_init(enabled=False)`

# Keep the last layers of the network in higher precision to avoid numerical instability.
# Please see recipes/fp8_analysis/README.md for more details.
with transformer_engine.pytorch.autocast(enabled=False):
with transformer_engine.pytorch.fp8_autocast(enabled=False):

not correct. need to keep the older version of this autocast not fp8_autocast

@jomitchellnv left a comment

I am wondering if we can not modify existing files inside bionemo-recipes/models/esm2 and instead just make a new recipe for vLLM stuff and place it in there. Is there some requirement from vLLM that requires us to rename ModuleLists from ESM -> Model?

Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
@gagank1 commented Feb 25, 2026

/ok to test a67df14

Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
@gagank1 commented Feb 25, 2026

/ok to test 36cdbb2

@gagank1 added the ciflow:all-recipes label (Run tests for all recipes under bionemo-recipes, or enforce running all tests for recipes) Feb 25, 2026
@gagank1 commented Feb 25, 2026

/ok to test 36cdbb2
