feat: Install Nvidia DOCA on the servers post provisioning #2219
glimchb wants to merge 1 commit into dell:devel from
d4ab28d to 508e321
    # Absolute path to local copy of .tgz file containing DOCA package.
    # The package can be downloaded from https://developer.nvidia.com/networking/doca/
    # Optional variable.
    nvidia_doca_offline_path: ""
Question: during testing I see a mix of `nvidia_doca_path` and `nvidia_doca_offline_path`; I need to review this in more detail.
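One possible way to remove the ambiguity, sketched under assumptions: resolve both inputs into a single fact early in the role, giving `nvidia_doca_offline_path` precedence. The fact name `doca_source_path` is hypothetical, and letting the offline path win is an assumption, not something this PR specifies.

```yaml
# Hypothetical sketch: doca_source_path is an assumed name, not part of this PR.
# Jinja filters bind tighter than "or", so each input is defaulted to ""
# first, and "or" then yields the first non-empty value.
- name: Resolve the DOCA package path from the two inputs
  ansible.builtin.set_fact:
    doca_source_path: "{{ nvidia_doca_offline_path | default('', true) or nvidia_doca_path | default('', true) }}"
```

Downstream tasks would then test only the resolved fact, for example `when: doca_source_path | length > 0`.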
    # Usage: configure_doca.yml
    doca_tmp_path: /tmp/doca
    doca_core_path: /install/doca/x86_64/doca-core
    doca_deps_path: /install/doca/x86_64/doca-deps
Need to review this section; there are too many parameters.
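A sketch of one way to trim the parameter count, assuming `doca_core_path` and `doca_deps_path` always share the same parent directory: derive both from a single base. `doca_install_base` is a hypothetical name; the rendered values equal the ones in the diff.

```yaml
# Usage: configure_doca.yml
doca_tmp_path: /tmp/doca
# Hypothetical base variable; the two paths below are derived from it.
doca_install_base: /install/doca/x86_64
doca_core_path: "{{ doca_install_base }}/doca-core"
doca_deps_path: "{{ doca_install_base }}/doca-deps"
```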
    # limitations under the License.
    ---

    - name: Delete doca repo folders
Need to review this entire file; it looks like a copy-paste from the CUDA code and needs more attention.
    - name: Check nodes having Infiniband Support
      hosts: all
      tasks:
Q: this file is missing the code that actually starts the DOCA installation from the `nvidia_doca` role.
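A minimal sketch of the missing hand-off, assuming the Infiniband check stores its result in a boolean fact. `has_infiniband` is a hypothetical name; only the role name `nvidia_doca` comes from this PR.

```yaml
- name: Check nodes having Infiniband Support
  hosts: all
  tasks:
    # ... existing Infiniband detection tasks ...

    # has_infiniband is a hypothetical fact assumed to be set by the tasks above.
    - name: Install DOCA through the nvidia_doca role
      ansible.builtin.include_role:
        name: nvidia_doca
      when: has_infiniband | default(false) | bool
```

`include_role` is used rather than `import_role` so the `when` condition is evaluated once on the include at runtime, instead of being copied onto every task inside the role.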
    block:
      - name: Install packages from doca rpm file
        ansible.builtin.yum:
          name: "{{ doca_filepath }}"
Q: I need to understand the role of NFS here, and how `nvidia_doca_path` relates to `doca_filepath`.
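If the intent is to avoid depending on an NFS mount, one alternative is to copy the package to each node and install it from the local copy. This is a sketch assuming `nvidia_doca_path` points at a DOCA `.rpm` on the Ansible control node; it replaces `doca_filepath` with a path derived from `nvidia_doca_path`, which may not match the NFS layout this PR intends.

```yaml
# Assumes nvidia_doca_path is a .rpm file on the Ansible control node.
- name: Copy the DOCA rpm to the target node
  ansible.builtin.copy:
    src: "{{ nvidia_doca_path }}"
    # Trailing slash: copy into this directory, creating it if it does not exist.
    dest: "{{ doca_tmp_path }}/"
    mode: "0644"

- name: Install packages from doca rpm file
  ansible.builtin.yum:
    name: "{{ doca_tmp_path }}/{{ nvidia_doca_path | basename }}"
    state: present
```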
Signed-off-by: Boris Glimcher <Boris.Glimcher@emc.com>
    - name: Include vars file of inventory role
      ansible.builtin.include_vars: "{{ role_path }}/../../../input/network_config.yml"

    # - name: Check status of doca installation
Do we need this, or can it be removed?
    os_supported_rocky: "rocky"
    os_supported_rhel: "redhat"

    doca_repo_url: "https://linux.mellanox.com/public/repo/doca/{{ nvidia_doca_version }}/rhel/{{ compute_os_version }}/x86_64"
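The correct URL looks like https://linux.mellanox.com/public/repo/doca/2.5.0/rhel8.0/x86_64/ (the distro name and version form a single path segment, `rhel8.0`, rather than `rhel/<version>`).
Please replace `rhel` with a variable so this can be used with other distros.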
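A sketch of the requested change, assuming `compute_os_version` holds a value such as `8.0`. `doca_os_distro` is a hypothetical variable name; with `nvidia_doca_version: 2.5.0` this renders the example URL above. The segment values for other distros are not verified here.

```yaml
# Hypothetical variable, set per distro; the distro name and version
# are concatenated into one path segment (e.g. rhel8.0).
doca_os_distro: "rhel"
doca_repo_url: "https://linux.mellanox.com/public/repo/doca/{{ nvidia_doca_version }}/{{ doca_os_distro }}{{ compute_os_version }}/x86_64/"
```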
    when: nvidia_doca_path | default("", true) | length > 0

    # - name: Validate nvidia_doca_version
    #   ansible.builtin.assert:
Do we need this code, or can it be removed?
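If the validation is kept, a minimal uncommented form might look like this. It assumes DOCA versions always have three numeric components such as `2.5.0`; that format is an assumption, not something this PR enforces.

```yaml
- name: Validate nvidia_doca_version
  ansible.builtin.assert:
    that:
      - nvidia_doca_version is defined
      # Cast to string so an unquoted YAML value such as 2.5 (a float) does not break the regex test.
      - (nvidia_doca_version | string) is match('^[0-9]+[.][0-9]+[.][0-9]+$')
    fail_msg: "nvidia_doca_version must look like 2.5.0, got '{{ nvidia_doca_version | default('') }}'"
```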
Can one of the admins verify this patch?
Issues Resolved by this Pull Request
Fixes #
Description of the Solution
If `nvidia_doca_path` is provided in `input/provision_config.yml` and Nvidia DPUs are available on the target nodes, DOCA packages will be deployed post provisioning without user intervention.
`network.yml` after provisioning the servers (assuming the provision tool did not install DOCA packages).
From Nvidia documentation:
Suggested Reviewers
@sujit-jadhav