add additional health checks #1623

Open: aburan28 wants to merge 1 commit into NVIDIA:main from aburan28:adam--add-other-healthchecks
Conversation

aburan28 commented Feb 16, 2026

Add health checks for retired memory pages and GPU temperature.

Signed-off-by: Adam Buran <aburan28@gmail.com>
copy-pr-bot Bot commented Feb 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

aburan28 marked this pull request as ready for review February 16, 2026 16:59
rajatchopra (Contributor) commented

This PR adds a very useful set of health checks, which we should definitely include in the device plugin.

However, there is an ongoing effort to obtain GPU health from a comprehensive tool (like NvSentinel) through a common API.
cc @ArangoGutierrez @dims @lalitadithya

2 participants