
[Question]: Make Sharing.MPS.FailRequestsGreaterThanOne authoritative #1695

@radu-malliu

Currently, Sharing.MPS.FailRequestsGreaterThanOne is accepted as a valid config item, but its value is ignored.

We have a use case where a workload needs to request all GPUs available on a node. The device plugin's promises about what such a request means in terms of time-on-device have no bearing here, because we know this workload will have exclusive use of the node while it runs. The only requirement is to be able to request multiple (all) replicas in order to get access to all GPUs.

Based on this, is there interest in making the value of Sharing.MPS.FailRequestsGreaterThanOne authoritative when MPS is enabled, instead of ignoring it? Happy to open a PR for the change if so.
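
To make the question concrete, here is a minimal sketch of the two behaviours. It is not the device plugin's actual code: the types `Config`, `Sharing`, and `ReplicatedResources` and the function `allowRequest` are hypothetical, simplified stand-ins. It also assumes that "ignored" means the plugin acts as if the flag were always true when MPS is enabled, which is my reading of the current behaviour.

```go
package main

import "fmt"

// ReplicatedResources is a hypothetical, simplified stand-in for the
// MPS sharing section of the config. Only the field under discussion is kept.
type ReplicatedResources struct {
	FailRequestsGreaterThanOne bool
}

// Sharing holds the (hypothetical) MPS settings; nil means MPS is disabled.
type Sharing struct {
	MPS *ReplicatedResources
}

// Config is a hypothetical top-level config wrapper.
type Config struct {
	Sharing Sharing
}

// allowRequest reports whether a request for n replicas would be admitted.
// With authoritative=false it models the assumed current behaviour: the
// configured value is ignored and any request for more than one replica is
// rejected when MPS is enabled. With authoritative=true it models the
// proposal: the configured value decides.
func allowRequest(cfg Config, n int, authoritative bool) bool {
	if cfg.Sharing.MPS == nil || n <= 1 {
		return true
	}
	if !authoritative {
		return false
	}
	return !cfg.Sharing.MPS.FailRequestsGreaterThanOne
}

func main() {
	// The user explicitly opts out of failing multi-replica requests.
	cfg := Config{Sharing: Sharing{MPS: &ReplicatedResources{FailRequestsGreaterThanOne: false}}}

	fmt.Println("assumed current behaviour, 8 replicas:", allowRequest(cfg, 8, false))
	fmt.Println("proposed behaviour, 8 replicas:", allowRequest(cfg, 8, true))
}
```

Under the proposal, a config that leaves the flag set to true would behave the same as it does today, so only users who explicitly set it to false would see a change.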
