[WIP] OSDOCS-20033: Kueue 1.4 and DRA #113996
Conversation
|
@StephenJamesSmith: This pull request references OSDOCS-20033 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target only the "5.0.0" version, but multiple target versions were set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
🤖 Tue Jun 30 19:46:17 - Prow CI generated the docs preview: https://113996--ocpdocs-pr.netlify.app/openshift-enterprise/latest/ai_workloads/kueue/kueue-dra-integrating-dynamic-resource-allocation.html |
|
|
||
| * Validation using dra-example-driver and nvidia-dra-driver. | ||
|
|
||
| .Prerequisites |
There was a problem hiding this comment.
🤖 [error] AsciiDocDITA.BlockTitle: Block titles can only be assigned to examples, figures, and tables in DITA.
10075f1 to
04ba526
Compare
There was a problem hiding this comment.
Did a first pass; good start on the structure. A few things need to be adapted for OCP, though; I left inline comments on each. Also, on OCP, the DRA config (feature gates, deviceClassMappings) goes through the Kueue CR rather than raw Configuration YAML. @PannagaRao, could you share the HackMD docs we created for Alice on how that works so we can update the examples to reflect what users actually do?
| // * ai_workloads/kueue/kueue-dra-integrating-dynamic-resource-allocation.adoc | ||
|
|
||
| :_mod-docs-content-type: PROCEDURE | ||
| [id="kueue-dra-partionable-devices_{context}"] |
There was a problem hiding this comment.
| [id="kueue-dra-partionable-devices_{context}"] | |
| [id="kueue-dra-partitionable-devices_{context}"] |
There was a problem hiding this comment.
ugh. sorry. good catch. fixed.
|
|
||
| * Verification of partition capacity reclaim after workload completion. | ||
|
|
||
| * Validation using dra-example-driver and nvidia-dra-driver. |
There was a problem hiding this comment.
dra-example-driver is an upstream test fixture, not something OCP users would install. OpenShift docs should reference the supported NVIDIA DRA driver for OCP, not upstream test tools. Drop dra-example-driver entirely.
| * Validation using dra-example-driver and nvidia-dra-driver. | ||
|
|
||
| .Prerequisites | ||
| * Kueue is installed. |
There was a problem hiding this comment.
| * Kueue is installed. | |
| * {kueue-name} is installed. |
|
|
||
| .Prerequisites | ||
| * Kueue is installed. | ||
| * A Kubernetes cluster running version 1.34 or later. |
There was a problem hiding this comment.
It should be {product-title} 4.21 or later.
There was a problem hiding this comment.
s/A Kubernetes cluster running version 1.34 or later. / {product-title} running version 4.21 or later.
| .Prerequisites | ||
| * Kueue is installed. | ||
| * A Kubernetes cluster running version 1.34 or later. | ||
| * A DRA driver installed in the cluster, for example, `dra-example-driver`` for testing, or a vendor driver such as NVIDIA `k8s-dra-driver-gpu` for production. |
There was a problem hiding this comment.
Drop dra-example-driver completely.
There was a problem hiding this comment.
Replaced with "A DRA driver installed in the cluster, for example, nvidia-dra-driver or k8s-dra-driver-gpu."
| metadata: | ||
| name: gpu.example.com | ||
| spec: | ||
| extendedResourceName: example.com/gpu |
There was a problem hiding this comment.
| extendedResourceName: example.com/gpu | |
| extendedResourceName: nvidia.com/gpu |
| + | ||
| [source,terminal] | ||
| ---- | ||
| $ oc apply -f https://kueue.sigs.k8s.io/examples/dra/sample-dra-queues.yaml |
There was a problem hiding this comment.
I think you might need to change this as per OCP docs. We can't use the upstream example as-is.
There was a problem hiding this comment.
I think the oc apply command itself is fine; it's the upstream manifest URL that needs to change. Instead of pointing to kueue.sigs.k8s.io, inline the YAML directly in the doc and have the user create a local file, something like:
. Create a file called `cluster-queue.yaml` with the following content:
+
[source,yaml]
----
<the YAML here>
----
. Run the following command to apply the configuration:
+
[source,terminal]
----
$ oc apply -f cluster-queue.yaml
----
That way users don't need external network access and the YAML stays under our control if anything changes upstream.
There was a problem hiding this comment.
Made these changes.
| = Configuring the partionable devices | ||
|
|
||
| [role="_abstract"] | ||
| Use this procedure when your cluster has partitionable devices and you want quota to reflect actual device capacity rather than device count. This requires Kubernetes 1.35+ with the `DRAPartitionableDevices` feature gate enabled and a DRA driver that publishes `consumesCounters` in `ResourceSlice` objects. |
There was a problem hiding this comment.
{product-title} 4.22 or later. PD is beta in k8s 1.36 (OCP 4.22), not 1.35
There was a problem hiding this comment.
s / "This requires Kubernetes 1.35+ with the DRAPartitionableDevices feature gate enabled and a DRA driver that publishes consumesCounters in ResourceSlice objects." / "This requires {product-title} 4.22 or later."
We can rework this if we feel that mentioning "Kubernetes 1.36 with the DRAPartitionableDevices feature gate enabled and a DRA driver that publishes consumesCounters in ResourceSlice objects" is necessary.
| + | ||
| [source,terminal] | ||
| ---- | ||
| $ oc apply -f https://kueue.sigs.k8s.io/examples/dra/sample-dra-counter-queues.yaml |
There was a problem hiding this comment.
I would stick with inline YAML instead of upstream URL
There was a problem hiding this comment.
@sohankunkerkar Do you mean delete the oc command and use the following yaml?
|
|
||
| .Procedure | ||
|
|
||
| . Add a `deviceClassMappings`` entry to the {kueue-name} configuration that maps each `DeviceClass` to a logical resource name for quota, as shown in the following example: |
There was a problem hiding this comment.
| . Add a `deviceClassMappings`` entry to the {kueue-name} configuration that maps each `DeviceClass` to a logical resource name for quota, as shown in the following example: | |
| . Add a `deviceClassMappings` entry to the {kueue-name} configuration that maps each `DeviceClass` to a logical resource name for quota, as shown in the following example: |
04ba526 to
9e450d7
Compare
717dfe5 to
c266442
Compare
|
@sohankunkerkar @anahas-redhat Please review latest changes and |
anahas-redhat
left a comment
There was a problem hiding this comment.
@StephenJamesSmith below you can find my comments.
| @@ -0,0 +1,154 @@ | |||
| // Module included in the following assemblies: | |||
There was a problem hiding this comment.
I guess we have two documents with similar names:
- kueue-dra-partionable-devices.adoc
- kueue-dra-partitionable-devices.adoc
I'm assuming that "kueue-dra-partionable-devices.adoc" is the one to be excluded, so my comments are on the second .adoc file. Please let me know otherwise.
There was a problem hiding this comment.
First document has been deleted and not included in the build.
|
|
||
| include::modules/kueue-dra-resourceclaimtemplates.adoc[leveloffset=+2] | ||
|
|
||
| // include::modules/kueue-dra-deviceclasses.adoc[leveloffset=+2] |
There was a problem hiding this comment.
Should this line be commented out?
There was a problem hiding this comment.
Yes, it's a file that wasn't needed. I'm removing it from the assembly file.
|
|
||
| .Prerequisites | ||
| * {kueue-name} is installed. | ||
| * {product-title} running version 4.21 or later. |
There was a problem hiding this comment.
Partitionable Devices requires OCP 4.22+ (K8s 1.35 with DRAPartitionableDevices gate). The PD module itself says "4.22 or later" — contradicting this prerequisite. The prerequisite should either list both versions or note the PD exception.
There was a problem hiding this comment.
Added a Note: "To use partitionable devices, you need {product-title} 4.22 or later. "
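For reference, the version split discussed in this thread (4.21 or later for DRA quota management, 4.22 or later for partitionable devices) can be sketched as a small shell check. This is only an illustration: the helper name `version_at_least` and the hard-coded `cluster_version` value are assumptions, not something taken from the docs or the operator.

```shell
# Hypothetical helper: succeeds when version $1 (MAJOR.MINOR[.PATCH]) is at
# least version $2, comparing only the major and minor components.
version_at_least() {
  have_major=${1%%.*}; have_rest=${1#*.}; have_minor=${have_rest%%.*}
  need_major=${2%%.*}; need_rest=${2#*.}; need_minor=${need_rest%%.*}
  [ "$have_major" -gt "$need_major" ] ||
    { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

# Assumed value for illustration; on a real cluster it would be read from
# the cluster version instead of being hard-coded.
cluster_version="4.21"

version_at_least "$cluster_version" "4.21" && echo "DRA quota management: supported"
version_at_least "$cluster_version" "4.22" || echo "Partitionable devices: needs 4.22 or later"
```

With `cluster_version` set to `4.21`, both messages are printed, which is the case the added note is meant to cover.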
|
|
||
| .Procedure | ||
|
|
||
| . Enable the feature gates by installing or reconfiguring {kueue-name} with both feature gates enabled, as shown in the following example: |
There was a problem hiding this comment.
I think the users won't need to enable any feature gate by themselves. Good to double-check that info with @sohankunkerkar
There was a problem hiding this comment.
That’s correct. We explicitly enable them in the operator ConfigMap, so no additional action is required from the user side.
There was a problem hiding this comment.
@anahas-redhat Should this step be removed?
There was a problem hiding this comment.
Removed the step.
|
|
||
| DRA is a Kubernetes framework that manages specialized hardware resources such as GPUs with fine-grained control. Unlike traditional resource requests, DRA allows dynamic prioritization—allocating GPUs to high-priority AI training workloads during business hours, then reallocating them to cost-optimized batch jobs overnight. | ||
|
|
||
| You can validate partitionable devices support in {kueue-name} Dynamic Resource Allocation (DRA) integration, covering partition-aware quota, admission, and scheduling. Partitionable devices, such as NVIDIA MIG, allow graphics processing units (GPUs) to be dynamically subdivided into smaller allocations. {kueue-name} must correctly handle quota accounting for these mutually exclusive partition configurations. |
There was a problem hiding this comment.
It's good that you've mentioned Partitionable Devices here. Can you also mention something about Structured Parameters and Extended Resources? They are also "features" or "resources" provided by DRA.
There was a problem hiding this comment.
I added some text for these.
| + | ||
| [source,yaml] | ||
| ---- | ||
| apiVersion: config.kueue.x-k8s.io/v1beta2 |
There was a problem hiding this comment.
On OCP with the Kueue Operator, the user cannot create or apply a Configuration object directly. The operator owns it — it generates the Configuration from the Kueue CR and writes it into a ConfigMap that the controller reads. If a user tries to oc apply that YAML, there's no CRD for config.kueue.x-k8s.io/v1beta2 Configuration — it's not a Kubernetes resource you create, it's an embedded config format.
A possible way to set deviceClassMappings on OCP is through the Kueue CR:
oc patch kueue cluster --type=merge -p '{
  "spec": {
    "config": {
      "resources": {
        "deviceClassMappings": [{
          "name": "nvidia.com/gpu",
          "deviceClassNames": ["gpu.nvidia.com"]
        }]
      }
    }
  }
}'
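If it helps when drafting the procedure, the same patch body can be generated from two values so it can be reviewed before it is applied. This is a minimal sketch: the function name `build_dcm_patch` is hypothetical, and the JSON shape is copied from the `oc patch` command above rather than from a published schema.

```shell
# Hypothetical helper: prints a merge patch containing one deviceClassMappings entry.
# $1 = logical resource name used for quota, $2 = DeviceClass name.
build_dcm_patch() {
  printf '{"spec":{"config":{"resources":{"deviceClassMappings":[{"name":"%s","deviceClassNames":["%s"]}]}}}}\n' "$1" "$2"
}

# Print the patch body only; nothing is sent to the cluster here.
build_dcm_patch "nvidia.com/gpu" "gpu.nvidia.com"
```

The output could then be passed to the command above, for example `oc patch kueue cluster --type=merge -p "$(build_dcm_patch nvidia.com/gpu gpu.nvidia.com)"`. Note that a JSON merge patch replaces a list as a whole, so existing `deviceClassMappings` entries would likely need to be included in the patch as well.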
There was a problem hiding this comment.
// Module included in the following assemblies:
//
// * ai_workloads/kueue/kueue-dra-integrating-dynamic-resource-allocation.adoc
:_mod-docs-content-type: PROCEDURE
[id="kueue-dra-partitionable-devices_{context}"]
= Configuring partitionable devices
[role="_abstract"]
You can configure {kueue-name} to manage quota for partitionable devices based on actual device capacity rather than device count. Partitionable devices, such as NVIDIA Multi-Instance GPU (MIG) capable GPUs like the A100 or H100, allow a single GPU to be dynamically subdivided into smaller partitions.
When counter-based quota is configured, {kueue-name} charges quota in capacity units such as GPU memory rather than counting whole devices. For example, a `1g.5gb` MIG partition on an A100-40GB charges `4864Mi` of GPU memory quota, while a whole GPU charges `40320Mi`.
.Prerequisites
* You have cluster administrator permissions.
* You have installed {kueue-name} by using the {kueue-op}.
* You have created a `Kueue` CR.
* Your cluster is running {product-title} 4.22 or later.
* A DRA driver that publishes `consumesCounters` in `ResourceSlice` objects is installed, for example, `nvidia-dra-driver`.
* MIG is enabled on the GPU hardware.
* You have enabled the `DRAPartitionableDevices` Kubernetes feature gate by adding the `CustomNoUpgrade` feature set to the `FeatureGate` CR named `cluster`, as shown in the following example:
+
[source,yaml]
----
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: CustomNoUpgrade
  customNoUpgrade:
    enabled:
    - DRAPartitionableDevices
----
+
[WARNING]
====
Enabling the `CustomNoUpgrade` feature set on your cluster cannot be undone and prevents minor version updates. This feature set is not supported on production clusters. For information about enabling feature gates, see "Enabling features using feature gates".
====
.Procedure
. Verify that your DRA driver publishes counter data by running the following command:
+
[source,terminal]
----
$ oc get resourceslices -o jsonpath='{range .items[*]}{.spec.driver}{"\t"}{range .spec.devices[*]}{.name}: {.consumesCounters}{"\n"}{end}{end}'
----
+
.Example output
[source,terminal]
----
gpu.nvidia.com gpu-0: [{"counterSet":"shared","counters":{"memory":{"value":"40Gi"}}}]
----
+
If the output does not show `consumesCounters` data, verify that your DRA driver version supports partitionable devices and that MIG is enabled on the GPU hardware.
. Configure counter-based quota by adding a `deviceClassMappings` entry with a `sources` section to the `config.resources` section of the {kueue-name} CR, as shown in the following example:
+
[source,yaml]
----
apiVersion: kueue.openshift.io/v1
kind: Kueue
metadata:
  name: cluster
  namespace: openshift-kueue-operator
spec:
  config:
    resources:
      deviceClassMappings:
      - name: gpu.memory # <1>
        deviceClassNames: # <2>
        - gpu.nvidia.com
        - mig.nvidia.com
        sources: # <3>
        - type: Counter
          counter:
            name: memory # <4>
            driver: gpu.nvidia.com
            deviceSelector: # <5>
              type: CEL
              cel:
                expression: "device.driver == 'gpu.nvidia.com'"
# ...
----
<1> The logical resource name used in `ClusterQueue` quotas. When counter-based sources are configured, quota is charged in capacity units rather than device count.
<2> The `DeviceClass` names that map to this resource. Include both the whole-GPU class (`gpu.nvidia.com`) and the MIG class (`mig.nvidia.com`).
<3> Defines how {kueue-name} computes the quota charge.
<4> The counter name must match a counter key published by the DRA driver in `ResourceSlice` devices.
<5> Scopes which devices are eligible for counter-based quota accounting.
+
[NOTE]
====
The {kueue-name} operator automatically enables the required {kueue-name} feature gates when it detects the `DRAPartitionableDevices` Kubernetes feature gate and `sources` are configured in `deviceClassMappings`. No manual {kueue-name} feature gate configuration is required.
====
. Create a `ClusterQueue` with counter-based quota. Set the quota in capacity units rather than device count. Create a file called `pd-queues.yaml` with the following content:
+
.Example quota configuration for partitionable devices
[source,yaml]
----
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "gpu.memory"] # <1>
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 40
      - name: "memory"
        nominalQuota: 200Gi
      - name: "gpu.memory" # <2>
        nominalQuota: 800Gi
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  namespace: "team-a"
  name: "user-queue"
spec:
  clusterQueue: "cluster-queue"
----
<1> The `gpu.memory` entry must match the `name` value in `deviceClassMappings`.
<2> Sets the total GPU memory quota. For example, `800Gi` accommodates twenty A100-40GB GPUs or equivalent MIG partitions.
+
[NOTE]
====
When `ClusterQueue` objects share a cohort, ensure all queues use the same unit scale for counter resources. {kueue-name} does not validate unit consistency across `ClusterQueue` objects.
====
. Apply the quota configuration by running the following command:
+
[source,terminal]
----
$ oc apply -f pd-queues.yaml
----
. Create a workload that requests a MIG partition. Create a file called `pd-job.yaml` with the following content:
+
.Example workload requesting a MIG partition
[source,yaml]
----
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  namespace: team-a
  name: gpu-partition
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: mig.nvidia.com # <1>
          count: 1
          selectors:
          - cel:
              expression: "device.attributes['gpu.nvidia.com'].profile == '1g.5gb'" # <2>
---
apiVersion: batch/v1
kind: Job
metadata:
  generateName: pd-test-job-
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: user-queue # <3>
spec:
  template:
    spec:
      containers:
      - name: worker
        image: registry.k8s.io/e2e-test-images/agnhost:2.53
        args: ["pause"]
        resources:
          claims:
          - name: gpu
          requests:
            cpu: "1"
            memory: "200Mi"
      resourceClaims:
      - name: gpu
        resourceClaimTemplateName: gpu-partition # <4>
      restartPolicy: Never
----
<1> References the MIG `DeviceClass`.
<2> Selects a specific MIG partition profile. Available profiles depend on the GPU model, for example, `1g.5gb`, `2g.10gb`, `3g.20gb`, or `7g.40gb` for the A100-40GB.
<3> Identifies the local queue to submit the job to.
<4> References the `ResourceClaimTemplate` defined above. The `ResourceClaimTemplate` must exist in the same namespace as the job.
. Create the workload by running the following command:
+
[source,terminal]
----
$ oc create -f pd-job.yaml
----
.Verification
. Verify that the workload is admitted and that quota was charged in capacity units by running the following command:
+
[source,terminal]
----
$ oc -n team-a get workloads -o jsonpath='{range .items[*]}{.metadata.name}: {.status.admission.podSetAssignments[0].resourceUsage}{"\n"}{end}'
----
+
.Example output
[source,terminal]
----
job-pd-test-job-xxxxx: {"cpu":"1","gpu.memory":"4864Mi","memory":"200Mi"}
----
+
The `gpu.memory` value reflects the actual memory capacity of the requested MIG partition rather than a device count of `1`.
. If the workload is not admitted, verify the following:
+
* The `DRAPartitionableDevices` Kubernetes feature gate is enabled on the cluster.
* The `deviceClassMappings` `name` value matches the resource name in `coveredResources`.
* The `counter.name` in `sources` matches a counter key in the `ResourceSlice` objects.
* The `ClusterQueue` has sufficient GPU memory quota for the requested partition size.
* MIG is enabled on the GPU hardware.
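As a sanity check on the numbers used in this module, the capacity-unit arithmetic can be worked through in a few lines of shell. This is a rough sketch only: the function name `fits_in_quota` is hypothetical, and the per-device charges (`4864Mi` for a `1g.5gb` partition, `40320Mi` for a whole A100-40GB) are the figures quoted in the abstract, not values read from a driver.

```shell
# Hypothetical helper: how many charges of size $2 fit in quota $1 (both in Mi).
# Uses integer division, so any remainder is left unused.
fits_in_quota() {
  echo $(( $1 / $2 ))
}

quota_mi=$(( 800 * 1024 ))   # the 800Gi nominalQuota, expressed in Mi

echo "1g.5gb partitions: $(fits_in_quota "$quota_mi" 4864)"
echo "whole GPUs: $(fits_in_quota "$quota_mi" 40320)"
```

Under those assumptions the quota admits 168 `1g.5gb` partitions or 20 whole GPUs, which matches the "twenty A100-40GB GPUs" callout.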
There was a problem hiding this comment.
Replaced the procedure with the above.
| + | ||
| [source,terminal] | ||
| ---- | ||
| $ oc apply -f https://kueue.sigs.k8s.io/examples/dra/sample-dra-queues.yaml |
There was a problem hiding this comment.
Instead of pointing to an external link, can we follow the same as https://github.com/openshift/openshift-docs/pull/113996/changes#diff-50e686ab942a05f0afecd2feb3d4f0e5f49175c442dbe9a013e83a970da6d9fcR49? (where cluster-queue.yaml file was created).
| spec: | ||
| clusterQueue: "cluster-queue" | ||
| ---- | ||
|
|
There was a problem hiding this comment.
The procedure configures infrastructure (deviceClassMappings, ClusterQueue, LocalQueue) but never shows a workload YAML. A user finishes this procedure and doesn't know what a Job with a ResourceClaimTemplate looks like. The PD module includes a workload example — the RCT module could include it too.
There was a problem hiding this comment.
There was a problem hiding this comment.
What clarification do you need, @StephenJamesSmith?
There was a problem hiding this comment.
@anahas-redhat I met with Sohan last week and he said he would go through your comments that I had questions about (you were not able to attend the meeting). I put these comments (@sohankunkerkar ?) to indicate the comments he should address.
There was a problem hiding this comment.
@StephenJamesSmith feel free to reschedule if you need clarification.
| [role="_abstract"] | ||
| Dynamic Resource Allocation (DRA) structured parameters is a Kubernetes feature that enables declarative management of specialized hardware such as GPUs, FPGAs, and network adapters. In the context of {kueue-name}, it provides quota management for workloads that use these devices. | ||
|
|
||
| {kueue-name} provides two approaches for managing the DRA device quota: |
There was a problem hiding this comment.
The Structured Parameters concept module introduces two DRA paths but never explains what structured parameters actually means, why a user would choose the ResourceClaimTemplate path over the Extended Resources path, or what capabilities each path provides. Without this guidance, users have no basis for choosing between the two approaches.
There was a problem hiding this comment.
There was a problem hiding this comment.
Is there a question here, @StephenJamesSmith?
There was a problem hiding this comment.
Here is a suggestion for how this text could read (feel free to change it):
[role="_abstract"]
Dynamic Resource Allocation (DRA) is a Kubernetes framework that provides structured discovery and allocation of specialized hardware such as GPUs. DRA drivers publish device information through ResourceSlice objects, and administrators group devices into named categories using DeviceClass objects.
Without {kueue-name} DRA integration, GPU requests made through DRA are invisible to quota management. {kueue-name} cannot account for these requests when admitting workloads, which can result in teams exceeding their GPU allocation.
{kueue-name} provides two approaches for managing DRA device quota:
ResourceClaimTemplate:: The default approach. Workloads explicitly reference a
ResourceClaimTemplate that defines device requirements. Administrators configure
deviceClassMappings in the Kueue CR to map each DeviceClass to a logical resource
name for quota tracking. Use this approach when workloads need fine-grained control over
device selection, such as targeting a specific GPU model or architecture using CEL selectors.
Extended resources:: A simplified alternative that allows workloads to use standard
Kubernetes resources.requests syntax, for example, nvidia.com/gpu: "1", instead of explicitly creating DRA objects. When a DeviceClass includes the
spec.extendedResourceName field, the Kubernetes scheduler automatically generates ResourceClaim objects. {kueue-name} detects this and charges quota only once, preventing double counting. Use this approach when you want the simplest possible user experience and backward compatibility with existing workload YAML.
For clusters with partitionable devices such as NVIDIA Multi-Instance GPU (MIG), {kueue-name} can also charge quota in capacity units, such as GPU memory, rather than device count.
Partitionable devices use ResourceClaimTemplates with CEL selectors to target specific partition profiles, and require administrators to configure counter-based sources in deviceClassMappings. This capability requires {product-title} 4.22 or later.
There was a problem hiding this comment.
Replaced with the above text.
| toc::[] | ||
|
|
||
| [role="_abstract"] | ||
| {kueue-name} Dynamic Resource Allocation (DRA) integration enables advanced management of specialized hardware resources like GPUs, FPGAs, and other accelerators within Kubernetes workload queuing. This integration allows for the reading and publishing of ResourceSlices, counter-based quota computation, and specific admission behaviors. |
There was a problem hiding this comment.
| {kueue-name} Dynamic Resource Allocation (DRA) integration enables advanced management of specialized hardware resources like GPUs, FPGAs, and other accelerators within Kubernetes workload queuing. This integration allows for the reading and publishing of ResourceSlices, counter-based quota computation, and specific admission behaviors. | |
| You can configure {kueue-name} to manage quota for workloads that use Dynamic Resource Allocation (DRA) to request GPUs. When DRA quota management is configured, {kueue-name} counts DRA device requests toward quota in the same way that it counts traditional resources such as CPU and memory. |
There was a problem hiding this comment.
Replaced.
| [role="_abstract"] | ||
| {kueue-name} Dynamic Resource Allocation (DRA) integration enables advanced management of specialized hardware resources like GPUs, FPGAs, and other accelerators within Kubernetes workload queuing. This integration allows for the reading and publishing of ResourceSlices, counter-based quota computation, and specific admission behaviors. | ||
|
|
||
| DRA is a Kubernetes framework that manages specialized hardware resources such as GPUs with fine-grained control. Unlike traditional resource requests, DRA allows dynamic prioritization—allocating GPUs to high-priority AI training workloads during business hours, then reallocating them to cost-optimized batch jobs overnight. |
There was a problem hiding this comment.
| DRA is a Kubernetes framework that manages specialized hardware resources such as GPUs with fine-grained control. Unlike traditional resource requests, DRA allows dynamic prioritization—allocating GPUs to high-priority AI training workloads during business hours, then reallocating them to cost-optimized batch jobs overnight. | |
| If DRA device quota is not configured, {kueue-name} does not account for GPU requests when admitting workloads, which can result in teams exceeding their GPU allocation. |
There was a problem hiding this comment.
Replaced.
| DRA is a Kubernetes framework that manages specialized hardware resources such as GPUs with fine-grained control. Unlike traditional resource requests, DRA allows dynamic prioritization—allocating GPUs to high-priority AI training workloads during business hours, then reallocating them to cost-optimized batch jobs overnight. | ||
|
|
||
| You can validate partitionable devices support in {kueue-name} Dynamic Resource Allocation (DRA) integration, covering partition-aware quota, admission, and scheduling. Partitionable devices, such as NVIDIA MIG, allow graphics processing units (GPUs) to be dynamically subdivided into smaller allocations. {kueue-name} must correctly handle quota accounting for these mutually exclusive partition configurations. | ||
|
|
There was a problem hiding this comment.
@sohankunkerkar Deleting lines 15-25 would leave the 2 bulleted items on lines 26-28 hanging. Should those be deleted too?
There was a problem hiding this comment.
Are you talking about the first two points from the Prerequisites?
There was a problem hiding this comment.
Rewrote this topic as per Sohan's input.
| .Prerequisites | ||
| * {kueue-name} is installed. | ||
| * {product-title} running version 4.21 or later. | ||
| * A DRA driver installed in the cluster, for example, `nvidia-dra-driver` or `k8s-dra-driver-gpu`. |
There was a problem hiding this comment.
| * A DRA driver installed in the cluster, for example, `nvidia-dra-driver` or `k8s-dra-driver-gpu`. | |
| .Prerequisites | |
| * You have installed {kueue-name} by using the {kueue-op}. | |
| * You have created a `Kueue` custom resource (CR). | |
| * Your cluster is running {product-title} 4.21 or later. | |
| * A DRA driver is installed in the cluster, for example, `nvidia-dra-driver`. You can verify that the DRA driver is publishing device information by running the following command: | |
| + | |
| [source,terminal] | |
| ---- | |
| $ oc get resourceslices | |
| ---- | |
| + | |
| If the command returns one or more `ResourceSlice` objects, the DRA driver is running. | |
| * At least one `DeviceClass` object exists in the cluster. You can verify this by running the following command: | |
| + | |
| [source,terminal] | |
| ---- | |
| $ oc get deviceclass | |
| ---- |
There was a problem hiding this comment.
Added the above changes.
| = Configuring the extended resources path | ||
|
|
||
| [role="_abstract"] | ||
| You need to create an extended resources path that users submit workloads using the standard `resources.requests` syntax, for example, `nvidia.com/gpu: 1`, and a `DeviceClass` with `spec.extendedResourceName` that exists in the cluster. |
There was a problem hiding this comment.
| You need to create an extended resources path that users submit workloads using the standard `resources.requests` syntax, for example, `nvidia.com/gpu: 1`, and a `DeviceClass` with `spec.extendedResourceName` that exists in the cluster. | |
| You can configure {kueue-name} to manage quota for workloads that request GPUs by using the standard `resources.requests` syntax, for example, `nvidia.com/gpu: "1"`. When a `DeviceClass` includes the `spec.extendedResourceName` field, the Kubernetes scheduler automatically generates `ResourceClaim` objects. This path does not require `deviceClassMappings` configuration because {kueue-name} auto-discovers the mapping by indexing `DeviceClass` objects. |
There was a problem hiding this comment.
You can also add this:
[NOTE]
====
The {kueue-name} operator automatically enables the required {kueue-name} feature gates when it detects the `DRAExtendedResource` Kubernetes feature gate on the cluster. No manual {kueue-name} feature gate configuration is required.
====
There was a problem hiding this comment.
Here's my idea for this doc:
// Module included in the following assemblies:
//
// * ai_workloads/kueue/kueue-dra-integrating-dynamic-resource-allocation.adoc
:_mod-docs-content-type: PROCEDURE
[id="kueue-dra-extended-resources_{context}"]
= Configuring the extended resources path
[role="_abstract"]
You can configure {kueue-name} to manage quota for workloads that request GPUs by using the standard `resources.requests` syntax, for example, `nvidia.com/gpu: "1"`.
When a `DeviceClass` includes the `spec.extendedResourceName` field, the Kubernetes scheduler automatically generates `ResourceClaim` objects. This path does not require `deviceClassMappings` configuration because {kueue-name} auto-discovers the mapping by indexing `DeviceClass` objects.
[NOTE]
====
The {kueue-name} operator automatically enables the required {kueue-name} feature gates when it detects the `DRAExtendedResource` Kubernetes feature gate on the cluster. No manual {kueue-name} feature gate configuration is required.
====
.Prerequisites
* You have cluster administrator permissions.
* You have installed {kueue-name} by using the {kueue-op}.
* You have created a `Kueue` CR.
* A DRA driver is installed and has published `ResourceSlice` objects.
* You have enabled the `DRAExtendedResource` Kubernetes feature gate by adding the `CustomNoUpgrade` feature set to the `FeatureGate` CR named `cluster`, as shown in the following example:
+
[source,yaml]
----
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: CustomNoUpgrade
  customNoUpgrade:
    enabled:
    - DRAExtendedResource
----
+
[WARNING]
====
Enabling the `CustomNoUpgrade` feature set on your cluster cannot be undone and prevents minor version updates. This feature set is not supported on production clusters. For information about enabling feature gates, see "Enabling features using feature gates".
====
.Procedure
. Verify that the `DeviceClass` has `spec.extendedResourceName` set by running the following command:
+
[source,terminal]
----
$ oc get deviceclass gpu.nvidia.com -o jsonpath='{.spec.extendedResourceName}'
----
+
.Example output
[source,terminal]
----
nvidia.com/gpu
----
+
If the command does not return a value, add the `extendedResourceName` field by running the following command:
+
[source,terminal]
----
$ oc patch deviceclass gpu.nvidia.com --type=merge -p '{"spec":{"extendedResourceName":"nvidia.com/gpu"}}'
----
. Create a `ClusterQueue` that includes the GPU resource in `coveredResources`. Create a file called `er-queues.yaml` with the following content:
+
.Example quota configuration for extended resources
[source,yaml]
----
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 40
      - name: "memory"
        nominalQuota: 200Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  namespace: "team-a"
  name: "user-queue"
spec:
  clusterQueue: "cluster-queue"
----
. Apply the quota configuration by running the following command:
+
[source,terminal]
----
$ oc apply -f er-queues.yaml
----
. Create a workload that uses the standard resource request syntax. Create a file called `er-job.yaml` with the following content:
+
.Example workload using extended resources
[source,yaml]
----
apiVersion: batch/v1
kind: Job
metadata:
  generateName: er-test-job-
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: user-queue # <1>
spec:
  template:
    spec:
      containers:
      - name: worker
        image: registry.k8s.io/e2e-test-images/agnhost:2.53
        args: ["pause"]
        resources:
          requests:
            cpu: "1"
            memory: "200Mi"
            nvidia.com/gpu: "1" # <2>
      restartPolicy: Never
----
<1> Identifies the local queue to submit the job to.
<2> Requests a GPU by using the standard extended resource syntax. No `ResourceClaimTemplate` or `resourceClaims` section is needed. The `DeviceClass` with `spec.extendedResourceName` causes the Kubernetes scheduler to generate a `ResourceClaim` automatically.
. Create the workload by running the following command:
+
[source,terminal]
----
$ oc create -f er-job.yaml
----
.Verification
. Verify that a workload has been created and admitted by running the following command:
+
[source,terminal]
----
$ oc -n team-a get workloads
----
+
.Example output
[source,terminal]
----
NAME                         QUEUE        RESERVED IN     ADMITTED   AGE
job-er-test-job-4m2x-d3f4g   user-queue   cluster-queue   True       10s
----
. Verify that a `ResourceClaim` was automatically created by running the following command:
+
[source,terminal]
----
$ oc -n team-a get resourceclaims
----
+
The Kubernetes scheduler creates a `ResourceClaim` for each pod that requests an extended resource backed by a `DeviceClass`.
+
If the workload is not admitted, verify the following:
+
* The `DRAExtendedResource` Kubernetes feature gate is enabled on the cluster.
* The `DeviceClass` has `spec.extendedResourceName` set.
* The `ClusterQueue` includes the extended resource name in `coveredResources`.
* The `ClusterQueue` has sufficient quota available.
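The checks in the list above could be gathered into one helper. This is only a sketch: `check_er_admission` is a hypothetical function name, and the defaults (`gpu.nvidia.com`, `cluster-queue`, `team-a`) are assumed from the examples in this procedure rather than required values.

```shell
# Hypothetical helper: print the cluster state behind each troubleshooting check.
# Defaults are assumptions copied from the examples in this procedure.
check_er_admission() {
  local device_class="${1:-gpu.nvidia.com}"
  local cluster_queue="${2:-cluster-queue}"
  local namespace="${3:-team-a}"

  echo "Feature gates enabled through CustomNoUpgrade:"
  oc get featuregate cluster -o jsonpath='{.spec.customNoUpgrade.enabled}'
  echo

  echo "extendedResourceName on DeviceClass ${device_class}:"
  oc get deviceclass "${device_class}" -o jsonpath='{.spec.extendedResourceName}'
  echo

  echo "coveredResources on ClusterQueue ${cluster_queue}:"
  oc get clusterqueue "${cluster_queue}" -o jsonpath='{.spec.resourceGroups[*].coveredResources}'
  echo

  echo "Quota and usage for ClusterQueue ${cluster_queue}:"
  oc describe clusterqueue "${cluster_queue}"

  echo "Workloads in namespace ${namespace}:"
  oc -n "${namespace}" get workloads
}
```

Calling `check_er_admission` with no arguments uses the example names. Pass a different device class, cluster queue, or namespace as positional arguments if your cluster uses other names.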
There was a problem hiding this comment.
You can change this as per openshift docs standards.
There was a problem hiding this comment.
Replaced with the above procedure.
| * You have created a `Kueue` custom resource (CR). | ||
| * Your cluster is running {product-title} 4.21 or later. | ||
| + | ||
| ==== |
There was a problem hiding this comment.
🤖 [error] AsciiDocDITA.ExampleBlock: Example blocks can not be inside of other blocks in DITA.
|
@StephenJamesSmith @sohankunkerkar the material we discussed is linked below.
Important note: not sure if this will be a separate doc or just a note, but versions 4.18–4.20 are not supported. |
|
|
||
| * xref:../../nodes/pods/nodes-pods-allocate-dra.adoc#nodes-pods-allocate-dra[Allocating GPUs to pods by using DRA] | ||
|
|
||
|
|
There was a problem hiding this comment.
The levels below seem to be not quite correct. I guess we should have something like this:
= Integrating Dynamic Resource Allocation
== DRA quota management overview
=== Configuring ResourceClaimTemplates
=== Configuring Extended Resources
=== Configuring Partitionable Devices
This way the concept module introduces all three, and the procedures sit underneath it.
There was a problem hiding this comment.
updated the levels.
|
|
||
| :_mod-docs-content-type: CONCEPT | ||
| [id="kueue-dra-structured-parameters_{context}"] | ||
| = Structured parameters |
There was a problem hiding this comment.
Suggestion: change from Structured Parameters to DRA quota management.
Why? "Structured parameters" is not an umbrella term — it's a specific Kubernetes DRA implementation (KEP #4381) that replaced "classic DRA" (KEP #3063, withdrawn in K8s 1.32). It means the scheduler can understand device attributes directly via ResourceSlices and DeviceClasses, rather than depending on opaque third-party drivers.
In that sense, all of DRA as it exists today IS structured parameters.
There was a problem hiding this comment.
Changed the title.
| [role="_abstract"] | ||
| Dynamic Resource Allocation (DRA) structured parameters is a Kubernetes feature that enables declarative management of specialized hardware such as GPUs, FPGAs, and network adapters. In the context of {kueue-name}, it provides quota management for workloads that use these devices. | ||
|
|
||
| {kueue-name} provides two approaches for managing the DRA device quota: |
There was a problem hiding this comment.
Suggestion of how this text can be (feel free to change):
[role="_abstract"]
Dynamic Resource Allocation (DRA) is a Kubernetes framework that provides structured discovery and allocation of specialized hardware such as GPUs. DRA drivers publish device information through `ResourceSlice` objects, and administrators group devices into named categories using `DeviceClass` objects.
Without {kueue-name} DRA integration, GPU requests made through DRA are invisible to quota management. {kueue-name} cannot account for these requests when admitting workloads, which can result in teams exceeding their GPU allocation.
{kueue-name} provides two approaches for managing DRA device quota:
ResourceClaimTemplate:: The default approach. Workloads explicitly reference a `ResourceClaimTemplate` that defines device requirements. Administrators configure `deviceClassMappings` in the `Kueue` CR to map each `DeviceClass` to a logical resource name for quota tracking. Use this approach when workloads need fine-grained control over device selection, such as targeting a specific GPU model or architecture using CEL selectors.
Extended resources:: A simplified alternative that allows workloads to use standard Kubernetes `resources.requests` syntax, for example, `nvidia.com/gpu: "1"`, instead of explicitly creating DRA objects. When a `DeviceClass` includes the `spec.extendedResourceName` field, the Kubernetes scheduler automatically generates `ResourceClaim` objects. {kueue-name} detects this and charges quota only once, preventing double counting. Use this approach when you want the simplest possible user experience and backward compatibility with existing workload YAML.
For clusters with partitionable devices such as NVIDIA Multi-Instance GPU (MIG), {kueue-name} can also charge quota in capacity units, such as GPU memory, rather than device count.
Partitionable devices use `ResourceClaimTemplate` objects with CEL selectors to target specific partition profiles, and require administrators to configure counter-based sources in `deviceClassMappings`. This capability requires {product-title} 4.22 or later.
cdc7c97 to
1f42c40
Compare
1f42c40 to
9a3739c
Compare
| - Applies immediate admission penalties to prevent resource monopolization | ||
|
|
||
| For more information, see xref:../../ai_workloads/kueue/admission-fair-sharing.adoc#admission-fair-sharing[Admission fair sharing]. No newline at end of file | ||
| For more information, see xref:../../ai_workloads/kueue/admission-fair-sharing.adoc#admission-fair-sharing[Admission fair sharing]. |
There was a problem hiding this comment.
🤖 [error] OpenShiftAsciiDoc.NoXrefInModules: Do not include xrefs in modules, only assemblies (exception: release notes modules).
| Dynamic Resource Allocation (DRA) quota management for GPUs (Technology Preview):: | ||
| {kueue-name} now supports quota management for workloads that request GPUs through Dynamic Resource Allocation (DRA). When configured, {kueue-name} tracks DRA device requests toward quota alongside traditional resources such as CPU and memory, preventing teams from exceeding their allocated GPU resources. | ||
|
|
||
| For more information, see xref:../../ai_workloads/kueue/kueue-dra-integrating-dynamic-resource-allocation.adoc#ueue-dra-integrating-dynamic-resource-allocation[Integrating Dynamic Resource Allocation]. |
There was a problem hiding this comment.
🤖 [error] OpenShiftAsciiDoc.NoXrefInModules: Do not include xrefs in modules, only assemblies (exception: release notes modules).
anahas-redhat
left a comment
There was a problem hiding this comment.
@StephenJamesSmith thanks for the latest round of updates — the ER and PD modules look much better after Sohan's rewrites. A few items are still outstanding from earlier review rounds:
modules/kueue-dra-resourceclaimtemplates.adoc
1. Line 18: Still uses Configuration object instead of Kueue CR (earlier comment)
The YAML shows:
apiVersion: config.kueue.x-k8s.io/v1beta2
kind: Configuration
This is not a CRD on OCP; users cannot oc apply it. The operator owns the Configuration and generates it from the Kueue CR. Replace step 1 with an oc patch against the Kueue CR:
$ oc patch kueue cluster -n openshift-kueue-operator --type=merge -p '{
  "spec": {
    "config": {
      "resources": {
        "deviceClassMappings": [{
          "name": "nvidia.com/gpu",
          "deviceClassNames": ["gpu.nvidia.com"]
        }]
      }
    }
  }
}'
2. Line 46: Still points to upstream URL (earlier comment)
$ oc apply -f https://kueue.sigs.k8s.io/examples/dra/sample-dra-queues.yaml
This should be inlined as a local file (e.g., "Create a file called rct-queues.yaml with the following content:"), same pattern already used in the ER module (er-queues.yaml at kueue-dra-extended-resources.adoc:75) and PD module (pd-queues.yaml at kueue-dra-partitionable-devices.adoc:113).
3. After line 83: Missing workload example and verification (earlier comment)
The procedure ends after ClusterQueue/LocalQueue creation but never shows a Job + ResourceClaimTemplate example. The ER module has er-job.yaml (kueue-dra-extended-resources.adoc:108) and the PD module has pd-job.yaml (kueue-dra-partitionable-devices.adoc:155) — the RCT module needs the same. Here's an example that matches the namespace (default) and queue names (user-queue, cluster-queue) already established in step 2:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: my-gpu
  namespace: default
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.nvidia.com # <1>
---
apiVersion: batch/v1
kind: Job
metadata:
  generateName: rct-test-job-
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue # <2>
spec:
  template:
    spec:
      restartPolicy: Never
      resourceClaims:
      - name: gpu
        resourceClaimTemplateName: my-gpu # <3>
      containers:
      - name: worker
        image: registry.k8s.io/e2e-test-images/agnhost:2.53
        args: ["pause"]
        resources:
          claims:
          - name: gpu # <4>
          requests:
            cpu: "1"
            memory: "200Mi"
<1> References the DeviceClass configured in deviceClassMappings.
<2> Identifies the local queue to submit the job to.
<3> References the ResourceClaimTemplate defined above. The template must exist in the same namespace as the job.
<4> Attaches the resource claim to this container.
Add a verification section matching the ER (kueue-dra-extended-resources.adoc:143) and PD (kueue-dra-partitionable-devices.adoc:209) modules:
.Verification
. Verify that the workload has been created and admitted:
+
$ oc -n default get workloads
. Verify that a ResourceClaim was created from the template:
+
$ oc -n default get resourceclaims
+
If the workload is not admitted, verify the following:
+
* The deviceClassMappings in the Kueue CR maps the DeviceClass name to the resource name in coveredResources.
* The ClusterQueue has sufficient quota available.
* The ResourceClaimTemplate exists in the same namespace as the job.
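For the first check in that list, printing both sides of the mapping next to each other can make a mismatch easier to spot. The following is a minimal sketch, not part of the proposed module: `show_dra_mapping` is a hypothetical helper name, and the Kueue CR name, namespace, and field path are taken from the `oc patch` command suggested earlier in this comment.

```shell
# Hypothetical helper: show the deviceClassMappings from the Kueue CR and the
# coveredResources from a ClusterQueue so the resource names can be compared.
# The CR name "cluster" and the namespace follow the oc patch example above.
show_dra_mapping() {
  local cluster_queue="${1:-cluster-queue}"

  echo "deviceClassMappings in the Kueue CR:"
  oc get kueue cluster -n openshift-kueue-operator \
    -o jsonpath='{.spec.config.resources.deviceClassMappings}'
  echo

  echo "coveredResources on ClusterQueue ${cluster_queue}:"
  oc get clusterqueue "${cluster_queue}" \
    -o jsonpath='{.spec.resourceGroups[*].coveredResources}'
  echo
}
```

The `name` value in each mapping (for example, `nvidia.com/gpu`) should also appear in the `coveredResources` output.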
modules/kueue-dra-extended-resources.adoc
Line 131: Broken callout marker
kueue.x-k8s.io/queue-name: user-queue #
The # has no callout number. Should be # <1>. The annotation on line 140 (# <2>) should then become # <2>; currently the numbering is off because <1> is missing.
modules/kueue-release-notes-1.4.adoc
Line 28: Broken xref — missing leading k
xref:...#ueue-dra-integrating-dynamic-resource-allocation[...]
Should be #kueue-dra-integrating-dynamic-resource-allocation.
9a3739c to
0f7218d
Compare
| ==== | ||
| Enabling the `CustomNoUpgrade` feature set on your cluster cannot be undone and prevents minor version updates. This feature set is not supported on production clusters. | ||
| ==== | ||
There was a problem hiding this comment.
It might be worth adding a step after the FeatureGate CR example for users to wait for the MCP worker rollout to complete before proceeding. Alice has also mentioned this in her comment as part of the flow. Same applies for Partitionable Devices flow as well.
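A wait step along those lines might look like the following sketch. The helper name `wait_for_mcp_rollout`, the default pool name `worker`, and the 30-minute timeout are assumptions for illustration and do not come from the module.

```shell
# Hypothetical helper: block until a MachineConfigPool reports Updated=True.
# The default pool ("worker") and timeout ("30m") are assumptions; adjust both.
wait_for_mcp_rollout() {
  local pool="${1:-worker}"
  local timeout="${2:-30m}"
  oc wait "machineconfigpool/${pool}" --for=condition=Updated=True --timeout="${timeout}"
}

# Example usage after applying the FeatureGate CR:
#   wait_for_mcp_rollout worker 45m
```

The pool might still report `Updated` for a short time before the rollout begins, so it may be worth checking `oc get mcp` first to confirm that the update has started.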
| Dynamic Resource Allocation (DRA) quota management for GPUs (Technology Preview):: | ||
| {kueue-name} now supports quota management for workloads that request GPUs through Dynamic Resource Allocation (DRA). When configured, {kueue-name} tracks DRA device requests toward quota alongside traditional resources such as CPU and memory, preventing teams from exceeding their allocated GPU resources. | ||
|
|
||
| For more information, see xref:../../ai_workloads/kueue/kueue-dra-integrating-dynamic-resource-allocation.adoc#kueue-dra-integrating-dynamic-resource-allocation[Integrating Dynamic Resource Allocation]. |
There was a problem hiding this comment.
🤖 [error] OpenShiftAsciiDoc.NoXrefInModules: Do not include xrefs in modules, only assemblies (exception: release notes modules).
| * The `deviceClassMappings` `name` value matches the resource name in `coveredResources`. | ||
| * The `counter.name` in `sources` matches a counter key in the `ResourceSlice` objects. | ||
| * The `ClusterQueue` has sufficient GPU memory quota for the requested partition size. | ||
| * MIG is enabled on the GPU hardware. |
There was a problem hiding this comment.
@sohankunkerkar Do you think we need to call out the alpha limitation that extended resources and counter sources cannot be used together on the same DeviceClass in this doc?
There was a problem hiding this comment.
[NOTE]
====
Extended resources and counter-based sources cannot be used together on the same `DeviceClass`. If a workload uses the extended resource syntax (for example, `nvidia.com/gpu: "1"`) and the `DeviceClass` mapping has counter sources configured, the workload is marked inadmissible. For more details, see link:https://github.com/kubernetes-sigs/kueue/blob/main/keps/2941-DRA/README.md#path-interactions[Path Interactions] in the upstream Kueue documentation.
====
@StephenJamesSmith Can we add this block in kueue-dra-partitionable-devices.adoc ? I'll leave the placement to you.
There was a problem hiding this comment.
@PannagaRao Where can I find the limitations? I found this -
In the alpha phase of Kubernetes Dynamic Resource Allocation (DRA), combining extended resources and counter sources on the same DeviceClass is not supported. You must define separate device classes if you intend to track countable extended resources (like specific GPUs) alongside capacity or counter-based attributes within your cluster.
DRA models devices by using attributes rather than just counting quantities. Extended resources track hard limits, that is, the total integer counts of hardware. Counter sources manage granular, sliceable, or dynamic capacity constraints. Mixing these two different allocation models within a single DeviceClass causes scheduling conflicts, because the kube-scheduler attempts to reconcile discrete claims against dynamic capacity simultaneously.
You can mitigate these limitations by separating the classes: create one DeviceClass for your countable extended resources and a completely separate DeviceClass for counter capacities.
There was a problem hiding this comment.
https://github.com/kubernetes-sigs/kueue/blob/main/keps/2941-DRA/README.md#path-interactions
I have added this link in the NOTE suggestion
There was a problem hiding this comment.
Two issues with this link: 1) I can only put links in the ASSEMBLY, which may not be a problem because that's probably the best place to put the NOTE. 2) We have restrictions about xrefs to external repos. I can try it, but it may get rejected in Merge review.
0f7218d to
cdc2e6d
Compare
|
@StephenJamesSmith: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@PannagaRao @sohankunkerkar FYI: I've added the release note to this PR. Please review. Thx! |
OCPSTRAT-2380 [DS/RN] Kueue 1.4 and DRA
Version: 4.21+
Jira: https://redhat.atlassian.net/browse/OSDOCS-20033
Previews:
https://113996--ocpdocs-pr.netlify.app/openshift-enterprise/latest/ai_workloads/kueue/kueue-dra-integrating-dynamic-resource-allocation.html
https://113996--ocpdocs-pr.netlify.app/openshift-enterprise/latest/ai_workloads/kueue/release-notes.html#release-notes-1.4_release-notes
Dev: @kannon92 @PannagaRao @sohankunkerkar
QE: @MaysaMacedo @anahas-redhat