
Conversation

@MenD32 (Contributor) commented Sep 22, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds support for partitionable devices when calculating DRA utilization

Which issue(s) this PR fixes:

Fixes #8053

Special notes for your reviewer:

Does this PR introduce a user-facing change?

DRA: partitionable devices support

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. area/cluster-autoscaler labels Sep 22, 2025
@k8s-ci-robot k8s-ci-robot added the area/provider/kwok Issues or PRs related to the kwok cloud provider for Cluster Autoscaler label Sep 22, 2025
@k8s-ci-robot k8s-ci-robot requested a review from kgolab September 22, 2025 11:45
@k8s-ci-robot k8s-ci-robot added area/vertical-pod-autoscaler needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 22, 2025
@k8s-ci-robot (Contributor)

Hi @MenD32. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 22, 2025
@MenD32 MenD32 force-pushed the feat/partitionable-devices-support branch from ba8303d to 4fa5202 Compare September 22, 2025 11:56
@MenD32 MenD32 changed the title Feat/partitionable devices support Feat: partitionable devices support Sep 22, 2025
@MenD32 (Contributor, Author) commented Sep 22, 2025

@towca this is the same PR as #8160, which I had to close because I had some issues reverting #8539.

I'm not sure if reverting was in fact the right move, but in order to add this feature I think there is no way around it...

@jackfrancis (Contributor)

@towca I had to revert #8539 in this PR because the partitionable devices feature is only available in k8s.io/api/resource/v1beta1 and has not yet been released into k8s.io/api/resource/v1

@nojnhuh can you remind me how this flywheel works?

@nojnhuh (Contributor) commented Oct 2, 2025

> @towca I had to revert #8539 in this PR because the partitionable devices feature is only available in k8s.io/api/resource/v1beta1 and has not yet been released into k8s.io/api/resource/v1
>
> @nojnhuh can you remind me how this flywheel works?

Doesn't v1 already include everything necessary for partitionable devices? https://github.com/kubernetes/kubernetes/blob/v1.34.1/staging/src/k8s.io/api/resource/v1/types.go#L157-L179

@MenD32 What was the exact issue you were running into that prompted going back to v1beta1?

@MenD32 (Contributor, Author) commented Oct 2, 2025

When I tried to merge with master I had an issue with Device.Basic.ConsumesCounters, so I wrongly assumed it hadn't been merged into v1 and was still kept under v1beta1. Now I see that they changed the Device struct to put ConsumesCounters somewhere else... I'll revert the version rollback.

@MenD32 MenD32 force-pushed the feat/partitionable-devices-support branch from 1b8c0c8 to 3956443 Compare October 2, 2025 09:29
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 2, 2025
@towca (Collaborator) commented Oct 2, 2025

@MenD32 thanks a lot for flagging this, haven't bumped into this particular issue before 😅

@jackfrancis @nojnhuh I get that we might be ok for partitionable devices specifically, but will that hold for other features? I.e. can we keep iterating on DRA KEPs in CA while only importing the v1 version of the DRA API? What if there's a KEP that requires some API changes, wouldn't that start in the next beta version? This might be especially painful because only GA APIs get enabled by default, so v1 is the only one we can "rely" on being served for 1.34+.

@nojnhuh (Contributor) commented Oct 2, 2025

My understanding is that alpha/beta features that intersect with the existing v1 APIs will be added to v1 and still feature gated, e.g. changes to DeviceClass, ResourceSlice, ResourceClaim(Template).

When a brand new API is introduced, then it will likely land in an alpha/beta API version first, e.g. DeviceTaintRules initially landing in v1alpha3 in 1.33: https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/5055-dra-device-taints-and-tolerations#:~:text=Describe%20the%20mechanism%3A%20resource.k8s.io/v1alpha3%20API%20group

1.35 is the first release cycle where we're adding features since v1 was added, so I'll keep an eye on KEP implementations for this cycle and let you know if that's not actually what happens.

@towca (Collaborator) commented Oct 3, 2025

Thanks @nojnhuh, that makes sense, I didn't consider feature-gated fields in v1. And in the brand new API case we could import it and have it behind a separate flag.

@jackfrancis (Contributor)

/release-note-edit

DRA: partitionable devices support

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 3, 2025
@jackfrancis (Contributor)

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 3, 2025
@jackfrancis jackfrancis removed area/provider/cluster-api Issues or PRs related to Cluster API provider area/provider/rancher area/provider/kwok Issues or PRs related to the kwok cloud provider for Cluster Autoscaler area/vertical-pod-autoscaler labels Oct 3, 2025
@MenD32 (Contributor, Author) commented Nov 3, 2025

Hi, any news regarding the review process for this PR?

@jackfrancis (Contributor)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 5, 2025
TotalConsumedCounters := map[string]map[string]resource.Quantity{}
for _, resourceSlice := range resourceSlices {
	for _, sharedCounter := range resourceSlice.Spec.SharedCounters {
		if _, ok := TotalConsumedCounters[sharedCounter.Name]; !ok {
Review comment (Contributor):

Is there a chance that more than one resource slice from the passed-in resourceSlices will have a CounterSet with the same Name? That would be the only reason to check for existence before initializing TotalConsumedCounters[sharedCounter.Name] = map[string]resource.Quantity{}. Also, if that's true, are we confident that they won't have any collisions with any of the names of the Counters in their Counters map[string]Counter? Otherwise we're overwriting them below.

tl;dr we may be able to simplify this and simply assign TotalConsumedCounters[sharedCounter.Name] = map[string]resource.Quantity{} without first checking whether it's already there; or, if collisions are possible, there may need to be more checks.

I did this and UT still pass:

$ git diff
diff --git a/cluster-autoscaler/simulator/dynamicresources/utils/utilization.go b/cluster-autoscaler/simulator/dynamicresources/utils/utilization.go
index c717fdfd6..98f7480a6 100644
--- a/cluster-autoscaler/simulator/dynamicresources/utils/utilization.go
+++ b/cluster-autoscaler/simulator/dynamicresources/utils/utilization.go
@@ -74,9 +74,7 @@ func calculatePoolUtil(unallocated, allocated []resourceapi.Device, resourceSlic
        TotalConsumedCounters := map[string]map[string]resource.Quantity{}
        for _, resourceSlice := range resourceSlices {
                for _, sharedCounter := range resourceSlice.Spec.SharedCounters {
-                       if _, ok := TotalConsumedCounters[sharedCounter.Name]; !ok {
-                               TotalConsumedCounters[sharedCounter.Name] = map[string]resource.Quantity{}
-                       }
+                       TotalConsumedCounters[sharedCounter.Name] = map[string]resource.Quantity{}
                        for counter, value := range sharedCounter.Counters {
                                TotalConsumedCounters[sharedCounter.Name][counter] = value.Value
                        }

@MenD32 (Contributor, Author) replied Nov 7, 2025:

My impression from the KEP is that there shouldn't be any collisions in counter-set names, since these are unique within a resource pool.

I got the impression that there could be a collision of the same shared counter from 2 different resource pools, but this would be highly improbable, since it'd imply that the same exact device (same device ID) appears in multiple resource pools.

Since this code is within a pool's scope, I think I'll simplify it the way you suggested.

	maxUtilization = float64(allocatedDevicesWithoutCounters) / float64(devicesWithoutCounters)
}
for counterSet, counters := range TotalConsumedCounters {
	for counterName, totalValue := range counters {
Review comment (Contributor):

nit: is this easier to follow?

			if totalValue.IsZero() {
				continue
			}

(rather than checking for !totalValue.IsZero() two nested iterations later)

@MenD32 (Contributor, Author) replied:

yeah, this is wayyy cleaner. I'll change it

if allocatedSet, exists := allocatedConsumedCounters[counterSet]; exists {
	if allocatedValue, exists := allocatedSet[counterName]; exists && !totalValue.IsZero() {
		utilization := float64(allocatedValue.Value()) / float64(totalValue.Value())
		if utilization > maxUtilization {
Review comment (Contributor):

Can we explain how we're able to compare counter allocation (expressed in terms of resource.Quantity) with device allocation (expressed in terms of number of devices / ints)?

@MenD32 (Contributor, Author) replied:

I think this might be a problem within the code: if only some devices within the resource pool are non-partitionable, the correct utilization calculation would have to combine the highest shared-counter utilization with the allocation ratio of the non-partitionable devices.

I think this case is also very unlikely, since it'd imply that devices within the same resource pool are handled differently by the deviceClass, like partitioning only half of the GPUs on the node.

The fix for that should be simple.

@MenD32 MenD32 force-pushed the feat/partitionable-devices-support branch from 9c3cd1b to 874a909 Compare November 7, 2025 13:09
@MenD32 MenD32 force-pushed the feat/partitionable-devices-support branch from 874a909 to c61b8ad Compare November 7, 2025 13:52
@MenD32 MenD32 requested a review from jackfrancis November 8, 2025 09:31

Labels

area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CA DRA: handle partitionable devices (KEP-4815)

8 participants