Conversation


@jonahjon jonahjon commented Jun 12, 2025

Adding a new setting to the HighNodeUtilization plugin called maxNodesToProcess. The setting limits the number of nodes processed during each descheduler execution.

This is useful because our company is looking to adopt the descheduler for improved bin packing; without this guardrail it can potentially impact several nodes at once, which could cause service disruptions.

Similar goal to this other open PR: #1616

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 12, 2025
@k8s-ci-robot
Contributor

Welcome @jonahjon!

It looks like this is your first PR to kubernetes-sigs/descheduler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/descheduler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 12, 2025
@k8s-ci-robot
Contributor

Hi @jonahjon. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 12, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign damemi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jonahjon jonahjon changed the title feat: Adding a feature to limit the nodes processed on each execution feat: Adding a feature maxNodesToProcess to limit the nodes processed on each execution Jun 12, 2025
@jonahjon jonahjon changed the title feat: Adding a feature maxNodesToProcess to limit the nodes processed on each execution feat: Adding a feature maxNodesToProcess to HighNodeUtilization Plugin to limit the nodes processed on each execution Jun 12, 2025
@googs1025
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 13, 2025
// MaxNodesToProcess limits the number of nodes to process in each
// descheduling execution. This is useful to limit how many nodes are descheduled per run
// when turning this plugin on in a cluster with many underutilized nodes.
MaxNodesToProcess int `json:"maxNodesToProcess,omitempty"`
Member


can we add some unit test cases in TestHighNodeUtilization? 😄

Author


Sure, I will add those.

@googs1025
Member

I think this change is reasonable, to avoid too much churn when running the HighNodeUtilization plugin on clusters with many nodes.

|Name|Type|
|---|---|
|`thresholds`|map(string:int)|
|`numberOfNodes`|int|
Contributor


Worth deprecating numberOfNodes and renaming it to minNodesToProcess. Alternatively, introducing a new nodesToProcessLimits (or a better name) with min and max fields. Are min and max the only useful limits to introduce? And should only absolute numbers be supported, or percentages as well?

Author


For us, we're specifically looking to slow this down, so I'm unsure if there are other use cases out there. The default settings are aggressive enough that I'm not sure a min would be useful for our company in any way.

Member


Maybe just supporting max and percentage would be fine? We might care more about avoiding evicting all high utilization nodes. 🤔

type NodeProcessingLimits struct {
    Max int32  `json:"max,omitempty"`
    MaxPercentage int32 `json:"maxPercentage,omitempty"`
}

(The name is just an example, not a recommendation.)
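
For illustration, a minimal sketch of how such limits could be resolved into an absolute per-run cap, assuming the NodeProcessingLimits shape from the comment above (the helper name and semantics are mine, not part of the PR):

// effectiveNodeLimit resolves the sketched NodeProcessingLimits into an
// absolute cap on nodes per run; 0 means "no limit". When both fields are
// set, the smaller resulting cap wins.
func effectiveNodeLimit(limits NodeProcessingLimits, totalNodes int) int {
	limit := 0
	if limits.Max > 0 {
		limit = int(limits.Max)
	}
	if limits.MaxPercentage > 0 {
		byPercent := totalNodes * int(limits.MaxPercentage) / 100
		if limit == 0 || byPercent < limit {
			limit = byPercent
		}
	}
	return limit
}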

Contributor


I agree with @ingvagabund here; the numberOfNodes configuration relates closely to the newly introduced maxNodesToProcess. Deprecating the former and making both part of the same structure (with min and max properties) would be better here.

I don't think we should go as far as adding "percentages" here unless there is a clear use case.
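
A minimal sketch of what such a combined structure could look like, assuming numberOfNodes is deprecated in its favor (field and type names are illustrative only, not an agreed API):

// NodeLimits is a hypothetical replacement for the standalone numberOfNodes parameter.
type NodeLimits struct {
	// Min mirrors the deprecated numberOfNodes: the strategy only activates
	// once at least this many underutilized nodes exist.
	Min int32 `json:"min,omitempty"`
	// Max caps how many underutilized nodes are processed per execution.
	Max int32 `json:"max,omitempty"`
}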


// limit the number of nodes processed each execution if `MaxNodesToProcess` is set
if h.args.MaxNodesToProcess > 0 && len(lowNodes) > h.args.MaxNodesToProcess {
	lowNodes = lowNodes[:h.args.MaxNodesToProcess]
Contributor


The index needs to rotate so all nodes are eventually processed.

Author


👀

Author


Looking at possible ways to do this, I see a couple of options and would love feedback.

  1. Add a new field to the plugin struct so the state doesn't live inside the function/file
  2. Add a new variable within the file to track the index
  3. Open to any other opinions/suggestions!
type HighNodeUtilization struct {
	handle              frameworktypes.Handle
	args                *HighNodeUtilizationArgs
	podFilter           func(pod *v1.Pod) bool
	criteria            []any
	resourceNames       []v1.ResourceName
	highThresholds      api.ResourceThresholds
	usageClient         usageClient
	lastProcessedIndex  int
}

...

if h.args.MaxNodesToProcess > 0 && len(lowNodes) > h.args.MaxNodesToProcess {
	start := h.lastProcessedIndex % len(lowNodes)
	// copy before rotating so the backing array of lowNodes is not overwritten
	rotated := append(append([]NodeInfo{}, lowNodes[start:]...), lowNodes[:start]...)

	// advance the index using the original length, before lowNodes is truncated
	h.lastProcessedIndex = (start + h.args.MaxNodesToProcess) % len(lowNodes)
	lowNodes = rotated[:h.args.MaxNodesToProcess]
}
... 
// rotateStartIdx tracks where the next MaxNodesToProcess window of nodes starts
var rotateStartIdx int

...

if h.args.MaxNodesToProcess > 0 && len(lowNodes) > h.args.MaxNodesToProcess {
	start := rotateStartIdx % len(lowNodes)
	end := start + h.args.MaxNodesToProcess

	var selected []NodeInfo
	if end <= len(lowNodes) {
		selected = lowNodes[start:end]
	} else {
		// copy before wrapping so the backing array of lowNodes is not overwritten
		selected = append(append([]NodeInfo{}, lowNodes[start:]...), lowNodes[:end%len(lowNodes)]...)
	}

	// advance the index using the original length, before lowNodes is replaced
	rotateStartIdx = (start + h.args.MaxNodesToProcess) % len(lowNodes)
	lowNodes = selected
}

Contributor


An observation: it's not guaranteed the list of nodes will always be correctly ordered, i.e. even with the rotated index the nodes may not be selected uniformly. If this leads to sub-efficient behavior, we can see whether sorting the nodes by name helps.
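
A small sketch of the name-based sort suggested here, assuming the in-package NodeInfo/NodeUsage layout with an unexported node field (that field access is an assumption, not confirmed by this PR; requires the standard library "sort" package):

// sortLowNodesByName gives the candidate list a stable order so the rotation
// window selects nodes deterministically across executions.
func sortLowNodesByName(lowNodes []NodeInfo) {
	sort.Slice(lowNodes, func(i, j int) bool {
		return lowNodes[i].node.Name < lowNodes[j].node.Name
	})
}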

Contributor


Option 1 looks good, i.e. the second version of the code changes in the comment. Worth moving the code into its own function so it can be unit tested separately.

Also, MaxNodesToProcess needs to be at least as large as NumberOfNodes, if the latter is set.
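
As a sketch of the suggested refactor, the rotation logic could live in a small, state-free helper that is easy to unit test on its own (the function name and signature are illustrative, not part of the PR):

// rotateWindow returns at most max nodes starting at start (wrapping around)
// together with the start index for the next execution. It never mutates the
// input slice's backing array.
func rotateWindow(nodes []NodeInfo, start, max int) ([]NodeInfo, int) {
	if max <= 0 || len(nodes) <= max {
		return nodes, 0
	}
	start = start % len(nodes)
	window := make([]NodeInfo, 0, max)
	for i := 0; i < max; i++ {
		window = append(window, nodes[(start+i)%len(nodes)])
	}
	return window, (start + max) % len(nodes)
}

The MaxNodesToProcess >= NumberOfNodes check mentioned above could then sit next to the plugin's existing argument validation.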

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 13, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 25, 2025
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


This parameter can be configured to activate the strategy only when the number of under utilized nodes
is above the configured value. This could be helpful in large clusters where a few nodes could go
under utilized frequently or for a short period of time. By default, `numberOfNodes` is set to zero.
under utilized frequently or for a short period of time. By default, `numberOfNodes` is set to zero. The parameter `maxNodesToProcess` is used to limit how many nodes should be processed by the descheduler plugin on each execution.
Contributor


The current implementation for MaxNodesToProcess limits the number of underutilized nodes, not all the nodes. s/is used to limit how many nodes should be processed/is used to limit how many underutilized nodes should be processed/ to make this explicitly clear.

Can you also put the new sentence on a separate line? To limit the number of characters per line.

@ingvagabund
Contributor

@jonahjon again apologies for the delay. I am currently short on resources to allocate more time for upstream reviews.

}

plugin, err := NewHighNodeUtilization(ctx, &HighNodeUtilizationArgs{
	MaxNodesToProcess: 1,
Contributor


I would not expect this to be necessary. Have you introduced this by mistake?

	evictedPods:   []string{"p1", "p2", "p3", "p4"}, // Any one of these is valid
	evictionModes: nil,
	// We'll set MaxNodesToProcess in the plugin args below
},
Contributor


This new test is broken. Also, please make "maxNodesToProcess" part of the test definition so each test can set its own value.
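
For illustration, one way to thread the value through the table-driven test, with each case owning its own limit (the field and case names below are hypothetical, loosely based on the snippet above):

tests := []struct {
	name              string
	evictedPods       []string
	maxNodesToProcess int
}{
	{
		name:              "caps processing to a single underutilized node",
		evictedPods:       []string{"p1"},
		maxNodesToProcess: 1,
	},
}
for _, tc := range tests {
	// Each case feeds its own limit straight into the plugin args.
	args := &HighNodeUtilizationArgs{MaxNodesToProcess: tc.maxNodesToProcess}
	_ = args // the rest of the existing test body would build the plugin from these args
}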

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 5, 2025
@k8s-ci-robot
Contributor

@jonahjon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-descheduler-unit-test-master-master | 4fcf6fb | link | true | `/test pull-descheduler-unit-test-master-master` |
| pull-descheduler-test-e2e-k8s-master-1-34 | 4fcf6fb | link | true | `/test pull-descheduler-test-e2e-k8s-master-1-34` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
