Conversation


@jonahjon jonahjon commented Jun 12, 2025

Adding a new setting to the HighNodeUtilization plugin called maxNodesToProcess. The setting limits the number of nodes processed during each descheduler execution.

This is useful because our company is looking to adopt the descheduler for improved bin packing; without this guardrail it can potentially impact several nodes at once, which could cause service disruptions.

Similar goal to this other open PR: #1616

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 12, 2025
@k8s-ci-robot
Contributor

Welcome @jonahjon!

It looks like this is your first PR to kubernetes-sigs/descheduler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/descheduler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 12, 2025
@k8s-ci-robot
Contributor

Hi @jonahjon. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 12, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign damemi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jonahjon jonahjon changed the title feat: Adding a feature to limit the nodes processed on each execution feat: Adding a feature maxNodesToProcess to limit the nodes processed on each execution Jun 12, 2025
@jonahjon jonahjon changed the title feat: Adding a feature maxNodesToProcess to limit the nodes processed on each execution feat: Adding a feature maxNodesToProcess to HighNodeUtilization Plugin to limit the nodes processed on each execution Jun 12, 2025
@googs1025
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 13, 2025
// MaxNodesToProcess limits the number of nodes to process in each
// descheduling execution. This is useful to limit how many nodes are descheduled per run
// when turning this plugin on in a cluster with many underutilized nodes.
MaxNodesToProcess int `json:"maxNodesToProcess,omitempty"`
Member


can we add some unit test cases in TestHighNodeUtilization? 😄

Author


Sure, I will add those.

@googs1025
Member

I think this change is reasonable, to avoid too much churn when running the HighNodeUtilization plugin on clusters with many nodes.

|Name|Type|
|---|---|
|`thresholds`|map(string:int)|
|`numberOfNodes`|int|
Contributor


Worth deprecating numberOfNodes and renaming it to minNodesToProcess. Alternatively, introducing a new nodesToProcessLimits (or a better name) with min and max fields. Are min and max the only useful limits to introduce? And should only absolute numbers be supported, or percentages as well?

Author


For us, we're specifically looking to slow this down, so I'm unsure if there are other use cases out there. The default settings are aggressive enough that I'm not sure a min would be useful for our company in any way.

Member


Maybe just supporting max and percentage would be fine? We might care more about avoiding evicting all high utilization nodes. 🤔

type NodeProcessingLimits struct {
    Max int32  `json:"max,omitempty"`
    MaxPercentage int32 `json:"maxPercentage,omitempty"`
}

(The name is just an example, not a recommendation.)
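
For illustration, a minimal sketch of how such limits could be resolved into an absolute per-run cap, assuming the NodeProcessingLimits shape from the comment above (the helper name and semantics are mine, not part of the PR):

// effectiveNodeLimit resolves the sketched NodeProcessingLimits into an
// absolute cap on nodes per run; 0 means "no limit". When both fields are
// set, the smaller resulting cap wins.
func effectiveNodeLimit(limits NodeProcessingLimits, totalNodes int) int {
	limit := 0
	if limits.Max > 0 {
		limit = int(limits.Max)
	}
	if limits.MaxPercentage > 0 {
		byPercent := totalNodes * int(limits.MaxPercentage) / 100
		if limit == 0 || byPercent < limit {
			limit = byPercent
		}
	}
	return limit
}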

Contributor


I agree with @ingvagabund here; the numberOfNodes configuration relates closely to the newly introduced maxNodesToProcess. Deprecating the former and making both part of the same structure (with min and max properties) would be better here.

I don't think we should go as far as adding "percentages" here unless there is a clear use case.
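
A minimal sketch of what such a combined structure could look like, assuming numberOfNodes is deprecated in its favor (field and type names are illustrative only, not an agreed API):

// NodeLimits is a hypothetical replacement for the standalone numberOfNodes parameter.
type NodeLimits struct {
	// Min mirrors the deprecated numberOfNodes: the strategy only activates
	// once at least this many underutilized nodes exist.
	Min int32 `json:"min,omitempty"`
	// Max caps how many underutilized nodes are processed per execution.
	Max int32 `json:"max,omitempty"`
}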


// limit the number of nodes processed each execution if `MaxNodesToProcess` is set
if h.args.MaxNodesToProcess > 0 && len(lowNodes) > h.args.MaxNodesToProcess {
	lowNodes = lowNodes[:h.args.MaxNodesToProcess]
Contributor


The index needs to rotate so all nodes are eventually processed.

Author


👀

Author


Looking at possible ways to do this, I see a couple of options and would love feedback.

  1. Add a new field to the plugin struct so the state doesn't live inside the function/file
  2. Add a new variable within the file to track the index
  3. Open to any other opinions/suggestions!
type HighNodeUtilization struct {
	handle              frameworktypes.Handle
	args                *HighNodeUtilizationArgs
	podFilter           func(pod *v1.Pod) bool
	criteria            []any
	resourceNames       []v1.ResourceName
	highThresholds      api.ResourceThresholds
	usageClient         usageClient
	lastProcessedIndex  int
}

...

if h.args.MaxNodesToProcess > 0 && len(lowNodes) > h.args.MaxNodesToProcess {
	start := h.lastProcessedIndex % len(lowNodes)
	// copy before rotating so the backing array of lowNodes is not overwritten
	rotated := append(append([]NodeInfo{}, lowNodes[start:]...), lowNodes[:start]...)

	// advance the index using the original length, before lowNodes is truncated
	h.lastProcessedIndex = (start + h.args.MaxNodesToProcess) % len(lowNodes)
	lowNodes = rotated[:h.args.MaxNodesToProcess]
}
... 
// rotateStartIdx tracks where the next MaxNodesToProcess window of nodes starts
var rotateStartIdx int

...

if h.args.MaxNodesToProcess > 0 && len(lowNodes) > h.args.MaxNodesToProcess {
	start := rotateStartIdx % len(lowNodes)
	end := start + h.args.MaxNodesToProcess

	var selected []NodeInfo
	if end <= len(lowNodes) {
		selected = lowNodes[start:end]
	} else {
		// copy before wrapping so the backing array of lowNodes is not overwritten
		selected = append(append([]NodeInfo{}, lowNodes[start:]...), lowNodes[:end%len(lowNodes)]...)
	}

	// advance the index using the original length, before lowNodes is replaced
	rotateStartIdx = (start + h.args.MaxNodesToProcess) % len(lowNodes)
	lowNodes = selected
}

Contributor


An observation: it's not guaranteed the list of nodes will always be correctly ordered, i.e. even with the rotated index the nodes may not be selected uniformly. If this leads to sub-efficient behavior, we can see whether sorting the nodes by name helps.
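
A small sketch of the name-based sort suggested here, assuming the in-package NodeInfo/NodeUsage layout with an unexported node field (that field access is an assumption, not confirmed by this PR; requires the standard library "sort" package):

// sortLowNodesByName gives the candidate list a stable order so the rotation
// window selects nodes deterministically across executions.
func sortLowNodesByName(lowNodes []NodeInfo) {
	sort.Slice(lowNodes, func(i, j int) bool {
		return lowNodes[i].node.Name < lowNodes[j].node.Name
	})
}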

Contributor


Option 1 looks good, i.e. the second version of the code changes in the comment. Worth moving the code into its own function so it can be unit tested separately.

Also, MaxNodesToProcess needs to be at least as large as NumberOfNodes, if the latter is set.
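
As a sketch of the suggested refactor, the rotation logic could live in a small, state-free helper that is easy to unit test on its own (the function name and signature are illustrative, not part of the PR):

// rotateWindow returns at most max nodes starting at start (wrapping around)
// together with the start index for the next execution. It never mutates the
// input slice's backing array.
func rotateWindow(nodes []NodeInfo, start, max int) ([]NodeInfo, int) {
	if max <= 0 || len(nodes) <= max {
		return nodes, 0
	}
	start = start % len(nodes)
	window := make([]NodeInfo, 0, max)
	for i := 0; i < max; i++ {
		window = append(window, nodes[(start+i)%len(nodes)])
	}
	return window, (start + max) % len(nodes)
}

The MaxNodesToProcess >= NumberOfNodes check mentioned above could then sit next to the plugin's existing argument validation.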

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 13, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 25, 2025
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


This parameter can be configured to activate the strategy only when the number of under utilized nodes
is above the configured value. This could be helpful in large clusters where a few nodes could go
under utilized frequently or for a short period of time. By default, `numberOfNodes` is set to zero.
under utilized frequently or for a short period of time. By default, `numberOfNodes` is set to zero. The parameter `maxNodesToProcess` is used to limit how many nodes should be processed by the descheduler plugin on each execution.
Contributor


The current implementation for MaxNodesToProcess limits the number of underutilized nodes, not all the nodes. s/is used to limit how many nodes should be processed/is used to limit how many underutilized nodes should be processed/ to make this explicitly clear.

Can you also put the new sentence on a separate line? To limit the number of characters per line.

@ingvagabund
Contributor

@jonahjon again apologies for the delay. I am currently short on resources to allocate more time for upstream reviews.

}

plugin, err := NewHighNodeUtilization(ctx, &HighNodeUtilizationArgs{
	MaxNodesToProcess: 1,
Contributor


I would not expect this to be necessary. Have you introduced this by mistake?

	evictedPods:   []string{"p1", "p2", "p3", "p4"}, // Any one of these is valid
	evictionModes: nil,
	// We'll set MaxNodesToProcess in the plugin args below
},
Contributor


This new test is broken. Also, please make "maxNodesToProcess" part of the test definition so each test can set its own value.
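
For illustration, one way to thread the value through the table-driven test, with each case owning its own limit (the field and case names below are hypothetical, loosely based on the snippet above):

tests := []struct {
	name              string
	evictedPods       []string
	maxNodesToProcess int
}{
	{
		name:              "caps processing to a single underutilized node",
		evictedPods:       []string{"p1"},
		maxNodesToProcess: 1,
	},
}
for _, tc := range tests {
	// Each case feeds its own limit straight into the plugin args.
	args := &HighNodeUtilizationArgs{MaxNodesToProcess: tc.maxNodesToProcess}
	_ = args // the rest of the existing test body would build the plugin from these args
}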

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 5, 2025
@k8s-ci-robot
Contributor

@jonahjon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-descheduler-unit-test-master-master | 4fcf6fb | link | true | `/test pull-descheduler-unit-test-master-master` |
| pull-descheduler-test-e2e-k8s-master-1-34 | 4fcf6fb | link | true | `/test pull-descheduler-test-e2e-k8s-master-1-34` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
