
Conversation

@mszacillo
Contributor

@mszacillo mszacillo commented Nov 5, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:

This PR implements the maxAvailableComponentsSets within the general estimator based on resource models. The packing algorithm is first-fit decreasing (FFD), a greedy approach to bin packing. To make the calculation of maxSets more efficient, we binary-search over the candidate set count, at each step attempting to fit all of the resulting replicas onto the existing nodes.
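For illustration, the search can be sketched as follows. This is a simplified, hypothetical example, not the PR's actual code: `Component`, `Node`, `fits`, and `maxSets` are invented names, and resources are reduced to a single CPU dimension.

```go
package main

import (
	"fmt"
	"sort"
)

// Component and Node are simplified stand-ins for the estimator's types:
// a single CPU dimension instead of full resource lists.
type Component struct{ CPU int64 }
type Node struct{ FreeCPU int64 }

// fits reports whether `sets` full component sets can be packed onto the
// nodes using first-fit decreasing (largest replicas placed first).
func fits(sets int, components []Component, nodes []Node) bool {
	// Expand every component set into individual replicas.
	var replicas []int64
	for i := 0; i < sets; i++ {
		for _, c := range components {
			replicas = append(replicas, c.CPU)
		}
	}
	// Decreasing order is what makes this FFD rather than plain first-fit.
	sort.Slice(replicas, func(i, j int) bool { return replicas[i] > replicas[j] })

	free := make([]int64, len(nodes))
	for i, n := range nodes {
		free[i] = n.FreeCPU
	}
	for _, r := range replicas {
		placed := false
		for i := range free {
			if free[i] >= r {
				free[i] -= r
				placed = true
				break
			}
		}
		if !placed {
			return false
		}
	}
	return true
}

// maxSets binary-searches the largest feasible set count in [0, upperBound],
// relying on feasibility being monotone in the number of sets.
func maxSets(upperBound int, components []Component, nodes []Node) int {
	lo, hi := 0, upperBound
	for lo < hi {
		mid := (lo + hi + 1) / 2
		if fits(mid, components, nodes) {
			lo = mid
		} else {
			hi = mid - 1
		}
	}
	return lo
}

func main() {
	components := []Component{{CPU: 2}, {CPU: 1}} // one set = 2 CPU + 1 CPU
	nodes := []Node{{FreeCPU: 4}, {FreeCPU: 3}, {FreeCPU: 2}}
	fmt.Println(maxSets(10, components, nodes)) // prints: 3
}
```

Feasibility is monotone in the set count (if S sets fit, any smaller count also fits), which is what makes the binary search valid here.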

Which issue(s) this PR fixes:
Part of #6734

Does this PR introduce a user-facing change?:

`karmada-scheduler`: Implements `getMaximumSetsBasedOnResourceModels` in general estimator.

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 5, 2025
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 5, 2025
@gemini-code-assist

Summary of Changes

Hello @mszacillo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the cluster resource estimation capabilities by implementing a sophisticated algorithm to determine the maximum number of complete component sets a cluster can support. By leveraging resource models and a greedy bin-packing strategy combined with binary search, it provides a more precise and performant method for capacity planning and workload placement.

Highlights

  • New Feature Implementation: Introduces maxAvailableComponentsSets within the general estimator, enabling the calculation of the maximum number of full component sets based on cluster resource models.
  • Algorithm: Utilizes a first-fit decreasing greedy algorithm for bin packing to determine component placement on model-grade nodes.
  • Efficiency: Incorporates a binary search approach to efficiently find the maximum feasible number of component sets.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements a bin-packing based algorithm to calculate the maximum available component sets based on resource models. The implementation uses a first-fit decreasing approach, with a binary search to find the optimal number of sets. The code is well-structured and includes comprehensive tests. My feedback includes a few suggestions to improve the clarity of test comments and a potential improvement to the packing heuristic to better handle resources with different scales.

@codecov-commenter

codecov-commenter commented Nov 5, 2025

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 80.91603% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.61%. Comparing base (90c05cd) to head (6b9dba0).
⚠️ Report is 15 commits behind head on master.

Files with missing lines Patch % Lines
pkg/estimator/client/general.go 80.91% 13 Missing and 12 partials ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6912      +/-   ##
==========================================
+ Coverage   46.46%   46.61%   +0.15%     
==========================================
  Files         698      698              
  Lines       47824    48003     +179     
==========================================
+ Hits        22220    22376     +156     
- Misses      23933    23942       +9     
- Partials     1671     1685      +14     
Flag Coverage Δ
unittests 46.61% <80.91%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@zhzhuang-zju
Contributor

Thanks
/assign

@mszacillo mszacillo force-pushed the resource-models branch 2 times, most recently from 3371695 to 8308268 on November 7, 2025 at 22:00
@zhzhuang-zju
Contributor

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant feature by implementing maxAvailableComponentSets based on resource models. The implementation uses a binary search over the number of sets and a first-fit decreasing algorithm for placement, which is a solid approach. The code is well-structured and includes comprehensive tests for the new logic. My review includes a few suggestions for improvement, mainly around maintainability and optimizing the packing heuristic.

@karmada-bot karmada-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 14, 2025
Member

@RainbowMango RainbowMango left a comment


/assign

@zhzhuang-zju
Contributor

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements getMaximumSetsBasedOnResourceModels to estimate the maximum number of component sets that can be scheduled on a cluster, based on its resource models. The implementation uses a binary search over the number of sets, and for each guess, it checks feasibility using a first-fit-decreasing bin packing algorithm. This is a solid approach, and the PR includes a comprehensive set of unit tests. I've found a critical issue in the buildModelNodes helper function, where it incorrectly assumes that resource models and allocatable modelings slices are sorted by grade, which can lead to incorrect calculations or panics. I've also found a minor issue with a confusing comment in one of the test cases. I've provided suggestions to fix both.

@zhzhuang-zju
Contributor

zhzhuang-zju commented Nov 14, 2025

Thanks @mszacillo — this is a tough task!
From a functional perspective, it is ready. But I have some concerns about the implementation:

  1. Does the "binary search" here actually improve efficiency?
    Since each iteration fully evaluates whether all upperBound component sets can be accommodated by the nodes, the total computation isn’t reduced—in fact, it might even increase due to repeated full validations.

  2. Given the high similarity between this logic and the noderesource plugin, can we unify their calculation logic and workflow?
    The only difference appears to be the data representation: the noderesource plugin uses schedulerframework.NodeInfo to represent node resources, whereas this implementation uses resource models. If we convert these resource models into schedulerframework.NodeInfo, could we fully reuse the noderesource plugin’s code?

Admittedly, the noderesource plugin might not yet be complete—for example, it may lack component sorting. But unifying the logic would make future enhancements easier to apply consistently across both components. WDYT?
Also, let’s ask @RainbowMango and @seanlaii for their opinion.

@mszacillo
Contributor Author

Thanks for the review @zhzhuang-zju !

Does the "binary search" here actually improve efficiency?

Yes, binary search helps here. I implemented this method with the understanding that we are looking for the exact maximum number of feasible component sets, rather than an approximation, which would have been a lot simpler.

We're looking for a max number of component sets between 0 and upperBound, with the knowledge that:

  • If S sets are feasible, then all sets s <= S are also feasible
  • If S sets are not feasible, then all sets s >= S are also not feasible

We could naively loop through all possible set numbers [0, upperBound] and do the feasibility check, but given the rules listed above, binary search fits in quite well. And O(logN) < O(N).

Since each iteration fully evaluates whether all upperBound component sets can be accommodated by the nodes, the total computation isn’t reduced—in fact, it might even increase due to repeated full validations.

Not sure I fully follow. By full validations you are referring to the feasibility fit check right? Even though each iteration fully validates feasibility, binary search reduces how many times the check is done.

If we're discussing the feasibility check's time complexity (assuming k = number of components), then each check sorts in O(k log k) and then does a greedy packing in O(k * n * d), where n is the number of nodes and d is the number of resource dimensions. Since the number of components and resources is generally small, the complexity is generally going to depend on the number of nodes (larger clusters taking longer to check).

That said! I'm realizing that we can move the sorting to be outside of the feasibility check, since the score we assign is dependent on the cluster. So it only needs to be computed once, and we can instead just set the replica count to match the # of sets we are running the feasibility check for. That would improve efficiency.

Given the high similarity between this logic and the noderesource plugin, can we unify their calculation logic and workflow?

I'm happy to unify calculation logic to make the code cleaner, but to me this sounds like a follow-up task. I could possibly even pick this up.

@seanlaii
Contributor

/assign

@zhzhuang-zju
Contributor

Not sure I fully follow. By full validations you are referring to the feasibility fit check right? Even though each iteration fully validates feasibility, binary search reduces how many times the check is done.

Binary search reduces the number of feasibility fit checks. However, is it possible to obtain the final result in a single feasibility fit check—for example, by assigning component sets one by one until we can no longer assign a complete component set?

I'm happy to unify calculation logic to make the code cleaner, but to me this sounds like a follow-up task. I could possibly even pick this up.

Yes, you're absolutely right—this can be addressed as a follow-up task.

@mszacillo
Contributor Author

Binary search reduces the number of feasibility fit checks. However, is it possible to obtain the final result in a single feasibility fit check—for example, by assigning component sets one by one until we can no longer assign a complete component set?

Ah, I understand now. This was something I had considered early on before implementing, but I decided against it because it breaks one of the assumptions of FFD, which requires the items to be globally sorted in descending order. That way you pack the largest items first and can fit the smaller, easier-to-pack items into the leftover spaces. If you go set by set, this is closer to a plain First-Fit algorithm, which can lead to suboptimal packing.

Generally the number of bins used by each algorithm is:

FF uses <= 1.7 * Optimal
FFD uses <= 11/9 * Optimal (which is a much tighter bound)
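For intuition on why the global sort matters, here is a toy one-dimensional instance (capacity 10; hypothetical helper functions, not Karmada code) where plain first-fit in arrival order needs three bins but FFD needs two:

```go
package main

import (
	"fmt"
	"sort"
)

// firstFit packs items in the given order, opening a new bin when needed.
// It returns the number of bins used.
func firstFit(items []int, capacity int) int {
	var bins []int // remaining capacity per open bin
	for _, it := range items {
		placed := false
		for i := range bins {
			if bins[i] >= it {
				bins[i] -= it
				placed = true
				break
			}
		}
		if !placed {
			bins = append(bins, capacity-it)
		}
	}
	return len(bins)
}

// firstFitDecreasing sorts the items largest-first, then runs first-fit.
func firstFitDecreasing(items []int, capacity int) int {
	sorted := append([]int(nil), items...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	return firstFit(sorted, capacity)
}

func main() {
	items := []int{5, 6, 4, 3, 2} // arrival order
	fmt.Println(firstFit(items, 10), firstFitDecreasing(items, 10)) // prints: 3 2
}
```

In arrival order, the 5 and 6 end up in separate bins and the trailing 2 no longer fits anywhere; sorted largest-first, the same items pack into two bins of exactly 10.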

But if I'm being honest, we're working with relatively low dimensionality here (most cases will be 2D, with only CPU + Memory defined). I spent some time trying to think of a case in which the per-set packing would result in a suboptimal scenario compared to FFD and couldn't. :) And the suboptimal packing really only starts to become a problem with higher-dimensional vector packing, where you'll see resource fragmentation.

TL;DR - I was optimizing for a more correct estimate rather than efficiency. If we think this algorithm is too slow in practice perhaps we can change it to a per-set packing? What do you think?
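A rough sketch of that per-set alternative, under the same single-CPU-dimension simplification (hypothetical names, not actual Karmada code):

```go
package main

import "fmt"

// countSetsOnePass assigns whole component sets one at a time, stopping at the
// first set that no longer fits. Within each set this is plain first-fit, so
// unlike FFD the replicas are never globally sorted by size.
func countSetsOnePass(components []int64, free []int64) int {
	sets := 0
	for {
		// Work on a copy so a partially placed set can be rolled back.
		trial := append([]int64(nil), free...)
		ok := true
		for _, cpu := range components {
			placed := false
			for i := range trial {
				if trial[i] >= cpu {
					trial[i] -= cpu
					placed = true
					break
				}
			}
			if !placed {
				ok = false
				break
			}
		}
		if !ok {
			return sets
		}
		free = trial
		sets++
	}
}

func main() {
	// One set = 2 CPU + 1 CPU; three nodes with 4, 3, and 2 CPU free.
	fmt.Println(countSetsOnePass([]int64{2, 1}, []int64{4, 3, 2})) // prints: 3
}
```

This needs only a single pass over the sets instead of O(log N) full feasibility checks, at the cost of the weaker First-Fit packing guarantee discussed above.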

Contributor

@zhzhuang-zju zhzhuang-zju left a comment


@mszacillo thanks, others LGTM

@mszacillo mszacillo force-pushed the resource-models branch 2 times, most recently from c76320a to 77bf038 on November 21, 2025 at 20:00
@RainbowMango
Member

@mszacillo can you help rebase this PR? We introduced some E2E tests regarding multiple-component scheduling, so after a rebase we can see the test results here.

@mszacillo
Contributor Author

can you help rebase this PR? We introduced some E2E tests regarding multiple-component scheduling, so after a rebase we can see the test results here.

Of course! Rebased and pushed the changes.

@RainbowMango
Member

Thanks, but the unit tests are failing:

=== Failed
=== FAIL: pkg/estimator/client TestGetMaximumSetsBasedOnResourceModels/Multi-component_JM/TM_across_grades_(CPU+Mem) (0.00s)
    general_test.go:556: 
        	Error Trace:	/home/runner/work/karmada/karmada/pkg/estimator/client/general_test.go:556
        	Error:      	Not equal: 
        	            	expected: 3
        	            	actual  : 2
        	Test:       	TestGetMaximumSetsBasedOnResourceModels/Multi-component_JM/TM_across_grades_(CPU+Mem)
    --- FAIL: TestGetMaximumSetsBasedOnResourceModels/Multi-component_JM/TM_across_grades_(CPU+Mem) (0.00s)

=== FAIL: pkg/estimator/client TestGetMaximumSetsBasedOnResourceModels (0.00s)

@mszacillo
Contributor Author

Yes, I am investigating this currently. I didn't want to comment until I root-caused it, but it seems like there is some nondeterminism after I changed the algorithm to use per-set packing. I wonder if I can make the test less strict.

@mszacillo
Contributor Author

@RainbowMango The recent changes to buildModelNodes made the order of the nodes nondeterministic. I'll fix this and update the branch.

@mszacillo
Contributor Author

@RainbowMango I added a sort for the grades to make sure they are ordered when constructing the nodes. This could be optimized, but do you know if there is validation for grades users can define? Do they have to start from 0 and increment by 1? Or can users define random numbered grades if they wanted - for example {2, 6, 19, 22}?

@RainbowMango RainbowMango added this to the v1.16 milestone Nov 25, 2025
@RainbowMango
Member

The minimum amount of resources from first grade should start with 0, and the maximum amount of resources from the last grade should end with math.MaxInt64—no restriction on the increment between grades.

```go
// ValidateClusterResourceModels validates cluster's resource models.
func ValidateClusterResourceModels(fldPath *field.Path, models []api.ResourceModel) *field.Error {
	for i, resourceModel := range models {
		if i != 0 && resourceModel.Grade == models[i-1].Grade {
			return field.Invalid(fldPath, models, "The grade of each models should not be the same")
		}
		if i != 0 && len(models[i-1].Ranges) != len(resourceModel.Ranges) {
			return field.Invalid(fldPath, models, "The number of resource types should be the same")
		}
		for j, resourceModelRange := range resourceModel.Ranges {
			if resourceModelRange.Max.Cmp(resourceModelRange.Min) <= 0 {
				return field.Invalid(fldPath, models, "The max value of each resource must be greater than the min value")
			}
			if i == 0 {
				if !resourceModelRange.Min.IsZero() {
					return field.Invalid(fldPath, models, "The min value of each resource in the first model should be 0")
				}
			} else if models[i-1].Ranges[j].Name != resourceModelRange.Name {
				return field.Invalid(fldPath, models, "The resource types of each models should be the same")
			} else if models[i-1].Ranges[j].Max != resourceModelRange.Min {
				return field.Invalid(fldPath, models, "Model intervals for resources must be contiguous and non-overlapping")
			}
			if i == len(models)-1 {
				if resourceModelRange.Max.Value() != math.MaxInt64 {
					return field.Invalid(fldPath, models, "The max value of each resource in the last model should be MaxInt64")
				}
			}
		}
	}
	return nil
}
```

Member

@RainbowMango RainbowMango left a comment


/lgtm
A small suggestion regarding the maps.Clone, but no big deal.

@zhzhuang-zju Do you have any further comments?

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 25, 2025
@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 25, 2025
@zhzhuang-zju
Contributor

Thanks
/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 26, 2025
Member

@RainbowMango RainbowMango left a comment


/approve

@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango


@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 26, 2025
@karmada-bot karmada-bot merged commit 25cf101 into karmada-io:master Nov 26, 2025
24 checks passed