Implementing maxAvailableComponentSets for resource models #6912
Summary of Changes (Gemini Code Assist): This pull request enhances the cluster resource estimation capabilities by implementing an algorithm to determine the maximum number of complete component sets a cluster can support. By leveraging resource models and a greedy bin-packing strategy combined with binary search, it provides a more precise and performant method for capacity planning and workload placement.
Code Review
This pull request implements a bin-packing based algorithm to calculate the maximum available component sets based on resource models. The implementation uses a first-fit decreasing approach, with a binary search to find the optimal number of sets. The code is well-structured and includes comprehensive tests. My feedback includes a few suggestions to improve the clarity of test comments and a potential improvement to the packing heuristic to better handle resources with different scales.
Codecov Report ❌ Patch coverage is

@@            Coverage Diff             @@
##           master    #6912      +/-   ##
==========================================
+ Coverage   46.46%   46.61%   +0.15%
==========================================
  Files         698      698
  Lines       47824    48003     +179
==========================================
+ Hits        22220    22376     +156
- Misses      23933    23942       +9
- Partials     1671     1685      +14
Thanks
/gemini review
Code Review
This pull request introduces a significant feature by implementing maxAvailableComponentSets based on resource models. The implementation uses a binary search over the number of sets and a first-fit decreasing algorithm for placement, which is a solid approach. The code is well-structured and includes comprehensive tests for the new logic. My review includes a few suggestions for improvement, mainly around maintainability and optimizing the packing heuristic.
RainbowMango left a comment:
/assign
/gemini review
Code Review
This pull request implements getMaximumSetsBasedOnResourceModels to estimate the maximum number of component sets that can be scheduled on a cluster, based on its resource models. The implementation uses a binary search over the number of sets, and for each guess, it checks feasibility using a first-fit-decreasing bin packing algorithm. This is a solid approach, and the PR includes a comprehensive set of unit tests. I've found a critical issue in the buildModelNodes helper function, where it incorrectly assumes that resource models and allocatable modelings slices are sorted by grade, which can lead to incorrect calculations or panics. I've also found a minor issue with a confusing comment in one of the test cases. I've provided suggestions to fix both.
Thanks @mszacillo — this is a tough task!
Admittedly, the
Thanks for the review @zhzhuang-zju !
Yes, binary search helps here. I implemented this method with the understanding that we are looking for an exact upperBound of possible component sets, rather than an approximation, which would have been a lot simpler. We're looking for the maximum number of component sets between 0 and upperBound, with the knowledge that feasibility is monotone:
- if n sets fit on the cluster, then any count smaller than n also fits;
- if n sets do not fit, then no count larger than n fits.
We could naively loop through all possible set counts in [0, upperBound] and run the feasibility check for each, but given the rules above, binary search fits in quite well. And O(log N) < O(N).
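The binary search over candidate set counts can be sketched as follows. `maxFeasibleSets` and the toy feasibility predicate are hypothetical names for illustration, not the PR's actual functions:

```go
package main

import "fmt"

// maxFeasibleSets finds the largest n in [0, upper] such that feasible(n)
// holds, relying on monotonicity: if n sets fit, so does any smaller count.
func maxFeasibleSets(upper int, feasible func(int) bool) int {
	lo, hi := 0, upper
	for lo < hi {
		mid := (lo + hi + 1) / 2 // round up so the loop always narrows
		if feasible(mid) {
			lo = mid // mid sets fit; the answer is at least mid
		} else {
			hi = mid - 1 // mid sets don't fit; the answer is below mid
		}
	}
	return lo
}

func main() {
	// Toy feasibility check: a "cluster" with 10 units of capacity,
	// each component set needing 3 units.
	feasible := func(n int) bool { return n*3 <= 10 }
	fmt.Println(maxFeasibleSets(100, feasible)) // prints 3
}
```

Go's `sort.Search` could replace the hand-rolled loop by searching for the first infeasible count, but the explicit version makes the monotonicity argument visible.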
Not sure I fully follow. By "full validations" you are referring to the feasibility fit check, right? Even though each iteration fully validates feasibility, binary search reduces how many times the check is done.

If we're discussing the feasibility check's time complexity (with k = number of components), then we sort each time, O(k log k), and then do a greedy packing, which is O(k * n * d), where n is the number of nodes and d the number of resource dimensions. Since the number of components and resources is generally small, the cost mostly depends on the number of nodes (larger clusters take longer to check).

That said, I'm realizing that we can move the sorting outside of the feasibility check, since the score we assign depends only on the cluster. So it only needs to be computed once, and we can instead just set the replica count to match the number of sets we are running the feasibility check for. That would improve efficiency.
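The feasibility check described above can be sketched as a first-fit decreasing pass over the nodes. A minimal Go sketch, assuming toy two-dimensional resource vectors and a simple size score (hypothetical names, not the actual Karmada implementation):

```go
package main

import (
	"fmt"
	"sort"
)

// vec is a toy resource vector: index 0 = CPU millicores, index 1 = memory units.
type vec [2]int64

func fits(free, req vec) bool { return free[0] >= req[0] && free[1] >= req[1] }
func sub(free, req vec) vec   { return vec{free[0] - req[0], free[1] - req[1]} }

// canPack reports whether every replica in items can be placed on some node,
// using first-fit decreasing: sort items largest-first by a size score, then
// place each on the first node with room.
func canPack(items []vec, nodes []vec) bool {
	free := append([]vec(nil), nodes...) // copy so callers can reuse nodes
	sort.Slice(items, func(i, j int) bool {
		// Score by the sum of dimensions; a real implementation would
		// normalize each dimension against cluster capacity first.
		return items[i][0]+items[i][1] > items[j][0]+items[j][1]
	})
	for _, it := range items {
		placed := false
		for n := range free {
			if fits(free[n], it) {
				free[n] = sub(free[n], it)
				placed = true
				break
			}
		}
		if !placed {
			return false
		}
	}
	return true
}

func main() {
	nodes := []vec{{4000, 8}, {2000, 4}}
	// Three replicas of a component needing 2000m CPU / 4 memory units each.
	items := []vec{{2000, 4}, {2000, 4}, {2000, 4}}
	fmt.Println(canPack(items, nodes)) // prints true
}
```

The inner loops are the O(k * n * d) term from the discussion: every item scans nodes, and each fit test touches every resource dimension.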
I'm happy to unify the calculation logic to make the code cleaner, but to me this sounds like a follow-up task. I could possibly even pick this up.
/assign
Binary search reduces the number of feasibility fit checks. However, is it possible to obtain the final result in a single feasibility fit check—for example, by assigning component sets one by one until we can no longer assign a complete component set?
Yes, you're absolutely right—this can be addressed as a follow-up task.
Ah, I understand now. This was something I had considered early on before implementing, but I decided against it because it breaks one of the assumptions of FFD, which requires the items to be globally sorted in descending order. That way you pack the largest items first and can fit the smaller, easier-to-pack items into the leftover spaces. If you go set by set, this becomes more of a First-Fit algorithm, which can lead to suboptimal packing. The guarantees on the number of bins used by each algorithm are roughly:
- FF uses <= 1.7 * Optimal bins (asymptotically);
- FFD uses <= (11/9) * Optimal + 6/9 bins.
But if I'm being honest, we're working with relatively low dimensionality here (most cases will be 2D, with only CPU + memory defined). I spent some time trying to think of a case in which the per-set packing would result in a suboptimal scenario compared to FFD and couldn't. :) And the suboptimal packing really only starts to become a problem with higher-dimensional vector packing, where you'll see resource fragmentation.
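The FF vs. FFD gap is easy to demonstrate even in one dimension. A toy Go sketch with illustrative numbers (not taken from the PR):

```go
package main

import (
	"fmt"
	"sort"
)

// binsUsed packs items into bins of the given capacity with first-fit:
// each item goes into the first open bin with room, else opens a new bin.
func binsUsed(items []int, capacity int) int {
	var free []int // remaining capacity per open bin
	for _, it := range items {
		placed := false
		for i := range free {
			if free[i] >= it {
				free[i] -= it
				placed = true
				break
			}
		}
		if !placed {
			free = append(free, capacity-it)
		}
	}
	return len(free)
}

func main() {
	items := []int{2, 5, 4, 7, 1, 3, 8}
	capacity := 10

	// First-fit in arrival order (analogous to packing set by set).
	ff := binsUsed(items, capacity)

	// First-fit decreasing: sort largest-first, then first-fit.
	sorted := append([]int(nil), items...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	ffd := binsUsed(sorted, capacity)

	fmt.Println(ff, ffd) // prints 4 3
}
```

Here the items sum to exactly 30, so FFD's 3 bins is optimal while arrival-order FF needs 4; with near-uniform item sizes per set, as in the CPU + memory case above, the two tend to coincide.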
67b0f24 to
7f46e0b
Compare
zhzhuang-zju left a comment:
@mszacillo thanks, the rest LGTM
@mszacillo can you help rebase this PR? We introduced some E2E tests for multiple-component scheduling, so after a rebase we can see the test results here.
Of course! Rebased and pushed the changes.
Thanks, but the unit tests are failing:
Yes, I am investigating this currently. I didn't want to comment until I had root-caused it, but it seems there is some nondeterminism after I changed the algorithm to use per-set packing. I wonder if I can make the test less strict.
@RainbowMango The recent changes to
@RainbowMango I added a sort for the grades to make sure they are ordered when constructing the nodes. This could be optimized, but do you know whether there is validation for the grades users can define? Do they have to start from 0 and increment by 1, or can users define arbitrarily numbered grades if they wanted, for example {2, 6, 19, 22}?
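The sort mentioned above can be as simple as ordering models by grade before building the nodes. A minimal sketch with hypothetical types (the real Karmada resource-model types live in the cluster API package):

```go
package main

import (
	"fmt"
	"sort"
)

// model is a stand-in for a cluster resource model entry.
type model struct {
	Grade int
	CPU   int64
}

// sortByGrade orders models ascending by grade so downstream code can rely
// on positional ordering even if users define arbitrary grade numbers such
// as {2, 6, 19, 22}.
func sortByGrade(models []model) {
	sort.Slice(models, func(i, j int) bool { return models[i].Grade < models[j].Grade })
}

func main() {
	ms := []model{{Grade: 19, CPU: 8}, {Grade: 2, CPU: 1}, {Grade: 22, CPU: 16}, {Grade: 6, CPU: 4}}
	sortByGrade(ms)
	fmt.Println(ms[0].Grade, ms[len(ms)-1].Grade) // prints 2 22
}
```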
For the minimum amount of resources the first grade should start with, see karmada/pkg/apis/cluster/validation/validation.go, lines 198 to 231 in 309c677.
RainbowMango left a comment:
/lgtm
A small suggestion regarding the maps.Clone, but no big deal.
@zhzhuang-zju Do you have any further comments?
Signed-off-by: mszacillo <[email protected]>
Thanks
RainbowMango left a comment:
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: RainbowMango.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR implements maxAvailableComponentSets within the general estimator, based on resource models. The packing algorithm is first-fit decreasing, a greedy approach to bin packing. To make the calculation of the maximum sets more efficient, we binary-search over the candidate set count, and for each candidate attempt to fit all of the resulting replicas onto the existing nodes.
Which issue(s) this PR fixes:
Part of #6734
Does this PR introduce a user-facing change?: