-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Optimize drs plan generation #12014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 4.20
Are you sure you want to change the base?
Optimize drs plan generation #12014
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@
## 4.20 #12014 +/- ##
=============================================
- Coverage 16.19% 4.00% -12.19%
=============================================
Files 5657 402 -5255
Lines 498467 32665 -465802
Branches 60491 5808 -54683
=============================================
- Hits 80702 1309 -79393
+ Misses 408783 31203 -377580
+ Partials 8982 153 -8829
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes the Cluster DRS (Distributed Resource Scheduler) planning algorithm to improve performance through caching and reduced computational complexity. The main changes include:
- Pre-computing VM technical compatibility checks (storage, hypervisor, UEFI) once per DRS cycle instead of per iteration
- Optimizing imbalance calculations by reusing pre-calculated metrics arrays and updating only affected hosts
- Refactoring
listHostsForMigrationOfVMinto smaller, reusable methods (getTechnicallyCompatibleHosts,applyAffinityConstraints,getCapableSuitableHosts) - Adding comprehensive test coverage for the new migration host listing functionality
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java | Implements caching of technical compatibility checks and affinity group processing optimization; adds pre-calculation of metrics for faster iteration |
| server/src/main/java/com/cloud/server/ManagementServerImpl.java | Refactors VM migration host listing into smaller helper methods; extracts technical compatibility and affinity constraint logic |
| api/src/main/java/org/apache/cloudstack/cluster/ClusterDrsAlgorithm.java | Adds optimized imbalance calculation methods using array-based operations; introduces reusable stateless calculators |
| plugins/drs/cluster/balanced/src/main/java/org/apache/cloudstack/cluster/Balanced.java | Updates to use optimized getMetrics signature with pre-calculated data |
| plugins/drs/cluster/condensed/src/main/java/org/apache/cloudstack/cluster/Condensed.java | Updates to use optimized getMetrics signature with pre-calculated data |
| server/src/test/java/org/apache/cloudstack/cluster/ClusterDrsServiceImplTest.java | Adds tests for new DRS planning edge cases and updates existing tests for method signature changes |
| server/src/test/java/com/cloud/server/ManagementServerImplTest.java | Adds comprehensive test coverage for VM migration host listing functionality |
| api/src/main/java/com/cloud/server/ManagementService.java | Adds public API methods for technical compatibility and affinity constraints |
| plugins/drs/cluster/balanced/src/test/java/org/apache/cloudstack/cluster/BalancedTest.java | Updates tests for new getMetrics signature |
| plugins/drs/cluster/condensed/src/test/java/org/apache/cloudstack/cluster/CondensedTest.java | Updates tests for new getMetrics signature |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java
Show resolved
Hide resolved
plugins/drs/cluster/condensed/src/main/java/org/apache/cloudstack/cluster/Condensed.java
Outdated
Show resolved
Hide resolved
7aa5194 to
6d21c2c
Compare
|
@blueorangutan package |
|
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
server/src/test/java/org/apache/cloudstack/cluster/ClusterDrsServiceImplTest.java
Outdated
Show resolved
Hide resolved
plugins/drs/cluster/condensed/src/main/java/org/apache/cloudstack/cluster/Condensed.java
Show resolved
Hide resolved
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15676 |
6d21c2c to
912acbb
Compare
|
@blueorangutan package |
|
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
api/src/main/java/org/apache/cloudstack/cluster/ClusterDrsAlgorithm.java
Outdated
Show resolved
Hide resolved
plugins/drs/cluster/balanced/src/main/java/org/apache/cloudstack/cluster/Balanced.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java
Show resolved
Hide resolved
api/src/main/java/org/apache/cloudstack/cluster/ClusterDrsAlgorithm.java
Show resolved
Hide resolved
api/src/main/java/org/apache/cloudstack/cluster/ClusterDrsAlgorithm.java
Show resolved
Hide resolved
|
Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 15678 |
912acbb to
182a02d
Compare
server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java
Outdated
Show resolved
Hide resolved
|
@vishesh92 clgtm although some more modularisation can be done. Do we have more performance data? I read a factor 7 improvement but only in a very specific situation. Any chance we can add to that? |
I don't have more performance data. I tested this locally with Simulator. |
DaanHoogland
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clgtm
Description
This PR fixes #11978
Tested in a simulator based env with 15 hosts & 150 VMs for 5 iterations. Before the changes, it took around 2.5 minutes and after the changes, it takes around 25 seconds.
To optimize, I added tests for findHostForMigration and then refactored the method. Then used the parts of the refactored method for DRS plan generation. Then I refactored how calculation was being done for a migration and made it less compute intensive to make it faster.
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?