Skip to content

Conversation

@vishesh92
Copy link
Member

@vishesh92 vishesh92 commented Nov 7, 2025

Description

This PR fixes #11978

Tested in a simulator based env with 15 hosts & 150 VMs for 5 iterations. Before the changes, it took around 2.5 minutes and after the changes, it takes around 25 seconds.

To optimize, I added tests for findHostForMigration and then refactored the method. Then used the parts of the refactored method for DRS plan generation. Then I refactored how calculation was being done for a migration and made it less compute intensive to make it faster.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@codecov
Copy link

codecov bot commented Nov 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 4.00%. Comparing base (d53b6db) to head (56ab2c0).
⚠️ Report is 4 commits behind head on 4.20.

❗ There is a different number of reports uploaded between BASE (d53b6db) and HEAD (56ab2c0). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (d53b6db) HEAD (56ab2c0)
unittests 1 0
Additional details and impacted files
@@              Coverage Diff              @@
##               4.20   #12014       +/-   ##
=============================================
- Coverage     16.19%    4.00%   -12.19%     
=============================================
  Files          5657      402     -5255     
  Lines        498467    32665   -465802     
  Branches      60491     5808    -54683     
=============================================
- Hits          80702     1309    -79393     
+ Misses       408783    31203   -377580     
+ Partials       8982      153     -8829     
Flag Coverage Δ
uitests 4.00% <ø> (ø)
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the Cluster DRS (Distributed Resource Scheduler) planning algorithm to improve performance through caching and reduced computational complexity. The main changes include:

  • Pre-computing VM technical compatibility checks (storage, hypervisor, UEFI) once per DRS cycle instead of per iteration
  • Optimizing imbalance calculations by reusing pre-calculated metrics arrays and updating only affected hosts
  • Refactoring listHostsForMigrationOfVM into smaller, reusable methods (getTechnicallyCompatibleHosts, applyAffinityConstraints, getCapableSuitableHosts)
  • Adding comprehensive test coverage for the new migration host listing functionality

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
server/src/main/java/org/apache/cloudstack/cluster/ClusterDrsServiceImpl.java Implements caching of technical compatibility checks and affinity group processing optimization; adds pre-calculation of metrics for faster iteration
server/src/main/java/com/cloud/server/ManagementServerImpl.java Refactors VM migration host listing into smaller helper methods; extracts technical compatibility and affinity constraint logic
api/src/main/java/org/apache/cloudstack/cluster/ClusterDrsAlgorithm.java Adds optimized imbalance calculation methods using array-based operations; introduces reusable stateless calculators
plugins/drs/cluster/balanced/src/main/java/org/apache/cloudstack/cluster/Balanced.java Updates to use optimized getMetrics signature with pre-calculated data
plugins/drs/cluster/condensed/src/main/java/org/apache/cloudstack/cluster/Condensed.java Updates to use optimized getMetrics signature with pre-calculated data
server/src/test/java/org/apache/cloudstack/cluster/ClusterDrsServiceImplTest.java Adds tests for new DRS planning edge cases and updates existing tests for method signature changes
server/src/test/java/com/cloud/server/ManagementServerImplTest.java Adds comprehensive test coverage for VM migration host listing functionality
api/src/main/java/com/cloud/server/ManagementService.java Adds public API methods for technical compatibility and affinity constraints
plugins/drs/cluster/balanced/src/test/java/org/apache/cloudstack/cluster/BalancedTest.java Updates tests for new getMetrics signature
plugins/drs/cluster/condensed/src/test/java/org/apache/cloudstack/cluster/CondensedTest.java Updates tests for new getMetrics signature

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@vishesh92 vishesh92 force-pushed the optimize-drs-plan-generation branch 2 times, most recently from 7aa5194 to 6d21c2c Compare November 7, 2025 10:15
@vishesh92 vishesh92 requested a review from Copilot November 7, 2025 10:15
@vishesh92
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15676

@vishesh92 vishesh92 force-pushed the optimize-drs-plan-generation branch from 6d21c2c to 912acbb Compare November 7, 2025 13:01
@vishesh92 vishesh92 linked an issue Nov 7, 2025 that may be closed by this pull request
@vishesh92 vishesh92 marked this pull request as ready for review November 7, 2025 13:10
@vishesh92 vishesh92 requested a review from Copilot November 7, 2025 13:10
@vishesh92
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@blueorangutan
Copy link

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 15678

@vishesh92 vishesh92 force-pushed the optimize-drs-plan-generation branch from 912acbb to 182a02d Compare November 7, 2025 20:41
@DaanHoogland
Copy link
Contributor

@vishesh92 clgtm although some more modularisation can be done.

Do we have more performance data? I read a factor 7 improvement but only in a very specific situation. Any chance we can add to that?

@vishesh92
Copy link
Member Author

@vishesh92 clgtm although some more modularisation can be done.

Do we have more performance data? I read a factor 7 improvement but only in a very specific situation. Any chance we can add to that?

I don't have more performance data. I tested this locally with Simulator.

Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

KVM DRS optimizations

3 participants