Skip to content

Conversation

@harryge00
Copy link
Contributor

@harryge00 harryge00 commented Nov 19, 2025

Why are these changes needed?

The current implementation of IsSatisfied in scale_expectation.go , will iterate through all pods in itemsCache. This is not necessary. When one pod fails to meet the expectation, we can return directly. fast-fail means return earlier once a pod's scaling result is not as expected.

Current code:

	for i := range items {
		rp := items[i].(*rayPod)
		pod := &corev1.Pod{}
		isPodSatisfied := false
		// check if pod is scaled correctly
                ...
		if isPodSatisfied {
			if err := r.itemsCache.Delete(items[i]); err != nil {
				panic(err)
			}
		} else {
			// Here we can return directly. No need to continue.
			// It is inefficient especially when
			// we have created lots of pods and few of them are synced from kube-apiserver
			isSatisfied = false
		}
	}

For example, we may create 10 pods, and expect to successfully get the 10 pods. If none of the pods is ready, no need to iterate through the 10 pods every time. Just find one pod is missing and then break.

I also refactored the code, reduced nested if-else conditions.

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@harryge00
Copy link
Contributor Author

harryge00 commented Nov 19, 2025

@Eikykun Since you implemented the first version of RayClusterScaleExpectation, could you view ?

@harryge00 harryge00 force-pushed the raycluster-scale-expectation-fail-fast branch 2 times, most recently from c5cedc5 to cf8f80e Compare November 20, 2025 08:14
@harryge00 harryge00 changed the title [RayCluster] fast-fail in RayClusterScaleExpectation's IsSatisfied [RayCluster] Improved the efficiency when checking rayclusters' expectations Nov 20, 2025
@harryge00 harryge00 force-pushed the raycluster-scale-expectation-fail-fast branch 4 times, most recently from 0aaa9f5 to e9d4859 Compare November 20, 2025 13:44
@harryge00 harryge00 force-pushed the raycluster-scale-expectation-fail-fast branch from e9d4859 to cc80b2e Compare November 20, 2025 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant