[API] implement chained routing support for flexible algorithm compos…#2099
paranoidRick wants to merge 7 commits into
…ition Signed-off-by: yangyouchuan <1184540833@qq.com>
Code Review
This pull request introduces a chained routing mechanism that allows multiple routing algorithms to be applied in sequence to filter candidate pods. It includes the implementation of the chained router, updates to existing algorithms (least GPU cache and least utility) to support candidate propagation, and changes to the gateway to parse comma-separated routing strategies. Several issues were identified in the feedback: a critical bug where the local routing variable is not updated, potentially breaking the routing feature; a logic error in handling mixed valid and invalid algorithms; and a memory leak in the chained router due to missing context cleanup and field propagation.
|
@Jeffwan @varungup90 Could you help review this? Thanks a lot. 😊
|
This is great! I will take a look today.
|
If we load the following from a config file, a routingStrategy with multiple items (all verified as valid values) is automatically mapped to the chained routing algorithm. This keeps existing configuration files compatible and reusable.
{
  "profiles": {
    "default": {
      "routingStrategy": "least-request,least-kv-cache,random",
      "promptLenBucketMinLength": 0,
      "promptLenBucketMaxLength": 4096
    },
    "pd": {
      "routingStrategy": "pd",
      "promptLenBucketMinLength": 0,
      "promptLenBucketMaxLength": 2048
    },
    "low-latency": {
      "routingStrategy": "least-latency",
      "promptLenBucketMinLength": 0,
      "promptLenBucketMaxLength": 2048
    }
  }
}
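As a rough illustration of how a comma-separated routingStrategy value could be turned into either a single algorithm or the chained router, here is a minimal sketch. The `RoutingAlgorithm` type, the `registered` map, and both function names are assumptions made for this example, not the gateway's actual code; only the strategy names and the all-or-nothing validation rule come from this conversation.

```go
package main

import "strings"

// RoutingAlgorithm and the registry below are hypothetical stand-ins for
// the gateway's real types; only the strategy names come from the config.
type RoutingAlgorithm string

const RouterChained RoutingAlgorithm = "chained"

var registered = map[RoutingAlgorithm]bool{
	"least-request":  true,
	"least-kv-cache": true,
	"least-latency":  true,
	"random":         true,
	"pd":             true,
}

// parseStrategies splits a comma-separated routingStrategy value and
// returns nil if any entry is unknown (all-or-nothing validation).
func parseStrategies(strategy string) []RoutingAlgorithm {
	var algos []RoutingAlgorithm
	for _, part := range strings.Split(strategy, ",") {
		name := RoutingAlgorithm(strings.TrimSpace(part))
		if !registered[name] {
			return nil
		}
		algos = append(algos, name)
	}
	return algos
}

// selectAlgorithm maps the parsed list to the algorithm that runs:
// one entry runs directly, several entries run through the chained router.
func selectAlgorithm(algos []RoutingAlgorithm) (RoutingAlgorithm, bool) {
	switch len(algos) {
	case 0:
		return "", false
	case 1:
		return algos[0], true
	default:
		return RouterChained, true
	}
}
```

With this sketch, the "default" profile would resolve to the chained router with three steps, while "pd" and "low-latency" would each resolve to their single algorithm.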
|
🚨 Critical Bug: CandidatePods Pre-initialization Breaks Narrowing
File:
routingCtx.CandidatePods = candidatePods // ← pre-set here
...
_, err = router.Route(routingCtx, podList)
...
if len(routingCtx.CandidatePods) > 0 { // ← always true!
return routingCtx.CandidatePods, nil
}
// This block is unreachable:
selectedPod := routingCtx.TargetPod()
👉 Result:
✅ Suggested Fix
routingCtx.CandidatePods = nil // don't pre-set
...
_, err = router.Route(routingCtx, podList)
...
if len(routingCtx.CandidatePods) > 0 {
return routingCtx.CandidatePods, nil
}
// Now reachable
selectedPod := routingCtx.TargetPod()
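To make the effect of the pre-set concrete, the following self-contained sketch reproduces the control flow with simplified types. `Pod`, `RoutingContext`, `pickFirst`, and the `preset` flag are all hypothetical names introduced for illustration; the real router and context are more involved.

```go
package main

// Pod and RoutingContext are simplified, hypothetical stand-ins for the
// gateway's real types; only the control flow mirrors the snippet above.
type Pod struct{ Name string }

type RoutingContext struct {
	CandidatePods []*Pod
	target        *Pod
}

func (c *RoutingContext) SetTargetPod(p *Pod) { c.target = p }
func (c *RoutingContext) TargetPod() *Pod     { return c.target }

// pickFirst is a toy router that selects a single pod and never writes
// CandidatePods, like an algorithm without multi-candidate support.
// It assumes pods is non-empty.
func pickFirst(ctx *RoutingContext, pods []*Pod) {
	ctx.SetTargetPod(pods[0])
}

// applyAlgorithm returns the pods that survive one routing step.
// With preset=true it reproduces the bug: the input list comes back as is.
func applyAlgorithm(pods []*Pod, preset bool) []*Pod {
	ctx := &RoutingContext{}
	if preset {
		ctx.CandidatePods = pods // pre-set: the check below is always true
	}
	pickFirst(ctx, pods)
	if len(ctx.CandidatePods) > 0 {
		return ctx.CandidatePods
	}
	// Fallback: narrow to the single pod the router selected.
	return []*Pod{ctx.TargetPod()}
}
```

Calling `applyAlgorithm` with three pods and `preset` set to true returns all three pods unchanged, whereas with `preset` set to false the fallback is reached and only the selected pod is returned.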
|
Thank you for your patient code review and the very helpful feedback. I have updated the code accordingly:
|
Signed-off-by: yangyouchuan <1184540833@qq.com>
3562839 to 864e637
|
@varungup90 Hi, could you please check it again?
if len(candidatePods) > 0 {
	targetPod = candidatePods[rand.Intn(len(candidatePods))]
	// set candidatePods only if algorithm is chained
	if ctx.Algorithm == RouterChained {
Could you help me verify that I understand this correctly?
Here, ctx.Algorithm needs to be RouterChained for the following code to work,
but the child context's Algorithm is the child algorithm (not RouterChained),
so this check is always false during a chained invocation.
As a result, CandidatePods never gets written and chaining effectively does not narrow candidates.
@DaveLi8086 Yes, your understanding is correct! With the child ctx, CandidatePods never gets written.
I have changed it: I removed the if ctx.Algorithm == RouterChained check, so CandidatePods is now always set.
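The difference between the guarded and the unconditional write can be shown in a few lines. This is only a sketch: `RouterLeastGpuCache`, `routeGuarded`, `routeAlways`, and `newChildContext` are hypothetical names, and `CandidatePods` is reduced to a slice of strings.

```go
package main

type RoutingAlgorithm string

const (
	RouterChained RoutingAlgorithm = "chained"
	// RouterLeastGpuCache is an assumed constant name for this sketch.
	RouterLeastGpuCache RoutingAlgorithm = "least-gpu-cache"
)

type RoutingContext struct {
	Algorithm     RoutingAlgorithm
	CandidatePods []string
}

// routeGuarded mimics the reviewed code: candidates are recorded only
// when the context's algorithm is the chained router.
func routeGuarded(ctx *RoutingContext, best []string) {
	if ctx.Algorithm == RouterChained {
		ctx.CandidatePods = best
	}
}

// routeAlways mimics the fix: candidates are recorded unconditionally.
func routeAlways(ctx *RoutingContext, best []string) {
	ctx.CandidatePods = best
}

// newChildContext builds the per-step context the chained router hands to
// a child algorithm; its Algorithm is the child's name, not "chained".
func newChildContext(algo RoutingAlgorithm) *RoutingContext {
	return &RoutingContext{Algorithm: algo}
}
```

With a child context created for least-gpu-cache, `routeGuarded` leaves `CandidatePods` nil, while `routeAlways` records the candidate set so the next step in the chain can consume it.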
Now, the single routing algorithm execution flow is:
┌───────────────────────────────────────────────┐
│ Request enters gateway │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Create RoutingContext │
│ - Algorithm: Specified single routing algorithm (e.g., least-gpu-cache) │
│ - CandidatePods: nil │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Select corresponding routing algorithm │
│ implementation (e.g., LeastGpuCacheRouter.Route()) │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Execute routing algorithm logic │
│ 1. Iterate through all available Pods, calculate GPU cache usage │
│ 2. Find candidate Pod set with lowest GPU cache usage │
│ 3. Randomly select a Pod from the candidate set as target Pod │
│ 4. Always set CandidatePods = candidate Pod set │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Set TargetPod and return result │
│ - ctx.SetTargetPod(selected Pod) │
│ - Return target Pod address (e.g., "1.1.1.1:8000") │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Clean up RoutingContext │
│ - Call Reset() method │
│ - CandidatePods is reset to nil │
└───────────────────────────────────────────────┘
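The single-algorithm flow above can be sketched roughly as follows. The `Pod` fields, the `lowestCache` helper, and the shape of `RoutingContext` are assumptions for this example; the real least-gpu-cache router reads its metric from elsewhere.

```go
package main

import "math/rand"

// Pod is a hypothetical stand-in; CacheUsage plays the role of the
// GPU cache usage metric the real router would fetch.
type Pod struct {
	Addr       string
	CacheUsage float64
}

type RoutingContext struct {
	CandidatePods []Pod
	target        *Pod
}

func (c *RoutingContext) SetTargetPod(p Pod) { c.target = &p }

// Reset clears per-request state, including CandidatePods.
func (c *RoutingContext) Reset() { c.CandidatePods, c.target = nil, nil }

// lowestCache returns every pod tied for the lowest cache usage.
func lowestCache(pods []Pod) []Pod {
	var best []Pod
	for _, p := range pods {
		switch {
		case len(best) == 0 || p.CacheUsage < best[0].CacheUsage:
			best = []Pod{p}
		case p.CacheUsage == best[0].CacheUsage:
			best = append(best, p)
		}
	}
	return best
}

// route follows steps 1-4 of the flow: rank, collect ties, pick one at
// random, and always record the tied set in CandidatePods.
func route(ctx *RoutingContext, pods []Pod) (string, bool) {
	best := lowestCache(pods)
	if len(best) == 0 {
		return "", false
	}
	target := best[rand.Intn(len(best))]
	ctx.CandidatePods = best
	ctx.SetTargetPod(target)
	return target.Addr, true
}
```

After the request completes, calling `Reset()` on the context returns `CandidatePods` to nil, matching the last box of the flow.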
And the chained router flow looks like this:
┌───────────────────────────────────────────────┐
│ Request enters gateway │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Create RoutingContext │
│ - Algorithm: "chained" │
│ - Algorithms: ["least-gpu-cache", "least-utilization"] │
│ - CandidatePods: nil │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Select ChainedRouter implementation │
│ (ChainedRouter.Route()) │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Initialize candidate Pod set │
│ candidatePods = all available Pods │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Execute chained algorithms in loop │
│ Round 1: least-gpu-cache algorithm │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Create child RoutingContext │
│ - Algorithm: "least-gpu-cache" │
│ - CandidatePods: nil │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Execute least-gpu-cache algorithm │
│ 1. Iterate through current candidate Pods, calculate GPU cache usage │
│ 2. Find candidate Pod set with lowest GPU cache usage │
│ 3. Randomly select a Pod from the candidate set as target Pod │
│ 4. Always set CandidatePods = candidate Pod set │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Return result to ChainedRouter │
│ - Check if child context.CandidatePods > 0 │
│ - Yes → Return these candidate Pods to next algorithm │
│ - No → Return selected single Pod │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Update candidate Pod set │
│ candidatePods = candidate Pods returned by previous algorithm │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Execute chained algorithms in loop │
│ Round 2: least-utilization algorithm │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Create child RoutingContext │
│ - Algorithm: "least-utilization" │
│ - CandidatePods: nil │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Execute least-utilization algorithm │
│ 1. Iterate through current candidate Pods, calculate resource utilization │
│ 2. Find candidate Pod set with lowest resource utilization │
│ 3. Randomly select a Pod from the candidate set as target Pod │
│ 4. Always set CandidatePods = candidate Pod set │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Return result to ChainedRouter │
│ - Check if child context.CandidatePods > 0 │
│ - Yes → Return these candidate Pods to next algorithm (if any) │
│ - No → Return selected single Pod │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Process final candidate Pod set │
│ - If only 1 candidate Pod → Use directly │
│ - If multiple candidate Pods → Randomly select one │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Set TargetPod and return result │
│ - ctx.SetTargetPod(final selected Pod) │
│ - Return target Pod address (e.g., "1.1.1.1:8000") │
└───────────────┬───────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Clean up RoutingContext │
│ - Call Reset() method │
│ - CandidatePods is reset to nil │
└───────────────────────────────────────────────┘
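A compact way to picture the chained flow is a loop that feeds each step's candidate set into the next step. The sketch below assumes a hypothetical `Step` function type and `lowestBy` helper instead of the real router interface, and it keeps the previous candidates if a step returns nothing, which is a design choice of this example rather than something stated in the PR.

```go
package main

import "math/rand"

// Pod is a hypothetical stand-in carrying the two metrics used below.
type Pod struct {
	Addr        string
	CacheUsage  float64
	Utilization float64
}

// Step narrows a candidate list to the pods that are best under one metric.
type Step func(pods []Pod) []Pod

// lowestBy builds a Step that keeps every pod tied for the lowest value.
func lowestBy(metric func(Pod) float64) Step {
	return func(pods []Pod) []Pod {
		var best []Pod
		for _, p := range pods {
			switch {
			case len(best) == 0 || metric(p) < metric(best[0]):
				best = []Pod{p}
			case metric(p) == metric(best[0]):
				best = append(best, p)
			}
		}
		return best
	}
}

// routeChained feeds each step's candidates into the next step and picks
// randomly among whatever remains after the last step.
func routeChained(pods []Pod, steps ...Step) (string, bool) {
	candidates := pods
	for _, step := range steps {
		// Keep the previous set if a step yields nothing (sketch-only choice).
		if narrowed := step(candidates); len(narrowed) > 0 {
			candidates = narrowed
		}
	}
	if len(candidates) == 0 {
		return "", false
	}
	return candidates[rand.Intn(len(candidates))].Addr, true
}
```

Note that with strict narrowing, a pod filtered out by the first step can never be chosen later, even if it is the best under the second metric.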
Signed-off-by: yangyouchuan <1184540833@qq.com>
2b82272 to d931dc0
} else if len(validAlgorithms) == 1 {
	routingAlgorithm = validAlgorithms[0]
}
routingCtx.Algorithm = routingAlgorithm
You can remove routingCtx.Algorithm.
If the user provides a single routing strategy, routingCtx.Algorithms will simply hold one algorithm, and the code will be simpler.
|
// parseChainedAlgorithms parses a comma-separated list of algorithms and returns a slice of RoutingAlgorithm.
// If any algorithm is not registered, it returns an empty slice.
func parseChainedAlgorithms(strategy string) []types.RoutingAlgorithm {
Add a TODO to let the user provide weights.
For example: least-request:40, prefix-cache:60
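One way such a weighted strategy string might be parsed is sketched below. `WeightedAlgorithm` and `parseWeighted` are hypothetical names that do not exist in this PR, and the default weight of 1 for entries without an explicit weight is an assumption of the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// WeightedAlgorithm is a hypothetical type; the PR does not define it.
type WeightedAlgorithm struct {
	Name   string
	Weight int
}

// parseWeighted parses "name:weight" entries separated by commas.
// An entry without ":weight" gets weight 1 (an assumption of this sketch),
// so a list with no weights at all ends up equally weighted.
func parseWeighted(strategy string) ([]WeightedAlgorithm, error) {
	var out []WeightedAlgorithm
	for _, part := range strings.Split(strategy, ",") {
		name, rawWeight, hasWeight := strings.Cut(strings.TrimSpace(part), ":")
		name = strings.TrimSpace(name)
		if name == "" {
			return nil, fmt.Errorf("empty algorithm name in %q", strategy)
		}
		weight := 1
		if hasWeight {
			w, err := strconv.Atoi(strings.TrimSpace(rawWeight))
			if err != nil || w < 0 {
				return nil, fmt.Errorf("invalid weight for %q: %q", name, rawWeight)
			}
			weight = w
		}
		out = append(out, WeightedAlgorithm{Name: name, Weight: weight})
	}
	return out, nil
}
```

Returning an error instead of an empty slice lets the caller report which entry was malformed; whether that fits the gateway's existing error handling is left open here.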
	return ctx.TargetAddress(), nil
}

// Apply the current algorithm to narrow down candidates
We do not want to take this approach. Instead, each iteration should keep all the pods and return them as a ranked list.
The second strategy then takes that list as input and re-ranks it.
For example, with least-request, prefix-cache:
A: requests: 10, prefix-cache: 80%
B: requests: 10, prefix-cache: 90%
C: requests: 8, prefix-cache: 50%
The output of least-request will be
C, A, B
then prefix-cache will take the input
C, B, A
Finally, apply the weights (equal by default) and select between pod C and pod B.
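The comment does not spell out how the ranked lists and weights are combined, so the following is only one possible reading: a weighted Borda-style count in which each algorithm ranks every pod and a pod's position earns points scaled by the algorithm's weight. `Ranking` and `fuse` are hypothetical names, and the alphabetical tie-break is added purely to keep the output deterministic.

```go
package main

import "sort"

// Ranking is one algorithm's output: every pod, best first, plus a weight.
type Ranking struct {
	Order  []string
	Weight float64
}

// fuse combines rankings with a weighted Borda-style count: in a list of
// n pods, position i (0-based) earns weight*(n-1-i) points.
func fuse(rankings []Ranking) []string {
	scores := map[string]float64{}
	for _, r := range rankings {
		n := len(r.Order)
		for i, pod := range r.Order {
			scores[pod] += r.Weight * float64(n-1-i)
		}
	}
	pods := make([]string, 0, len(scores))
	for pod := range scores {
		pods = append(pods, pod)
	}
	// Highest score first; ties fall back to name order for determinism.
	sort.Slice(pods, func(i, j int) bool {
		if scores[pods[i]] != scores[pods[j]] {
			return scores[pods[i]] > scores[pods[j]]
		}
		return pods[i] < pods[j]
	})
	return pods
}
```

For instance, fusing a least-request ranking of C, A, B at weight 40 with a prefix-cache ranking of B, A, C at weight 60 (an ordering derived here from the hit percentages above) gives scores B=120, A=100, C=80. With equal weights this particular scheme ties all three pods, so it does not by itself reproduce the "C or B" outcome described; the exact scoring rule would need to be agreed on.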
cc @DaveLi8086, who is also working on the same problem
@varungup90 @DaveLi8086 Can we carry user-defined weights when parsing routing policies? This would allow each routing algorithm to re-rank candidate Pods rather than strictly filtering them out during the iterative process.
type xxx struct { // least-request:40, prefix-cache:60
	Algorithm RoutingAlgorithm
	Weight    string
}
A possible new approach: the chained algorithm does not narrow the ready pod list. In the applyAlgorithm func, each router algorithm sets CandidatePods only to update those pods' scores, rather than to narrow candidatePods.
|
This feature is addressed by this PR: #2124
Pull Request Title
[Feature] Implement chained routing for flexible algorithm composition
Pull Request Description
Feature Overview
This PR implements chained routing functionality for AIBrix, allowing users to specify multiple routing algorithms in sequence via the routing-strategy header. This enables more flexible and precise routing decisions by combining different optimization objectives.
Problem Solved
Current AIBrix only supports a single routing algorithm per request, which limits users' ability to balance multiple optimization goals such as:
Key Implementation Details
1. Core Components
2. Technical Implementation
CandidatePods field to RoutingContext for algorithm collaboration
least-gpu-cache and least-utilization to support multiple candidates
3. Usage Example
This implementation provides a solid foundation for future routing extensions, as other routing algorithms only need to assign to ctx.CandidatePods when multiple optimal candidates are identified.
Flowchart
Test Coverage
chained_test.go
Related Issues
Resolves: #2098