[API] implement chained routing support for flexible algorithm compos…#2099

Open
paranoidRick wants to merge 7 commits into vllm-project:main from paranoidRick:chain_router_hl
Conversation

@paranoidRick
Contributor

@paranoidRick paranoidRick commented Apr 11, 2026

Pull Request Title

[Feature] Implement chained routing for flexible algorithm composition

Pull Request Description

Feature Overview

This PR implements chained routing functionality for AIBrix, allowing users to specify multiple routing algorithms in sequence via the routing-strategy header. This enables more flexible and precise routing decisions by combining different optimization objectives.

Problem Solved

AIBrix currently supports only a single routing algorithm per request, which limits users' ability to balance multiple optimization goals such as:

  • Load balancing vs. GPU cache
  • Low latency vs. resource utilization
  • Session affinity vs. fallback strategies

Key Implementation Details

1. Core Components

  • ChainedRouter: New router implementation that applies multiple algorithms sequentially
  • CandidatePods Mechanism: Algorithms can share and narrow down candidate pod lists
  • Strategy Parsing: Support for comma-separated algorithm lists in request headers
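
As a rough illustration of the strategy-parsing component, the sketch below splits a routing-strategy header value into individual algorithm names. The helper name splitStrategies, and the choice to trim whitespace and drop empty entries, are assumptions made for illustration; they are not taken from the PR's parseChainedAlgorithms.

```go
package main

import (
	"fmt"
	"strings"
)

// splitStrategies is a hypothetical helper: it splits a comma-separated
// routing-strategy header value, trims whitespace around each name, and
// drops empty entries.
func splitStrategies(header string) []string {
	var out []string
	for _, part := range strings.Split(header, ",") {
		name := strings.TrimSpace(part)
		if name != "" {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	fmt.Println(splitStrategies("least-gpu-cache, least-utilization"))
	// A header containing only a comma yields no strategies at all.
	fmt.Println(len(splitStrategies(",")))
}
```

Dropping empty entries means a header such as "," produces an empty list, which the caller then has to treat as an explicit error case.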

2. Technical Implementation

  1. Data Structure Extension: Added CandidatePods field to RoutingContext for algorithm collaboration
  2. Algorithm Adaptation: Enhanced least-gpu-cache and least-utilization to support multiple candidates
  3. Short-circuit Optimization: Immediate return when only one candidate pod remains
  4. Fault Tolerance: Skip invalid or failing algorithms gracefully
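
The four points above can be sketched as a small narrowing loop. This is a simplified model under stated assumptions, not the PR's ChainedRouter: the Pod and Filter types, lowestBy, and narrow are hypothetical names, and each algorithm is modeled as a function that returns every pod tied for the best value.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical, simplified view of a routable pod.
type Pod struct {
	Name     string
	GPUCache float64
	Util     float64
}

// Filter narrows a candidate list (hypothetical stand-in for one algorithm).
type Filter func([]Pod) ([]Pod, error)

// lowestBy keeps every pod tied for the minimum of the given metric.
func lowestBy(metric func(Pod) float64) Filter {
	return func(pods []Pod) ([]Pod, error) {
		if len(pods) == 0 {
			return nil, errors.New("no pods")
		}
		lowest := metric(pods[0])
		for _, p := range pods[1:] {
			if v := metric(p); v < lowest {
				lowest = v
			}
		}
		var out []Pod
		for _, p := range pods {
			if metric(p) == lowest {
				out = append(out, p)
			}
		}
		return out, nil
	}
}

// narrow applies the filters in order, skips any filter that fails or
// returns nothing, and stops early once a single candidate remains.
func narrow(pods []Pod, filters ...Filter) []Pod {
	candidates := pods
	for _, f := range filters {
		if len(candidates) <= 1 {
			break // short-circuit: nothing left to narrow
		}
		next, err := f(candidates)
		if err != nil || len(next) == 0 {
			continue // fault tolerance: keep the previous candidates
		}
		candidates = next
	}
	return candidates
}

func main() {
	pods := []Pod{{"a", 0.2, 0.9}, {"b", 0.2, 0.4}, {"c", 0.7, 0.1}}
	failing := func([]Pod) ([]Pod, error) { return nil, errors.New("boom") }
	result := narrow(pods,
		lowestBy(func(p Pod) float64 { return p.GPUCache }),
		failing,
		lowestBy(func(p Pod) float64 { return p.Util }),
	)
	fmt.Println(result[0].Name)
}
```

In this model, pods a and b tie on GPU cache, the failing filter is skipped, and the utilization filter then breaks the tie in favor of b.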

3. Usage Example

# Chain least-gpu-cache and least-utilization
curl -H "routing-strategy: least-gpu-cache,least-utilization" \
     -d '{"model": "llama-70b", "prompt": "Hello"}' \
     http://gateway-url/v1/completions

This implementation provides a solid foundation for future routing extensions: other routing algorithms only need to assign ctx.CandidatePods when they identify multiple optimal candidates.

Flowchart

+----------------+     +-------------------+     +----------------------+
|                |     |                   |     |                      |
|  Incoming      |     |  ChainedRouter    |     |  Individual          |
|  Request       +---->+                   +---->+  Routers             |
|  +------------+|     |  1. Start with    |     |  - least-gpu-cache   |
|  | routing-   ||     |     all pods      |     |  - least-utilization |
|  | strategy:  ||     |  2. Apply         |     |  - random            |
|  | algo1,algo2||     |     algorithm1    |     |  - ...               |
+--+------------++     |  3. Apply         |     |                      |
   |            |      |     algorithm2    |     |                      |
   |            |      |  4. Return        |     |                      |
   |            +------+     final pod     |     +----------------------+
   |                       |
   |                       v
   |                  +----------------+
   |                  |                |
   |                  |  Request       |
   |                  |  Forwarded     |
   |                  |  to Selected   |
   +------------------>|  Pod           |
                     |                |
                     +----------------+

Test Coverage

  • Added comprehensive unit tests in chained_test.go
  • Tested all edge cases: empty algorithm list, single algorithm, multiple algorithms, invalid algorithms
  • Verified integration with existing routing algorithms
  • All existing tests continue to pass

Related Issues

Resolves: #2098

…ition

Signed-off-by: yangyouchuan <1184540833@qq.com>
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a chained routing mechanism that allows multiple routing algorithms to be applied in sequence to filter candidate pods. It includes the implementation of the chained router, updates to existing algorithms (least GPU cache and least utilization) to support candidate propagation, and changes to the gateway to parse comma-separated routing strategies. Several issues were identified in the feedback: a critical bug where the local routing variable is not updated, potentially breaking the routing feature; a logic error in handling mixed valid and invalid algorithms; and a memory leak in the chained router due to missing context cleanup and field propagation.

Comment thread pkg/plugins/gateway/gateway_req_body.go
Comment thread pkg/plugins/gateway/gateway_req_body.go Outdated
Comment thread pkg/plugins/gateway/algorithms/chained.go Outdated
Signed-off-by: yangyouchuan <1184540833@qq.com>
Signed-off-by: yangyouchuan <1184540833@qq.com>
Signed-off-by: yangyouchuan <1184540833@qq.com>
@paranoidRick
Contributor Author

@Jeffwan @varungup90 Could you help review this code? Thanks a lot. 😊

@Jeffwan
Collaborator

Jeffwan commented Apr 12, 2026

this is great! I will take a look today

@paranoidRick
Contributor Author

paranoidRick commented Apr 13, 2026

If the following is loaded from a config file, a routingStrategy that contains multiple items (each verified to be a valid value) is automatically mapped to the chained routing algorithm. This keeps configuration files compatible and reusable.

{
  "profiles": {
    "default": {
      "routingStrategy": "least-request,least-kv-cache,random",
      "promptLenBucketMinLength": 0,
      "promptLenBucketMaxLength": 4096
    },
    "pd": {
      "routingStrategy": "pd",
      "promptLenBucketMinLength": 0,
      "promptLenBucketMaxLength": 2048
    },
    "low-latency": {
      "routingStrategy": "least-latency",
      "promptLenBucketMinLength": 0,
      "promptLenBucketMaxLength": 2048
    }
  }
}

@varungup90
Collaborator


🚨 Critical Bug: CandidatePods Pre-initialization Breaks Narrowing

File: pkg/plugins/gateway/algorithms/chained.go, applyAlgorithm()

routingCtx.CandidatePods = candidatePods  // ← pre-set here
...
_, err = router.Route(routingCtx, podList)
...
if len(routingCtx.CandidatePods) > 0 {    // ← always true!
    return routingCtx.CandidatePods, nil
}
// This block is unreachable:
selectedPod := routingCtx.TargetPod()

routingCtx.CandidatePods is set to a non-empty slice before calling Route.
After routing, len(routingCtx.CandidatePods) > 0 is always true, even for routers that never modify it (e.g., random, least-kv-cache, least-request, least-latency).

👉 Result:

  • TargetPod() fallback is dead code
  • Chaining (e.g., least-request,least-kv-cache) does not narrow results
  • Earlier algorithms are effectively ignored

✅ Suggested Fix

routingCtx.CandidatePods = nil  // don't pre-set
...
_, err = router.Route(routingCtx, podList)
...
if len(routingCtx.CandidatePods) > 0 {
    return routingCtx.CandidatePods, nil
}

// Now reachable
selectedPod := routingCtx.TargetPod()
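
To make the difference concrete, here is a minimal, self-contained model of the two variants. Ctx, pickFirst, and apply are hypothetical stand-ins for RoutingContext, a router that never writes CandidatePods, and applyAlgorithm; the real types in the PR carry more state.

```go
package main

import "fmt"

// Ctx is a simplified stand-in for RoutingContext (hypothetical fields).
type Ctx struct {
	CandidatePods []string
	Target        string
}

// pickFirst models a router that only sets a target pod and never
// touches CandidatePods.
func pickFirst(ctx *Ctx, pods []string) {
	ctx.Target = pods[0]
}

// apply runs one router and returns the pods that survive it.
// With preset=true it reproduces the pre-initialization bug.
func apply(route func(*Ctx, []string), pods []string, preset bool) []string {
	ctx := &Ctx{}
	if preset {
		ctx.CandidatePods = pods // bug: non-empty before the router runs
	}
	route(ctx, pods)
	if len(ctx.CandidatePods) > 0 {
		return ctx.CandidatePods
	}
	return []string{ctx.Target} // fallback to the router's selected pod
}

func main() {
	pods := []string{"pod-a", "pod-b", "pod-c"}
	fmt.Println(len(apply(pickFirst, pods, true)))  // all three pods survive
	fmt.Println(len(apply(pickFirst, pods, false))) // narrowed to one pod
}
```

With the pre-set, the router's choice is discarded and every input pod is passed on; starting from nil lets the fallback to the selected pod take effect.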

⚠️ Design Issue: RouterChained is User-Selectable

File: pkg/plugins/gateway/algorithms/chained.go

const RouterChained types.RoutingAlgorithm = "chained"

func init() {
    RegisterProvider(RouterChained, ChainedRouterProviderFunc)
}

  • "chained" passes validation and can be sent via headers
  • routing-strategy: chained creates a chained router with no sub-algorithms
  • Silently falls back to random

👉 This should be internal-only, not user-facing.

✅ Recommendation

  • Reject "chained" in Validate() (gateway_req_body.go)
  • Treat it strictly as an internal orchestration mode

❗ Inconsistency: CandidatePods Support is Incomplete

Only these files were updated:

  • least_gpu_cache.go
  • least_util.go

But the PR claims:

"Enhanced least-kv-cache and least-request to support multiple candidates"

However:

  • least_kv_cache.go ❌ not updated
  • least_request.go ❌ not updated

👉 Result:

  • Core use case (least-request,least-kv-cache) does not work

✅ Fix Options

  • Update both routers to set CandidatePods, or
  • Correct the PR description

⚠️ Side Effect: CandidatePods Set in Standalone Routing

Files: least_gpu_cache.go, least_util.go

if len(candidatePods) > 0 {
    ctx.CandidatePods = candidatePods
    targetPod = candidatePods[rand.Intn(len(candidatePods))]
}

This now runs even in non-chained mode.

👉 Problem:

  • RoutingContext retains stale CandidatePods
  • Relies on reset() cleanup (pool-level concern)

✅ Recommendation

  • Only set CandidatePods in chained mode, or
  • Ensure applyAlgorithm() clears it (preferred with nil approach)

❌ Logic Gap in Error Handling

File: pkg/plugins/gateway/gateway_req_body.go

if len(validAlgorithms) < 1 {
    klog.Warningf("...")
    if len(invalidAlgorithms) > 0 {
        return buildErrorResponse(...)
    }
}
// Falls through silently

Problem Case

Header:

routing-strategy: ,

  • parseChainedAlgorithms → empty slice
  • validAlgorithms = 0
  • invalidAlgorithms = 0

👉 Result:

  • No error returned ❌
  • Falls through silently
  • routingAlgorithm remains unset / default

✅ Fix

if len(validAlgorithms) < 1 {
    if len(invalidAlgorithms) > 0 {
        return buildErrorResponse(...)
    }
    return buildErrorResponse(...) // explicit failure
}
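
The three outcomes (usable strategies, only invalid names, nothing at all) can be sketched as follows. The known registry and the validate function are hypothetical and exist only for illustration; the actual gateway code returns an Envoy error response rather than a Go error.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// known is a hypothetical registry of accepted strategy names.
var known = map[string]bool{
	"random":         true,
	"least-request":  true,
	"least-kv-cache": true,
}

// validate returns the accepted strategies, or an error when none is usable.
func validate(header string) ([]string, error) {
	var valid, invalid []string
	for _, part := range strings.Split(header, ",") {
		name := strings.TrimSpace(part)
		switch {
		case name == "":
			continue // skip empty entries such as those produced by ","
		case known[name]:
			valid = append(valid, name)
		default:
			invalid = append(invalid, name)
		}
	}
	if len(valid) == 0 {
		if len(invalid) > 0 {
			return nil, fmt.Errorf("invalid routing strategies: %v", invalid)
		}
		// Explicit failure instead of silently falling through.
		return nil, errors.New("no valid routing strategies provided")
	}
	return valid, nil
}

func main() {
	_, err := validate(",")
	fmt.Println(err)
	strategies, _ := validate("least-request, random")
	fmt.Println(strategies)
}
```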

🧪 Test Issues

File: pkg/plugins/gateway/algorithms/chained_test.go

1. Shared Context Across Tests

  • RoutingContext reused across table tests
  • State leaks via SetTargetPod(...)

✅ Fix: create a new context per test case


2. Invalid Algorithm Test is Ineffective

TestChainedRouterWithInvalidAlgorithm

  • Uses only 1 pod
  • Short-circuits before applyAlgorithm()

👉 Invalid algorithm is never exercised

✅ Fix: use 2+ pods


3. Missing Core Behavior Test

No test verifies:

  • Chaining actually narrows candidate sets

✅ Add test for:

least-request,least-kv-cache

→ ensures output is smaller than individual algorithms


🔍 Minor Observations

  • NewChainedRouter takes algorithms directly, but provider reads from ctx.Algorithms → inconsistent construction paths
  • getPodNames() allocates on every verbose log call → consider lazy evaluation
  • Nested "chained" in chain (e.g., random,chained) silently degrades to random

✅ Summary

This PR introduces the structure for chained routing but currently has:

  • ❌ A critical bug preventing narrowing
  • ❌ Incomplete algorithm support
  • ❌ Silent fallback behaviors
  • ❌ Gaps in validation and tests

👉 Fixing CandidatePods handling is the highest priority, as it blocks the core functionality entirely.

@paranoidRick
Contributor Author

paranoidRick commented Apr 13, 2026

Thank you for your patient code review and the very helpful feedback. I have updated the code accordingly:

  1. Fixed the narrowing failure caused by pre-initializing the candidate Pods:
    routingCtx.CandidatePods is now explicitly set to nil instead of being pre-populated, which avoids the unintended side effect.

  2. Added validation for the chained router to prevent external invocation:
    Implemented a check that rejects RouterChained, since it is strictly an internal-only algorithm.

if alg == routing.RouterChained { // Reject "chained" as it's an internal algorithm
    invalidAlgorithms = append(invalidAlgorithms, alg)
    continue
}

  3. Updated the PR description:
    Clarified that the current adaptation is limited to the least-gpu-cache and least-utilization algorithms.

  4. Refined the candidate list logic:
    The candidate Pod list is only populated when the chained routing algorithm is active.

if ctx.Algorithm == RouterChained {
    ctx.CandidatePods = candidatePods
}

  5. Resolved the potential uninitialized algorithm issue:

if len(validAlgorithms) < 1 {
    klog.Warningf(...)
    if len(invalidAlgorithms) > 0 {
        return buildErrorResponse(...)
    }
    // Return error for empty valid algorithms
    return buildErrorResponse(envoyTypePb.StatusCode_BadRequest, "no valid routing strategies provided", "", "", HeaderErrorRouting, "true"), model, routingCtx, stream, term
}

  6. Test cases:
    • Create a new context per test case
    • Use 2+ pods
Signed-off-by: yangyouchuan <1184540833@qq.com>
@paranoidRick
Contributor Author

@varungup90 Hi, could you please check it again?

if len(candidatePods) > 0 {
    targetPod = candidatePods[rand.Intn(len(candidatePods))]
    // set candidatePods only if algorithm is chained
    if ctx.Algorithm == RouterChained {
Contributor

Could you help me verify that I understand this correctly?
For the following to work, ctx.Algorithm needs to be RouterChained,
but the child context's Algorithm should be the child algorithm (not RouterChained).
The check is therefore always false during a chained invocation,
so CandidatePods never gets written and chaining effectively does not narrow candidates.

Contributor Author

@paranoidRick paranoidRick Apr 15, 2026

@DaveLi8086 Yes, your understanding is correct! With a child context, CandidatePods never gets written.
I have changed it: I removed the if ctx.Algorithm == RouterChained check, so CandidatePods is now always set.

Now, the single routing algorithm execution flow is:

  1. Request enters gateway
  2. Create RoutingContext
    • Algorithm: the specified single routing algorithm (e.g., least-gpu-cache)
    • CandidatePods: nil
  3. Select the corresponding routing algorithm implementation (e.g., LeastGpuCacheRouter.Route())
  4. Execute routing algorithm logic
    a. Iterate through all available Pods, calculate GPU cache usage
    b. Find the candidate Pod set with the lowest GPU cache usage
    c. Randomly select a Pod from the candidate set as the target Pod
    d. Always set CandidatePods = candidate Pod set
  5. Set TargetPod and return result
    • ctx.SetTargetPod(selected Pod)
    • Return target Pod address (e.g., "1.1.1.1:8000")
  6. Clean up RoutingContext
    • Call Reset() method
    • CandidatePods is reset to nil

And the chained router flow looks like this:

  1. Request enters gateway
  2. Create RoutingContext
    • Algorithm: "chained"
    • Algorithms: ["least-gpu-cache", "least-utilization"]
    • CandidatePods: nil
  3. Select ChainedRouter implementation (ChainedRouter.Route())
  4. Initialize candidate Pod set
    • candidatePods = all available Pods
  5. Execute chained algorithms in loop, round 1: least-gpu-cache algorithm
  6. Create child RoutingContext
    • Algorithm: "least-gpu-cache"
    • CandidatePods: nil
  7. Execute least-gpu-cache algorithm
    a. Iterate through current candidate Pods, calculate GPU cache usage
    b. Find the candidate Pod set with the lowest GPU cache usage
    c. Randomly select a Pod from the candidate set as the target Pod
    d. Always set CandidatePods = candidate Pod set
  8. Return result to ChainedRouter
    • Check whether the child context's CandidatePods is non-empty
    • Yes → Return these candidate Pods to the next algorithm
    • No → Return the selected single Pod
  9. Update candidate Pod set
    • candidatePods = candidate Pods returned by the previous algorithm
  10. Execute chained algorithms in loop, round 2: least-utilization algorithm
  11. Create child RoutingContext
    • Algorithm: "least-utilization"
    • CandidatePods: nil
  12. Execute least-utilization algorithm
    a. Iterate through current candidate Pods, calculate resource utilization
    b. Find the candidate Pod set with the lowest resource utilization
    c. Randomly select a Pod from the candidate set as the target Pod
    d. Always set CandidatePods = candidate Pod set
  13. Return result to ChainedRouter
    • Check whether the child context's CandidatePods is non-empty
    • Yes → Return these candidate Pods to the next algorithm (if any)
    • No → Return the selected single Pod
  14. Process final candidate Pod set
    • If only 1 candidate Pod → Use it directly
    • If multiple candidate Pods → Randomly select one
  15. Set TargetPod and return result
    • ctx.SetTargetPod(final selected Pod)
    • Return target Pod address (e.g., "1.1.1.1:8000")
  16. Clean up RoutingContext
    • Call Reset() method
    • CandidatePods is reset to nil

Signed-off-by: yangyouchuan <1184540833@qq.com>
} else if len(validAlgorithms) == 1 {
    routingAlgorithm = validAlgorithms[0]
}
routingCtx.Algorithm = routingAlgorithm
Collaborator

You can remove routingCtx.Algorithm.

If the user provides a single routing-strategy, then routingCtx.Algorithms will hold one algorithm. The code will be simpler.


// parseChainedAlgorithms parses a comma-separated list of algorithms and returns a slice of RoutingAlgorithm.
// If any algorithm is not registered, it returns an empty slice.
func parseChainedAlgorithms(strategy string) []types.RoutingAlgorithm {
Collaborator

Add a TODO for letting the user provide weights.

For example, least-request:40, prefix-cache:60

return ctx.TargetAddress(), nil
}

// Apply the current algorithm to narrow down candidates
Collaborator

We do not want to take this approach. Instead, each iteration should keep all the pods and return them as a ranked list.

The second strategy then takes that list as input and re-ranks it.

One example: least-request, prefix-cache

A: requests: 10, prefix-cache: 80%
B: requests: 10, prefix-cache: 90%
C: requests: 8, prefix-cache: 50%

Output of least-request will be
C, A, B

then prefix-cache will take the input
C, B, A

Finally, apply the weights (equal weights by default) and select between pod C and pod B.
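
One possible way to turn this into a score is sketched below. The comment above describes ranked lists and weights but does not specify a formula, so the min-max normalization used here is purely an assumption, as are the Pod type and the normalize and best functions. Under that assumption and equal weights, the example data yields a tie between B and C.

```go
package main

import "fmt"

// Pod holds the two example metrics (hypothetical type).
type Pod struct {
	Name        string
	Requests    float64 // lower is better
	PrefixCache float64 // higher is better, in percent
}

// normalize maps values to [0, 1]. When lowerBetter is true the smallest
// value scores 1; otherwise the largest value scores 1.
func normalize(vals []float64, lowerBetter bool) []float64 {
	lo, hi := vals[0], vals[0]
	for _, v := range vals {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	out := make([]float64, len(vals))
	for i, v := range vals {
		switch {
		case hi == lo:
			out[i] = 1 // all pods are equal on this metric
		case lowerBetter:
			out[i] = (hi - v) / (hi - lo)
		default:
			out[i] = (v - lo) / (hi - lo)
		}
	}
	return out
}

// best returns the names of the pods with the highest weighted score.
func best(pods []Pod, wReq, wCache float64) []string {
	reqs := make([]float64, len(pods))
	cache := make([]float64, len(pods))
	for i, p := range pods {
		reqs[i], cache[i] = p.Requests, p.PrefixCache
	}
	r, c := normalize(reqs, true), normalize(cache, false)
	var top []string
	topScore := -1.0
	for i, p := range pods {
		s := wReq*r[i] + wCache*c[i]
		switch {
		case s > topScore:
			top, topScore = []string{p.Name}, s
		case s == topScore:
			top = append(top, p.Name)
		}
	}
	return top
}

func main() {
	pods := []Pod{{"A", 10, 80}, {"B", 10, 90}, {"C", 8, 50}}
	fmt.Println(best(pods, 0.5, 0.5))
}
```

A final random or round-robin pick among the tied pods would then complete the selection; a rank-position scheme would be an equally valid reading of the comment and may give different results.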

Collaborator

cc @DaveLi8086, who is also working on the same problem

Contributor Author

@varungup90 @DaveLi8086 Can we carry user-defined weights when parsing routing policies? This would allow the routing algorithm to re-rank candidate Pods rather than strictly filtering them out during the iterative process.

type xxx struct { // least-request:40, prefix-cache:60
    Algorithm RoutingAlgorithm
    Weight    string
}

Maybe a new approach: the chained algorithm does not narrow the ready pod list. In the applyAlgorithm func, each router algorithm sets CandidatePods only to update those pods' scores rather than to narrow candidatePods.
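
A minimal sketch of parsing such a weighted header is shown below. WeightedAlgorithm and parseWeighted are hypothetical names; storing the weight as an int (the struct above uses a string) and defaulting a missing weight to 1 are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// WeightedAlgorithm is a hypothetical pairing of a strategy name and weight.
type WeightedAlgorithm struct {
	Name   string
	Weight int
}

// parseWeighted parses "name:weight" pairs separated by commas.
// A missing weight defaults to 1 (an assumption, not from the PR).
func parseWeighted(header string) ([]WeightedAlgorithm, error) {
	var out []WeightedAlgorithm
	for _, part := range strings.Split(header, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		name, raw, hasWeight := strings.Cut(part, ":")
		weight := 1
		if hasWeight {
			w, err := strconv.Atoi(strings.TrimSpace(raw))
			if err != nil || w < 0 {
				return nil, fmt.Errorf("invalid weight in %q", part)
			}
			weight = w
		}
		out = append(out, WeightedAlgorithm{
			Name:   strings.TrimSpace(name),
			Weight: weight,
		})
	}
	return out, nil
}

func main() {
	algos, err := parseWeighted("least-request:40, prefix-cache:60")
	fmt.Println(algos, err)
}
```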

@varungup90
Collaborator

This feature is addressed by this PR: #2124

Development

Successfully merging this pull request may close these issues.

feat:implement chained routing support for flexible algorithm composition

4 participants