[Misc] Add brixbench benchmark module by whalepark · Pull Request #2165 · vllm-project/aibrix

whalepark · 2026-05-01T00:19:39Z

Pull Request Description

This PR imports the initial brixbench benchmark module into the aibrix repository.

Included in this import:

benchmark runner, resolver, deployer, driver, and observability packages under brixbench/
benchmark scenarios, deployment manifests, and benchmark configuration files
walkthrough documentation for the benchmark module
module/import path updates for the nested brixbench module inside aibrix

Additional cleanup included in this PR:

removed local-only and AI workflow metadata from the imported tree
updated the module path to github.com/vllm-project/aibrix/brixbench
removed bundled versioned release YAMLs from the repository
added on-demand release artifact download and local cache logic under .tmp/releases/<version>/
added resolver tests for release artifact download and cache reuse

Notes:

Version-based release artifact resolution now downloads from GitHub releases at runtime instead of relying on checked-in release YAML files.

May 21 2026

Addressed the latest review feedback and pushed follow-up commits.

Key updates:

Removed the hardcoded gateway fallback and now require either a detected endpoint or an explicit BENCHMARK_GATEWAY_ENDPOINT override.
Cleaned up the resolver schema to require provider explicitly, reject deprecated deployer, and preserve provider: null as the canonical plain vLLM baseline path.
Added focused regression tests for gateway endpoint resolution, explicit provider: null handling, deprecated deployer rejection, and missing-provider rejection.

Verification:

go test -v ./internal/resolver -count=1
go test -v ./benchmark -run 'Test(FallbackGatewayEndpoint(Missing|Override)|ResolveGatewayEndpoint(UsesDetectedEndpoint|UsesOverrideOnDetectionFailure|FailsWithoutDetectedEndpointOrOverride)|ConfiguredMetricExporter|ExportMetricsIfConfigured|FormatScenarioRunIDUsesUTC|NowInUTCUsesUTCLocation)$' -count=1

May 27 2026

Pushed follow-up updates for the latest review comments.

Changes:

Moved the benchmark walkthrough into the brixbench README entry point.
Added documentation for benchmark workload configs, especially vllmArgs such as request-rate, concurrency, num-prompts, dataset-name, goodput, and routing-specific options.
Removed the large generated custom AIBrix manifests from testdata and kept the benchmark flow on release-artifact/workspace Helm-based deployment paths.

Verification:

Confirmed the removed custom manifests are not referenced by active brixbench scenarios.
Re-ran YAML/scenario reference validation.
Re-ran the relevant Go tests.

gemini-code-assist

Code Review

This pull request introduces a benchmark suite for AIBrix, including a test runner, deployment logic for various platforms, and support for running vLLM benchmarks. The feedback highlights concerns regarding the use of fixed sleeps, the security and portability issues of shelling out to bash, the need for more robust file ignoring in .gitignore, and a more reliable method for extracting benchmark results from pods.

varungup90 · 2026-05-04T21:14:24Z

@whalepark can you sign the commit and address gemini comments?

⚠️ Review Feedback: Issues and Suggestions

Thanks for the comprehensive import of the brixbench module. While the scope is impressive, there are several areas—ranging from security leaks to reliability—that need to be addressed before this is merged.

🔴 Critical / Must Fix

Internal URL Leak:
In benchmark/runner_test.go, defaultGatewayEndpoint is set to http://aibrix-gateway.bytedance.net. This internal hostname is non-functional for external contributors and should be replaced with localhost or a configurable default.
No-op Observability Exporter:
In internal/observability/exporter.go, PrometheusPushExporter currently "simulates" exports by iterating over keys without doing anything. This silently drops metrics. Please implement the export logic or return ErrNotImplemented to avoid misleading users.

🟡 Significant

Shell Invocation Overload:
Throughout benchmark/preflight_stormservice.go and vllm.go, bash -lc is used to wrap kubectl and helm commands. The -l (login) flag introduces unnecessary environment variability.
- Recommendation: Use direct binary invocations via exec.CommandContext or, at minimum, drop the -l flag.
Legacy Field Ambiguity:
In internal/resolver/resolver.go, the presence of both Provider and Deployer (legacy alias) is confusing for an initial import. Please document the deprecation timeline or clean this up now to avoid technical debt.
Initialization Panics:
mustLoadPacificLocation() in benchmark/runner_support_test.go will panic if tzdata is missing (common in minimal CI containers).
- Fix: Use a lazy initializer or provide a fallback like time.FixedZone("PT", -8*3600).
Hardcoded Timezone in Logs:
formatScenarioRunID embeds "PT" in filenames. This is problematic for global teams. Please switch to UTC for unambiguous, reproducible log naming.

🔵 Minor / Improvements

URL Construction:
In internal/resolver/artifact_resolution.go, replace manual string joining with url.JoinPath (available since Go 1.19) for better slash normalization.
Modern Atomic Usage:
Since the project uses Go 1.22.5, prefer atomic.Int32 over atomic.AddInt32 in artifact_resolution_test.go for cleaner syntax.
Dependency Transparency:
brixbench/go.mod is quite sparse because it relies heavily on shelling out to kubectl. Consider explicitly documenting the requirement for a pre-configured K8s cluster in docs/walkthrough.md.
Module Wiring:
Ensure brixbench/ is added to the root go.work or CI pipeline so it doesn't bitrot immediately after merging.

whalepark · 2026-05-21T23:33:08Z

@whalepark can you sign the commit and address gemini comments?

⚠️ Review Feedback: Issues and Suggestions

Thanks for the comprehensive import of the brixbench module. While the scope is impressive, there are several areas—ranging from security leaks to reliability—that need to be addressed before this is merged.

🔴 Critical / Must Fix

Internal URL Leak:
In benchmark/runner_test.go, defaultGatewayEndpoint is set to http://aibrix-gateway.bytedance.net. This internal hostname is non-functional for external contributors and should be replaced with localhost or a configurable default.

No-op Observability Exporter:
In internal/observability/exporter.go, PrometheusPushExporter currently "simulates" exports by iterating over keys without doing anything. This silently drops metrics. Please implement the export logic or return ErrNotImplemented to avoid misleading users.

🟡 Significant

Shell Invocation Overload:
Throughout benchmark/preflight_stormservice.go and vllm.go, bash -lc is used to wrap kubectl and helm commands. The -l (login) flag introduces unnecessary environment variability.

Recommendation: Use direct binary invocations via exec.CommandContext or, at minimum, drop the -l flag.

Legacy Field Ambiguity:
In internal/resolver/resolver.go, the presence of both Provider and Deployer (legacy alias) is confusing for an initial import. Please document the deprecation timeline or clean this up now to avoid technical debt.

Initialization Panics:
mustLoadPacificLocation() in benchmark/runner_support_test.go will panic if tzdata is missing (common in minimal CI containers).

Fix: Use a lazy initializer or provide a fallback like time.FixedZone("PT", -8*3600).

Hardcoded Timezone in Logs:
formatScenarioRunID embeds "PT" in filenames. This is problematic for global teams. Please switch to UTC for unambiguous, reproducible log naming.

🔵 Minor / Improvements

URL Construction:
In internal/resolver/artifact_resolution.go, replace manual string joining with url.JoinPath (available since Go 1.19) for better slash normalization.

Modern Atomic Usage:
Since the project uses Go 1.22.5, prefer atomic.Int32 over atomic.AddInt32 in artifact_resolution_test.go for cleaner syntax.

Dependency Transparency:
brixbench/go.mod is quite sparse because it relies heavily on shelling out to kubectl. Consider explicitly documenting the requirement for a pre-configured K8s cluster in docs/walkthrough.md.

Module Wiring:
Ensure brixbench/ is added to the root go.work or CI pipeline so it doesn't bitrot immediately after merging.

Thanks for the detailed review. I pushed follow-up commits addressing the feedback.

Summary:

Removed the hardcoded internal gateway fallback. The runner now uses the detected endpoint, allows an explicit BENCHMARK_GATEWAY_ENDPOINT override, and fails fast otherwise.
Made the Prometheus exporter return ErrNotImplemented and disabled metrics export by default unless BENCHMARK_PUSHGATEWAY_URL is set.
Reduced shell wrapping in the driver/deployer paths by moving simple kubectl/helm calls to direct exec helpers, while keeping shell-required flows explicit.
Removed the legacy deployer schema field from the resolver, now require provider explicitly, and kept provider: null as the canonical plain vLLM baseline.
Replaced Pacific-time initialization/run IDs with UTC.
Switched release artifact URL construction to url.JoinPath.
Modernized resolver test atomics to atomic.Int32.
Added Kubernetes/kubectl prerequisite documentation to the walkthrough.
Checked root workspace/CI wiring and confirmed brixbench is not currently wired there; I left that as a separate follow-up instead of widening this PR.

Verification:

go test -v ./internal/resolver -count=1
go test -v ./internal/drivers -count=1
go test -v ./benchmark -run 'Test(FallbackGatewayEndpoint(Missing|Override)|ResolveGatewayEndpoint(UsesDetectedEndpoint|UsesOverrideOnDetectionFailure|FailsWithoutDetectedEndpointOrOverride)|ConfiguredMetricExporter|ExportMetricsIfConfigured|FormatScenarioRunIDUsesUTC|NowInUTCUsesUTCLocation)$' -count=1
YAML validation for the new Qwen3-8B routing scenario: parsed 30 YAML files, verified 24 scenario test cases, and confirmed benchmark/engine/gateway resource references.
Re-ran the real cluster hello-world path as documented in the tracker.

whalepark · 2026-05-21T23:34:26Z

Pushed the latest updates to feature/add-brixbench.

New commits:

6145393 brixbench: address review feedback
aa3c94c brixbench: add qwen3 routing scenario

What changed:

Addressed the review feedback around gateway endpoint fallback, observability export behavior, shell invocation cleanup, provider/deployer schema ambiguity, UTC run IDs, artifact URL construction, atomic usage, and Kubernetes prerequisites documentation.
Added a Qwen3-8B routing scenario with 24 test cases, including default routing and PD routing variants across 4p8d, 8p4d, and 8p8d configurations.
Added benchmark YAMLs, model manifests, and gateway routing overrides for the new scenario.
Updated the vLLM benchmark driver so an explicit dataset-name in scenario YAML is preserved instead of being overwritten by the default random dataset.

Verification:

go test -v ./internal/resolver -count=1
go test -v ./internal/drivers -count=1
go test -v ./benchmark -run 'Test(FallbackGatewayEndpoint(Missing|Override)|ResolveGatewayEndpoint(UsesDetectedEndpoint|UsesOverrideOnDetectionFailure|FailsWithoutDetectedEndpointOrOverride)|ConfiguredMetricExporter|ExportMetricsIfConfigured|FormatScenarioRunIDUsesUTC|NowInUTCUsesUTCLocation)$' -count=1
Parsed 30 new YAML files and verified all 24 scenario test cases reference existing benchmark, engine manifest, and gateway resource files.

Jeffwan · 2026-05-25T13:37:21Z

@@ -0,0 +1,452 @@
+# AIBrix Benchmark Suite - Walkthrough Guide


change to README.md?

Feel free to add docs under project_root/docs as well

Sounds valid. Will do.

Jeffwan · 2026-05-25T13:39:17Z

+    engine:
+      type: vllm
+      manifest: testdata/deployments/aibrix/models/pd-model.yaml
+    benchmark: testdata/benchmarks/vllm-chat-smoke-pd.yaml


I checked this file, seems it's different from what I expected. looks like it still describe how to deploy a model. I am looking for the benchmark data, how we send request to gateway (rps, concurrency, arrival rate etc). do you have such configs?

kind: vllm-bench execution: cluster image: aibrix-container-registry-cn-beijing.cr.volces.com/autodash/vllm-bench:v0.10.2-20260118 namespace: brixbench-adhoc podName: vllm-bench-client modelHostPath: /data01/models rootHostPath: /root artifacts: resultFilename: bench_results.json logDir: testdata/logs vllmArgs:

You are right, the walkthrough was focused on deployment/scenario wiring and did not clearly explain the request workload config. The request-side benchmark settings live in the benchmark YAML referenced by each test case, under vllmArgs. (brixbench/benchmark/testdata/benchmarks/vllm-chat-smoke-pd.yaml) I will update the README to add a dedicated benchmark workload section covering request rate, concurrency, num prompts, endpoint, dataset name, and routing specific options.

Jeffwan · 2026-05-25T13:47:11Z

brixbench/benchmark/testdata/deployments/aibrix/custom/aibrix-dependency-custom.yaml
and
brixbench/benchmark/testdata/deployments/aibrix/custom/aibrix-core-custom.yaml

are too heavy, do not check in and let's get this removed. Instead, I highly suggest to use production recommended way to deploy it. please check whether helm has some sdk so we can use helm. If not, we can use kustomize to generate the right configuration.

Let me remove them. Thanks for pointing out.

Signed-off-by: Misun Park <misuneeh@gmail.com>

Add brixbench benchmark module

9a678e0

gemini-code-assist Bot reviewed May 1, 2026

View reviewed changes

Comment thread brixbench/benchmark/runner_test.go Outdated

Comment thread brixbench/internal/deployers/command_logs.go

Comment thread brixbench/.gitignore Outdated

Comment thread brixbench/benchmark/preflight_stormservice.go Outdated

Comment thread brixbench/internal/drivers/vllm.go

Merge branch 'main' into feature/add-brixbench

ff004f5

Misun added 2 commits May 22, 2026 07:26

brixbench: address review feedback

6145393

brixbench: add qwen3 routing scenario

aa3c94c

Merge branch 'main' into feature/add-brixbench

e2dabb2

Jeffwan reviewed May 25, 2026

View reviewed changes

brixbench: address latest review comments

f888e2f

Signed-off-by: Misun Park <misuneeh@gmail.com>

		@@ -0,0 +1,452 @@
		# AIBrix Benchmark Suite - Walkthrough Guide

Conversation

whalepark commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Description

May 21 2026

May 27 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

varungup90 commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Review Feedback: Issues and Suggestions

🔴 Critical / Must Fix

🟡 Significant

🔵 Minor / Improvements

Uh oh!

whalepark commented May 21, 2026

⚠️ Review Feedback: Issues and Suggestions

🔴 Critical / Must Fix

🟡 Significant

🔵 Minor / Improvements

Uh oh!

whalepark commented May 21, 2026

Uh oh!

Jeffwan May 25, 2026

Choose a reason for hiding this comment

Uh oh!

whalepark May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Jeffwan May 25, 2026

Choose a reason for hiding this comment

Uh oh!

whalepark May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Jeffwan May 25, 2026

Choose a reason for hiding this comment

Uh oh!

whalepark May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

whalepark commented May 1, 2026 •

edited

Loading

varungup90 commented May 4, 2026 •

edited

Loading