Skip to content

[Misc] Add brixbench benchmark module#2165

Open
whalepark wants to merge 6 commits into
vllm-project:mainfrom
whalepark:feature/add-brixbench
Open

[Misc] Add brixbench benchmark module#2165
whalepark wants to merge 6 commits into
vllm-project:mainfrom
whalepark:feature/add-brixbench

Conversation

@whalepark
Copy link
Copy Markdown

@whalepark whalepark commented May 1, 2026

Pull Request Description

This PR imports the initial brixbench benchmark module into the aibrix repository.

Included in this import:

  • benchmark runner, resolver, deployer, driver, and observability packages under brixbench/
  • benchmark scenarios, deployment manifests, and benchmark configuration files
  • walkthrough documentation for the benchmark module
  • module/import path updates for the nested brixbench module inside aibrix

Additional cleanup included in this PR:

  • removed local-only and AI workflow metadata from the imported tree
  • updated the module path to github.com/vllm-project/aibrix/brixbench
  • removed bundled versioned release YAMLs from the repository
  • added on-demand release artifact download and local cache logic under .tmp/releases/<version>/
  • added resolver tests for release artifact download and cache reuse

Notes:

  • Version-based release artifact resolution now downloads from GitHub releases at runtime instead of relying on checked-in release YAML files.

May 21 2026

Addressed the latest review feedback and pushed follow-up commits.

Key updates:

  • Removed the hardcoded gateway fallback and now require either a detected endpoint or an explicit BENCHMARK_GATEWAY_ENDPOINT override.
  • Cleaned up the resolver schema to require provider explicitly, reject deprecated deployer, and preserve provider: null as the canonical plain vLLM baseline path.
  • Added focused regression tests for gateway endpoint resolution, explicit provider: null handling, deprecated deployer rejection, and missing-provider rejection.

Verification:

  • go test -v ./internal/resolver -count=1
  • go test -v ./benchmark -run 'Test(FallbackGatewayEndpoint(Missing|Override)|ResolveGatewayEndpoint(UsesDetectedEndpoint|UsesOverrideOnDetectionFailure|FailsWithoutDetectedEndpointOrOverride)|ConfiguredMetricExporter|ExportMetricsIfConfigured|FormatScenarioRunIDUsesUTC|NowInUTCUsesUTCLocation)$' -count=1

May 27 2026

Pushed follow-up updates for the latest review comments.

Changes:

  • Moved the benchmark walkthrough into the brixbench README entry point.
  • Added documentation for benchmark workload configs, especially vllmArgs such as request-rate, concurrency, num-prompts, dataset-name, goodput, and routing-specific options.
  • Removed the large generated custom AIBrix manifests from testdata and kept the benchmark flow on release-artifact/workspace Helm-based deployment paths.

Verification:

  • Confirmed the removed custom manifests are not referenced by active brixbench scenarios.
  • Re-ran YAML/scenario reference validation.
  • Re-ran the relevant Go tests.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a benchmark suite for AIBrix, including a test runner, deployment logic for various platforms, and support for running vLLM benchmarks. The feedback highlights concerns regarding the use of fixed sleeps, the security and portability issues of shelling out to bash, the need for more robust file ignoring in .gitignore, and a more reliable method for extracting benchmark results from pods.

Comment thread brixbench/benchmark/runner_test.go Outdated
Comment thread brixbench/internal/deployers/command_logs.go
Comment thread brixbench/.gitignore Outdated
Comment thread brixbench/benchmark/preflight_stormservice.go Outdated
Comment thread brixbench/internal/drivers/vllm.go
@varungup90
Copy link
Copy Markdown
Collaborator

varungup90 commented May 4, 2026

@whalepark can you sign the commit and address gemini comments?


⚠️ Review Feedback: Issues and Suggestions

Thanks for the comprehensive import of the brixbench module. While the scope is impressive, there are several areas—ranging from security leaks to reliability—that need to be addressed before this is merged.


🔴 Critical / Must Fix

  1. Internal URL Leak:
    In benchmark/runner_test.go, defaultGatewayEndpoint is set to http://aibrix-gateway.bytedance.net. This internal hostname is non-functional for external contributors and should be replaced with localhost or a configurable default.
  2. No-op Observability Exporter:
    In internal/observability/exporter.go, PrometheusPushExporter currently "simulates" exports by iterating over keys without doing anything. This silently drops metrics. Please implement the export logic or return ErrNotImplemented to avoid misleading users.

🟡 Significant

  1. Shell Invocation Overload:
    Throughout benchmark/preflight_stormservice.go and vllm.go, bash -lc is used to wrap kubectl and helm commands. The -l (login) flag introduces unnecessary environment variability.
    • Recommendation: Use direct binary invocations via exec.CommandContext or, at minimum, drop the -l flag.
  2. Legacy Field Ambiguity:
    In internal/resolver/resolver.go, the presence of both Provider and Deployer (legacy alias) is confusing for an initial import. Please document the deprecation timeline or clean this up now to avoid technical debt.
  3. Initialization Panics:
    mustLoadPacificLocation() in benchmark/runner_support_test.go will panic if tzdata is missing (common in minimal CI containers).
    • Fix: Use a lazy initializer or provide a fallback like time.FixedZone("PT", -8*3600).
  4. Hardcoded Timezone in Logs:
    formatScenarioRunID embeds "PT" in filenames. This is problematic for global teams. Please switch to UTC for unambiguous, reproducible log naming.

🔵 Minor / Improvements

  1. URL Construction:
    In internal/resolver/artifact_resolution.go, replace manual string joining with url.JoinPath (available since Go 1.19) for better slash normalization.
  2. Modern Atomic Usage:
    Since the project uses Go 1.22.5, prefer atomic.Int32 over atomic.AddInt32 in artifact_resolution_test.go for cleaner syntax.
  3. Dependency Transparency:
    brixbench/go.mod is quite sparse because it relies heavily on shelling out to kubectl. Consider explicitly documenting the requirement for a pre-configured K8s cluster in docs/walkthrough.md.
  4. Module Wiring:
    Ensure brixbench/ is added to the root go.work or CI pipeline so it doesn't bitrot immediately after merging.

@whalepark
Copy link
Copy Markdown
Author

@whalepark can you sign the commit and address gemini comments?

⚠️ Review Feedback: Issues and Suggestions

Thanks for the comprehensive import of the brixbench module. While the scope is impressive, there are several areas—ranging from security leaks to reliability—that need to be addressed before this is merged.

🔴 Critical / Must Fix

  1. Internal URL Leak:
    In benchmark/runner_test.go, defaultGatewayEndpoint is set to http://aibrix-gateway.bytedance.net. This internal hostname is non-functional for external contributors and should be replaced with localhost or a configurable default.
  2. No-op Observability Exporter:
    In internal/observability/exporter.go, PrometheusPushExporter currently "simulates" exports by iterating over keys without doing anything. This silently drops metrics. Please implement the export logic or return ErrNotImplemented to avoid misleading users.

🟡 Significant

  1. Shell Invocation Overload:
    Throughout benchmark/preflight_stormservice.go and vllm.go, bash -lc is used to wrap kubectl and helm commands. The -l (login) flag introduces unnecessary environment variability.

    • Recommendation: Use direct binary invocations via exec.CommandContext or, at minimum, drop the -l flag.
  2. Legacy Field Ambiguity:
    In internal/resolver/resolver.go, the presence of both Provider and Deployer (legacy alias) is confusing for an initial import. Please document the deprecation timeline or clean this up now to avoid technical debt.

  3. Initialization Panics:
    mustLoadPacificLocation() in benchmark/runner_support_test.go will panic if tzdata is missing (common in minimal CI containers).

    • Fix: Use a lazy initializer or provide a fallback like time.FixedZone("PT", -8*3600).
  4. Hardcoded Timezone in Logs:
    formatScenarioRunID embeds "PT" in filenames. This is problematic for global teams. Please switch to UTC for unambiguous, reproducible log naming.

🔵 Minor / Improvements

  1. URL Construction:
    In internal/resolver/artifact_resolution.go, replace manual string joining with url.JoinPath (available since Go 1.19) for better slash normalization.
  2. Modern Atomic Usage:
    Since the project uses Go 1.22.5, prefer atomic.Int32 over atomic.AddInt32 in artifact_resolution_test.go for cleaner syntax.
  3. Dependency Transparency:
    brixbench/go.mod is quite sparse because it relies heavily on shelling out to kubectl. Consider explicitly documenting the requirement for a pre-configured K8s cluster in docs/walkthrough.md.
  4. Module Wiring:
    Ensure brixbench/ is added to the root go.work or CI pipeline so it doesn't bitrot immediately after merging.

Thanks for the detailed review. I pushed follow-up commits addressing the feedback.

Summary:

  • Removed the hardcoded internal gateway fallback. The runner now uses the detected endpoint, allows an explicit BENCHMARK_GATEWAY_ENDPOINT override, and fails fast otherwise.
  • Made the Prometheus exporter return ErrNotImplemented and disabled metrics export by default unless BENCHMARK_PUSHGATEWAY_URL is set.
  • Reduced shell wrapping in the driver/deployer paths by moving simple kubectl/helm calls to direct exec helpers, while keeping shell-required flows explicit.
  • Removed the legacy deployer schema field from the resolver, now require provider explicitly, and kept provider: null as the canonical plain vLLM baseline.
  • Replaced Pacific-time initialization/run IDs with UTC.
  • Switched release artifact URL construction to url.JoinPath.
  • Modernized resolver test atomics to atomic.Int32.
  • Added Kubernetes/kubectl prerequisite documentation to the walkthrough.
  • Checked root workspace/CI wiring and confirmed brixbench is not currently wired there; I left that as a separate follow-up instead of widening this PR.

Verification:

  • go test -v ./internal/resolver -count=1
  • go test -v ./internal/drivers -count=1
  • go test -v ./benchmark -run 'Test(FallbackGatewayEndpoint(Missing|Override)|ResolveGatewayEndpoint(UsesDetectedEndpoint|UsesOverrideOnDetectionFailure|FailsWithoutDetectedEndpointOrOverride)|ConfiguredMetricExporter|ExportMetricsIfConfigured|FormatScenarioRunIDUsesUTC|NowInUTCUsesUTCLocation)$' -count=1
  • YAML validation for the new Qwen3-8B routing scenario: parsed 30 YAML files, verified 24 scenario test cases, and confirmed benchmark/engine/gateway resource references.
  • Re-ran the real cluster hello-world path as documented in the tracker.

@whalepark
Copy link
Copy Markdown
Author

Pushed the latest updates to feature/add-brixbench.

New commits:

  • 6145393 brixbench: address review feedback
  • aa3c94c brixbench: add qwen3 routing scenario

What changed:

  • Addressed the review feedback around gateway endpoint fallback, observability export behavior, shell invocation cleanup, provider/deployer schema ambiguity, UTC run IDs, artifact URL construction, atomic usage, and Kubernetes prerequisites documentation.
  • Added a Qwen3-8B routing scenario with 24 test cases, including default routing and PD routing variants across 4p8d, 8p4d, and 8p8d configurations.
  • Added benchmark YAMLs, model manifests, and gateway routing overrides for the new scenario.
  • Updated the vLLM benchmark driver so an explicit dataset-name in scenario YAML is preserved instead of being overwritten by the default random dataset.

Verification:

  • go test -v ./internal/resolver -count=1
  • go test -v ./internal/drivers -count=1
  • go test -v ./benchmark -run 'Test(FallbackGatewayEndpoint(Missing|Override)|ResolveGatewayEndpoint(UsesDetectedEndpoint|UsesOverrideOnDetectionFailure|FailsWithoutDetectedEndpointOrOverride)|ConfiguredMetricExporter|ExportMetricsIfConfigured|FormatScenarioRunIDUsesUTC|NowInUTCUsesUTCLocation)$' -count=1
  • Parsed 30 new YAML files and verified all 24 scenario test cases reference existing benchmark, engine manifest, and gateway resource files.

Comment thread brixbench/docs/walkthrough.md Outdated
@@ -0,0 +1,452 @@
# AIBrix Benchmark Suite - Walkthrough Guide
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to README.md?

Feel free to add docs under project_root/docs as well

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds valid. Will do.

Comment thread brixbench/docs/walkthrough.md Outdated
engine:
type: vllm
manifest: testdata/deployments/aibrix/models/pd-model.yaml
benchmark: testdata/benchmarks/vllm-chat-smoke-pd.yaml
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this file, seems it's different from what I expected. looks like it still describe how to deploy a model. I am looking for the benchmark data, how we send request to gateway (rps, concurrency, arrival rate etc). do you have such configs?

kind: vllm-bench
execution: cluster
image: aibrix-container-registry-cn-beijing.cr.volces.com/autodash/vllm-bench:v0.10.2-20260118
namespace: brixbench-adhoc
podName: vllm-bench-client
modelHostPath: /data01/models
rootHostPath: /root

artifacts:
  resultFilename: bench_results.json
  logDir: testdata/logs

vllmArgs:

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, the walkthrough was focused on deployment/scenario wiring and did not clearly explain the request workload config. The request-side benchmark settings live in the benchmark YAML referenced by each test case, under vllmArgs. (brixbench/benchmark/testdata/benchmarks/vllm-chat-smoke-pd.yaml) I will update the README to add a dedicated benchmark workload section covering request rate, concurrency, num prompts, endpoint, dataset name, and routing specific options.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brixbench/benchmark/testdata/deployments/aibrix/custom/aibrix-dependency-custom.yaml
and
brixbench/benchmark/testdata/deployments/aibrix/custom/aibrix-core-custom.yaml

are too heavy, do not check in and let's get this removed. Instead, I highly suggest to use production recommended way to deploy it. please check whether helm has some sdk so we can use helm. If not, we can use kustomize to generate the right configuration.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me remove them. Thanks for pointing out.

Signed-off-by: Misun Park <misuneeh@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants