[controller] Auto-initialize ZK admin protocol version and enable detection service by default#2769
Open
minhmo1620 wants to merge 3 commits intolinkedin:mainfrom
Conversation
added 3 commits
April 30, 2026 15:35
ProtocolVersionAutoDetectionService previously skipped updating the admin operation protocol version in ZK when the upstream value was the sentinel -1 (undefined). That left fresh clusters stuck without a recorded version, since ZK is only ever initialized when this service writes the smallest good version observed across controllers. Drop the -1 short-circuit so the service initializes ZK on first detection and keeps it in sync thereafter. Also fix a stale Javadoc reference (/adminTopicMetadata -> /adminTopicMetadataV2) in VeniceParentHelixAdmin#getWriterSchemaIdFromZK.
Now that the auto-detection service correctly initializes ZK from the undefined sentinel (-1), flip the default of controller.protocol.version.auto.detection.service.enabled from false to true so parent controllers run it without requiring an explicit opt-in. Updated the Javadoc on the config key accordingly.
… test CONTROLLER_PROTOCOL_VERSION_AUTO_DETECTION_SERVICE_ENABLED now defaults to true, so TestProtocolVersionAutoDetection no longer needs to set it explicitly. Keep the SLEEP_MS override so the test still observes detection within its 30s timeout.
There was a problem hiding this comment.
Pull request overview
This PR updates the Venice parent controller’s protocol version auto-detection behavior to be enabled by default and to auto-initialize the admin operation protocol version in ZK when it is currently undefined.
Changes:
- Enable
controller.protocol.version.auto.detection.service.enabledby default in controller cluster config (and update the config key Javadoc accordingly). - Update
ProtocolVersionAutoDetectionServiceto write the detected “current good” protocol version to ZK even when the upstream ZK value is the undefined sentinel-1. - Update/extend unit + integration tests and fix a stale ZK path reference in
VeniceParentHelixAdminJavadoc.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| services/venice-controller/src/main/java/com/linkedin/venice/controller/VeniceControllerClusterConfig.java | Flips the default to enable the auto-detection service. |
| internal/venice-common/src/main/java/com/linkedin/venice/ConfigKeys.java | Updates config-key Javadoc to reflect the new default behavior. |
| services/venice-controller/src/main/java/com/linkedin/venice/controller/ProtocolVersionAutoDetectionService.java | Removes the upstreamVersion != -1 short-circuit so ZK can be initialized from undefined. |
| services/venice-controller/src/test/java/com/linkedin/venice/controller/TestProtocolVersionAutoDetectionService.java | Renames/adjusts the “no update” test and adds coverage for initializing from upstream -1. |
| internal/venice-test-common/src/integrationTest/java/com/linkedin/venice/endToEnd/TestProtocolVersionAutoDetection.java | Removes explicit enablement since the service is now enabled by default. |
| services/venice-controller/src/main/java/com/linkedin/venice/controller/VeniceParentHelixAdmin.java | Fixes a stale Javadoc reference to the admin topic metadata ZK path. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem Statement
ProtocolVersionAutoDetectionServiceis a parent-controller background task that reconciles the admin operation protocol version stored in ZK with the smallest version reported by all controllers (parent + child) in a cluster.Two issues today:
The service is off by default.
controller.protocol.version.auto.detection.service.enableddefaulted tofalse, so parent controllers needed an explicit opt-in to run the service at all. The service has been running for 6 months+ without issue, it's safe to enable it by default.Remove the required step to init version for cluster The previous predicate skipped the update whenever the upstream value in ZK was
-1(undefined).We will need to manually init the version by using admin-tool. With this new change, the service will auto register the version.
Solution
[controller] Enable protocol version auto-detection service by defaultCONTROLLER_PROTOCOL_VERSION_AUTO_DETECTION_SERVICE_ENABLEDdefault fromfalsetotrueinVeniceControllerClusterConfig.ConfigKeys.javato match the new default.[controller] Initialize ZK protocol version when undefined (-1)upstreamVersion != -1short-circuit inProtocolVersionAutoDetectionService. The remainingcurrentGoodVersion != Long.MAX_VALUEguard still handles the empty-controllers edge case, and the!Objects.equals(currentGoodVersion, upstreamVersion)check now distinguishes "needs update" from "already in sync". When ZK is-1, the inequality is true and the service initializes ZK to the current good version on the next tick.VeniceParentHelixAdmin#getWriterSchemaIdFromZK(/adminTopicMetadata->/adminTopicMetadataV2) noticed while tracing this code path.Code changes
controller.protocol.version.auto.detection.service.enabledflipped fromfalse->true. Listed in the description above.Concurrency-Specific Checks
How was this PR tested?
testProtocolVersionDetectionWithNoUpdate->testProtocolVersionDetectionWithMatchingUpstreamVersionand adjusted the setup to use a matching upstream (1L) so the no-update assertion remains meaningful under the new behavior.testProtocolVersionDetectionUpdatesWhenUpstreamIsUndefinedto cover the new behavior: when ZK has no recorded version, the service must writecurrentGoodVersionto ZK../gradlew :services:venice-controller:test --tests "com.linkedin.venice.controller.TestProtocolVersionAutoDetectionService"— 5 tests pass.Does this PR introduce any user-facing or breaking changes?