[schema] Bump AdminOperation v98 for prc record-count verification (stacked on #2776) #2777
Merged
ymuppala merged 3 commits into linkedin:main on May 7, 2026
…count verification

Schema-only PR. Adds two of the three schema versions needed for upcoming server-side batch-push record-count verification (prc) work. AdminOperation v98 lands in a separate follow-up PR (split to stay under the enforce-lines-added 2000-line cap).

Schemas added:
- PartitionState v22 -> v23: adds `batchPushRecordCount` (long, default 0). Persists the server-side count of batch-push data records, used at EOP to verify against the producer-side `prc` PubSub header (PR linkedin#2758).
- StoreMetaValue v42 -> v43: adds `batchPushRecordCountVerificationEnabled` (boolean, default false). Per-store opt-in for the throw path on record-count deficits.

Both additive with safe defaults; backward-compatible.

`build.gradle` pins compileAvro to v22/v42 via `versionOverrides`. The follow-up consumer-code PR will remove the pins and bump `AvroProtocolDefinition.PARTITION_STATE` (22->23) and `METADATA_SYSTEM_SCHEMA_STORE` (42->43) constants in lockstep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Adds a new AdminOperation v98 Avro schema to carry a new store-level flag used by upcoming server-side batch-push record-count verification, while keeping current runtime behavior unchanged by pinning Avro generation to v97 during the schema-only window.
Changes:
- Added `services/venice-controller/.../AdminOperation/v98/AdminOperation.avsc` with the new boolean field `batchPushRecordCountVerificationEnabled` (default `false`) on the store-config admin message payload.
- Updated root `build.gradle` `compileAvro.versionOverrides` to keep the controller's active AdminOperation schema pinned to v97 until the follow-up wiring PR lands.
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| services/venice-controller/src/main/resources/avro/AdminOperation/v98/AdminOperation.avsc | Introduces AdminOperation v98 with the new store config flag (additive, default false). |
| build.gradle | Pins controller Avro generation to AdminOperation v97 so v98 can land inertly until code is ready. |
Schema-only PR. Adds AdminOperation v98 to carry the new per-store `batchPushRecordCountVerificationEnabled` flag on the SetStore admin op, so parent controllers can propagate it to child controllers via the admin topic.

Stacked on top of PR linkedin#2776 (PartitionState v23 + StoreMetaValue v43) because both PRs touch the same `versionOverrides` block in build.gradle. Stacking ensures the override list grows monotonically with no merge conflict when both PRs land. Once linkedin#2776 merges, this PR's branch will be rebased on new main and the diff will collapse to just the AdminOperation additions (the v22 / v42 pins from linkedin#2776 will already be in main).

Field added (additive, default false): `batchPushRecordCountVerificationEnabled` (boolean) on SetStore.

`build.gradle` extends the existing `versionOverrides` list (PartitionState/v22 and StoreMetaValue/v42 from linkedin#2776) with AdminOperation/v97. The follow-up consumer-code PR removes all three pins together when bumping the `AvroProtocolDefinition` `PARTITION_STATE` / `METADATA_SYSTEM_SCHEMA_STORE` / `ADMIN_OPERATION` constants in lockstep with the controller wiring.

Compatible in both directions: v97-aware deserializers reading v98 records ignore the unknown field, and v98-aware deserializers reading v97 records fill the missing field with the default `false`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5b42eb0 to d332d70
Collaborator
Author
@copilot resolve the merge conflicts in this pull request
xunyin8 reviewed May 7, 2026
xunyin8 approved these changes May 7, 2026
ymuppala added a commit to ymuppala/venice that referenced this pull request May 7, 2026

linkedin#2776 (PartitionState v23 + StoreMetaValue v43) and linkedin#2777 (AdminOperation v98) merged with versionOverrides pinning compileAvro to the prior versions (v22, v42, v97). This PR consumes those new schemas via the AvroProtocolDefinition bumps (already in earlier commits on this branch), so the pins must come down in lockstep.

After this change, compileAvro picks v23/v43/v98 (no OVERRIDE) and the generated SpecificRecord classes contain the new fields. The unrelated KafkaMessageEnvelope/v13 pin (from linkedin#2778) is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
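
The pin-then-unpin sequence described in these commit messages can be modeled in a few lines of Python. This is only a sketch under stated assumptions, not the actual Gradle logic: `select_schema_versions` and its data shapes are hypothetical, only the two versions relevant to this PR are listed per schema, and it assumes that an unpinned schema resolves to the highest version present in the source tree.

```python
def select_schema_versions(available, overrides):
    """Return {schema: (version, pinned)} for each schema.

    Hypothetical model of version pinning. Assumption: a schema with no
    override entry resolves to its highest available version.
    """
    # Parse entries such as "AdminOperation/v97" into {"AdminOperation": 97}.
    pins = {}
    for entry in overrides:
        name, _, version = entry.partition("/v")
        pins[name] = int(version)

    selected = {}
    for name, versions in available.items():
        if name in pins:
            if pins[name] not in versions:
                raise ValueError(f"pinned v{pins[name]} of {name} not found")
            selected[name] = (pins[name], True)   # pinned by an override
        else:
            selected[name] = (max(versions), False)  # latest version wins
    return selected


# Versions present in the source tree once both schema-only PRs have merged.
available = {
    "PartitionState": [22, 23],
    "StoreMetaValue": [42, 43],
    "AdminOperation": [97, 98],
}

# Schema-only window: all three schemas stay pinned to the prior version.
schema_only_window = select_schema_versions(
    available,
    ["PartitionState/v22", "StoreMetaValue/v42", "AdminOperation/v97"],
)

# Consumer-code PR: the three pins come down together.
after_unpinning = select_schema_versions(available, [])
```

In this model the new `.avsc` files are already on disk during the schema-only window, but nothing generated from them is selected until the override entries are removed.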
Problem
In March 2026, a Spark-based VPJ KIF repush silently wrote batch data to the wrong datacenter (incident-10468). VPJ reported success while two of three DCs lost batch data. The root cause was an unguarded trust boundary: Venice servers had no way to verify that they actually received the records VPJ claimed it sent.
The full fix has three parts: (1) a producer-side `prc` PubSub header on EOP (already shipped in #2758); (2) consumer-side counting + comparison + per-store opt-in throw path; (3) failure plumbing back to the push job. Parts (2) and (3) require schema bumps, which land schema-only first per repo policy and are then activated in the consumer-code PR.
Solution
This PR adds AdminOperation v98. Companion to #2776 (which adds `PartitionState` v23 + `StoreMetaValue` v43); split out separately because `AdminOperation.avsc` alone is 1309 lines and combining all three would push #2776 over the `enforce-lines-added` 2000-line cap.
Stacking on #2776
Both #2776 and this PR modify the same `versionOverrides` block in `build.gradle`. To avoid a merge conflict when both PRs land, this PR is stacked on top of #2776:
- `prc-schema-admin-op-v98` is based on `prc-schema-bumps` (the branch behind #2776), not directly on `main`.
- `build.gradle` in this PR shows the full final state of the override list: `[PartitionState/v22, StoreMetaValue/v42, AdminOperation/v97]`. The first two come from #2776; this PR appends the third.
- Once #2776 merges, this PR's branch will be rebased on `main`. The rebased diff will only contain the AdminOperation additions (the PartitionState/StoreMetaValue overrides will already be in `main`).
Why AdminOperation needs a new version
| Message | Field added | Rationale |
|---|---|---|
| `SetStore` | `batchPushRecordCountVerificationEnabled` (boolean, default `false`) | Set on the parent controller (where `UpdateStore` lands) and replicated to every child controller via this admin topic. Without a bump on the `SetStore` message, child controllers can't receive the new flag; opting a store in on the parent would be a no-op everywhere else. |
The change is strictly additive with safe default `false`. Backward compatibility: a v98-aware reader can still consume v97 records (the missing field reads as `false`). Forward compatibility: a v97-aware reader consumes v98 records by ignoring the extra field. No coordinated cutover required.
Build pinning during the schema-only window
`build.gradle` adds an entry to `compileAvro.versionOverrides` pinning the active `AdminOperation` schema to v97 even while v98 sits in the source tree. Pattern documented at `build.gradle:328-330`. This is critical because:
- The `.avsc` file needs to be physically merged so it's in `main` before any consumer code references it.
- Bumping `AvroProtocolDefinition.ADMIN_OPERATION` to 98 here without consumer code would lock us into a half-built state where serialized admin ops record protocol version 98 but carry no new payload.
Keeping `AvroProtocolDefinition.java` at the old constant and pinning the gradle build to v97 ensures zero runtime behavior change from this PR: v98 is dormant in the source tree, ready for the follow-up consumer-code PR to activate it in lockstep with `VeniceParentHelixAdmin` and `AdminExecutionTask` wiring.
Companion PRs
- Follow-up consumer-code PR: removes the `versionOverrides` pins, bumps the `AvroProtocolDefinition.PARTITION_STATE` (22→23), `METADATA_SYSTEM_SCHEMA_STORE` (42→43), and `ADMIN_OPERATION` (97→98) constants, and wires the actual consumer code: server-side counting, EOP comparison, per-store throw path, controller `statusDetails` propagation, VPJ checkpoint detection.
Test plan
- `./gradlew :services:venice-controller:compileAvro`: log confirms the `(OVERRIDE)` annotation on `AdminOperation/v97` (the pinned version actually picked).
- `./gradlew :internal:venice-common:compileAvro`: same for `PartitionState/v22` and `StoreMetaValue/v42` (inherited from #2776's commit).
Backwards compatibility
Strictly additive; the new field defaults to `false`.
🤖 Generated with Claude Code
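
The compatibility claim for the additive `batchPushRecordCountVerificationEnabled` field can be illustrated with a small dictionary-based model of reader-side field resolution. This is a simplified sketch, not Avro itself and not Venice code: real Avro resolves a writer schema against a reader schema, whereas this model only mirrors the outcome for a flat record. The `resolve` helper, the `_REQUIRED` sentinel, and the `storeName` field (standing in for the rest of the `SetStore` payload) are assumptions made for illustration.

```python
_REQUIRED = object()  # sentinel: the reader declares no default for this field

FLAG = "batchPushRecordCountVerificationEnabled"

# Reader-side view of each schema version: {field name: default}.
# "storeName" is a hypothetical stand-in for the existing SetStore fields.
READER_V97 = {"storeName": _REQUIRED}
READER_V98 = {"storeName": _REQUIRED, FLAG: False}


def resolve(written_record, reader_fields):
    """Project a written record onto the fields a reader knows about."""
    resolved = {}
    for name, default in reader_fields.items():
        if name in written_record:
            resolved[name] = written_record[name]   # value supplied by writer
        elif default is not _REQUIRED:
            resolved[name] = default                # missing: use the default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    # Fields the reader does not declare are dropped.
    return resolved


v97_record = {"storeName": "example_store"}
v98_record = {"storeName": "example_store", FLAG: True}

new_reader_old_record = resolve(v97_record, READER_V98)  # flag defaults to False
old_reader_new_record = resolve(v98_record, READER_V97)  # flag is ignored
new_reader_new_record = resolve(v98_record, READER_V98)  # flag is carried through
```

Because the new field has a default, neither direction raises, which is why no coordinated cutover between v97-aware and v98-aware controllers is needed in this model.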