Conversation

@wlwilliamx
Collaborator

@wlwilliamx wlwilliamx commented Jan 22, 2026

What problem does this PR solve?

Issue Number: close #4049

What is changed and how it works?

Retry EtcdBackend.SetChangefeedProgress when the etcd ModRevision compare-and-swap (CAS) fails because a concurrent writer updated the same key, so that admin APIs (remove/pause) no longer fail intermittently with ErrMetaOpFailed. Also add a unit test covering a CAS-conflict-then-success retry.
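
The retry described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `fakeKV`, `errCASConflict`, `setProgressWithRetry`, the `beforeWrite` hook, and the attempt count and delay are all hypothetical names and values. An in-memory key with a revision counter stands in for etcd so the example runs on its own.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// errCASConflict is a hypothetical sentinel for a failed revision check.
var errCASConflict = errors.New("cas conflict: revision changed")

// fakeKV is an in-memory stand-in for one etcd key and its ModRevision.
type fakeKV struct {
	mu       sync.Mutex
	value    uint64
	revision int64
}

// get returns the current value and the revision it was written at.
func (kv *fakeKV) get() (uint64, int64) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	return kv.value, kv.revision
}

// casPut writes value only if the key is still at expectedRev.
func (kv *fakeKV) casPut(expectedRev int64, value uint64) error {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	if kv.revision != expectedRev {
		return errCASConflict
	}
	kv.value = value
	kv.revision++
	return nil
}

// setProgressWithRetry re-reads the revision before every attempt and retries
// only on CAS conflicts, at most maxAttempts times. beforeWrite is a test hook
// (an assumption of this sketch) that runs between the read and the write.
func setProgressWithRetry(ctx context.Context, kv *fakeKV, progress uint64,
	maxAttempts int, delay time.Duration, beforeWrite func(attempt int)) error {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		_, rev := kv.get()
		if beforeWrite != nil {
			beforeWrite(attempt)
		}
		lastErr = kv.casPut(rev, progress)
		if lastErr == nil {
			return nil
		}
		if !errors.Is(lastErr, errCASConflict) {
			return lastErr // unrelated errors are not retried
		}
		if attempt == maxAttempts {
			break // no point in sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, lastErr)
}

// runDemo injects one competing write before the first attempt.
func runDemo() (value uint64, revision int64, attempts int, err error) {
	kv := &fakeKV{}
	race := func(attempt int) {
		attempts = attempt
		if attempt == 1 {
			_ = kv.casPut(0, 7) // a concurrent writer bumps the revision first
		}
	}
	err = setProgressWithRetry(context.Background(), kv, 42, 3, time.Millisecond, race)
	value, revision = kv.get()
	return value, revision, attempts, err
}

func main() {
	fmt.Println(runDemo()) // 42 2 2 <nil>
}
```

The first attempt loses the race and fails the revision check. The second attempt re-reads the revision and succeeds, so the caller never sees the conflict.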

Check List

Tests

  • Unit test
  • Manual test

Questions

Will it cause performance regression or break compatibility?

No. The retry loop is bounded and only triggers on CAS conflicts; the common path is unchanged.

Do you need to update user documentation, design documentation or monitoring documentation?

N/A

Release note

Fix flaky changefeed remove/pause failures caused by etcd CAS conflicts when persisting changefeed progress.

@ti-chi-bot ti-chi-bot bot added the release-note label (Denotes a PR that will be considered when it comes time to generate release notes.) Jan 22, 2026
@gemini-code-assist

Summary of Changes

Hello @wlwilliamx, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness of changefeed administration by introducing a retry mechanism for SetChangefeedProgress operations that interact with etcd. By gracefully handling concurrent write conflicts on etcd keys, it eliminates a source of flakiness in critical user-facing APIs like changefeed removal and pausing, ensuring more reliable system behavior.

Highlights

  • Retry Mechanism for Etcd CAS Conflicts: Implemented a retry loop in the SetChangefeedProgress function to gracefully handle etcd Compare-And-Swap (CAS) conflicts that occur due to concurrent writes.
  • Improved Reliability of Admin APIs: This change prevents ErrMetaOpFailed errors, which previously caused flakiness in changefeed administration APIs such as remove and pause.
  • Unit Test for Retry Logic: A new unit test, TestSetChangefeedProgressRetriesOnCASConflict, has been added to verify the correct behavior of the retry mechanism when a CAS conflict occurs and is subsequently resolved.
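
A conflict-then-success check of the kind the last highlight describes might look roughly like the sketch below. It is not the body of TestSetChangefeedProgressRetriesOnCASConflict: `scriptedWriter`, `retryWrite`, and `errConflict` are hypothetical stand-ins, and the check is written as plain functions rather than with the `testing` package so that it runs as a standalone program.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a hypothetical stand-in for the error a failed revision check produces.
var errConflict = errors.New("cas conflict")

// scriptedWriter fails with errConflict for its first `conflicts` calls, then succeeds.
type scriptedWriter struct {
	conflicts int
	calls     int
}

func (w *scriptedWriter) write() error {
	w.calls++
	if w.calls <= w.conflicts {
		return errConflict
	}
	return nil
}

// retryWrite is a hypothetical bounded retry wrapper. It has no delay so the
// check stays fast, and it expects maxAttempts >= 1.
func retryWrite(write func() error, maxAttempts int) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = write(); err == nil || !errors.Is(err, errConflict) {
			return err
		}
	}
	return fmt.Errorf("retries exhausted: %w", err)
}

// checkConflictThenSuccess scripts one conflict followed by a successful write.
func checkConflictThenSuccess() (calls int, err error) {
	w := &scriptedWriter{conflicts: 1}
	err = retryWrite(w.write, 3)
	return w.calls, err
}

// checkExhausted covers the bounded side: permanent conflicts must surface an error.
func checkExhausted() (calls int, err error) {
	w := &scriptedWriter{conflicts: 10}
	err = retryWrite(w.write, 3)
	return w.calls, err
}

func main() {
	calls, err := checkConflictThenSuccess()
	fmt.Println(calls, err) // 2 <nil>
	calls, err = checkExhausted()
	fmt.Println(calls, err) // 3 retries exhausted: cas conflict
}
```

Counting calls on the fake shows that the retry actually happened, which a bare success result would not.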

@ti-chi-bot ti-chi-bot bot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Jan 22, 2026
@wlwilliamx
Collaborator Author

/test all

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a retry mechanism for SetChangefeedProgress when an etcd Compare-And-Swap (CAS) operation fails due to concurrent updates. This significantly improves the robustness of admin APIs like remove and pause by preventing ErrMetaOpFailed flakiness. A dedicated unit test has been added to cover the CAS conflict and successful retry scenario, which is excellent. The retry logic correctly incorporates a maximum number of attempts and a delay with context cancellation. Overall, this is a valuable improvement for the stability of the system.
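
For readers unfamiliar with the "delay with context cancellation" pattern mentioned in the review, a minimal sketch follows. `waitOrCancel` and the two demo functions are hypothetical names, and the actual helper used in the PR may differ.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitOrCancel blocks for d, or returns ctx.Err() as soon as ctx is done.
func waitOrCancel(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop() // release the timer if the context wins
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}

// demoCancelled shows that an already-cancelled context wins over a long delay.
func demoCancelled() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return waitOrCancel(ctx, time.Hour)
}

// demoElapsed shows the normal path, where the delay simply elapses.
func demoElapsed() error {
	return waitOrCancel(context.Background(), time.Millisecond)
}

func main() {
	fmt.Println(demoCancelled()) // context canceled
	fmt.Println(demoElapsed())   // <nil>
}
```

Selecting on `ctx.Done()` between attempts lets a caller that has given up (for example, an API request that timed out) stop the retry loop without waiting out the remaining delay.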

@wlwilliamx
Collaborator Author

/retest

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm (Indicates a PR needs 1 more LGTM.) and approved labels Jan 29, 2026
@ti-chi-bot ti-chi-bot bot added the lgtm label Jan 30, 2026
@ti-chi-bot

ti-chi-bot bot commented Jan 30, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenfyzhong, wk989898

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [tenfyzhong,wk989898]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm label (Indicates a PR needs 1 more LGTM.) Jan 30, 2026
@ti-chi-bot

ti-chi-bot bot commented Jan 30, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-01-29 04:26:26.042716923 +0000 UTC: ☑️ agreed by wk989898.
  • 2026-01-30 10:55:53.953213423 +0000 UTC: ☑️ agreed by tenfyzhong.

@wlwilliamx wlwilliamx merged commit 43ffaf5 into pingcap:master Jan 30, 2026
26 of 29 checks passed
Labels

  • approved
  • lgtm
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Changefeed remove/pause can fail due to etcd CAS conflict when persisting progress

3 participants