Skip to content

Conversation

@aerfrei
Copy link
Owner

@aerfrei aerfrei commented Nov 3, 2025

No description provided.

KeithCh and others added 3 commits October 31, 2025 13:17
Add support for ALTER CHANGEFEED SET/UNSET INCLUDE/EXCLUDE TABLES.
The following syntax is supported:
ALTER CHANGEFEED 123 SET INCLUDE TABLES foo, bar
ALTER CHANGEFEED 123 SET EXCLUDE TABLES foo, bar
ALTER CHANGEFEED 123 UNSET INCLUDE TABLES
ALTER CHANGEFEED 123 UNSET EXCLUDE TABLES

Release note: none
Resolves: cockroachdb#147428
Doesn't address initial scan, do that in follow up
@blathers-crl
Copy link

blathers-crl bot commented Nov 3, 2025

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl
Copy link

blathers-crl bot commented Nov 3, 2025

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl
Copy link

blathers-crl bot commented Nov 3, 2025

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@aerfrei aerfrei force-pushed the alter-database-changefeed-syntax branch from 528bc77 to b4acc4c Compare November 4, 2025 22:34
aerfrei pushed a commit that referenced this pull request Nov 14, 2025
156830:  storeliveness: smear storeliveness heartbeat messages to reduce goroutine spikes at heartbeat interval tick r=miraradeva,iskettaneh a=dodeca12

This PR introduces heartbeat smearing logic that batches and smears Store Liveness heartbeat sends across destination nodes to prevent thundering herd of goroutine spikes.

### Changes

Core changes are within these files:

```sh
pkg/kv/kvserver/storeliveness
├── support_manager.go  # Rename SendAsync→EnqueueMessage, add smearing settings
└── transport.go        # Add smearing sender goroutine to transport.go which takes care of smearing when enabled
```

### Background

Previously, all stores in a cluster sent heartbeats immediately at each heartbeat interval tick. In large clusters with multi-store nodes, this created synchronized bursts of goroutine spikes that caused issues in other parts of the running CRDB process.

### Commits

**Commit: Introduce heartbeat smearing**

- Adds a smearing sender goroutine to `transport.go` that batches enqueued messages
- Smears send signals across queues using `taskpacer` to spread traffic over time
- Configurable via cluster settings (default: enabled)

**How it works:**

1. Messages are enqueued via `EnqueueMessage()` into per-node queues
2. When `SendAllEnqueuedMessages()` is called, transport's smearing sender goroutine waits briefly to batch messages
3. Transport's smearing sender goroutine uses `taskpacer` to pace signaling to each queue over a smear duration
4. Each `processQueue` goroutine drains its queue and sends when signalled

### New Cluster Settings

- `kv.store_liveness.heartbeat_smearing.enabled` (default: true) - Enable/disable smearing
- `kv.store_liveness.heartbeat_smearing.refresh` (default: 10ms) - Batching window duration
- `kv.store_liveness.heartbeat_smearing.smear` (default: 1ms) - Time to spread sends across queues

### Backward Compatibility

- Feature is disabled by setting `kv.store_liveness.heartbeat_smearing.enabled=false`
- When disabled, messages are sent immediately via the existing path (non-smearing mode)

### Testing

- Added comprehensive unit tests verifying:
  - Messages batch correctly across multiple destinations
  - Smearing spreads signals over the configured time window
  - Smearing mode vs immediate mode behaviour
  - Message ordering and reliability

All existing tests updated to call `SendAllEnqueuedMessages()` after enqueuing when smearing is enabled.

#### Roachprod testing

##### Prototype #1

- Ran a prototype with a [similar design](cockroachdb#154942) (TL;DR of prototype, the heartbeats were smeared via `SupportManager` goroutines being put to sleep; this current design ensures that `SupportManager` goroutines do not get blocked) on a roachprod with 150 node cluster to verify smearing works. 

| Before changes (current behaviour on master) | After changes (prototype) |
|--------|--------|
| <img width="2680" height="570" alt="image" src="https://github.com/user-attachments/assets/32fe6ee0-437f-48eb-b3f1-087a3eafe6ac" /> | <img width="2692" height="634" alt="image" src="https://github.com/user-attachments/assets/66b5b82b-bbc4-4f47-a13e-5f6d42a1c6d4" /> |

##### Current changes

- Ran a roachprod test with current changes but without the check for empty queues (more info - https://reviewable.io/reviews/cockroachdb/cockroach/156378#-). This check did end up proving vital, as the test results didn't show the expected smearing behaviour. 

- Ran a mini-roachprod test on [this prototype commit](https://github.com/cockroachdb/cockroach/pull/155317/files#diff-9282b4b1d9a2fe32fae81e5776eb081e58069b4bc7db76718873b75f026e16c1) (where the only real difference between my changes is the inclusion of the length check on the queues that have messages on that commit) showed expected smearing behaviour. 

<img width="1797" height="469" alt="image" src="https://github.com/user-attachments/assets/bd7778ef-9f8d-4dbf-8ed2-dac40e7fb03c" />

Fixes: cockroachdb#148210

Release note: None


Co-authored-by: Swapneeth Gorantla <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants