Skip to content

Conversation

@pcholakov
Copy link
Contributor

This change adds a basic time interval configuration option for triggering automated snapshots. When both time and record-based intervals are set, the logic for the trigger condition is an "and" between the two, so the record interval becomes a minimum requirement for time-based snapshotting.

Time interval-based snapshotting will get a lot more useful with incremental snapshot support, but this is already a big improvement as it allows operators to configure snapshots in slightly more intuitive terms of elapsed time.

The time interval is measured from the created-at timestamp of the published snapshot, which is based on the wall clock time of the producing node. Clock drift error is assumed to be relatively negligible compared to typical snapshotting intervals. A small amount of jitter is added to each partition's snapshot age check to spread snapshot uploads out over time.

Configuration examples

Periodic + minimum number of records

[worker.snapshots]
destination = "s3://bucket/cluster-name"
snapshot-interval-num-records = 10000
snapshot-interval = "30 min"

Translates into: "snapshot every 30 minutes, as long as the applied LSN has advanced by least 10,000 records since the last snapshot".

Periodic (unconditional)

[worker.snapshots]
destination = "s3://bucket/cluster-name"
snapshot-interval = "24 hours"

Translates into: "snapshot once every 24 hours even if zero new changes are committed".

@github-actions
Copy link

github-actions bot commented Oct 21, 2025

Test Results

  7 files  ±0    7 suites  ±0   2m 37s ⏱️ +7s
 47 tests ±0   47 ✅ ±0  0 💤 ±0  0 ❌ ±0 
200 runs  ±0  200 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit ba47e39. ± Comparison against base commit dbb33cf.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating this PR @pcholakov. The changes look good to me. +1 for merging. Do we need to update our docs to also mention the snapshot_interval?

}

/// Retrieve the latest known LSN to be archived to the snapshot repository.
/// Response of `Ok(Lsn::INVALID)` indicates no existing snapshot for the partition.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe update the comment.

match self {
ArchivedLsn::None => Ok(Duration::MAX),
ArchivedLsn::Snapshot { created_at, .. } => {
SystemTime::now().duration_since(*created_at)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important that we distinguish between created_at > now and created_at == now? If not, then we might get rid fo the error case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, collapsed to zero - good call to remove the error!

@pcholakov
Copy link
Contributor Author

Thanks for taking a look, @tillrohrmann! Good points, addressed both. Docs will definitely need an update but there are a couple more PRs in this train and I'll take a pass all the changes are in.

@tillrohrmann tillrohrmann linked an issue Nov 17, 2025 that may be closed by this pull request
@pcholakov pcholakov merged commit 961169b into main Nov 17, 2025
26 checks passed
@pcholakov pcholakov deleted the pavel/vowpsqrwurmp branch November 17, 2025 20:23
@github-actions github-actions bot locked and limited conversation to collaborators Nov 17, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Time based snapshots

3 participants