Update Zodan for v5_STABLE #335

mason-sharp · 2026-02-08T04:55:29Z

Cherry-pick Zodan commits into v5_STABLE.

coderabbitai · 2026-02-08T04:55:36Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

🔍 Trigger a full review

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch task/v5-zodan-updates

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

A psql script can't rely on a PGDATABASE value. Using the default value also looks unreliable. Hence, let's require dbname and throw an exception if it is not mentioned in the DSN string. Also, centralise manipulations with the dbname string to ease further potential changes.

A node is added under high load of 'blind' UPDATE transactions. These transactions designed to update separate parts of the table on different nodes to avoid any conflict issues during the test.

Now, each apply progress HTAB entry contains tangled values of remote_commit_lsn and remote_commit_ts columns (see handle_commit). So, when extracting data from the progress view, we can use any of these numbers, and using LSN streamlines the check_commit_timestamp_and_advance_slot procedure: we can directly point to the correct WAL record, avoiding the expensive get_lsn_from_commit_ts() call. So, replace the expensive search for the proper LSN with direct picking of a specific WAL record. TODO: Further commits should also change the Python version. And consequently, remove the function get_lsn_from_commit_ts() from the extension altogether.

Add health check from zodan.py to zodan.sql Add verification from zodan.sql to zodan.py

Remove the following unused routines: - get_commit_timestamp - advance_replication_slot - monitor_replication_lag(text, boolean) - monitor_replication_lag_wait - monitor_multiple_replication_lags - wait_for_n3_sync - check_node_lag In the procedure monitor_lag_with_dblink, use the function clock_timestamp instead of now(). The now() function is stable and doesn't change the value within the monitoring loop. So, the volatile clock_timestamp looks more correct to obtain fresh value of time.

When adding a third node in a Z0DAN sync scenario, the origin advancement logic in spock.check_commit_timestamp_and_advance_slot was using spock.lag_tracker to retrieve commit timestamps and convert them back to LSNs. This approach no longer works because spock.progress is now an in-memory HTAB instead of a catalog table, so lag_tracker doesn't retain historical data after the SYNC process COPY operation. Root Cause: The procedure spock.create_disable_subscriptions_and_slots creates logical slots on existing nodes (e.g., n2) when adding a new node (n3). In v5, the commit LSN/timestamp from the source node (n1) was copied to n3 via lag_tracker during SYNC, and spock.check_commit_timestamp_and_advance_slot would use this to advance the origin. With the HTAB-based progress table, this data is no longer available after COPY. The Fix: 1. Capture the LSN returned by pg_create_logical_replication_slot in create_disable_subscriptions_and_slots and store it in temp_sync_lsns 2. Use this LSN directly in check_commit_timestamp_and_advance_slot to advance the origin, eliminating the dependency on lag_tracker and the timestamp→LSN conversion This approach is more reliable because it uses the authoritative LSN from slot creation - the exact point where the apply/sync process will begin decoding when the subscription is enabled.

Apply the same fix from commit 86acd7b to zodan.py: - Use LSN from pg_create_logical_replication_slot() - Advance slot to stored commit LSN instead of querying lag_tracker - Store both sync_lsn and commit_lsn for later use

* zodan.sql: add subscriptions pre-check. With this commit, we require that each node of the Spock cluster has only enabled subscriptions. There is no algorithmic reason to do that, because when adding the node, we are concerned only about the donor's state and the consistency of its progress table. However, such a strict procedure ensures that users can be sure they add a node to a healthy cluster, take action beforehand if it is not healthy, and that replication doesn't delay WAL cutting. Introduce a TAP test that could be used to check Z0DAN checks. XXX: add_node finishes its job with subscriptions in the 'initialising' state. Do we need to take any action here? Also, fix the annoying WARNING on the 'exception_replay_queue_size'. * Check subscription state in the wait_for_sync_event. If no group's subscription is enabled, wait_for_sync_event may get stuck in an infinite loop. Add a check of the subscription state in the waiting loop to reflect the fact that an apply worker may quietly die, deactivating its subscription. Also, add the correct processing of this behaviour to the Z0DAN and include a TAP test. Original cherry-pick fbe0cbc, modified to move sql file changes to sql/spock--5.0.4--5.0.5.sql

…ilt-in routine spock.spock_gen_slot_name().

May currently timeout, we have coverage with 090 and 010.

Do later separately

Unrelated Zodan change had been slipped in.

rasifr · 2026-02-09T10:11:51Z

samples/Z0DAN/zodan.py

            # Trigger sync event on new node
            sql = "SELECT spock.sync_event();"


Just noting here to update the comment later. The comment should be for source node.

mason-sharp requested a review from rasifr February 8, 2026 05:31

danolivo and others added 14 commits February 8, 2026 10:23

Block zodan if lolor is installed on new node (#245)

7708ba9

SPOC-307: Add lolor check to zodan.py health-check (#260)

fe18e96

Plain Z0DAN SYNC test for the 2+1 node configuration.

7e199ba

A node is added under high load of 'blind' UPDATE transactions. These transactions designed to update separate parts of the table on different nodes to avoid any conflict issues during the test.

Bug with not using source node to wait on sync (#281)

4778b53

Harmonize zodan.sql and zodan.py

6755a01

Add health check from zodan.py to zodan.sql Add verification from zodan.sql to zodan.py

Port Z0DAN sync LSN fix to Python script

ef7384b

Apply the same fix from commit 86acd7b to zodan.py: - Use LSN from pg_create_logical_replication_slot() - Advance slot to stored commit LSN instead of querying lag_tracker - Store both sync_lsn and commit_lsn for later use

Bugfix: Z0DAN should unify approach to build slot names by calling bu…

9eba089

…ilt-in routine spock.spock_gen_slot_name().

Comment out 012 in schedule

f511f11

May currently timeout, we have coverage with 090 and 010.

Remove sql/spock--5.0.4--5.0.5.sql

ce98ec9

Do later separately

mason-sharp force-pushed the task/v5-zodan-updates branch from e4b9965 to ce98ec9 Compare February 8, 2026 18:23

Avoid changing C files for 5.0.5

41681aa

Unrelated Zodan change had been slipped in.

rasifr reviewed Feb 9, 2026

View reviewed changes

rasifr approved these changes Feb 9, 2026

View reviewed changes

rasifr merged commit 1e9e9a1 into v5_STABLE Feb 9, 2026
6 checks passed

rasifr deleted the task/v5-zodan-updates branch February 9, 2026 12:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update Zodan for v5_STABLE #335

Update Zodan for v5_STABLE #335

Uh oh!

mason-sharp commented Feb 8, 2026

Uh oh!

coderabbitai bot commented Feb 8, 2026

Review skipped

Uh oh!

rasifr Feb 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		# Trigger sync event on new node
		sql = "SELECT spock.sync_event();"

Update Zodan for v5_STABLE #335

Update Zodan for v5_STABLE #335

Uh oh!

Conversation

mason-sharp commented Feb 8, 2026

Uh oh!

coderabbitai bot commented Feb 8, 2026

Review skipped

Uh oh!

rasifr Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants