-
Notifications
You must be signed in to change notification settings - Fork 45
Fix forward origin advance for cascade replication #328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
rasifr
commented
Feb 3, 2026
ae2acce to
ffb3e75
Compare
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughAdds optional origin names to the replication protocol and output API, exposes the slot-name generator, advances forwarded origins with caching in the apply path, changes get_node to accept a missing_ok flag, and adds a TAP test for forward-origin advancement. Changes
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/spock_output_plugin.c`:
- Around line 619-629: The current code calls get_node(txn->origin_id) and then
checks for NULL, but get_node never returns NULL and will ERROR on a bad id;
change the logic to avoid calling get_node with arbitrary txn->origin_id: first
verify that txn->origin_id maps to a valid Spock node using a safe lookup (or a
new helper like node_exists_for_reporigin/lookup_node_by_reporigin that returns
NULL on miss), and only call get_node (or retrieve SpockNode->name) when that
verification succeeds; if no mapping is found, set origin_name = NULL and call
data->api->write_origin(ctx->out, txn->origin_id, txn->origin_lsn, origin_name)
so forwarded transactions don’t crash the process.
In `@tests/tap/t/015_forward_origin_advance.pl`:
- Line 25: The test file currently declares an explicit plan with "use
Test::More tests => 22;" but also calls done_testing(), causing a plan vs
done_testing conflict; remove the explicit plan and change the import to simply
"use Test::More;" (or otherwise eliminate the "tests => 22" part) so the file
relies on done_testing() at the end (refer to the "use Test::More tests => 22;"
line and the done_testing() call).
|
|
||
| use strict; | ||
| use warnings; | ||
| use Test::More tests => 22; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's read the actual file to see its current state
head -30 tests/tap/t/015_forward_origin_advance.plRepository: pgEdge/spock
Length of output: 1366
🏁 Script executed:
# Check the file size to understand how much to read
wc -l tests/tap/t/015_forward_origin_advance.plRepository: pgEdge/spock
Length of output: 101
🏁 Script executed:
# Look for all test-related calls
grep -n "ok\|is\|done_testing\|Test::More" tests/tap/t/015_forward_origin_advance.pl | head -50Repository: pgEdge/spock
Length of output: 2101
🌐 Web query:
Test::More explicit plan tests => N with done_testing() conflict
💡 Result:
In Test::More/Test::Builder you may only declare a plan once per test process. If you do:
use Test::More tests => N; # (or: plan tests => N)
...
done_testing(); # <- second plan attempt / possible mismatchyou’ll get a “plan twice”/planning conflict because done_testing() will emit a plan only if one hasn’t already been set, and if you pass a number it will be checked against any existing numbered plan (and can fail if it contradicts). [1][2]
Fix options:
- Use an explicit plan (recommended when you know N): keep
use Test::More tests => N;(orplan tests => N;) and removedone_testing(). [1][2] - Use
done_testingas the plan (when N is unknown or computed late): douse Test::More;(notests => N), run tests, then calldone_testing();(ordone_testing($N)if you want to assert the expected count). [1][2]
Sources: [1] (Test::More docs), [2] (Test::Builder docs).
🏁 Script executed:
# Count all test assertion calls comprehensively
grep -cE "^\s*(ok|is|isnt|like|unlike|cmp_ok|can_ok|isa_ok|new_ok|subtest|pass|fail)" tests/tap/t/015_forward_origin_advance.plRepository: pgEdge/spock
Length of output: 58
🏁 Script executed:
# Get a full list with line numbers of all test assertions
grep -nE "^\s*(ok|is|isnt|like|unlike|cmp_ok|can_ok|isa_ok|new_ok|subtest|pass|fail)" tests/tap/t/015_forward_origin_advance.plRepository: pgEdge/spock
Length of output: 1003
Fix the Test::More plan vs done_testing conflict.
Line 25 sets an explicit plan (22 tests) while Line 184 calls done_testing(). Test::More cannot have both an explicit plan and done_testing() in the same test file—this creates a TAP validation failure. The file contains only 14 actual test assertions, so the plan count is incorrect. Remove the explicit plan to use done_testing() instead.
🛠️ Suggested fix
-use Test::More tests => 22;
+use Test::More;Also applies to: 184-184
🤖 Prompt for AI Agents
In `@tests/tap/t/015_forward_origin_advance.pl` at line 25, The test file
currently declares an explicit plan with "use Test::More tests => 22;" but also
calls done_testing(), causing a plan vs done_testing conflict; remove the
explicit plan and change the import to simply "use Test::More;" (or otherwise
eliminate the "tests => 22" part) so the file relies on done_testing() at the
end (refer to the "use Test::More tests => 22;" line and the done_testing()
call).
ffb3e75 to
b0097ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/spock_apply.c`:
- Around line 3945-4028: maybe_advance_forwarded_origin can throw after the main
transaction commits which causes apply_work to catch and replay an
already-committed transaction; wrap the body that does
replorigin_by_name/replorigin_create/replorigin_advance (the parts that call
StartTransactionCommand/CommitTransactionCommand and the calls to
replorigin_create and replorigin_advance) in a PG_TRY/PG_CATCH block so any
error is caught locally, log the error with elog (including context like
MySubscription->name and remote_origin_name), ensure any open transaction is
cleaned up in the PG_CATCH (call AbortCurrentTransaction or appropriate abort
helper if a transaction was started) and restore MemoryContext to
MessageContext, and do not rethrow the error so the apply loop is not disrupted.
🧹 Nitpick comments (1)
src/spock_apply.c (1)
3960-3975: Single-entry cache may thrash with multiple forwarded origins.If node C receives forwarded transactions from multiple origin nodes (e.g., A and D both forwarded through B), each origin switch causes a cache miss with a full transaction + origin lookup. Consider using a small hash table if multi-origin cascade topologies are expected.
This is fine for the initial implementation if the typical topology involves only one forwarded origin.
This test validates that when node B forwards transactions from node A to node C (with forward_origins='all'), node C properly creates and advances the replication origin for the original source (node A). The test: - Sets up a 3-node cascade topology: A -> B -> C - Creates data on node A that flows through B to C - Verifies the forwarded origin exists on C using slot name format - Checks that the origin LSN is properly advanced Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
8b9e4f7 to
b10ff0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@include/spock_proto_native.h`:
- Around line 41-42: spock_write_origin and spock_read_origin currently
unconditionally write/read the flags byte causing protocol deserialization
errors for protocol 4; update both functions to gate emitting/consuming the
flags byte behind the protocol check by calling spock_get_proto_version() and
only writing/reading the flags when spock_get_proto_version() >= 5 (or
alternatively bump SPOCK_PROTO_MIN_VERSION_NUM if flags are required for all
supported versions), ensuring the flags handling is colocated in the functions
and tested against the protocol version.
🧹 Nitpick comments (3)
include/spock.h (1)
58-60: Parameter names differ between declaration and definition.The declaration uses
provider_name/subscriber_name, but the definition insrc/spock_functions.c(line 2878) usesprovider_node/subscription_name. While this compiles fine in C, it creates unnecessary confusion for anyone reading the header.♻️ Suggested fix: align names with the definition
-extern void gen_slot_name(Name slot_name, char *dbname, - const char *provider_name, - const char *subscriber_name); +extern void gen_slot_name(Name slot_name, char *dbname, + const char *provider_node, + const char *subscription_name);src/spock_functions.c (1)
194-196: Redundantexternforward declaration.This forward declaration at line 194 is redundant since
gen_slot_nameis already declaredexternininclude/spock.h(line 58), which is included at line 101 of this file. Consider removing the local forward declaration to avoid maintaining duplicate signatures.src/spock_apply.c (1)
4001-4013: Consider usingNoLockintable_closeto hold the lock until transaction commit.
table_close(replorigin_rel, RowExclusiveLock)releases the lock immediately. Between the lock release (line 4013) andCommitTransactionCommand()(line 4014), a concurrent session could attempt to create the same origin name. UsingNoLockwould hold the lock until the transaction commits, closing this TOCTOU window.Suggested fix
- table_close(replorigin_rel, RowExclusiveLock); + table_close(replorigin_rel, NoLock);
ibrarahmad
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR looks ok to me.
1 - I have not done any testing, just reviewed the code.
2 - Please take a look at the comment made by @coderabbitai.
When forwarding changes in cascade replication, include the origin name in ORIGIN messages so downstream subscribers can track the true origin. Changes: - Add origin_name parameter to spock_write_origin() and spock_read_origin() - Use ORIGIN_FLAG_HAS_NAME flag (0x01) to indicate presence of origin_name - Flags byte is always sent/read (backward compatible with protocol v4) - Origin name only included when protocol >= 5 and name is provided - Pass origin_name through handle_origin() to the output plugin Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
b10ff0f to
b6e6ddb
Compare