Fix forward origin advance for cascade replication #328

rasifr · 2026-02-03T13:16:43Z

When node B forwards transactions from node A to node C, the origin
tracking for the original source (A) was not being properly maintained
on C. This was disabled in commit f60484e because Spock node.id was
sent instead of PostgreSQL RepOriginId in the ORIGIN protocol message.

This fix:
- Extends the ORIGIN protocol message to include the origin name using
  a flag bit (ORIGIN_FLAG_HAS_NAME) for backward compatibility
- Uses the slot name format (spk_<db>_<provider>_<subscriber>) to
  lookup/create the forwarded origin, matching the naming convention
  used for direct subscriptions
- Caches the remote_origin_id -> local_origin_id mapping since the
  Spock node.id is consistent across the cluster
- Extracts the logic into maybe_advance_forwarded_origin() function

The fix enables proper cascade replication topology (A -> B -> C) with
forward_origins='all' where C can track the original commit LSN from A.

mason-sharp · 2026-02-03T18:33:11Z

@coderabbitai full review

coderabbitai · 2026-02-03T18:33:21Z

✅ Actions performed

Full review triggered.

coderabbitai · 2026-02-03T18:33:30Z

📝 Walkthrough

Walkthrough

Adds optional origin names to the replication protocol and output API, exposes the slot-name generator, advances forwarded origins with caching in the apply path, changes get_node to accept a missing_ok flag, and adds a TAP test for forward-origin advancement.

Changes

Cohort / File(s)	Summary
Public headers `include/spock.h`, `include/spock_output_proto.h`, `include/spock_proto_native.h`, `include/spock_node.h`	Declared `gen_slot_name` in the public header; updated `spock_write_origin_fn` to accept `origin_name`; added `ORIGIN_FLAG_HAS_NAME`; updated `spock_write_origin`/`spock_read_origin` signatures; changed `get_node` declaration to `get_node(Oid, bool missing_ok)`.
Protocol implementation `src/spock_proto_native.c`, `include/spock_proto_native.h`	Origin messages optionally carry `origin_name` guarded by `ORIGIN_FLAG_HAS_NAME`; `spock_write_origin`/`spock_read_origin` signatures updated and name memory handling added.
Output plugin / API usage `src/spock_output_plugin.c`, `include/spock_output_proto.h`	`pg_decode_begin_txn` looks up and passes `origin_name`; `write_origin` callback now invoked as `write_origin(..., origin_name)`.
Apply/forwarding logic `src/spock_apply.c`	Added `remote_origin_name`, cached forward-origin IDs, `maybe_advance_forwarded_origin(...)`; integrated forwarded-origin advancement into commit flow and improved origin-name memory lifecycle.
Slot-name function export `src/spock_functions.c`, `include/spock.h`	`gen_slot_name` changed from `static` to public (`extern`) and declared in header.
Node lookup behavior & callers `src/spock_node.c`, `src/spock_conflict.c`	`get_node` implementation updated to `get_node(Oid, bool missing_ok)` with callers adjusted; function may return NULL when `missing_ok` is true.
Tests `tests/tap/schedule`, `tests/tap/t/015_forward_origin_advance.pl`	Added schedule entry and a new TAP test exercising 3-node cascade forward-origin creation, slot-name format checks, LSN advancement, and gap detection.

Poem

🐇 I hop the WAL, I carry a name,

spk____{— a tiny claim.

Flags whisper whether the name is known,

cached hops remember where I've flown.

A slot, a name, a forward-bound song,

Little rabbit, forwarding along.}

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Fix forward origin advance for cascade replication' clearly and specifically describes the main change: enabling proper origin tracking in cascade replication scenarios, matching the core objective of the PR.
Description check	✅ Passed	The description is well-related to the changeset, explaining the problem, the technical solution approach (protocol extension, slot naming, caching), and the expected outcome with cascade replication.
Docstring Coverage	✅ Passed	Docstring coverage is 86.67% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch task/SPOC-402/forwarded_origin_v2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@src/spock_output_plugin.c`:
- Around line 619-629: The current code calls get_node(txn->origin_id) and then
checks for NULL, but get_node never returns NULL and will ERROR on a bad id;
change the logic to avoid calling get_node with arbitrary txn->origin_id: first
verify that txn->origin_id maps to a valid Spock node using a safe lookup (or a
new helper like node_exists_for_reporigin/lookup_node_by_reporigin that returns
NULL on miss), and only call get_node (or retrieve SpockNode->name) when that
verification succeeds; if no mapping is found, set origin_name = NULL and call
data->api->write_origin(ctx->out, txn->origin_id, txn->origin_lsn, origin_name)
so forwarded transactions don’t crash the process.

In `@tests/tap/t/015_forward_origin_advance.pl`:
- Line 25: The test file currently declares an explicit plan with "use
Test::More tests => 22;" but also calls done_testing(), causing a plan vs
done_testing conflict; remove the explicit plan and change the import to simply
"use Test::More;" (or otherwise eliminate the "tests => 22" part) so the file
relies on done_testing() at the end (refer to the "use Test::More tests => 22;"
line and the done_testing() call).

src/spock_output_plugin.c

coderabbitai · 2026-02-03T18:43:25Z

tests/tap/t/015_forward_origin_advance.pl

+
+use strict;
+use warnings;
+use Test::More tests => 22;


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's read the actual file to see its current state head -30 tests/tap/t/015_forward_origin_advance.pl

Repository: pgEdge/spock

Length of output: 1366

🏁 Script executed:

# Check the file size to understand how much to read wc -l tests/tap/t/015_forward_origin_advance.pl

Repository: pgEdge/spock

Length of output: 101

🏁 Script executed:

# Look for all test-related calls grep -n "ok\|is\|done_testing\|Test::More" tests/tap/t/015_forward_origin_advance.pl | head -50

Repository: pgEdge/spock

Length of output: 2101

🌐 Web query:

Test::More explicit plan tests => N with done_testing() conflict

💡 Result:

In Test::More/Test::Builder you may only declare a plan once per test process. If you do:

use Test::More tests => N; # (or: plan tests => N) ... done_testing(); # <- second plan attempt / possible mismatch

you’ll get a “plan twice”/planning conflict because done_testing() will emit a plan only if one hasn’t already been set, and if you pass a number it will be checked against any existing numbered plan (and can fail if it contradicts). [1][2]

Fix options:

Use an explicit plan (recommended when you know N): keep use Test::More tests => N; (or plan tests => N;) and remove done_testing(). [1][2]

Use done_testing as the plan (when N is unknown or computed late): do use Test::More; (no tests => N), run tests, then call done_testing(); (or done_testing($N) if you want to assert the expected count). [1][2]

Sources: [1] (Test::More docs), [2] (Test::Builder docs).

🏁 Script executed:

# Count all test assertion calls comprehensively grep -cE "^\s*(ok|is|isnt|like|unlike|cmp_ok|can_ok|isa_ok|new_ok|subtest|pass|fail)" tests/tap/t/015_forward_origin_advance.pl

Repository: pgEdge/spock

Length of output: 58

🏁 Script executed:

# Get a full list with line numbers of all test assertions grep -nE "^\s*(ok|is|isnt|like|unlike|cmp_ok|can_ok|isa_ok|new_ok|subtest|pass|fail)" tests/tap/t/015_forward_origin_advance.pl

Repository: pgEdge/spock

Length of output: 1003

Fix the Test::More plan vs done_testing conflict.

Line 25 sets an explicit plan (22 tests) while Line 184 calls done_testing(). Test::More cannot have both an explicit plan and done_testing() in the same test file—this creates a TAP validation failure. The file contains only 14 actual test assertions, so the plan count is incorrect. Remove the explicit plan to use done_testing() instead.

🛠️ Suggested fix

-use Test::More tests => 22; +use Test::More;

Also applies to: 184-184

🤖 Prompt for AI Agents

In `@tests/tap/t/015_forward_origin_advance.pl` at line 25, The test file currently declares an explicit plan with "use Test::More tests => 22;" but also calls done_testing(), causing a plan vs done_testing conflict; remove the explicit plan and change the import to simply "use Test::More;" (or otherwise eliminate the "tests => 22" part) so the file relies on done_testing() at the end (refer to the "use Test::More tests => 22;" line and the done_testing() call).

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@src/spock_apply.c`:
- Around line 3945-4028: maybe_advance_forwarded_origin can throw after the main
transaction commits which causes apply_work to catch and replay an
already-committed transaction; wrap the body that does
replorigin_by_name/replorigin_create/replorigin_advance (the parts that call
StartTransactionCommand/CommitTransactionCommand and the calls to
replorigin_create and replorigin_advance) in a PG_TRY/PG_CATCH block so any
error is caught locally, log the error with elog (including context like
MySubscription->name and remote_origin_name), ensure any open transaction is
cleaned up in the PG_CATCH (call AbortCurrentTransaction or appropriate abort
helper if a transaction was started) and restore MemoryContext to
MessageContext, and do not rethrow the error so the apply loop is not disrupted.

🧹 Nitpick comments (1)

src/spock_apply.c (1)

3960-3975: Single-entry cache may thrash with multiple forwarded origins.

If node C receives forwarded transactions from multiple origin nodes (e.g., A and D both forwarded through B), each origin switch causes a cache miss with a full transaction + origin lookup. Consider using a small hash table if multi-origin cascade topologies are expected.

This is fine for the initial implementation if the typical topology involves only one forwarded origin.

src/spock_apply.c

This test validates that when node B forwards transactions from node A to node C (with forward_origins='all'), node C properly creates and advances the replication origin for the original source (node A). The test: - Sets up a 3-node cascade topology: A -> B -> C - Creates data on node A that flows through B to C - Verifies the forwarded origin exists on C using slot name format - Checks that the origin LSN is properly advanced Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@include/spock_proto_native.h`:
- Around line 41-42: spock_write_origin and spock_read_origin currently
unconditionally write/read the flags byte causing protocol deserialization
errors for protocol 4; update both functions to gate emitting/consuming the
flags byte behind the protocol check by calling spock_get_proto_version() and
only writing/reading the flags when spock_get_proto_version() >= 5 (or
alternatively bump SPOCK_PROTO_MIN_VERSION_NUM if flags are required for all
supported versions), ensuring the flags handling is colocated in the functions
and tested against the protocol version.

🧹 Nitpick comments (3)

include/spock.h (1)
58-60: Parameter names differ between declaration and definition.

The declaration uses provider_name / subscriber_name, but the definition in src/spock_functions.c (line 2878) uses provider_node / subscription_name. While this compiles fine in C, it creates unnecessary confusion for anyone reading the header.
♻️ Suggested fix: align names with the definition
-extern void gen_slot_name(Name slot_name, char *dbname,
-						  const char *provider_name,
-						  const char *subscriber_name);
+extern void gen_slot_name(Name slot_name, char *dbname,
+						  const char *provider_node,
+						  const char *subscription_name);
src/spock_functions.c (1)

194-196: Redundant extern forward declaration.

This forward declaration at line 194 is redundant since gen_slot_name is already declared extern in include/spock.h (line 58), which is included at line 101 of this file. Consider removing the local forward declaration to avoid maintaining duplicate signatures.
src/spock_apply.c (1)
4001-4013: Consider using NoLock in table_close to hold the lock until transaction commit.

table_close(replorigin_rel, RowExclusiveLock) releases the lock immediately. Between the lock release (line 4013) and CommitTransactionCommand() (line 4014), a concurrent session could attempt to create the same origin name. Using NoLock would hold the lock until the transaction commits, closing this TOCTOU window.
Suggested fix
-		table_close(replorigin_rel, RowExclusiveLock);
+		table_close(replorigin_rel, NoLock);

include/spock_proto_native.h

ibrarahmad

The PR looks ok to me.

1 - I have not done any testing, just reviewed the code.
2 - Please take a look at the comment made by @coderabbitai.

When forwarding changes in cascade replication, include the origin name in ORIGIN messages so downstream subscribers can track the true origin. Changes: - Add origin_name parameter to spock_write_origin() and spock_read_origin() - Use ORIGIN_FLAG_HAS_NAME flag (0x01) to indicate presence of origin_name - Flags byte is always sent/read (backward compatible with protocol v4) - Origin name only included when protocol >= 5 and name is provided - Pass origin_name through handle_origin() to the output plugin Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

rasifr force-pushed the task/SPOC-402/forwarded_origin_v2 branch from ae2acce to ffb3e75 Compare February 3, 2026 14:35

coderabbitai bot reviewed Feb 3, 2026

View reviewed changes

mason-sharp requested a review from ibrarahmad February 4, 2026 05:13

rasifr force-pushed the task/SPOC-402/forwarded_origin_v2 branch from ffb3e75 to b0097ae Compare February 4, 2026 05:44

coderabbitai bot reviewed Feb 6, 2026

View reviewed changes

src/spock_apply.c Show resolved Hide resolved

rasifr force-pushed the task/SPOC-402/forwarded_origin_v2 branch from 8b9e4f7 to b10ff0f Compare February 6, 2026 06:37

coderabbitai bot reviewed Feb 6, 2026

View reviewed changes

include/spock_proto_native.h Show resolved Hide resolved

ibrarahmad approved these changes Feb 9, 2026

View reviewed changes

rasifr force-pushed the task/SPOC-402/forwarded_origin_v2 branch from b10ff0f to b6e6ddb Compare February 9, 2026 14:40

mason-sharp merged commit 1c3ca20 into main Feb 10, 2026
15 of 17 checks passed

mason-sharp deleted the task/SPOC-402/forwarded_origin_v2 branch February 10, 2026 17:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix forward origin advance for cascade replication #328

Fix forward origin advance for cascade replication #328

Uh oh!

rasifr commented Feb 3, 2026

Uh oh!

mason-sharp commented Feb 3, 2026

Uh oh!

coderabbitai bot commented Feb 3, 2026

Uh oh!

coderabbitai bot commented Feb 3, 2026 •

edited

Loading

Walkthrough

Changes

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot Feb 3, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

ibrarahmad left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix forward origin advance for cascade replication #328

Fix forward origin advance for cascade replication #328

Uh oh!

Conversation

rasifr commented Feb 3, 2026

Uh oh!

mason-sharp commented Feb 3, 2026

Uh oh!

coderabbitai bot commented Feb 3, 2026

Uh oh!

coderabbitai bot commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ibrarahmad left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coderabbitai bot commented Feb 3, 2026 •

edited

Loading