Skip to content

Conversation

@rasifr
Copy link
Member

@rasifr rasifr commented Feb 3, 2026

When node B forwards transactions from node A to node C, the origin
tracking for the original source (A) was not being properly maintained
on C. This was disabled in commit f60484e because Spock node.id was
sent instead of PostgreSQL RepOriginId in the ORIGIN protocol message.

This fix:
- Extends the ORIGIN protocol message to include the origin name using
  a flag bit (ORIGIN_FLAG_HAS_NAME) for backward compatibility
- Uses the slot name format (spk_<db>_<provider>_<subscriber>) to
  lookup/create the forwarded origin, matching the naming convention
  used for direct subscriptions
- Caches the remote_origin_id -> local_origin_id mapping since the
  Spock node.id is consistent across the cluster
- Extracts the logic into maybe_advance_forwarded_origin() function

The fix enables proper cascade replication topology (A -> B -> C) with
forward_origins='all' where C can track the original commit LSN from A.

@rasifr rasifr force-pushed the task/SPOC-402/forwarded_origin_v2 branch from ae2acce to ffb3e75 Compare February 3, 2026 14:35
@mason-sharp
Copy link
Member

@coderabbitai full review

@coderabbitai
Copy link

coderabbitai bot commented Feb 3, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

Walkthrough

Adds optional origin names to the replication protocol and output API, exposes the slot-name generator, advances forwarded origins with caching in the apply path, changes get_node to accept a missing_ok flag, and adds a TAP test for forward-origin advancement.

Changes

Cohort / File(s) Summary
Public headers
include/spock.h, include/spock_output_proto.h, include/spock_proto_native.h, include/spock_node.h
Declared gen_slot_name in the public header; updated spock_write_origin_fn to accept origin_name; added ORIGIN_FLAG_HAS_NAME; updated spock_write_origin/spock_read_origin signatures; changed get_node declaration to get_node(Oid, bool missing_ok).
Protocol implementation
src/spock_proto_native.c, include/spock_proto_native.h
Origin messages optionally carry origin_name guarded by ORIGIN_FLAG_HAS_NAME; spock_write_origin/spock_read_origin signatures updated and name memory handling added.
Output plugin / API usage
src/spock_output_plugin.c, include/spock_output_proto.h
pg_decode_begin_txn looks up and passes origin_name; write_origin callback now invoked as write_origin(..., origin_name).
Apply/forwarding logic
src/spock_apply.c
Added remote_origin_name, cached forward-origin IDs, maybe_advance_forwarded_origin(...); integrated forwarded-origin advancement into commit flow and improved origin-name memory lifecycle.
Slot-name function export
src/spock_functions.c, include/spock.h
gen_slot_name changed from static to public (extern) and declared in header.
Node lookup behavior & callers
src/spock_node.c, src/spock_conflict.c
get_node implementation updated to get_node(Oid, bool missing_ok) with callers adjusted; function may return NULL when missing_ok is true.
Tests
tests/tap/schedule, tests/tap/t/015_forward_origin_advance.pl
Added schedule entry and a new TAP test exercising 3-node cascade forward-origin creation, slot-name format checks, LSN advancement, and gap detection.

Poem

🐇 I hop the WAL, I carry a name,

spk___ — a tiny claim.
Flags whisper whether the name is known,
cached hops remember where I've flown.
A slot, a name, a forward-bound song,
Little rabbit, forwarding along.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix forward origin advance for cascade replication' clearly and specifically describes the main change: enabling proper origin tracking in cascade replication scenarios, matching the core objective of the PR.
Description check ✅ Passed The description is well-related to the changeset, explaining the problem, the technical solution approach (protocol extension, slot naming, caching), and the expected outcome with cascade replication.
Docstring Coverage ✅ Passed Docstring coverage is 86.67% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch task/SPOC-402/forwarded_origin_v2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/spock_output_plugin.c`:
- Around line 619-629: The current code calls get_node(txn->origin_id) and then
checks for NULL, but get_node never returns NULL and will ERROR on a bad id;
change the logic to avoid calling get_node with arbitrary txn->origin_id: first
verify that txn->origin_id maps to a valid Spock node using a safe lookup (or a
new helper like node_exists_for_reporigin/lookup_node_by_reporigin that returns
NULL on miss), and only call get_node (or retrieve SpockNode->name) when that
verification succeeds; if no mapping is found, set origin_name = NULL and call
data->api->write_origin(ctx->out, txn->origin_id, txn->origin_lsn, origin_name)
so forwarded transactions don’t crash the process.

In `@tests/tap/t/015_forward_origin_advance.pl`:
- Line 25: The test file currently declares an explicit plan with "use
Test::More tests => 22;" but also calls done_testing(), causing a plan vs
done_testing conflict; remove the explicit plan and change the import to simply
"use Test::More;" (or otherwise eliminate the "tests => 22" part) so the file
relies on done_testing() at the end (refer to the "use Test::More tests => 22;"
line and the done_testing() call).


use strict;
use warnings;
use Test::More tests => 22;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's read the actual file to see its current state
head -30 tests/tap/t/015_forward_origin_advance.pl

Repository: pgEdge/spock

Length of output: 1366


🏁 Script executed:

# Check the file size to understand how much to read
wc -l tests/tap/t/015_forward_origin_advance.pl

Repository: pgEdge/spock

Length of output: 101


🏁 Script executed:

# Look for all test-related calls
grep -n "ok\|is\|done_testing\|Test::More" tests/tap/t/015_forward_origin_advance.pl | head -50

Repository: pgEdge/spock

Length of output: 2101


🌐 Web query:

Test::More explicit plan tests => N with done_testing() conflict

💡 Result:

In Test::More/Test::Builder you may only declare a plan once per test process. If you do:

use Test::More tests => N;   # (or: plan tests => N)
...
done_testing();              # <- second plan attempt / possible mismatch

you’ll get a “plan twice”/planning conflict because done_testing() will emit a plan only if one hasn’t already been set, and if you pass a number it will be checked against any existing numbered plan (and can fail if it contradicts). [1][2]

Fix options:

  • Use an explicit plan (recommended when you know N): keep use Test::More tests => N; (or plan tests => N;) and remove done_testing(). [1][2]
  • Use done_testing as the plan (when N is unknown or computed late): do use Test::More; (no tests => N), run tests, then call done_testing(); (or done_testing($N) if you want to assert the expected count). [1][2]

Sources: [1] (Test::More docs), [2] (Test::Builder docs).


🏁 Script executed:

# Count all test assertion calls comprehensively
grep -cE "^\s*(ok|is|isnt|like|unlike|cmp_ok|can_ok|isa_ok|new_ok|subtest|pass|fail)" tests/tap/t/015_forward_origin_advance.pl

Repository: pgEdge/spock

Length of output: 58


🏁 Script executed:

# Get a full list with line numbers of all test assertions
grep -nE "^\s*(ok|is|isnt|like|unlike|cmp_ok|can_ok|isa_ok|new_ok|subtest|pass|fail)" tests/tap/t/015_forward_origin_advance.pl

Repository: pgEdge/spock

Length of output: 1003


Fix the Test::More plan vs done_testing conflict.

Line 25 sets an explicit plan (22 tests) while Line 184 calls done_testing(). Test::More cannot have both an explicit plan and done_testing() in the same test file—this creates a TAP validation failure. The file contains only 14 actual test assertions, so the plan count is incorrect. Remove the explicit plan to use done_testing() instead.

🛠️ Suggested fix
-use Test::More tests => 22;
+use Test::More;

Also applies to: 184-184

🤖 Prompt for AI Agents
In `@tests/tap/t/015_forward_origin_advance.pl` at line 25, The test file
currently declares an explicit plan with "use Test::More tests => 22;" but also
calls done_testing(), causing a plan vs done_testing conflict; remove the
explicit plan and change the import to simply "use Test::More;" (or otherwise
eliminate the "tests => 22" part) so the file relies on done_testing() at the
end (refer to the "use Test::More tests => 22;" line and the done_testing()
call).

@rasifr rasifr force-pushed the task/SPOC-402/forwarded_origin_v2 branch from ffb3e75 to b0097ae Compare February 4, 2026 05:44
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/spock_apply.c`:
- Around line 3945-4028: maybe_advance_forwarded_origin can throw after the main
transaction commits which causes apply_work to catch and replay an
already-committed transaction; wrap the body that does
replorigin_by_name/replorigin_create/replorigin_advance (the parts that call
StartTransactionCommand/CommitTransactionCommand and the calls to
replorigin_create and replorigin_advance) in a PG_TRY/PG_CATCH block so any
error is caught locally, log the error with elog (including context like
MySubscription->name and remote_origin_name), ensure any open transaction is
cleaned up in the PG_CATCH (call AbortCurrentTransaction or appropriate abort
helper if a transaction was started) and restore MemoryContext to
MessageContext, and do not rethrow the error so the apply loop is not disrupted.
🧹 Nitpick comments (1)
src/spock_apply.c (1)

3960-3975: Single-entry cache may thrash with multiple forwarded origins.

If node C receives forwarded transactions from multiple origin nodes (e.g., A and D both forwarded through B), each origin switch causes a cache miss with a full transaction + origin lookup. Consider using a small hash table if multi-origin cascade topologies are expected.

This is fine for the initial implementation if the typical topology involves only one forwarded origin.

This test validates that when node B forwards transactions from node A
to node C (with forward_origins='all'), node C properly creates and
advances the replication origin for the original source (node A).

The test:
- Sets up a 3-node cascade topology: A -> B -> C
- Creates data on node A that flows through B to C
- Verifies the forwarded origin exists on C using slot name format
- Checks that the origin LSN is properly advanced

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@rasifr rasifr force-pushed the task/SPOC-402/forwarded_origin_v2 branch from 8b9e4f7 to b10ff0f Compare February 6, 2026 06:37
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@include/spock_proto_native.h`:
- Around line 41-42: spock_write_origin and spock_read_origin currently
unconditionally write/read the flags byte causing protocol deserialization
errors for protocol 4; update both functions to gate emitting/consuming the
flags byte behind the protocol check by calling spock_get_proto_version() and
only writing/reading the flags when spock_get_proto_version() >= 5 (or
alternatively bump SPOCK_PROTO_MIN_VERSION_NUM if flags are required for all
supported versions), ensuring the flags handling is colocated in the functions
and tested against the protocol version.
🧹 Nitpick comments (3)
include/spock.h (1)

58-60: Parameter names differ between declaration and definition.

The declaration uses provider_name / subscriber_name, but the definition in src/spock_functions.c (line 2878) uses provider_node / subscription_name. While this compiles fine in C, it creates unnecessary confusion for anyone reading the header.

♻️ Suggested fix: align names with the definition
-extern void gen_slot_name(Name slot_name, char *dbname,
-						  const char *provider_name,
-						  const char *subscriber_name);
+extern void gen_slot_name(Name slot_name, char *dbname,
+						  const char *provider_node,
+						  const char *subscription_name);
src/spock_functions.c (1)

194-196: Redundant extern forward declaration.

This forward declaration at line 194 is redundant since gen_slot_name is already declared extern in include/spock.h (line 58), which is included at line 101 of this file. Consider removing the local forward declaration to avoid maintaining duplicate signatures.

src/spock_apply.c (1)

4001-4013: Consider using NoLock in table_close to hold the lock until transaction commit.

table_close(replorigin_rel, RowExclusiveLock) releases the lock immediately. Between the lock release (line 4013) and CommitTransactionCommand() (line 4014), a concurrent session could attempt to create the same origin name. Using NoLock would hold the lock until the transaction commits, closing this TOCTOU window.

Suggested fix
-		table_close(replorigin_rel, RowExclusiveLock);
+		table_close(replorigin_rel, NoLock);

Copy link
Contributor

@ibrarahmad ibrarahmad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks ok to me.

1 - I have not done any testing, just reviewed the code.
2 - Please take a look at the comment made by @coderabbitai.

When forwarding changes in cascade replication, include the origin name
in ORIGIN messages so downstream subscribers can track the true origin.

Changes:
- Add origin_name parameter to spock_write_origin() and spock_read_origin()
- Use ORIGIN_FLAG_HAS_NAME flag (0x01) to indicate presence of origin_name
- Flags byte is always sent/read (backward compatible with protocol v4)
- Origin name only included when protocol >= 5 and name is provided
- Pass origin_name through handle_origin() to the output plugin

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@rasifr rasifr force-pushed the task/SPOC-402/forwarded_origin_v2 branch from b10ff0f to b6e6ddb Compare February 9, 2026 14:40
@mason-sharp mason-sharp merged commit 1c3ca20 into main Feb 10, 2026
15 of 17 checks passed
@mason-sharp mason-sharp deleted the task/SPOC-402/forwarded_origin_v2 branch February 10, 2026 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants