Hello, I am testing various conflict scenarios on an active-active setup. I encountered a situation involving a DROP COLUMN operation where the replication broke (as expected), but I was unable to recover it simply by re-adding the column.
Steps to Reproduce:
- I have a table test_table (id int primary key, data text) synced on both Node 1 and Node 2.
- On Node 1: I dropped the column: ALTER TABLE test_table DROP COLUMN data;
- On Node 2: Immediately after (while replication was active), I inserted data targeting that dropped column: INSERT INTO test_table (id, data) VALUES (1, 'val');
- Result: The apply worker on Node 1 failed with the expected error:
ERROR: data for dropped column
CONTEXT: apply INSERT from remote relation ...
- Attempted Fix: To recover the replication without skipping the transaction, I manually added the column back on Node 1:
ALTER TABLE test_table ADD COLUMN data text;
- Observation: Even after adding the column back with the exact same name and type, the Apply Worker continued to fail with the same data for dropped column error. It seems the worker did not recognize the new column because the attnum (attribute number) changed, or the worker was holding onto a stale schema cache.
Questions:
- Is this expected behavior that the Apply Worker does not automatically refresh its relation map when a column is re-added with the same name during a retry loop?
- I noticed that ordinal_position changed in information_schema after re-adding the column. Does pgactive (or logical replication in general) rely on column names or internal attribute IDs (attnum) during the apply phase?
- Is the only solution in this scenario to manually terminate/restart the Apply Worker process to force a schema cache refresh?
Any insights or best practices for recovering from this "accidental drop" scenario would be appreciated.
Thanks!
Environment:
- PostgreSQL Version: 17.6
- Extension: pgactive (latest)
- Setup: Active-Active replication between 2 nodes.
Hello, I am testing various conflict scenarios on an active-active setup. I encountered a situation involving a DROP COLUMN operation where the replication broke (as expected), but I was unable to recover it simply by re-adding the column.
Steps to Reproduce:
ERROR: data for dropped column
CONTEXT: apply INSERT from remote relation ...
ALTER TABLE test_table ADD COLUMN data text;
Questions:
Any insights or best practices for recovering from this "accidental drop" scenario would be appreciated.
Thanks!
Environment: