Merged
26 changes: 16 additions & 10 deletions site/docs/guides/backfilling-data.md
@@ -281,30 +281,36 @@ Preventing backfills when possible can help save costs and computational resourc

In this case, many connectors allow you to turn off backfilling on a per-stream or per-table basis. See each individual connector's properties for details.

### Preventing backfills during database upgrades
### Preventing backfills during database upgrades and failovers

During an upgrade, some databases invalidate a replication slot, binlog position, CDC tables, or similar. As Estuary relies on these methods to keep its place, upgrades will disrupt the Estuary pipeline in these cases.
During an upgrade, some databases invalidate a replication slot, binlog position, CDC tables, or similar. As Estuary relies on these methods to keep its place, upgrades will disrupt the Estuary pipeline in these cases. The same is true of a failover or promotion that moves the capture onto a new writer - for example a standby promotion or a migration to a new instance - because the new writer does not carry the old writer's CDC position.

- If a database upgrade **will not** affect these resources, the Estuary connector should simply resume when the upgrade completes and no action is required.
- If a database upgrade **will** affect these or similar resources, you may need to trigger a backfill after the upgrade completes.
- If the operation **will not** affect these resources, the Estuary connector should simply resume when it completes and no action is required.
- If the operation **will** affect these or similar resources, you may need to trigger a backfill afterward.

The easiest and most bulletproof solution when this happens is to backfill all bindings of the impacted capture(s) after performing the upgrade. This will permit the captures to recreate entities as necessary, establish a new CDC position, and then backfill all table contents to ensure that any changes which might have occurred in the meantime are correctly captured.
The easiest and most bulletproof solution when this happens is to backfill all bindings of the impacted capture(s) afterward. This will permit the captures to recreate entities as necessary, establish a new CDC position, and then backfill all table contents to ensure that any changes which might have occurred in the meantime are correctly captured.

However, it is common to want to avoid a full backfill when performing this sort of database maintenance, as these backfills may take some time and require a significant amount of extra data movement even if nothing has actually changed. Some connectors provide features which may be used to accomplish this, but they typically require some amount of extra setup or user knowledge to guarantee certain invariants. Put simply: if there were a more efficient way to re-establish consistency in the general case, that's what we would already be doing when asked to backfill the data again.

For example, Postgres currently deletes or requires users to drop logical replication slots during a major version upgrade. To prevent a full backfill during the upgrade, follow these steps:
For example, Postgres deletes or requires users to drop logical replication slots during a major version upgrade. To prevent a full backfill, follow these steps:

1. Pause database writes so no further changes can occur.

2. Monitor the current capture to ensure captures are fully up-to-date.
- These two steps ensure the connector won't miss any changes.

3. Perform the database upgrade.
3. Perform the database upgrade or failover.

4. Backfill all bindings of the capture using the ["Only Changes" backfill mode](#resource-configuration-backfill-modes) and make sure to select "Incremental Backfill (Advanced)" from the dropdown.
- This will not cause a full backfill. "Backfilling" all bindings at once resets the WAL (Write-Ahead Log) position for the capture, essentially allowing it to "jump ahead" to the current end of the WAL. The "Only Changes" mode will skip re-reading existing table content. Incremental backfill will append new data to your current collection.
- This will not cause a full backfill. "Backfilling" all bindings at once resets the WAL (Write-Ahead Log) or binlog position for the capture, essentially allowing it to "jump ahead" to the current end. The "Only Changes" mode will skip re-reading existing table content. Incremental backfill will append new data to your current collection.
- **4a. Same host** (in-place upgrade): no other change is needed.
- **4b. New host** (failover, promotion, or migration to a new instance): in the same edit, also update the capture's `address` to the new writer's endpoint. Changing `address` on its own is not enough - the connector would try to resume from a stored CDC position that does not exist on the new server.

5. Resume database writes.
5. Resume database writes (on the new writer, if the host changed).
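
The edit described in step 4 can be sketched as a small Python helper that operates on a capture spec loaded as a dictionary. This is a minimal illustration rather than Estuary tooling: the function name `plan_cdc_reset` is hypothetical, and the dictionary layout (an `address` under `endpoint.connector.config`, a per-binding `backfill` counter, and a `mode` under each binding's `resource`) is an assumption that should be checked against your connector's reference. The "Incremental Backfill (Advanced)" choice from the UI is not modeled here.

```python
from copy import deepcopy


def plan_cdc_reset(capture_spec, new_address=None):
    """Return an edited copy of a capture spec for an "Only Changes" backfill.

    Hypothetical helper. The field names used below are assumptions about
    the spec layout, not a confirmed schema.
    """
    spec = deepcopy(capture_spec)  # leave the caller's spec untouched

    # Step 4b: when the host changed, repoint the capture in the same edit
    # that resets the CDC position.
    if new_address is not None:
        spec["endpoint"]["connector"]["config"]["address"] = new_address

    # Step 4: touch every binding at once, so the capture establishes a
    # fresh CDC position instead of resuming from the stored one.
    for binding in spec["bindings"]:
        binding["backfill"] = binding.get("backfill", 0) + 1  # assumed counter
        binding["resource"]["mode"] = "Only Changes"  # skip existing rows

    return spec
```

For a same-host upgrade (4a), call the helper without `new_address`. For a failover or migration (4b), pass the new writer's endpoint so the address change and the backfill land in a single edit.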

:::note
If the new writer is reachable over [AWS PrivateLink](/private-byoc/privatelink), you can register its endpoint with Estuary ahead of time so the DNS name is ready before you need it. See [Pre-registering an endpoint ahead of time](/private-byoc/privatelink#pre-registering-an-endpoint-ahead-of-time).
:::

## Resource configuration backfill modes

@@ -320,7 +326,7 @@ bindings:
```

:::warning
In general, you should not change this setting. Make sure you understand your use case, such as [preventing backfills](#preventing-backfills-during-database-upgrades).
In general, you should not change this setting. Make sure you understand your use case, such as [preventing backfills](#preventing-backfills-during-database-upgrades-and-failovers).
:::

The following modes are available:
4 changes: 4 additions & 0 deletions site/docs/private-byoc/privatelink.md
@@ -96,6 +96,10 @@ When accessing services cross-region, you must use the **regional** DNS name (e.
* **Connection request never appears in your console**: check that Estuary's principal is on **Allow principals** *and* that the data plane region is in your endpoint service's Supported Regions list.
* **Connection accepted but the connector still fails to resolve the host**: verify the connector is using the *regional* DNS name returned by Estuary, not a zonal variant.

#### Pre-registering an endpoint ahead of time

You can register an additional endpoint service with Estuary before you need it, such as a standby or backup database kept ready for upgrades, failover, or disaster recovery. It sits idle without affecting your live connection, and the DNS name Estuary returns is stable, so cutting over is just a matter of repointing the capture's `address`. See [Preventing backfills during database upgrades and failovers](/reference/backfilling-data/#preventing-backfills-during-database-upgrades-and-failovers).

### Variations

Certain services may use AWS PrivateLink in unique ways. More detailed instructions for these services are provided below.
Original file line number Diff line number Diff line change
@@ -320,6 +320,12 @@ The concern is that if a capture is disabled or the server becomes unreachable f

The `"binlog retention period is too short"` error should normally be fixed by setting a longer retention period as described in these setup instructions. However, advanced users who understand the risks can use the `skip_binlog_retention_check` configuration option to disable this safety.

### Failover and Host Changes

MySQL binlog coordinates are specific to each server, so they do not carry over when you fail over to a new writer, for example by promoting a standby. After a failover, the capture's stored position is invalid on the new writer and replication fails with `ERROR 1236`. Always capture from the **writer** endpoint; reader endpoints report `log_bin = OFF` and fail the prerequisite check.

If the failover is planned and you can pause writes, you can re-establish the capture on the new writer without a full backfill. See [Preventing backfills during database upgrades and failovers](/reference/backfilling-data/#preventing-backfills-during-database-upgrades-and-failovers).

### Empty Collection Key

Every Estuary collection must declare a [key](/concepts/collections.md#keys) which is used to group its documents. When testing your capture, if you encounter an error indicating collection key cannot be empty, you will need to either add a key to the table in your source, or manually edit the generated specification and specify keys for the collection before publishing to the catalog as documented [here](/concepts/collections.md#empty-keys).
Original file line number Diff line number Diff line change
@@ -240,6 +240,8 @@ In this case, you may turn off backfilling on a per-table basis. See [properties

If the replication slot is dropped or invalidated — for example after a major version upgrade, a failover, or a WAL size limit being exceeded — the capture will fail and require manual recovery. See [PostgreSQL replication slot recovery](/guides/troubleshooting/postgres-replication-slot-recovery) for step-by-step instructions.

If the failover is planned and you can pause writes, you can re-establish the capture without a full backfill. See [Preventing backfills during database upgrades and failovers](/reference/backfilling-data/#preventing-backfills-during-database-upgrades-and-failovers).

## WAL Retention and Tuning Parameters

Postgres logical replication works by reading change events from the write-ahead log,