install_template/templates/products/failover-manager/base.njk
Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ redirects:
 {{ super() }}
 {% endblock product_prerequisites %}
 {% block postinstall %}
-Where `<5x>` is the version of Failover Manager that you're installing. For example, if you're installing version 5.1, the package name is `edb-efm51`.
+Where `<5x>` is the version of Failover Manager that you're installing. For example, if you're installing version 5.2, the package name is `edb-efm52`.

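The `<5x>` placeholder in the changed line stands for the major and minor version digits written together. A minimal sketch of that naming rule follows, assuming versions are always written as `major.minor`. The function name `efm_package_name` is illustrative and is not part of Failover Manager.

```python
def efm_package_name(version: str) -> str:
    """Build the package name for a Failover Manager version such as "5.2".

    Assumption: the version string has exactly two dot-separated parts.
    """
    major, minor = version.split(".")
    # The docs write the suffix as <5x>: major and minor digits with no dot.
    return f"edb-efm{major}{minor}"


print(efm_package_name("5.2"))  # edb-efm52
```
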
The installation process creates a user named efm that has privileges to invoke scripts that control the Failover Manager service for clusters owned by enterprisedb or postgres.
After copying the template file, change the owner of the file to efm:
@@ -29,7 +29,7 @@ After copying the template file, change the owner of the file to efm:
 After creating the cluster properties file, add or modify configuration parameter values as required. For detailed information about each property, see [Specifying cluster properties](#specifying-cluster-properties).

-The property files are owned by root. The Failover Manager service script expects to find the files in the `/etc/edb/efm-5.<x>` directory. If you move the property file to another location, you must create a symbolic link that specifies the new location.
+The Failover Manager service script expects to find the files in the `/etc/edb/efm-5.<x>` directory. If you move the property file to another location, you must create a symbolic link that specifies the new location.

 !!! Note
     All user scripts referenced in the properties file are invoked as the Failover Manager user.
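The symbolic-link requirement in this hunk can be illustrated with a small sketch. It uses throwaway temporary directories as stand-ins for the expected `/etc/edb/efm-5.<x>` directory and for the new location, so it doesn't touch a real installation. The helper names `link_properties` and `demo` are hypothetical.

```python
import os
import tempfile


def link_properties(expected_dir: str, moved_file: str) -> str:
    """Create a symlink in expected_dir that points at a relocated properties file."""
    link_path = os.path.join(expected_dir, os.path.basename(moved_file))
    os.symlink(moved_file, link_path)
    return link_path


def demo() -> tuple:
    """Relocate a properties file in a temp tree and link it back."""
    with tempfile.TemporaryDirectory() as root:
        expected_dir = os.path.join(root, "expected")  # stand-in for the directory the service script reads
        new_home = os.path.join(root, "relocated")     # hypothetical new location
        os.makedirs(expected_dir)
        os.makedirs(new_home)
        moved = os.path.join(new_home, "efm.properties")
        with open(moved, "w") as handle:
            handle.write("auto.rewind=false\n")
        link = link_properties(expected_dir, moved)
        # The link exists where the service script looks and resolves to the moved file.
        return os.path.islink(link), os.path.samefile(link, moved)


print(demo())  # (True, True)
```

On a real node, the equivalent step is typically a single `ln -s` run with sufficient privileges.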
@@ -96,6 +96,7 @@ Use the properties in the `efm.properties` file to specify connection, administr
 | [auto.failover](#auto_failover) | Y | Y | true | |
 | [auto.reconfigure](#auto_reconfigure) | Y | | true | This value must be same for all the agents. |
 | [auto.rewind](#auto_rewind) | Y | | false | |
+| [auto.basebackup](#auto_basebackup) | Y | | false | |
 | [promotable](#promotable) | Y | | true | |
 | [use.replay.tiebreaker](#use_replay_tiebreaker) | Y | Y | true | This value must be same for all the agents. |
-When the `auto.rewind` property is set to `true`, the agent will attempt to reconfigure a failed or replaced primary database as a standby, running `pg_rewind` if necessary. Some cases that apply:
+When the `auto.rewind` and/or `auto.basebackup` property is set to `true`, the agent will attempt to reconfigure a failed or replaced primary database as a standby using `pg_rewind` or `pg_basebackup`. Some cases that apply:
 - A primary database failure: when the agents are notified to reconfigure for the new primary, the original primary agent will check to see if it should rebuild.
 - An isolated primary node: when the node is reconnected to the cluster, the primary agent will check to see if it has been replaced by a newer primary and if it should rebuild (or resume as the primary if there was not a promotion).
 - On startup: if the agent sees that there is already a primary database in the cluster, and the local database is not configured to be a standby, it will check to see if it should rebuild.

-If the agent sees that it should rebuild, it will collect current database configuration settings, run `pg_rewind` with the `--dry-run` option to see if a rewind is needed, rewind if indicated, and reconfigure the database as a standby before resuming monitoring.
+If the agent sees that it should rebuild, it will collect current database configuration settings and perform the following:
+- If `auto.rewind` is set to true, the agent will run `pg_rewind` with the `--dry-run` option to see if a rewind is needed, rewind if indicated, and reconfigure the database as a standby before resuming monitoring. If there is an error running `pg_rewind` and `auto.basebackup` is set to true, the agent will rebuild with `pg_basebackup`.
+- If `auto.rewind` is set to false and `auto.basebackup` is set to true, the agent will use `pg_basebackup` to attempt to rebuild the database and resume monitoring.
+
+!!! Note
+    Use this feature with caution, as it is intended for use cases where it is necessary to automatically bring the failed node back into the cluster, and where the cause of the failure is known and predictable. There may be conditions where EFM is unable to rebuild the failed primary, and manual intervention is still required.

 ```ini
-# Set to true to have this agent attempt to rebuild a failed
-# primary database as a standby after failover. The agent
-# will use pg_rewind if necessary to have the database follow
-# the new primary. See the user's guide for more information.
+# Set either or both of these properties to true to have this agent
+# attempt to reconfigure a failed primary database as a standby after
+# failover. If both properties are set to true, the agent will attempt
+# to use pg_rewind first, and then pg_basebackup if the rewind fails.
+# See the user's guide for more information.
 auto.rewind=false
+auto.basebackup=false
 ```
 !!! Note
     Since `auto.rewind` uses `pg_rewind` internally, all prerequisites for [pg_rewind](https://www.postgresql.org/docs/current/app-pgrewind.html) should be fulfilled before setting up this parameter. This means you may have to either [set wal_log_hints](https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LOG-HINTS) to `on` or enable [data_checksums](https://www.postgresql.org/docs/current/checksums.html) manually.
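The rebuild order described in this hunk (try `pg_rewind` first when `auto.rewind` is true, fall back to `pg_basebackup` only when the rewind fails and `auto.basebackup` is true) can be sketched as a small decision function. This is an illustration of the documented behavior, not the agent's actual implementation. The function name and the two callables, which stand in for running the real tools, are assumptions.

```python
from typing import Callable, List


def rebuild_as_standby(
    auto_rewind: bool,
    auto_basebackup: bool,
    run_rewind: Callable[[], bool],
    run_basebackup: Callable[[], bool],
) -> List[str]:
    """Return the tools attempted, in order, once an agent decides to rebuild.

    run_rewind and run_basebackup are hypothetical hooks that return True on
    success and False on error.
    """
    attempted: List[str] = []
    if auto_rewind:
        attempted.append("pg_rewind")
        if run_rewind():
            return attempted  # rewind path worked, no fallback needed
        if auto_basebackup:
            attempted.append("pg_basebackup")
            run_basebackup()
    elif auto_basebackup:
        attempted.append("pg_basebackup")
        run_basebackup()
    return attempted


# Both properties true and the rewind fails: fall back to pg_basebackup.
print(rebuild_as_standby(True, True, lambda: False, lambda: True))
# ['pg_rewind', 'pg_basebackup']
```

With both properties left at their default of `false`, the function returns an empty list, which matches the case where no automatic rebuild is attempted.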
@@ -1039,7 +1048,7 @@ The `release.vip.*` properties can be used to control the timing of when the VIP
 ```ini
 # In certain networks, there can be errors trying to connect to remote databases
 # at the same time the VIP is being released (i.e. on the primary node during a
-# switchover). Set the delete.vip.background property to false to have the agent
+# switchover). Set the release.vip.background property to false to have the agent
 # pause while the VIP is being released. The pre and post wait periods can add
 # time (in seconds) to wait before and after the VIP is released in case there
After receiving your encrypted password, paste the password into the properties file and start the Failover Manager service. If there's a problem with the encrypted password, the Failover Manager service doesn't start:
-Job for edb-efm-5.1.service failed because the control process exited with error code. See "systemctl status edb-efm-5.1.service" and "journalctl -xe" for details.
+Job for edb-efm-5.2.service failed because the control process exited with error code. See "systemctl status edb-efm-5.2.service" and "journalctl -xe" for details.
 ```

 If you receive this message when starting the Failover Manager service with version 4.x instead of 5.x, see the startup log `/var/log/efm-4.<x>/startup-efm.log` for more information.
Then, modify the cluster properties file, providing the name of the user in the `db.service.owner` property. Also make sure that the `db.service.name` property is blank. Without sudo, you can't run services without root access.
After modifying the configuration, the new user can control Failover Manager with the following command:
Where `<directory/cluster_name.properties>` specifies the full path of the cluster properties file. The user provides the full path to the properties file whenever the nondefault user is controlling agents or using the `efm` script.
product_docs/docs/efm/5/05_using_efm.mdx
Lines changed: 5 additions & 5 deletions
@@ -273,11 +273,11 @@ After creating the `acctg.properties` and `sales.properties` files, create a ser
 If you're using RHEL/Rocky Linux/AlmaLinux 8.x or later, copy the service file `/usr/lib/systemd/system/edb-efm-5.<x>.service` to `/etc/systemd/system` with a new name that's unique for each cluster.

-For example, if you have two clusters named `acctg` and `sales` managed by Failover Manager 5.1, the unit file names might be `efm-acctg.service` and `efm-sales.service`. You can create them with:
+For example, if you have two clusters named `acctg` and `sales` managed by Failover Manager 5.2, the unit file names might be `efm-acctg.service` and `efm-sales.service`. You can create them with:
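The commands themselves aren't included in this excerpt. As a hedged sketch of the copy step described above, the following builds the source and destination paths for each cluster and prints them as `cp` commands. The function name `unit_file_copies` is illustrative. The directories, cluster names, and version come from the surrounding text.

```python
import posixpath
from typing import List, Tuple

SYSTEM_UNIT_DIR = "/usr/lib/systemd/system"  # where the packaged unit file lives
LOCAL_UNIT_DIR = "/etc/systemd/system"       # where per-cluster copies go


def unit_file_copies(version: str, clusters: List[str]) -> List[Tuple[str, str]]:
    """Pair the packaged unit file with one uniquely named copy per cluster."""
    source = posixpath.join(SYSTEM_UNIT_DIR, f"edb-efm-{version}.service")
    return [
        (source, posixpath.join(LOCAL_UNIT_DIR, f"efm-{name}.service"))
        for name in clusters
    ]


for src, dst in unit_file_copies("5.2", ["acctg", "sales"]):
    print(f"cp {src} {dst}")
```

The sketch only prints the commands. Running them on a real system requires appropriate privileges.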
product_docs/docs/efm/5/08_controlling_efm_service.mdx
Lines changed: 22 additions & 22 deletions
@@ -40,26 +40,26 @@ Stop the Failover Manager on the current node. This command must be invoked by r
 The `status` command returns the status of the Failover Manager agent on which it is invoked. You can invoke the status command on any node to instruct Failover Manager to return status and server startup information.