Commit 7eb1945

Merge pull request #489 from EnterpriseDB/efm/v5_2_changes
EFM 5.2 changes.
2 parents 227a6aa + 8638fc3 commit 7eb1945

26 files changed, with 115 additions and 94 deletions

install_template/templates/products/failover-manager/base.njk

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ redirects:
 {{ super() }}
 {% endblock product_prerequisites %}
 {% block postinstall %}
-Where `<5x>` is the version of Failover Manager that you're installing. For example, if you're installing version 5.1, the package name is `edb-efm51`.
+Where `<5x>` is the version of Failover Manager that you're installing. For example, if you're installing version 5.2, the package name is `edb-efm52`.

 The installation process creates a user named efm that has privileges to invoke scripts that control the Failover Manager service for clusters owned by enterprisedb or postgres.
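The naming rule in this hunk (version 5.2 gives the package name `edb-efm52`) and the versioned paths that the rest of this commit updates can be sketched as small shell helpers. The function names are hypothetical and aren't part of Failover Manager. They only illustrate how the version string maps to the names shown in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Failover Manager) that derive
# versioned names from a version string such as "5.2".

efm_package_name() {
  # "5.2" -> "edb-efm52": the package name drops the dot.
  local version="$1"
  printf 'edb-efm%s\n' "${version//./}"
}

efm_config_dir() {
  # "5.2" -> "/etc/edb/efm-5.2": the directory name keeps the dot.
  local version="$1"
  printf '/etc/edb/efm-%s\n' "$version"
}

efm_service_name() {
  # "5.2" -> "edb-efm-5.2": the systemd unit name keeps the dot.
  local version="$1"
  printf 'edb-efm-%s\n' "$version"
}
```

For example, `efm_package_name 5.2` prints the package name used in the updated sentence above.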

product_docs/docs/efm/5/04_configuring_efm/01_cluster_properties.mdx

Lines changed: 18 additions & 9 deletions
@@ -15,7 +15,7 @@ Each node in a Failover Manager cluster has a properties file (by default, named
 After completing the Failover Manager installation, make a working copy of the template before modifying the file contents:

 ```text
-# cp /etc/edb/efm-5.1/efm.properties.in /etc/edb/efm-5.1/efm.properties
+# cp /etc/edb/efm-5.2/efm.properties.in /etc/edb/efm-5.2/efm.properties
 ```

 After copying the template file, change the owner of the file to efm:
@@ -29,7 +29,7 @@ After copying the template file, change the owner of the file to efm:

 After creating the cluster properties file, add or modify configuration parameter values as required. For detailed information about each property, see [Specifying cluster properties](#specifying-cluster-properties).

-The property files are owned by root. The Failover Manager service script expects to find the files in the `/etc/edb/efm-5.<x>` directory. If you move the property file to another location, you must create a symbolic link that specifies the new location.
+The Failover Manager service script expects to find the files in the `/etc/edb/efm-5.<x>` directory. If you move the property file to another location, you must create a symbolic link that specifies the new location.

 !!! Note
 All user scripts referenced in the properties file are invoked as the Failover Manager user.
@@ -96,6 +96,7 @@ Use the properties in the `efm.properties` file to specify connection, administr
 | [auto.failover](#auto_failover) | Y | Y | true | |
 | [auto.reconfigure](#auto_reconfigure) | Y | | true | This value must be same for all the agents. |
 | [auto.rewind](#auto_rewind) | Y | | false | |
+| [auto.basebackup](#auto_basebackup) | Y | | false | |
 | [promotable](#promotable) | Y | | true | |
 | [use.replay.tiebreaker](#use_replay_tiebreaker) | Y | Y | true | This value must be same for all the agents. |
 | [standby.restart.delay](#standby_restart_delay) | | | 0 | |
@@ -698,20 +699,28 @@ auto.reconfigure=true
 `primary_conninfo` is a space-delimited list of keyword=value pairs.

 <div id="auto_rewind" class="registered_link"></div>
+<div id="auto_basebackup" class="registered_link"></div>

-When the `auto.rewind` property is set to `true`, the agent will attempt to reconfigure a failed or replaced primary database as a standby, running `pg_rewind` if necessary. Some cases that apply:
+When the `auto.rewind` and/or `auto.basebackup` property is set to `true`, the agent will attempt to reconfigure a failed or replaced primary database as a standby using `pg_rewind` or `pg_basebackup`. Some cases that apply:
 - A primary database failure: when the agents are notified to reconfigure for the new primary, the original primary agent will check to see if it should rebuild.
 - An isolated primary node: when the node is reconnected to the cluster, the primary agent will check to see if it has been replaced by a newer primary and if it should rebuild (or resume as the primary if there was not a promotion).
 - On startup: if the agent sees that there is already a primary database in the cluster, and the local database is not configured to be a standby, it will check to see if it should rebuild.

-If the agent sees that it should rebuild, it will collect current database configuration settings, run `pg_rewind` with the `--dry-run` option to see if a rewind is needed, rewind if indicated, and reconfigure the database as a standby before resuming monitoring.
+If the agent sees that it should rebuild, it will collect current database configuration settings and perform the following:
+- If `auto.rewind` is set to true, the agent will run `pg_rewind` with the `--dry-run` option to see if a rewind is needed, rewind if indicated, and reconfigure the database as a standby before resuming monitoring. If there is an error running `pg_rewind` and `auto.basebackup` is set to true, the agent will rebuild with `pg_basebackup`.
+- If `auto.rewind` is set to false and `auto.basebackup` is set to true, the agent will use `pg_basebackup` to attempt to rebuild the database and resume monitoring.
+
+!!! Note
+Use this feature with caution, as it is intended for use cases where it is necessary to automatically bring the failed node back into the cluster, and where the cause of the failure is known and predictable. There may be conditions where EFM is unable to rebuild the failed primary, and manual intervention is still required.

 ```ini
-# Set to true to have this agent attempt to rebuild a failed
-# primary database as a standby after failover. The agent
-# will use pg_rewind if necessary to have the database follow
-# the new primary. See the user's guide for more information.
+# Set either or both of these properties to true to have this agent
+# attempt to reconfigure a failed primary database as a standby after
+# failover. If both properties are set to true, the agent will attempt
+# to use pg_rewind first, and then pg_basebackup if the rewind fails.
+# See the user's guide for more information.
 auto.rewind=false
+auto.basebackup=false
 ```
 !!! Note
 Since auto.rewind uses pg_rewind internally, all prerequisites for [pg_rewind](https://www.postgresql.org/docs/current/app-pgrewind.html) should be fulfilled before setting up this parameter. This means, you may whether have to [set wal_log_hints](https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LOG-HINTS) to `on` or enable [data_checksums](https://www.postgresql.org/docs/current/checksums.html) manually.
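The order in which the two properties are consulted, as described in the added text of this hunk, can be sketched as a small shell function. This is only a model of the documented behavior, not Failover Manager's implementation: the function name and the third argument (an assumed outcome of the `pg_rewind` step) are hypothetical, and the dry-run check is folded into the rewind path for brevity.

```shell
#!/usr/bin/env bash
# Illustrative model of the documented decision order. Not EFM code.
# Arguments:
#   $1  value of auto.rewind      ("true" or "false")
#   $2  value of auto.basebackup  ("true" or "false")
#   $3  assumed outcome of the pg_rewind step ("ok" or "error");
#       hypothetical input, defaults to "ok"
# Prints the rebuild path that would be taken, or "none".
choose_rebuild_method() {
  local auto_rewind="$1" auto_basebackup="$2" rewind_result="${3:-ok}"

  if [ "$auto_rewind" = "true" ]; then
    if [ "$rewind_result" = "ok" ]; then
      # Dry run, rewind if indicated, then reconfigure as a standby.
      echo "pg_rewind"
    elif [ "$auto_basebackup" = "true" ]; then
      # pg_rewind reported an error, so fall back to a base backup.
      echo "pg_basebackup"
    else
      # No fallback configured; manual intervention is needed.
      echo "none"
    fi
  elif [ "$auto_basebackup" = "true" ]; then
    # auto.rewind is false, so go straight to pg_basebackup.
    echo "pg_basebackup"
  else
    echo "none"
  fi
}
```

Calling `choose_rebuild_method true true error` models the case where both properties are set and the rewind fails, which is the only case where both tools are involved.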
@@ -1039,7 +1048,7 @@ The `release.vip.*` properties can be used to control the timing of when the VIP
 ```ini
 # In certain networks, there can be errors trying to connect to remote databases
 # at the same time the VIP is being released (i.e. on the primary node during a
-# switchover). Set the delete.vip.background property to false to have the agent
+# switchover). Set the release.vip.background property to false to have the agent
 # pause while the VIP is being released. The pre and post wait periods can add
 # time (in seconds) to wait before and after the VIP is released in case there
 # are other network effects that require them.

product_docs/docs/efm/5/04_configuring_efm/02_encrypting_database_password.mdx

Lines changed: 3 additions & 3 deletions
@@ -35,7 +35,7 @@ This example shows using the `encrypt` utility to encrypt a password for the `ac
 # efm encrypt acctg
 This utility will generate an encrypted password for you to place in
 your Failover Manager cluster property file:
-/etc/edb/efm-5.1/acctg.properties
+/etc/edb/efm-5.2/acctg.properties
 Please enter the password and hit enter:
 Please enter the password again to confirm:
 The encrypted password is: 516b36fb8031da17cfbc010f7d09359c
@@ -49,8 +49,8 @@ db.password.encrypted=516b36fb8031da17cfbc010f7d09359c
 After receiving your encrypted password, paste the password into the properties file and start the Failover Manager service. If there's a problem with the encrypted password, the Failover Manager service doesn't start:

 ```text
-[witness@localhost ~]# systemctl start edb-efm-5.1
-Job for edb-efm-5.1.service failed because the control process exited with error code. See "systemctl status edb-efm-5.1.service" and "journalctl -xe" for details.
+[witness@localhost ~]# systemctl start edb-efm-5.2
+Job for edb-efm-5.2.service failed because the control process exited with error code. See "systemctl status edb-efm-5.2.service" and "journalctl -xe" for details.
 ```

 If you receive this message when starting the Failover Manager service with version 4.x instead of 5.x, see the startup log `/var/log/efm-4.<x>/startup-efm.log` for more information.

product_docs/docs/efm/5/04_configuring_efm/03_cluster_members.mdx

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ Each node in a Failover Manager cluster has a cluster members file (by default n
 After completing the Failover Manager installation, make a working copy of the template:

 ```shell
-cp /etc/edb/efm-5.1/efm.nodes.in /etc/edb/efm-5.1/efm.nodes
+cp /etc/edb/efm-5.2/efm.nodes.in /etc/edb/efm-5.2/efm.nodes
 ```

 After copying the template file, change the owner of the file to efm:

product_docs/docs/efm/5/04_configuring_efm/04_extending_efm_permissions.mdx

Lines changed: 8 additions & 8 deletions
@@ -36,18 +36,18 @@ The `efm-50` file is located in `/etc/sudoers.d` and contains the following entr
 # If you run your db service under a non-default account, you will need to copy
 # this file to grant the proper permissions and specify the account in your efm
 # cluster properties file by changing the 'db.service.owner' property.
-efm ALL=(postgres) NOPASSWD: /usr/edb/efm-5.1/bin/efm_db_functions
-efm ALL=(enterprisedb) NOPASSWD: /usr/edb/efm-5.1/bin/efm_db_functions
+efm ALL=(postgres) NOPASSWD: /usr/edb/efm-5.2/bin/efm_db_functions
+efm ALL=(enterprisedb) NOPASSWD: /usr/edb/efm-5.2/bin/efm_db_functions

 # Allow user 'efm' to sudo efm_root_functions as 'root' to write/delete the PID file,
 # validate the db.service.owner property, etc.
-efm ALL=(ALL) NOPASSWD: /usr/edb/efm-5.1/bin/efm_root_functions
+efm ALL=(ALL) NOPASSWD: /usr/edb/efm-5.2/bin/efm_root_functions

 # Allow user 'efm' to sudo efm_address as root for VIP tasks.
-efm ALL=(ALL) NOPASSWD: /usr/edb/efm-5.1/bin/efm_address
+efm ALL=(ALL) NOPASSWD: /usr/edb/efm-5.2/bin/efm_address

 # Allow user 'efm' to sudo efm_pgpool_functions as root for pgpool tasks.
-efm ALL=(ALL) NOPASSWD: /usr/edb/efm-5.1/bin/efm_pgpool_functions
+efm ALL=(ALL) NOPASSWD: /usr/edb/efm-5.2/bin/efm_pgpool_functions

 # relax tty requirement for user 'efm'
 Defaults:efm !requiretty
@@ -89,17 +89,17 @@ To run Failover Manager without sudo, you must select a database process owner w
 ```shell
 su - enterprisedb

-cp /etc/edb/efm-5.1/efm.properties.in <directory/cluster_name>.properties
+cp /etc/edb/efm-5.2/efm.properties.in <directory/cluster_name>.properties

-cp /etc/edb/efm-5.1/efm.nodes.in <directory>/<cluster_name>.nodes
+cp /etc/edb/efm-5.2/efm.nodes.in <directory>/<cluster_name>.nodes
 ```

 Then, modify the cluster properties file, providing the name of the user in the `db.service.owner` property. Also make sure that the `db.service.name` property is blank. Without sudo, you can't run services without root access.

 After modifying the configuration, the new user can control Failover Manager with the following command:

 ```shell
-/usr/edb/efm-5.1/bin/runefm.sh start|stop <directory/cluster_name>.properties
+/usr/edb/efm-5.2/bin/runefm.sh start|stop <directory/cluster_name>.properties
 ```

 Where `<directory/cluster_name.properties>` specifies the full path of the cluster properties file. The user provides the full path to the properties file whenever the nondefault user is controlling agents or using the `efm` script.

product_docs/docs/efm/5/05_using_efm.mdx

Lines changed: 5 additions & 5 deletions
@@ -273,11 +273,11 @@ After creating the `acctg.properties` and `sales.properties` files, create a ser

 If you're using RHEL/Rocky Linux/AlmaLinux 8.x or later, copy the service file `/usr/lib/systemd/system/edb-efm-5.<x>.service` to `/etc/systemd/system` with a new name that's unique for each cluster.

-For example, if you have two clusters named `acctg` and `sales` managed by Failover Manager 5.1, the unit file names might be `efm-acctg.service` and `efm-sales.service`. You can create them with:
+For example, if you have two clusters named `acctg` and `sales` managed by Failover Manager 5.2, the unit file names might be `efm-acctg.service` and `efm-sales.service`. You can create them with:

 ```shell
-cp /usr/lib/systemd/system/edb-efm-5.1.service /etc/systemd/system/efm-acctg.service
-cp /usr/lib/systemd/system/edb-efm-5.1.service /etc/systemd/system/efm-sales.service
+cp /usr/lib/systemd/system/edb-efm-5.2.service /etc/systemd/system/efm-acctg.service
+cp /usr/lib/systemd/system/edb-efm-5.2.service /etc/systemd/system/efm-sales.service
 ```

 Then use `systemctl edit` to edit the `CLUSTER` variable in each unit file, changing the specified cluster name from `efm` to the new cluster name.
@@ -288,15 +288,15 @@ In this example, edit the `acctg` cluster by running `systemctl edit efm-acctg.s
 ```ini
 [Service]
 Environment=CLUSTER=acctg
-PIDFile=/run/efm-5.1/acctg.pid
+PIDFile=/run/efm-5.2/acctg.pid
 ```

 Edit the `sales` cluster by running `systemctl edit efm-sales.service` and write:

 ```ini
 [Service]
 Environment=CLUSTER=sales
-PIDFile=/run/efm-5.2/sales.pid
+PIDFile=/run/efm-5.2/sales.pid
 ```

 !!!Note
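The per-cluster unit setup shown in this file's hunks follows one pattern: copy the packaged unit under a cluster-specific name, then override `CLUSTER` and `PIDFile`. A minimal sketch of that pattern follows. The helper name `efm_unit_override` is hypothetical, and the loop only prints the copy command and the override text rather than changing anything on the system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Failover Manager): prints the
# [Service] override that the docs say to enter in `systemctl edit`.
#   $1  cluster name, for example "acctg"
#   $2  EFM version, for example "5.2"
efm_unit_override() {
  local cluster="$1" version="$2"
  printf '[Service]\nEnvironment=CLUSTER=%s\nPIDFile=/run/efm-%s/%s.pid\n' \
    "$cluster" "$version" "$cluster"
}

# Dry run: show what would be done for each cluster without doing it.
for cluster in acctg sales; do
  echo "cp /usr/lib/systemd/system/edb-efm-5.2.service /etc/systemd/system/efm-${cluster}.service"
  efm_unit_override "$cluster" 5.2
done
```

Printing instead of writing keeps the sketch safe to run as a non-root user. The override text still has to be entered through `systemctl edit efm-<cluster>.service`, as the documentation describes.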

product_docs/docs/efm/5/07_using_efm_utility.mdx

Lines changed: 1 addition & 1 deletion
@@ -127,7 +127,7 @@ If the `-prompt` option is specified, the command will output the steps it will
 The following example shows the command output when both the `-prompt` and `-slot` options are specified:

 ```text
-# /usr/edb/efm-5.1/bin/efm create-standby efm -slot s2 -prompt
+# /usr/edb/efm-5.2/bin/efm create-standby efm -slot s2 -prompt
 Found primary node1 from cluster status.
 Verify primary address node1 does not match this agent's bind address 'node2' or external address ''.
 Will signal local agent to run database stop command and become idle if not already.

product_docs/docs/efm/5/08_controlling_efm_service.mdx

Lines changed: 22 additions & 22 deletions
@@ -40,26 +40,26 @@ Stop the Failover Manager on the current node. This command must be invoked by r
 The `status` command returns the status of the Failover Manager agent on which it is invoked. You can invoke the status command on any node to instruct Failover Manager to return status and server startup information.

 ```text
-[root@ONE ~]}> systemctl status edb-efm-5.1
-● edb-efm-5.1.service - EnterpriseDB Failover Manager 5.1
-Loaded: loaded (/etc/systemd/system/edb-efm-5.1.service; enabled; preset: disabled)
-Active: active (running) since Wed 2025-09-24 13:50:15 UTC; 7min ago
-Process: 10711 ExecStart=/bin/bash -c /usr/edb/efm-5.1/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
-Main PID: 10793 (java)
-Tasks: 42 (limit: 50116)
-Memory: 182.9M
-CPU: 7.291s
-CGroup: /docker/224bba5e52f842fbcfe42dce9995a48f45b2acafde02d6337a9cf3e0e15a13c4/system.slice/edb-efm-5.1.service
-└─10793 /usr/lib/jvm/java-11-openjdk-11.0.25.0.9-7.el9.aarch64/bin/java -cp /usr/edb/efm-5.1/lib/EFM-5.1.jar -Xmx128m com.enterprisedb.efm.main.ServiceCommand __int_start /etc/edb/e>
-
-Sep 24 13:50:15 node1 bash[10776]: 2025-09-24 13:50:15 Addresses to check after adding standbys: [node3-16262(node3), node4-52738(node4), node2-40700(node2)]
-Sep 24 13:50:15 node1 bash[10776]: 2025-09-24 13:50:15 Testing remote database connections.
-Sep 24 13:50:15 node1 bash[10776]: 2025-09-24 13:50:15 Checking host node3 for address: node3-16262(node3)
-Sep 24 13:50:15 node1 bash[10776]: 2025-09-24 13:50:15 Checking host node4 for address: node4-52738(node4)
-Sep 24 13:50:15 node1 bash[10776]: 2025-09-24 13:50:15 Checking host node2 for address: node2-40700(node2)
-Sep 24 13:50:15 node1 bash[10776]: 2025-09-24 13:50:15 Now monitoring database.
-Sep 24 13:50:15 node1 systemd[1]: Started EnterpriseDB Failover Manager 5.1.
-Sep 24 13:52:12 node1 sudo[11065]: efm : PWD=/ ; USER=postgres ; COMMAND=/usr/edb/efm-5.1/bin/efm_db_functions extrecconfexists /etc/edb/efm-5.1/efm.properties
-Sep 24 13:53:12 node1 sudo[11101]: efm : PWD=/ ; USER=postgres ; COMMAND=/usr/edb/efm-5.1/bin/efm_db_functions fileexists /etc/edb/efm-5.1/efm.properties /opt/postgres/data/recovery.signal
-Sep 24 13:54:12 node1 sudo[11129]: efm : PWD=/ ; USER=postgres ; COMMAND=/usr/edb/efm-5.1/bin/efm_db_functions fileexists /etc/edb/efm-5.1/efm.properties /opt/postgres/data/standby.signal
+[root@ONE ~]}> systemctl status edb-efm-5.2
+● edb-efm-5.2.service - EnterpriseDB Failover Manager 5.2
+Loaded: loaded (/usr/lib/systemd/system/edb-efm-5.2.service; disabled; preset: disabled)
+Active: active (running) since Mon 2025-11-24 16:00:08 UTC; 10s ago
+Process: 11755 ExecStart=/bin/bash -c /usr/edb/efm-5.2/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
+Main PID: 11837 (java)
+Tasks: 46 (limit: 79998)
+Memory: 195.6M
+CPU: 2.989s
+CGroup: /docker/7ce08d0a35648d9156b56ef27a1bfced6af013a3e174f756681612002ac85961/system.slice/edb-efm-5.2.service
+└─11837 /usr/lib/jvm/java-11-openjdk-11.0.25.0.9-7.el9.aarch64/bin/java -cp /usr/edb/efm-5.2/lib/EFM-5.2.jar -Xmx128m com.enterprisedb.efm.main.ServiceCommand __int_start /etc/edb/efm-5.2/efm.properties
+
+Nov 24 16:00:08 node1 bash[11820]: 2025-11-24 16:00:08 -------------------------------------------------------------------
+Nov 24 16:00:08 node1 bash[11820]: 2025-11-24 16:00:08 Checking shared properties with node2-51572(node2)
+Nov 24 16:00:08 node1 bash[11820]: 2025-11-24 16:00:08 Getting remote database addresses to check.
+Nov 24 16:00:08 node1 bash[11820]: 2025-11-24 16:00:08 Addresses to check after adding standbys: [node2-51572(node2), node3-8868(node3), node4-56220(node4)]
+Nov 24 16:00:08 node1 bash[11820]: 2025-11-24 16:00:08 Testing remote database connections.
+Nov 24 16:00:08 node1 bash[11820]: 2025-11-24 16:00:08 Checking host node2 for address: node2-51572(node2)
+Nov 24 16:00:08 node1 bash[11820]: 2025-11-24 16:00:08 Checking host node3 for address: node3-8868(node3)
+Nov 24 16:00:08 node1 bash[11820]: 2025-11-24 16:00:08 Checking host node4 for address: node4-56220(node4)
+Nov 24 16:00:08 node1 bash[11820]: 2025-11-24 16:00:08 Now monitoring database.
+Nov 24 16:00:08 node1 systemd[1]: Started EnterpriseDB Failover Manager 5.2.
 ```
