Frontends procedures
Intended audience
sysadm staff members
Pacemaker maintenance mode
In maintenance mode, Pacemaker does not manage the services or move the IP addresses from one node to another.
- Enable maintenance mode
crm_attribute --name maintenance-mode --update true
- Return to nominal mode
crm_attribute --name maintenance-mode --delete
- Check the status
Nominal mode:
root@gloin001:~# crm status
Status of pacemakerd: 'Pacemaker is running' (last updated 2024-03-06 18:45:31 +01:00)
Cluster Summary:
   * Stack: corosync
   * Current DC: gloin001 (version 2.1.5-a3f44794f94) - MIXED-VERSION partition with quorum
   * Last updated: Wed Mar  6 18:45:31 2024
   * Last change:  Wed Mar  6 18:45:27 2024 by root via crm_attribute on gloin001
   * 2 nodes configured
   * 4 resource instances configured
Node List:
   * Online: [ gloin001 gloin002 ]
Full List of Resources:
   * r_vip_pub   (ocf:heartbeat:IPaddr2):         Started gloin001
   * r_vip_ha    (ocf:heartbeat:IPaddr2):         Started gloin001
   * Clone Set: ha_postgresql [r_postgresql] (promotable):
      * Promoted: [ gloin001 ]
      * Unpromoted: [ gloin002 ]
In maintenance:
root@gloin001:~# crm status
Status of pacemakerd: 'Pacemaker is running' (last updated 2024-03-06 18:43:58 +01:00)
Cluster Summary:
   * Stack: corosync
   * Current DC: gloin001 (version 2.1.5-a3f44794f94) - MIXED-VERSION partition with quorum
   * Last updated: Wed Mar  6 18:43:58 2024
   * Last change:  Wed Mar  6 18:41:47 2024 by root via crm_attribute on gloin001
   * 2 nodes configured
   * 4 resource instances configured
            *** Resource management is DISABLED ***
The cluster will not attempt to start, stop or recover services
Node List:
   * Online: [ gloin001 gloin002 ]
Full List of Resources:
   * r_vip_pub   (ocf:heartbeat:IPaddr2):         Started gloin001 (unmanaged)
   * r_vip_ha    (ocf:heartbeat:IPaddr2):         Started gloin001 (unmanaged)
   * Clone Set: ha_postgresql [r_postgresql] (promotable, unmanaged):
      * r_postgresql      (ocf:heartbeat:pgsqlms):         Unpromoted gloin002 (unmanaged)
      * r_postgresql      (ocf:heartbeat:pgsqlms):         Promoted gloin001 (unmanaged)
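The two status listings above differ by the "Resource management is DISABLED" banner, so a script can test for it before acting. The following is a minimal sketch under that assumption; the function name `in_maintenance` is hypothetical and not part of Pacemaker.

```shell
# Reads `crm status` output on stdin.
# Returns 0 if the maintenance banner is present, non-zero otherwise.
# The function name is illustrative; only the banner text comes from the output above.
in_maintenance() {
    grep -q 'Resource management is DISABLED'
}
```

It could be used as `crm status | in_maintenance && echo "cluster is in maintenance mode"`.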
Clear the Pacemaker error status of a resource
For example, to clear the error status of r_postgresql on gloin002:
crm_resource -r r_postgresql -H gloin002 -C
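If the cleanup is needed regularly, the command above can be wrapped so that the resource and node are passed as arguments. This is only a sketch: the wrapper name `cleanup_resource` is hypothetical, and it runs exactly the command shown above.

```shell
# Clear the error status of one resource on one node.
# Usage: cleanup_resource <resource> <node>
# Hypothetical helper; it only forwards its arguments to crm_resource.
cleanup_resource() {
    if [ "$#" -ne 2 ]; then
        echo "usage: cleanup_resource <resource> <node>" >&2
        return 2
    fi
    crm_resource -r "$1" -H "$2" -C
}
```

For the example above, the call would be `cleanup_resource r_postgresql gloin002`.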
Restore a PostgreSQL secondary from the primary
- Activate the Pacemaker maintenance mode
- Stop PostgreSQL via Pacemaker (here, the instance on gloin002)
crm --wait resource ban r_postgresql gloin002
Check the PostgreSQL logs to confirm that the instance has stopped.
If PostgreSQL does not stop, it can be forced with:
export VERSION=<version>
sudo -u postgres /usr/lib/postgresql/$VERSION/bin/pg_ctl -D /var/lib/postgresql/$VERSION/main stop
- Delete or move the content of the PostgreSQL data directory, /var/lib/postgresql/<version>/main
- Launch the restoration from the primary
sudo -u postgres pg_basebackup -h 10.25.1.1 -D /var/lib/postgresql/16/main/ -P -U replicator --wal-method=fetch
- Restore the nominal Pacemaker mode
PostgreSQL should restart and catch up on its replication lag.
- Check the Pacemaker status once the secondary is up to date
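One way to tell whether the secondary is up to date is to query `pg_stat_replication` on the primary, where a standby that has caught up is reported in the `streaming` state. The sketch below assumes local `psql` access as the `postgres` OS user on the primary. The function names are hypothetical, and the address `10.25.1.2` in the usage line is a placeholder for the secondary's real address.

```shell
# Run on the primary. Prints one line per connected standby:
# client address, replication state, replay lag (space separated).
# Assumes local psql access as the postgres OS user.
replication_status() {
    sudo -u postgres psql -At -F ' ' -c \
        "SELECT client_addr, state, replay_lag FROM pg_stat_replication;"
}

# Reads the output of replication_status on stdin.
# Returns 0 if the standby with the given address is in state "streaming".
standby_is_streaming() {
    awk -v addr="$1" '$1 == addr && $2 == "streaming" { found = 1 } END { exit !found }'
}
```

A possible use is `replication_status | standby_is_streaming 10.25.1.2 && crm status`.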