how to promote master, after failover on postgresql with docker how to promote master, after failover on postgresql with docker postgresql postgresql

how to promote master, after failover on postgresql with docker


When the Master fails the PostgreSQL cluster elects another master from the stanby nodes (based on the node weight in the cluster). So when the ex-master is finally brought back to life the cluster remains loyal to it's current master ,however the ex-master is initiated back in the cluster but this time as a standby. All of that is completely managed by the PostgreSQL and not the Pgpool.

so what you would expect is that if the new master (ex-standby) fails (or scaled to 0) then the cluster would failover to the ex-master and elect it as a leader once again and when the standby is scaled up again it would join as a standby and things are back to normal. And that is exactly what the PostgreSQL cluster would do.

But most probably the Pgpool service would fail at that moment because whenever a node fails the Pgpool sign that node status as DOWN and even if that node comes back to live it will not notify the pgpool and your traffic would not reach that node.

so if you checked the recovered node status - after its recovery - on the pgpool container using the PCP commands :

pcp_node_info -U pcp_user -h localhost -n 1 # master node idpgmaster 5432 down 1

so what you have to do is to re-attach the once fallen node back to the Pgpool manually using:

pcp_attach_node -U pcp_user -h localhost -n 1 # master node id--- executed successfully ---pcp_node_info -U pcp_user -h localhost -n 1 # master node idpgmaster 5432 up 1

At this point the pgpool recognizes the ex-master node once again and can direct traffic to it.

After that whenever you remove ( scale to 0 ) the ex-standby (now master) service the whole solution (PostgreSQL - Pgpool) would failover to the actual master and now you could bring standby up again and re-attach it to the pgpool.

P.S. The downtime is only the failover downtime of the pgpool and the pgpool service will maintain its original configuration nothing added nothing restarted (well except for the PostgresQL node that failed hopefully :D ).


So i figure out how to sort of solve the problem,

  1. Create the containers manually

    Master docker run \ -e INITIAL_NODE_TYPE='master' \ -e NODE_ID=1 \ -e NODE_NAME='node1' \ -e CLUSTER_NODE_NETWORK_NAME='pgmaster' \ -e POSTGRES_PASSWORD='monkey_pass' \ -e POSTGRES_USER='monkey_user' \ -e POSTGRES_DB='monkey_db' \ -e CLUSTER_NODE_REGISTER_DELAY=5 \ -e REPLICATION_DAEMON_START_DELAY=120 \ -e CLUSTER_NAME='pg_cluster' \ -e REPLICATION_DB='replication_db' \ -e REPLICATION_USER='replication_user' \ -e REPLICATION_PASSWORD='replication_pass' \ -v cluster-archives:/var/cluster_archive \ -p 5432:5432 \ --net mynet \ --net-alias pgmaster \ --name pgmastertest \paunin/postgresql-cluster-pgsql

    Slave docker run \ -e INITIAL_NODE_TYPE='standby' \ -e NODE_ID=2 \ -e NODE_NAME='node2' \ -e REPLICATION_PRIMARY_HOST='pgmaster' \ -e CLUSTER_NODE_NETWORK_NAME='pgslave1' \ -e REPLICATION_UPSTREAM_NODE_ID=1 \ -v cluster-archives:/var/cluster_archive \ -p 5441:5432 \ --net mynet \ --net-alias pgslave1 \ --name pgslavetest \paunin/postgresql-cluster-pgsql

    Pgpool
    docker run \ -e PCP_USER='pcp_user' \ -e PCP_PASSWORD='pcp_pass' \ -e PGPOOL_START_DELAY=120 \ -e REPLICATION_USER='replication_user' \ -e REPLICATION_PASSWORD='replication_pass' \ -e SEARCH_PRIMARY_NODE_TIMEOUT=5 \ -e DB_USERS='monkey_user:monkey_pass' \ -e BACKENDS='0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::' \ -p 5430:5432 \ -p 9898:9898 \ --net mynet \ --net-alias pgpool \ --name pgpooltest \paunin/postgresql-cluster-pgpool

on the line BACKENDS='0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::' \ you can add more slaves to pgppool

  1. Stop master pgmaster, slave pgslave1 would be promoted after a few secs,
  2. Add new slave container docker run \ -e INITIAL_NODE_TYPE='standby' \ -e NODE_ID=3 \ -e NODE_NAME='node1' \ -e REPLICATION_PRIMARY_HOST='pgslave1' \ -e CLUSTER_NODE_NETWORK_NAME='pgmaster' \ -e REPLICATION_UPSTREAM_NODE_ID=2 \ -v cluster-archives:/var/cluster_archive \ -p 5432:5432 \ --net mynet \ --net-alias pgmaster \ --name pgmastertest3 \paunin/postgresql-cluster-pgsql

on the following lines -e REPLICATION_PRIMARY_HOST='pgslave1' \ make sure you are pointing to the alias of the new master (pgslave1). -e REPLICATION_UPSTREAM_NODE_ID=2 \ make sure you are pointing to the new master node id (2). -e NODE_ID=3 \ make sure this id doesn't exists on the table repl_nodes.--net-alias pgmaster \ u can use the one from your old master, or use one that you already added on pgpool BACKENDS='0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::' \ otherwise if the new master fails repmgr wont be able to recover it.

Its a little manual, but it does what i need, and thats to add a new slave to the new master.