Article - CS438655
Issue with one or more Zookeeper nodes prevented ThingWorx Platform Active-Passive High Availability (HA) from failing over successfully
Modified: 04-Mar-2025
Applies To
- ThingWorx Platform 8.4 to 8.5
- Zookeeper
Description
- Attempting to complete maintenance tasks on a ThingWorx Platform Active-Passive High Availability (HA) configuration resulted in downtime, despite the proper number of nodes being available at all times
- Taking one of three available Zookeeper nodes offline for maintenance made ThingWorx Platform inaccessible
- Unexpected downtime when performing maintenance on ThingWorx Platform Active-Passive HA environment
- Only two of three Zookeeper nodes were part of the quorum, which resulted in downtime for ThingWorx Platform when one of the Zookeeper nodes went offline
- Ensured the following node counts were online and available in the ThingWorx Active-Passive HA configuration but still experienced unplanned downtime:
- 1 ThingWorx node
- 2 Zookeeper nodes
- Zookeeper logs indicate that only two of three nodes are part of the quorum:
[myid:<ZK ID>] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@1296] - Have quorum of supporters, sids: [ [<ZK ID 1> <ZK ID 2>],[<ZK ID 1>, <ZK ID 2>] ]; starting up and setting last processed zxid: 0x2900000000
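The log line above lists only two sids among the quorum supporters. Zookeeper serves requests only while a strict majority of the ensemble participates, which is why a three-node ensemble tolerates exactly one node being down. A minimal sketch of that arithmetic (the function names and node counts are illustrative, not from this article):

```python
# Illustrative sketch of Zookeeper's majority-quorum rule:
# an ensemble of n voting members needs a strict majority to serve requests.

def quorum_size(ensemble_size: int) -> int:
    """Minimum number of participating nodes for a working quorum."""
    return ensemble_size // 2 + 1

def has_quorum(participating: int, ensemble_size: int) -> bool:
    """True if enough nodes participate to keep the ensemble up."""
    return participating >= quorum_size(ensemble_size)

ensemble = 3                      # three-node ensemble, as in this article
print(quorum_size(ensemble))      # a 3-node ensemble needs 2 participants

# Expected case: all 3 nodes participate, so taking 1 offline is safe.
print(has_quorum(3 - 1, ensemble))

# Observed case: only 2 of 3 nodes were ever part of the quorum, so taking
# one of those 2 offline leaves 1 < 2 participants -> no quorum, and
# ThingWorx loses its coordination service.
print(has_quorum(2 - 1, ensemble))
```

This is why the maintenance plan looked safe on paper (3 nodes minus 1 still meets the majority of 2) but failed in practice: the node taken offline was one of only two actually participating in the quorum.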
- Restarting a Zookeeper node caused it to immediately form a quorum in which it was the leader, per the Zookeeper logs:
[myid:<ZK ID>] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@464] - LEADING - LEADER ELECTION TOOK - <Time> MS
- When restarting a single Zookeeper node, it should join the existing quorum as a FOLLOWER
This is a printer-friendly version of Article 438655 and may be out of date. For the latest version, click CS438655.