IBM MQ Multi-Instance-QMgr-Auto-Restart
Organisations that run Multi-Instance Queue Managers as an HA solution for IBM MQ often notice that once an MQ failover occurs, it leaves behind a defunct QMgr that cannot take over should a failback be needed. IBM MQ does not provide an autostart feature for the defunct QMgr. This is by design: an administrator is expected to investigate the reason for the failover, fix it, and then start the defunct QMgr in standby mode.
So far so good.
However, there are a few cases where the problem is transitory and goes away with an MQ restart or failover.
Let’s look at a few use cases:
When these cases are multiplied across hundreds of Multi-Instance QMgrs, managing them quickly becomes a challenge.
This solution is intended for all of those scenarios.
This shell/bash solution is designed to run as a process. Deploy and run it on the IBM MQ server nodes where Multi-Instance Queue Managers are configured.
The same process should run on both the Active and Standby nodes.
The solution has been tested on Linux (Red Hat Enterprise Linux and CentOS, versions 6.x and 7.x) with MQ versions 7.5.X.X and 8.XXX.
Copy MQ_Multi_Instance_Monitor.bash to both nodes where the MQ Multi-Instance Queue Manager (Active and Standby) is running.
(The location doesn’t matter.)
Start the program on both nodes.
Optional Customization:
The default directory, where this script gathers all of its data, is $HOME/MI. If you are happy with this location, no change is needed. Otherwise, see the variables from the original script listed below that can be altered.
./MQ_Multi_Instance_Monitor.bash start
./MQ_Multi_Instance_Monitor.bash stop
./MQ_Multi_Instance_Monitor.bash CHECK
[mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash
Specify only one argument:
USAGE: ./MQ_Multi_Instance_Monitor.bash START | STOP | CHECK
[mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash start
------------------------------------------------------------------------------------------------------------
Sat Feb 17 17:17:54 EST 2018:
Manual start attempted but Maintenance mode detected.
./MQ_Multi_Instance_Monitor.bash process not started
[touch /home/mqm/MI/LOCK_FILE.txt] to take it out of maintenance mode and Retry.
------------------------------------------------------------------------------------------------------------
[mqm@joker7 ~]$ touch /home/mqm/MI/LOCK_FILE.txt
[mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash Start
------------------------------------------------------------------------------------------------------------
Started MQ_Multi_Instance_Monitor Process Manually at Sat Feb 17 17:21:07 EST 2018
------------------------------------------------------------------------------------------------------------
[mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash Start
Process already Running.
[mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash stop
------------------------------------------------------------------------------------------------------------
Sat Feb 17 15:59:43 EST 2018:
Stopped MQ_Multi_Instance_Monitor Process Manually.
This can take up to 20 seconds to stop.
You may kill it for instant gratification.
------------------------------------------------------------------------------------------------------------
[mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash stop
MQ_Multi_Instance_Monitor Process Already in Stopped status
[mqm@joker7 ~]$
[mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash check
UID PID STIME
mqm 19598 12:50
[mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash check
PROCESS MQ_Multi_Instance_Monitor NOT RUNNING
[mqm@joker7 ~]$
Puts failed-over Multi-Instance QMgr(s) with status “RUNNING ELSEWHERE” into STANDBY MODE
Creates the directory $FAILOVER_ACTIVITY_DIR and the file $ACTIVITY_FILE (if they don’t already exist)
Keeps $ACTIVITY_FILE from growing beyond 150KB (deprecated feature)
Logs all MQ failover activity on both nodes with a timestamp for later review.
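The behaviour listed above can be sketched in a few shell functions. This is a minimal illustration, not the code of MQ_Multi_Instance_Monitor.bash: the function names (list_running_elsewhere, log_activity, restart_as_standby) are hypothetical, and it assumes the status is read from `dspmq -o status` and that the standby instance is started with `strmqm -x`.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from the original script.

# Read `dspmq -o status` style output on stdin and print the name of every
# queue manager whose status is exactly "Running elsewhere".
list_running_elsewhere() {
    sed -n 's/^[[:space:]]*QMNAME(\([^)]*\)).*STATUS(Running elsewhere).*$/\1/p'
}

# Append a timestamped line to the activity file, creating the directory
# and the file first if they do not exist yet.
log_activity() {
    local activity_file=$1
    shift
    mkdir -p "$(dirname "$activity_file")"
    printf '%s: %s\n' "$(date)" "$*" >> "$activity_file"
}

# Restart every "Running elsewhere" queue manager with `strmqm -x`, which
# permits a standby instance. Requires the IBM MQ commands on the PATH.
restart_as_standby() {
    local activity_file=$1 qmgr
    dspmq -o status | list_running_elsewhere | while read -r qmgr; do
        log_activity "$activity_file" "Restarting $qmgr as standby"
        strmqm -x "$qmgr" >> "$activity_file" 2>&1
    done
}
```

On a node with IBM MQ installed, calling `restart_as_standby "$HOME/MI/Activity_trail.txt"` would run one pass. Parsing is kept separate from the MQ commands so the filter can be checked against sample `dspmq` output without a queue manager.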
If you have to stop a QMgr normally or immediately using endmqm, this process won’t interfere: the QMgrs (Active and Standby) end normally on both nodes. MQ_Multi_Instance_Monitor.bash acts only on QMgrs with STATUS(Running elsewhere). Still, it may be a good idea to stop this process as well while you’re servicing IBM MQ.
The MQ_Multi_Instance_Monitor.bash service/daemon polls every 20 seconds. You can change that by editing the POLLING_INTERVAL variable in the script.
Single Instance QMgrs are not affected by this service.
To check a process’s CPU/memory usage in real time, run top -p PID, where PID is the process ID of MQ_Multi_Instance_Monitor.bash.
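The polling behaviour described in these notes can be pictured as a loop gated on the lock file. The sketch below is an assumption based on the start/stop transcript (a missing LOCK_FILE.txt means maintenance mode, and a stop can take up to one polling interval). The name run_monitor_loop and its arguments are hypothetical, and the original script may be structured differently.

```shell
#!/usr/bin/env bash
# Sketch only: run_monitor_loop and its arguments are hypothetical names.

# Call "$action" every "$interval" seconds for as long as "$lock_file" exists.
# Removing the lock file makes the loop exit at its next check, which is why
# a stop request can take up to one polling interval to take effect.
run_monitor_loop() {
    local lock_file=$1 interval=$2 action=$3
    while [ -f "$lock_file" ]; do
        "$action"
        sleep "$interval"
    done
}
```

A call such as `run_monitor_loop "$LOCK_FILE" "$POLLING_INTERVAL" some_check_function` would keep checking until the lock file is removed. Taking the action as a function name keeps the loop independent of the MQ commands it drives.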
FAILOVER_ACTIVITY_DIR=$HOME/MI
LOCK_FILE=$FAILOVER_ACTIVITY_DIR/LOCK_FILE.txt
ACTIVITY_FILE=${FAILOVER_ACTIVITY_DIR}/Activity_trail.txt
POLLING_INTERVAL=20
POLLING_INTERVAL=20
FAILOVER_ACTIVITY_DIR=$HOME/MI
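If you prefer not to edit the script for each deployment, one option is to let the environment override the two alterable variables. The following is a sketch of that idea, not how the original script is written (it assigns the values directly). It uses the standard `${VAR:-default}` expansion.

```shell
#!/usr/bin/env bash
# Sketch only: the original script assigns these values directly.
# ${VAR:-default} keeps a value already set in the environment and
# falls back to the documented default otherwise.
FAILOVER_ACTIVITY_DIR=${FAILOVER_ACTIVITY_DIR:-$HOME/MI}
POLLING_INTERVAL=${POLLING_INTERVAL:-20}

# Derived paths follow whichever directory was chosen above.
LOCK_FILE=$FAILOVER_ACTIVITY_DIR/LOCK_FILE.txt
ACTIVITY_FILE=$FAILOVER_ACTIVITY_DIR/Activity_trail.txt
```

With this pattern, exporting `POLLING_INTERVAL=60` before starting the script would change the interval without touching the file.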
Failover MI QMgr [endmqm -s QMgrName]
Stop defunct MI QMgr [endmqm -x QMgrName]
Stop Single Instance QMgr [endmqm QMgrName]