Project author: Raman-Kathpalia

Project description:
IBM MQ Multi-Instance-QMgr-Auto-Restart
Language: Shell
Project URL: git://github.com/Raman-Kathpalia/IBM-MQ-Multi-Instance-QMgr-Automatic-Start.git


This is the README for the program MQ_Multi_Instance_Monitor.bash

MQ_Multi_Instance_Monitor.bash is an enhanced version of StartStandby.bash

(offering the same core functionality as StartStandby.bash)

By - Raman Kathpalia. IBM MQ SME & an Automation Enthusiast

This is a high-level solution. You may use it as is or customize it. No warranties.

What is new in MQ_Multi_Instance_Monitor.bash (Compared to StartStandby.bash)

1. Simple Operation. [start | stop | check] Arguments introduced
2. Ease of putting the program into Maintenance mode
3. Single-threaded operation. Multi-threaded operation is overkill, so this program does not spawn multiple threads
4. Penguin replaces the crazy cat :)

Introduction:

Organisations that operate Multi-Instance Queue Managers as an HA solution for IBM MQ often notice that once an MQ failover occurs, it leaves behind a defunct QMgr that is not capable of taking over should another failover occur. IBM MQ does not offer an autostart feature for the defunct QMgr. This is by design: one should manually investigate the reason for the failover, fix it, and then start the defunct QMgr in standby mode.

So far so good.

However, there are a few cases where the problem is transitory and goes away with an MQ restart/fail-over.

Let’s look at a few use cases -

  1. The underlying NAS storage is being serviced and causes the Active QMgr instance to fail over
  2. An application bug causes MQ to become non-responsive, but an MQ restart/fail-over fixes the problem
  3. Please feel free to add more cases that you’ve witnessed

Combine these few cases with hundreds of Multi-Instance QMgrs, and managing them quickly becomes a challenge.

This solution can be used for all of those scenarios.

This shell/bash solution is designed to run as a process. This solution should be deployed and run on IBM MQ server nodes where Multi-Instance Queue Managers are configured.
The same process should run on both Active and Standby nodes.

The solution has been tested on Linux (RedHat Enterprise Linux and CentOS, versions 6.x and 7.x) with MQ versions 7.5.X.X and 8.XXX

With this piece of code running alongside Multi-Instance MQ, one can bring HA for MQ closer to traditional vendor-based HA solutions, such as RedHat Cluster Suite or VCS with MQ, to name a few.

What does this process/daemon do? (High level)

  • Puts QMgr(s) with status “Running elsewhere” into “Running as standby”. The secondary QMgr is thus ready to take over should a fail-over reoccur.
  • Writes every MQ fail-over activity performed to a log for later review/audit.
  • CPU consumption by this process is low (< 0.01%), as observed on a 2-CPU Intel Xeon machine with aggressive polling (every 10 seconds).
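
As a rough illustration of the first bullet above, here is one way to find queue managers reported as “Running elsewhere” and start them as standby instances. This is a sketch, not the code from MQ_Multi_Instance_Monitor.bash. The function names list_running_elsewhere and restart_as_standby are made up for this example, and it assumes dspmq prints one QMNAME(...) STATUS(...) line per queue manager.

```shell
# Hypothetical helpers for illustration only; not taken from the original script.

# Read `dspmq` output on stdin and print the name of every queue manager
# whose status is "Running elsewhere", one name per line.
list_running_elsewhere() {
    sed -n 's/^[[:space:]]*QMNAME(\([^)]*\)).*STATUS(Running elsewhere).*$/\1/p'
}

# Start a standby instance for each queue manager found above.
# `strmqm -x` starts an instance that is permitted to run as a standby.
restart_as_standby() {
    dspmq | list_running_elsewhere | while read -r qmgr; do
        strmqm -x "$qmgr"
    done
}
```

Calling restart_as_standby only makes sense on a node where the MQ commands dspmq and strmqm are on the PATH and the user belongs to the mqm group. The parsing function can be tried anywhere by piping sample dspmq output into it.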

How to Install this program?

  • Copy MQ_Multi_Instance_Monitor.bash to both nodes where the MQ Multi-Instance Queue Manager instances (Active and Standby) are running.
    (Location doesn’t matter)

  • Start the program on both nodes.

Optional Customization:

  • By default, all data gathered by this script is stored in $HOME/MI. If you are happy with this location, no change is needed. Otherwise, see “Variables used in original script that could be altered” below.

  • By default, this process polls every 20 seconds. If you’re happy with this value, no change is needed. Otherwise, see “Variables used in original script that could be altered” below.

Question: How to Start or Stop this Process?

TO START Process:

  ./MQ_Multi_Instance_Monitor.bash start

TO STOP Process:

  ./MQ_Multi_Instance_Monitor.bash stop

TO CHECK Process:

  ./MQ_Multi_Instance_Monitor.bash CHECK

Note: The case of the argument [start|stop|check] does not matter. You can type it in any mix of upper and lower case.
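
Case-insensitive argument handling like this can be done in several ways. The sketch below shows one of them, upper-casing the argument with tr before matching it. The function name parse_action is hypothetical, and the original script may do this differently.

```shell
# Hypothetical helper for illustration only; not taken from the original script.

# Print START, STOP or CHECK for a valid argument in any letter case.
# Print the usage text and return 1 for anything else.
parse_action() {
    if [ "$#" -ne 1 ]; then
        echo "USAGE: $0 START | STOP | CHECK" >&2
        return 1
    fi
    action=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case "$action" in
        START|STOP|CHECK) printf '%s\n' "$action" ;;
        *) echo "USAGE: $0 START | STOP | CHECK" >&2
           return 1 ;;
    esac
}
```

With this helper, start, Start and START all resolve to the same action, while a missing or unknown argument produces the usage line.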

EXAMPLE: DRY-RUN

  [mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash
  Specify only one argument:
  USAGE: ./MQ_Multi_Instance_Monitor.bash START | STOP | CHECK

EXAMPLE: START

  [mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash start
  ------------------------------------------------------------------------------------------------------------
  Sat Feb 17 17:17:54 EST 2018:
  Manual start attempted but Maintenance mode detected.
  ./MQ_Multi_Instance_Monitor.bash process not started
  [touch /home/mqm/MI/LOCK_FILE.txt] to take it out of maintenance mode and Retry.
  ------------------------------------------------------------------------------------------------------------

Note: This process runs on the concept of lock file monitoring. The process won’t start unless the lock file is present. This is a safety measure against an inadvertent start. Any attempt to start without the lock file present is reported in the Activity Log along with a timestamp.

  [mqm@joker7 ~]$ touch /home/mqm/MI/LOCK_FILE.txt
  [mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash Start
  ------------------------------------------------------------------------------------------------------------
  Started MQ_Multi_Instance_Monitor Process Manually at Sat Feb 17 17:21:07 EST 2018
  ------------------------------------------------------------------------------------------------------------
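
The lock file check described in the note above could look roughly like the following. This is a minimal sketch under assumptions: guard_start is a hypothetical name, both paths are passed in as arguments, and the message only approximates what the real script writes.

```shell
# Hypothetical helper for illustration only; not taken from the original script.

# guard_start LOCK_FILE ACTIVITY_FILE
# Return 0 when the lock file exists. Otherwise append a timestamped
# refusal to the activity file and return 1.
guard_start() {
    local lock_file="$1"
    local activity_file="$2"
    if [ -f "$lock_file" ]; then
        return 0
    fi
    mkdir -p "$(dirname "$activity_file")"
    {
        echo "$(date):"
        echo "Manual start attempted but Maintenance mode detected."
        echo "[touch $lock_file] to take it out of maintenance mode and Retry."
    } >> "$activity_file"
    return 1
}
```

Under this design, removing the lock file is what puts the program into maintenance mode, and touching it again allows the next start.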

The program won’t start another instance if one is already running. Multi-threaded operation is not needed and, frankly, is overkill.

  [mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash Start
  Process already Running.

Process continues to run even after the user logs out.

EXAMPLE: STOP

  [mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash stop
  ------------------------------------------------------------------------------------------------------------
  Sat Feb 17 15:59:43 EST 2018:
  Stopped MQ_Multi_Instance_Monitor Process Manually.
  This can take up to 20 seconds to stop.
  You may kill it for instant gratification.
  ------------------------------------------------------------------------------------------------------------

If Process is already stopped:

  [mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash stop
  MQ_Multi_Instance_Monitor Process Already in Stopped status
  [mqm@joker7 ~]$

EXAMPLE: CHECK

  [mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash check
  UID PID STIME
  mqm 19598 12:50

UID - User ID. This must be part of the mqm group if an ID other than mqm is being used.
PID - Process ID.
STIME - Start time. Indicates when the program was started, and therefore how long it has been running.

If Process is already stopped:

  [mqm@joker7 ~]$ ./MQ_Multi_Instance_Monitor.bash check
  PROCESS MQ_Multi_Instance_Monitor NOT RUNNING
  [mqm@joker7 ~]$
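
A check of this kind can be approximated as shown below. This is only a sketch: check_process is a hypothetical name, the PID is assumed to be known already (how the real script finds it is not described here), and the ps column list is chosen to resemble the output above on Linux.

```shell
# Hypothetical helper for illustration only; not taken from the original script.

# check_process PID
# Print user, PID and start time if the process is alive; otherwise print
# the "NOT RUNNING" message and return 1.
# Note: kill -0 only tests for existence when the caller is allowed to
# signal the process, for example when both run as the same user.
check_process() {
    local pid="$1"
    if kill -0 "$pid" 2>/dev/null; then
        ps -o user,pid,stime -p "$pid"
    else
        echo "PROCESS MQ_Multi_Instance_Monitor NOT RUNNING"
        return 1
    fi
}
```

Running check_process with the PID of the current shell prints a header line and one row, while a PID that does not exist yields the NOT RUNNING message.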

What does this process do? - Step by Step

  1. Puts failed-over Multi-Instance QMgr(s) with status “RUNNING ELSEWHERE” into STANDBY MODE.

  2. Creates the directory $FAILOVER_ACTIVITY_DIR and the file $ACTIVITY_FILE (if they don’t already exist).

  3. Keeps $ACTIVITY_FILE from growing beyond 150 KB.

  4. (deprecated feature)

  5. Logs all MQ failover activity on both nodes with a timestamp for later review.

  6. If you have to stop a QMgr normally/immediately using endmqm, this process won’t interfere. QMgrs (Active and Standby) end normally on both nodes. MQ_Multi_Instance_Monitor.bash acts only on QMgrs with STATUS(Running elsewhere). It may still be a good idea to stop this process as well if you’re servicing IBM MQ.

  7. The service/daemon MQ_Multi_Instance_Monitor.bash polls every 20 seconds. You can change that in the script by altering the POLLING_INTERVAL variable.

  8. Single-Instance QMgrs are not affected by this service.

  9. To check the CPU/memory usage of a process in real time, run top -p PID, where PID is the process ID. Use the PID of MQ_Multi_Instance_Monitor.bash.

  10. No code change is necessary from one node to another, irrespective of the Queue Manager names on individual boxes.

No hard-coding of QMgrNames needed anywhere in solution.

No configuration file(s) need be supplied.
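
Step 3 above (keeping the activity file under 150 KB) is not spelled out in this README, so here is one possible approach, offered as an assumption. When the file exceeds the limit, the hypothetical cap_activity_file function below keeps only the newest half of the allowed size. The real script may rotate or truncate differently.

```shell
# Hypothetical helper for illustration only; not taken from the original script.

# cap_activity_file FILE [MAX_BYTES]
# If FILE is larger than MAX_BYTES (default 153600, i.e. 150 KB), replace it
# with its last MAX_BYTES/2 bytes so the most recent entries survive.
cap_activity_file() {
    local file="$1"
    local max_bytes="${2:-153600}"
    local size tmp
    [ -f "$file" ] || return 0
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -gt "$max_bytes" ]; then
        tmp=$(mktemp "${file}.XXXXXX") || return 1
        tail -c "$((max_bytes / 2))" "$file" > "$tmp" && mv "$tmp" "$file"
    fi
}
```

Keeping the tail rather than the head means the most recent failover entries are the ones preserved for review. A byte-based cut can split the oldest remaining line, which is usually acceptable for an audit trail.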

Variables used in original script

  FAILOVER_ACTIVITY_DIR=$HOME/MI
  LOCK_FILE=$FAILOVER_ACTIVITY_DIR/LOCK_FILE.txt
  ACTIVITY_FILE=${FAILOVER_ACTIVITY_DIR}/Activity_trail.txt
  POLLING_INTERVAL=20

Variables used in original script that could be altered

You can change the POLLING_INTERVAL value. The default is 20 seconds.

POLLING_INTERVAL=20

You can choose where to put the data generated by MQ_Multi_Instance_Monitor.bash

FAILOVER_ACTIVITY_DIR=$HOME/MI

You don’t have to create this directory though. It will be automatically created.
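
To show how these variables might fit together, the sketch below runs an action every POLLING_INTERVAL seconds for as long as the lock file exists. Two things here are assumptions and not confirmed by the original script: that the loop ends when the lock file disappears, and that the variables can be overridden from the environment. The function name poll_while_locked is hypothetical.

```shell
# Sketch only. Defaults mirror the variables listed above; the ${VAR:-default}
# form (an addition for this example) lets the environment override them.
FAILOVER_ACTIVITY_DIR=${FAILOVER_ACTIVITY_DIR:-$HOME/MI}
LOCK_FILE=${LOCK_FILE:-$FAILOVER_ACTIVITY_DIR/LOCK_FILE.txt}
POLLING_INTERVAL=${POLLING_INTERVAL:-20}

# poll_while_locked ACTION
# Run the command or function named ACTION, then sleep POLLING_INTERVAL
# seconds, and repeat for as long as LOCK_FILE exists.
poll_while_locked() {
    local action="$1"
    while [ -f "$LOCK_FILE" ]; do
        "$action"
        sleep "$POLLING_INTERVAL"
    done
}
```

A loop shaped like this would explain why a stop can take up to one polling interval to take effect: the removal of the lock file is only noticed at the next iteration.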

NOTES:

To learn more on IBM MQ High Availability, visit here
For curious minds: VCS Vs. Oracle RAC
How to manually fail over a Multi-Instance Queue Manager:
  1. Failover MI QMgr [endmqm -s QMgrName]
  2. Stop defunct MI QMgr [endmqm -x QMgrName]
  3. Stop Single Instance QMgr [endmqm QMgrName]