Project author: aerospike-community

Project description:
Aerospike Nagios Integration - a community-driven open source project
Language: Python
Project URL: git://github.com/aerospike-community/aerospike-nagios.git
Created: 2014-04-17T23:12:40Z
Project community: https://github.com/aerospike-community/aerospike-nagios

License: Apache License 2.0


Introduction

aerospike_nagios.py simplifies Nagios configuration for Aerospike clusters.
The goal is to reduce the setup to two simple steps:

  1. Copy aerospike_nagios.py and its dependencies to your Nagios server
  2. Add the Aerospike configs to Nagios

Aerospike Monitoring Stack

For monitoring and alerting, consider the Prometheus- and Grafana-based Aerospike Monitoring Stack. This is the monitoring solution being developed by Aerospike.

Community Development

This repository has been turned over to the community. If you wish to contribute code, go ahead and clone this repo, modify the code, and create a pull request.

Active contributors can then ask to become maintainers for the repo. The wiki can similarly be modified by any code contributor who has been granted pull permissions.

Compatibility

Fully compatible with Aerospike Server 4.0 - 5.1.0.7

Features

  • Can monitor any stat returned by
    • $ asinfo -v 'statistics' [-h host]
    • $ asinfo -v 'namespace/<NAMESPACE NAME>' [-h host]
    • $ asinfo -v 'sets/<NAMESPACE NAME>/<SET NAME>' [-h host]
    • $ asinfo -v 'bins/<NAMESPACE NAME>' [-h host]
    • $ asinfo -v 'sindex/<NAMESPACE NAME>/<SINDEX NAME>' [-h host]
    • $ asinfo -v 'get-stats:context=xdr;dc=<DATACENTER>' [-h host] or $ asinfo -v 'dc/<DATACENTER>' [-h host]
    • $ asinfo -v 'latency:hist=<LATENCY STAT>' [-h host] or $ asinfo -v 'latencies:hist=<LATENCY STAT>' [-h host]
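
As a rough illustration of how the contexts listed above map to info requests, the sketch below builds the string passed to asinfo -v. The helper name info_command, its parameters, and the alternate_form switch are hypothetical and are not taken from aerospike_nagios.py; only the request strings come from the list above.

```python
def info_command(namespace=None, set_name=None, bins=False, sindex=None,
                 dc=None, latency=None, alternate_form=False):
    """Return the request string for `asinfo -v`, chosen by context.

    Hypothetical helper: it mirrors the commands in the feature list,
    not the plugin's internal code. `alternate_form` selects the second
    of the two spellings listed for XDR and latency requests.
    """
    if dc is not None:
        if alternate_form:
            return "dc/{}".format(dc)
        return "get-stats:context=xdr;dc={}".format(dc)
    if latency is not None:
        prefix = "latencies" if alternate_form else "latency"
        return "{}:hist={}".format(prefix, latency)
    if namespace is None:
        # No namespace given: fall back to the node-wide statistics.
        return "statistics"
    if set_name is not None:
        return "sets/{}/{}".format(namespace, set_name)
    if bins:
        return "bins/{}".format(namespace)
    if sindex is not None:
        return "sindex/{}/{}".format(namespace, sindex)
    return "namespace/{}".format(namespace)


print(info_command(namespace="test", set_name="testSet"))
```

Which of the two XDR and latency spellings a given server accepts depends on its version, so the sketch leaves that choice to the caller.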

Requirements

Additional Python modules are required; install them with pip:

    sudo pip install -r requirements.txt

See requirements.txt.

Getting Started

  1. Copy aerospike_nagios.py to your preferred scripts directory

    eg: /opt/aerospike/bin/

  2. Copy aerospike_schema.yaml and ssl directory to the same directory

  3. Copy examples/aerospike.cfg into your nagios conf.d directory

    /etc/nagios/conf.d if installed from repo
    /usr/local/nagios/etc/objects if installed from source

Note: If you are using NRPE to monitor a remote client, use the contents
of nrpe/ instead. Information on configuring NRPE can be found in the NRPE
documentation.

  4. Edit aerospike.cfg (the copy made in step 3)

    • Add your Aerospike hosts to the hostgroup
    • Change the examples to reflect your current Aerospike configuration
  5. Edit nagios.cfg by adding a cfg_file directive that points to the location of aerospike.cfg.

    cfg_file=/etc/nagios/conf.d/aerospike.cfg if installed from repo
    cfg_file=/usr/local/nagios/etc/objects/aerospike.cfg if installed from source
    Note: Not required if the cfg_dir directive is being used.

  6. Restart/reload Nagios
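
Before restarting, it can help to confirm that nagios.cfg actually picks up aerospike.cfg through either a cfg_file or a cfg_dir directive. The following is a minimal sketch under stated assumptions: config_is_included is a hypothetical helper, it only inspects the text of nagios.cfg, and for cfg_dir it only checks the direct parent directory of the target file.

```python
import os


def config_is_included(nagios_cfg_text, target):
    """Return True if the nagios.cfg text references `target`.

    A match is either cfg_file=<target> or cfg_dir=<directory holding target>.
    Hypothetical helper for illustration; it does not run Nagios itself.
    """
    target = os.path.normpath(target)
    for raw in nagios_cfg_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        value = os.path.normpath(value)
        if key == "cfg_file" and value == target:
            return True
        if key == "cfg_dir" and os.path.dirname(target) == value:
            return True
    return False


sample = "# object configuration\ncfg_dir=/etc/nagios/conf.d\n"
print(config_is_included(sample, "/etc/nagios/conf.d/aerospike.cfg"))
```

In practice you would read the real file, for example with open("/etc/nagios/nagios.cfg").read(); that path is an assumption and varies by installation.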

Aerospike Nagios Plugin

aerospike_nagios.py is the file that Nagios schedules to run
queries against Aerospike. Other than copying it to the appropriate location,
you are not required to interact with it.

Usage

    $ python /opt/aerospike/bin/aerospike_nagios.py --help
    usage: aerospike_nagios.py [-u] [-U USER] [-P [PASSWORD]]
                               [--credentials-file CREDENTIALS]
                               [--auth-mode AUTH_MODE] [-v]
                               [-n NAMESPACE | -l LATENCY | -x DC]
                               [-t SET | -b | -i SINDEX] -s STAT [-p PORT]
                               [-h HOST] -c CRIT -w WARN [--timeout TIMEOUT]
                               [--tls-enable] [--tls-name TLS_NAME]
                               [--tls-keyfile TLS_KEYFILE]
                               [--tls-keyfile-pw TLS_KEYFILE_PW]
                               [--tls-certfile TLS_CERTFILE]
                               [--tls-cafile TLS_CAFILE] [--tls-capath TLS_CAPATH]
                               [--tls-ciphers TLS_CIPHERS]
                               [--tls-protocols TLS_PROTOCOLS]
                               [--tls-cert-blacklist TLS_CERT_BLACKLIST]
                               [--tls-crl-check] [--tls-crl-check-all]

    optional arguments:
      -u, --usage, --help   Show this help message and exit
      -U USER, --user USER  user name
      -P [PASSWORD], --password [PASSWORD]
                            password
      --credentials-file CREDENTIALS
                            Path to the credentials file. Use this in place of
                            --user and --password.
      --auth-mode AUTH_MODE
                            Authentication mode. Values: ['EXTERNAL',
                            'EXTERNAL_INSECURE', 'INTERNAL'] (default: INTERNAL)
      -v, --verbose         Enable verbose logging
      -n NAMESPACE, --namespace NAMESPACE
                            Namespace name. eg: bar
      -l LATENCY, --latency LATENCY
                            Histogram name e.g. {test}-write Options: see output
                            of "asinfo -v 'latency:hist' -l" or "asinfo -v
                            'latencies:hist' -l"
      -x DC, --xdr DC       Datacenter name. eg: myDC1.
      -t SET, --set SET     Set name. eg: testSet. Statistic for a particular set
                            in a particular namespace.
      -b, --bin             Bin usage information for a particular namespace.
      -i SINDEX, --sindex SINDEX
                            Secondary Index name. eg: age. Statistic for a
                            particular secondary index in a particular namespace.
      -s STAT, --stat STAT  Statistic name or in the case of --latency, as bucket.
                            eg: cluster_size or 1ms
      -p PORT, ---port PORT
                            PORT for Aerospike server (default: 3000)
      -h HOST, --host HOST  HOST for Aerospike server (default: 127.0.0.1)
      -c CRIT, --critical CRIT
                            Critical level
      -w WARN, --warning WARN
                            Warning level
      --timeout TIMEOUT     Set timeout value in seconds to node level operations.
                            TLS connection does not support timeout. (default: 5)
      --tls-enable          Enable TLS
      --tls-name TLS_NAME   The expected name on the server side certificate
      --tls-keyfile TLS_KEYFILE
                            The private keyfile for your client TLS Cert
      --tls-keyfile-pw TLS_KEYFILE_PW
                            Password to load protected tls-keyfile
      --tls-certfile TLS_CERTFILE
                            The client TLS cert
      --tls-cafile TLS_CAFILE
                            The CA for the server's certificate
      --tls-capath TLS_CAPATH
                            The path to a directory containing CA certs and/or
                            CRLs
      --tls-ciphers TLS_CIPHERS
                            Ciphers to include. See
                            https://www.openssl.org/docs/man1.1.0/man1/ciphers.html
                            for cipher list format
      --tls-protocols TLS_PROTOCOLS
                            The TLS protocol to use. Available choices: TLSv1,
                            TLSv1.1, TLSv1.2, all. An optional + or - can be
                            appended before the protocol to indicate specific
                            inclusion or exclusion.
      --tls-cert-blacklist TLS_CERT_BLACKLIST
                            Blacklist including serial number of certs to revoke
      --tls-crl-check       Checks SSL/TLS certs against vendor's Certificate
                            Revocation Lists for revoked certificates. CRLs are
                            found in path specified by --tls-capath. Checks the
                            leaf certificates only
      --tls-crl-check-all   Check on all entries within the CRL chain

    -U user (Enterprise only)
    -P password (Enterprise only)

Examples

To monitor a specific general statistic:

    aerospike_nagios.py -h YOUR_ASD_HOST -s STAT_NAME -w WARN_LEVEL -c CRIT_LEVEL

To monitor a specific metric in a namespace:

    aerospike_nagios.py -h YOUR_ASD_HOST -s STAT_NAME -n YOUR_NAMESPACE -w WARN_LEVEL -c CRIT_LEVEL

To monitor a specific metric in a set:

    aerospike_nagios.py -h YOUR_ASD_HOST -s STAT_NAME -n YOUR_NAMESPACE -t YOUR_SET -w WARN_LEVEL -c CRIT_LEVEL

To monitor a specific metric in bins:

    aerospike_nagios.py -h YOUR_ASD_HOST -s STAT_NAME -n YOUR_NAMESPACE -b -w WARN_LEVEL -c CRIT_LEVEL

To monitor a specific metric in a sindex:

    aerospike_nagios.py -h YOUR_ASD_HOST -s STAT_NAME -n YOUR_NAMESPACE -i YOUR_SINDEX -w WARN_LEVEL -c CRIT_LEVEL

To monitor a specific metric in xdr:

    aerospike_nagios.py -h YOUR_ASD_HOST -s STAT_NAME -x DATACENTER -w WARN_LEVEL -c CRIT_LEVEL

To monitor latency statistics (ASD <= 3.9):

    aerospike_nagios.py -h YOUR_ASD_HOST -s BUCKET -l HISTOGRAM -w WARN_LEVEL -c CRIT_LEVEL

For possible values of HISTOGRAM, run "asinfo -v 'latency:hist' -l".
BUCKETS = <1ms|8ms|64ms>

To monitor latency statistics (ASD > 3.9):

    aerospike_nagios.py -h YOUR_ASD_HOST -s BUCKET -l {NAMESPACE}-HISTOGRAM -w WARN_LEVEL -c CRIT_LEVEL

For possible values of HISTOGRAM, run "asinfo -v 'latency:hist' -l" or "asinfo -v 'latencies:hist' -l".
eg: aerospike_nagios.py -h localhost -s 1ms -l {test}-read -w 8 -c 10
BUCKETS = <1ms|8ms|64ms> for (3.9 < ASD < 5.1)
BUCKETS = <2**i from i = 0 to i = 17> for (ASD >= 5.1)
Note: For ASD 5.1+, bucket units can also be in microseconds if the histogram
is configured accordingly. For example, 65536us could be a valid BUCKET.
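
To make the bucket rules above concrete, the sketch below enumerates the bucket names that can be passed to -s together with -l. The function name latency_buckets and its parameters are hypothetical; the values follow the BUCKETS lines above, and the unit argument reflects the note that microsecond buckets depend on how the histogram is configured.

```python
def latency_buckets(asd_51_or_later=True, unit="ms"):
    """List candidate BUCKET values for a latency check.

    asd_51_or_later=True  -> powers of two, 2**0 through 2**17
    asd_51_or_later=False -> the fixed 1ms, 8ms, 64ms buckets
    Hypothetical helper; the server decides which buckets really exist.
    """
    if asd_51_or_later:
        return ["{}{}".format(2 ** i, unit) for i in range(18)]
    return ["1ms", "8ms", "64ms"]


print(latency_buckets()[:4])
```

A bucket such as 65536us appears in latency_buckets(unit="us") because 65536 is 2**16.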

To use SSL/TLS standard authentication:

    aerospike_nagios.py -h YOUR_ASD_HOST -p YOUR_SECURED_PORT -s STAT_NAME --tls-enable --tls-cafile YOUR_CA_PEM --tls-name YOUR_ASD_CERT_NAME -w WARN_LEVEL -c CRIT_LEVEL
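
For testing a check outside of Nagios, the command lines above can also be assembled and run from Python. This is a minimal sketch under two assumptions: the script lives at /opt/aerospike/bin/aerospike_nagios.py (the example path from Getting Started), and the plugin follows the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), which this README does not state explicitly. The helper names build_check_argv and run_check are hypothetical.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes (assumed to apply to this plugin).
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_check_argv(host, stat, warn, crit, namespace=None,
                     script="/opt/aerospike/bin/aerospike_nagios.py"):
    """Assemble the argument list for one check, using flags from the usage text."""
    argv = [sys.executable, script, "-h", host, "-s", stat]
    if namespace is not None:
        argv += ["-n", namespace]
    argv += ["-w", str(warn), "-c", str(crit)]
    return argv


def run_check(argv):
    """Run a check and return (state name, plugin output)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    state = NAGIOS_STATES.get(proc.returncode, "UNKNOWN")
    return state, proc.stdout.strip()


print(build_check_argv("localhost", "cluster_size", 0, 0))
```

The sketch launches the script with the current interpreter (sys.executable) rather than a bare python so that the modules from requirements.txt are resolved from the same environment.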

Alert Levels

Warning and Critical thresholds are specified according to the Nagios threshold format.

To disable the warning and/or critical level, set it to 0.

Example usage can be found in the examples/aerospike.cfg file.
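
For readers unfamiliar with the Nagios threshold format, the sketch below evaluates a value against a range written as [@]start:end, as described in the Nagios plugin development guidelines. It is an illustration of the format only, not the plugin's own parsing code; the function name breaches is hypothetical, and the special case described above (0 disables a level) is deliberately not modeled.

```python
def breaches(spec, value):
    """Return True if `value` should raise an alert for Nagios range `spec`.

    "10"     alert if value < 0 or value > 10
    "10:"    alert if value < 10
    "~:10"   alert if value > 10
    "10:20"  alert if value < 10 or value > 20
    "@10:20" alert if 10 <= value <= 20
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    else:
        # A bare number is the upper bound; the lower bound defaults to 0.
        start_text, end_text = "0", spec
    if start_text == "~":
        start = float("-inf")
    else:
        start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else float("inf")
    in_range = start <= value <= end
    return in_range if inside else not in_range


print(breaches("10:", 3))
```

For example, a warning level of 10: alerts when the statistic drops below 10, which suits checks on values that should stay high.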

Authentication

You can specify the user and password for authentication via the -U/--user and -P/--password parameters.
If you pass -P without a value, you are prompted for the password interactively.

If this is not preferable, you can instead specify a credentials file with --credentials-file.
It is a simple two-line file: the username on the first line and the password on the second.
With this method, the credentials file can be secured by other means (eg: chmod 600) to prevent snooping.
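
The credentials file layout is simple enough to sketch. The code below reads the two-line format and checks that the file is not accessible to group or others, in the spirit of the chmod 600 suggestion. Both helper names are hypothetical and the code is not taken from aerospike_nagios.py.

```python
import os
import stat


def read_credentials(path):
    """Return (username, password) from a two-line credentials file."""
    with open(path) as handle:
        lines = handle.read().splitlines()
    if len(lines) < 2:
        raise ValueError("expected the username on line 1 and the password on line 2")
    return lines[0].strip(), lines[1]


def is_private(path):
    """Return True if no group/other permission bits are set (e.g. mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

A caller might refuse to use the file when is_private returns False; whether the plugin itself enforces that is not stated here.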

--auth-mode is an optional parameter that specifies the authentication mode. Its default value is INTERNAL.

Note:

The previous implementation of the Nagios plugin has been moved to the
legacy branch.