Project author: JaSei

Project description:
Nagios/Icinga check for OOM-killed containers
Language: Go
Repository: git://github.com/JaSei/check_docker_OOMkiller.git
Created: 2016-12-17T22:54:00Z
Project page: https://github.com/JaSei/check_docker_OOMkiller

License: MIT License

check_docker_OOMkiller

SYNOPSIS

  $ check_docker_OOMkiller -l /tmp/check_docker_oomkiller
  WARNING: Container 1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f (stress) was killed by OOM killer
  exit status 1

DESCRIPTION

This is a plugin for Nagios, Icinga, or compatible monitoring systems.

It lists non-running containers and checks which of them were killed by the OOM killer (meaning the container had less memory than it needed).

Because I don't want every run of this plugin to report the same OOM-killed containers, it is possible to save the ID of the last checked container to a file.

The plugin communicates with the Docker API via unix:///var/run/docker.sock and must have sufficient permissions (root or the docker group).

How does it work?

The plugin inspects every non-running container and checks its State.OOMKilled flag.

It does the same as this command:

  docker ps -a -q --filter=status=exited --filter=status=dead --filter=since=$LAST_CHECKED_CONTAINER_ID | xargs docker inspect --format "{{.State.OOMKilled}} {{.ID}} {{.Config.Image}}" | grep -E '^true'
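
As a rough illustration of the filtering step at the end of that pipeline, the following Go sketch parses text in the "{{.State.OOMKilled}} {{.ID}} {{.Config.Image}}" layout and keeps only the OOM-killed entries. It works on already captured output instead of talking to Docker, and the names killedContainer and parseInspectLines are hypothetical, not taken from the plugin's source.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// killedContainer holds the two fields printed after the OOMKilled flag.
// Hypothetical type, for illustration only.
type killedContainer struct {
	ID    string
	Image string
}

// parseInspectLines reads lines of the form "<true|false> <id> <image>"
// and returns the entries whose first field is "true", similar to the
// `grep -E '^true'` step. Unlike grep, it also drops lines that do not
// have exactly three fields (an assumption made for this sketch).
func parseInspectLines(output string) []killedContainer {
	var killed []killedContainer
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 3 || fields[0] != "true" {
			continue // malformed line, or container was not OOM killed
		}
		killed = append(killed, killedContainer{ID: fields[1], Image: fields[2]})
	}
	return killed
}

func main() {
	// Made-up sample in the layout produced by the --format string above.
	sample := "false aaa111 nginx\ntrue bbb222 stress\n"
	for _, c := range parseInspectLines(sample) {
		fmt.Printf("Container %s (%s) was killed by OOM killer\n", c.ID, c.Image)
	}
}
```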

OPTIONS

  • -l - path of the file in which the last checked container ID is persisted
  • -w - an OOM-killed container is reported as a warning (default)
  • -c - an OOM-killed container is reported as critical
  • --format - output format; uses Go templates like docker inspect (default "Container {{.ID}} ({{.Config.Image}}) was killed by OOM killer")
  • --slack - Slack token
  • --slackChannel - Slack (fallback) channel(s) to post the message to
  • --slackUser - custom username of the Slack bot (default is OOM killer)
  • --debug - enable debug mode; debug output is printed to STDERR
  • --debugFile - redirect debug output to a file (the debug option must be set too)
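
To make the --format, -w and -c options more concrete, here is a minimal Go sketch that renders the default template with text/template and maps the chosen severity to the standard Nagios plugin exit codes (1 for WARNING, 2 for CRITICAL). The container type is a hypothetical stand-in for the data returned by docker inspect, render and status are invented names, and the "CRITICAL:" prefix is assumed by analogy with the "WARNING:" line in the synopsis.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"text/template"
)

// containerConfig and container are hypothetical, minimal stand-ins
// for the fields of docker inspect that the default template uses.
type containerConfig struct{ Image string }

type container struct {
	ID     string
	Config containerConfig
}

// Default template as documented for the --format option.
const defaultFormat = "Container {{.ID}} ({{.Config.Image}}) was killed by OOM killer"

// Standard Nagios plugin exit codes.
const (
	exitOK       = 0
	exitWarning  = 1
	exitCritical = 2
)

// render applies a Go template to one container.
func render(format string, c container) (string, error) {
	tmpl, err := template.New("message").Parse(format)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, c); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// status maps the -w / -c choice to an output prefix and an exit code.
func status(critical bool) (string, int) {
	if critical {
		return "CRITICAL", exitCritical
	}
	return "WARNING", exitWarning
}

func main() {
	c := container{ID: "1da1ac35179e", Config: containerConfig{Image: "stress"}}
	msg, err := render(defaultFormat, c)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	label, code := status(false)
	// A real check would call os.Exit(code); the sketch only prints it.
	fmt.Printf("%s: %s (exit status %d)\n", label, msg, code)
}
```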

Slack support

Since version 1.1.0 this check supports Slack.

Version 2.0.0 introduced breaking changes to the Slack support.

  1. Create a new bot via https://YOUR-SLACK.slack.com/apps/manage/custom-integrations

  2. Pass your token with the option --slack YOUR_TOKEN

  3. Use the slackChannel option as the fallback channel for messages (used when neither the image nor the container sets SLACK_CONTACT);
    multiple channels are supported: --slackChannel A --slackChannel B

  4. To send a DM or to use a container-specific channel, set the SLACK_CONTACT label on the Docker image or container

    • to send a DM, use the @user syntax
    • to send to a channel, use the #channel syntax
    • several DMs/channels can be given; use a comma (,) as the separator

Example image label (Dockerfile)

  FROM ...
  LABEL SLACK_CONTACT="@user,@other_user"
  ...

Example container label

  docker run --label SLACK_CONTACT="@user,@otheruser"
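
The recipient rules described above (a comma-separated SLACK_CONTACT label, with the slackChannel values as a fallback) could be sketched in Go roughly as follows. This is an illustration under assumptions, not the plugin's actual code: the function name recipients is hypothetical, and trimming whitespace and skipping entries without an @ or # prefix are choices made only for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// recipients returns the Slack targets for one container: the entries of
// its SLACK_CONTACT label if any are usable, otherwise the fallback channels.
func recipients(labels map[string]string, fallback []string) []string {
	var targets []string
	for _, entry := range strings.Split(labels["SLACK_CONTACT"], ",") {
		entry = strings.TrimSpace(entry)
		// Keep only "@user" (DM) and "#channel" entries; this filtering
		// is an assumption of the sketch.
		if strings.HasPrefix(entry, "@") || strings.HasPrefix(entry, "#") {
			targets = append(targets, entry)
		}
	}
	if len(targets) == 0 {
		return fallback
	}
	return targets
}

func main() {
	fallback := []string{"#A", "#B"} // as if passed via --slackChannel A --slackChannel B

	labeled := map[string]string{"SLACK_CONTACT": "@user,@other_user"}
	fmt.Println(recipients(labeled, fallback))

	unlabeled := map[string]string{}
	fmt.Println(recipients(unlabeled, fallback))
}
```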

WHY DOES THIS PLUGIN EXIST?

Of course, it is possible to find OOM killer events in /var/log/messages (dmesg):

  Dec 17 00:34:31 fanatica kernel: stress invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), order=0, oom_score_adj=0
  Dec 17 00:34:31 fanatica kernel: stress cpuset=docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope mems_allowed=0
  Dec 17 00:34:31 fanatica kernel: CPU: 1 PID: 12906 Comm: stress Tainted: P OE 4.8.11-200.fc24.x86_64 #1
  Dec 17 00:34:31 fanatica kernel: Hardware name: Dell Inc. Precision T1600/06NWYK, BIOS A07 10/17/2011
  Dec 17 00:34:31 fanatica kernel: 0000000000000286 00000000250c8593 ffffa232b5bb3c50 ffffffffbb3e5f4d
  Dec 17 00:34:31 fanatica kernel: ffffa232b5bb3d30 ffffa23554033d00 ffffa232b5bb3cb8 ffffffffbb24c308
  Dec 17 00:34:31 fanatica kernel: ffffa2359d219580 ffffa23474e29e80 ffffffffbb1bcd86 0000000000080000
  Dec 17 00:34:31 fanatica kernel: Call Trace:
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb3e5f4d>] dump_stack+0x63/0x86
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb24c308>] dump_header+0x5c/0x1d5
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb1bcd86>] ? find_lock_task_mm+0x36/0x80
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb1bd97c>] oom_kill_process+0x20c/0x3d0
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb23ccb5>] ? mem_cgroup_iter+0x105/0x2d0
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb23f2be>] mem_cgroup_out_of_memory+0x2ce/0x310
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb24022b>] mem_cgroup_oom_synchronize+0x33b/0x350
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb23abb0>] ? get_mem_cgroup_from_mm+0xa0/0xa0
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb1be01c>] pagefault_out_of_memory+0x4c/0xc0
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb062274>] mm_fault_error+0x94/0x190
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb0627f4>] __do_page_fault+0x484/0x4d0
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb062870>] do_page_fault+0x30/0x80
  Dec 17 00:34:31 fanatica kernel: [<ffffffffbb804dc8>] page_fault+0x28/0x30
  Dec 17 00:34:31 fanatica kernel: Task in /system.slice/docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope killed as a result of limit of /system.slice/docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope
  Dec 17 00:34:31 fanatica kernel: memory: usage 1048576kB, limit 1048576kB, failcnt 70428
  Dec 17 00:34:31 fanatica kernel: memory+swap: usage 2097080kB, limit 2097152kB, failcnt 17
  Dec 17 00:34:31 fanatica kernel: kmem: usage 4708kB, limit 9007199254740988kB, failcnt 0
  Dec 17 00:34:31 fanatica kernel: Memory cgroup stats for /system.slice/docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope: cache:0KB rss:1043868KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:82044KB swap:1048504KB inactive_anon:521952KB active_anon:521912KB inactive_file:0KB active_file:0KB unevictable:0KB
  Dec 17 00:34:31 fanatica kernel: [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
  Dec 17 00:34:31 fanatica kernel: [12870] 0 12870 1866 0 9 3 23 0 stress
  Dec 17 00:34:31 fanatica kernel: [12906] 0 12906 526155 239002 1031 5 284061 0 stress
  Dec 17 00:34:31 fanatica kernel: Memory cgroup out of memory: Kill process 12906 (stress) score 999 or sacrifice child
  Dec 17 00:34:31 fanatica kernel: Killed process 12906 (stress) total-vm:2104620kB, anon-rss:956008kB, file-rss:0kB, shmem-rss:0kB

But there you see only the name of the application inside the container (for a Java application you see java invoked oom-killer: ...).

To pair it with a container, you have to find the Docker ID (for example in cpuset=docker-<ID>.scope) and then run docker inspect $ID.

This plugin does that without needing to parse the /var/log/messages file (and without parsing-related problems such as log rotation, format, localization, …).
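
For comparison, the manual pairing of a kernel log line with a container could look roughly like the Go sketch below, which pulls the ID out of a docker-<ID>.scope name with a regular expression. It only illustrates the log-parsing route that the plugin avoids; the function name containerIDFromLog and the exact pattern are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// dockerScope matches a scope name such as "docker-<ID>.scope" and
// captures the hexadecimal container ID. The pattern is an assumption
// based on the log excerpt above.
var dockerScope = regexp.MustCompile(`docker-([0-9a-f]+)\.scope`)

// containerIDFromLog returns the container ID found in one log line,
// or false if the line contains no docker scope name.
func containerIDFromLog(line string) (string, bool) {
	m := dockerScope.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	line := "Dec 17 00:34:31 fanatica kernel: stress cpuset=docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope mems_allowed=0"
	if id, ok := containerIDFromLog(line); ok {
		// The ID could then be passed to `docker inspect`.
		fmt.Println(id)
	}
}
```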