Project author: linkedin

Project description:
Xinfra Monitor monitors the availability of Kafka clusters by producing synthetic workloads using end-to-end pipelines to obtain derived vital statistics: E2E latency, service produce/consume availability, offsets commit availability and latency, message loss rate, and more.
Language: Java
Repository: git://github.com/linkedin/kafka-monitor.git
Created: 2016-03-30T19:42:49Z
Project community: https://github.com/linkedin/kafka-monitor

License: Apache License 2.0

Xinfra Monitor

Xinfra Monitor (formerly Kafka Monitor) is a framework for implementing and executing long-running Kafka
system tests in a real cluster. It complements Kafka's existing system
tests by capturing potential bugs or regressions that are likely to occur only
after a prolonged period of time or with low probability. Moreover, it allows you to monitor a Kafka
cluster using end-to-end pipelines to obtain a number of derived vital stats
such as:



  1. End-to-end latency
  2. Service availability
  3. Produce and Consume availability
  4. Consumer offset commit availability
  5. Consumer offset commit latency
  6. Kafka message loss rate
  7. And many, many more.

You can easily deploy Xinfra Monitor to test and monitor your Kafka cluster without requiring
any change to your application.

Xinfra Monitor can automatically create the monitor topic with the specified config
and increase the partition count of the monitor topic to ensure partition# >=
broker#. It can also reassign partitions and trigger preferred leader election
to ensure that each broker acts as leader of at least one partition of the
monitor topic. This allows Xinfra Monitor to detect performance issues on every
broker without requiring users to manually manage the partition assignment of
the monitor topic.
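
The two conditions described above (partition# >= broker#, and every broker leading at least one partition) can be checked with a small sketch like the following. The class and method names are hypothetical and are not part of Xinfra Monitor; the partition-to-leader map is assumed to have been fetched from the cluster by some other means.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of Xinfra Monitor. */
final class MonitorTopicLayout {

    /** True when the topic has at least as many partitions as there are brokers. */
    static boolean hasEnoughPartitions(int partitionCount, int brokerCount) {
        return partitionCount >= brokerCount;
    }

    /**
     * Returns the brokers that currently lead no partition of the topic.
     * leaderByPartition maps a partition id to the broker id of its leader.
     */
    static Set<Integer> brokersWithoutLeadership(Set<Integer> brokerIds,
                                                 Map<Integer, Integer> leaderByPartition) {
        Set<Integer> missing = new TreeSet<>(brokerIds);
        missing.removeAll(leaderByPartition.values());
        return missing;
    }

    public static void main(String[] args) {
        Set<Integer> brokers = new TreeSet<>(Arrays.asList(0, 1, 2));
        // Made-up layout: broker 1 leads two partitions, broker 2 leads none.
        Map<Integer, Integer> leaders = new HashMap<>();
        leaders.put(0, 0);
        leaders.put(1, 1);
        leaders.put(2, 1);
        System.out.println("enough partitions: " + hasEnoughPartitions(leaders.size(), brokers.size()));
        System.out.println("brokers leading nothing: " + brokersWithoutLeadership(brokers, leaders));
    }
}
```

A non-empty result from brokersWithoutLeadership is the situation in which a reassignment or preferred leader election would be needed.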

Xinfra Monitor is used in conjunction with different middle-layer services such as li-apache-kafka-clients to monitor single clusters, pipeline destination clusters, and other types of clusters, as is done in LinkedIn engineering for real-time cluster health checks.

These are some of the metrics emitted from a Xinfra Monitor instance.

  1. kmf:type=kafka-monitor:offline-runnable-count
  2. kmf.services:type=produce-service,name=*:produce-availability-avg
  3. kmf.services:type=consume-service,name=*:consume-availability-avg
  4. kmf.services:type=produce-service,name=*:records-produced-total
  5. kmf.services:type=consume-service,name=*:records-consumed-total
  6. kmf.services:type=produce-service,name=*:records-produced-rate
  7. kmf.services:type=produce-service,name=*:produce-error-rate
  8. kmf.services:type=consume-service,name=*:consume-error-rate
  9. kmf.services:type=consume-service,name=*:records-lost-total
  10. kmf.services:type=consume-service,name=*:records-lost-rate
  11. kmf.services:type=consume-service,name=*:records-duplicated-total
  12. kmf.services:type=consume-service,name=*:records-delay-ms-avg
  13. kmf.services:type=commit-availability-service,name=*:offsets-committed-avg
  14. kmf.services:type=commit-availability-service,name=*:offsets-committed-total
  15. kmf.services:type=commit-availability-service,name=*:failed-commit-offsets-avg
  16. kmf.services:type=commit-availability-service,name=*:failed-commit-offsets-total
  17. kmf.services:type=commit-latency-service,name=*:commit-offset-latency-ms-avg
  18. kmf.services:type=commit-latency-service,name=*:commit-offset-latency-ms-max
  19. kmf.services:type=commit-latency-service,name=*:commit-offset-latency-ms-99th
  20. kmf.services:type=commit-latency-service,name=*:commit-offset-latency-ms-999th
  21. kmf.services:type=commit-latency-service,name=*:commit-offset-latency-ms-9999th
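
Each entry above appears to follow the shape `<JMX object name>:<attribute>`, where `name=*` is a wildcard over service names. Under that assumption, the sketch below splits an entry at its last colon and reads the attribute from every matching MBean using the standard javax.management API. The MetricPath class is hypothetical and not part of Xinfra Monitor, and how you obtain a connection to the monitor's JVM depends on your deployment.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical helper for illustration; not part of Xinfra Monitor. */
final class MetricPath {
    final ObjectName objectName; // may be a pattern, e.g. name=*
    final String attribute;

    private MetricPath(ObjectName objectName, String attribute) {
        this.objectName = objectName;
        this.attribute = attribute;
    }

    /** Splits "domain:key=value,...:attribute" at the last colon. */
    static MetricPath parse(String path) {
        int split = path.lastIndexOf(':');
        if (split <= 0 || split == path.length() - 1) {
            throw new IllegalArgumentException("expected <object-name>:<attribute>, got " + path);
        }
        try {
            return new MetricPath(new ObjectName(path.substring(0, split)), path.substring(split + 1));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("bad object name in " + path, e);
        }
    }

    /** Reads the attribute from every registered MBean matching the (possibly wildcard) name. */
    static Map<ObjectName, Object> readAll(MBeanServerConnection server, MetricPath path) {
        Map<ObjectName, Object> values = new LinkedHashMap<>();
        try {
            for (ObjectName name : server.queryNames(path.objectName, null)) {
                values.put(name, server.getAttribute(name, path.attribute));
            }
        } catch (JMException | IOException e) {
            throw new IllegalStateException("could not read " + path.attribute, e);
        }
        return values;
    }

    public static void main(String[] args) {
        MetricPath path = parse("kmf.services:type=produce-service,name=*:produce-availability-avg");
        System.out.println(path.objectName + " -> " + path.attribute);
        // A standalone JVM has no Xinfra Monitor MBeans registered, so the map is empty here.
        System.out.println(readAll(ManagementFactory.getPlatformMBeanServer(), path));
    }
}
```

The example queries the local platform MBean server only to stay self-contained; against a running monitor you would pass an MBeanServerConnection to that process instead.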

Getting Started

Prerequisites

Xinfra Monitor requires Gradle 2.0 or higher. Java 7 should be used for
building in order to support both Java 7 and Java 8 at runtime.

Xinfra Monitor supports Apache Kafka 0.8 to 2.0:

  • Use branch 0.8.2.2 to work with Apache Kafka 0.8
  • Use branch 0.9.0.1 to work with Apache Kafka 0.9
  • Use branch 0.10.2.1 to work with Apache Kafka 0.10
  • Use branch 0.11.x to work with Apache Kafka 0.11
  • Use branch 1.0.x to work with Apache Kafka 1.0
  • Use branch 1.1.x to work with Apache Kafka 1.1
  • Use master branch to work with Apache Kafka 2.0

Configuration Tips


  1. We advise advanced users to run Xinfra Monitor with
    ./bin/xinfra-monitor-start.sh config/xinfra-monitor.properties. The default
    xinfra-monitor.properties in the repo provides a simple example of how to
    monitor a single cluster. You probably need to change the values of
    zookeeper.connect and bootstrap.servers to point to your cluster.



  2. The full list of configs and their documentation can be found in the code of
    the Config class for the respective service, e.g. ProduceServiceConfig.java and
    ConsumeServiceConfig.java.



  3. You can specify multiple SingleClusterMonitor instances in xinfra-monitor.properties to
    monitor multiple Kafka clusters in one Xinfra Monitor process. As another
    advanced use case, you can point ProduceService and ConsumeService to two
    different Kafka clusters that are connected by MirrorMaker to monitor their
    end-to-end latency.



  4. By default, Xinfra Monitor automatically creates the monitor topic based on
    settings such as topic-management.replicationFactor and
    topic-management.partitionsToBrokersRatio specified in the config.
    replicationFactor is 1 by default, and you probably want to change it to the
    same replication factor as used for your existing topics. You can disable
    auto topic creation by setting produce.topic.topicCreationEnabled to false.



  5. Xinfra Monitor can automatically increase the partition count of the monitor topic
    to ensure partition# >= broker#. It can also reassign partitions and trigger
    preferred leader election to ensure that each broker acts as leader of at least
    one partition of the monitor topic. To use this feature, use either
    EndToEndTest or TopicManagementService in the properties file.



  6. When using Secure Sockets Layer (SSL) or any non-plaintext security protocol for AdminClient, please configure the following entries in the single-cluster-monitor props, produce.producer.props, and consume.consumer.props (see https://docs.confluent.io/current/installation/configuration/admin-configs.html):

    1. ssl.key.password

    2. ssl.keystore.location

    3. ssl.keystore.password

    4. ssl.truststore.location

    5. ssl.truststore.password
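
As a pre-flight aid, the five SSL entries listed above can be checked for presence before starting the monitor. The sketch below is a hypothetical helper, not part of Xinfra Monitor; it assumes each props section has already been loaded into a Map, and the paths and passwords shown are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight check for illustration; not part of Xinfra Monitor. */
final class SslConfigCheck {
    // The five entries named in the list above.
    static final List<String> SSL_KEYS = Arrays.asList(
        "ssl.key.password",
        "ssl.keystore.location",
        "ssl.keystore.password",
        "ssl.truststore.location",
        "ssl.truststore.password");

    /** Returns the listed SSL keys that are absent or blank in the given props map. */
    static List<String> missingSslKeys(Map<String, ?> props) {
        List<String> missing = new ArrayList<>();
        for (String key : SSL_KEYS) {
            Object value = props.get(key);
            if (value == null || value.toString().trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder values; real paths and passwords depend on your deployment.
        Map<String, Object> producerProps = new HashMap<>();
        producerProps.put("security.protocol", "SSL");
        producerProps.put("ssl.keystore.location", "/path/to/keystore.jks");
        producerProps.put("ssl.keystore.password", "changeit");
        System.out.println("missing from produce.producer.props: " + missingSslKeys(producerProps));
    }
}
```

The same check would be applied to each of the three sections (the single-cluster-monitor props, produce.producer.props, and consume.consumer.props), since each one configures a separate client.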


Build Xinfra Monitor

  $ git clone https://github.com/linkedin/kafka-monitor.git
  $ cd kafka-monitor
  $ ./gradlew jar

Start Xinfra Monitor to run the tests/services specified in the config file

  $ ./bin/xinfra-monitor-start.sh config/xinfra-monitor.properties

Run Xinfra Monitor with arbitrary producer/consumer configuration (e.g. SASL enabled client)

Edit config/xinfra-monitor.properties to specify custom producer configurations
in the key/value map produce.producer.props. Similarly, specify consumer
configurations in consume.consumer.props. Documentation for the producer and
consumer settings accepted in these key/value maps can be found in the Apache Kafka wiki.

  $ ./bin/xinfra-monitor-start.sh config/xinfra-monitor.properties

Run the SingleClusterMonitor app to monitor a Kafka cluster

The metrics produce-availability-avg and consume-availability-avg indicate
whether messages can be properly produced to and consumed from this cluster.
See the Service Overview wiki for how these metrics are derived.

  $ ./bin/single-cluster-monitor.sh --topic test --broker-list localhost:9092 --zookeeper localhost:2181
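
The exact derivation of the availability metrics is documented in the Service Overview wiki and is not reproduced here. As rough intuition only, an availability figure can be thought of as the fraction of attempts that succeeded in a measurement window. The sketch below is an assumption for illustration, not Xinfra Monitor's implementation; in particular, reporting 0.0 for a window with no attempts is a choice made for this example.

```java
/** Illustrative only; the real derivation is described in the Service Overview wiki. */
final class AvailabilitySketch {

    /**
     * Fraction of attempts that succeeded in one measurement window.
     * Assumption: a window with no attempts at all is reported as 0.0.
     */
    static double availability(long succeeded, long failed) {
        if (succeeded < 0 || failed < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        long attempts = succeeded + failed;
        return attempts == 0 ? 0.0 : (double) succeeded / attempts;
    }

    public static void main(String[] args) {
        // Made-up counts: 990 records produced successfully, 10 produce errors.
        System.out.println("produce availability: " + availability(990, 10));
        System.out.println("idle window: " + availability(0, 0));
    }
}
```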

Run the MultiClusterMonitor app to monitor a pipeline of Kafka clusters connected by MirrorMaker

Edit config/multi-cluster-monitor.properties to specify the right broker and
ZooKeeper URLs, as suggested by the comments in the properties file.

The metrics produce-availability-avg and consume-availability-avg indicate
whether messages can be properly produced to the source cluster and consumed
from the destination cluster. See config/multi-cluster-monitor.properties for
the full JMX path for these metrics.

  $ ./bin/xinfra-monitor-start.sh config/multi-cluster-monitor.properties

Run checkstyle on the Java code

  $ ./gradlew checkstyleMain checkstyleTest

Build IDE project

  $ ./gradlew idea
  $ ./gradlew eclipse

Wiki