Project author: open-cluster-management

Project description:
Operator for monitoring the health of your managed clusters
Language: Go
Project URL: git://github.com/open-cluster-management/multicluster-observability-operator.git
Created: 2020-03-26T02:41:14Z
Project community: https://github.com/open-cluster-management/multicluster-observability-operator

License: Apache License 2.0


Observability Overview

This document explains how the components of Open Cluster Management Observability come together to deliver multicluster fleet observability. It builds on several open source projects: Grafana, Alertmanager, Thanos, the Observatorium Operator and API Gateway, and Prometheus. It also uses a few Open Cluster Management projects, namely the Cluster Manager (or Registration Operator) and the Klusterlet. The multicluster-observability-operator is the root operator, which pulls in everything that is needed.

Conceptual Diagram

Conceptual Diagram of the Components

Associated GitHub Repositories

Component | Git Repo | Description
MCO Operator | multicluster-observability-operator | Operator for monitoring. This is the root repo. If you follow the Readme instructions here to install, the code from all the other repos listed below is used or referenced.
Endpoint Operator | endpoint-metrics-operator | Operator that sets up observability and data collection on the managed clusters.
Observatorium Operator | observatorium-operator | Operator to deploy the Observatorium project. Within Open Cluster Management, this currently means metrics backed by Thanos. Forked from the main observatorium-operator repo.
Metrics collector | metrics-collector | Scrapes metrics from Prometheus on managed clusters; which metrics are collected is controlled by a configurable allow-list.
RBAC Proxy | rbac_query_proxy | Helper service that acts as a multicluster metrics RBAC proxy.
Grafana | grafana | Grafana repo, used for dashboarding and metric analytics. Forked from the main grafana repo.
Dashboard Loader | grafana-dashboard-loader | Sidecar proxy that loads Grafana dashboards from ConfigMaps.
Management Ingress | management-ingress | NGINX-based ingress controller that serves Open Cluster Management services.
Observatorium API | observatorium | API gateway that controls reading and writing of observability data to and from the backend infrastructure. Forked from the main observatorium API repo.
Thanos Ecosystem | kube-thanos | Kubernetes-specific configuration for deploying Thanos. The Observatorium operator uses this configuration to deploy the backend Thanos components.

Quick Start Guide

Prerequisites

Note: By default, the API conversion webhook relies on the OpenShift service serving certificate feature to manage its certificate. If you want to run the multicluster-observability-operator in a plain Kubernetes cluster, you can replace it with cert-manager.

Use the following quick start commands for building and testing the multicluster-observability-operator:

Clone the Repository

Check out the multicluster-observability-operator repository.

  git clone git@github.com:stolostron/multicluster-observability-operator.git
  cd multicluster-observability-operator

Build the Operator

Build the multicluster-observability-operator image and push it to a public registry, such as quay.io:

  make docker-build docker-push IMG=quay.io/<YOUR_USERNAME_IN_QUAY>/multicluster-observability-operator:latest

Run the Operator in the Cluster

  1. Create the open-cluster-management-observability namespace if it doesn’t exist:

     kubectl create ns open-cluster-management-observability

  2. Deploy the minio service, which acts as the storage service for multicluster observability:

     kubectl -n open-cluster-management-observability apply -k examples/minio

  3. Replace the operator image and deploy the multicluster-observability-operator:

     make deploy IMG=quay.io/<YOUR_USERNAME_IN_QUAY>/multicluster-observability-operator:latest

  4. Deploy the multicluster-observability-operator CR:

     kubectl apply -f operators/multiclusterobservability/config/samples/observability_v1beta2_multiclusterobservability.yaml

  5. Verify that all the Multicluster Observability components start up and reach the Running state:

     kubectl -n open-cluster-management-observability get pod
     NAME                                                      READY   STATUS    RESTARTS   AGE
     minio-79c7ff488d-72h65                                    1/1     Running   0          9m38s
     observability-alertmanager-0                              3/3     Running   0          7m17s
     observability-alertmanager-1                              3/3     Running   0          6m36s
     observability-alertmanager-2                              3/3     Running   0          6m18s
     observability-grafana-85fdc8c48d-j67j6                    2/2     Running   0          7m17s
     observability-grafana-85fdc8c48d-wnltt                    2/2     Running   0          7m17s
     observability-observatorium-api-69cfff4c95-bpw5s          1/1     Running   0          7m2s
     observability-observatorium-api-69cfff4c95-gbh7b          1/1     Running   0          7m2s
     observability-observatorium-operator-5df6b7949c-kbpmp     1/1     Running   0          7m17s
     observability-rbac-query-proxy-d44df47c4-9ccdn            2/2     Running   0          7m15s
     observability-rbac-query-proxy-d44df47c4-rtcgh            2/2     Running   0          6m50s
     observability-thanos-compact-0                            1/1     Running   0          7m2s
     observability-thanos-query-79c4d9488b-bd5sf               1/1     Running   0          7m3s
     observability-thanos-query-79c4d9488b-d7wzt               1/1     Running   0          7m3s
     observability-thanos-query-frontend-6fdb5d4946-rgblb      1/1     Running   0          7m3s
     observability-thanos-query-frontend-6fdb5d4946-shsz2      1/1     Running   0          7m3s
     observability-thanos-query-frontend-memcached-0           2/2     Running   0          7m3s
     observability-thanos-query-frontend-memcached-1           2/2     Running   0          6m37s
     observability-thanos-query-frontend-memcached-2           2/2     Running   0          6m33s
     observability-thanos-receive-controller-6b446c5576-hj6xl  1/1     Running   0          7m3s
     observability-thanos-receive-default-0                    1/1     Running   0          7m2s
     observability-thanos-receive-default-1                    1/1     Running   0          6m20s
     observability-thanos-receive-default-2                    1/1     Running   0          5m50s
     observability-thanos-rule-0                               2/2     Running   0          7m3s
     observability-thanos-rule-1                               2/2     Running   0          6m27s
     observability-thanos-rule-2                               2/2     Running   0          5m56s
     observability-thanos-store-memcached-0                    2/2     Running   0          7m3s
     observability-thanos-store-memcached-1                    2/2     Running   0          6m37s
     observability-thanos-store-memcached-2                    2/2     Running   0          6m33s
     observability-thanos-store-shard-0-0                      1/1     Running   2          7m3s
     observability-thanos-store-shard-1-0                      1/1     Running   2          7m3s
     observability-thanos-store-shard-2-0                      1/1     Running   2          7m3s
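
Scanning a pod listing of this size by eye is error-prone, so the check can be scripted. The following is a minimal sketch, not part of the project: the function name not_ready_pods is hypothetical, and it assumes the default column layout of kubectl get pod (NAME, READY, STATUS as the first three columns).

```shell
# not_ready_pods (hypothetical helper): reads `kubectl get pod` output on stdin
# and prints the name of every pod that is not Running with all containers ready.
not_ready_pods() {
  awk '$1 != "NAME" && NF >= 3 {
         # READY column has the form ready/total, for example 2/3
         split($2, ready, "/")
         if ($3 != "Running" || ready[1] != ready[2]) print $1
       }'
}

# Usage against a live cluster (prints nothing when every pod is ready):
#   kubectl -n open-cluster-management-observability get pod | not_ready_pods
```

Pods that legitimately finish, such as those in Completed status, would also be reported by this sketch, so treat its output as a list of pods to look at rather than a list of failures.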

What is next

After a successful deployment, run the following command to check whether an OpenShift Container Platform (OCP) cluster is registered as a managed cluster:

  kubectl get managedcluster --show-labels

If your managed cluster has no vendor=OpenShift label, you can add it manually with this command: kubectl label managedcluster <managed cluster name> vendor=OpenShift
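
With more than a few managed clusters, the label check can be scripted. The sketch below is illustrative only: clusters_missing_vendor is a hypothetical helper, and it assumes that the cluster name is the first column and that --show-labels appends the comma-separated labels as the last column.

```shell
# clusters_missing_vendor (hypothetical helper): reads the output of
# `kubectl get managedcluster --show-labels` on stdin and prints the name of
# each cluster whose label list does not contain vendor=OpenShift.
clusters_missing_vendor() {
  awk '$1 != "NAME" && NF > 0 {
         n = split($NF, labels, ",")      # last column: key=value,key=value,...
         found = 0
         for (i = 1; i <= n; i++) if (labels[i] == "vendor=OpenShift") found = 1
         if (!found) print $1
       }'
}

# Usage against a live cluster:
#   kubectl get managedcluster --show-labels | clusters_missing_vendor
```

The sketch only lists the clusters. Whether a listed cluster should actually receive the vendor=OpenShift label is a decision to make per cluster.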

The metrics-collector pod should then be running:

  kubectl -n open-cluster-management-addon-observability get pod
  endpoint-observability-operator-5c95cb9df9-4cphg  1/1     Running   0          97m
  metrics-collector-deployment-6c7c8f9447-brpjj     1/1     Running   0          96m

Expose the Thanos query frontend through a Route by running this command:

  cat << EOF | kubectl -n open-cluster-management-observability apply -f -
  kind: Route
  apiVersion: route.openshift.io/v1
  metadata:
    name: query-frontend
  spec:
    port:
      targetPort: http
    wildcardPolicy: None
    to:
      kind: Service
      name: observability-thanos-query-frontend
  EOF

You can access the Thanos query UI in a browser by entering the host returned by oc get route -n open-cluster-management-observability query-frontend. Metrics should be available when you search for :node_memory_MemAvailable_bytes:sum. The available metrics are listed here
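
The same check can be made from a terminal instead of a browser. This is a rough sketch under stated assumptions: the helper names route_host and instant_query are hypothetical, the Route is assumed to serve plain HTTP (the definition above has no tls section), and the query frontend is assumed to expose the Prometheus-compatible /api/v1/query endpoint.

```shell
OC="${OC:-oc}"         # overridable so the helpers can be tried with stubs
CURL="${CURL:-curl}"

# route_host (hypothetical helper): prints the host assigned to the
# query-frontend Route.
route_host() {
  "$OC" -n open-cluster-management-observability \
    get route query-frontend -o jsonpath='{.spec.host}'
}

# instant_query HOST EXPR (hypothetical helper): sends an instant query.
# -G makes curl append the URL-encoded payload to the URL as a query string.
instant_query() {
  "$CURL" -s -G "http://$1/api/v1/query" --data-urlencode "query=$2"
}

# Usage against a live cluster:
#   instant_query "$(route_host)" ':node_memory_MemAvailable_bytes:sum'
```

If the assumptions hold, the response is a JSON document, and a non-empty result list in it indicates that metrics from the managed clusters are reaching the hub.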

Uninstall the Operator in the Cluster

  1. Delete the multicluster-observability-operator CR:

     kubectl -n open-cluster-management-observability delete -f operators/multiclusterobservability/config/samples/observability_v1beta2_multiclusterobservability.yaml

  2. Delete the multicluster-observability-operator:

     make undeploy

  3. Delete the minio service:

     kubectl -n open-cluster-management-observability delete -k examples/minio

  4. Delete the open-cluster-management-observability namespace:

     kubectl delete ns open-cluster-management-observability

Rebuild Image: Wed Jan 25 15:08:26 EST 2023