Overview

More than half a year after the last article, Thanos Deployment and Practice, was published, the technology has kept evolving, and this series gets another update. This article shows how to combine Kvass with Thanos to better monitor very large container clusters.

Is Thanos not enough?

Some readers may ask: isn't Thanos meant to solve Prometheus's scaling problems? With Thanos, can't we already monitor Prometheus at large scale? Why do we need Kvass on top of it?

Thanos solves the problems of distributed query and storage for Prometheus, but it does not solve distributed collection. If there are too many scrape jobs and too much data to collect, a single Prometheus still becomes the bottleneck. For this problem, the first article in this series, Optimization Methods for Large-Scale Prometheus, described some optimization approaches:

  1. Split scrape jobs across different Prometheus instances by service dimension.
  2. Use Prometheus's built-in hashmod action to shard scrape targets, as sketched below.
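
As an illustration of the second approach, here is a minimal hashmod sharding sketch. The job name and the modulus of 3 are hypothetical; each Prometheus instance keeps only the targets whose address hashes to its own shard number:

scrape_configs:
- job_name: 'sharded-job'            # hypothetical job name
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
  - source_labels: [__address__]
    modulus: 3                       # total number of Prometheus instances
    target_label: __tmp_hash
    action: hashmod
  - source_labels: [__tmp_hash]
    regex: 0                         # instance 0 keeps shard 0; use 1, 2, ... on the others
    action: keep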

However, these optimization methods still have some drawbacks:

  1. Complicated configuration: the scrape configuration of every Prometheus instance has to be written separately.
  2. The data volume has to be estimated in advance before the configuration can be written.
  3. Different Prometheus instances scrape different jobs, so the load is likely to be uneven; without careful tuning, some instances may be overloaded.
  4. Scaling Prometheus in or out requires manual adjustment; automatic scaling is not possible.

Kvass was born to solve these problems, and it is the focus of this article.

What is Kvass?

Kvass is a lightweight horizontal scaling solution for Prometheus, open-sourced by Tencent Cloud. It cleverly separates service discovery from collection and uses a Sidecar to dynamically generate configuration files for Prometheus, so that different Prometheus instances scrape different targets without any manual per-instance configuration. It also load-balances the scrape targets to keep any single Prometheus instance from being overloaded, and it automatically scales out when the load grows. Combined with Thanos's global view, a monitoring system for a very large cluster can be built with only one configuration file. Below is the Kvass+Thanos architecture diagram:

[Figure: Kvass + Thanos architecture]

For a more detailed introduction to Kvass, see How to Use Prometheus to Monitor a Kubernetes Cluster with 100,000 Containers, which explains the principles and the results in detail.

Deployment practice

Preparation

First, clone the Kvass repo and enter the examples directory:

git clone https://github.com/tkestack/kvass.git
cd kvass/examples

Before deploying Kvass, we need services that expose metrics for us to collect. We provide a metrics data generator that can generate a specified number of series. In this example, we deploy 6 replicas of the metrics generator, each generating 10045 series, with a single command:

kubectl create -f metrics.yaml
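
You can check that all 6 replicas are running. The label selector below is inferred from the relabel rule in the config.yaml shown later; adjust it if your manifest labels the pods differently:

kubectl get pods -l app.kubernetes.io/name=metrics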

Deploy Kvass

Next, deploy Kvass itself:

kubectl create -f kvass-rbac.yaml   # RBAC configuration required by Kvass
kubectl create -f config.yaml       # Prometheus configuration file
kubectl create -f coordinator.yaml  # Kvass coordinator deployment configuration

Here, config.yaml holds the Prometheus configuration, which scrapes the metrics generators deployed above:

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: custom
scrape_configs:
- job_name: 'metrics-test'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
    regex: metrics
    action: keep
  - source_labels: [__meta_kubernetes_pod_ip]
    action: replace
    regex: (.*)
    replacement: ${1}:9091
    target_label: __address__
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod

In coordinator.yaml, the Coordinator's startup parameters cap the head series of each shard at 30000:

--shard.max-series=30000
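
For orientation, this flag sits in the args of the Coordinator container. The excerpt below is an illustrative sketch rather than the actual manifest; the image tag and subcommand are assumptions, so check the file in the repo:

# Illustrative excerpt of coordinator.yaml
containers:
  - name: kvass
    image: tkestack/kvass:latest       # assumed image, check the repo's manifest
    args:
      - coordinator                    # run the binary in coordinator mode
      - --shard.max-series=30000       # maximum head series per shard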

Then deploy the Prometheus instances (each containing a Thanos Sidecar and a Kvass Sidecar); you can start with a single replica:

kubectl create -f prometheus-rep-0.yaml
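
Each replica pod runs three containers, which is why the pods report 3/3 READY in the output further down. The excerpt below is only a structural sketch of prometheus-rep-0.yaml, with images and arguments elided:

# Illustrative container layout of the prometheus-rep-0 StatefulSet
containers:
  - name: prometheus                   # scrapes only the targets assigned to this shard
  - name: kvass                        # Kvass sidecar: receives targets from the coordinator
                                       # and generates the Prometheus configuration
  - name: thanos                       # Thanos sidecar: exposes the StoreAPI for thanos-query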

If you need to store data in object storage, refer to the previous article, Thanos Deployment and Practice, for how to modify the Thanos Sidecar configuration.

Deploy thanos-query

To get a global view of the data, we need to deploy a thanos-query:

kubectl create -f thanos-query.yaml

Based on the numbers above, there are 6 targets with 60270 series in total (6 × 10045). Since each shard is limited to 30000 series, ⌈60270 / 30000⌉ = 3 shards are expected. And indeed, the Coordinator changes the number of StatefulSet replicas to 3:

$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
kvass-coordinator-c68f445f6-g9q5z   2/2     Running   0          64s
metrics-5876dccf65-5cncw            1/1     Running   0          75s
metrics-5876dccf65-6tw4b            1/1     Running   0          75s
metrics-5876dccf65-dzj2c            1/1     Running   0          75s
metrics-5876dccf65-gz9qd            1/1     Running   0          75s
metrics-5876dccf65-r25db            1/1     Running   0          75s
metrics-5876dccf65-tdqd7            1/1     Running   0          75s
prometheus-rep-0-0                  3/3     Running   0          54s
prometheus-rep-0-1                  3/3     Running   0          45s
prometheus-rep-0-2                  3/3     Running   0          45s
thanos-query-69b9cb857-d2b45        1/1     Running   0          49s

Querying the global data through thanos-query, we can see that the data is complete (metrics0 is a metric name produced by the metrics generator):

[Screenshots: thanos-query showing the complete metrics0 series across all shards]
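
Without the screenshots, a similar check can be done from the command line. This assumes the thanos-query Service name and port from the data source URL below:

kubectl port-forward svc/thanos-query 9090:9090 &
# thanos-query exposes the Prometheus-compatible HTTP API, so the generated
# metric should be queryable through the global view
curl 'http://localhost:9090/api/v1/query?query=count(metrics0)'
# The Stores page of the Thanos Query UI (http://localhost:9090/stores)
# should list all three Prometheus shards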

If you want to view the monitoring data in Grafana dashboards, add the thanos-query address as a Prometheus data source: http://thanos-query.default.svc.cluster.local:9090.
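
If you manage Grafana data sources through provisioning files, a minimal sketch would look like the following; the file path and data source name are arbitrary:

# e.g. /etc/grafana/provisioning/datasources/thanos.yaml
apiVersion: 1
datasources:
  - name: Thanos-Query
    type: prometheus
    access: proxy
    url: http://thanos-query.default.svc.cluster.local:9090
    isDefault: true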

Summary

This article showed how to combine Kvass with Thanos to monitor a very large container cluster. If you use Tencent Cloud's container service, you can directly use its cloud-native monitoring service, a product built on Kvass.