What kind of monitoring can really show that there is something wrong with the system?

2020-12-07 06:27:22 58 Shen Jian

If monitoring raises no alarm, does that mean the system is fine? What kind of monitoring can really tell you that something is wrong with the system? Today, let's talk about multi-dimensional monitoring.

 

What is multi-dimensional monitoring?

Most companies have automated monitoring tools of one kind or another, for example:

(1) HTTP interface monitoring;

(2) log keyword monitoring;

(3) operating system, process, and port monitoring;

(4) HTTP status code monitoring;

(5) service liveness monitoring;

(6) interface processing time monitoring;

(7) RPC interface monitoring;

(8) user-level monitoring;

 

If you monitor only one or a few dimensions:

(1) when an anomaly is detected, you can be fairly sure that something is wrong with the system;

(2) conversely, when no anomaly is detected, you cannot be sure that nothing is wrong with the system;

 

For example:

(1) if operating system monitoring shows the CPU at 100%, something is wrong with the system; but a normal CPU does not mean the system is working. If tomcat has died, for instance, the CPU will look perfectly normal and operating system monitoring will not notice, so process, port, and liveness monitoring are needed as a backstop;

(2) if process or port monitoring reports an anomaly, something is wrong with the system; but a running process and a listening port do not mean the system is working. If the program is deadlocked, for instance, both the process and the port look normal, so interface processing time monitoring and similar checks are needed as a backstop;

(3) if interface processing time monitoring reports a timeout, something is wrong with the system; but the absence of timeouts does not mean the system is working. If the database is down and no database connection can be obtained, for instance, every interface in the service layer fails fast and returns quickly, so nothing times out;

 

The point here is: single-dimension monitoring easily misses problems, and multi-dimensional monitoring is the right foundation for a monitoring platform.
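The idea can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: each dimension is modelled as a function that returns True when it looks healthy, and the system is treated as suspect as soon as any dimension fails. The function name `failed_dimensions` and the sample checks are assumptions made up for the example.

```python
from typing import Callable, Dict, List

# Each check returns True when its dimension looks healthy.
Check = Callable[[], bool]


def failed_dimensions(checks: Dict[str, Check]) -> List[str]:
    """Run every check and return the names of the dimensions that failed."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            # A check that crashes cannot vouch for the system, so count it as failed.
            ok = False
        if not ok:
            failed.append(name)
    return failed


# Hypothetical scenario from the text: a deadlocked program.
# CPU, process, and port all look normal; only the interface time check fails.
checks = {
    "cpu": lambda: True,
    "process": lambda: True,
    "port": lambda: True,
    "interface_time": lambda: False,
}
print(failed_dimensions(checks))  # ['interface_time']
```

With only the first three checks configured, the same call would return an empty list, which is exactly the missed report the text warns about.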

 

The two earlier articles:

How to get HTTP monitoring done in 12 hours?

How to get log monitoring done in 12 hours?

were both designed to be general-purpose and extensible.

The four monitoring dimensions below are likewise designed to be “universal” and “non-invasive”: the monitored sites and services need no instrumentation and no changes, the owners of the monitored modules do not need to cooperate in any way, and full coverage is still possible.

 

Dimension one: how to monitor the operating system, processes, and ports?

Monitoring requirements

(1) whether the system's network is saturated, whether the disk has free space, whether the CPU is busy, whether memory is exhausted, whether the load is too high, whether the JVM is abnormal;

(2) whether the service process is running;

(3) whether the listening port is normal;

(4) whether the machines can reach each other;

 

Common approach one: zabbix

Anyone who does operations work knows it, so no details here; say too much and I risk being scolded.

 

Common approach two: shell

A few very simple scripts can collect network, disk, CPU, memory, load, and JVM information; combined with some threshold configuration, they can raise an alarm when a threshold is exceeded.

Combined with a cluster information management service, commands such as ps, netstat, and telnet can also quickly provide simple monitoring of processes, ports, and connectivity.
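The article suggests shell scripts; the same threshold logic is sketched below in Python so that all examples in this article share one language. It is a minimal sketch under stated assumptions: the threshold names and values are invented for illustration, `collect_metrics` relies on `os.getloadavg`, which is available on Unix only, and `port_is_listening` approximates what `telnet host port` tells you.

```python
import os
import shutil
import socket

# Hypothetical thresholds; in practice they would come from a configuration file.
THRESHOLDS = {"disk_used_ratio": 0.90, "load_1min": 8.0}


def over_threshold(metrics, thresholds):
    """Return the names of the metrics whose value exceeds its configured threshold."""
    return [name for name, value in metrics.items()
            if name in thresholds and value > thresholds[name]]


def collect_metrics(path="/"):
    """Collect a few local metrics using only the standard library (Unix only)."""
    usage = shutil.disk_usage(path)
    return {
        "disk_used_ratio": usage.used / usage.total,
        "load_1min": os.getloadavg()[0],
    }


def port_is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example with hand-written metrics: the disk is over its threshold, the load is not.
print(over_threshold({"disk_used_ratio": 0.95, "load_1min": 1.2}, THRESHOLDS))
```

On a real machine, `over_threshold(collect_metrics(), THRESHOLDS)` would be run periodically and any non-empty result forwarded to the alarm channel.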

 

Implementation points

(1) focus on extensibility, configurability, and non-invasiveness;

(2) a cluster information management service (or a cluster information configuration file);

 

Dimension two: how to monitor 404 status codes?

Monitoring requirements: monitor abnormal HTTP status codes.

 

Monitoring approach: unified monitoring of nginx logs

If unified HTTP interface monitoring is already in place, the need for 404 monitoring is less pressing; but it is simple to implement, and building a general-purpose version does not take much time.
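As a rough illustration of log-based status code monitoring, the sketch below counts 404 responses per path. It assumes the log uses nginx's default "combined" format; the sample lines, the regular expression, and the function name `count_status` are made up for the example and are not taken from the article.

```python
import re
from collections import Counter

# Matches the request and status fields of a "combined" format log line,
# e.g. "GET /missing HTTP/1.1" 404
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_status(lines, status="404"):
    """Count, per request path, the log lines that carry the given status code."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("status") == status:
            hits[match.group("path")] += 1
    return hits


# Hypothetical sample log lines.
sample = [
    '10.0.0.1 - - [07/Dec/2020:06:27:22 +0800] "GET /index.html HTTP/1.1" 200 612 "-" "curl/7.64.0"',
    '10.0.0.2 - - [07/Dec/2020:06:27:23 +0800] "GET /missing HTTP/1.1" 404 153 "-" "curl/7.64.0"',
    '10.0.0.3 - - [07/Dec/2020:06:27:24 +0800] "GET /missing HTTP/1.1" 404 153 "-" "curl/7.64.0"',
]
print(count_status(sample))  # Counter({'/missing': 2})
```

Because the check only reads the reverse proxy's log, the sites behind nginx do not have to change anything, which matches the non-invasive goal stated earlier.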

 

Before moving on to liveness monitoring and interface processing time monitoring, a few more words about system architecture: if frameworks and components are unified, unified monitoring takes far less effort.

The figure above shows a typical layered architecture for an Internet service:

(1) upstream are the APP and the browser;

(2) the reverse proxy layer is nginx, and unified HTTP 404 status code monitoring is implemented at this layer;

(3) the web layer, assumed here to use a self-developed web-framework;

(4) the service layer, assumed here to use a self-developed service-framework; the web layer calls services through an RPC-client;

(5) the data layer, db, assumed here to be accessed through the self-developed Daojia-DAO component;

(6) the cache layer, cache, assumed here to be accessed through the self-developed Daojia-KV component;

 

The two components, D-DAO and D-KV, are not as complicated as you might think; at first each was just a simple wrapper.

 

Dimension three: how to monitor service liveness?

Monitoring requirements: process and port monitoring only guarantees that the process exists and the port is open; it cannot tell whether the service can actually respond to requests, so a way to confirm that the service is “alive” is needed.

 

Monitoring approach: ping-pong style monitoring, implemented uniformly at the site-framework and service-framework level by providing a keepalive interface:

(1) the framework provides a ping-pong interface;

(2) the monitoring center gets the cluster type (web/service) and the cluster IP list from the cluster information management service (or a configuration file);

(3) the monitoring center sends the built-in ping-pong request;
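The three steps above could look roughly like the following sketch from the monitoring center's side. Everything concrete in it is an assumption for illustration: the `CLUSTERS` dictionary stands in for the cluster information service or configuration file, and the `/ping` path with a `pong` response body is a hypothetical keepalive contract, not one defined by the article or by any particular framework.

```python
import http.client
import urllib.request
from typing import Callable, Dict, List

# Hypothetical cluster information (cluster name -> list of "ip:port" addresses).
CLUSTERS: Dict[str, List[str]] = {
    "web": ["10.0.0.11:8080", "10.0.0.12:8080"],
}


def http_ping(address: str, timeout: float = 1.0) -> bool:
    """Send GET /ping (hypothetical keepalive path) and expect the body 'pong'."""
    try:
        with urllib.request.urlopen(f"http://{address}/ping", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"pong"
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, malformed response: not alive.
        return False


def dead_nodes(clusters: Dict[str, List[str]],
               ping: Callable[[str], bool] = http_ping) -> Dict[str, List[str]]:
    """Ping every node of every cluster and return the nodes that did not answer."""
    dead = {}
    for name, addresses in clusters.items():
        failed = [address for address in addresses if not ping(address)]
        if failed:
            dead[name] = failed
    return dead


# Demonstration with a fake ping function, so no network access is needed.
def fake_ping(address: str) -> bool:
    return address != "10.0.0.12:8080"


print(dead_nodes(CLUSTERS, ping=fake_ping))  # {'web': ['10.0.0.12:8080']}
```

Keeping the ping function pluggable means an RPC-based ping for the service layer could reuse the same loop over the cluster list.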

 

Two points to emphasize

(1) if an open-source framework does not provide a ping-pong interface, it can be added through secondary development (be careful: secondary development of any open-source framework is where the pitfalls begin);

(2) a unified cluster information management service, or a unified cluster information configuration file, really matters; it is the cornerstone of a unified technology system;

 

Dimension four: how to monitor interface execution time?

Monitoring requirements

(1) whether HTTP site interfaces time out;

(2) whether RPC service interfaces time out;

(3) whether db access times out;

(4) whether cache access times out;

(5) besides timeouts, also watch for large swings in the execution time of the same interface, both year-on-year and period-over-period. For example, if the average response time of an interface is 100ms and one day it suddenly rises to 300ms, there is reason to suspect a problem with the interface even though nothing has timed out;
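Requirement (5) can be sketched as a comparison between the current window and a baseline window. This is a minimal sketch, not the article's algorithm: the function name `latency_alarm`, the 1000ms timeout, and the 2x ratio are all assumptions chosen for illustration.

```python
from statistics import mean


def latency_alarm(current_ms, baseline_ms, timeout_ms=1000.0, max_ratio=2.0):
    """Return the reasons to raise an alarm for one interface.

    current_ms  -- execution times observed in the current window
    baseline_ms -- execution times from the comparison window
                   (for example, the same hour yesterday)
    """
    reasons = []
    if max(current_ms) > timeout_ms:
        reasons.append("timeout")
    baseline_avg = mean(baseline_ms)
    if baseline_avg > 0 and mean(current_ms) / baseline_avg > max_ratio:
        reasons.append("fluctuation")
    return reasons


# The example from the text: the average rises from about 100ms to about 300ms.
# Nothing times out, yet the interface is flagged.
print(latency_alarm([290, 300, 310], [95, 100, 105]))  # ['fluctuation']
```

Choosing the baseline window is the main design decision: comparing against the previous period catches sudden jumps, while comparing against the same period a day or week earlier tolerates normal daily traffic patterns.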

 

Monitoring approach: unified reporting from the framework components (points 1, 2, 3, 4 in the figure above).

(1) in web-framework, report data for every HTTP interface; the url, parameters, execution time, and other core data can be reported;

(2) in service-framework, report data for every RPC interface; the interface, parameters, execution time, and other core data can be reported;

(3) in DAO, report data for every SQL access to the database; the sql, parameters, execution time, and other core data can be reported;

(4) in KV-client, report data for every cache access; the key, execution time, and other core data can be reported;

Unified reporting is the idea; the details of how to report, whether by shipping logs with flume or by real-time stream processing with storm/spark, can go either way.
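One way a framework component could report execution time without touching business code is a wrapper around every call, sketched below. This is a hypothetical illustration: the `timed` decorator, the `report` function, and the in-memory `REPORTS` list (standing in for a log file or message stream) are assumptions, as is the `query_user` function used to demonstrate it.

```python
import functools
import time

# Stand-in for the real reporting channel (log file, message queue, ...).
REPORTS = []


def report(record):
    """Deliver one measurement; here it is simply kept in memory."""
    REPORTS.append(record)


def timed(layer):
    """Decorator that reports name, arguments, and execution time of each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Report even when the call raises, so failures are timed too.
                elapsed_ms = (time.perf_counter() - start) * 1000
                report({"layer": layer, "name": func.__name__,
                        "args": args, "elapsed_ms": elapsed_ms})
        return wrapper
    return decorator


# Hypothetical DAO-layer function wrapped by the framework.
@timed("dao")
def query_user(user_id):
    time.sleep(0.01)  # simulate a database round trip
    return {"id": user_id}


query_user(42)
print(REPORTS[0]["layer"], REPORTS[0]["name"])  # dao query_user
```

If the wrapper lives inside the shared framework or client component, every interface is covered automatically and business developers do not need to add anything.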

 

Summary

Monitoring is skilled technical work:

(1) the guiding idea of a monitoring platform is multi-dimensional monitoring;

(2) the design core of the four kinds of unified monitoring, “operating system, http 404, service liveness, and interface processing time”, is “non-invasive”; a technology platform that delivers many functions without requiring anyone to change anything is a good technology platform;

(3) a unified cluster information management service, a unified personnel information management service, and a unified alarm policy service (or configuration files) are the cornerstones of a unified technology system;

Architect's way - Share technical ideas

The thinking matters more than the conclusion; I hope you all get something out of it.

Survey

Which aspects does your monitoring cover?

Copyright notice
This article was written by [58 Shen Jian]. Please include the original link when reposting. Thanks.
https://chowdera.com/2020/12/202012070626198187.html