
Sentinel Go, the flow control and degradation component behind Alibaba's Double 11, is officially GA, safeguarding the stability of cloud-native services

2020-12-07 15:21:44 Alibaba Cloud Native


Author | Zhao Yihao, Sentinel open source project lead
Source | Alibaba Cloud official account

Preface

The stability of microservices has always been a major concern for developers. As businesses evolve from monolithic to distributed architectures and deployment models change, dependencies between services become increasingly complex and business systems face enormous high-availability challenges.


In production you may have run into various kinds of instability, for example:

  • During a big promotion, an instantaneous traffic peak pushes the system past its maximum capacity; load spikes, the system crashes, and users cannot place orders.
  • A "dark horse" hot item breaks through the cache, the DB is overwhelmed, and normal traffic is crowded out.
  • A caller is dragged down by an unstable third-party service; the thread pool fills up, calls pile up, and the entire call chain freezes.

These unstable scenarios can have serious consequences, yet much of the time it is easy to overlook high-availability protection around traffic and dependencies. You may ask: how can we prevent the impact of these unstable factors? How can we protect against traffic surges for high availability? How can we keep services "rock solid"? This is where the high-availability protection middleware behind Alibaba's Double 11 comes in: Sentinel. During this year's Tmall Double 11 rush, Sentinel reliably safeguarded the stability of thousands of Alibaba services under Double 11 peak traffic, and the Sentinel Go version was also recently announced as officially GA. Let's take a look at Sentinel Go and the community's exploration of cloud native.

Introducing Sentinel

Sentinel is a traffic-governance component open-sourced by Alibaba for distributed service architectures. Taking traffic as its main point of intervention, it helps developers ensure the stability of microservices across dimensions such as rate limiting, traffic shaping, circuit breaking and degradation, and system adaptive protection. Sentinel has underpinned the core traffic scenarios of Alibaba's Double 11 promotions for nearly 10 years, such as flash sales, cold start, peak shaving, cluster flow control, and real-time circuit breaking of unavailable downstream services. It is a powerful tool for keeping microservices highly available, natively supports Java/Go/C++, and provides global rate limiting support for Istio/Envoy to bring high-availability protection to Service Mesh.


At the beginning of this year, the Sentinel community announced the release of Sentinel Go, providing native high-availability protection and fault-tolerance support for microservices and infrastructure components written in Go, and marking a new step for Sentinel toward a polyglot, cloud-native future. Over the past six months, the community has shipped nearly 10 versions, gradually aligning the core high-availability protection and fault-tolerance capabilities, while also expanding the open source ecosystem through collaboration with communities such as dubbo-go and Ant Group's MOSN.

Recently, Sentinel Go 1.0 GA was officially released, marking that the Go version is production-ready. Sentinel Go 1.0 aligns with the core high-availability protection and fault-tolerance capabilities of the Java version, including rate limiting, traffic shaping, concurrency control, circuit breaking and degradation, system adaptive protection, and hotspot protection. The Go version also covers the mainstream open source ecosystem, providing adapters for common microservice frameworks such as Gin, gRPC, go-micro, and dubbo-go, as well as dynamic data source extensions for etcd, Nacos, Consul, and more. Sentinel Go is evolving in the cloud-native direction as well: the 1.0 release explores several cloud-native scenarios, including a Kubernetes CRD data source and Kubernetes HPA. For Sentinel Go, the flow control scenarios we envision are not limited to microservice applications themselves. Go accounts for a large share of cloud-native infrastructure components, and these components often lack fine-grained, adaptive protection and fault-tolerance mechanisms; here, a component's own extension mechanisms can be combined with Sentinel Go to protect the component's stability.


Under the hood, Sentinel uses high-performance sliding windows to aggregate second-level call metrics, combined with token bucket, leaky bucket, and adaptive flow control algorithms to deliver its core high-availability protection capabilities.
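To make the statistics layer concrete, here is a minimal, self-contained sketch of a bucketed sliding window that aggregates per-second call counts. This is an illustration of the idea only, not Sentinel's actual implementation; the bucket count and sizes are arbitrary choices for the sketch.

```go
package main

import "fmt"

// slidingWindow aggregates event counts over a fixed number of time buckets.
type slidingWindow struct {
	bucketMs int64
	counts   []int64
	starts   []int64 // start time (ms) of the period each bucket currently holds
}

func newSlidingWindow(buckets int, bucketMs int64) *slidingWindow {
	return &slidingWindow{
		bucketMs: bucketMs,
		counts:   make([]int64, buckets),
		starts:   make([]int64, buckets),
	}
}

// bucketFor maps a timestamp to its bucket, resetting the bucket if it
// still holds counts from an older period.
func (w *slidingWindow) bucketFor(nowMs int64) int {
	idx := int((nowMs / w.bucketMs) % int64(len(w.counts)))
	start := nowMs - nowMs%w.bucketMs
	if w.starts[idx] != start {
		w.starts[idx] = start
		w.counts[idx] = 0
	}
	return idx
}

// Add records one call at time nowMs.
func (w *slidingWindow) Add(nowMs int64) {
	w.counts[w.bucketFor(nowMs)]++
}

// Sum returns how many calls fell inside the window ending at nowMs.
func (w *slidingWindow) Sum(nowMs int64) int64 {
	windowLen := w.bucketMs * int64(len(w.counts))
	var total int64
	for i, c := range w.counts {
		if nowMs-w.starts[i] < windowLen {
			total += c
		}
	}
	return total
}

func main() {
	w := newSlidingWindow(10, 100) // 10 buckets x 100ms = a 1-second window
	for i := int64(0); i < 5; i++ {
		w.Add(1000 + i*100)
	}
	fmt.Println(w.Sum(1400)) // calls recorded within the last second
}
```

Dividing the window into small buckets is what lets stale periods be dropped cheaply as time advances, which is why this structure supports high-frequency second-level statistics.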


So how do we use Sentinel Go to ensure the stability of our microservices? Let's look at several typical application scenarios.

Core high-availability protection scenarios

1. Flow control and traffic shaping

Traffic is highly random and unpredictable. One second may be calm; the next may bring a flood peak (for example, the midnight moment of Double 11). Yet a system's capacity is always limited: if a traffic burst exceeds what the system can handle, requests go unprocessed, piled-up requests are handled slowly, CPU/load spikes, and the system eventually crashes. We therefore need to limit such bursts, handling as many requests as possible while ensuring the service is not overwhelmed; this is flow control. Flow control is a very general scenario and applies to situations such as pulse traffic.
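The "reject excess traffic" idea can be sketched with a minimal token-bucket limiter; this is a toy illustration of the mechanism, not Sentinel's implementation, and the rate and burst values are arbitrary.

```go
package main

import "fmt"

// tokenBucket refills `rate` tokens per second up to `burst` and rejects
// requests when no token is available.
type tokenBucket struct {
	rate, burst float64
	tokens      float64
	lastMs      int64
}

func newTokenBucket(rate, burst float64) *tokenBucket {
	return &tokenBucket{rate: rate, burst: burst, tokens: burst}
}

// Allow consumes one token at time nowMs, returning false (reject) when
// the bucket is empty.
func (b *tokenBucket) Allow(nowMs int64) bool {
	// Refill proportionally to the time elapsed since the last call.
	b.tokens += float64(nowMs-b.lastMs) / 1000 * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.lastMs = nowMs
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	tb := newTokenBucket(10, 10) // roughly "10 QPS" with a burst of 10
	passed := 0
	for i := 0; i < 100; i++ { // 100 requests arriving in the same instant
		if tb.Allow(0) {
			passed++
		}
	}
	fmt.Println(passed) // only the burst passes; the rest are rejected
}
```

The point of the sketch is the behavior, not the numbers: a sudden pulse of traffic is clipped to the configured capacity instead of flooding the service.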

Usually, at a Web entry point or on the service provider (Service Provider) side, we need to protect the provider itself from traffic peaks. Flow control here is typically based on the provider's service capacity, or targeted at specific callers. We can evaluate the capacity of core interfaces through load testing and configure QPS-mode flow control rules; when the number of requests per second exceeds the configured threshold, the excess requests are automatically rejected.

Here is the simplest example of configuring a Sentinel rate-limiting rule:

_, err = flow.LoadRules([]*flow.Rule{
    {
        Resource:               "some-service", // resource name of the instrumented entry point
        TokenCalculateStrategy: flow.Direct,
        ControlBehavior:        flow.Reject, // reject excess requests directly; no throttling or queueing
        Threshold:              10,          // threshold of 10; statistics default to a 1-second window, i.e. at most 10 requests per second
    },
})

2. Warm-Up flow control

When a system has been running at a low water level for a long time and traffic suddenly surges, pushing it straight to a high water level can crush it instantly. For example, for a freshly launched service whose database connection pool is not yet initialized and whose caches are still empty, a traffic surge can easily bring the service down. With traditional rate limiting alone, without smoothing or peak shaving, there is still a risk of being overwhelmed (for instance, very high concurrency in a single instant). For this scenario, we can use Sentinel's Warm-Up flow control mode, which lets traffic through at a slowly increasing rate, gradually ramping up to the threshold over a configured period rather than all at once, optionally combined with request-interval control plus queueing (the Throttling control behavior) to prevent many requests from being processed at the same moment. This gives a cold system time to warm up and avoids crushing it.
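To build intuition for the ramp-up behavior, the sketch below models the allowed QPS as a simple linear ramp from a "cold" threshold up to the full threshold. This is a deliberate simplification: Sentinel's actual Warm-Up mode is a Guava-inspired token bucket, and the parameter values here are illustration-only.

```go
package main

import "fmt"

// warmUpThreshold returns the allowed QPS at elapsedSec seconds after a cold
// start: it ramps linearly from threshold/coldFactor up to the full
// threshold over warmUpPeriodSec. (Simplified linear model for intuition;
// the real algorithm shapes the curve differently.)
func warmUpThreshold(threshold, coldFactor float64, warmUpPeriodSec, elapsedSec int) float64 {
	cold := threshold / coldFactor
	if elapsedSec >= warmUpPeriodSec {
		return threshold // fully warmed up
	}
	return cold + (threshold-cold)*float64(elapsedSec)/float64(warmUpPeriodSec)
}

func main() {
	// e.g. a 200 QPS threshold, cold factor 3, 30-second warm-up period
	for _, t := range []int{0, 15, 30} {
		fmt.Printf("t=%2ds allowed≈%.0f QPS\n", t, warmUpThreshold(200, 3, 30, t))
	}
}
```

Even in this crude form, the key property is visible: right after startup only a fraction of the threshold is admitted, so the connection pools and caches fill up before full traffic arrives.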


3. Concurrency control and circuit breaking

A service often calls other modules: another remote service, a database, or a third-party API. For example, a payment may require calling a remote service's API; querying a product's price may require a database query. However, the stability of these dependencies is not guaranteed. If a dependency becomes unstable and its response times grow, the response time of the method calling it grows too, threads pile up, and eventually the service's own thread pool may be exhausted, making the service itself unavailable.


Modern microservice architectures are distributed, composed of many services that call each other and form complex call chains. The problems above are amplified along these chains: one unstable link in a complex chain can cascade until the entire chain becomes unavailable. Sentinel Go provides the following capabilities to avoid unavailability caused by unstable factors such as slow calls:

  • Concurrency control (isolation module): a lightweight isolation mechanism that limits the number of concurrent (in-flight) calls, preventing excessive slow calls from crowding out normal ones.
  • Circuit breaking and degradation (circuitbreaker module): automatically breaks and degrades unstable weak-dependency calls, temporarily cutting off unstable calls to avoid avalanches caused by local instability.
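The concurrency-control idea can be sketched with a counting semaphore built from a buffered channel; this illustrates the mechanism only and is not the isolation module's actual code.

```go
package main

import "fmt"

// inFlightLimiter caps the number of concurrent calls to a dependency.
// A buffered channel serves as a counting semaphore.
type inFlightLimiter chan struct{}

func newInFlightLimiter(max int) inFlightLimiter {
	return make(inFlightLimiter, max)
}

// TryAcquire returns false immediately when the cap is reached, so slow
// calls cannot pile up and exhaust the caller's own resources.
func (l inFlightLimiter) TryAcquire() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot when the call completes.
func (l inFlightLimiter) Release() { <-l }

func main() {
	lim := newInFlightLimiter(2)
	fmt.Println(lim.TryAcquire(), lim.TryAcquire(), lim.TryAcquire()) // third call exceeds the cap
	lim.Release()
	fmt.Println(lim.TryAcquire()) // a slot has freed up
}
```

Because the limiter counts in-flight calls rather than calls per second, it naturally adapts to latency: the slower the dependency gets, the fewer new calls are admitted.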

Sentinel Go's circuit breaking and degradation is based on the circuit breaker pattern: when a service call shows signs of instability (such as growing response times or a rising error rate), calls to it are temporarily cut off and retried after a while. This both keeps an already unstable service from being made worse and protects the caller from being dragged down. Sentinel supports two families of breaking strategies, response-time based (slow call ratio) and error based (error ratio / error count), which together cover a wide range of unstable scenarios.
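The cut-off-and-retry lifecycle described above (Closed → Open → Half-Open) can be sketched as a small state machine driven by the error ratio; the window size, ratio, and retry interval below are arbitrary illustration values, not Sentinel's defaults.

```go
package main

import "fmt"

type state int

const (
	closed state = iota
	open
	halfOpen
)

// breaker trips to Open when the error ratio over the last `window` calls
// exceeds maxErrorRatio, stays Open for retryAfterMs, then probes via
// the Half-Open state.
type breaker struct {
	st            state
	total, errors int
	window        int
	maxErrorRatio float64
	openedAtMs    int64
	retryAfterMs  int64
}

// Allow reports whether a call may proceed at time nowMs.
func (b *breaker) Allow(nowMs int64) bool {
	if b.st == open {
		if nowMs-b.openedAtMs >= b.retryAfterMs {
			b.st = halfOpen // let a single probe request through
			return true
		}
		return false
	}
	return true
}

// Record feeds the call outcome back into the breaker.
func (b *breaker) Record(nowMs int64, failed bool) {
	if b.st == halfOpen {
		if failed {
			b.st = open // probe failed: stay open
			b.openedAtMs = nowMs
		} else {
			b.st = closed // probe succeeded: recover
			b.total, b.errors = 0, 0
		}
		return
	}
	b.total++
	if failed {
		b.errors++
	}
	if b.total >= b.window && float64(b.errors)/float64(b.total) > b.maxErrorRatio {
		b.st = open
		b.openedAtMs = nowMs
		b.total, b.errors = 0, 0
	}
}

func main() {
	b := &breaker{window: 5, maxErrorRatio: 0.5, retryAfterMs: 1000}
	for i := 0; i < 5; i++ {
		b.Record(0, true) // five consecutive failures trip the breaker
	}
	fmt.Println(b.Allow(10))   // open: rejected
	fmt.Println(b.Allow(1500)) // retry interval elapsed: half-open probe allowed
	b.Record(1500, false)      // the probe succeeds
	fmt.Println(b.Allow(1600)) // closed again
}
```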


Note that circuit breaking is generally suitable for weak-dependency calls, i.e. calls whose degradation does not break the main business flow; developers need to design the degraded fallback logic and return values. Also note that even with circuit breaking on the caller side, we still need to configure request timeouts on the HTTP or RPC client as a last line of defense.

4. Hotspot protection

Traffic is random and unpredictable. To avoid being overwhelmed by heavy traffic, we usually configure rate-limiting rules for core interfaces, but in some scenarios ordinary flow control rules are not enough. Consider this scenario: at the peak of a big promotion there are always many "hotspot" items with extremely high instantaneous traffic. Generally we can predict a set of hot items in advance and "preheat" their information into the cache, so that heavy traffic is served quickly without hitting the DB. But every promotion also has "dark horse" items that we cannot predict and that are not preheated. When traffic to these "dark horse" items surges, the flood of requests can break through the cache and hit the DB directly, slowing DB access, crowding out the resource pool for normal item requests, and possibly bringing the whole system down. In this case, Sentinel's hot-parameter flow control can automatically identify hotspot parameters and limit the QPS or concurrency of each hot value, effectively preventing overly "hot" parameter values from crowding out normal calls.


For example, in some scenarios we want to limit how frequently each user calls a certain API; using the API name + userId as the resource name is clearly inappropriate. Instead, we can pass userId as a parameter via WithArgs(xxx) when instrumenting the API, then configure hotspot rules to limit the call frequency per user. Sentinel also supports configuring separate limits for specific parameter values, enabling fine-grained flow control. Like other rules, hotspot flow control rules support dynamic configuration via dynamic data sources.
The RPC framework integrations provided by Sentinel Go (such as Dubbo and gRPC) automatically attach the RPC call's parameter list to the instrumentation point, so users can configure hotspot flow control rules directly against the corresponding parameter positions. Note that, constrained by the type system, configuring limits for specific values currently supports only basic types and the string type.

Sentinel Go's hot-parameter flow control is implemented with a cache eviction mechanism plus token buckets: eviction strategies (such as LRU, LFU, or ARC) identify hotspot parameters, and token buckets control the traffic of each hot parameter value. The current Sentinel Go version uses an LRU strategy to track hotspot parameters; the community has already submitted a PR optimizing the eviction mechanism, and future versions will introduce more eviction strategies to suit different scenarios.
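A toy version of the LRU-plus-threshold idea, purely for illustration (the capacity and per-value threshold are made-up numbers, and a real implementation would also limit by time window):

```go
package main

import (
	"container/list"
	"fmt"
)

// hotKeyStats keeps per-parameter counters in an LRU of fixed capacity,
// so only recently active ("hot") parameter values occupy memory.
type hotKeyStats struct {
	cap   int
	ll    *list.List
	items map[string]*list.Element
}

type entry struct {
	key   string
	count int
}

func newHotKeyStats(capacity int) *hotKeyStats {
	return &hotKeyStats{cap: capacity, ll: list.New(), items: map[string]*list.Element{}}
}

// Hit records one access of key and returns its current count.
func (h *hotKeyStats) Hit(key string) int {
	if el, ok := h.items[key]; ok {
		h.ll.MoveToFront(el)
		e := el.Value.(*entry)
		e.count++
		return e.count
	}
	if h.ll.Len() >= h.cap { // evict the least recently used key
		back := h.ll.Back()
		delete(h.items, back.Value.(*entry).key)
		h.ll.Remove(back)
	}
	h.items[key] = h.ll.PushFront(&entry{key: key, count: 1})
	return 1
}

func main() {
	stats := newHotKeyStats(2)
	const perValueLimit = 3 // hypothetical per-value threshold
	blocked := 0
	for i := 0; i < 5; i++ {
		if stats.Hit("item-42") > perValueLimit { // the "dark horse" value
			blocked++
		}
	}
	fmt.Println(blocked) // accesses beyond the per-value threshold
}
```

The LRU keeps memory bounded no matter how many distinct parameter values flow through, while still keeping counters alive for exactly the values that are currently hot.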

5. System adaptive protection

With all the traffic protection scenarios above, are we safe? Not quite. Much of the time we cannot accurately evaluate an interface's exact capacity in advance, nor predict the traffic profile of core interfaces (for example, whether it is pulsed), so pre-configured rules may not effectively protect the current service node. In some cases we may suddenly see the machine's load and CPU usage spike with no way to quickly identify the cause, leaving no time to handle the situation gracefully. What matters then is stopping the loss quickly, using automated protection to pull a microservice back from the brink of collapse. For these situations, Sentinel Go provides system adaptive protection rules that combine system metrics and service capacity to dynamically adjust traffic.


Sentinel's system adaptive protection strategy borrows from the TCP BBR algorithm: it combines system load and CPU usage with service-level metrics such as inbound QPS, response time, and concurrency, and uses an adaptive flow control strategy to balance inbound traffic against the system's capacity, keeping the system at maximum throughput while remaining stable overall. System rules serve as a bottom-line protection strategy for the whole service, guaranteeing it does not go down, and work particularly well for CPU-intensive scenarios. Meanwhile, the community is combining automated control theory and reinforcement learning to keep improving the effectiveness and applicability of adaptive flow control. Future versions will also ship more experimental adaptive strategies to cover more availability scenarios.
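The BBR-inspired check can be sketched as follows. The core inequality, concurrency <= maxQPS * minRT, follows Little's law (in-flight requests = throughput x latency); this is a simplified illustration, and the real system rule involves more signals and smoothing than shown here.

```go
package main

import "fmt"

// adaptiveAllow sketches the BBR-inspired system rule: when system load
// exceeds the trigger threshold, a request passes only while the current
// concurrency stays within the estimated capacity maxQPS * minRtSec.
func adaptiveAllow(load, loadThreshold float64, concurrency int, maxQPS, minRtSec float64) bool {
	if load <= loadThreshold {
		return true // system healthy: no adaptive limiting
	}
	return float64(concurrency) <= maxQPS*minRtSec
}

func main() {
	// Suppose the node recently sustained 1000 QPS at a 5ms minimum RT,
	// so its estimated capacity is 1000 * 0.005 = 5 in-flight requests.
	fmt.Println(adaptiveAllow(2.0, 8.0, 100, 1000, 0.005))  // low load: pass
	fmt.Println(adaptiveAllow(12.0, 8.0, 4, 1000, 0.005))   // high load, under capacity: pass
	fmt.Println(adaptiveAllow(12.0, 8.0, 100, 1000, 0.005)) // high load, over capacity: reject
}
```

Note how no fixed QPS threshold is configured anywhere: the capacity estimate is derived from the service's own observed throughput and latency, which is what makes the rule "adaptive".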

Cloud native exploration

Cloud native is one of the most important directions of Sentinel Go's evolution. In the process of reaching GA, the Sentinel Go community also explored the Kubernetes and Service Mesh scenarios.

1. Kubernetes CRD data-source

In the production environment, we generally need to dynamically manage various rule configurations through the configuration center . stay Kubernetes In the cluster , We can use it naturally Kubernetes CRD To manage the application of Sentinel The rules . stay Go 1.0.0 The community provides basic Sentinel The rules CRD Abstraction and the corresponding Data source implementation . Users just need to import Sentinel The rules CRD The definition file , Access Sentinel Register the corresponding data-source, And then according to CRD Define the format of writing YAML Configure and kubectl apply To the corresponding namespace Dynamic configuration rules can be realized in the following . Here is an example of a flow control rule :

apiVersion: datasource.sentinel.io/v1alpha1
kind: FlowRules
metadata:
  name: foo-sentinel-flow-rules
spec:
  rules:
    - resource: simple-resource
      threshold: 500
    - resource: something-to-smooth
      threshold: 100
      controlBehavior: Throttling
      maxQueueingTimeMs: 500
    - resource: something-to-warmup
      threshold: 200
      tokenCalculateStrategy: WarmUp
      controlBehavior: Reject
      warmUpPeriodSec: 30
      warmUpColdFactor: 3

The Kubernetes CRD data-source module is available at: https://github.com/sentinel-group/sentinel-go-datasource-k8s-crd

Going forward, the community will further refine the rule CRD definitions and discuss standard abstractions for high-availability protection with other communities.

2. Service Mesh

Service Mesh is one of the trends in the evolution of microservices toward cloud native. Under a Service Mesh architecture, some service governance and policy control capabilities gradually sink into the data plane. Last year, the Sentinel community made some attempts in Java 1.7.0, providing an implementation of Envoy's Global Rate Limiting gRPC service, the Sentinel RLS token server, which uses Sentinel's cluster rate-limiting token server to provide cluster flow control for Envoy service meshes. With the birth of Sentinel Go this year, the community is cooperating and integrating with more Service Mesh products. Together with Ant Group's MOSN community, Sentinel Go's flow control and degradation capabilities are natively supported in MOSN Mesh and are being deployed inside Ant Group. The community is also exploring more general solutions, such as implementing a Sentinel plugin via Istio's Envoy WASM extension mechanism, so that Istio/Envoy service meshes can use Sentinel's native flow control, degradation, and adaptive protection capabilities to keep services stable across the entire cluster.


3. Kubernetes HPA based on Sentinel metrics

There are many ways to ensure service stability. Beyond "controlling" traffic with various rules, "elasticity" is another line of thinking. For applications deployed in Kubernetes, you can use Kubernetes HPA to scale services horizontally. HPA supports several system metrics by default and also supports custom metrics. In Alibaba Cloud's Kubernetes container service, combined with AHAS Sentinel, we already support using a service's average QPS and response time as conditions for elastic scaling. The community is also working on exposing some of Sentinel's service-level metric statistics (pass count, block count, response time, etc.) in a standard way via Prometheus or OpenTelemetry, and adapting them for use with Kubernetes HPA.
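As a sketch of what exposing such metrics might look like, here is a minimal stdlib-only endpoint emitting the Prometheus text exposition format. The metric names and counters are hypothetical examples, and a real setup would typically use the official Prometheus Go client (and e.g. prometheus-adapter to feed HPA) rather than hand-rolled formatting.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// Hypothetical counters a service might maintain alongside its flow control.
var passCount, blockCount atomic.Int64

// renderMetrics formats the counters in the Prometheus text exposition
// format so a custom-metrics pipeline could scrape them.
func renderMetrics() string {
	return fmt.Sprintf(
		"# TYPE sentinel_pass_total counter\nsentinel_pass_total %d\n"+
			"# TYPE sentinel_block_total counter\nsentinel_block_total %d\n",
		passCount.Load(), blockCount.Load())
}

func metricsHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, renderMetrics())
}

func main() {
	passCount.Add(42)
	http.HandleFunc("/metrics", metricsHandler)
	// http.ListenAndServe(":9090", nil) // uncomment to actually serve
	fmt.Println("metrics endpoint registered at /metrics")
}
```

Once scraped, such service-level counters (pass, block, RT) can drive an HPA custom metric in exactly the same way as the built-in CPU metric.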


Elasticity based on Sentinel metrics is not a panacea; it only suits certain scenarios, for example stateless services that start quickly (Serverless scenarios). For services that start slowly, or when the problem is not service capacity (e.g. insufficient capacity of a dependent DB), elastic scaling cannot solve the stability problem well and may even worsen the degradation.

Let's start hacking!

Having seen the high-availability protection scenarios above and Sentinel's exploration of cloud native, we hope you have a fresh understanding of the fault tolerance and stability of microservices. Feel free to play with the demos and connect your microservices to Sentinel to enjoy high-availability protection and fault tolerance, keeping your services "rock solid". The Sentinel Go 1.0 GA release would not have been possible without community contributions; thanks to everyone who contributed.

This GA release also welcomes two awesome new committers, @sanxun0325 and @luckyxiaoqiang, who brought Warm-Up flow control, the Nacos dynamic data source, and a series of feature improvements and performance optimizations to the 1.0 release, and have been very active in answering community questions and reviewing code. Congratulations to both! In future versions the community will continue to explore and evolve toward cloud native and adaptive intelligence; we welcome more contributors to join us in shaping Sentinel's future and creating infinite possibilities. We encourage contributions of any kind, including but not limited to:

  • bug fix
  • new features/improvements
  • dashboard
  • document/website
  • test cases

Developers can pick interesting issues from the good first issue list on GitHub to join the discussion and contribute. We pay close attention to active contributors, and core contributors will be nominated as Committers to help lead the community. We also welcome any questions and suggestions, via GitHub issues, Gitter, or the DingTalk group (group number: 30150716). Now start hacking!

Copyright notice
This article was created by [Alibaba Cloud Native]; please include the original link when reposting. Thanks.
https://chowdera.com/2020/12/20201207150442489g.html