High concurrency is experience that almost every programmer wants to have, and the reason is simple: as traffic grows, all kinds of technical problems appear, such as interface response timeouts, elevated CPU load, frequent GC, deadlocks, and large-scale data storage. These problems push us to keep deepening our technical skills.
In past interviews, whenever a candidate claimed to have worked on high-concurrency projects, I would usually ask them to explain their understanding of high concurrency. Few could answer the question systematically; the common failure modes fall into several categories:
1. No concept of data-driven metrics: unclear which metrics to choose to measure a high-concurrency system, unable to tell concurrency apart from QPS, and unaware of key numbers such as the system's total users, active users, and off-peak and peak QPS and TPS.
2. Designed some schemes, but fuzzy on the details: unable to explain a scheme's technical points and possible side effects. For example, introducing a cache because read performance hit a bottleneck, while ignoring the cache hit ratio, hotspot keys, data consistency, and so on.
3. A one-sided understanding that equates high-concurrency design with performance optimization: all about concurrent programming, multi-level caching, asynchrony, and horizontal scaling, while ignoring high-availability design, service governance, and operations support.
4. Knows the big-picture schemes but overlooks the most basic things: can clearly explain vertical layering, horizontal partitioning, caching, and so on, yet never checks whether the data structures are reasonable or the algorithms efficient, and never thinks about detail-level optimization from the two most fundamental dimensions, IO and computation.
In this article, I want to draw on my experience with high-concurrency projects to systematically summarize the knowledge and methods of high concurrency; I hope it helps you. The content is divided into the following 3 parts:
- How should we understand high concurrency?
- What are the design goals of a high-concurrency system?
- What are the concrete practices for high concurrency?
01 How should we understand high concurrency?
High concurrency means high traffic, and we need technical means to absorb the impact of that traffic. These means are like directing traffic flow: they let the system handle the load more smoothly and give users a better experience.
Common high-concurrency scenarios include Taobao's Double 11, ticket grabbing during the Spring Festival travel rush, and breaking news from big-V accounts on Weibo. Beyond these typical cases, a flash-sale system handling hundreds of thousands of requests per second, an order system handling ten million orders per day, an information-feed system, and so on, can all be classified as high concurrency.
Obviously, the scenarios above involve very different levels of concurrency, so how much concurrency counts as high concurrency?
1. Don't just look at the numbers; look at the specific business scenario. You can't say a 100K-QPS flash sale is high concurrency while a 10K-QPS information feed is not. A feed scenario involves complex recommendation models and various manual strategies, and its business logic may be more than 10 times as complex as a flash-sale scenario. They are not in the same dimension, so there is no basis for comparison.
2. Every business is built from 0 to 1. Concurrency and QPS are only reference metrics. What matters most is this: as business volume gradually grows 10x and 100x, did you use high-concurrency techniques to evolve your system, preventing and solving the problems high concurrency brings across architecture design, coding, and even product solutions? Or did you merely upgrade hardware and add machines for horizontal scaling?
Besides, the business characteristics of each high-concurrency scenario are completely different: there are read-heavy, write-light feed scenarios, and trading scenarios where both reads and writes are heavy. Is there a universal technical solution that addresses high concurrency across different scenarios?
I think the big ideas can be borrowed, and other people's solutions can serve as references, but in actual implementation there will be countless pitfalls in the details. In addition, because the hardware and software environment, technology stack, and product logic can never be exactly the same, the same technical solution applied to the same business scenario can still run into different problems; those pits have to be crawled through one by one.
Therefore, in this article I will focus on the basics, the general ideas, and the effective lessons I have personally practiced, hoping to give you a deeper understanding of high concurrency.
02 What are the design goals of a high-concurrency system?
We should first clarify the design goals of a high-concurrency system; only on that basis is it meaningful and targeted to discuss design schemes and practical experience.
2.1 Macro goals
High concurrency does not mean pursuing high performance alone, which is the one-sided understanding many people hold. From a macro perspective, high-concurrency system design has three goals: high performance, high availability, and high scalability.
1. High performance: performance reflects the system's parallel processing capability. With a fixed hardware budget, improving performance means saving money. Performance also shapes user experience: a response time of 100 milliseconds versus 1 second feels completely different to users.
2. High availability: this indicates how long the system can run normally. One system runs all year without interruption or failure; another has incidents and outages every few days. Users will certainly choose the former. Also, if the system can only achieve 90% availability, it will drag the business down badly.
3. High scalability: this indicates the system's ability to scale out, whether capacity can be expanded in a short time during traffic peaks to absorb the peak load more stably, as during Double 11 promotions or a celebrity-divorce news spike.
These 3 goals need to be considered as a whole, because they are interrelated and even affect one another.
For example: with the system's scalability in mind, you will design services to be stateless. This cluster design ensures high scalability, and it also indirectly improves the system's performance and availability.
Another example: to ensure availability, service interfaces are usually given timeouts, so that a large number of threads blocked on slow requests does not trigger a system avalanche. But what timeout is reasonable? Generally, we set it by referring to the measured performance of the dependent service.
2.2 Micro goals
From a micro perspective, what are the concrete metrics for high performance, high availability, and high scalability? And why were these metrics chosen?
2.2.1 Performance metrics
Performance metrics can measure existing performance problems and serve as the evaluation baseline for performance optimization. Generally, the interface response time over a period is used as the metric.
1. Average response time: the most commonly used, but its flaw is obvious: it is insensitive to slow requests. For example, given 10,000 requests, of which 9,900 take 1ms and 100 take 100ms, the average response time is 1.99ms. The average has only risen by 0.99ms, yet the response time of 1% of the requests has increased 100-fold.
2. Percentile values such as TP90 and TP99: sort the response times from small to large; TP90 is the response time at the 90th percentile. The higher the percentile, the more sensitive the metric is to slow requests.
3. Throughput: inversely proportional to response time; for example, if the response time is 1ms, the throughput is 1,000 requests per second.
Usually, a performance target considers both throughput and response time, for example: at 10,000 requests per second, keep AVG under 50ms and TP99 under 100ms. For a high-concurrency system, AVG and TP percentiles must both be considered.
In addition, from the user-experience perspective, 200 milliseconds is considered the first dividing line: users do not perceive the delay. 1 second is the second dividing line: users can feel the delay but can accept it.
Therefore, for a healthy high-concurrency system, TP99 should be kept within 200 milliseconds, and TP999 or TP9999 should be kept within 1 second.
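To make the difference between these metrics concrete, here is a minimal sketch that reproduces the 10,000-request example above. The `tp` helper is a simplified nearest-rank percentile, not a production implementation:

```python
def avg_ms(latencies):
    """Average response time — insensitive to a slow tail."""
    return sum(latencies) / len(latencies)

def tp(latencies, percentile):
    """TPxx: sort response times ascending and take the value at that
    percentile (simplified nearest-rank method)."""
    ordered = sorted(latencies)
    idx = max(0, int(len(ordered) * percentile / 100) - 1)
    return ordered[idx]

# The example from the text: 9,900 requests at 1ms and 100 at 100ms.
latencies = [1] * 9900 + [100] * 100
print(avg_ms(latencies))     # 1.99 — the average barely moves
print(tp(latencies, 90))     # 1   — TP90 still looks healthy
print(tp(latencies, 99.9))   # 100 — TP999 exposes the slow 1%
```

Note that TP99 here is still 1ms, because exactly 99% of the requests are fast; a percentile only flags the tail it actually covers, which is why TP999 or TP9999 matters for high-concurrency systems.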
2.2.2 Availability metrics
High availability means the system has a strong ability to run without failure. Availability = uptime / total running time, and it is usually described with a number of 9s.
For a high-concurrency system, the most basic requirement is 3 nines or 4 nines. The reason is simple: if you can only achieve 2 nines, the system is down 1% of the time. For large companies whose annual GMV or revenue runs to hundreds of billions, 1% translates into a billion-scale business impact.
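These percentages become more intuitive when converted into downtime budgets. A quick back-of-the-envelope calculation (assuming a 365-day year):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def annual_downtime_hours(nines):
    """Maximum downtime per year (hours) allowed by N nines of availability."""
    downtime_fraction = 10 ** (-nines)   # 2 nines -> 1%, 3 nines -> 0.1%, ...
    return SECONDS_PER_YEAR * downtime_fraction / 3600

for n in (2, 3, 4):
    print(f"{n} nines allows about {annual_downtime_hours(n):.2f} hours down per year")
```

So 2 nines permits roughly 87.6 hours of downtime a year, while 3 nines shrinks the budget to under 9 hours and 4 nines to under an hour, which is why 3 or 4 nines is the baseline here.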
2.2.3 Scalability metrics
When facing sudden traffic, it is impossible to rebuild the architecture on the spot; the fastest way is to add machines so that the system's processing capacity grows linearly.
For a business cluster or a basic component, scalability = performance improvement ratio / machine increase ratio. The ideal scalability: resources grow N times, performance grows N times. Generally, scalability should be kept above 70%.
But from the perspective of the overall architecture of a high-concurrency system, the goal of scalability is more than designing services to be stateless. When traffic grows 10x, the business services may scale out 10x quickly, but the database can become the new bottleneck.
Stateful storage services like MySQL are usually the technical difficulty of scaling. If the architecture is not planned in advance (vertical and horizontal splitting), scaling will involve migrating large amounts of data.
Therefore, high scalability must consider service clusters, databases, middleware such as caches and message queues, load balancing, bandwidth, dependent third parties, and so on. Once the system reaches a certain order of magnitude, each of these factors can become a scaling bottleneck.
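As a worked example of this formula, suppose a hypothetical cluster doubles its machine count but gains only 60% throughput (the QPS and machine numbers below are made up for illustration):

```python
def scalability(old_qps, new_qps, old_machines, new_machines):
    """Scalability = performance-improvement ratio / machine-increase ratio."""
    perf_gain = (new_qps - old_qps) / old_qps
    machine_gain = (new_machines - old_machines) / old_machines
    return perf_gain / machine_gain

# Cluster grows from 10 to 20 machines; throughput goes from 10k to 16k QPS.
print(scalability(10_000, 16_000, 10, 20))  # 0.6 -> below the 70% guideline
```

A result of 0.6 falls below the 70% rule of thumb, signaling that some shared resource (often the storage layer) is absorbing part of the added capacity.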
03 What are the concrete practices for high concurrency?
Having understood the 3 big goals of high-concurrency design, we can now systematically summarize the design schemes in two parts: first the general design methods, then the concrete practices for high performance, high availability, and high scalability.
3.1 General design methods
The general design methods start from two dimensions, the "vertical" and the "horizontal", commonly known as the two axes of high-concurrency handling: scaling up and scaling out.
3.1.1 Vertical scaling (scale-up)
Its goal is to raise the processing capacity of a single machine, and the schemes include:
1. Improving the machine's hardware: adding memory or CPU cores, increasing storage capacity, upgrading disks to SSDs, and other hardware upgrades.
2. Improving the machine's software performance: using caches to reduce IO, and using concurrency or asynchrony to increase throughput.
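As a minimal illustration of the second point, the sketch below raises the throughput of IO-bound work on one machine by running it concurrently instead of serially; `fetch` and its 50ms sleep stand in for a hypothetical network or disk call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(i):
    time.sleep(0.05)   # simulated IO wait (network or disk)
    return i * 2

items = range(20)

start = time.time()
serial = [fetch(i) for i in items]            # ~20 * 50ms = ~1s
serial_cost = time.time() - start

start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:
    concurrent = list(pool.map(fetch, items)) # ~2 waves * 50ms = ~0.1s
concurrent_cost = time.time() - start

print(f"serial {serial_cost:.2f}s vs concurrent {concurrent_cost:.2f}s")
```

The results are identical; only the wall-clock time changes, because the threads overlap their waiting. This helps for IO-bound work; CPU-bound work needs a different approach (more cores, or process-level parallelism).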
3.1.2 Horizontal scaling (scale-out)
Because single-machine performance always has a limit, horizontal scaling must eventually be introduced, using cluster deployment to further raise concurrent processing capacity. It covers the following 2 directions:
1. Build a good layered architecture: this is the premise of scale-out. High-concurrency systems often carry complex business, and layering simplifies complex problems, making it easier to scale out.
The diagram above shows the most common layered architecture on the Internet; of course, a real high-concurrency architecture will be further refined on this basis. For example, static and dynamic content can be separated and a CDN introduced; the reverse-proxy layer can be LVS+Nginx; the web layer can be a unified API gateway; the business-service layer can be further split into microservices along vertical business lines; and the storage layer can use a variety of heterogeneous databases.
2. Scale each layer horizontally: stateless layers scale out directly, stateful ones need sharded routing. Business clusters can usually be designed to be stateless, while databases and caches are stateful, so a partition key must be designed to shard the storage. Master-slave replication and read-write splitting can also be used to improve read performance.
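Here is a minimal sketch of the partition-key routing idea, assuming a fixed 4-shard layout and a made-up `user:` key scheme; real systems usually use consistent hashing or a lookup table so shards can be added without remapping every key:

```python
import zlib

SHARDS = ["db0", "db1", "db2", "db3"]   # assumed fixed shard layout

def route(partition_key: str) -> str:
    """Map a partition key (e.g. a user id) to a fixed shard."""
    # crc32 gives a hash that is stable across processes and runs,
    # unlike Python's built-in hash() which is salted per process.
    return SHARDS[zlib.crc32(partition_key.encode()) % len(SHARDS)]

print(route("user:10001"))   # the same key always lands on the same shard
```

Because routing is deterministic, any stateless business node can compute the shard for a request locally, which is what keeps the business layer itself free to scale out.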
3.2 Concrete practices
Next, combining my personal experience, I summarize implementable practices for each of the 3 aspects: high performance, high availability, and high scalability.
3.2.1 High-performance practices
1. Cluster deployment, reducing the pressure on individual machines through load balancing.
2. Multi-level caching, including a CDN for static data, local caches, distributed caches, and so on, as well as handling hotspot keys, cache penetration, cache concurrency, data consistency, and other caching issues.
3. Database sharding, table partitioning, and index optimization, plus search engines for complex query problems.
4. Considering NoSQL databases such as HBase and TiDB, provided the team is familiar with these components and has strong operations capability.
5. Asynchrony, handling secondary flows asynchronously via multithreading, MQ, or even delayed tasks.
6. Rate limiting, which first requires deciding whether the business allows it (flash-sale scenarios do, for example), including front-end rate limiting, rate limiting at the Nginx access layer, and server-side rate limiting.
7. Peak shaving and valley filling on traffic, absorbing bursts through MQ.
8. Concurrent processing, parallelizing serial logic with multithreading.
9. Precomputation, as in red-envelope scenarios, where the amounts can be calculated in advance and cached, then used directly when the envelopes are handed out.
10. Cache warming, preloading data into local or distributed caches in advance via asynchronous tasks.
11. Reducing IO frequency, for example by batching database and cache reads/writes, providing batch RPC interfaces, or eliminating RPC calls by keeping redundant copies of data.
12. Reducing packet size during IO, including lightweight communication protocols, appropriate data structures, removing redundant fields from interfaces, shrinking cache keys, compressing cache values, and so on.
13. Program-logic optimization, such as moving forward the checks that are most likely to abort the execution flow, optimizing the computation logic of for loops, or choosing more efficient algorithms.
14. Pooling techniques of all kinds and their sizing, including HTTP request pools, thread pools (set the core parameters according to whether the workload is CPU-intensive or IO-intensive), and database and Redis connection pools.
15. JVM optimization, including young- and old-generation sizes and the choice of GC algorithm, to minimize GC frequency and duration.
16. Lock selection, using optimistic locks in read-heavy, write-light scenarios, or segmented locks to reduce lock contention.
All the schemes above come down to considering every possible optimization point along just two dimensions, computation and IO. You also need a supporting monitoring system to track current performance in real time and support your bottleneck analysis, and then, following the 80/20 rule, seize the main contradiction and optimize it.
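As one concrete illustration, here is a minimal cache-aside sketch for point 2 that additionally caches null results to blunt cache penetration. The dicts and TTL handling are simplified stand-ins for Redis and a database; this sketches the pattern, not a production cache:

```python
import time

_MISS = object()            # sentinel cached for keys absent from the DB

class CacheAside:
    """Cache-aside read path with null caching against cache penetration."""

    def __init__(self, db, null_ttl=5.0):
        self.db = db        # stand-in for the real database (a dict here)
        self.cache = {}     # stand-in for Redis: key -> (value, expire_at)
        self.null_ttl = null_ttl   # short TTL for cached misses

    def get(self, key):
        hit = self.cache.get(key)
        if hit and hit[1] > time.time():
            value = hit[0]
            return None if value is _MISS else value
        value = self.db.get(key)   # cache miss: fall through to the DB
        if value is None:
            # Cache the miss briefly so repeated lookups of a nonexistent
            # key don't hammer the database (cache penetration).
            self.cache[key] = (_MISS, time.time() + self.null_ttl)
            return None
        self.cache[key] = (value, time.time() + 60.0)
        return value

db = {"user:1": {"name": "alice"}}
c = CacheAside(db)
print(c.get("user:1"))     # loaded from the DB, then cached
print(c.get("user:999"))   # None, and the miss itself is now cached
```

The short TTL on cached misses is the trade-off knob: long enough to absorb repeated bad lookups, short enough that a key created later becomes visible quickly.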
3.2.2 High-availability practices
1. Failover between peer nodes: both Nginx and the service-governance framework support routing to another node after one fails.
2. Failover between non-peer nodes: implement primary-standby switchover through heartbeat detection (for example, Redis's sentinel or cluster mode, or MySQL's primary-replica switching).
3. Interface-level timeout settings, retry strategies, and idempotent design.
4. Degradation: protect core services by sacrificing non-core ones, with circuit breaking when necessary; or provide fallback links when the core link has problems.
5. Rate limiting: directly reject requests that exceed the system's processing capacity, or return an error code for them.
6. Message reliability guarantees in MQ scenarios, including producer-side retry, broker-side persistence, and consumer-side ack mechanisms.
7. Gray release, supporting small-traffic deployment at machine granularity; observe the system logs and business metrics, and roll out fully only once the release runs stably.
8. Monitoring and alerting: a comprehensive monitoring system covering the basics (CPU, memory, disk, and network) as well as web servers, the JVM, databases, all kinds of middleware, and business metrics.
9. Disaster-recovery drills: similar to today's "chaos engineering", do destructive things to the system and observe whether local failures cause availability problems.
High-availability schemes are mainly thought through along 3 directions: redundancy, trade-offs, and system operability. You also need a matching on-call mechanism and incident-handling process, so that online problems are followed up in time.
3.2.3 High-scalability practices
1. A reasonable layered architecture: for example, the common Internet layered architecture mentioned above; in addition, microservices can be layered at a finer granularity into a data-access layer and a business-logic layer (but the performance impact needs evaluating, since there will be one more hop in the network).
2. Storage-layer splitting: split vertically by business dimension, and split further horizontally by data-feature dimension (sharding databases and tables).
3. Business-layer splitting: most commonly by business dimension (for example, commodity services and order services in e-commerce), but also by core versus non-core interfaces, or by request source (such as To C and To B, or App and H5).
Final words
High concurrency is indeed a complex, systemic problem. Space is limited, so topics such as distributed tracing, full-link stress testing, and flexible transactions are not covered here, though they are all points worth considering. Also, concrete high-concurrency implementations differ across business scenarios, but the overall design ideas and the schemes worth referencing are largely the same.
High-concurrency design should also adhere to the 3 principles of architecture design: simplicity, suitability, and evolution. "Premature optimization is the root of all evil." Design cannot be divorced from the actual situation of the business, and never over-design; the suitable solution is the perfect one.
I hope this article gives you a more comprehensive understanding of high concurrency. If you have experience worth sharing or deeper thoughts of your own, you are welcome to discuss them in the comments.
About the author: master's degree from a 985 university, former Amazon engineer, now a technical director at 58 Zhuanzhuan. Welcome to follow my personal official account: IT People's Career Advancement.