From search engine to core transaction database: how Alibaba Cloud's Dragon architecture supports Double 11

2020-12-07 19:17:36 Legend of Hulan River

During Double 11 in 2020, Tmall set new records: orders peaked at 583,000 per second and sales reached an all-time high of 498.2 billion yuan. Alibaba Cloud's Dragon architecture once again carried the world's largest traffic peak. 2020 was the first year Double 11 went fully cloud native, and the third year in which Dragon successfully supported Double 11.

This year's Double 11 ran on the third-generation Dragon architecture. Workloads with heavy storage and network load, such as search, advertising, and the core transaction database, completed their move to the cloud. Dragon supplied tens of millions of CPU cores of computing power, and 100% of the Alibaba economy's business workloads were deployed on the Dragon public cloud.

Two years ago, Alibaba Cloud's Dragon architecture supported the Double 11 promotion for the first time. Last year, the core Double 11 systems moved fully onto Dragon. This year, all Double 11 services were deployed on the Dragon public cloud. Dragon has now supported Double 11 for three consecutive years. What stays the same each year is steady performance and a smooth shopping experience for users; what changes is the continuous upgrading and iteration of the Dragon architecture.

This article first explains how the most challenging Double 11 workloads, search advertising and the financial-grade core transaction database, migrated to the third-generation Dragon architecture. It then describes how the Dragon architecture supported Alibaba's largest cloud-native deployment, and closes with the story of how the architecture passed the outage drills held in preparation for Double 11.

The most challenging workload, bar none: search advertising moves to the third-generation Dragon architecture

For an e-commerce platform, search is a core function. A delay of even 100 microseconds in displaying results directly affects users' final purchase conversion, so the user experience is critical. Search advertising is therefore extremely demanding on compute and network performance, and it is the most challenging workload for the Dragon architecture, bar none.

During this year's Double 11, the search advertising business supported thousands of venue scenarios, with an average of 100 billion product impressions per day. Tens of thousands of models were released daily; a single model exceeded 1 TB, model parameters reached the 100-billion scale, and 100 million model parameters were updated in real time every minute. Sample data processing averaged 100 PB per day, and a single request involved more than 20 billion floating-point operations. Behind these numbers, the search team posed two major challenges to the underlying infrastructure.

1. Extreme performance: bidirectional 100G network traffic processed at full line rate

Forecasts based on historical data showed that in the early hours of Double 11 the network bandwidth of the online search advertising business would hit the bidirectional 100G line-rate limit. The infrastructure therefore had to handle bidirectional 100G traffic at full line rate in order to carry the midnight traffic smoothly. In the event, at midnight on Double 11 most online traffic came from the elastic bare metal instances of the search advertising business, and network bandwidth reached the line-rate limit as predicted.

The third-generation Dragon architecture processes network traffic at full line rate through network hardware acceleration. It provides 100 Gbps of network bandwidth, 24 million PPS of packet forwarding, and 1 million cloud disk IOPS, which comfortably met the search advertising business's need for bidirectional 100G line-rate bandwidth. It helped the search advertising business carry the Double 11 midnight traffic peak and also improved resource utilization.
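As a rough sanity check on how the bandwidth and packet-rate figures relate, the sketch below computes the average packet size at which 24 million PPS would fill a 100 Gbps link. It is a back-of-the-envelope calculation only: it ignores Ethernet framing overhead, and the function name is made up for this illustration.

```python
def bytes_per_packet_at_line_rate(link_gbps: float, packets_per_second: float) -> float:
    """Average packet size (bytes) at which the given packet rate fills the link."""
    bits_per_second = link_gbps * 1e9
    bytes_per_second = bits_per_second / 8
    return bytes_per_second / packets_per_second


# Figures quoted in the article: 100 Gbps bandwidth, 24 million PPS forwarding.
avg_packet_bytes = bytes_per_packet_at_line_rate(100, 24_000_000)
print(f"{avg_packet_bytes:.1f} bytes per packet")
```

Packets smaller than this size would hit the packet-rate limit before the bandwidth limit, and larger packets would hit the bandwidth limit first.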

2. Further improving service quality for both offline and online search

Search advertising is divided into online search and offline search, and the resource requirements of the two systems naturally conflict. Offline search needs high throughput: hundreds of millions of records must be processed within 15 minutes. Online search is latency-sensitive: it must process 10 million records in under a second, in real time and with high availability.

The third-generation Dragon architecture introduces advanced QoS features. Multi-level scheduling of network and storage QoS enables precise scheduling across multiple dimensions, supporting both offline and online search advertising. This helped the business achieve its co-location goal: low-latency online serving alongside high-throughput offline processing.
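The article does not describe how Dragon's QoS scheduling works internally. Purely as an illustration of the general idea of keeping throughput-heavy work from crowding out latency-sensitive work, the sketch below rate-limits an "offline" traffic class with a token bucket while always admitting "online" traffic. The class names, rates, and the two-class split are assumptions for this example, not Alibaba Cloud's implementation.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float       # bytes of budget added per second
    capacity: float   # maximum burst size in bytes
    tokens: float     # currently available budget in bytes
    last: float = 0.0 # timestamp of the last refill, in seconds

    def allow(self, size: float, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


class TwoClassLimiter:
    """Hypothetical policy: online traffic always passes, offline traffic is capped."""

    def __init__(self, offline_rate: float, offline_burst: float) -> None:
        self.offline = TokenBucket(offline_rate, offline_burst, tokens=offline_burst)

    def admit(self, kind: str, size: float, now: float) -> bool:
        if kind == "online":
            return True
        return self.offline.allow(size, now)


limiter = TwoClassLimiter(offline_rate=1000, offline_burst=1000)
first = limiter.admit("offline", 800, now=0.0)    # fits within the initial burst
second = limiter.admit("offline", 800, now=0.0)   # only 200 bytes of budget left
online = limiter.admit("online", 800, now=0.0)    # never throttled by this policy
later = limiter.admit("offline", 800, now=1.0)    # budget refilled after one second
```

Timestamps are passed in explicitly so the behaviour is deterministic; a real scheduler would read a monotonic clock and would typically cover several resource dimensions at once.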

In fact, in real Alibaba Group workloads with identical resource allocations, Dragon bare metal delivered 30% higher QPS and 96.3% lower latency than an ordinary physical machine, along with a significant increase in resource utilization.

Carrying the new peak of 583,000 orders per second: the core transaction database moves onto Dragon

Just 26 seconds after midnight on November 11, Tmall Double 11 orders peaked at 583,000 per second, 1,457 times the peak of the first Double 11 in 2009. Every transaction passes through a series of core transaction database operations, so keeping massive order volumes ordered, accurate, and smooth during the world's largest trading peak is the challenge the core transaction database faces.
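The two figures in this paragraph imply the size of the 2009 peak, which the article does not state directly. A minimal calculation, assuming the 1,457 multiple is taken against the 2009 peak order rate:

```python
peak_2020_orders_per_second = 583_000
growth_multiple_since_2009 = 1_457

# Implied 2009 peak, derived only from the two numbers quoted above.
implied_2009_peak = peak_2020_orders_per_second / growth_multiple_since_2009
print(f"about {implied_2009_peak:.0f} orders per second")
```

That works out to roughly 400 orders per second in 2009.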

As is well known, a database is a storage-heavy workload, and the core transaction database is especially sensitive to resource metrics such as IOPS and latency. The Double 11 core transaction database chose the Dragon architecture because it satisfies three requirements: “high concurrency, low latency, and high stability”.


High concurrency: In a super-large-scale concurrency scenario as rare as Double 11, computing power is a key factor. In the upgraded third-generation Dragon architecture, storage and network performance improved by up to 500%, the VPC cloud network forwards at full line rate, storage IOPS reaches up to 1 million, and storage throughput reaches 5 GB per second. This fully meets the order-processing needs of the core trading system at peak.
Low latency: Thanks to the acceleration provided by the Dragon chip, sixth-generation enhanced instances based on the Dragon architecture have read latency as low as 200 μs, write latency as low as 100 μs, and per-packet latency as low as 20 μs. In production, this met the latency requirements of the core transaction database very well.
High stability: Unlike stateless workloads, the core transaction database requires financial-grade stability and disaster recovery. Stability is exactly what Dragon emphasizes: the architecture uses a very lightweight, in-house Dragonfly Hypervisor that keeps compute jitter at the one-in-a-million level. Thanks to this, the Dragon architecture helped the core transaction database smoothly support the Double 11 shopping season.
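The IOPS and latency figures above can be related through Little's law (average requests in flight = throughput × latency). The sketch below applies it to the quoted 1 million IOPS and 200 μs read latency. It assumes, for illustration only, that both figures hold at the same time, which the article does not claim.

```python
def in_flight_requests(throughput_per_second: int, latency_microseconds: int) -> float:
    """Little's law: average number of requests outstanding at any moment."""
    return throughput_per_second * latency_microseconds / 1_000_000


# Quoted figures: up to 1 million storage IOPS, read latency as low as 200 microseconds.
outstanding_reads = in_flight_requests(1_000_000, 200)
print(f"{outstanding_reads:.0f} reads in flight on average")
```

Under that assumption, about 200 reads would need to be outstanding at once to sustain the peak rate, which is why high IOPS figures depend on deep I/O queues as well as low per-request latency.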

The Dragon architecture: supporting the world's largest cloud-native deployment

The highlight of Double 11 in 2020 was completing the world's largest cloud-native deployment, which set a number of “cloud-native firsts”: 80% of core business was deployed on Alibaba Cloud's container service ACK, which can scale out one million containers within an hour; Serverless was applied at large scale for the first time, improving elasticity by more than 10 times; and the peak call volume of cloud-native middleware exceeded 10 billion QPS.

At the same time, records kept being broken: the real-time computing engine Flink hit a processing peak of 4 billion records per second, equivalent to reading all the information in 5 million Xinhua dictionaries in one second; MaxCompute processed up to 1.7 EB of data in a single day, equivalent to processing 230 HD photos for each of the world's more than 7 billion people.

The Dragon architecture is a computing platform built for cloud-native scenarios, and it provided a solid foundation for this deployment. With chip-accelerated I/O offload, it is well suited to containers and similar products: it enables elastic, scalable container products with efficient scheduling and automation, and it can start 500,000 vCPUs within 3 minutes.
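To put the elasticity figures in per-second terms, the sketch below converts the two quoted targets (500,000 vCPUs in 3 minutes, and one million containers in an hour) into average provisioning rates. It assumes a uniform rate over the whole window, which is a simplification; the helper name is made up for this example.

```python
def units_per_second(total_units: int, window_seconds: int) -> float:
    """Average provisioning rate if the work is spread evenly over the window."""
    return total_units / window_seconds


vcpu_rate = units_per_second(500_000, 3 * 60)          # 500,000 vCPUs in 3 minutes
container_rate = units_per_second(1_000_000, 60 * 60)  # 1 million containers in 1 hour
print(f"{vcpu_rate:.0f} vCPUs/s, {container_rate:.0f} containers/s")
```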

In fact, from design to implementation the Dragon architecture was “born for the cloud”. It makes Alibaba Cloud servers more powerful than traditional physical servers and also helps customers cut computing costs substantially. In the end, the Dragon architecture brought both power and efficiency to this cloud-native push: the IT cost per 10,000 peak transactions fell by 80% compared with four years ago, and large-scale application delivery became twice as efficient.

Backed by 99.975% single-instance availability, passing the “outage” test smoothly

Full-link stress-test drills are an essential part of preparing for Double 11. We built an app for surprise drills that reduces the process to a single “button” connecting the technical architectures and business mechanisms across the Alibaba economy. This year's drills included several unannounced live-fire raids, including network disconnection, cluster outage, and data center power failure attacks. The raids came so fiercely that the engineers had no chance to prepare.

At 2 a.m. one day in October, the “button” was pressed. Fault code was injected into Dragon cloud servers, and a cluster of nearly 1,000 servers went down instantly.

In less than 2 minutes, the operations monitoring dashboard showed network metrics dropping sharply. The technical support team quickly located the source of the failure, launched the emergency plan, deployed a fix, and then confirmed the primary/standby switchover.

Within 10 minutes, the primary and standby ECS instances completed the switchover, and everything was back to normal.

It may seem crazy, but it lets the company prepare for all kinds of failures, including outages, and minimize their impact. It also forces Alibaba's technology, including the Dragon architecture, to keep evolving.

The Dragon architecture performed well in this outage raid, and its robustness withstood the test. This is thanks to the host migration capability that ECS provides, which relies on key technologies such as migratable configuration, migratable resources, migratable networking, and migratable storage to minimize disruption to customer workloads.

In addition, the Dragon architecture draws on the historical failure data from millions of servers that Alibaba Cloud has accumulated over the past decade, together with anomaly prediction algorithms, combined hardware and software fault isolation, and hardware-accelerated live migration, so that more than 70% of common hardware and software failures can be eliminated before they occur. These capabilities are what give Alibaba Cloud the confidence to raise its single-instance availability target to 99.975% and its multi-zone, multi-instance availability target to 99.995%. They are also one of the reasons all Double 11 businesses dared to move to the cloud.
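An availability target translates directly into a downtime budget. The sketch below converts the two targets into allowed minutes of downtime, assuming a 30-day measurement period; the article does not say over what period the targets are measured, so the period is an assumption of this example.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Downtime budget implied by an availability target over the given period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_percent / 100)


single_instance_budget = allowed_downtime_minutes(99.975)
multi_zone_budget = allowed_downtime_minutes(99.995)
print(f"single instance: {single_instance_budget:.2f} min per 30 days")
print(f"multi-zone, multi-instance: {multi_zone_budget:.2f} min per 30 days")
```

Under the 30-day assumption, the single-instance budget is about 10.8 minutes and the multi-zone budget about 2.2 minutes, which puts the 10-minute switchover in the drill into perspective.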

As the largest cross-department project in Alibaba Cloud's basic products division, the third-generation Dragon upgrade involved many teams, including Dragon computing, ECS, VPC, storage, AIS servers, and AIS physical networking. After two years of preliminary research and evaluation, product approval, technology development, and gray-scale testing, it finally deployed 100% of the Alibaba economy's business workloads on the Dragon public cloud. Double 11 is the biggest “proving ground” for Alibaba Cloud's products, technologies, and services, and carrying all Double 11 businesses smoothly at full load is the best proof of the Dragon architecture's capability.

Today, the Dragon cloud servers developed by Alibaba Cloud support a variety of traffic peaks, such as 12306 ticket rushes during the Spring Festival travel season, traffic surges from Weibo trending topics, and DingTalk scaling out 100,000 cloud servers in 2 hours. Going forward, the Dragon architecture, tested by years of Double 11, will continue to help customers achieve rapid business innovation and growth.


This article is original content from Alibaba Cloud and may not be reproduced without permission.

Copyright notice
This article was written by [Legend of Hulan River]. Please include the original link when reposting. Thank you.
https://chowdera.com/2020/12/202012061225193098.html