brief introduction ： This article focuses on Hologres How to land Alibaba flying pig real-time warehouse scene , And help the flying pig pair 11 Real time data screen 3 Take off in seconds , The entire 0 fault .
Abstract ： Just finished 2020 Tmall double 11 in ,MaxCompute Interactive analysis （ Hereinafter referred to as Hologres）+ Real time computing Flink For the first time, the cloud native real-time data warehouse was established in the core data scene , Set a new record for big data platform . On this occasion , We will launch cloud native real-time data warehouse in succession 11 Actual combat series content , This article focuses on Hologres How to land Alibaba flying pig real-time warehouse scene , And help the flying pig pair 11 Real time data screen 3 Take off in seconds , The entire 0 fault .
The biggest change of this year's double 11 is that the overall rhythm of the activities has changed from the original “ Single section ” Adjust to this year's “ Bijie ”, Two peaks of flow are naturally formed , The statistical cycle of big screen and marketing data becomes longer , The index dimension becomes more , At the same time, the group GMV For the first time, the large media screen directly reuses the link data of Feizhu large screen , So how to protect the group GMV Big media screen 、 Big screen of flying pig data and real-time data of double eleven 、 accuracy 、 Stability is a big challenge .
This double 11 fly pig real-time screen zero 3 Take off in seconds , The entire 0 fault , Successfully escort Alibaba group media screen , The indicators are accurate 、 Stable service 、 Feedback in real time .
All of these are inseparable from the technology upgrade and guarantee of real-time data full link behind the large screen . The overall architecture of flying pig real-time data is shown in the figure below ：
Next, I will introduce , In order to achieve fast 、 accurate 、 Steady double 11 Real time data screen , What upgrading, improvement and optimization have been made for the whole link of real-time data .
One 、 Public floor reinforcement , Against the peak flow
Resist the flood peak of the double 11 flood , First of all, the real-time data common layer . After nearly two years of iterative improvement , Multi terminal 、 Multi source real-time data common layer has realized logging 、 transaction 、 Marketing interaction 、 Global coverage of service domain , Job throughput and resource efficiency are also constantly improving , This double 11 in order to smooth through the flow of double peaks , It has carried on the multi round full link pressure test and further tamping reinforcement ：
1） Dimension table migration , Decentralized hotspot dependence
Dimension table is the core logical and physical dependency of real-time common layer , The hot spot dimension table may be the risk point and bottleneck of the service when it is promoted greatly . Flying pig commodity list is the most dependent of all kinds of real-time operations Hbase Dimension table , Among them, there is a big boost when the flow of flying pig Taobao end traffic public layer operations . Last year through the Taoxi Department PV Deep logic optimization of traffic extraction , The dimension table of daily QPS By dozens of w Down to a few w, However, with the subsequent click on the common layer and other business operations, the new dependency , daily QPS It's going up to 5w+, During the pressure measurement, it soared to more than ten w, And the dimension table is located in Hbase The cluster is old and public , The greater the risk of stability . therefore Transfer the "flying pig commodity table" and other large traffic common layer dependency dimension tables to better performance lindorm colony , Keep other non core application layer dimension tables as they are habse colony , Disperse the pressure on the surface of the flood peak .
2） Work isolation , Prevent resource squeeze
The resource consumption of real-time jobs also conforms to the 28 principle , A small number of jobs consume most of the computing resources . The exposure operation of feizhutao series needs at least 1000+CU Security resources ,PV Common level tasks require 600CU, The whole traffic common layer 9 Each job needs at least half of the resources of the cluster to guarantee . In order to prevent the flood peak from happening, multiple large operation resources in a single queue will be over used （ When the traffic is large, it will consume more resources ） When it's squeezed , Impact throughput , Distribute large jobs into different resource queues . Similarly, for transactions of various categories, the common layer tasks will also be distributed in various resource queues , Prevent a single cluster from sudden extreme anomalies , Leading to the index data falling 0.
3） Double eleven performance performance
Double 11 period , The real-time public layer successfully resists the source of Taoxi system compared with the daily transaction flow 250 Times and log traffic 3 times , The total flow of the public layer is about tens of millions / Second of flood peak impact , There is no delay in the task of the public layer of Taoxi transaction , The time delay of the traffic common layer is minute level and dissipates quickly .
Two 、 Architecture upgrade , Improve efficiency development
The core of the promotion of double 11 is pre-sale in three stages 、 preheating 、 formal , The most important thing at the formal stage is to pay the balance . The big change of the double 11 business side is to pay the balance from one day to three days , As a result, the marketing data of last year's balance payment cannot be reused . Except to keep the previous market 、 category 、 God 、 Hour and other multidimensional balance payment indicators , We also need to add new products 、 The final payment of business granularity , At the same time, because the period of balance payment becomes longer , In order to make the business more efficient , More dimension data may need to be added temporarily （ On the last day of the final payment, we received the demand for the details of the outstanding balance order ）. Therefore, in order to deal with the long period of data indicators of the balance of the double 11 、 multidimensional 、 Changing challenges , Upgrade the data structure of large screen and marketing data from pre calculation mode to pre calculation mode + On the spot batch query mode , Overall development efficiency at least improved 1 More than times , And it can easily adapt to the change of requirements .
1） New marketing data architecture ：
- Ad hoc inquiry section ：Hologres+Flink Streaming and batch data architecture , Used Hologres Partition table and instant query capability . Write the real-time detailed data of the common layer into the current partition , The detailed data of the public floor on the offline side is determined by MaxCompute Direct import overlay Hologres The next day coverage partition （ For scenarios where accuracy and stability are not critical , You can choose to remove both offline merge Steps for ）, At the same time, pay attention to configuring the primary key override when writing , Prevent real-time tasks from being abnormal , You can brush back . Index calculation of each dimension , Directly in Hologres Pass through sql polymerization , Instant Return of query results , It is very convenient to adapt to the change of statistical indicators .
- The precomputing part ： It keeps the mature ones before Flink+Hbase+Onservice The calculation of 、 Storage and service architecture . Mainly through Flink Real time aggregation metrics , And write Hbase,onservice Query service and link switching . High availability and stability scenarios , Building primary and secondary links , It may also cooperate with offline index data backflow , Repair possible anomalies and errors in real-time links .
2） Simple and efficient index calculation
from Hologress Built ad hoc query service , In addition to the simple and efficient architecture , The calculation of indicators is even more simple and outrageous , It greatly liberates the development efficiency of real-time index data .
For the balance payment part , There's a very routine , But if it passes Flink SQL To achieve the target that will compare chicken ribs or cost more , Is from zero to each hour accumulated balance payment amount or payment rate .flink Of group by The essence of grouping is grouping , It's very convenient to group and aggregate the hourly data , But it's hard to deal with from 0-2 spot ,0-3 spot ,0-4 Point to this type of cumulative data to build groups to calculate , Only build 0 To the current hour max(hour) Data grouping for aggregation . The problem is , Once the data is wrong, it needs to be flashback , Only the last hour's data will be updated , You can't update the accumulated hours in the middle .
And for passing through Hologres Real time query engine , You just need to aggregate the hours and then have a window function , A statistic sql Get it done , Greatly improved development efficiency . Examples are as follows ：
select stat_date,stat_hour,cnt,gmv _-- Hour data _ ,sum(cnt) over(partition by stat_date order by stat_hour asc) as acc_cnt _-- Hourly cumulative data _ ,sum(gmv) over(partition by stat_date order by stat_hour asc) as acc_gmv from( select stat_date,stat_hour,count(*) cnt,sum(gmv) as gmv from dwd_trip_xxxx_pay_ri where stat_date in('20201101','20201102') group by stat_date,stat_hour ) a ;
3、 ... and 、 Link and task optimization , Guarantee stability
1） Main and standby double chain 3 Backup
Alibaba Group GMV The media screen has always been owned by the group DT The team controls , This year's double 11 group big screen , In order to ensure the consistency and integrity of the caliber , It is the first time to directly reuse the real-time link data of flying pigs , Therefore, higher requirements are put forward for large screen index calculation and link stability and timeliness .
In order to ensure high availability of the system , Transactions of various categories are from the source database DRC Synchronization to the transaction details public layer, respectively build Zhangbei 、 Nantong cluster main standby dual link , For the application layer GMV Statistical tasks and Hbase The results are stored on the basis of double chain and the backup of Shanghai cluster is added . The overall link architecture is as follows ：
meanwhile , Cooperate with the real-time task abnormal monitoring and alarm of the whole link , When an exception occurs, the link can be switched in seconds , System SLA achieve 99.99% above .
2） zero 3s Take off optimization
To ensure zero 3s Take off , Detailed optimization of the task's full link data processing .
- The source part is optimized DRC After synchronization binlog Of TT write in , Will source TT More queue Cut it down to single queue, Reduce data interval delay . Early development did not correctly assess the data flow of various categories of transactions , And will be TT Of queue The number is set too large , Lead to single queue The internal flow is very small ,TT When collecting, the default is cache size And frequency , As a result, the data interval delay is very large , Thus, the delay of the whole link is enlarged .TT many queue After shrinkage , The data interval delay is reduced to less than seconds .
- The middle part optimizes the processing logic of the public layer of all kinds of purpose transactions , Reduce logic processing delay . First edition TTP transaction ( International air tickets 、 Train tickets, etc ) Public level , In order to reuse more dimensions, it completely imitates the processing of offline common layer , Link the complicated and long delay segment information together , As a result, the processing time of the whole task is delayed to more than ten seconds . In order to accurately balance the delay and reusability , Will the old multi flow join Post unified output , Change to multi-level join Output , take gmv The processing delay is reduced to 3s within . The overall process is as follows ：
- Task node part , Adjust the parameter configuration , Lower the buffer and IO Processing delay . Public floor and GMV Statistics , adjustment miniBatch Of allowLatency、cache size,TT Output flush interval,Hbase Output flushsize wait
TopN Has been real-time marketing analysis of common statistical scenarios , Because it's hot statistics in itself , So it's easier to skew the data 、 Performance and stability issues . After the double 11 pre-sale , conference hall 、 merchants 、 The exposure flow of goods TopN Homework began to appear back pressure in succession , Homework checkpoint Timeout failed , The delay is huge and easy failover, Statistics are basically unavailable . The initial judgment is that the flow is rising , Not enough job throughput , Scaling up resources and concurrency has little effect , Back pressure is still concentrated in rank And the resources are abundant . Careful investigation revealed that rank The node execution algorithm degenerates to the poor performance RetractRank Algorithm , Before group by Later row_number() Take out after reversing topN The logic of , Can't automatically optimize to UnaryUpdateRank Algorithm , The performance has dropped sharply （ The reason is UnaryUpdateRank Operators have accuracy risks inside Flink3.7.3 The version is offline ）. After many adjustments and tests , Determine that the problem cannot be optimized by configuration , Finally, it is resolved through multiple logic optimization .
- Classified exposure of the venue 、 The task of business goods is logically divided into two tasks , Prevent product or business logic rank Node data skew , So the overall data can't come out .
- First, we did the first level aggregation calculation UV, Reduce TOP Sort the amount of data , Then do the secondary polymerization optimization to UpdateFastRank Algorithm , Final checkpoint Second level , It only needs to be 10 minute .
- Of course, there is another strategy to do secondary TopN, First
- Link to the original text
This article is the original content of Alibaba cloud , No reprint without permission .
The big data screen has always been the representative of high requirements for real-time scenes , Every double 11 business brings challenges and challenges , It will bring new breakthroughs to the whole real-time data system and links . meanwhile , Flying pig's real-time data is more than just lighting up the big media screen , Improve marketing analysis and venue operation , It consists of real-time common layer and feature layer 、 Real time marketing analysis 、 Real time tags and RTUS Real time data system composed of services , It's all about 、 Multidimensional search with the ability to search 、 recommend 、 marketing 、 Touch touch and user operation and other core business .
Author's brief introduction ： Wang Wei （ The name of the flower is Yanchen ）, Alibaba flying pig technology department Senior Data Engineer .