Introduction: This article describes Hologres best practices in Alibaba's Taobao marketing campaign analysis scenario, and explains the technical challenges behind the first deployment of Flink + Hologres stream-batch unification on the Alibaba Double 11 marketing analysis dashboard.
Summary: In the just-concluded 2020 Tmall Double 11, MaxCompute Interactive Analysis (hereafter Hologres) and real-time computing Flink together put a cloud-native real-time data warehouse into production in core data scenarios for the first time, setting a new record for the big data platform. To mark the occasion, we are publishing a series on the cloud-native real-time data warehouse in Double 11 practice. This installment focuses on the Taobao marketing campaign analysis scenario.
I. Background
In Taobao-Tmall (Taoxi) business operations, major promotions are a key scenario for both business operations and user growth. The marketing campaign analysis product is the core data product that supports decision-making and guides operations during a promotion. It covers the full analysis chain before, during, and after a campaign, and it has to serve different roles at different stages, each with different requirements for data timeliness and flexibility. The overall product picture is as follows:
The old marketing campaign analysis product was built on a conventional architecture of separate real-time and offline data systems plus FW. Previous campaigns exposed many problems, which fall into three categories:
- Inconsistent real-time and offline data: data with the same metric definition differs between real-time and offline, because neither the data logic nor the data interfaces are unified. Real-time and offline development are split (different developers and different interfaces), which raises the operations cost of the data as a whole and greatly increases the burden of building products.
- High maintenance cost: as business volume grew, the existing databases could no longer support application scenarios quickly and flexibly. Conventional HBase, MySQL, and ADB databases each cover only one need: massive data storage, high-concurrency point queries, or OLAP queries. Extremely complex business therefore depends on several databases at once, and the overall maintenance and dependency costs are very high.
- Poor extensibility: building products under the FW framework involves complex logic and extends poorly, so maintenance during a campaign is very costly.
How to respond quickly to frequently changing business demands and handle data problems during campaigns more efficiently has therefore become increasingly important. The new generation of the marketing campaign analysis architecture needs to provide the following:
1. Unified models across the real-time and offline data warehouses (unified real-time and offline logic) and unified interfaces (unified data storage and data retrieval), so that stream and batch are truly integrated.
2. A more powerful data warehouse that supports highly concurrent writes and queries over massive data while still answering business queries promptly.
3. Simpler product-building logic and lower implementation complexity.
Given this background, we needed to rebuild the architecture and find replacement products that address these business pain points. After a long period of exploration and trials, we chose real-time computing Flink + Hologres + FBI (Alibaba's internal visual analysis tool) to rebuild the Tmall marketing campaign analysis architecture.
II. Stream-batch unified technical solution
After analyzing the business data requirements in depth, exploring multi-dimensional data models, and surveying data warehouse options, we settled on the overall technical framework for the rebuilt marketing campaign analysis product, shown in the figure below. The key points are:
- The stream-batch unified architecture unifies stream and batch at both the SQL logic level and the compute engine level.
- Hologres unifies data storage and query.
- FBI's product capabilities keep the business highly flexible while lowering build cost, and serve the needs of different roles.
The following sections describe the three key parts of the solution: stream-batch unification, Hologres, and FBI.
1. Stream-batch unified technical framework
The traditional data warehouse architecture is shown in the figure below. Its core problems are:
- The storage layer is split between stream and batch: clusters, tables, and fields are all separate, so the application layer has to implement different access logic for each.
- Processing logic cannot be reused between stream and batch. The SQL standards differ and the compute engines differ, so real-time and offline jobs are developed separately. In many cases the logic is nearly identical, but the old system could not switch between the two flexibly, which led to duplicated work.
- Compute clusters are separate. Real-time and offline workloads peak at different times, so resource utilization is low, with very pronounced peaks and troughs.
The stream-batch unified data warehouse architecture is shown in the figure below. The upgraded architecture has three core points:
- First, although the DWD layer of the data warehouse sits on different storage media, the data models must be equivalent. A logical table then wraps them (one logical table maps to two physical tables, the real-time DWD and the offline DWD), and data computation code is developed against the logical table.
- Second, developing against logical tables, configuring stream and batch computation modes individually, and applying different scheduling strategies all require a development platform (Dataphin, the unified stream-batch development platform) to make development and operations convenient and integrated.
- Finally, the storage layer is unified on OneData standards. This means not only consistent modeling standards but also a unified storage medium, so that access is seamless.
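The logical-table idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `LogicalTable`, the function `render_query`, and the table names `dwd_trade_order_rt` and `dwd_trade_order_df` are hypothetical and do not come from Dataphin or from the system described here. The sketch shows one SQL template, written once against a logical table, being resolved to the real-time or the offline physical DWD table depending on the computation mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalTable:
    """One logical DWD table backed by two physical tables (hypothetical names)."""

    name: str
    realtime_table: str  # physical table fed by the streaming pipeline
    offline_table: str   # physical table produced by the batch pipeline

    def physical_table(self, mode: str) -> str:
        """Resolve the physical table for a computation mode: 'stream' or 'batch'."""
        if mode == "stream":
            return self.realtime_table
        if mode == "batch":
            return self.offline_table
        raise ValueError(f"unknown computation mode: {mode!r}")


def render_query(sql_template: str, table: LogicalTable, mode: str) -> str:
    """Fill the {table} placeholder so the same logic runs in either mode."""
    return sql_template.format(table=table.physical_table(mode))


# Hypothetical example: the business logic is written only once.
ORDERS = LogicalTable(
    name="dwd_trade_order",
    realtime_table="dwd_trade_order_rt",
    offline_table="dwd_trade_order_df",
)
GMV_BY_SELLER = (
    "SELECT seller_id, SUM(pay_amount) AS gmv "
    "FROM {table} GROUP BY seller_id"
)

if __name__ == "__main__":
    print(render_query(GMV_BY_SELLER, ORDERS, "stream"))
    print(render_query(GMV_BY_SELLER, ORDERS, "batch"))
```

The point of the design is that only the table resolution differs between modes. The aggregation logic itself is shared, which is what removes the duplicated real-time and offline development described earlier.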
During this year's Double 11, real-time computing Flink handled a record traffic peak of 4 billion records per second, with data volume reaching 7 TB per second. Flink-based stream-batch unified data made its debut in the marketing campaign analysis scenario and withstood rigorous production tests of stability, performance, and efficiency.
All Flink stream and Flink batch jobs were highly stable throughout the campaign, with zero incidents related to pipeline capacity, single-machine failures, or network bandwidth.
2. Hologres stream-batch unified deployment
The stream-batch unified data architecture unifies the data layer, but a product is still needed to unify storage. That product has to support highly concurrent writes, answer queries promptly, and support OLAP analysis.
In the old architecture, each page module queried one or more databases, such as MySQL, HBase, and ADB 3.0 (formerly HybridDB). Because HBase offers highly concurrent writes and high-performance point queries, most real-time data went into HBase. Because MySQL tables are easy to manage and simple to query, dimension tables and offline data were usually stored there. For some modules whose data is small in volume but has many dimensions (marketing mechanics data, for example), ADB served as the OLAP database for multi-dimensional analysis. This left two pain points: real-time and offline data were split, and the many databases and instances were hard to manage.
The new marketing campaign analysis product has two goals. The first is unified storage, which lowers operations cost and improves development efficiency. The second is high performance, high stability, and low cost.
After benchmarking several products, we chose Hologres as the unified storage product for the entire marketing campaign analysis. Hologres is a one-stop real-time data warehouse compatible with the PostgreSQL 11 protocol. It connects seamlessly with the big data ecosystem, supports highly concurrent, low-latency analysis over PB-scale data, and lets existing BI tools perform multi-dimensional analysis and business exploration easily and economically. Its performance advantage is especially clear in a business scenario this complex.
After analyzing each module of the marketing campaign analysis product in depth and taking the business side's timeliness requirements into account, we designed a specific real-time pipeline for the major modules:
- For core modules such as the live campaign broadcast, presale, add-to-cart, and traffic monitoring, we use Hologres's real-time point-query capability.
- For complex and fast-changing marketing mechanics scenarios, we use Hologres's ad hoc OLAP query capability.
To provide both point-query and OLAP analysis capabilities, Tmall marketing campaign analysis set up two databases, dt-camp and dt-camp-olap. The dt-camp database stores some historical campaign data over the long term for campaign-to-campaign comparison, and holds nearly 40 TB. The OLAP database for marketing mechanics contains detail-level data on those mechanics and holds nearly 100 TB. Marketing mechanics demand very high data accuracy, so no precision-losing (approximate) query methods are used, which places even higher demands on the query performance of the whole data warehouse.
To improve overall Hologres performance, the marketing campaign analysis data warehouse applied the following optimization strategies:
- Set the distribution key: for count(distinct user_id) queries, set user_id as the distribution key so that each Hologres shard performs the count distinct locally. This avoids large data shuffles and greatly improves query performance.
- Minimize count distinct operations: rewrite the SQL with multiple layers of group by to reduce the cost of count distinct.
- Shard pruning: in some scenarios a query specifies certain keys from a table's primary key (pk). If that key combination is set as the distribution key, Hologres can determine at query time which shard the query will hit, which reduces the number of RPC requests. This matters in high-QPS scenarios.
- Generate the optimal plan: marketing campaign analysis includes point queries and range queries on aggregated data, OLAP queries on raw detail data, and top-N queries after single-table aggregation. For each query type, Hologres generates the optimal execution plan from the statistics it collects, which safeguards query QPS and latency.
- Write optimization: writes in marketing campaign analysis are UPDATE operations on column-store tables. In Hologres, an update first uses the given pk to find the corresponding uniqueid, then uses the uniqueid to locate the existing record and mark it deleted, and then writes the new record. If an increasing segment key can be set, the lookup can use the segment key to locate the file quickly, which speeds up finding a record by pk and improves write performance. At peak, the marketing campaign analysis system reached 8 million updates per second.
- Small-file compaction: for some tables that are written infrequently, the keys updated within a given period are fairly fixed, so each memory-table flush produces a relatively small file. Hologres's default compaction strategy does not compact these files, so small files accumulate. Tuning the compaction parameters to compact more often reduced the number of small files and noticeably improved query performance.
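The first three strategies above can be illustrated with a small in-memory model in Python. This is a sketch of the general techniques, not Hologres code: the hash function, the function names, and the `category` column are assumptions made for the example, and Hologres's actual sharding and planning are internal to the engine. The sketch shows why distributing by user_id lets per-shard distinct counts be summed without a shuffle, how a known key narrows the set of shards to contact, and how a two-layer group by replaces a per-group count distinct.

```python
import zlib
from collections import Counter, defaultdict


def shard_of(key: str, shard_count: int) -> int:
    """Map a distribution-key value to a shard with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def distribute(rows, key_field: str, shard_count: int):
    """Place each row on the shard chosen by its distribution key."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_of(row[key_field], shard_count)].append(row)
    return shards


def count_distinct_users(shards) -> int:
    """COUNT(DISTINCT user_id) when user_id is the distribution key.

    Every user_id lives on exactly one shard, so per-shard distinct counts
    can simply be summed; no rows have to move between shards.
    """
    return sum(len({row["user_id"] for row in rows}) for rows in shards.values())


def shards_to_scan(user_ids, shard_count: int):
    """Shard pruning: only shards that can hold the requested keys are contacted."""
    return {shard_of(user_id, shard_count) for user_id in user_ids}


def distinct_users_per_category(rows) -> Counter:
    """Two-layer GROUP BY instead of COUNT(DISTINCT user_id) per category."""
    # Inner layer: GROUP BY category, user_id (deduplicate the pairs).
    pairs = {(row["category"], row["user_id"]) for row in rows}
    # Outer layer: GROUP BY category with a plain COUNT(*).
    return Counter(category for category, _ in pairs)


if __name__ == "__main__":
    rows = [
        {"user_id": f"u{i % 50}", "category": f"c{i % 3}"}
        for i in range(500)
    ]
    shards = distribute(rows, "user_id", shard_count=8)
    print(count_distinct_users(shards))
    print(sorted(shards_to_scan(["u1", "u2"], shard_count=8)))
    print(distinct_users_per_category(rows))
```

If the rows were distributed by some other column, the same user could appear on several shards and the per-shard counts could no longer be summed directly, which is where the shuffle cost comes from.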
During Double 11, the Hologres peak in point-query scenarios reached several hundred thousand per second against a serving capacity of several million per second, and the OLAP write peak reached 4 million per second against a serving capacity of 5 million per second. More than 99.7% of point queries and OLAP queries returned with millisecond-level latency. Hologres ran smoothly throughout the campaign and supported both fast point queries and fast OLAP analysis.
3. FBI analysis dashboard
FBI is the preferred data visualization platform in the Alibaba ecosystem. It can quickly build all kinds of reports for data analysis, supports fast onboarding and extension of many kinds of datasets, and offers an advanced feature, product building, for assembling analytical data products.
In FBI's core product-building workflow, four core capabilities greatly reduce build cost:
1) The integrated real-time/offline "real-time hour-minute model", which automatically provides accurate trends and comparisons for real-time data
On top of the stream-batch unified base data defined for marketing campaigns, users need flexible analysis of real-time data, including real-time comparison and hourly comparison. To meet this need, FBI abstracted a standard data model that integrates real-time and offline data. Once the model is created, real-time data can be compared accurately, trend analysis is routed automatically to the minute table, and hourly trends are routed directly to the hour table.
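The routing behavior described above can be sketched in Python as follows. The sketch rests on assumptions that are not stated in the article: the class `RealtimeModel`, the table names, and in particular the reading of "accurate comparison" as comparing both days up to the same minute of the day are all illustrative guesses, not FBI's actual model definition.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RealtimeModel:
    """Hypothetical model that knows a minute-level and an hour-level table."""

    minute_table: str
    hour_table: str

    def route(self, granularity: str) -> str:
        """Pick the table that serves a trend query at the given granularity."""
        if granularity == "minute":
            return self.minute_table
        if granularity == "hour":
            return self.hour_table
        raise ValueError(f"unsupported granularity: {granularity!r}")


def cumulative_until(series: Dict[int, int], minute_of_day: int) -> int:
    """Sum a per-minute series up to and including the given minute."""
    return sum(value for minute, value in series.items() if minute <= minute_of_day)


def aligned_comparison(
    today: Dict[int, int], baseline: Dict[int, int], minute_of_day: int
) -> Tuple[int, int]:
    """Compare two days over the same elapsed window (assumed meaning of
    'accurate comparison'): both totals stop at the same minute of the day."""
    return cumulative_until(today, minute_of_day), cumulative_until(baseline, minute_of_day)


if __name__ == "__main__":
    model = RealtimeModel(minute_table="ads_camp_trade_mi", hour_table="ads_camp_trade_hh")
    print(model.route("minute"), model.route("hour"))
    today = {0: 10, 1: 20, 2: 30}
    last_year = {0: 5, 1: 5, 2: 5, 3: 100}  # minute 3 has not happened yet today
    print(aligned_comparison(today, last_year, minute_of_day=2))
```

Cutting the baseline at the current minute keeps a full-day baseline total from being compared against a partial-day real-time total.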
2) FBI's original FAX functions, which define complex metrics with minimal syntax
Complex metrics include channel share, category share, year-over-year contribution, and cumulative campaign turnover. In the previous version these were defined with SQL nested inside SQL, which made the SQL balloon in length and sharply reduced the stability and maintainability of the product. To solve this, FBI built an analysis DSL that is easy to learn and understand, called FAX functions (more than 20 analysis functions, including year-over-year difference, contribution rate, and campaign cumulative). A single-line statement can define the complex metrics used in marketing campaign analysis.
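The article does not show FAX syntax, so the following Python sketch makes no attempt to reproduce it. It only spells out the arithmetic behind three of the metric types named above (year-over-year difference, share of total, and campaign cumulative), using hypothetical function names and made-up numbers, to show what a one-line metric definition would have to compute.

```python
from itertools import accumulate
from typing import Dict, List


def year_over_year_difference(current: float, previous: float) -> float:
    """Absolute change versus the same period of the previous year."""
    return current - previous


def share_of_total(values: Dict[str, float]) -> Dict[str, float]:
    """Share of each channel or category in the overall total."""
    total = sum(values.values())
    if total == 0:
        raise ValueError("total is zero, shares are undefined")
    return {name: value / total for name, value in values.items()}


def campaign_cumulative(daily: List[float]) -> List[float]:
    """Running total of a daily metric since the start of the campaign."""
    return list(accumulate(daily))


if __name__ == "__main__":
    print(year_over_year_difference(120, 100))
    print(share_of_total({"search": 25, "feed": 75}))
    print(campaign_cumulative([10, 20, 30]))
```

Written as nested SQL, each of these needs a subquery or a window over the base aggregation, which is the length and maintainability problem that a dedicated analysis DSL avoids.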
3) Configurable analysis capabilities and plug-ins for proprietary logic, which greatly shorten page-building time
Page building is a core part of the product. FBI reduces the configuration users have to do in two ways:
a. Configurable general analysis capabilities: the most common scenarios (crosstabs, campaign comparison, and passing date variables as parameters) are abstracted into simple configuration items that deliver period-over-period comparison and year-over-year difference analysis.
b. Plug-ins for proprietary logic: customizations on a block, such as campaign parameters, show/hide, and result sorting, are handled by data plug-ins.
4) FBI's accumulated high-assurance system, with upgraded release control, monitoring and alerting, and change notifications, in support of the 1-5-10 target
III. Test-side safeguards
To further guarantee the quality of the marketing campaign analysis product, the test team performed strict data comparison and verification from detail data to summary data to the product side, and set up all-round monitoring of the core promotion data.
Proactive inspections during the campaign greatly improved data checking and the timely discovery of core issues, which in turn greatly improved the quality and stability of the whole data product during the campaign.
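A detail-to-summary comparison of the kind described above might look like the following Python sketch. The field names, the use of integer cents, and the function names are assumptions for illustration. The article does not describe the actual verification tooling.

```python
from collections import defaultdict


def aggregate_detail(detail_rows, key: str, value: str) -> dict:
    """Re-aggregate detail rows so they can be compared with the summary layer."""
    totals = defaultdict(int)
    for row in detail_rows:
        totals[row[key]] += row[value]
    return dict(totals)


def reconcile(detail_totals: dict, summary_totals: dict) -> dict:
    """Return every key whose detail total and summary total disagree.

    A key missing on one side is reported with None for that side.
    """
    mismatches = {}
    for k in sorted(set(detail_totals) | set(summary_totals)):
        detail_value = detail_totals.get(k)
        summary_value = summary_totals.get(k)
        if detail_value != summary_value:
            mismatches[k] = (detail_value, summary_value)
    return mismatches


if __name__ == "__main__":
    # Amounts are kept as integer cents so the comparison can be exact.
    detail = [
        {"seller_id": "s1", "pay_cents": 1000},
        {"seller_id": "s1", "pay_cents": 250},
        {"seller_id": "s2", "pay_cents": 400},
    ]
    summary = {"s1": 1250, "s2": 399}
    print(reconcile(aggregate_detail(detail, "seller_id", "pay_cents"), summary))
```

The same check can be repeated between the summary layer and the numbers shown on the product page, which gives the detail, summary, and product chain mentioned in the text.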
IV. Business feedback and value
Throughout Double 11, the stream-batch unified marketing campaign analysis product built on real-time computing Flink + Hologres supported high-frequency access from more than a thousand Tmall Business Group staff (xiaoer), each generating hundreds of page views, and met the target of zero P1/P2 incidents. Compared with previous years, the product showed several advantages:
- Rich: real-time data is used widely across the marketing campaign analysis product. Core dimensions can be drilled down to campaign items, merchant tag tiers, and similar levels. Real-time data was also added for merchant add-to-cart, presale, and item dimensions, giving the business side better support for merchant BD work.
- Stable: thanks to the consistently stable output of Hologres, both real-time data writes and data reads were highly stable throughout Double 11. The engineering side monitored user access and data response efficiency in real time and analyzed and resolved business problems as they arose. Product inspections covered the product's core data, further safeguarding overall stability.
- Efficient: stream-batch unification and the unified integration with Hologres greatly improved the efficiency of onboarding new requirements during the campaign (the number of requirements handled during this year's Double 11 was 3 times last year's), and also improved the timeliness of problem feedback and resolution (3 to 4 times better than in previous campaigns).
V. Future outlook
The architecture has passed the test of a major promotion, but technical exploration never ends, and continued improvement is needed to handle more complex business scenarios:
1) Dataphin: further improve the stream-batch unified product to reduce the cost of manual intervention and further guarantee data quality.
2) Hologres: resource isolation, including isolation of read and write resources, to better guarantee query SLAs; integration between Hologres and MaxCompute with metadata interoperability, to give product metadata a stronger guarantee; and dynamic scaling, to handle peak and everyday business demand flexibly.
3) FBI: improve product version management and allow several people to edit the same page, so that product building is supported more efficiently.
This article is original Alibaba Cloud content and may not be reproduced without permission.