当前位置:网站首页>Dry goods | disclosure of tdsql-a core architecture

Dry goods | disclosure of tdsql-a core architecture

2021-09-15 07:00:39 Tencent cloud database

5 month 18 Japan , Tencent cloud's first distributed analytical database TDSQL-A Officially release the public cloud version .

TDSQL-A As a leading analytical database , It is Tencent's first distributed analytical database , Adopt full parallel architecture without sharing , With self-developed storage engine , Support row column hybrid storage , Adapt to massive OLAP Association analysis query scenario . It can support 2000 Cluster size of more than physical servers , The storage capacity can reach 100% of a single database instance P level .

TDSQL-A It has powerful real-time analysis ability of massive data , And fully compatible PostgreSQL grammar 、 Highly compatible Oracle grammar , At the same time, it has high safety 、 High availability 、 Super large scale cluster support and full transaction capability , It can provide efficient service for enterprise users 、 Low cost data analysis solutions .

among ,TDSQL-A Complete distributed transaction processing capability , It can guarantee the global read consistency of multi plane data in real time 、 reliability ; It supports multi-level disaster recovery and multi-dimensional resource isolation , It also provides a powerful multi-level security system , Provide perfect enterprise management ability , Provide disaster recovery for users 、 Backup 、 recovery 、 monitor 、 Security 、 Audit and other solutions . meanwhile TDSQL-A Provide flexible expansion and contraction capacity , Apply to GB-PB It's a huge amount of OLAP scene , It is a market leading enterprise analytical database engine product .

TDSQL-A There are so many attractive features , How do these features ensure completeness 、 How to solve the above demand problems gracefully ? The following is Mr. Li yuesen, technical director of Tencent cloud database TDSQL-A Sharing of product core architecture .

One 、TDSQL-A Scene positioning

TDSQL-A Tencent is based on PostgreSQL Self developed distributed online relational data warehouse , The business scenario is for online data analysis .

TDSQL-A It is a distributed database rooted in Tencent's internal business , At first we used TDSQL-PG To do some small-scale data analysis on big data platforms . As the business grows , We find that the stand-alone database can not meet the needs of business , We came up with the idea of developing distributed databases . We first served wechat payment , And with the development of the business , We have gradually independently developed column storage , To enhance TDSQL Analytical ability .

meanwhile , With the expansion of Tencent cloud business ,TDSQL-A In the process of serving external customers , Accumulated a large number of excellent cases and practices .

Two 、TDSQL-A Core technology architecture

TDSQL-A Overall architecture and PostgreSQL Compared with the overall architecture of , It is both closely following the ecology , At the same time, there is deep customization and transformation .

TDSQL-A Database kernel , Divided into sections :

The first is the upper GTM Transaction manager , It is mainly responsible for the management of global transactions , Coordinate cluster transactions , Manage all objects of the cluster at the same time . The upper right corner is the coordination node , It is a business access portal . The coordination node module is horizontally equivalent , That is, the service is connected to any coordination node , Can get a consistent database view .

The second is the data node below . The data node is also the node that actually stores data , Each data node will only store pieces of data , The data nodes are added together to form a complete data view .

The data node and coordination node are the data interaction bus of the cluster . The cluster data interaction bus is in TDSQL-PG and TDSQL-A There are differences between —— Both are called cluster interactive bus in terms of system name , But in these two different product forms , The implementation logic is completely different —— stay TDSQL-A Inside, through the data interaction collector based on the physical network card , And complete this work through the reuse of complete network connection . The purpose of cluster interactive bus is to connect the nodes in the cluster , So as to complete the whole query interaction .

Last , The operation and maintenance control system of the database kernel is described on the left . When the database is running, an operation and maintenance control system is needed to support the operation , Including operation and maintenance management 、 Real-time monitoring 、 Real time alarm 、 Security audit and data governance, etc .

2.1 TDSQL-A Self developed row column hybrid storage capacity

Let's introduce TDSQL-A All self-developed row and column hybrid storage capacity .

There are two ways to store a database , One is storing by line 、 One is column storage :

Store table by row : Every row of data stores all the columns 、 One disk IO You can access all the columns in a row 、 fit OLTP scene .

Store tables by column : Each column is stored separately , Multiple columns logically form a row ; One disk IO Contains only one column of data ; Convenient for data compression ; fit OLAP scene .

TDSQL-A It supports two ways to create tables: storage by column and storage by row , Between the list and the row table at the same time , Users don't have to perceive whether the lower level table is built by row table or list , Seamless interoperability between row tables and lists —— Including the interrelationship 、 Exchange data with each other , There's no need to sense the underlying storage logic at all .

In addition to the convenience of operation , Mixed queries between row tables and lists can also maintain complete transaction consistency , That is to say, while the query is running , The whole business (ACID) The ability of the company is also fully guaranteed .

2.2 TDSQL-A Column storage compression capacity

Column storage module , Let's talk about column storage compression capabilities .

TDSQL-A There are two types of compression for column storage :

The first is lightweight compression . Lightweight compression first perceives the specific content of the data , So according to the characteristics of the data to choose different compression methods to improve the compression ratio , Lower the cost of the business , At present, we support RLE Compression mode of .

The second is transparent compression . This compression method is used directly, including zstd and gzip Direct compression , This kind of compression has no explicit requirement for the storage content of data , Any information can be compressed . Through data compression , The volume of data can be greatly reduced , On the one hand, reduce the cost of users , On the other hand, it can be reduced when a large number of queries are analyzed IO Traffic volume , Improve our query efficiency .

2.3 TDSQL-A Execution engine : The principle of delayed materialization

Above the storage tier , Is the execution engine of the database . In the execution engine module , Data exchange and analysis in large-scale query analysis scenarios IO、 Network overhead is a very big concern , Because it has a great impact on system performance and overall scalability .

TDSQL-A After investigating the technology trend and development direction of the industry , Delay materialization is introduced into the engine . As opposed to delayed materialization , It's common to materialize in advance . Materialization in advance refers to the time when the query executor performs scanning again —— Here we briefly understand that these queries include Scan、Join、Project etc. .

Here is an example , There are two tables in a scene ——tbl_a and tbl_b, There are... On both tables f1 and f2 Two , The distribution columns are f1. according to tbl_a Of f1 Column and tbl_b Non distributed column of f2 To correlate —— On the left is the calculation method of materialization in advance , Project Need to return tbl_b Of f1, Conduct Join When it's connected, it needs to be tbl_b Of f2, So in the tbl_b Conduct Scan When , It will tbl_b Of f1 and f2 All materialized . The so-called materialization , Is to read out the two listed in the file , Form a virtual record tuple in memory , Then transmit up . You can actually take a look at , When you put data in the top layer , Only projection tbl_b Of f1. In the process , If in the middle Join The filtering ratio of association is very high , For example, only 1% Meet the requirements , There's a lot of tbl_b Of f1 Column data does not have to be transferred in .

so , Materialization in advance causes a waste of network bandwidth :

JOIN The choice rate is equal to 0.01

TBL_B There is 20 One hundred million records

JOIN Yes 20 Billion (1 – 0.01) sizeof(TBL_B .f1) = 7.4GB Invalid data transfer for .

This phenomenon is in OLAP Common overhead in scenarios , because OLAP When doing various complex queries, many are wide tables , And most of the time, only a few columns in the wide table will be accessed , At this time, if you encounter a typical selection rate is very low 、 When the projection rate is very small , The cost can't be ignored .

TDSQL-A Delay materialization technology is the optimization scheme proposed for the problem just now .TDSQL-A The delayed materialized query plan will be performed at the lower level Scan When , in the light of Join The unnecessary target column in only passes the location information of the physical tuple to the upper node . Only when the upper nodes are finished Join After correlation , To record the location information that meets the conditions , In the projection stage, pull the required data information from the lower layer , And then to the outside .

It has been proved by tests that , This way can greatly save CPU Computing overhead and network IO、 disk IO The cost of .

Effect of delayed materialization on performance improvement : When the test scenario is 20 node 、20 A node 、1TB The data of , The choice rate is 10%, The projection rate is 60%, Two tables Join. It is obvious that , Compared with time consumption, it is one fifth of materialization in advance , The occupation of network bandwidth is only half of that materialized in advance . Suppose you put the table Join The number changed from two to three , The time consumed is its 30%, The network occupancy ratio is close to 40%—— In other words, it saves 60% Network occupation of .

therefore , The test results show that this is for IO And network overhead . And through these two cost savings , It can further affect the improvement of performance .

In addition, we specially made a complex Join Test of : The choice rate is 1%、 The projection rate is 100%; Two tables Join, Abscissa is 100GB To 1TB. From the above two tables Join It can be seen that , The bigger the watch, the bigger it is 1TB When , Compared to open source GP Yes 5.2 Double performance improvement ; Suppose you change two tables into three tables , Then there are 18 Performance improvement . so , As queries become more complex 、 The bigger the table becomes , In the scenario of delayed materialization, the performance improvement is more obvious .

2.4TDSQL-A New design of asynchronous actuator decoupling control and data interaction

Initially, our goal was to make TDSQL—A Support thousands of server clusters . Support the scale of thousands of servers , There are some obstacles that must be crossed , One of the obstacles is that there are too many network connections .

In order to solve the problem of too many connections ,TDSQL-A Newly designed asynchronous actuator .TDSQL-A The actuator is a new self-developed actuator , There are two main characteristics : The first is asynchronous execution ; The second is the separation of control logic and data transmission logic .

say concretely , The system will generate a unified execution plan during the query optimization stage 、 Resources required for unified implementation , This is a TDSQL-A Control logic of . At the same time, the system abstracts the whole network communication , Abstract into the blue below Router——Router It is mainly responsible for data collection within the cluster . Between different processes , For example, two floors Join Or three floors Join, Processes at different levels execute completely asynchronously , And complete the data interaction by pushing data . Suppose there is N Nodes , Yes M layer join, It has a total of M×N A process .

Based on the asynchronization of the actuator and the layering of control data ,TDSQL-A The data interaction logic is fully realized , This is the data interaction bus (Forward Node). It is mainly responsible for data interaction between nodes . It can be considered as the logical network card of our cluster .

FN By sharing memory and CN、DN Data interaction —— Of course, there are also local logical interactions , Suppose the data needs to interact locally and internally , You can complete the interaction directly in memory without going through the network , Further improve performance .

To enable the FN after , Suppose there is N Nodes ,M×Join No matter how complicated , The number of connections is only (N-1)×S—— This “S” Is a positive integer , This means that each server and other servers are built N A connection , Generally speaking, I will take 2, In this way, the network connection within the whole cluster can be completely abstracted and simplified . The number of connections established between servers becomes a very small value related to the cluster size : Suppose we have 2000 Servers , This value will only be 4000 about .

In many business scenarios, businesses require more reading and less writing —— Very complex queries read more and write less , For example, up to five or six 20 Million tables , There are dozens at the same time 、 Even more concurrent . This situation will bring great challenges to the computing resources of the database . generally speaking , Before you have this technology , The general practice is whenever computing resources are insufficient , Just build a new database cluster to save the complete data , The re query of redundant data is realized by capacity expansion on the new replica . There are big problems with this approach —— Build a new database , Data consistency is a big challenge . When the capacity is expanded, the scale becomes very large , At the same time, every new database cluster , You need disaster tolerance 、 All resources such as backup are expanded at the same time , As a result, the overall cost of the database is greater 、 The cost of higher .

In response to this question ,TDSQL-A A multi plane scheme is designed .

2.5 TDSQL-A The multiplanar capability of provides consistent read scalability

So called multiplanar : A plane corresponds to a complete copy of the data , A complete copy of data can provide a complete consistent read extension service —— The read extension can be a single query , It can also be complex OLAP Related query of , In this way TDSQL-A It can provide low-cost read extension services .

stay TDSQL-A In the whole architecture , Multi plane technology can ensure the data consistency between various planes in the database cluster , At the same time, it can also ensure the consistency of data transactions in each plane when reading .

2.6 TDSQL-A How to achieve high performance computing — Full parallel capability

The key point for database is undoubtedly high performance computing . The following is an introduction TDSQL-A Work on high speed parallel computing .

In the real-time analysis scenario of massive data , We must give full play to 、 The best effect can be achieved by fully squeezing resources . This way , stay TDSQL-A It's called full parallelism .

TDSQL-A Full parallel is divided into three levels :

The first layer is node level parallelism . The so-called node level parallelism is , After the system gets a query , The query will be distributed to different DN, adopt DN The node level parallelism can be achieved by the query in different areas ;

The second is to parallelize the operator after the actuator gets the allocation , That is to use as much as possible and allow more CPU Resources to complete the query work , Pass many CPU Methods to improve the efficiency of the query ;

The third level is the instruction level , Including for CPU Special instructions for 、SMD Instruction, etc , By simple arithmetic or evaluation , And improve the query efficiency by optimizing and parallelizing the specified values .

In conclusion , Full parallel computing is the only way for the system to drain the potential of hardware , Is to do complex queries 、 The only way to real-time online query .

In addition to high-performance computing , With the industry on OLAP In depth study of Technology , In recent years, vectorization has attracted more and more attention . stay TDSQL-A System , Vectorization capability is also realized : More data , The more obvious the vectorization result is in the column storage scenario , The best result is that the running time of column storage vectorization will reach half of that of column storage non vectorization 、 About one eighth of the row storage time . Vectorization is also one of the necessary ways to realize real-time online analysis in column storage engine .

3、 ... and 、 Conclusion

At present , yes TDSQL-A A new starting point for , future TDSQL-A The overall plan is divided into two parts : On the one hand, it is gradually based on PG10 Version of ,merge To PG11、PG12、PG13 And later , Continuously follow up the rich features of the community version to improve the user experience , Create more value for customers . On the other hand ,TDSQL-A Hope to introduce new hardware , To enhance product competitiveness , Provide better service for customers .

版权声明
本文为[Tencent cloud database]所创,转载请带上原文链接,感谢
https://chowdera.com/2021/09/20210909124223330T.html

随机推荐