当前位置:网站首页>Sub database and sub table

Sub database and sub table

2021-01-23 20:53:26 MXC Xiao

One 、 What are the sub databases and sub tables

   Take the common table design as an example , The current data is the seller database , Contains a list of goods 、 Shop tables and district tables :

  

   When you need to query the store and geographic information of the goods , Even the table query SQL by :

SELECT p.*,r.[ Geographical area name ],s.[ Shop name ],s.[ credibility ]
FROM [ Commodity information ] p 
LEFT JOIN [ Geographical area ] r ON p.[ Place of Origin ] = r.[ Geographic region coding ]
LEFT JOIN [ Store information ] s ON p.id = s.[ The store it belongs to ]
WHERE p.id = ?

   Relational database itself is easy to become the bottleneck of the system , stand-alone storage capacity The number of connections processing capacity Are limited . When the amount of data in a single table reaches 1000W or 100G in the future , Due to many query dimensions , Even if Add slave Library 、 Optimized index , When you do a lot of things Performance is still down serious . 

Solution 1:

   adopt Improve server hardware capabilities To improve data processing capabilities , For example, increase the storage capacity 、CPU etc. , This kind of scheme costs a lot , And if the bottleneck is MySQL In itself, so is the improvement of hardware There are Of .

Solution 2:
   hold The data is distributed in different databases in , Make the data quantity of single database smaller to alleviate the performance problem of single database , So as to improve the performance of database , Here's the picture : Divide the e-commerce database into several independent databases , And for the Large tables are also divided into several small tables , Through this kind of database Split method to solve the problem of database performance .
  

   Sub database and sub table Just for Solve the problem of database performance degradation due to the large amount of data The problem of , Split the original independent database into several databases , take The big data table is divided into several data tables form , Make a single database 、 The amount of data in a single data table becomes smaller , So as to improve the performance of database .

Two 、 The way of sub database and sub table

   Sub database and sub table include sub database and sub table , In production, it usually includes : Vertical sub database 、 Horizontal sub database 、 Vertical sub table 、 Horizontal sub table Four ways .

1, Vertical sub table

a) Definition

   Divide a table into multiple tables by field , Each table stores some of these fields

  

b) benefits

  • in order to avoid IO Compete and reduce the chance of locking the watch , Viewing the product list information and product details does not affect each other
  • Give full play to the operational efficiency of popular data , The high efficiency of commodity information operation will not be hindered by the low efficiency of commodity description . 

   Generally speaking , In a business entity The access frequency of each data item is different Of , Some data items may take up more storage space BLOB or TEXT. For example, in the example above Commodity Description . therefore , When the table has a large amount of data , You can cut the table by fields , take Hot fields The unpopular fields are separated In different tables , These libraries can be placed on different storage devices , avoid IO Scramble for . The performance improvement brought by vertical segmentation is mainly focused on the operation efficiency of hot data , And disk contention is reduced . 

c) The principle of vertical split

  1. hold Uncommon fields On a separate sheet ;
  2. hold text,blob etc. Large field Split it out Put it in the attached table in ;
  3. often The columns of the combined query are placed in one table in .

2, Vertical sub database

a) Definition

   Vertical sub database refers to according to Business classifies tables , Distributed to different databases above , Each library can be placed on a different server , Its core idea is It's special for the special warehouse .

  

   The seller database (SELLER_DB), Split into commodity databases (PRODUCT_DB) And store databases (STORE_DB), And distribute the two libraries to different servers .

b) benefits

  • Solve business level coupling , Business is clear  
  • It can process data of different services Hierarchical management 、 maintain 、 monitor 、 Expand etc.
  • High concurrency scenarios , Vertical sub database A certain degree of improvement IO Number of database connections Reduce the bottleneck of single machine hardware resources .

3, Horizontal sub database

a) Definition

   The horizontal sub bank is the Data from the same table According to certain rules Split into different databases , Each library can be placed on a different server .
  

   With the growth of business volume ,PRODUCT_DB( Commodity bank ) The data stored in a single database has exceeded the estimate , Try the horizontal sub database . If the shop ID Even numbers , Map this operation to RRODUCT_DB1( Commodity bank 1); If the shop ID It's odd , Map operations to RRODUCT_DB2( Commodity bank 2). The expression of the database name to be accessed by this operation is RRODUCT_DB[ The store ID%2 + 1] .

b) benefits

  • Solved single database big data , The performance bottleneck of high concurrency . 
  • Improve the stability and availability of the system . 

   When an application is no longer fine-grained for vertical segmentation , Or after segmentation The amount of data is huge , There is Single library read write 、 Storage performance bottlenecks , It's time to do Horizontal sub database 了 , After the optimization of horizontal segmentation , Can often solve the single inventory reserves and performance bottlenecks . But because the same table is assigned in different databases , need Additional routing work for data operation , So big Increased system complexity .

4, Horizontal sub table

a) Definition

   The level table is in In the same database , hold Data from the same table According to certain rules Split into multiple tables in .

  

   It is similar to the idea of horizontal sub database , But this time, The goal of the operation is the table , Product information and product description are divided into two sets of tables . If the goods ID Even numbers , Map this operation to product information 1 surface ; If the goods ID It's odd , Map operations to product information 2 surface . The expression to access the table name for this operation is Commodity information [ goods ID%2 + 1]

b) benefits

  • Optimization of a single table caused by too much data performance problem  
  • avoid IO Scramble and reduce lock tables The risk of

   The level table in the library , It solves the problem of too much data in a single table , The separated small table contains only a part of the data , Thus, the data quantity of a single table is reduced , Improve retrieval performance .

5, summary

  • Vertical sub table : You can put a The fields of the wide table Press visit The frequency of Whether it's a big field The principle is divided into Multiple tables , This not only makes the business clear , It can also improve some performance . After break up , Try to avoid associated queries from a business perspective , Otherwise, the performance will not be worth it .
  • Vertical sub database : You can put multiple tables According to the tightness of business coupling , Separate deposit In different libraries , These libraries can be distributed on different servers , Thus, the access pressure is loaded by multiple servers , Greatly improve performance , At the same time, it can improve the business clarity of the overall architecture , Different business libraries can customize the optimization scheme according to their own situation . But it needs to solve all the complex problems brought about by cross Library .
  • Horizontal sub database : Sure Put the data in a table ( Press the data line ) It's divided into different warehouses , Each library has only This watch Part of the data , These libraries can be distributed on different servers , Thus, the access pressure is loaded by multiple servers , Greatly improve performance . It not only needs to solve all the problems brought about by cross Library complex problem , And we have to deal with the data route The problem of ( Data routing problem is introduced later ).
  • Horizontal sub table : Sure Put the data in a table ( Press the data line ) be assigned to Multiple tables in the same database in , Each table has only part of the data of this table , This can slightly improve performance , It is only used as a supplementary optimization of the horizontal sub database .

   Generally speaking , In system design phase It should be based on Business coupling is loose and tight To make sure Vertical sub database , Vertical sub table programme , When the amount of data and access pressure is not particularly large , First consider cache 、 Read / write separation 、 Indexing technology And so on . If the amount of data is huge , And continue to grow , Consider the level of sub library level sub table programme .

3、 ... and 、 The problem of sub database and sub table

1, Transaction consistency issues

   Because sub database and sub table distribute data in different databases or even different servers , Inevitably, there will be distributed transaction problems .

2, Cross node Association query

   Before there is no sub Treasury , We can search the products through the following SQL Query the store information :

SELECT p.*,r.[ Geographical area name ],s.[ Shop name ],s.[ credibility ] 
FROM [ Commodity information ] p 
LEFT JOIN [ Geographical area ] r ON p.[ Place of Origin ] = r.[ Geographic region coding ] 
LEFT JOIN [ Store information ] s ON p.id = s.[ The store it belongs to ] 
WHERE...ORDER BY...LIMIT...

   but Vertical sub database after [ Commodity information ] and [ Store information ] Not in a database , Not even on a server , Unable to perform association query .

   The original associated query can be divided into two queries , The result set of the first query finds the associated data id, And then according to id Initiate a second request to get the associated data , Finally, the obtained data are assembled .

3, Paging across nodes 、 Sorting function

   Cross node Multi database query when ,limit Pagination 、order by Sort Other questions , It becomes more complicated . You need to be in Different partition nodes sort the data and return , Then the result sets returned by different partitions are processed Aggregate and reorder .
   Such as , The commodity warehouse after horizontal warehouse division , Press ID Reverse sort paging , Take the first page :
    

   The above process is to take the data from the first page , Performance has little impact , However, due to the distribution of commodity information, the data in each database may be random , If it's the third N page , All nodes need to be front N Page data are taken out and merged , And then the whole The order of bodies , Operational efficiency As one can imagine . So the larger the number of pages requested , The worse the performance of the system .

   In the use of Max、Min、Sum、Count Functions like that When you do the calculation , Same as sort paging , You also need to execute the corresponding function on each partition first , Then sum up and recalculate the result set of each segment , Finally return the result to . 

4, The primary key is not important

   In the sub database and sub table environment , because The data in the table exists in different databases at the same time in , The self growth of primary key value in normal use will be useless , Generated by a partitioned database ID There's no guarantee that it's globally unique . Therefore, the global primary key needs to be designed separately , With Avoid duplicate primary keys across libraries problem .

5, Public table

   In the actual application scenario , Parameter table 、 Data dictionary table The amount of data is small , Little change , And it belongs to Dependency table of high frequency joint query . In the example Geographical area table It also belongs to this type .

   This kind of table can be Keep one copy in every database , All of the The update operation is sent to all sub databases at the same time .
   To be continued , use Sharding-JDBC To solve these problems .

 

版权声明
本文为[MXC Xiao]所创,转载请带上原文链接,感谢
https://chowdera.com/2021/01/20210123205303284c.html

随机推荐