One 、 What are the sub databases and sub tables
Take the common table design as an example , The current data is the seller database , Contains a list of goods 、 Shop tables and district tables ：
When you need to query the store and geographic information of the goods , Even the table query SQL by ：
SELECT p.*,r.[ Geographical area name ],s.[ Shop name ],s.[ credibility ] FROM [ Commodity information ] p LEFT JOIN [ Geographical area ] r ON p.[ Place of Origin ] = r.[ Geographic region coding ] LEFT JOIN [ Store information ] s ON p.id = s.[ The store it belongs to ] WHERE p.id = ?
Relational database itself is easy to become the bottleneck of the system , stand-alone storage capacity 、 The number of connections 、 processing capacity Are limited . When the amount of data in a single table reaches 1000W or 100G in the future , Due to many query dimensions , Even if Add slave Library 、 Optimized index , When you do a lot of things Performance is still down serious .
adopt Improve server hardware capabilities To improve data processing capabilities , For example, increase the storage capacity 、CPU etc. , This kind of scheme costs a lot , And if the bottleneck is MySQL In itself, so is the improvement of hardware There are Of .
Sub database and sub table Just for Solve the problem of database performance degradation due to the large amount of data The problem of , Split the original independent database into several databases , take The big data table is divided into several data tables form , Make a single database 、 The amount of data in a single data table becomes smaller , So as to improve the performance of database .
Two 、 The way of sub database and sub table
Sub database and sub table include sub database and sub table , In production, it usually includes ： Vertical sub database 、 Horizontal sub database 、 Vertical sub table 、 Horizontal sub table Four ways .
1, Vertical sub table
in order to avoid IO Compete and reduce the chance of locking the watch , Viewing the product list information and product details does not affect each other
- Give full play to the operational efficiency of popular data , The high efficiency of commodity information operation will not be hindered by the low efficiency of commodity description .
Generally speaking , In a business entity The access frequency of each data item is different Of , Some data items may take up more storage space BLOB or TEXT. For example, in the example above Commodity Description . therefore , When the table has a large amount of data , You can cut the table by fields , take Hot fields 、 The unpopular fields are separated In different tables , These libraries can be placed on different storage devices , avoid IO Scramble for . The performance improvement brought by vertical segmentation is mainly focused on the operation efficiency of hot data , And disk contention is reduced .
c） The principle of vertical split
- hold Uncommon fields On a separate sheet ;
- hold text,blob etc. Large field Split it out Put it in the attached table in ;
- often The columns of the combined query are placed in one table in .
2, Vertical sub database
Vertical sub database refers to according to Business classifies tables , Distributed to different databases above , Each library can be placed on a different server , Its core idea is It's special for the special warehouse .
The seller database (SELLER_DB), Split into commodity databases (PRODUCT_DB) And store databases (STORE_DB), And distribute the two libraries to different servers .
Solve business level coupling , Business is clear
It can process data of different services Hierarchical management 、 maintain 、 monitor 、 Expand etc.
High concurrency scenarios , Vertical sub database A certain degree of improvement IO、 Number of database connections 、 Reduce the bottleneck of single machine hardware resources .
3, Horizontal sub database
With the growth of business volume ,PRODUCT_DB( Commodity bank ) The data stored in a single database has exceeded the estimate , Try the horizontal sub database . If the shop ID Even numbers , Map this operation to RRODUCT_DB1( Commodity bank 1); If the shop ID It's odd , Map operations to RRODUCT_DB2( Commodity bank 2). The expression of the database name to be accessed by this operation is RRODUCT_DB[ The store ID%2 + 1] .
Solved single database big data , The performance bottleneck of high concurrency .
Improve the stability and availability of the system .
When an application is no longer fine-grained for vertical segmentation , Or after segmentation The amount of data is huge , There is Single library read write 、 Storage performance bottlenecks , It's time to do Horizontal sub database 了 , After the optimization of horizontal segmentation , Can often solve the single inventory reserves and performance bottlenecks . But because the same table is assigned in different databases , need Additional routing work for data operation , So big Increased system complexity .
4, Horizontal sub table
The level table is in In the same database , hold Data from the same table According to certain rules Split into multiple tables in .
It is similar to the idea of horizontal sub database , But this time, The goal of the operation is the table , Product information and product description are divided into two sets of tables . If the goods ID Even numbers , Map this operation to product information 1 surface ; If the goods ID It's odd , Map operations to product information 2 surface . The expression to access the table name for this operation is Commodity information [ goods ID%2 + 1] .
Optimization of a single table caused by too much data performance problem
- avoid IO Scramble and reduce lock tables The risk of
The level table in the library , It solves the problem of too much data in a single table , The separated small table contains only a part of the data , Thus, the data quantity of a single table is reduced , Improve retrieval performance .
Vertical sub table ： You can put a The fields of the wide table Press visit The frequency of 、 Whether it's a big field The principle is divided into Multiple tables , This not only makes the business clear , It can also improve some performance . After break up , Try to avoid associated queries from a business perspective , Otherwise, the performance will not be worth it .
Vertical sub database ： You can put multiple tables According to the tightness of business coupling , Separate deposit In different libraries , These libraries can be distributed on different servers , Thus, the access pressure is loaded by multiple servers , Greatly improve performance , At the same time, it can improve the business clarity of the overall architecture , Different business libraries can customize the optimization scheme according to their own situation . But it needs to solve all the complex problems brought about by cross Library .
Horizontal sub database ： Sure Put the data in a table ( Press the data line ) It's divided into different warehouses , Each library has only This watch Part of the data , These libraries can be distributed on different servers , Thus, the access pressure is loaded by multiple servers , Greatly improve performance . It not only needs to solve all the problems brought about by cross Library complex problem , And we have to deal with the data route The problem of ( Data routing problem is introduced later ).
Horizontal sub table ： Sure Put the data in a table ( Press the data line ) be assigned to Multiple tables in the same database in , Each table has only part of the data of this table , This can slightly improve performance , It is only used as a supplementary optimization of the horizontal sub database .
Generally speaking , In system design phase It should be based on Business coupling is loose and tight To make sure Vertical sub database , Vertical sub table programme , When the amount of data and access pressure is not particularly large , First consider cache 、 Read / write separation 、 Indexing technology And so on . If the amount of data is huge , And continue to grow , Consider the level of sub library level sub table programme .
3、 ... and 、 The problem of sub database and sub table
1, Transaction consistency issues
Because sub database and sub table distribute data in different databases or even different servers , Inevitably, there will be distributed transaction problems .
2, Cross node Association query
Before there is no sub Treasury , We can search the products through the following SQL Query the store information ：
SELECT p.*,r.[ Geographical area name ],s.[ Shop name ],s.[ credibility ] FROM [ Commodity information ] p LEFT JOIN [ Geographical area ] r ON p.[ Place of Origin ] = r.[ Geographic region coding ] LEFT JOIN [ Store information ] s ON p.id = s.[ The store it belongs to ] WHERE...ORDER BY...LIMIT...
but Vertical sub database after [ Commodity information ] and [ Store information ] Not in a database , Not even on a server , Unable to perform association query .
3, Paging across nodes 、 Sorting function
The above process is to take the data from the first page , Performance has little impact , However, due to the distribution of commodity information, the data in each database may be random , If it's the third N page , All nodes need to be front N Page data are taken out and merged , And then the whole The order of bodies , Operational efficiency As one can imagine . So the larger the number of pages requested , The worse the performance of the system .
In the use of Max、Min、Sum、Count Functions like that When you do the calculation , Same as sort paging , You also need to execute the corresponding function on each partition first , Then sum up and recalculate the result set of each segment , Finally return the result to .
4, The primary key is not important
In the sub database and sub table environment , because The data in the table exists in different databases at the same time in , The self growth of primary key value in normal use will be useless , Generated by a partitioned database ID There's no guarantee that it's globally unique . Therefore, the global primary key needs to be designed separately , With Avoid duplicate primary keys across libraries problem .
5, Public table
In the actual application scenario , Parameter table 、 Data dictionary table The amount of data is small , Little change , And it belongs to Dependency table of high frequency joint query . In the example Geographical area table It also belongs to this type .