Breaking Through the Capacity Limit: TiDB's Secret to "Seamless Scaling" of Massive Data

2020-12-07 12:48:40 Jingdong Zhilian Cloud Developer

For any fast-growing business, coping with peak traffic has always been a major problem for the technical team. Faced with massive data, both database teams and business teams want "seamless scaling", that is, capacity expansion the business does not notice. But the popular approach of splitting data across multiple databases and tables (sharding) often falls short in both expansion speed and consistency. The industry has been looking for a new database solution that is powerful yet simple to use and that removes the data-layer bottleneck during traffic peaks.

Industry demand is the biggest driver of technical innovation. In recent years TiDB, the distributed database developed by PingCAP, has been on the rise and has shown clear advantages in massive-data processing. Against this background, in early 2020 Jingdong Zhilian Cloud partnered with PingCAP to build Cloud-TiDB, a cloud distributed database based on TiDB.

On November 26, Jingdong Zhilian Cloud and Intel jointly held an online live event titled "Breaking the Limit: TiDB's Technical Architecture and Practice on JD Cloud". Ge Jibin, an architect in JD.com's cloud product R&D department, and Qi Zheng, a TiDB ecosystem technical evangelist at PingCAP, each gave a talk. The goal was to help more enterprises and developers broaden their thinking, to offer an alternative to sharding, and to show how to realize TiDB's value in production.

This article is a summary of that live session; the content has been edited.

I. TiDB's technical architecture and practice on JD Cloud

In the first part of the broadcast, Mr. Ge explained in depth why Jingdong Zhilian Cloud chose TiDB, and described TiDB's technical architecture and technology ecosystem on JD Cloud.

1. What problems TiDB sets out to solve

Traditional single-node databases are showing more and more limitations in the big-data era. For a fast-growing enterprise, data volume grows in step with the size of the business, so a single database soon runs into several bottlenecks:

  • The amount of data in a single table or a single database becomes too large.
  • The storage capacity of a single machine reaches its upper limit.
  • The processing capacity of a single machine hits a bottleneck.
  • Read latency and storage requirements increase, and write performance cannot be scaled.
  • To keep improving performance, the traditional approach is to shard databases and tables, but sharding comes with many inherent constraints.
  • For high availability, MySQL has to integrate external programs.
  • MySQL's fault detection, primary/replica election, and failover require custom policies.
  • Semi-synchronous and asynchronous replication increase the risk of data loss.
  • MySQL's OLAP capability is weak, so data analysis requires ETL into an external analysis system.

These are the bottlenecks of traditional database solutions. TiDB was created to solve these common problems and to fundamentally break through the capacity limit of a single database.

At the technical level, what remedies does TiDB offer for these problems? The first thing to be clear about is that TiDB, unlike the traditional single-node architecture, is a truly distributed database. Its architecture separates compute from storage and provides horizontal, linear scalability. It also offers strong consistency and high availability, supports automatic failure recovery, and lets data be analyzed in real time. In addition, it is highly compatible with the MySQL protocol. In terms of overall architecture, TiDB consists of three main parts: TiDB Server, the distributed storage layer, and PD:

  • TiDB Server is compatible with the MySQL protocol and scales horizontally, so users can use TiDB as if it were MySQL.
  • TiKV, TiDB's storage layer, is a distributed key-value store. It scales linearly and guarantees strong consistency through multiple replicas and the Raft protocol. Data in TiKV is divided into many Regions, and the Region is the unit of management. The data is spread across TiKV nodes, and nodes can be added horizontally.
  • PD is responsible for cluster management, including scheduling and load balancing, and for generating the global TSO timestamps. PD is itself a cluster with no single point of failure.
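The Region idea above can be illustrated with a minimal sketch. This is not TiKV's actual code: the `Region` and `RegionMap` classes, the key ranges, and the node names such as `tikv-1` are all hypothetical, chosen only to show how a key can be routed to the node that owns its key range.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    region_id: int
    start_key: bytes  # inclusive lower bound
    end_key: bytes    # exclusive upper bound; b"" means unbounded
    store: str        # hypothetical node name, e.g. "tikv-1"


class RegionMap:
    """Toy routing table over sorted, non-overlapping key ranges."""

    def __init__(self, regions):
        self._regions = sorted(regions, key=lambda r: r.start_key)
        self._starts = [r.start_key for r in self._regions]

    def locate(self, key: bytes) -> Region:
        # Find the rightmost region whose start_key <= key.
        idx = bisect.bisect_right(self._starts, key) - 1
        if idx < 0:
            raise KeyError(key)
        region = self._regions[idx]
        # Reject keys that fall past the end of that region (a gap in the map).
        if region.end_key and key >= region.end_key:
            raise KeyError(key)
        return region


# Three illustrative regions spread over three illustrative nodes.
regions = RegionMap([
    Region(1, b"", b"g", "tikv-1"),
    Region(2, b"g", b"p", "tikv-2"),
    Region(3, b"p", b"", "tikv-3"),
])
print(regions.locate(b"apple").store)
```

Because routing depends only on key ranges, capacity can in principle be added by splitting a range and placing one half on a new node, without changing how clients address their data.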

TiDB uses the TiFlash columnar storage engine to support real-time data analysis. TiFlash replicates data asynchronously as a Raft learner and, together with MVCC, provides strongly consistent reads. It also supports computation pushdown, so that AP and TP workloads do not interfere with each other. When TiFlash is in use, the TiDB optimizer estimates the cost of a query and, based on the result, automatically chooses between TiKV row storage and TiFlash columnar storage.

With this architecture, a TiDB cluster achieves overall high availability and strong data consistency. Even if a minority of replicas is lost, the cluster automatically repairs the data and fails over without affecting the business layer. TiDB can also be deployed in a multi-active configuration across data centers.
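The phrase "a minority of replicas" follows from majority-based replication such as Raft: a write needs acknowledgement from more than half of the replicas. The arithmetic can be sketched as follows; the function names are illustrative, and the replica counts of three and five are only examples, not a statement about any particular deployment.

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority still survives."""
    return replicas - quorum(replicas)


def is_available(replicas: int, failed: int) -> bool:
    """True if the surviving replicas can still form a majority."""
    return replicas - failed >= quorum(replicas)


for n in (3, 5):
    print(n, "replicas: quorum", quorum(n), "tolerates", tolerated_failures(n))
```

With three replicas the group survives the loss of one; with five it survives the loss of two. Losing a majority, by contrast, makes the group unavailable until replicas are restored.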

2. Implementation and features of TiDB on the cloud

In recent years, Jingdong Zhilian Cloud's customers have needed ever more data-processing capacity. To meet this need, Jingdong Zhilian Cloud and PingCAP jointly built Cloud-TiDB, a cloud distributed database based on TiDB, aimed mainly at scenarios that require high performance, high reliability, and high availability.

The figure above shows the overall architecture of Jingdong Zhilian Cloud's Cloud-TiDB. Built on this architecture, Cloud-TiDB provides a number of features with high business value, including elastic horizontal scaling, backup and recovery, real-time data analysis, data migration and synchronization, and cloud monitoring and alerting.

  • Elastic horizontal scaling. TiDB can add or remove storage and compute nodes online, giving it nearly unlimited horizontal scalability. After a scaling operation, the database automatically rebalances the data.
  • Backup and recovery. TiDB supports automatic and manual full backups and stores the backup data in OSS. A restore does not overwrite the original cluster, which effectively guards against operator error.
  • Real-time data analysis. While serving OLTP workloads, TiDB also provides real-time analysis and processing of business data.
  • Data migration and synchronization. Full and incremental migration are supported, and data can be synchronized to downstream stores such as MySQL and Kafka.
  • Monitoring and alerting. TiDB provides a rich set of monitoring metrics that can be viewed directly in a browser. Cloud monitoring provides resource-level and business-level alerts, and cloud logging can be configured to alert on error logs.

Beyond these capabilities, another advantage of Cloud-TiDB in practice is lower operation and maintenance cost. Cloud-TiDB fits the on-demand scaling model of cloud services well, so users can precisely control the amount of resources they use and avoid waste.

Choosing a new technology also means choosing an ecosystem: the better the ecosystem, the more efficient development and operations become. One defining feature of the TiDB ecosystem is compatibility with the MySQL protocol, which lets users benefit from the mature MySQL ecosystem. MySQL database drivers, third-party development and management tools, and data exchange and migration tools can all be used with TiDB.

TiDB also interconnects easily with other mainstream data-processing technologies. For example, TiDB data can be fed into Kafka and Flink, and even into Hive, HDFS, Amazon S3, Spark, and others. Users do not have to worry about technology lock-in, which also lays the foundation for a thriving TiDB ecosystem.

At the end of his talk, Mr. Ge looked ahead to the development trends of cloud databases:

Distributed technology is one of the important trends of the future: operating systems, applications, and databases are all moving toward distributed architectures. As a distributed database, TiDB is in line with this trend. At the same time, moving databases to the cloud brings many benefits, such as elastic scheduling and integration with AI. It also makes it possible to understand the business better from the user's perspective and to optimize data processing intelligently.

In the long run, running databases on the cloud yields substantial gains in development, operations, and stability. That is why Jingdong Zhilian Cloud chose to bring TiDB to the cloud, in the hope of giving users a better experience.

II. TiDB in large-data-volume, high-concurrency scenarios

After Mr. Ge's talk, Qi Zheng, a TiDB ecosystem technical evangelist at PingCAP, shared TiDB's application practice in large-data-volume, high-concurrency scenarios.

1. TiDB vs. sharding: comparing solutions for OLTP scenarios

When enterprises have to handle massive data, they usually also face rapid short-term growth in data volume. Such businesses need a database that can scale out quickly and handle high concurrency, with response latency and throughput good enough to absorb traffic bursts. OLTP scenarios mostly involve online consumer-facing (2C) transactions, which place high demands on database stability: fluctuations in database performance directly affect the user experience.

Given these requirements, the common solutions in the industry fall into several types: Sharding, NewSQL, and DB-based approaches in between. ShardingSphere and TiDB are typical representatives of the Sharding and NewSQL approaches, respectively. Both are active open source projects, and they represent two major schools of thought on handling massive data. Sharding means splitting data across multiple databases and tables; in practice it is divided into horizontal splitting and vertical splitting. Vertical splitting is usually done by business module or by type of data, while horizontal splitting can be done by modulo, by time, by hot and cold storage, and so on. ShardingSphere, as a representative of the sharding approach, has the following architecture:

Compared with the TiDB architecture described above, a sharding architecture depends on third-party tools for data backup, high availability, monitoring, and alerting. TiDB is itself a complete solution that meets users' requirements for a high-performance database in one place. ShardingSphere has also started, in its newer versions, to transform itself into an overall distributed database solution, which shows that distributed databases are the inevitable trend of the future.
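One reason modulo-based horizontal splitting is slow to expand can be shown with simple arithmetic. The sketch below is an illustration under stated assumptions, not how ShardingSphere or any specific middleware routes data: the `shard_for` rule (integer ID modulo shard count) and the shard counts are hypothetical.

```python
def shard_for(record_id: int, shard_count: int) -> int:
    """Hypothetical routing rule: plain modulo on an integer ID."""
    return record_id % shard_count


def moved_fraction(record_ids, old_count: int, new_count: int) -> float:
    """Fraction of records whose shard changes when the shard count changes."""
    ids = list(record_ids)
    moved = sum(
        1 for rid in ids
        if shard_for(rid, old_count) != shard_for(rid, new_count)
    )
    return moved / len(ids)


ids = range(1000)
print(moved_fraction(ids, 4, 5))  # adding a fifth shard
print(moved_fraction(ids, 4, 8))  # doubling the shard count
```

For these 1,000 consecutive IDs, going from 4 to 5 shards relocates 80% of the records, and doubling from 4 to 8 relocates 50%. With plain modulo routing, every expansion therefore implies a large data migration, which is one source of the scaling friction described above.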

2. TiDB massive-data application cases

TiDB was originally designed to solve the many problems of sharding, but some scenarios are not well suited to a TiDB migration. Specifically, these are scenarios in which the business is not growing rapidly, requests are relatively simple, and distributed transactions are not needed. Apart from such scenarios, most massive-data requirements can be met well by migrating to TiDB. Mr. Qi presented several real-world cases.

A community platform's personalized home page and push service. Because personalized push serves a massive number of users, the database has to take in 3 billion new rows every day, on top of trillions of rows of historical data, and the service is highly sensitive to throughput and latency. The user's original solution was MySQL with sharding, but it had grown to hundreds of MySQL instances, and neither its risk profile nor its latency could meet requirements. After evaluation, the user concluded that TiDB was the only solution that met their needs for scalability, strong consistency, and high availability, and decided on a full migration. During the migration, TiDB Lightning, the fast import tool developed by PingCAP, was combined with the DM tool to move the data smoothly. A series of optimizations followed the migration, and in the end the requirements were well met. In particular, the new architecture scales well: after the migration the data volume grew from 1.3 trillion to 1.8 trillion rows while performance and availability stayed at a high level, and cost did not increase significantly.

A telecom operator's personal billing system. The bill summary contains 8 billion records and has high performance requirements. The original MyCAT-based solution was close to its expansion limit and could store less than one year of historical data. Because of the bottlenecks of MySQL sharding, further splitting would have caused many problems, so the user chose to upgrade to TiDB. After the migration, a single table can hold 10 billion rows, the data retention period was extended from half a year to 3-5 years, and both QPS and latency improved significantly.

Mr. Qi also described migrations to TiDB for an O2O platform's PMC order-flow business, a financial core accounting system, and an internet-finance marketing platform. What these cases have in common is that the user's original sharded database had hit a growth bottleneck and was hurting the business more and more. After the move to TiDB, the original bottleneck was removed, no serious failures occurred during the migration, and the cost stayed within a controllable range.

3. TiDB 5.0 highlights

In the last part of his talk, Mr. Qi introduced the performance optimization highlights of TiDB 5.0, which mainly include the following features.

Async Commit. Earlier versions of TiDB use two-phase commit, which is relatively expensive. Async Commit makes the second phase of the commit asynchronous. For small transactions the effect is similar to 1PC, which brings a certain performance improvement.

Clustered Index. This new feature is well suited to queries whose range conditions include primary key columns. Similar to InnoDB's clustered index, it reduces the cost of this kind of query. According to TPC-C tests, the new version brings some performance improvement in this respect.

Compaction Filter. This feature mainly reduces the performance jitter caused by automatic background compaction of data. After it is enabled, the standard deviation of QPS fluctuation drops to within 5%.

SATA SSD optimization. By combining new features such as Compaction Filter, fsync control, and Compaction Guard, version 5.0 delivers much better throughput and latency on SATA SSDs than version 4.0. The QPS jitter caused by fluctuations of the SATA disks themselves is also significantly reduced.
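The talk quotes QPS jitter as a standard deviation "within 5%" but the article does not say how it was measured. One plausible reading, assumed here, is the standard deviation of sampled QPS divided by the mean (the coefficient of variation). The sketch below uses made-up sample values purely for illustration.

```python
from statistics import mean, pstdev


def qps_jitter(samples) -> float:
    """Population standard deviation of QPS samples relative to their mean."""
    avg = mean(samples)
    if avg == 0:
        raise ValueError("mean QPS is zero; relative jitter is undefined")
    return pstdev(samples) / avg


def within_budget(samples, budget: float = 0.05) -> bool:
    """True if relative jitter stays within the given budget (5% by default)."""
    return qps_jitter(samples) <= budget


# Hypothetical per-interval QPS readings, not measurements from the talk.
steady = [1000, 1020, 980, 1010, 990]
spiky = [1000, 1400, 600, 1300, 700]

print(f"steady: {qps_jitter(steady):.2%}, ok={within_budget(steady)}")
print(f"spiky:  {qps_jitter(spiky):.2%}, ok={within_budget(spiky)}")
```

A metric of this kind makes "less jitter" comparable across clusters with different absolute throughput, which is presumably why a percentage rather than a raw QPS figure is quoted.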

III. Breaking the capacity limit: TiDB removes the enterprise database performance bottleneck

Compared with traditional sharding, TiDB is a true one-stop distributed database solution. It can meet requirements such as rapid business growth, high concurrency over massive data, real-time data analysis, and high availability for financial data. Through the two speakers' talks, the audience gained a deeper understanding of TiDB's capabilities, implementation details, and real-world adoption, and saw the advantages of TiDB database services.

As the two speakers said, distributed databases are an inevitable trend in the industry. TiDB follows this trend and will become the remedy for database performance bottlenecks at more and more enterprises. At the same time, TiDB's availability on Jingdong Zhilian Cloud gives enterprises a convenient way to adopt TiDB quickly and to benefit from it sooner.

Copyright notice
This article was written by [Jingdong Zhilian Cloud Developer]. Please include a link to the original when reposting. Thank you.
https://chowdera.com/2020/12/20201207124546305c.html