As an Internet application grows with its business, some single tables keep accumulating data. To preserve service performance and stability, the team eventually needs to shard databases and tables and to migrate data. This article describes how the vivo account system handled both needs.
I. Preface
Canal is an open-source project from Alibaba. The sections below explain what Canal is and what it can do.
This article also describes which business pain points the vivo account system solved with Canal, in the hope that it offers some ideas for your own business.
II. Introduction to Canal
1. Overview
Canal [kə'næl] means waterway, conduit, or ditch. Its main purpose is to parse MySQL incremental logs and to provide incremental data subscription and consumption.
In its early days Alibaba ran data centers in both Hangzhou and the United States, so it needed synchronization across data centers. The first implementation obtained incremental changes mainly through business triggers. Starting in 2010, the business gradually moved to obtaining incremental changes by parsing database logs, and a large number of incremental subscription and consumption services grew out of that.
2. How it works
2.1 How MySQL master/slave replication works
Canal's core mechanism relies on MySQL master/slave replication, so we start with a brief description of how that replication works.
The MySQL master writes data changes to its binary log. The records are called binary log events and can be viewed with show binlog events.
The MySQL slave copies the master's binary log events to its relay log.
The MySQL slave replays the events in the relay log, applying the changes to its own data.
2.2 Introduction to the MySQL binary log
The binlog is the binary log of a MySQL database. It records the SQL operations that users perform on the database, except for data query statements.
If we later set up master and slave databases and need the slave to synchronize the contents of the master, the binlog is what carries that synchronization.
2.3 How Canal works
Canal emulates the MySQL slave interaction protocol: it poses as a MySQL slave and sends a dump request to the MySQL master.
The MySQL master receives the dump request and starts pushing its binary log to the slave (that is, to Canal).
Canal parses the binary log objects (originally a byte stream).
Canal pushes the parsed binary log downstream in a defined format for consumers.
2.4 Overall architecture of Canal
- server: one running Canal process, corresponding to one JVM.
- instance: one data queue (1 server hosts 1..n instances).
Modules of an instance:
- EventParser (data source access; emulates the slave protocol to interact with the master; protocol parsing)
It interacts with the database as a simulated slave, sends the binlog dump request, receives the binlog, parses the protocol and wraps the data, passes the data down to the EventSink for storage, and records the binlog position that has been synchronized.
- EventSink (the connector between Parser and Store; data filtering, processing, and distribution)
Data filtering, data merging, data processing, and routing of data to storage.
- EventStore (data storage)
Manages storage of data objects, including writing of new binlog objects, position management for object subscription, and position management for acknowledged (successfully consumed) objects.
- MetaManager (manager of incremental subscription and consumption information)
Manages publishing of and subscription to binlog objects as a whole, similar to an MQ.
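The position management described for EventStore can be pictured with a small in-memory model. This is only a sketch: ToyEventStore and its method names are hypothetical and are not Canal's actual classes. It mirrors the idea of keeping one position for what has been handed out and another for what has been acknowledged, which is what makes get/ack/rollback possible.

```python
class ToyEventStore:
    """Hypothetical in-memory stand-in for an event store (not Canal's real class)."""

    def __init__(self):
        self.events = []    # parsed binlog objects, in arrival order
        self.get_pos = 0    # index of the next event to hand to a consumer
        self.ack_pos = 0    # everything before this index is confirmed consumed

    def put(self, event):
        """Called by the parser/sink side when a new event arrives."""
        self.events.append(event)

    def get(self, batch_size):
        """Hand out up to batch_size events without confirming them."""
        batch = self.events[self.get_pos:self.get_pos + batch_size]
        self.get_pos += len(batch)
        return batch

    def ack(self):
        """Consumer succeeded: confirm everything handed out so far."""
        self.ack_pos = self.get_pos

    def rollback(self):
        """Consumer failed: the next get() re-delivers from the last ack."""
        self.get_pos = self.ack_pos


store = ToyEventStore()
for name in ["e1", "e2", "e3"]:
    store.put(name)

first = store.get(2)    # two events handed out, not yet acknowledged
store.rollback()        # processing failed, so nothing is acknowledged
retry = store.get(2)    # the same two events are delivered again
store.ack()
rest = store.get(2)     # only the third event is left
```

Because a rollback simply moves the read position back to the last acknowledged position, a consumer that fails halfway through a batch sees that batch again rather than losing it.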
2.5 Canal data format
To understand Canal better, let's look at the format of the binlog objects that Canal wraps internally.
Canal can synchronize DCL, DML, and DDL.
Businesses usually care about the data changes produced by INSERT, UPDATE, and DELETE.
Entry
- Header
  - logfileName [binlog file name]
  - logfileOffset [binlog position]
  - executeTime [timestamp of the change as recorded in the binlog]
  - schemaName [database (schema) name]
  - tableName [table name]
  - eventType [insert/update/delete type]
- entryType [transaction begin BEGIN / transaction end END / data ROWDATA]
- storeValue [byte data, expandable; the corresponding type is RowChange]
RowChange
- isDdl [whether this is a DDL change, such as create table/drop table]
- sql [the concrete DDL SQL]
- rowDatas [the concrete insert/update/delete change data; there can be several, since 1 binlog event can correspond to multiple changes, as in batch processing]
  - beforeColumns [array of Column]
  - afterColumns [array of Column]
Column
- index [column sequence number]
- sqlType [JDBC type]
- name [column name]
- isKey [whether it is a primary key]
- updated [whether the value has changed]
- isNull [whether the value is null]
- value [the concrete content; note that it is text]
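To make the nesting easier to see, the structure can be mirrored with plain data classes. This is an illustrative sketch, not Canal's generated protobuf classes: the Python class and field names are adapted from the listing above, a few fields (executeTime, sqlType) are left out for brevity, and the sample file name, schema, and values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    index: int
    name: str
    value: str               # always text, as noted in the listing
    is_key: bool = False
    updated: bool = False
    is_null: bool = False


@dataclass
class RowData:
    before_columns: List[Column] = field(default_factory=list)
    after_columns: List[Column] = field(default_factory=list)


@dataclass
class RowChange:
    is_ddl: bool
    sql: Optional[str]
    row_datas: List[RowData] = field(default_factory=list)


@dataclass
class Header:
    logfile_name: str
    logfile_offset: int
    schema_name: str
    table_name: str
    event_type: str          # "INSERT", "UPDATE" or "DELETE"


@dataclass
class Entry:
    header: Header
    entry_type: str          # "BEGIN", "END" or "ROWDATA"
    row_change: RowChange    # stands in for the expanded storeValue bytes


def changed_columns(entry):
    """Return {column name: new value} for columns flagged as updated."""
    result = {}
    for row in entry.row_change.row_datas:
        for col in row.after_columns:
            if col.updated:
                result[col.name] = col.value
    return result


# A made-up UPDATE on a hypothetical account_db.user table.
entry = Entry(
    header=Header("mysql-bin.000001", 4, "account_db", "user", "UPDATE"),
    entry_type="ROWDATA",
    row_change=RowChange(
        is_ddl=False,
        sql=None,
        row_datas=[RowData(
            before_columns=[Column(0, "id", "1", is_key=True),
                            Column(1, "phone", "111")],
            after_columns=[Column(0, "id", "1", is_key=True),
                           Column(1, "phone", "222", updated=True)],
        )],
    ),
)
changes = changed_columns(entry)
```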
2.6 Canal example demo
Let's use actual code logic to see how a binlog is interpreted as Canal's object data model, to deepen the understanding.
- insert statement
- delete statement
- update statement
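As a rough illustration of how the three statement types differ in this model: an insert carries only after-images of the row, a delete carries only before-images, and an update carries both. The sketch below uses plain dictionaries with made-up data, and the helper name summarize is hypothetical.

```python
def summarize(event_type, before, after):
    """Describe one row change given before/after images as {column: text} dicts."""
    if event_type == "INSERT":
        return "insert " + str(after)      # only the new row exists
    if event_type == "DELETE":
        return "delete " + str(before)     # only the old row exists
    if event_type == "UPDATE":
        # Keep just the columns whose value differs between the two images.
        diff = {k: (before.get(k), v) for k, v in after.items() if before.get(k) != v}
        return "update " + str(diff)
    raise ValueError("unsupported event type: " + event_type)


ins = summarize("INSERT", {}, {"id": "1", "name": "a"})
upd = summarize("UPDATE", {"id": "1", "name": "a"}, {"id": "1", "name": "b"})
dele = summarize("DELETE", {"id": "1", "name": "b"}, {})
```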
2.7 Canal's HA mechanism
Stability is critical for online services. Canal supports HA; the implementation relies on ZooKeeper and is similar to HDFS HA.
Canal's HA has two parts: the Canal server and the Canal client each have their own HA implementation.
- Canal server: to reduce dump requests against MySQL, an instance that exists on several servers may be running on only one of them at any time; the others stay in standby.
- Canal client: to preserve ordering, only one Canal client at a time may perform get/ack/rollback on a given instance; otherwise the order in which clients receive data cannot be guaranteed.
It relies on two ZooKeeper features (this article does not explain ZooKeeper itself; please consult other material):
- The Watcher mechanism
- EPHEMERAL nodes (bound to the session life cycle)
General steps:
Before a Canal server starts a Canal instance, it first asks ZooKeeper whether it may start (implementation: it tries to create an EPHEMERAL node, and whichever server creates it successfully is allowed to start).
After the ZooKeeper node is created successfully, that Canal server starts the corresponding Canal instance; the servers that failed to create the node keep their instance in standby.
Once ZooKeeper detects that the node created by Canal server A has disappeared, it immediately notifies the other Canal servers, which repeat the first step so that a new Canal server is chosen to start the instance.
Each time a Canal client connects, it first asks ZooKeeper which server is currently running the Canal instance and then connects to it. If the connection becomes unavailable, the client tries to connect again.
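The election steps above can be simulated in memory. This sketch does not talk to a real ZooKeeper: ToyZooKeeper and ToyCanalServer are hypothetical classes, and the node path is made up rather than Canal's actual layout. It only shows how an ephemeral node plus a watcher yields "one running, the rest standby" and automatic takeover.

```python
class ToyZooKeeper:
    """In-memory stand-in for the two ZooKeeper features used here (hypothetical)."""

    def __init__(self):
        self.nodes = {}       # path -> owning session
        self.watchers = {}    # path -> callbacks fired when the node disappears

    def create_ephemeral(self, path, owner):
        if path in self.nodes:
            return False      # another session already holds the node
        self.nodes[path] = owner
        return True

    def watch(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)

    def close_session(self, owner):
        """Ephemeral nodes vanish with their session; watchers are then notified."""
        gone = [p for p, o in self.nodes.items() if o == owner]
        for path in gone:
            del self.nodes[path]
            for callback in self.watchers.pop(path, []):   # watches fire once
                callback()


class ToyCanalServer:
    def __init__(self, name, zk, path="/toy/instance-1/running"):
        self.name, self.zk, self.path = name, zk, path
        self.running = False

    def try_start(self):
        if self.zk.create_ephemeral(self.path, self.name):
            self.running = True     # created the node: start the instance
        else:
            self.running = False    # standby: retry when the node disappears
            self.zk.watch(self.path, self.try_start)


zk = ToyZooKeeper()
a, b = ToyCanalServer("A", zk), ToyCanalServer("B", zk)
a.try_start()
b.try_start()
before = (a.running, b.running)

zk.close_session("A")               # server A crashes and its session ends
a.running = False
owner_after = zk.nodes["/toy/instance-1/running"]
```

A client would then look up the current owner of the node and connect to that server, which is the last step in the list above.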
2.8 Canal use cases
Having covered Canal's principles and operating mechanism, let's look at what problems Canal can solve in real business scenarios.
2.8.1 Migration without downtime
In its early stage, a business tends to design data storage loosely so that it can move fast. The user table or the order table, for example, may be designed as a single table. Later, the usual remedy for capacity and performance problems is to shard databases and tables.
The biggest problem with data migration is that the online business must keep running. If data changes during the migration, keeping the data consistent is the main challenge.
With Canal, subscribing to the database binlog solves this problem well.
See the vivo account practice of migration without downtime below.
2.8.2 Cache refresh
Databases are not the only data source in Internet services; Redis, for example, is widely used. The cache has to be refreshed when the data changes, and the conventional way is to refresh it by hand in the business logic code.
With Canal, a subscriber to the binlog of the specified tables can refresh the cache asynchronously, decoupled from the business code.
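A minimal sketch of such a subscriber follows. A dictionary stands in for Redis, the callback name on_row_change and the user table are assumptions for illustration, and a real handler would be driven by the change events that Canal delivers.

```python
cache = {"user:1": {"id": "1", "phone": "111"}}   # a dict stands in for Redis


def on_row_change(table, event_type, row):
    """Hypothetical subscriber callback that keeps the cache in step with the table."""
    if table != "user":
        return                      # only the subscribed table matters
    key = "user:" + row["id"]
    if event_type == "DELETE":
        cache.pop(key, None)        # row is gone, drop the cached copy
    else:                           # INSERT or UPDATE: store the after-image
        cache[key] = dict(row)


on_row_change("user", "UPDATE", {"id": "1", "phone": "222"})
on_row_change("order", "INSERT", {"id": "9"})                    # ignored
on_row_change("user", "INSERT", {"id": "2", "phone": "333"})
on_row_change("user", "DELETE", {"id": "2", "phone": "333"})
```

The business code that updates the user table contains no cache logic at all; the refresh happens only in the subscriber.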
2.8.3 Task distribution
Another common scenario is task distribution: other systems that depend on the data need to be notified when it changes.
The principle is that a task system watches for database changes and then writes the changed data to MQ/Kafka to distribute tasks.
For example, when an account is cancelled, downstream business parties need to subscribe to this notification so that they can delete the user's business data, archive it, and so on.
With Canal, data is distributed accurately, and the business system is no longer scattered with MQ-publishing code, so distribution is collected in one place, as the following figure shows:
[Figure: task distribution based on Canal]
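The account-cancellation example can be sketched as follows. Everything here is illustrative: a dictionary of lists stands in for MQ/Kafka, and the table name, the status column, its values, and the topic name are assumptions, since the article does not describe the real schema.

```python
from collections import defaultdict

topics = defaultdict(list)    # topic name -> messages; stands in for MQ/Kafka


def publish_account_events(table, event_type, before, after):
    """Hypothetical task-system hook: turn a row change into a downstream message."""
    if table != "account" or event_type != "UPDATE":
        return
    was_cancelled = before.get("status") == "CANCELLED"
    is_cancelled = after.get("status") == "CANCELLED"
    if is_cancelled and not was_cancelled:
        topics["account.cancelled"].append({"user_id": after["id"]})


publish_account_events("account", "UPDATE",
                       {"id": "7", "status": "ACTIVE"},
                       {"id": "7", "status": "CANCELLED"})
publish_account_events("account", "UPDATE",
                       {"id": "8", "status": "ACTIVE"},
                       {"id": "8", "status": "ACTIVE"})

# Each downstream party reads the same topic and decides what to do with it.
to_delete = [m["user_id"] for m in topics["account.cancelled"]]
```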
2.8.4 Data heterogeneity
In a large website architecture, the database is sharded to solve capacity and performance problems, but sharding brings new problems of its own.
Queries along a different dimension, or aggregate queries, become very tricky. This is usually solved with a data heterogeneity mechanism.
Data heterogeneity means aggregating the multiple tables that a query would need to join into a single DB organized along one dimension.
Canal can be used to implement data heterogeneity, as the following figure shows:
[Figure: data heterogeneity based on Canal]
3. Installing and using Canal
For detailed installation, configuration, and usage of Canal, please refer to the official documentation.
III. Practice in the account system
1. Practice 1: Sharding databases and tables
- Difficulties:
The table is large: the single table holds more than 300 million rows.
A regular scheduled task that migrates the full data set takes a long time and hurts the business.
- Core requirement:
Migrate without downtime and keep the impact on the business to a minimum.
"Change the tires of a car while it is running on the road."
1.2 Migration plan
1.3 Migration process
The whole process was as follows:
- Analyze the existing pain points of the account system
The single table holds too much data: more than 300 million rows in the account table
Too many unique user identifiers
Business responsibilities are divided unreasonably
- Decide on the sharding scheme
- Choose a migration scheme for the existing data
A traditional scheduled-task migration takes too long, and keeping data consistent during the migration would require shutting down for maintenance, which affects users heavily.
We decided to migrate with Canal. After thorough research and evaluation of Canal, and a joint decision with the middleware team and the DBAs, we confirmed that it supports both full and incremental synchronization.
- The migration is controlled by a switch: single-table mode → double-write mode → sharded-table mode.
- The data migration took a long time. Some unforeseen problems came up along the way, so the migration was run several times.
- After the migration finished, we officially switched to double-write mode: writes go to both the single table and the sharded tables, while reads still come from the single table. Canal keeps subscribing to the original single table and applying its data changes.
- After two weeks of operation without new production issues, we switched to sharded-table mode. From then on the original single table no longer receives writes, so no new binlog is produced for it. Some problems did appear in production after the switch; we followed up immediately, and it turned out to be a close call with no real harm done.
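The three modes behind the switch can be sketched like this. It is a toy model under stated assumptions: dictionaries stand in for tables, the shard count and the modulo routing rule are made up (the article does not state the sharding rule), and the function names are hypothetical.

```python
SINGLE, DOUBLE_WRITE, SHARDED = "single", "double_write", "sharded"
SHARD_COUNT = 4    # hypothetical; the article does not state the shard count

single_table = {}
shards = [dict() for _ in range(SHARD_COUNT)]


def shard_of(user_id):
    """Assumed routing rule: user id modulo the number of shards."""
    return shards[user_id % SHARD_COUNT]


def write(mode, user_id, row):
    if mode in (SINGLE, DOUBLE_WRITE):
        single_table[user_id] = row
    if mode in (DOUBLE_WRITE, SHARDED):
        shard_of(user_id)[user_id] = row


def read(mode, user_id):
    # Reads stay on the single table until the final switch.
    if mode == SHARDED:
        return shard_of(user_id).get(user_id)
    return single_table.get(user_id)


write(SINGLE, 5, {"phone": "111"})         # only the single table is written
write(DOUBLE_WRITE, 6, {"phone": "222"})   # both sides are written
write(SHARDED, 7, {"phone": "333"})        # only the shard is written
```

Rows written in single-table mode, such as user 5 here, reach the shards only through the Canal-based migration, which is why Canal keeps consuming the single table's binlog until the final switch.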
2. Practice 2: Cross-border data migration
When vivo's overseas business started, the data of some overseas countries was stored in a data center in neutral Singapore. As legal compliance requirements in those countries grew stricter, especially the EU's GDPR, the vivo account system did a lot of compliance work to meet them.
The compliance requirements of some non-EU countries changed accordingly. Australia, for example, required local compliance with GDPR, so the data of Australian users stored in the Singapore data center had to be migrated to the EU data center. This made the migration more complex overall. The difficulties were:
- Migration without downtime: users of phones that have already shipped must still be able to access the account service normally.
- Data consistency: data changed by users must stay consistent.
- Impact on business parties: normal use of the account service by business parties in production must not be affected.
2.2 Migration plan
2.3 Migration process
- Set up a standby database in the Singapore data center that synchronizes the binlog from the master.
- Build the Canal server and client to subscribe to and consume the binlog.
- The client parses the subscribed binlog and transmits the data, encrypted, to the EU GDPR data center.
- The application in the EU parses the transmitted data and persists it.
- After data synchronization is complete, operations colleagues help switch DNS resolution of the upper-layer domain name to the EU data center, which completes the data switch.
- Observe the Canal service in the Singapore data center, and stop it once no anomalies are seen.
- Notify the business parties that the account side has completed the switch.
- After the business parties have completed their own switch, clear the data in the Singapore data center.
3. Lessons learned
3.1 Data serialization
Canal uses protobuf underneath to serialize data. When the Canal client consumes subscribed change data, a null value is automatically converted to an empty string. Because the ORM side applies different null-checking logic when it updates data, the column in the target table ended up being updated to an empty string.
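One way to guard against this, shown here as a sketch rather than as the fix the team actually applied, is to consult the isNull flag of each column (see the Column fields in section 2.5) instead of trusting the text value alone. The dictionaries and the helper name column_value are hypothetical.

```python
def column_value(col):
    """Return None when the column is flagged null, otherwise its text value.

    col is a dict shaped like the Column described in section 2.5.
    """
    return None if col["isNull"] else col["value"]


after_columns = [
    {"name": "id", "value": "1", "isNull": False},
    {"name": "email", "value": "", "isNull": True},       # NULL in the source row
    {"name": "nickname", "value": "", "isNull": False},   # a genuine empty string
]
row = {c["name"]: column_value(c) for c in after_columns}
```

With the flag applied, a null column and a genuinely empty string stay distinguishable before the row reaches the ORM.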
3.2 Data consistency
For this launch the account system ran only a single Canal client node. Even so, the nature of the business led to data inconsistencies during the migration. An example:
- The user changes their phone number to A.
- Canal has not yet consumed this binlog position.
- The user changes their phone number to B.
- At that moment Canal consumes the binlog event that sets phone number A, so the phone number the user has just bound is overwritten.
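The article does not say how this was fixed. One common guard, sketched here under stated assumptions, is to skip a replayed event when the target row is already newer. The sketch assumes that the newer value B reached the target through a direct application write and that every row carries a comparable update_time; both are assumptions, and the function names are hypothetical.

```python
target = {}    # user_id -> row in the target (sharded) table


def business_write(user_id, phone, update_time):
    """Direct write by the application."""
    target[user_id] = {"phone": phone, "update_time": update_time}


def apply_binlog_event(user_id, phone, update_time):
    """Replay a change from the source table; skip it when it is stale."""
    current = target.get(user_id)
    if current is not None and current["update_time"] > update_time:
        return False                # target already holds a newer value
    target[user_id] = {"phone": phone, "update_time": update_time}
    return True


business_write(1, "A", 100)
business_write(1, "B", 200)
applied_stale = apply_binlog_event(1, "A", 100)   # delayed event for change A
applied_fresh = apply_binlog_event(2, "C", 50)    # no newer row exists
```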
3.3 Database master/slave delay
For the sake of data consistency (and because the account data, taken together, does not warrant splitting into separate databases), the account tables were split within the same database. During the migration, data was therefore written continuously to the sharded tables, which increased the database load and delayed reads from the slave.
Solution: add rate control and configure different strategies according to actual business conditions. For example, business volume is high during the day, so the write rate can be lowered; business volume is low at night, so the write rate can be raised.
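A time-of-day strategy of this kind can be sketched as below. The hour boundaries and the rates are made-up numbers for illustration, not the values used in production, and both function names are hypothetical.

```python
def allowed_rows_per_second(hour, day_rate=200, night_rate=2000):
    """Pick a write rate by hour of day (0-23). All numbers are made up."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    is_daytime = 8 <= hour < 22     # assumed busy period
    return day_rate if is_daytime else night_rate


def pause_between_batches(hour, batch_size):
    """Seconds to wait after writing one batch so the average rate holds."""
    return batch_size / allowed_rows_per_second(hour)


day_pause = pause_between_batches(10, 100)     # slower during the day
night_pause = pause_between_batches(2, 100)    # faster at night
```

A migration loop would sleep for the returned pause after each batch, so the load on the database follows the configured rate for that hour.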
3.4 Monitoring and alerting
Throughout the migration, the vivo account system added only simple monitoring of real-time data synchronization on the client side, based on the business tables and in-memory state.
The monitoring was coarse-grained. The data inconsistencies described above, for example, were not detected after synchronization completed, which led to business problems when switching to sharded-table mode. Fortunately the data could be repaired through compensation and similar means, and the impact on online data was small.
IV. Further thoughts
1. Analysis of existing problems
The figure above is a simple drawing of the existing Canal-based architecture. Although HA gives it high availability overall, closer study reveals some hidden risks. The nodes marked with a red X can be regarded as possible points of failure.
2. Reusing common components
To address the possible problems above, we can try the optimization shown in the figure above.
3. Extended application: multi-data-center synchronization
In the Internet industry everyone is familiar with multi-site active-active deployment, and data synchronization is its foundation. Every component that stores data, such as databases, caches, and MQ, can be synchronized, forming a large and complex data synchronization topology in which sites back up each other's data, so that true multi-site active-active deployment is achieved.
That logic is beyond the scope of this article. The following article explains it in more detail: http://www.tianshouzhi.com/api/tutorials/canal/404
V. References
Author: vivo Product Platform Development Team