Glacier open source: the first fully open-source framework on the web for distributed, globally ordered sequence numbers (distributed IDs)!

2020-12-06 21:00:54 Ice team

Preface

The design of the mykit-serial framework draws on Li Yanpeng's open-source vesta framework. It is a complete refactoring of vesta that borrows the ideas of the snowflake algorithm (SnowFlake) and upgrades and optimizes them across the board. It supports embedded use (JAR package), RPC (Dubbo, motan, sofa, SpringCloud, SpringCloud Alibaba, and other mainstream RPC frameworks), and a RESTful API (on SpringBoot and Netty), and it offers two ID types: maximum peak and minimum granularity.

Open-source repositories:

GitHub:https://github.com/sunshinelyz/mykit-serial

Gitee:https://gitee.com/binghe001/mykit-serial

Why not use database auto-increment fields?

If a business system uses database auto-increment fields, ID generation depends entirely on the database. This causes a lot of trouble during database migration, capacity expansion, data cleansing, and sharding (splitting data across databases and tables).

When sharding, one approach is to adjust the auto-increment field or the database sequence so that IDs stay unique across databases. But this is still a solution that depends heavily on the database, has many limitations, and is tied to the database type. Adding a database instance or migrating the business to a different type of database becomes quite troublesome.

Why not UUID?

Although a UUID guarantees uniqueness, it lacks many other properties that business systems need, such as rough time ordering, decodability, and manufacturability. In addition, a UUID is generated from full time data, which performs poorly, and it is long and takes up a lot of space, which indirectly degrades database performance. More importantly, UUIDs are unordered. This causes too many random writes when a B+ tree index is updated (consecutive IDs would instead produce partially sequential writes). Because a write cannot be a sequential append, it has to be an insert: the whole B+ tree node is read into memory, the record is inserted, and the entire node is written back to disk. When records take up a lot of space, the performance drop is considerable. For these reasons, UUIDs are not recommended.

Questions to consider

Since both database auto-increment IDs and UUIDs have many limitations, we need to consider how to design a distributed, globally unique sequence number (distributed ID) service. The following factors have to be taken into account.

Globally unique

The pessimistic strategy for guaranteeing global uniqueness in a distributed system is to use locks or distributed locks. However, any use of locks greatly reduces performance.

Instead, we can borrow from Twitter's SnowFlake algorithm: rely on the ordering of time, and use an auto-incrementing sequence within each unit of time, to achieve global uniqueness.

Roughly ordered

UUID's biggest problem is that it is unordered. Every business wants its generated IDs to be ordered, but making IDs fully ordered in a distributed system requires aggregating data, which needs locks or distributed locks. For the sake of efficiency a compromise is needed: rough ordering. There are two mainstream schemes, ordering by second and ordering by millisecond. This is another trade-off, so we decided to support both; which one a service uses is determined by configuration.

Decodable

Once an ID has been generated, it carries a lot of information. When troubleshooting online, the ID is usually the first thing we see. If the ID tells us when it was generated and where it came from, such a decodable ID helps a great deal.

If the ID contains the time and can be decoded, it also saves the storage space traditionally taken up by timestamp-type fields, so the design kills two birds with one stone.

Manufacturable

No system, however highly available, can be guaranteed never to fail. What do we do when something goes wrong? Handle it manually. What if the data is contaminated? Cleanse the data. But with manual handling or data cleansing, if database auto-increment fields are used, the IDs have already been overwritten by later business. How can we restore the time window in which the system failed?

Therefore the distributed global sequence number (distributed ID) service must be reproducible, recoverable, and manufacturable: it must be possible to make an ID from given elements.

High performance

Whatever the business, be it orders or goods, inserting a new record is almost certainly a core function and demands very high performance. ID generation depends on network IO and CPU performance, and the CPU is generally not the bottleneck. Based on experience, a single machine should reach a TPS of 10,000/s.

High availability

First, the distributed global sequence number (distributed ID) service must be a peer-to-peer cluster: when one machine goes down, requests must be forwarded to the other machines. A retry mechanism is also essential. Finally, if the remote service goes down, a local fault-tolerance scheme is needed; depending on the local library can be the last line of defense for high availability.

In other words, we support the RPC, embedded, and REST publishing modes. If one mode is unavailable, you can fall back to another; if Zookeeper is unavailable, you can fall back to a locally configured machine ID. This maximizes the availability of the service.
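
The fallback idea can be sketched in a few lines of Java. This is only an illustration under our own assumptions: the class name FallbackIdSource and the use of LongSupplier for each publishing mode are hypothetical and are not part of the mykit-serial API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical helper: tries each ID source in order and returns the first ID produced. */
final class FallbackIdSource {
    private final List<LongSupplier> sources;

    FallbackIdSource(List<LongSupplier> sources) {
        this.sources = new ArrayList<>(sources); // e.g. RPC first, then REST, then embedded
    }

    long nextId() {
        RuntimeException last = null;
        for (LongSupplier source : sources) {
            try {
                return source.getAsLong();
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall back to the next mode
            }
        }
        throw new IllegalStateException("no ID source available", last);
    }
}
```

A caller would list the sources in order of preference, with the embedded (local) generator last so that it acts as the final barrier.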

Scalable

For a distributed system, one thing that can never be ignored is that the business keeps growing. The absolute capacity a system can bear is not the only criterion for judging it. System design must consider not only absolute capacity but also the speed of business growth: whether the system can scale horizontally fast enough to keep up with the business is another important criterion.

Design and implementation

Overall architecture design

The overall architecture of mykit-serial is as follows.

The modules of the mykit-serial framework are:

  • mykit-bean: provides the bean classes and constants shared by the whole framework.
  • mykit-common: encapsulates the common utility classes of the whole framework.
  • mykit-config: provides global configuration capabilities.
  • mykit-core: the core implementation module of the whole framework.
  • mykit-db: stores the database scripts.
  • mykit-interface: the core abstract interfaces of the whole framework.
  • mykit-service: the core functionality implemented on top of Spring.
  • mykit-rpc: exposes the service over RPC (support for Dubbo, motan, sofa, SpringCloud, SpringCloud Alibaba, and other mainstream RPC frameworks will follow).
  • mykit-server: the current Dubbo-based implementation; it will later be moved into the mykit-rpc module.
  • mykit-rest: the REST service implemented on SpringBoot.
  • mykit-rest_netty: the REST service implemented on Netty.
  • mykit-test: the test module of the whole framework; it is a quick way to learn how to use mykit-serial.

Publishing modes

Depending on how the client uses the service, there are three publishing modes: embedded, RPC, and REST.

  1. Embedded publishing mode: for Java clients only. A local JAR package is provided and the service runs embedded in the client. The local machine ID must be configured in advance (or a unique ID is dynamically assigned by Zookeeper when the service starts), but there is no dependence on a central server.

  2. RPC publishing mode: for Java clients only. A client JAR for the service is provided; Java programs call it like a local API, but it depends on the central distributed sequence number (distributed ID) generation server.

  3. REST publishing mode: the central server exposes the service through a RESTful API, for use by non-Java clients.

The publishing mode is recorded in the generated global sequence number.

Sequence number types

Depending on the number of bits given to the time and to the sequence number, there are two types: maximum peak and minimum granularity.

1. Maximum-peak type: ordered by second; the time in seconds takes 30 bits and the sequence number takes 20 bits.

Field   version   type   generation method   time (seconds)   sequence number   machine ID
Bits    63        62     60-61               30-59            10-29             0-9

2. Minimum-granularity type: ordered by millisecond; the time in milliseconds takes 40 bits and the sequence number takes 10 bits.

Field   version   type   generation method   time (milliseconds)   sequence number   machine ID
Bits    63        62     60-61               20-59                 10-19             0-9

The maximum-peak type can absorb larger peaks, but its rough ordering is coarser. The minimum-granularity type has finer ordering, but the theoretical peak it can absorb per millisecond is limited to 1,024; if more requests arrive within the same millisecond, they have to wait for the next millisecond.

The type of distributed sequence number (distributed ID) is specified in the configuration, and the service must be restarted to switch between the two.
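
To make the two layouts concrete, here is a minimal sketch that packs the six fields into a 64-bit long with shifts. The bit positions are taken from the tables above; the class name IdLayout and its methods are our own illustration and are not copied from the mykit-serial source.

```java
/** Hypothetical helper that packs ID fields according to the two layouts in the tables above. */
final class IdLayout {
    static final int MACHINE_BITS = 10; // bits 0-9 in both layouts

    /**
     * type 0 = maximum peak (20-bit sequence, 30-bit time in seconds),
     * type 1 = minimum granularity (10-bit sequence, 40-bit time in milliseconds).
     */
    static long pack(long version, long type, long genMethod, long time, long seq, long machine) {
        requireFits(version, 1, "version");
        requireFits(type, 1, "type");
        requireFits(genMethod, 2, "genMethod");
        int seqBits = (type == 0) ? 20 : 10;
        int timeBits = (type == 0) ? 30 : 40;
        requireFits(time, timeBits, "time");
        requireFits(seq, seqBits, "seq");
        requireFits(machine, MACHINE_BITS, "machine");
        int timeShift = MACHINE_BITS + seqBits; // 30 for maximum peak, 20 for minimum granularity
        return (version << 63) | (type << 62) | (genMethod << 60)
                | (time << timeShift) | (seq << MACHINE_BITS) | machine;
    }

    private static void requireFits(long value, int bits, String name) {
        if (value < 0 || value >= (1L << bits)) {
            throw new IllegalArgumentException(name + " does not fit in " + bits + " bit(s): " + value);
        }
    }
}
```

Because the time field sits above the sequence number and the machine ID, IDs of the same type and generation method sort by time first, which is what gives the rough ordering.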

Data structure

1. Sequence number

Maximum-peak type

20 bits. In theory this yields 2^20 = 1,048,576 IDs per second, on the order of a million. If the system's network IO and CPU are strong enough, it can absorb peaks on the order of a million IDs per second.

Minimum-granularity type

10 bits. There are 2^10 = 1,024 sequence numbers per millisecond, that is, at most 1,000+ IDs per millisecond. In theory it cannot absorb peaks as large as the maximum-peak scheme.

2. Time in seconds / time in milliseconds

Maximum-peak type

30 bits, holding the time in seconds: 2^30/60/60/24/365 ≈ 34, so it can be used for 30+ years.

Minimum-granularity type

40 bits, holding the time in milliseconds: 2^40/1000/60/60/24/365 ≈ 34.9, so it can likewise be used for 30+ years.
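
The arithmetic above is easy to check in code. The following sketch (the class name Capacity is ours, chosen only for illustration) computes how many years each time field covers and how many sequence numbers each layout allows per time unit.

```java
/** Hypothetical helper that reproduces the capacity arithmetic from the text. */
final class Capacity {
    static final double SECONDS_PER_YEAR = 365.0 * 24 * 60 * 60;

    /** Years covered by a time field of the given width, counted in seconds or milliseconds. */
    static double yearsCovered(int timeBits, boolean milliseconds) {
        double units = Math.pow(2, timeBits);
        double seconds = milliseconds ? units / 1000.0 : units;
        return seconds / SECONDS_PER_YEAR;
    }

    /** Number of distinct sequence numbers available within one time unit. */
    static long idsPerTimeUnit(int seqBits) {
        return 1L << seqBits;
    }
}
```

With these definitions, yearsCovered(30, false) is about 34.0 and yearsCovered(40, true) is about 34.9, while idsPerTimeUnit gives 1,048,576 for 20 bits and 1,024 for 10 bits.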

3. Machine ID

10 bits: 2^10 = 1,024, so at most 1,000+ servers are supported. The central-server (RPC) and REST publishing modes generally do not need many machines. With a design target of 10,000 TPS per machine, 10 servers provide 100,000 TPS, which meets the needs of most businesses.

However, since the embedded publishing mode can be used inside business services, the demand for machine IDs is higher there; the maximum is 1,024 servers.

4. Generation method

2 bits, used to distinguish the three publishing modes: embedded, RPC, and REST.

0 (binary 00): embedded publishing mode
1 (binary 01): RPC publishing mode
2 (binary 10): REST publishing mode
3 (binary 11): reserved, unused

5. Sequence number type

1 bit, used to distinguish the two ID types: maximum peak and minimum granularity.

0: maximum-peak type
1: minimum-granularity type

6. Version

1 bit, used as an extension bit or as a temporary measure during capacity expansion.

0: the default value, which avoids truncation when the ID is converted to an integer and back to a string
1: extension or capacity expansion in progress

The bit is reserved for extension after 30 years. When the IDs are close to exhaustion after 30 years, it can be lent to the second or millisecond field to give the system a time window for migration. In fact, extending the time field by just one bit allows another 30 years of use.
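
Decoding an ID is the reverse of packing it: each field is recovered with a shift and a mask. The sketch below follows the bit positions described in this article; the class name IdDecoder and its method names are hypothetical and do not reflect the actual mykit-serial classes.

```java
/** Hypothetical decoder for the 64-bit layout described above. */
final class IdDecoder {
    static long version(long id)   { return (id >>> 63) & 0x1; }
    static long type(long id)      { return (id >>> 62) & 0x1; }
    static long genMethod(long id) { return (id >>> 60) & 0x3; }
    static long machine(long id)   { return id & 0x3FF; }

    /** Sequence number: 20 bits for the maximum-peak type, 10 bits for minimum granularity. */
    static long seq(long id) {
        int bits = (type(id) == 0) ? 20 : 10;
        return (id >>> 10) & ((1L << bits) - 1);
    }

    /** Time: 30 bits starting at bit 30 (seconds) or 40 bits starting at bit 20 (milliseconds). */
    static long time(long id) {
        boolean maxPeak = (type(id) == 0);
        int shift = maxPeak ? 30 : 20;
        int bits = maxPeak ? 30 : 40;
        return (id >>> shift) & ((1L << bits) - 1);
    }
}
```

The type bit has to be read first, because it decides where the time field starts and how wide the sequence number is.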

Concurrent processing

For the central-server (RPC) and REST publishing modes, generating an ID involves network IO and CPU work. ID generation itself is essentially an in-memory operation with no disk IO, so network IO is the bottleneck of the system.

Because network IO is the bottleneck relative to CPU speed, the ID generation service is multithreaded. The contention points during ID generation are time and sequence, and several implementations are provided for them:

  1. Mutual exclusion with ReentrantLock from the concurrent package. This is the default implementation and a compromise between performance and stability.
  2. Mutual exclusion with the traditional synchronized keyword. This performs slightly worse and is enabled with the JVM parameter -Dmykit.serial.sync.lock.impl.key=true.
  3. Mutual exclusion with CAS. This implementation performs very well, but CPU load is high under heavy concurrency. It is enabled with the JVM parameter -Dmykit.serial.atomic.impl.key=true.
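
As an illustration of the first option, the sketch below protects time and sequence with a ReentrantLock, using the minimum-granularity layout. It is a simplified example under our own assumptions, not the framework's code: the class name LockedGenerator is hypothetical, and the epoch of 2015-01-01 00:00:00 (UTC+8) is a guess based on the REST example later in this article.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical lock-based generator for the minimum-granularity layout. */
final class LockedGenerator {
    private static final long EPOCH_MILLIS = 1420041600000L; // assumed: 2015-01-01T00:00:00+08:00
    private static final long SEQ_MASK = (1L << 10) - 1;     // 10-bit sequence number

    private final ReentrantLock lock = new ReentrantLock();
    private final long machine;
    private long lastTime = -1;
    private long seq = 0;

    LockedGenerator(long machine) {
        if (machine < 0 || machine > 1023) throw new IllegalArgumentException("machine ID out of range");
        this.machine = machine;
    }

    long next() {
        lock.lock(); // time and sequence are the contention points, so both are updated under the lock
        try {
            long now = now();
            if (now < lastTime) {
                throw new IllegalStateException("clock moved backwards; refusing to generate an ID");
            }
            if (now == lastTime) {
                seq = (seq + 1) & SEQ_MASK;
                if (seq == 0) { // all 1,024 numbers of this millisecond are used: wait for the next one
                    while ((now = now()) <= lastTime) {
                        Thread.yield();
                    }
                }
            } else {
                seq = 0;
            }
            lastTime = now;
            // type = 1 (minimum granularity), generation method = 0 (embedded), version = 0
            return (1L << 62) | (now << 20) | (seq << 10) | machine;
        } finally {
            lock.unlock();
        }
    }

    private static long now() {
        return System.currentTimeMillis() - EPOCH_MILLIS;
    }
}
```

A synchronized variant would replace the lock with a synchronized method, and a CAS variant would typically keep time and sequence in a single atomic value and retry on conflict, trading lock overhead for CPU spinning.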

Machine ID allocation

We divide the machine IDs into two ranges: one serves the RPC and REST publishing modes, the other serves the embedded publishing mode.

0-923: embedded publishing mode; preconfigured (or produced by Zookeeper); supports at most 924 embedded servers
924-1023: central-server (RPC) and REST publishing modes; supports at most 100 servers, for a maximum TPS of 100 × 10,000 = 1,000,000/s

If actual usage of the embedded mode versus the RPC and REST modes does not match this ratio, the boundaries of the two ranges can be adjusted.

In addition, vertical businesses are naturally isolated from one another, so each business can use up to 1,024 servers.

Integration with Zookeeper

For the embedded publishing mode, the service connects to the Zookeeper cluster at startup, and Zookeeper allocates an ID in the range 0-923. If the IDs in 0-923 are used up, Zookeeper assigns an ID greater than 923, and in that case the service refuses to start.

If you do not want to use the unique machine ID produced by Zookeeper, we provide a default scheme with preconfigured machine IDs: each machine that uses the unified distributed global sequence number (distributed ID) service must be preconfigured with a default machine ID.
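
The startup check described above can be sketched as follows. The ranges come from this article; the class name MachineIdRanges, the Mode enum, and the method requireValid are hypothetical names used only for this example, and obtaining the ID from Zookeeper is left out.

```java
/** Hypothetical startup check for the machine ID ranges described above. */
final class MachineIdRanges {
    static final int EMBEDDED_MAX = 923;  // 0-923: embedded publishing mode
    static final int CENTRAL_MAX = 1023;  // 924-1023: RPC and REST publishing modes

    enum Mode { EMBEDDED, RPC, REST }

    /** Returns the machine ID if it is valid for the mode, otherwise refuses to start. */
    static int requireValid(Mode mode, int machineId) {
        boolean valid = (mode == Mode.EMBEDDED)
                ? machineId >= 0 && machineId <= EMBEDDED_MAX
                : machineId > EMBEDDED_MAX && machineId <= CENTRAL_MAX;
        if (!valid) {
            throw new IllegalStateException(
                    "machine ID " + machineId + " is outside the range for " + mode + "; refusing to start");
        }
        return machineId;
    }
}
```

The same check works whether the ID was assigned by Zookeeper or read from local configuration.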

Time synchronization

When generating distributed global sequence numbers (distributed IDs) with mykit-serial, the server time must be correct. We can use the Linux scheduled-task tool crontab to calibrate the server clock against the NTP pool, a virtual cluster of time servers (more than 3,000 servers worldwide).

ntpdate -u pool.ntp.org

Performance

The final performance verification must confirm that each server reaches a TPS above 10,000/s.

RESTful API documentation

Generate a distributed global sequence number

  • Description: generates a globally unique sequence number from the system time and returns it in the response body.
  • Path: /genSerialNumber
  • Parameters: N/A
  • Required parameters: N/A
  • Example: http://localhost:8080/genSerialNumber
  • Result: 3456526092514361344

Decode a global sequence number

  • Description: decodes a generated serialNumber and returns the decoded fields as a JSON string in the response body.
  • Path: /expSerialNumber
  • Parameters: serialNumber=?
  • Required parameters: serialNumber
  • Example: http://localhost:8080/expSerialNumber?serialNumber=3456526092514361344
  • Result: {"genMethod":2,"machine":1022,"seq":0,"time":12758739,"type":0,"version":0}

Translate time

  • Description: converts a time given as a long integer into a readable format.
  • Path: /transtime
  • Parameters: time=?
  • Required parameters: time
  • Example: http://localhost:8080/transtime?time=12758739
  • Result: Thu May 28 16:05:39 CST 2015
Make a global sequence number

Java API documentation

Generate a global sequence number

  • Description: generates a globally unique distributed sequence number (distributed ID) from the system time and returns it as the method's return value.
  • Class: SerialNumberService
  • Method: genSerialNumber
  • Parameters: N/A
  • Return type: long
  • Example: long serialNumber = serialNumberService.genSerialNumber();

Decode a global sequence number

  • Description: decodes a generated distributed sequence number (distributed ID) and returns the decoded fields as a SerialNumber object.
  • Class: SerialNumberService
  • Method: expSerialNumber
  • Parameters: long serialNumber
  • Return type: SerialNumber
  • Example: SerialNumber serialNumber = serialNumberService.expSerialNumber(3456526092514361344L);

Translate time

  • Description: converts a time given as a long integer into a readable format.
  • Class: SerialNumberService
  • Method: transTime
  • Parameters: long time
  • Return type: Date
  • Example: Date date = serialNumberService.transTime(12758739);
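
The conversion itself is simple once the epoch is known. The documented example maps the time value 12758739 to Thu May 28 16:05:39 CST 2015, which is consistent with an epoch of 2015-01-01 00:00:00 (UTC+8) for the second-based layout. The sketch below relies on that inference; it is an assumption, not a value taken from the mykit-serial source, and the class name TimeTranslator is hypothetical.

```java
import java.time.Instant;
import java.util.Date;

/** Hypothetical re-implementation of the time translation for the second-based layout. */
final class TimeTranslator {
    // Assumed epoch: 2015-01-01T00:00:00+08:00, inferred from the documented example.
    static final long EPOCH_SECONDS = 1420041600L;

    /** Converts the time field of a maximum-peak ID (seconds since the epoch) into a Date. */
    static Date transTime(long secondsSinceEpoch) {
        return Date.from(Instant.ofEpochSecond(EPOCH_SECONDS + secondsSinceEpoch));
    }
}
```

For the minimum-granularity type the time field counts milliseconds, so the same idea would apply with the epoch and the field value both expressed in milliseconds.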

Make a global sequence number (1)

  • Description: makes a distributed sequence number from the given sequence number elements.
  • Class: SerialNumberService
  • Method: makeSerialNumber
  • Parameters: long time, long seq
  • Return type: long
  • Example: long serialNumber = serialNumberService.makeSerialNumber(12758739, 0);

Make a global sequence number (2)

  • Description: makes an ID from the given ID elements.
  • Class: SerialNumberService
  • Method: makeSerialNumber
  • Parameters: long machine, long time, long seq
  • Return type: long
  • Example: long serialNumber = serialNumberService.makeSerialNumber(1, 12758739, 0);

Make a global sequence number (3)

  • Description: makes an ID from the given distributed sequence number elements.
  • Class: SerialNumberService
  • Method: makeSerialNumber
  • Parameters: long genMethod, long machine, long time, long seq
  • Return type: long
  • Example: long serialNumber = serialNumberService.makeSerialNumber(0, 1, 12758739, 0);

Make a global sequence number (4)

  • Description: makes an ID from the given distributed sequence number elements.
  • Class: SerialNumberService
  • Method: makeSerialNumber
  • Parameters: long type, long genMethod, long machine, long time, long seq
  • Return type: long
  • Example: long serialNumber = serialNumberService.makeSerialNumber(0, 2, 1, 12758739, 0);

Make a global sequence number (5)

  • Description: makes an ID from the given ID elements.
  • Class: SerialNumberService
  • Method: makeSerialNumber
  • Parameters: long version, long type, long genMethod, long machine, long time, long seq
  • Return type: long
  • Example: long serialNumber = serialNumberService.makeSerialNumber(0, 0, 2, 1, 12758739, 0);

FAQ

1. Does adjusting the system time affect ID generation?

If the clock is set back without restarting the machine, mykit-serial throws an exception and refuses to generate IDs. If the machine is restarted with the clock set forward, IDs are generated normally after the adjustment, and no IDs are generated for the skipped interval.
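
The refusal to generate IDs after the clock is set back can be implemented by remembering the last time value that was used. The sketch below is a minimal illustration; the class name ClockGuard is hypothetical, and the clock is injected as a LongSupplier so that the behavior can be tested without touching the system time.

```java
import java.util.function.LongSupplier;

/** Hypothetical guard that rejects time values older than the last one handed out. */
final class ClockGuard {
    private final LongSupplier clock;
    private long lastTime = Long.MIN_VALUE;

    ClockGuard(LongSupplier clock) {
        this.clock = clock; // for example System::currentTimeMillis
    }

    /** Returns the current time, or throws if the clock has moved backwards since the last call. */
    synchronized long checkedNow() {
        long now = clock.getAsLong();
        if (now < lastTime) {
            throw new IllegalStateException(
                    "clock moved backwards by " + (lastTime - now) + " time units; refusing to generate IDs");
        }
        lastTime = now;
        return now;
    }
}
```

Note that lastTime lives only in memory, so this check cannot detect a clock that was set back while the process was down. That is the restart case discussed in the next question.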

2. What is the impact of setting the clock back or forward across a restart?

If the machine is restarted with the clock set back, mykit-serial may produce duplicate times; system administrators must make sure this does not happen. If the machine is restarted with the clock set forward, IDs are generated normally after the adjustment, and no IDs are generated for the skipped interval.

3. Does the leap-second synchronization every 4 years affect ID generation?

The error between the atomic clock and the electronic clock is 1 second: every 4 years the electronic clock falls 1 second behind the atomic clock. So every four years the network clock is synchronized once, but local Windows and Linux machines do not synchronize automatically; the time has to be synchronized manually, or with ntpdate against the network clock. Because the clock is set forward by 1 second, the adjustment does not affect ID generation; no IDs are generated within the adjusted second.

Well, that's all for today. I'm Glacier, see you next time!


Copyright notice
This article was written by [Ice team]. Please include a link to the original when reposting. Thank you.
https://chowdera.com/2020/12/20201206210052554h.html