
The fastest microservice distributed transaction?

2021-05-04 16:52:52 Jiedao jdon

This article introduces how Adaptive implemented distributed transactions at the application level, that is, how to run a distributed consensus algorithm among application services, which is significant for flexible distributed transactions in microservices. On top of this technique, Adaptive built Hydra, a high-performance, resilient, real-time financial trading platform known as the "real-time trading expert".

The core of this distributed transaction system is deterministic execution, a replicated consensus log, and state snapshots. The advantages of this consensus approach are that it is simple, easy to debug, strongly fault tolerant, and highly scalable.

I came across the LMAX architecture a few years ago; the Disruptor was later open-sourced from it. LMAX introduced me to the idea of "application-level consensus". We have benefited greatly since we began exploring its potential in 2015, and in 2016 we applied it in two trading systems, including a financial exchange.

What are the characteristics of LMAX?

LMAX is a financial exchange for foreign exchange, commodities, and indices; the system processes customer orders in real time. Foreign exchange is very volatile and moves quickly, so the exchange needs very low latency (sub-millisecond) to handle a large volume of orders. The core of the system is the matching engine, which is responsible for matching customers' buy and sell orders. The matching engine is stateful: it must keep customer orders until they are matched or cancelled.
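The matching engine described above can be sketched as a small, single-threaded, stateful order book. This is a toy illustration only, not LMAX's actual engine; the class and method names are invented for the example:

```python
from collections import deque

class MatchingEngine:
    """Toy order book for one currency pair: stateful and single-threaded.
    (Illustrative sketch only -- not LMAX's real engine.)"""

    def __init__(self):
        self.buys = deque()   # resting buy orders: (order_id, price, qty)
        self.sells = deque()  # resting sell orders

    def submit(self, side, order_id, price, qty):
        """Match the incoming order against the opposite side, FIFO."""
        book, opposite = ((self.buys, self.sells) if side == "buy"
                          else (self.sells, self.buys))
        trades = []
        while qty > 0 and opposite:
            oid, opx, oqty = opposite[0]
            crossed = price >= opx if side == "buy" else price <= opx
            if not crossed:
                break
            fill = min(qty, oqty)
            trades.append((order_id, oid, opx, fill))
            qty -= fill
            if fill == oqty:
                opposite.popleft()
            else:
                opposite[0] = (oid, opx, oqty - fill)
        if qty > 0:  # remainder rests in the book until matched or cancelled
            book.append((order_id, price, qty))
        return trades
```

Note that all state lives in plain in-memory data structures: this is exactly why the engine cannot be made stateless, and why the order of incoming messages matters.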

The challenge with such systems is that they are highly contended for resources, yet sharding does not help with parallelism or throughput: for example, EUR/USD is itself very volatile, and all orders for that currency pair must be managed in the same order book. You really cannot shard this problem away. So how do you solve it?

The traditional approach is to keep the compute layer stateless and hold the state in a cache or database at the application level, but this design cannot satisfy high throughput and such demanding low-latency requirements at the same time.

What model solved the LMAX problem?

The LMAX team tried a highly contended database approach and various others, including SEDA and the Actor model. Then they tried an idea from the '70s and '80s: State Machine Replication. Apply the same sequence of messages to replicated state machines; because the messages are ordered and the state machine is deterministic, every replica makes the same state transitions.

So the general idea is: write the matching-engine logic (the business logic) following some simple rules that ensure determinism, then deploy the code on several nodes in the network, and apply the same ordered sequence of messages (buy, sell, cancel, etc.) to all nodes.
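The core property can be demonstrated with a tiny deterministic state machine (a hypothetical example, not the real matching engine): replaying the same ordered log on several replicas always yields the same state.

```python
class CounterStateMachine:
    """Deterministic state machine: same message sequence -> same state.
    (Hypothetical example standing in for real business logic.)"""
    def __init__(self):
        self.balance = 0
    def apply(self, msg):
        kind, amount = msg
        if kind == "credit":
            self.balance += amount
        elif kind == "debit":
            self.balance -= amount

# The same ordered log applied to three independent replicas.
log = [("credit", 100), ("debit", 30), ("credit", 5)]
replicas = [CounterStateMachine() for _ in range(3)]
for node in replicas:
    for msg in log:
        node.apply(msg)

# All replicas converge on the same state: 100 - 30 + 5 = 75.
assert all(node.balance == 75 for node in replicas)
```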

At this point you may wonder how to build this ordered message sequence. Orders come from different places (UI, API, etc.), so there is no out-of-the-box "single ordering". Here is the solution:

1. Run an election algorithm to choose one node as the leader (the primary node).

2. The leader processes all incoming messages and orders them (clients always talk to the leader).

3. The leader replicates the ordered messages to the follower nodes.

4. Each follower acknowledges the messages it receives.

5. Once the leader has received acknowledgments from a quorum of the cluster (e.g. 1/2 + 1 nodes), the message is marked as committed and is ready to be handled by the business logic.

6. The leader also tells the followers which messages have been committed.

7. The followers can now apply the committed messages to their own state machines (the business logic).
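The seven steps above can be sketched in-process as follows. This is an illustrative toy, not a real consensus implementation: real systems need networking, leader election, terms, and retries, all of which are omitted here.

```python
class Follower:
    def __init__(self):
        self.log, self.applied = [], []
    def replicate(self, seq, msg):   # steps 3/4: receive and acknowledge
        self.log.append((seq, msg))
        return True
    def mark_committed(self, seq):   # step 7: apply to the state machine
        self.applied.append(self.log[seq][1])

class Leader:
    """Toy leader: sequences messages, replicates to followers,
    and commits once a quorum of the cluster has acknowledged."""
    def __init__(self, followers):
        self.followers = followers
        self.log = []        # ordered (seq, msg) pairs
        self.committed = []  # messages ready for the business logic

    def handle(self, msg):
        seq = len(self.log)
        self.log.append((seq, msg))                                # step 2
        acks = sum(f.replicate(seq, msg) for f in self.followers)  # steps 3/4
        cluster_size = len(self.followers) + 1   # the leader counts too
        quorum = cluster_size // 2 + 1           # e.g. 3 of 5
        if acks + 1 >= quorum:                   # step 5 (+1 = leader's own copy)
            self.committed.append(msg)
            for f in self.followers:
                f.mark_committed(seq)            # step 6
            return "committed"
        return "pending"

leader = Leader([Follower(), Follower()])        # cluster of 3
assert leader.handle({"type": "buy", "qty": 10}) == "committed"
```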

Consensus algorithm

If you have heard of consensus algorithms such as Paxos or Raft, the algorithm above will look very familiar. However, LMAX did not use Raft: the Raft paper had not yet been published at the time.

Consensus algorithms have traditionally been used to build distributed databases or distributed coordination systems (Chubby, ZooKeeper, etcd, Consul, etc.). What is interesting about the LMAX case is that they did not use a consensus algorithm at the database level, but at the application level.

The engineering effort involved is fundamentally different from developing a distributed database, and I don't think many people have tried to use consensus algorithms directly at the service level.

Consensus algorithms solve a core problem of distributed systems: they guarantee that even when things go wrong, such as node failures or network partitions, a group of nodes will still agree on and replicate the same state. Algorithms like Raft present a consistent view to clients (the C of the CAP theorem), and they are also fault tolerant. However, a majority of the nodes in the cluster must be available, so this suits systems where consistency (CAP's C) matters more than availability (CAP's A), such as an exchange.

Benefits for application development

Compared with database-level consensus, what advantages does this application-level consensus bring to development and operation?


What surprised me most when I first implemented this architecture was how simple it was. Once the cluster infrastructure is in place, implementing the state machine (the deterministic business logic) is straightforward, certainly much simpler than any other approach I have seen. I have never seen such a clean separation of concerns in another design: the consensus module and the business logic.

It is also a good environment for applying DDD: our trading business logic uses no framework or technical infrastructure, just plain old objects, data structures, and algorithms, all running on a single thread. The exchange model we built is very advanced, and honestly I cannot think of another design that would have let us meet our customers' functional requirements and high availability, at least not with such a fast time to market.


A very important advantage of this architecture is that it guarantees that the business-logic state is consistent across nodes.

The "traditional" way a system handles a transaction:

1. Receive a message from the client.

2. Load the corresponding data from some storage (database, cache, etc.).

3. Process the message: apply some logic to the data and make a decision.

4. Try to commit the new data to the consistent store. But because we are in a distributed system, someone else (another node, thread, etc.) may have changed the data in the meantime, so we need some form of optimistic locking or a database CAS (compare-and-swap) operation to make sure we don't overwrite data that was modified through another path.

5. If the CAS fails, we have to reload the data and try again.
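The load/process/CAS/retry loop described above can be sketched like this (a toy in-memory store stands in for the database; the names are invented for the example):

```python
class VersionedStore:
    """Toy store with compare-and-swap on a version number,
    standing in for optimistic locking in a database."""
    def __init__(self, value):
        self.value, self.version = value, 0
    def read(self):
        return self.value, self.version
    def cas(self, expected_version, new_value):
        if self.version != expected_version:
            return False  # someone else wrote in the meantime
        self.value, self.version = new_value, self.version + 1
        return True

def apply_with_retry(store, logic, max_attempts=5):
    """Steps 2-5 of the 'traditional' flow: load, compute, CAS, retry."""
    for _ in range(max_attempts):
        value, version = store.read()      # step 2: load
        new_value = logic(value)           # step 3: process
        if store.cas(version, new_value):  # step 4: conditional commit
            return new_value
        # step 5: CAS failed -> loop reloads the data and tries again
    raise RuntimeError("contention: gave up after retries")

store = VersionedStore(100)
assert apply_with_retry(store, lambda v: v + 10) == 110
```

All of this retry machinery is exactly what the in-memory consensus approach below makes unnecessary.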

Compare the above with implementing the consensus algorithm "inside" the application:

1. Receive a message from the client.

2. Apply the logic to the data we already have in memory (in this kind of system we tend to load the hot data we need at startup).

3. Because the system is consistent, there is nothing to roll back: the state is the same on all nodes, so we can acknowledge directly to the client.

This approach replaces a lot of code that otherwise has to deal with all kinds of possible failure scenarios: database reads and writes can fail and can leave dirty or inconsistent data.

Another consequence of achieving strong consistency in the service layer is that you can eagerly load all the data the business logic needs without worry. This significantly simplifies the code around external systems and data stores, and yields better performance than keeping compute and data in different layers of the architecture.

You don't need a database

"You don't need a database" may sound controversial, but with this design you really don't. Like most consensus algorithms, Raft stores all messages received by the system locally on each node of the cluster: this is called the Raft log.

This log is used in the following situations :

1. If you restart the cluster, the log can be replayed on all nodes to bring the system back to the same state it was in before. Remember that the state machines are deterministic, so reapplying the same sequence of events produces the same state.

2. The log is also useful when a node in the cluster fails or restarts: when it comes back, it can replay its local log and then query the leader to retrieve any messages it may have missed.

3. This property is also very useful for troubleshooting bugs in the code: if the system fails, just fetch the Raft log and replay it locally against the same version of the code under a debugger. You will reproduce exactly the problem seen in production. Anyone with experience diagnosing concurrent systems will appreciate what a significant advantage this is.

4. Snapshots. Because the Raft log can grow indefinitely, it is usually combined with snapshots: the system takes a snapshot of the application state at a given log sequence number and stores it on disk. You can then restart the system, or repair a failed node, by loading the latest snapshot and applying all the log entries that follow it.
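Recovery via snapshot plus log replay can be sketched as follows. This is a minimal illustration that relies on the state machine being deterministic; it is not a real Raft implementation:

```python
class Node:
    """Toy deterministic state machine with snapshot/restore."""
    def __init__(self):
        self.state = 0
        self.last_applied = -1   # index of the last applied log entry

    def apply(self, entry):
        self.state += entry      # deterministic transition
        self.last_applied += 1

    def snapshot(self):
        return {"state": self.state, "last_applied": self.last_applied}

    def restore(self, snap, log):
        """Load the latest snapshot, then replay only the entries after it."""
        self.state = snap["state"]
        self.last_applied = snap["last_applied"]
        for entry in log[self.last_applied + 1:]:
            self.apply(entry)

log = [5, 7, -2, 4, 1]

node = Node()
for e in log[:3]:
    node.apply(e)
snap = node.snapshot()        # snapshot taken at log index 2 (state = 10)

recovered = Node()
recovered.restore(snap, log)  # replays only entries 3 and 4

full = Node()
for e in log:
    full.apply(e)
assert recovered.state == full.state  # same state as a full replay
```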

Resilience is not an afterthought

Another fundamental difference from other architectures is how you handle resilience. Because the application is built on a consensus algorithm, resilience does not need to be bolted on afterwards, which eliminates a lot of complexity.

Deterministic business logic

As mentioned earlier, the business logic needs to be deterministic: applying the same sequence of messages to your business logic twice should leave the system in the same state. This means your business logic should not:

1. Use the system time. If you query the system clock, you get a different output each time. Time needs to be abstracted away from the infrastructure and become part of the Raft log: each message is timestamped by the leader and replicated to the followers' logs, and the business logic should use this logical time instead of the system time. "Injecting" time into the system also has a nice side effect: it makes time-dependent code very easy to test (you can fast-forward).

2. Use random numbers that are not carefully controlled. If the system needs to generate random numbers, you must make sure the random generators on all nodes use the same seed: for example, the current message time (not the system time!).

3. Use non-deterministic libraries. This may sound very strict, but remember that we are only talking about the business logic of the system, and in my experience plain old objects work very well.
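A minimal sketch of rules 1 and 2: the business logic reads logical time from the replicated message and seeds its random generator from that timestamp, so every node computes the same result. The message fields and class here are hypothetical:

```python
import random

class Quote:
    """Deterministic business logic: time and randomness both come from the
    replicated message, never from the local machine. (Hypothetical example.)"""
    def process(self, msg):
        # Rule 1 -- logical time: the leader stamped msg["ts"];
        # never call time.time() in the business logic.
        expiry = msg["ts"] + 30_000  # quote valid for 30s of logical time
        # Rule 2 -- randomness: seed a local generator from the message
        # timestamp, so every node draws the same "random" number.
        rng = random.Random(msg["ts"])
        jitter = rng.randint(0, 9)
        return expiry, jitter

msg = {"ts": 1_700_000_000_000, "symbol": "EUR/USD"}
# Two nodes processing the same replicated message get identical results.
assert Quote().process(msg) == Quote().process(msg)
```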

Is it really that easy?

Simple is not the same as easy. Some developers who worked at LMAX say that Martin "spoiled" them with this architecture: once they had used it, working on any other system became too painful.

Trade-offs

Note that strong consistency comes at a price: every node in the cluster participates in the consensus algorithm and processes every message the system receives. If your system does not need strong consistency, you can use a "shared-nothing" architecture in which each node handles requests independently. Also note that a consensus algorithm needs at least three nodes: at least three processes running on three different servers. Moreover, because messages must be replicated to a majority of nodes before being committed, the latency between nodes directly affects throughput; for this reason, the nodes are usually deployed in the same region.

Looking for a name

I think this architectural style applies to many kinds of trading systems (real-time workflows, RFQ engines, OMS, matching engines, credit-check systems, smart order routers, hedging engines, etc.), and beyond finance it also suits related applications. In some cases it gives better results than traditional approaches.

But of course, like any architecture, it suits some systems better than others.

When I discuss this architecture with others, we tend to call it the "LMAX" architecture, but I think it deserves its own name. I have not yet found a better name than "application-level consensus" for this architecture.

Original article: PDF

[Edited by banq on 2018-09-19 07:46]

This article was created by [Jiedao jdon]; please include a link to the original when reposting. Thanks.