当前位置:网站首页>One year summary of event traceability and cqrs implementation

One year summary of event traceability and cqrs implementation

2021-05-04 14:51:47 Jiedao jdon

Teiva Harsanyi He shared his year's experience in the key and important field of air traffic management EventSourcing Trace the source of the event and CQRS Implementation experience , This paper expounds the challenges and problems in the process of implementation .

Business environment

The background of the project is with air traffic management (ATM) This business area is about . We're an air traffic management service provider ANSP Designed a solution , Responsible for controlling specific geographic areas . The goal of the application is simple : Calculate and save flight data . The process is roughly as follows .

When the plane arrives ANSP A few hours ago when I was managing the area ,ANSP Information from the European air traffic management organization will be received from the European air traffic management agency . This information contains the planned data , For example, the type of aircraft , Place of departure , Destination , The required route, etc . Once the plane arrives ANSP( Responsible area ,ANSP Control and monitor the flight area ) Of AOR, We can receive input from a variety of sources : Tracking updates ( What is the current flight location ), Request to modify the current route , Events triggered by the trajectory prediction system , Alerts from the conflict detection system, etc .

Although we have to handle multiple concurrent requests at the same time , But in terms of throughput , It is associated with Paypal or Netflix Can't be compared with .

The application is very security oriented , Because if there's a serious failure , We may not lose money to customers , But you could lose your life . therefore , Implement a reliable , Responsible and resilient systems to ensure data consistency / Integrity is clearly the first priority .

CQRS, Event source

Both models are actually easy to understand .

1. CQRS

CQRS( Order the separation of inquiry responsibilities ) It's a separate write ( command ) And read ( Inquire about ) The way . This means that we can have a dedicated database to manage the write part . Query and read section ( Also known as views or projections ) Although it comes from the writing part , But by one or more other databases ( It depends on our use cases ) Specialized management . Most of the time , The queries in the read part are computed asynchronously , This means that neither part is strictly consistent .

CQRS One of the ideas behind it is : It must be admitted that it is almost impossible to effectively manage both read and write operations at the same time relying on a single database (banq notes : If you're not interested in this hypothesis , So maybe you don't have enough experience .). To focus on read and write operations , You can choose different software vendors , Adjust the applied database . for example ,Apache Cassandra In preservation / Writing data is effective , and Elasticsearch Perfect for searching . Use CQRS It's actually a way to take advantage of the solution rather than relying on a single database .

Besides , We may also decide to deal with different data models . It depends, of course, on demand . for example , Manage using a model in the context of the report view , In the persistence phase of the write part, another effective non normalized model is used .

About these ideas , We may decide to do something that has nothing to do with consumers ( For example, exposing specific business objects ), Or some specific business objects for consumers .

2. Event source

according to Martin Fowler The definition of event traceability :

Make sure that all changes to the state of the application are stored as a series of events

This means that we don't store the state of the object , contrary , We store all the events that affect its state ((banq notes : The state is similar to stock , Similar to account balance ; Events are like traffic , Similar to a series of transfer events leading to changes in account balance )). then , To retrieve the state of an object , We have to read different events related to this object , And apply them one by one .(banq notes : Account balance is calculated in real time , Instead of reading a database field directly )

3. CQRS + Event procurement

These two patterns are often combined , stay CQRS Applying event traceability means that every event is saved in the write part of the application , Then the read part is derived from the real-time calculation of the event series .

Sometimes , We implement CQRS It seems that there is no need to trace the source of events .

in fact , For most use cases , When we implement event traceability ,CQRS It's necessary (banq notes : Real time computation is complex for each query , Belong to o(n)), We may just O(1) The calculation degree is directly retrieved from the database status field , Instead of having to calculate every time n A different event . Of course , The exception is the use case for simple audit logs . ad locum , We don't need to manage views ( Or status ), Because we just want to retrieve a series of logs .

4. Domain-driven design

Domain-driven design (DDD) It's a way to deal with software complexity associated with domain models . It's in 2004 Year by year Eric Evans stay “ Domain-driven design : Solve the complexity in the software core ” A Book Introduction .

We won't introduce all the different concepts , But if you're not familiar with it , I would strongly recommend that you look at it . For all that , We're just going to introduce CQRS / In the event traceability application DDD Useful concepts in it .

DDD The first concept that comes with it is aggregation . Aggregation is a set of domain objects , They are considered to be the smallest unit of information about multiple data changes . Transactions within aggregations must remain atomic (banq notes : Aggregation is naturally transactional , Only transactional is aggregation ).

meanwhile , Aggregation uses invariance to enforce its own data consistency / integrity . Invariance is a simple rule , No matter how demand changes , There are always parts that stick together all the time , Or change at the same time , Or not at the same time . for example ,STAR( Standard terminal arrival route , It's basically the scheduled route before landing ) Always connected to a particular airport , This connection is constant . The invariance constraint is : If not changed STAR, You can't change the destination airport , And it's time to STAR Always valid for the airport .

Besides , There is an object that acts as the manager of the aggregation ( Process input and delegate business logic to sub objects ,facade Pattern ) It's called an aggregate root .

Aggregation is made up of a group of objects , About the objects that make up the aggregation , We need to distinguish between entities and value objects . An entity is an object with an identity , It's not defined by its properties , A person will have different ages over a period of time , But he / She's still the same person . On the other hand , Value objects are completely defined by their properties . Different cities have different addresses . Entities are mutable , And the value object is immutable . Besides , An entity can have its own life cycle . for example , A flight is ready to start first , airborne ( flight ), And then land .

In the model definition , Entities should be as simple as possible , And focus on its identity and life cycle . stay CQRS / In the context of the event procurement application , Entity is a key factor , Because most of the time changes made in an aggregation are based on its lifecycle . It is crucial to ensure that each entity implements a function method to determine whether it is equal to another entity instance . It can be done by comparing an identifier or a set of guaranteed identifiers ( Primary key ) To complete .

Now we know the concept of entity , Let's go back to invariance . To define them , We're using a kind of BDD( Behavior driven development ) Format inspired language :

Assume [ Entity ] stay [ state ]

If so [ event ] when

We should be [ Implementation rules ]

I really think it's very effective . Mainly because it's easy for business people to understand .

Last but not least ,DDD It also brings the concept of bounded context . Basically , We don't need to manage a large complex model , We can divide it with clear boundaries in different contexts . I've already mentioned this concept in my article . Why the canonical data model is anti pattern ?

When we have to design a view , We can apply the concept of bounded context . As mentioned earlier , Views can be consumer specific ( Because we need to achieve low latency or for other reasons ) Or common to multiple consumers .

In the latter case , We have to think about the exposed data model . It's a global and shared model for the whole company , Or something in a particular environment , Just like a given domain ?

If it's sharing mode , We need to remember the impact of change on consumers . This can be mitigated by applying a service layer at the top of the view , But I'm in favor of directly contextualizing the view . for example , In the case of model changes , We can make the original view expose the previous model , And create another view to show the new model .

5. Orders and events

In the event traceability Architecture , It's important to distinguish between commands and events . An order represents an intention ( use CreateCustomer The present tense represents an order ), And events represent a fact , What has happened ( use CustomerUpdated This past tense represents an event ).

As a concrete example of my project , One event might be to receive a radar track indicating the current aircraft position . It's hard for the system to reject such an event , Because it has happened ( When on earth does it happen , It may depend on various factors such as delay ).

On the other hand , The flight control command that you want to modify the flight path is the command . It's a user intent , Different from the previous facts , It has to be verified by our application to be executable ( The point is that it hasn't really been implemented yet ).

Most of the time , A command is designed as a synchronous interaction , An event is designed as an asynchronous interaction . But not in all cases .

It's also important to remember the concept of data ownership . Let's imagine two systems A and B Exchange customers between Customer Simple interaction of data . If A Produce an asynchrony CustomerUpdated news , The message was ignored B The captured , however B It's considered a client Customer Owner ( At the customer Customer The current phase of the life cycle ), therefore ,B May have the right to reject the change , because B It's the clients Customer owner . Even by A The event message sent out looks like a domain event , But in the end it just makes B It's just a command executed by the system (banq notes :CustomerUpdated yes A Output events for , But it is B Input command for UpdateCustomer,B You can refuse to execute this command , because B yes Customer The owner of the data ).

The implementation of

The design is based on Axon frame . I'm not going to talk about this framework again , Because this article is based on a technology independent design discussion . however , If you are Java Implementing applications in the environment , I strongly recommend that you look at it . in my opinion ,Axon Framework For implementation CQRS / Event Sourcing The app is great .

Let's look at internal application design :


In short , The application receives commands and publishes internal events . These events are stored in the event memory and published to the corresponding handler , The corresponding event response handler is responsible for updating the view . We may also decide to implement a service layer on top of the view ( It's called a read handler ).

Now? , Let's take a closer look at the different situations .

1. Create aggregations

The command handler receives a CreateFlight Command and check if an instance exists in the domain Repository . The domain repository manages various aggregation instances . It first checks the cache , If the object does not exist , It will check the event Repository . An event repository is a database that holds a series of events . We'll see later what I think is a good event storage database . under these circumstances , The event repository is still empty ( Because it's the creation aggregation phase , The world started empty ), So the repository doesn't return anything .

The command handler is responsible for triggering invariance ( Business ), If something goes wrong , We can synchronously return exceptions indicating business problems . otherwise , The command handler publishes one or more events to the event bus . The number of events depends on the use cases of the internal data model granularity . In our scenario , We're going to assume that a FlightCreated event .

The first component triggered by this event is the domain handler . This component is responsible for updating the domain aggregation according to the implemented logic . Generally speaking , Logical delegation to aggregate root ( As a facade facade, But it may also delegate underlying logic to child domain objects ). please remember , Aggregation must always be consistent , Data integrity must also be enforced by verifying invariance (banq notes : Transaction mechanism is here guaranteed by business transaction ).

If the handler succeeds ( No business error was raised ), The event is saved in the event Repository , And update the cache with the latest aggregation instance .

then , The view handler is triggered to update its corresponding view . It's like in a normal release - It's the same in subscription mode , A view can only subscribe to the events it is interested in . Maybe in our case , View 2 It's the only way to FlightCreated Events of interest .

2. Summary update

The second case is an update to an existing aggregation . On receiving UpdateFlight On command , The command handler will ask the repository to return the latest aggregate instance ( If any ).

If the instance is cached , There is no need to interact with the event store . otherwise , The repository will trigger the so-called rehydration process ( Add the aggregation to the cache ).

This procedure is a way to calculate the current state of an aggregate instance based on the stored sequence of events . Every event retrieved in the event store ( such as FlightCreated,DepartureUpdated and ArrivalUpdated) Will be published in the event bus . from FlightCreated The first domain handler triggered instantiates a new aggregation ( Create a new object instance in memory based on the information from the event itself ). Then other domain handlers ( from DepartureUpdated and ArrivalUpdated Events trigger ) The newly created aggregation instance will be updated . Last , We can calculate state based on stored events .

Once the state is calculated , The object instance is put into the cache and returned to the command handler . then , The rest of the process is the same as in the collection creation scenario .

One more thing to add about rehydration . If the aggregation is not cached , And we store... For a specific aggregation instance 1000 What happens to an event ? obviously , Computing state takes a long time . Snapshot mode is a mitigation measure .

We can decide that will be based on every n The current aggregation state of events is calculated and saved as a snapshot . The snapshot will also contain the location in the event store . then , The rehydration process will simply start with the latest snapshot and will continue to play from the designated location . You can also create snapshots based on other policy types ( If the rehydration time exceeds a certain threshold, etc ).

3. How to deal with Events ?

I want to review the difference between command and event . First , It's worth distinguishing between internal and external events . External events are generated by another application , And internal events are generated by our application ( Based on external commands ).

We had an interesting debate about how to technically manage external events that reach applications . I mean a real event , It's what happened in the past ( Like radar tracks ).

There are really two possible ways :

The first way is to think of events as commands . That means we have to go through the command handler first , Verify invariants , And then generate internal events .

The second way is to bypass the command handler and store the event directly in the event store . After all , If we're talking about a real event , Confirming invariants is actually useless . however , It's still important to check the event syntax to make sure we don't contaminate the event store .

If we take the second approach , It can be interesting to implement the rules in the overall rehydration process .

Let's take an example of radar tracking a route . In the case that the producer cannot guarantee the order of messages , We can also persist a timestamp ( Generated by producers ) And calculate the state in this way :

if event.date > latestEventDate {
  // Compute the state
  latestEventDate = event.date
} else {
  // Discard the event
}
<p>

This rule will ensure that the state is calculated only based on recent events . This means that persisting an event does not necessarily mean that it will affect the current state .

In the first way , This rule will be executed before the event is persisted .

4. Event model

Is it necessary to create an object model specifically for the events stored in the event store ( An event corresponds to a class , Corresponding to a data table )? in my opinion , The answer is No ( At least most of the time ).

First , Because we may want to persist different versions of events at different times . under these circumstances , We have to implement the strategy , Mapping events from one model version to another .

I want to use a specific example to illustrate the benefits . Let's consider an application that receives data from the system A And system B Events . Both systems publish flight events based on their own data models . If we create a common data model C( Data table of events ), We need to be able to A Convert to C, take B Convert to C. However , At some stage of the project , We are only right A and B I'm interested in . It means C It's just A and B Subset .

But if we need to improve and manage the application in the future A and B Other elements in ? Because these events are using C Format saved , So these elements are lost directly . On the other hand , If we decide to last A and B Format , We can simply make some improvements to the command handler to manage these elements .

5. Final consistency

Theoretical definition :

The ultimate consistency is CQRS It brings a concept ( Most of the time ). It's important to understand the consequences and the impact .

First , We need to talk about the different levels of consistency .

The ultimate consistency is that we can ensure that the data will be replicated ( From writing to CQRS The read part of the application ) Model of . The problem is that we can't guarantee time exactly . This time will be affected by factors such as overall throughput , Network delay and other factors . This is the weakest form of consistency , But it provides the lowest latency .

stay CQRS The final consistency of the application means that at some point the write part may not be synchronized with the read part .

contrary , We found a strong consistency model . Unless we use the same database to manage read and write , Or we sell our souls to the devil by using two-phase commit (banq notes : Succumb to 2PC And other existing solutions ), Otherwise, we should not implement this level of consistency in Distributed Systems .

If we had two different databases , Nearest 2PC The implementation of a transaction is to manage everything in a single thread ( Data modification ). The thread is responsible for saving the data in the write database and read database . A thread can also be dedicated to an aggregation instance , And manage incoming commands in order . however , If a transient error occurs while synchronizing the view , What will be the impact ? Whether we need to compensate or go back CQRS Other views and write parts of the application ? Do we still need to implement an error retry loop ? Whether we need to implement the pause command through the circuit breaker mode , Let the handler stop the new incoming event ? It's very important to deal with transient errors that occur ( Anything that can go wrong will go wrong ).

In two consistency models ( Ultimately and strongly consistent ) Between , We can find many different models : Causal consistency , Sequential consistency, etc . for example , The client is monotonous monotonic The consistency model only guarantees that each session ( Application or service instance ) With strong consistency . therefore , The implementation of CQRS Applications are not just the two extreme choices between ultimate and strong consistency , There's a grey choice in the middle .

My opinion is as follows : Because we can't guarantee strong consistency , We accept the ultimate consistency as much as possible . however , The premise is to accurately understand the impact on other parts of the system .

Case study

Let's take a look at a specific example I came across in the project .

One of the challenges is managing a unique identifier for each flight . We have to deal with external systems ( Outside the company ) Events , Our unique identifier is not used in these events . For a runway , An identifier is a combination ( By leaving the airport + Departure time + Aircraft identifier + Arrive at the airport ), The other runway sends a unique identifier for each flight ( But I don't know which first runway ). The goal is to manage our own unique identifiers ( Called a globally unique identifier GUFI), And make sure that every event corresponds to the right GUFI.

The simplest solution is to make sure that each incoming event is looked up in a specific view of our application to correlate the corresponding GUFI. But what if this view is ultimately consistent ? In the worst case , We may encounter events related to the same flight , But different GUFI Identifier storage ( Believe me, it's a problem ).

One solution might be to GUFI Delegate management of to another service , It's very consistent .

Greg Young stay Q/A Another solution was offered during the meeting . We can implement a N A cache of last Events . If the view doesn't contain the data we're looking for , We have to check this cache , To make sure it hasn't been received before .n The bigger it is , The greater the chance of alleviating this inconsistency window between the write and read sites .

This cache can use Hazelcast,Redis And so on , It can also be a local buffer for an application instance . In the latter case , We may have to implement a fragmentation mechanism , So that events related to the same object are always distributed to the same application instance , For example, using hash functions ( It's better to have consistent hash functions to easily scale out ).

[ The quilt admin On 2018-06-12 17:52 A modified ]

版权声明
本文为[Jiedao jdon]所创,转载请带上原文链接,感谢
https://chowdera.com/2021/05/20210504144409614p.html

随机推荐