
An In-Depth Look at the New Features in MongoDB 4.4

2020-12-07 19:20:01 Aliyun yunqi

MongoDB released a new major version, 4.4, this year. It contains many enhancement features and can fairly be called a maintenance release — and one that users have long been waiting for. MongoDB officially describes the release as "User-Driven Engineering": it focuses on improving the pain points that users have voiced most.


As MongoDB's official global strategic partner, Alibaba Cloud will exclusively launch the new 4.4 version online. Below, the Alibaba Cloud MongoDB team gives an in-depth interpretation of the features users care about most.

Increased availability and fault tolerance

Mirrored Reads

While serving Alibaba Cloud MongoDB customers, we have observed that many customers buy a three-node replica set but in practice send all reads and writes to the Primary node, so one of the visible Secondaries carries no read traffic at all.

As a result, after an occasional failover, customers clearly feel jitter in service access latency, and it takes a while to return to the previous level. The reason for the jitter is that the newly elected primary has never served reads before: it knows nothing about the access pattern of the business and has not cached the relevant data, so once it suddenly starts serving there are a large number of cache misses, data has to be reloaded from disk, and access latency rises. On instances with large memory, the problem is even more pronounced.

In 4.4, MongoDB addresses this problem with the "Mirrored Reads" feature: the primary replicates a configurable fraction of its read traffic to the secondaries to help them warm up their caches. Mirroring is a "fire and forget" operation, so it has no substantial impact on the primary's performance, although the load on the secondaries will increase to some extent.

The fraction of traffic that is mirrored is dynamically configurable via the mirrorReads parameter; by default, 1% of reads are mirrored.

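As a sketch, the mirroring rate can be changed at runtime with setParameter; the samplingRate sub-field is the knob documented for 4.4:

```javascript
// Run against the primary of a 4.4+ replica set.
// Raise the fraction of mirrored reads from the default 1% to 10%.
db.adminCommand({ setParameter: 1, mirrorReads: { samplingRate: 0.10 } })

// Read back the current setting.
db.adminCommand({ getParameter: 1, mirrorReads: 1 })
```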

In addition, you can run db.serverStatus( { mirroredReads: 1 } ) to check statistics related to Mirrored Reads.
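A sketch of reading those counters in the shell; the 4.4 documentation describes fields such as seen (reads eligible for mirroring) and sent (mirrored reads actually sent), though the exact shape is version-dependent:

```javascript
// mirroredReads is not returned by default; it must be requested explicitly.
const stats = db.serverStatus({ mirroredReads: 1 }).mirroredReads
printjson(stats)
```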

Resumable Initial Sync

In versions before 4.4, if a secondary was performing an initial (full) sync and network jitter caused the connection to drop, the secondary had to start the initial sync all over again — all previous work was wasted. When the data volume is large, say at the TB level, this is quite painful.

In 4.4, MongoDB adds the ability to resume an initial sync from the point of interruption when it is broken by a transient network error. If the sync still cannot be resumed after retrying for a while, a new sync source is chosen and a fresh initial sync is started. The default timeout for these retries is 24 hours, and it can be changed at process startup via replication.initialSyncTransientErrorRetryPeriodSeconds.
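As a sketch, assuming the bare parameter name matches its setParameter form (the value is startup-only, so it is passed when mongod starts; 7200 seconds here is an arbitrary illustrative value):

```javascript
// Set at mongod startup, e.g.:
//   mongod --setParameter initialSyncTransientErrorRetryPeriodSeconds=7200 ...
// From the shell you can at least inspect the current value:
db.adminCommand({ getParameter: 1, initialSyncTransientErrorRetryPeriodSeconds: 1 })
```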

Note that if the initial sync is interrupted by a non-network error, it still has to be restarted from scratch.

Time-Based Oplog Retention

As we know, the Oplog collection in MongoDB records all data-change operations. Besides replication, it is also used for incremental backup, data migration, data subscription and other scenarios — it is an important piece of infrastructure in the MongoDB data ecosystem.

The Oplog is implemented as a Capped Collection. Although MongoDB has supported dynamically resizing the Oplog via the replSetResizeOplog command since 3.6, a size alone often cannot accurately reflect how much incremental Oplog data downstream consumers need. Consider the following scenarios:

• Maintenance is planned for some Secondary nodes between 2 and 4 a.m.; the upstream Oplog being truncated in that window, which would trigger a full resync, must be avoided.

• A downstream data-subscription component may stop due to some exception, but will resume within at most 3 hours and continue its incremental pull; the upstream must not lose that increment either.

So in real application scenarios, what is usually needed is to keep the Oplog for the most recent period of time; how many gigabytes of Oplog that corresponds to is often hard to determine.

In 4.4, MongoDB supports the storage.oplogMinRetentionHours parameter to define the minimum duration for which Oplog entries are retained; the value can also be changed online via the replSetResizeOplog command.
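A sketch of changing the retention online (the size and retention values here are illustrative):

```javascript
// Run against a replica set member in the shell:
// keep at least 48 hours of oplog, with a 16000 MB size cap.
db.adminCommand({ replSetResizeOplog: 1, size: 16000, minRetentionHours: 48 })
```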

Scalability and performance enhancements

Hidden Indexes

Hidden Indexes is a feature jointly built by Alibaba Cloud MongoDB and MongoDB Inc. after their strategic partnership was formed. We all know that maintaining too many indexes in a database degrades write performance, but business complexity often means that the engineers operating MongoDB dare not casually drop a potentially inefficient index, worrying that a wrong deletion will cause service jitter — and the cost of rebuilding an index is very high.

Hidden Indexes solves exactly this DBA dilemma: it supports hiding an existing index via the collMod command, guaranteeing that subsequent queries will not use it. After a period of observation, once it is confirmed that the business shows no anomalies, the index can be dropped with confidence.
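A sketch of the workflow (the collection "orders" and the index name "idx_status" are hypothetical, for illustration):

```javascript
// Hide an existing index so the query planner stops considering it.
db.runCommand({ collMod: "orders", index: { name: "idx_status", hidden: true } })

// An index can also be created hidden from the start:
db.orders.createIndex({ status: 1 }, { hidden: true })

// If something regressed, unhide it to restore it instantly:
db.runCommand({ collMod: "orders", index: { name: "idx_status", hidden: false } })
```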

Note that a hidden index is only invisible to the MongoDB query planner; it does not change any of the index's other special behavior, such as unique constraints, TTL expiration, and so on.

A hidden index is also still updated on new writes, which is why it can be unhidden and become usable again immediately — very convenient.

Refinable Shard Keys

When using a MongoDB sharded cluster, everyone knows how important it is to choose a good shard key, because it determines whether the cluster scales well under a given workload. But in practice, even if we think carefully about which shard key to choose, workload changes can still produce jumbo chunks, or situations where business traffic skews to a single shard.

In 4.0 and earlier, once set, a shard key and its value could not be changed. In 4.2, the value of the shard key became modifiable, but the implementation — cross-shard data migration based on distributed transactions — carries a high performance cost, and it does not fully solve the jumbo-chunk or hot-access problems. For example, suppose there is an orders collection with shard key {customer_id: 1}. Early on, each customer has few orders and this shard key meets the need; but as the business grows, a large customer accumulates more and more orders, and access to that customer's orders becomes a single-shard hotspot. Since orders are naturally tied to customer_id, modifying customer_id does not make access any more even.

For scenarios like the one above, 4.4 lets you use the refineCollectionShardKey command to add one or more suffix fields to an existing shard key, improving how existing documents are distributed across chunks. In the order scenario described above, refineCollectionShardKey can change the shard key to {customer_id: 1, order_id: 1}, avoiding the single-shard access hotspot.

What we need to know is that refineCollectionShardKey is very cheap: it only modifies metadata on the Config Server and requires no data migration of any kind (simply adding a suffix does not change how data is distributed across existing chunks); data is redistributed gradually afterwards through normal chunk splitting and migration. Also, a shard key must be backed by a corresponding index, so refineCollectionShardKey requires that an index matching the new shard key be created first.
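A sketch of the two steps, run against mongos (the namespace "shop.orders" and its fields are illustrative):

```javascript
// 1. The new shard key needs a supporting index, created first:
db.orders.createIndex({ customer_id: 1, order_id: 1 })

// 2. Refine the shard key from {customer_id: 1} by suffixing order_id:
db.adminCommand({
  refineCollectionShardKey: "shop.orders",
  key: { customer_id: 1, order_id: 1 }
})
```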

Because not every existing document has the new suffix field(s), 4.4 in effect implicitly supports a "missing shard key": newly inserted documents may also omit the shard key fields. This is not recommended, however, as it easily produces jumbo chunks.

Compound Hashed Shard Keys

In versions before 4.4, only single-field hashed shard keys could be specified, because MongoDB did not yet support compound hashed indexes. The consequence was that data could easily end up unevenly distributed across shards.

4.4 adds support for compound hashed indexes: a compound index may contain a single hashed field, with no restriction on its position — it can be a prefix or a suffix — and this in turn enables compound hashed shard keys.

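A sketch of both positions (collection and field names are illustrative):

```javascript
// The single hashed field may appear anywhere in a compound index:
db.orders.createIndex({ customer_id: "hashed", order_id: 1 })   // hashed prefix
db.orders.createIndex({ customer_id: 1, order_id: "hashed" })   // hashed suffix
```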

This new capability brings real benefits, for example in the following two scenarios:

• For regulatory reasons, MongoDB's zone sharding feature must be used, while still spreading data as evenly as possible across the shards of a zone.

• The value of the collection's shard key is monotonically increasing. For example, with the shard key {customer_id: 1, order_id: 1} from the earlier example, if customer_id keeps increasing and the business always accesses the newest customers' data, most traffic ends up hitting a single shard.

Without compound hashed shard keys, the only workaround was for the application to compute the hash itself for the relevant field, store it in a dedicated field of the document, and then use range sharding on that pre-hashed field (combined with other fields) as the shard key.

In 4.4, these problems are easily solved by marking the relevant field as hashed. For the second scenario above, setting the shard key to {customer_id: 'hashed', order_id: 1} suffices, greatly simplifying the business logic.
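A sketch of sharding a collection on a compound hashed shard key (namespace and fields illustrative):

```javascript
db.adminCommand({
  shardCollection: "shop.orders",
  key: { customer_id: "hashed", order_id: 1 }
})
```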

Hedged Reads

Increased access latency can translate directly into economic loss: a Google research report shows that if a web page takes more than 3 seconds to load, the user bounce rate rises by 50%. So in 4.4 MongoDB provides the Hedged Reads feature: in a sharded cluster, mongos sends a read request to two replica-set members of a shard at the same time and replies to the client with whichever result returns first, reducing the business's P95 and P99 latency.

Hedged Reads is provided as part of Read Preference, so it can be controlled at per-operation granularity. When the read preference is nearest, hedged reads are enabled by default; when it is primary, hedged reads are not supported; for the other modes, hedgeOptions must be specified explicitly.
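A sketch of explicitly enabling a hedged read on a secondaryPreferred query, assuming the three-argument cursor.readPref(mode, tagSet, hedgeOptions) form from the 4.4 shell docs (collection and filter are illustrative):

```javascript
db.orders.find({ customer_id: 12345 }).readPref(
  "secondaryPreferred",
  null,               // no tag sets
  { enabled: true }   // hedge option
)
```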

In addition, Hedged Reads requires support to be enabled on mongos, configured via the readHedgingMode parameter; by default mongos enables this support ("on").

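A sketch of toggling the mode at runtime, run against mongos:

```javascript
db.adminCommand({ setParameter: 1, readHedgingMode: "off" })  // disable
db.adminCommand({ setParameter: 1, readHedgingMode: "on" })   // re-enable (default)
```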

Reduced replication latency

Replication lag between primary and secondaries has a very large impact on MongoDB reads and writes. On the one hand, in some specific situations reads and writes must wait — the secondaries need to replicate and apply the primary's incremental updates before the operation can proceed; on the other hand, lower replication latency also gives a better consistency experience when reading from secondaries.

Streaming Replication

In versions before 4.4, secondaries obtained incremental updates by continuously polling the primary. On each poll, the secondary sent a getMore command to read the primary's Oplog collection: if there was data, a batch of at most 16 MB was returned; if there was none, the awaitData option kept the secondary from incurring unnecessary getMore overhead while still fetching the corresponding Oplog entries as soon as new incremental updates appeared.

Fetching is performed by a single OplogFetcher thread, and each batch costs a full round trip (RTT), so when the network between replica-set members is in poor condition, replication performance is severely limited by network latency. In 4.4, incremental Oplog entries are continuously "streamed" to the secondaries instead of being actively polled; compared with the old approach, this saves at least half an RTT per Oplog fetch.

When a user's write operation specifies the "majority" writeConcern, it must wait for enough secondaries to acknowledge successful replication. An internal MongoDB test shows that under the new replication mechanism, in high-latency network environments, "majority" write performance improves by 50% on average.

Another scenario is Causal Consistency: guaranteeing that a client can read its own writes on a secondary (Read Your Writes) also depends strongly on the secondaries replicating the primary's Oplog promptly.

Simultaneous Indexing

In versions before 4.4, index creation had to complete on the primary before being replicated to the secondaries for execution. On the secondary, depending on the version and on the creation mechanism and mode (foreground vs. background), the impact on Oplog application varied a great deal.

However, even in 4.2 — which unified the foreground and background index-build mechanisms with a fairly fine-grained locking scheme, taking an exclusive lock on the collection only at the start and end of the build — the performance cost of the build itself (CPU, I/O) could still cause replication lag; or some special operation, such as a collMod command modifying collection metadata, could block Oplog application, even to the point that the Oplog entries the secondary still needed were overwritten on the primary and the secondary fell into the Recovering state.

In 4.4, index builds run on the primary and the secondaries simultaneously, which greatly reduces the primary-secondary lag caused by the situations above and ensures, as far as possible, that secondaries can serve the latest data even while an index is being built.

In addition, under the new mechanism an index only takes effect after a majority of data-bearing voting members have built it successfully. This also reduces, in read-write-splitting scenarios, the performance differences caused by index availability diverging between nodes.

Enhanced query capability and ease of use

Traditional relational databases (RDBMS) generally use SQL as their interface, and clients can locally compose complex SQL statements embedding business logic to achieve powerful query capabilities. MongoDB, as a new kind of document database system, has its own MQL language, and its complex-query capability rests mainly on the Aggregation Pipeline. Although still weaker than an RDBMS, it has been polished continuously over recent major versions, the ultimate goal being to let users enjoy MongoDB's flexibility and scalability while also enjoying rich functionality.

Union

On multi-collection queries, before 4.4 only the $lookup stage was provided, implementing something like SQL's "left outer join". 4.4 adds the $unionWith stage, which provides something like SQL's "union all": users can merge data from two collections into one result set and then apply the specified queries and filters. Unlike the $lookup stage, the $unionWith stage supports sharded collections. When an aggregation pipeline contains several $unionWith stages, data from multiple collections can be merged.
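A sketch of the two basic shapes of the stage (collection names are illustrative):

```javascript
// Plain union with another collection:
db.coll1.aggregate([
  { $unionWith: "coll2" }
])

// Union with a pipeline that pre-filters the other collection first:
db.coll1.aggregate([
  { $unionWith: { coll: "coll2", pipeline: [
      { $match: { status: "paid" } }
  ] } }
])
```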

The pipeline parameter lets you specify additional stages to run on the unioned collection's data before merging — doing some filtering first, for example — which makes usage very flexible. Here is a simple example: suppose the business's order data is split into separate collections by month, and the second quarter contains the following data (constructed for demonstration purposes).

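The original sample data was shown as an image; here is an invented reconstruction in the same spirit (the collection names orders_april / orders_may / orders_june and all documents are made up for the demo):

```javascript
db.orders_april.insertMany([
  { item: "pen",  quantity: 100 },
  { item: "book", quantity: 30 }
])
db.orders_may.insertMany([
  { item: "pen",  quantity: 50 }
])
db.orders_june.insertMany([
  { item: "book", quantity: 80 },
  { item: "pen",  quantity: 20 }
])
```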

Now suppose the business needs to know the sales of each product in the second quarter. Before 4.4, the application might have had to read all the data itself and aggregate it at the application level, or rely on a data-warehouse product for the analysis — which in turn requires some kind of data-synchronization mechanism.

In 4.4, a single aggregate statement is enough to solve the problem.
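A sketch of such a statement, assuming hypothetical per-month collections orders_april / orders_may / orders_june with item and quantity fields:

```javascript
// Sum per-item sales across the three Q2 collections.
db.orders_april.aggregate([
  { $unionWith: "orders_may" },
  { $unionWith: "orders_june" },
  { $group: { _id: "$item", total: { $sum: "$quantity" } } },
  { $sort: { total: -1 } }
])
```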

Custom Aggregation Expressions

In versions before 4.4, you could run custom JavaScript on the server via the $where operator in the find command, or via MapReduce functions, to gain more complex query capabilities — but neither of these was unified with the Aggregation Pipeline in usage.

So in 4.4, MongoDB adds two new Aggregation Pipeline operators, $accumulator and $function, to replace the $where operator and MapReduce. Implemented with "server-side JavaScript", they bring custom aggregation expressions into the Aggregation Pipeline, concentrating the complex-query interface there. This improves interface unity and user experience, and also lets the pipeline's own execution model be reused — a "1 + 1 > 2" effect.

$accumulator is functionally similar to MapReduce. An init function first defines an initial state; then, for each input document, the state is updated according to the specified accumulate function; next, an optional merge function combines partial results — for example, when $accumulator runs on different shards, the finished partial states must be merged; finally, if a finalize function is specified, it converts the state into the final output once all input documents have been processed.
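A sketch computing the average order value per customer with $accumulator (the collection and field names are invented; server-side JavaScript must be enabled):

```javascript
db.orders.aggregate([
  { $group: {
      _id: "$customer_id",
      avgValue: {
        $accumulator: {
          init: function () { return { sum: 0, n: 0 } },          // initial state
          accumulate: function (state, value) {                    // per document
            return { sum: state.sum + value, n: state.n + 1 }
          },
          accumulateArgs: ["$value"],
          merge: function (a, b) {                                 // combine partials
            return { sum: a.sum + b.sum, n: a.n + b.n }
          },
          finalize: function (state) {                             // final output
            return state.n ? state.sum / state.n : 0
          },
          lang: "js"
        }
      }
  } }
])
```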

$function is basically equivalent to the $where operator in functionality, but it is more powerful in that it can be combined with other Aggregation Pipeline operators; it can also be used inside find with the help of the $expr operator, which is equivalent to the old $where — and the official MongoDB documentation recommends preferring $function.
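A sketch of $function computing a derived field (collection and fields invented; server-side JavaScript must be enabled):

```javascript
db.orders.aggregate([
  { $addFields: {
      label: {
        $function: {
          body: function (item, qty) { return item + " x" + qty },
          args: ["$item", "$quantity"],
          lang: "js"
        }
      }
  } }
])
```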

Other ease of use enhancements

Some Other New Aggregation Operators and Expressions

Besides the $accumulator and $function operators above, 4.4 adds many other new Aggregation Pipeline operators — for string processing ($replaceOne, $replaceAll), for getting the first or last element of an array ($first, $last), and for getting the size of a document or binary string ($bsonSize, $binarySize), among others; see the official release notes for the full list.


Connection Monitoring and Pooling

The 4.4 drivers add behavior monitoring and custom configuration for the client connection pool. Through a standard API you can subscribe to pool-related events, including connection open and close, and connection-pool clearing. The API also lets you configure pool behavior, such as the maximum/minimum number of connections, the maximum idle time per connection, and the timeout for a thread waiting for an available connection. For details, see MongoDB's official design documents.

Global Read and Write Concerns

In versions before 4.4, an operation that did not explicitly specify a readConcern or writeConcern still got default behavior — readConcern defaulted to local and writeConcern to {w: 1} — but these defaults could not be changed. If a user wanted every insert to default to writeConcern {w: majority}, every client accessing MongoDB had to specify that value itself.

In 4.4, the setDefaultRWConcern command configures cluster-wide default readConcern and writeConcern values.
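A sketch of setting cluster-wide defaults as an administrator:

```javascript
db.adminCommand({
  setDefaultRWConcern: 1,
  defaultReadConcern: { level: "majority" },
  defaultWriteConcern: { w: "majority" }
})
```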

The current defaults can be retrieved with the getDefaultRWConcern command.
In addition, MongoDB adds a thoughtful touch in this release: when recording slow-query or diagnostic logs, the provenance of the readConcern or writeConcern in effect for the operation is now recorded. The two concerns share three provenance values: clientSupplied (explicitly specified by the client), customDefault (set by an administrator via setDefaultRWConcern), and implicitDefault (the server's built-in default).


For writeConcern there is one additional provenance value: getLastErrorDefaults, taken from the replica-set configuration's settings.getLastErrorDefaults field.

New MongoDB Shell (beta)

For engineers who operate MongoDB, the most-used tool is probably the mongo shell. 4.4 introduces a new version of the shell, adding syntax highlighting, command auto-completion, more readable error messages, and other very user-friendly features. It is still a beta version, however; many commands are not yet supported, so it is for early tasting only.


other

As mentioned at the beginning, this 4.4 release is mainly a maintenance version, so besides the interpretations above there are many other small optimizations — $indexStats improvements, TCP Fast Open support to optimize connection establishment, index-drop optimizations, and so on — as well as some fairly large enhancements, such as the new structured logging (LogV2) and new security-mechanism support. Since these may not be most users' top priority, they are not all described here; interested readers can consult the official Release Notes.

Link to the original text
This article is original content from Alibaba Cloud and may not be reproduced without permission.
