Use cgroups to limit the memory usage of MongoDB

2021-10-22 10:36:29 Bird's nest

cgroups, short for control groups, is a Linux kernel feature used to limit, control, and isolate the resources (such as CPU, memory, and disk I/O) of a group of processes.

The project was started in 2006 by Google engineers (mainly Paul Menage and Rohit Seth) and was originally called process containers. In 2007, because the term container has many different meanings in the Linux kernel, it was renamed cgroup to avoid confusion and was merged into kernel version 2.6.24. Many features have been added since then.

Using cgroups, system administrators gain finer-grained control over the allocation, prioritization, denial, management, and monitoring of system resources. Hardware resources can be better divided among tasks and users, which improves overall efficiency.
In practice, system administrators typically use cgroups to do the following:

  • Isolate a group of processes (for example, all nginx processes) and limit the resources they consume, such as by binding them to specific CPU cores.
  • Allocate a sufficient amount of memory to the group.
  • Set network bandwidth and disk storage limits for the group.
  • Restrict access to certain devices (by setting a device whitelist).

cgroups concepts

  1. Task. In cgroups, a task is a process in the system.
  2. Control group. A control group is a set of processes grouped by some criterion. Resource control in cgroups is applied per control group. A process can join a control group and can be migrated from one control group to another. The processes in a control group use the resources allocated to that group and are subject to the limits set on it.
  3. Hierarchy. Control groups can be organized hierarchically into a control group tree. A child control group in the tree inherits certain attributes of its parent control group.
  4. Subsystem. A subsystem is a resource controller; for example, the cpu subsystem controls the allocation of CPU time. A subsystem must be attached to a hierarchy to take effect. Once a subsystem is attached to a hierarchy, all control groups in that hierarchy are governed by it.

cgroups currently follow these rules:
1. Whenever a new hierarchy is created, all tasks in the system are initially members of that hierarchy's default cgroup. This is the root cgroup; it is created automatically with the hierarchy, and every cgroup created later in the hierarchy is a descendant of it.
2. A subsystem can be attached to at most one hierarchy. (A hierarchy never has the same subsystem attached twice.)
3. Multiple subsystems can be attached to a single hierarchy.
4. A task can be a member of more than one cgroup, but those cgroups must be in different hierarchies.
5. When a process (task) creates a child process (task), the child automatically becomes a member of its parent's cgroup. The child can later be moved to a different cgroup, but it always starts out in its parent's cgroup.
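
Rule 4 can be observed on a running system. On kernels that expose /proc/<pid>/cgroup, each line of that file has the form hierarchy-ID:controller-list:cgroup-path, with one line per hierarchy the process belongs to. The following is only a minimal sketch and not part of the original article; the function name show_cgroups is hypothetical.

```shell
# show_cgroups [FILE]
# Print "controllers -> cgroup path" for every hierarchy listed in FILE.
# FILE defaults to /proc/self/cgroup (the calling shell's own membership).
show_cgroups() {
    awk -F: '{
        # an empty controller list means no subsystem is attached
        ctrl = ($2 == "") ? "(none)" : $2
        # the path may itself contain ":", so rebuild it from field 3 onward
        path = $3
        for (i = 4; i <= NF; i++) path = path ":" $i
        printf "%s -> %s\n", ctrl, path
    }' "${1:-/proc/self/cgroup}"
}
```

Calling show_cgroups with no argument describes the current shell; passing /proc/<pid>/cgroup describes another process.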

cgroup subsystems

cgroups defines a subsystem for each type of resource that can be controlled. Typical subsystems are:

  1. cpu subsystem: limits the CPU usage of processes.
  2. cpuacct subsystem: reports the CPU usage of the processes in a cgroup.
  3. cpuset subsystem: assigns dedicated CPU nodes or memory nodes to the processes in a cgroup.
  4. memory subsystem: limits the amount of memory processes can use.
  5. blkio subsystem: limits the block device I/O of processes.
  6. devices subsystem: controls which devices processes can access.
  7. net_cls subsystem: tags the network packets of the processes in a cgroup so that the tc (traffic control) module can then act on those packets.
  8. freezer subsystem: suspends or resumes the processes in a cgroup.
  9. ns subsystem: lets processes in different cgroups use different namespaces.
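
Which of these subsystems a given kernel actually provides can be checked in /proc/cgroups, whose columns on cgroup v1 kernels are subsys_name, hierarchy, num_cgroups, and enabled. The sketch below is an illustration under that assumption; the function name enabled_subsystems is hypothetical.

```shell
# enabled_subsystems [FILE]
# Print the name of every subsystem whose "enabled" column is 1.
# FILE defaults to /proc/cgroups.
enabled_subsystems() {
    # skip the "#subsys_name ..." header line, keep rows with enabled == 1
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}
```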

Installing cgroups

If cgroups is not yet installed on the system, you can install it with the following command:

    yum install libcgroup

Start the service and check its status:

    service cgconfig start
    service cgconfig status

Linux implements cgroups as a file system. The mount point of each subsystem is configured in the /etc/cgconfig.conf file:

    mount {
        cpuset = /cgroup/cpuset;
        cpu = /cgroup/cpu;
        cpuacct = /cgroup/cpuacct;
        memory = /cgroup/memory;
        devices = /cgroup/devices;
        freezer = /cgroup/freezer;
        net_cls = /cgroup/net_cls;
        blkio = /cgroup/blkio;
    }

The mount points can also be listed with the lssubsys -m or mount -t cgroup command:

    # lssubsys -m
    cpuset /cgroup/cpuset
    cpu /cgroup/cpu
    cpuacct /cgroup/cpuacct
    memory /cgroup/memory
    devices /cgroup/devices
    freezer /cgroup/freezer
    net_cls /cgroup/net_cls
    blkio /cgroup/blkio

Several subsystems can also be mounted together on a single hierarchy:

    mount -t cgroup -o remount,cpu,cpuset,memory cpu_and_mem /cgroup/cpu_and_mem

Using cgroups

Once a cgroups subsystem has been mounted, you can create nodes in the hierarchy either by creating a directory under the mount point or by using the cgcreate command. For example, cgcreate -g cpu:test creates a node named test under the cpu subsystem. The result looks like this:

    # cgcreate -g cpu:test
    # ls /cgroup/cpu
    cgroup.event_control  cpu.cfs_quota_us   cpu.shares         release_agent
    cgroup.procs          cpu.rt_period_us   cpu.stat           tasks
    cpu.cfs_period_us     cpu.rt_runtime_us  notify_on_release  test

You can then configure the resources to be limited by writing the required values to the files under test. Each subsystem has its own set of parameters; see the cgroups manual for the details. The cgset command can also set subsystem parameters. Its format is cgset -r parameter=value path_to_cgroup.
For example, cgset -r cpu.cfs_quota_us=50000 test limits the process group test to 50% of one CPU, given the default cpu.cfs_period_us of 100000.
Or write to the file directly:

    echo 50000 > /cgroup/cpu/test/cpu.cfs_quota_us
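
The 50% figure follows from the ratio of quota to period: quota = percent × period / 100. As a small worked sketch (the helper name cpu_quota is hypothetical, and the default period of 100000 microseconds is assumed unless another value is passed):

```shell
# cpu_quota PERCENT [PERIOD_US]
# Print the cpu.cfs_quota_us value that grants PERCENT of one CPU
# for the given cpu.cfs_period_us (assumed default: 100000).
cpu_quota() {
    echo $(( $1 * ${2:-100000} / 100 ))
}

# cpu_quota 50        -> 50000  (half of one CPU)
# cpu_quota 200       -> 200000 (two full CPUs)
# cpu_quota 25 50000  -> 12500  (a quarter of one CPU with a 50 ms period)
```

The result can be passed straight to cgset, for example cgset -r cpu.cfs_quota_us=$(cpu_quota 50) test.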

For details on these commands, see the Red Hat documentation: Setting Parameters

To delete a cgroups node, use the cgdelete command. For example, the test node created above can be deleted with cgdelete -r cpu:test.

There are also several ways to add a process to a cgroups node. You can write its PID directly to the tasks file under the node. You can use cgclassify, whose format is cgclassify -g subsystems:path_to_cgroup pidlist. Or you can start a process directly in a cgroup with cgexec, whose format is cgexec -g subsystems:path_to_cgroup command arguments.
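
The first of these methods, writing a PID to the tasks file, can be sketched as a small helper. This is an illustration only: the name add_task is hypothetical, and the cgroup directory is passed as an argument so that the function can be tried against a scratch directory before it is pointed at a real path such as /cgroup/cpu/test (which requires root).

```shell
# add_task CGROUP_DIR PID
# Move process PID into the cgroup rooted at CGROUP_DIR by writing the
# PID to that cgroup's tasks file.
add_task() {
    if [ ! -d "$1" ]; then
        echo "no such cgroup directory: $1" >&2
        return 1
    fi
    # Write one PID per call. On a real cgroup file system the write adds
    # the task to the group; it does not replace the existing members.
    echo "$2" > "$1/tasks"
}
```

For example, add_task /cgroup/cpu/test $$ would move the current shell into the test group created earlier.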

Groups can also be defined in the /etc/cgconfig.conf file, in the following format:

    group <name> {
        [<permissions>]
        <controller> {
            <param name> = <param value>;
        }
    }

For example:

    mount {
        cpuset = /cgroup/cpuset;
        cpu = /cgroup/cpu;
        cpuacct = /cgroup/cpuacct;
        memory = /cgroup/memory;
        devices = /cgroup/devices;
        freezer = /cgroup/freezer;
        net_cls = /cgroup/net_cls;
        blkio = /cgroup/blkio;
    }
    group mysql_g1 {
        cpu {
            cpu.cfs_quota_us = 50000;
            cpu.cfs_period_us = 100000;
        }
        cpuset {
            cpuset.cpus = "3";
            cpuset.mems = "0";
        }
        cpuacct {
        }
        memory {
            memory.limit_in_bytes = 104857600;
            memory.swappiness = 0;
            # memory.max_usage_in_bytes=104857600;
            # memory.oom_control=0;
        }
        blkio {
            blkio.throttle.read_bps_device = "8:0 524288";
            blkio.throttle.write_bps_device = "8:0 524288";
        }
    }

A service can also be placed in a process group when it starts; see: Starting_a_Service

The Red Hat documentation describes in detail how to configure and use cgroups and is a good reference.

In practice: limiting MongoDB memory usage

MongoDB is a heavy memory consumer and uses as much of the server's memory as it can. With a large amount of data, memory is quickly exhausted, and other processes on the server can no longer allocate memory.
We can use cgroups to limit MongoDB's memory usage. In fact, Vadim Tkachenko describes his approach in reference 2.

The configuration takes a few steps:

  1. Create a control group: cgcreate -g memory:DBLimitedGroup
  2. Set the maximum available memory to 16G: echo 16G > /sys/fs/cgroup/memory/DBLimitedGroup/memory.limit_in_bytes
  3. Flush and drop cached pages: sync; echo 3 > /proc/sys/vm/drop_caches
  4. Add mongod to the control group: cgclassify -g memory:DBLimitedGroup `pidof mongod`

That essentially completes the task: this MongoDB instance can now use at most 16G of memory.
Because the process has to be added to the control group by hand again after every reboot, you can follow the Red Hat document mentioned above to have the Mongo service join the control group when it starts.
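
The limit written in step 2 can be sketched as a pair of helpers. This is a hedged illustration, not the method from the referenced post: the names to_bytes and set_memory_limit are hypothetical, the K/M/G suffixes are assumed to mean powers of 1024, and the cgroup directory is an argument so the functions can be exercised on a scratch directory without root.

```shell
# to_bytes SIZE
# Convert 64K / 100M / 16G (or a plain byte count) to bytes.
to_bytes() {
    case "$1" in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$1" ;;
    esac
}

# set_memory_limit CGROUP_DIR SIZE
# Write SIZE, converted to bytes, to CGROUP_DIR/memory.limit_in_bytes.
set_memory_limit() {
    if [ ! -d "$1" ]; then
        echo "no such cgroup directory: $1" >&2
        return 1
    fi
    to_bytes "$2" > "$1/memory.limit_in_bytes"
}
```

With these, set_memory_limit /sys/fs/cgroup/memory/DBLimitedGroup 16G writes 17179869184, and to_bytes 100M gives 104857600, the value used for memory.limit_in_bytes in the mysql_g1 example.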

In addition, the author mentions the problem of flushing the dirty cache and points to two parameters: /proc/sys/vm/dirty_background_ratio and /proc/sys/vm/dirty_ratio.
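
Before changing these two parameters it helps to record their current values. A minimal sketch, assuming the usual /proc/sys/vm location (the function name show_dirty_settings is hypothetical, and the directory can be overridden for testing):

```shell
# show_dirty_settings [DIR]
# Print the current dirty_background_ratio and dirty_ratio values.
# DIR defaults to /proc/sys/vm.
show_dirty_settings() {
    _vm_dir="${1:-/proc/sys/vm}"
    for _vm_name in dirty_background_ratio dirty_ratio; do
        printf '%s = %s\n' "$_vm_name" "$(cat "$_vm_dir/$_vm_name")"
    done
}
```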

The disk buffer parameters can be tuned as follows:
1) /proc/sys/vm/dirty_ratio
This parameter controls the size of the file system write buffer. It is expressed as a percentage of system memory and specifies how much memory the write buffer may occupy before data starts being written to disk. Increasing it devotes more system memory to the disk write buffer and can greatly improve write performance. For sustained, constant write workloads, however, it should be reduced:

    echo '1' > /proc/sys/vm/dirty_ratio

2) /proc/sys/vm/dirty_background_ratio
This parameter controls when the pdflush process starts flushing to disk. It is expressed as a percentage of system memory: when the write buffer reaches this share of system memory, pdflush starts writing data to disk. Increasing it devotes more system memory to the disk write buffer and can greatly improve write performance. For sustained, constant write workloads, however, it should be reduced:

    echo '1' > /proc/sys/vm/dirty_background_ratio

3) /proc/sys/vm/dirty_writeback_centisecs
This parameter controls how often pdflush, the kernel's dirty data flushing process, runs. The unit is 1/100 of a second. The default value is 500, that is, 5 seconds. If your system writes continuously, it is better to lower this value so that peaks in write activity are smoothed out into many smaller writes:

    echo "100" > /proc/sys/vm/dirty_writeback_centisecs

If your system sees short spikes of writes, each write is small (tens of MB at a time), and memory is plentiful, you should raise this value instead:

    echo "1000" > /proc/sys/vm/dirty_writeback_centisecs

4) /proc/sys/vm/dirty_expire_centisecs
This parameter specifies how old data in the Linux kernel write buffer must be before the pdflush process considers writing it to disk. The unit is 1/100 of a second. The default is 3000, that is, data older than 30 seconds is flushed to disk. For particularly heavy write loads it helps to lower this value somewhat, but not by too much, because lowering it too far can make I/O increase too quickly.

    echo "100" > /proc/sys/vm/dirty_expire_centisecs

If your system has plenty of memory, writes are intermittent, and each write is small (for example, tens of MB), a larger value is better.

5) /proc/sys/vm/vfs_cache_pressure
This file indicates how strongly the kernel tends to reclaim the memory used for the directory and inode caches. The default value of 100 means the kernel keeps the directory and inode caches at a reasonable percentage relative to pagecache and swapcache. Lowering the value below 100 makes the kernel tend to keep the directory and inode caches; raising it above 100 makes the kernel tend to reclaim them.

Default setting: 100

6) /proc/sys/vm/min_free_kbytes
This file specifies the minimum amount of free memory, in kilobytes, that the Linux VM is forced to keep.
Default setting: 724 (with 512M of physical memory)

7) /proc/sys/vm/nr_pdflush_threads
This file shows the number of pdflush processes currently running. Under high I/O load, the kernel automatically adds more pdflush processes.
Default setting: 2 (read-only)

8) /proc/sys/vm/overcommit_memory
This file specifies the kernel's memory allocation policy. The value can be 0, 1, or 2.
0: the kernel checks whether enough memory is available for the process. If so, the memory request is allowed; otherwise the request fails and an error is returned to the application.
1: the kernel always allows the allocation, regardless of the current memory state.
2: the kernel refuses allocations that would push the total committed memory beyond swap space plus overcommit_ratio percent of physical memory (see overcommit_ratio).

Default setting: 0

9) /proc/sys/vm/overcommit_ratio
When overcommit_memory=2, this file gives the percentage of physical memory that may be committed. The total memory the system can allocate is calculated with the following formula:
Allocatable memory = swap space + physical memory * overcommit_ratio / 100

10) /proc/sys/vm/page-cluster
This file specifies the number of pages written to the swap area in a single operation, as a power of two: 0 means 1 page, 1 means 2 pages, and 2 means 4 pages.
Default setting: 3 (2 to the power of 3, that is, 8 pages)

11) /proc/sys/vm/swappiness
This file indicates how aggressively the system swaps. The higher the number (0-100), the more likely swapping to disk becomes.
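
The overcommit_ratio formula in item 9 can be worked through with concrete numbers. The sketch below is illustrative arithmetic only; the function name commit_limit_kb and the sample sizes are assumptions, not values from the original article.

```shell
# commit_limit_kb SWAP_KB RAM_KB RATIO
# Allocatable memory = swap space + physical memory * overcommit_ratio / 100
# All sizes are in kilobytes; RATIO is the overcommit_ratio percentage.
commit_limit_kb() {
    echo $(( $1 + $2 * $3 / 100 ))
}

# 2 GiB of swap, 8 GiB of RAM, overcommit_ratio = 50:
# commit_limit_kb 2097152 8388608 50   -> 6291456 (6 GiB)
```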

Reference documents

  1. https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
  2. Using Cgroups to Limit MySQL and MongoDB memory usage
  3. Red Hat Enterprise Linux 6 Resource Management Guide
  4. Introduction to cgroups and a detailed guide to installation, configuration, and use
  5. Meituan: a brief introduction to Linux resource management with cgroups
  6. Docker basic technology: Linux CGroup
  7. Linux cluster: setting disk buffer parameters
  8. http://centaurea.io/blog?name=mongodb-memory-allocation-and-cache-management

Copyright notice
This article was written by [Bird's nest]. Please include a link to the original when reprinting. Thank you.
https://chowdera.com/2021/10/20211009000611577e.html
