How slow is random IO on your hard disk

2020-11-08 16:12:40 Zhang Yanfei Allen

We all know that random IO on hard disks is slow, but how much slower is it than sequential IO? You may never have seen a direct numerical comparison. Today I'm going to run actual stress tests to compare the performance of sequential IO and random IO on disks in different scenarios. With today's experimental data, you will gain a deeper understanding of why database transactions are implemented with logs, and why indexes use B+ trees with larger nodes.

For any storage system, performance comes down to bandwidth, latency, and IOPS. The storage on my test machine is a RAID5 array made of 7 mechanical disks, each 300G and 10,000 RPM. The stress testing tool is fio. During the tests we fix a few parameters:

  • For the IO engine we choose libaio
  • To keep the PageCache managed by the operating system from interfering with the results, we bypass it with the direct parameter
  • Enable unified_rw_reporting, which makes fio sum read and write statistics into one combined result
  • To keep the test reasonably accurate, we set the runtime to 300s
  • Because the server is sensitive, the test target is a file rather than the raw device, so there is a little file system overhead
  • The test file size is set to 100G; my RAID card cache is 1G, and the goal is to keep the cache from getting too many hits
  • For the IO scheduler we choose the commonly used noop
  • Enable refill_buffers so that the IO buffers are refilled on every submission, which keeps the data random
  • Following the recommended RAID configuration, the disk cache is turned off

Then we vary the remaining parameters and run a series of comparative tests:

  • Access pattern: sequential reads and random reads, tested separately
  • IO size: integer multiples of the sector size, i.e. 512, 1K, 2K, ...
  • RAID card read-ahead policy: NORA (read-ahead off) and RA (read-ahead on), tested independently
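
As a rough illustration, the parameters above could be written as a fio job file like the one this Python sketch generates. The option names (ioengine, direct, unified_rw_reporting, runtime, size, refill_buffers, rw, bs, filename) are standard fio options, but the file path and the helper name are hypothetical, not taken from the original test. The IO scheduler, the RAID read-ahead policy, and the disk cache are configured outside fio, so they do not appear here.

```python
def make_fio_job(rw: str, block_size: str,
                 filename: str = "/data/fio_test.dat") -> str:
    """Build the text of a fio job file for one test point.

    rw is "read" (sequential) or "randread" (random).
    block_size is an IO size such as "512", "1k" or "128k".
    The default filename is a hypothetical path used only for illustration.
    """
    if rw not in ("read", "randread"):
        raise ValueError("rw must be 'read' or 'randread'")
    options = [
        ("ioengine", "libaio"),         # asynchronous IO engine
        ("direct", "1"),                # bypass the page cache
        ("unified_rw_reporting", "1"),
        ("runtime", "300"),             # seconds
        ("size", "100g"),               # test file size
        ("refill_buffers", "1"),        # refill IO buffers on every submit
        ("filename", filename),
        ("rw", rw),
        ("bs", block_size),
    ]
    lines = [f"[{rw}-{block_size}]"]
    lines += [f"{key}={value}" for key, value in options]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # One job file per (access pattern, IO size) combination.
    for rw in ("read", "randread"):
        for bs in ("512", "1k", "2k", "4k"):
            print(make_fio_job(rw, bs))
```

Each generated text can be saved to a file and passed to fio as a job file; one run per combination keeps the test points independent.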

Sequential read test

Let's start with the sequential read case and look at the bandwidth of the disk array; see Figure 1:

 Figure 1  Bandwidth performance

As you can see, when the IO size is small, bandwidth is unimpressive even though the requests are sequential and contiguous: less than 20MB/s. As the IO size grows, bandwidth climbs, peaking at more than 1.2GB/s.

Now look at the NORA case. Going from 128K to 256K, bandwidth suddenly jumps. Why? The secret is that the strip size of my RAID array is 128K. Only when the IO size reaches 256K does the disk array really start working in parallel. With smaller IO sizes, a request cannot take advantage of multiple disks.

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL
......
Strip Size          : 128 KB
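
One way to see why 256K is the turning point is to count how many disks a single request touches. This is a minimal sketch under a simplified striping model: consecutive 128K strips land on consecutive disks, and parity placement is ignored, so the 7-disk RAID5 array is treated as 6 data disks per stripe. That model and the function name are assumptions for illustration, not details reported by the RAID card.

```python
STRIP_SIZE = 128 * 1024   # strip size reported by MegaCli, in bytes
DATA_DISKS = 6            # assumption: 7-disk RAID5 holds 6 data strips per stripe


def disks_touched(offset: int, io_size: int,
                  strip_size: int = STRIP_SIZE,
                  data_disks: int = DATA_DISKS) -> int:
    """Return how many distinct disks serve one request in the simplified model."""
    first_strip = offset // strip_size
    last_strip = (offset + io_size - 1) // strip_size
    strips = last_strip - first_strip + 1
    # A request spanning more strips than there are data disks wraps around.
    return min(strips, data_disks)


if __name__ == "__main__":
    for kb in (4, 64, 128, 256, 512, 1024):
        n = disks_touched(0, kb * 1024)
        print(f"{kb:>5}K aligned request -> {n} disk(s)")
```

In this model an aligned request of 128K or less is served by a single disk, while a 256K request is split across two, which matches the jump seen in the NORA curve.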

The other thing to notice is that under sequential IO, RA read-ahead also helps a lot: with it turned on, bandwidth already reaches 1.2GB/s at an IO size of 64K.

Next, latency; see Figure 2:

 Figure 2  Latency performance

The unit in the figure is microseconds (us). In "Let's talk about disk partitioning" I estimated disk access time theoretically. The time is spent mainly in two places:

  • Seek time: 3-15ms; this can be reduced with sensible partitioning
  • Rotational latency: about 0-6ms
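
The 0-6ms range for rotational latency follows from the spindle speed of a 10,000 RPM disk. A short worked calculation, using the seek range from the list above (the helper name is illustrative):

```python
def rotation_ms(rpm: int) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000 / rpm


if __name__ == "__main__":
    full = rotation_ms(10_000)   # 6.0 ms for one full rotation
    print(f"full rotation: {full:.1f} ms, average wait: {full / 2:.1f} ms")

    # Combine with the 3-15 ms seek range to bound one mechanical access.
    seek_min, seek_max = 3.0, 15.0
    print(f"one access: {seek_min:.0f} ms to {seek_max + full:.0f} ms")
```

The wait is anywhere between zero and one full rotation, so about half a rotation (3ms) on average.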

So why is the latency in Figure 2 so low, averaging only about 30us at an IO size of 512? Under sequential IO the RAID card cache hit rate is high, so most read requests never actually reach the mechanical parts of the disks.

Now IOPS; see Figure 3:

 Figure 3  IOPS performance

When the IO size is exactly one sector, the IOPS of the disk array is at its highest, more than 30,000 per second. As the IO size grows, IOPS gradually declines, but the throughput of the disks is actually increasing.
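
The two curves are linked by bandwidth = IOPS × IO size. A small calculation with the round numbers quoted above (about 30,000 IOPS at one 512-byte sector, and 1.2GB/s at 64K with read-ahead). The 64K IOPS value is derived from the bandwidth figure for illustration; it is not a measured data point.

```python
def bandwidth_mb_per_s(iops: float, io_size_bytes: int) -> float:
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for a given IOPS and IO size."""
    return iops * io_size_bytes / 1_000_000


def iops_at(bandwidth_mb: float, io_size_bytes: int) -> float:
    """IOPS implied by a bandwidth in MB/s at a given IO size."""
    return bandwidth_mb * 1_000_000 / io_size_bytes


if __name__ == "__main__":
    # Tiny requests: very high IOPS, yet little data moved per second.
    print(f"{bandwidth_mb_per_s(30_000, 512):.2f} MB/s at 512 B")
    # Large requests: fewer operations per second, far more data moved.
    print(f"{iops_at(1200, 64 * 1024):.0f} IOPS at 64K")
```

This is why the highest IOPS and the highest bandwidth sit at opposite ends of the IO size axis.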

Taken together, the disk array performs very well under sequential IO, for three reasons:

  • Under sequential IO, the RAID card cache has a high hit rate, especially when RAID read-ahead is turned on
  • Sequential IO is also the most comfortable working state for each individual disk, because it avoids seek latency
  • When the IO size exceeds the RAID strip size, the IO is spread across multiple disks and handled in parallel

Random read test

As developers, we can't guarantee that a disk always works in its most comfortable state; sometimes it has to be accessed randomly. So let's see how my disk array does under random access. With fio you only need to set the rw parameter to randread. This time I only tested IO sizes up to 128K, because the larger the IO size, the more it behaves like sequential IO.

Let's start with bandwidth; see Figure 4:

 Figure 4  Bandwidth performance

Even when mechanical disks are built into a RAID array with a cache, they seem helpless against random IO. Under random IO the bandwidth is terrible: at small IO sizes it is only a few tenths of a megabyte per second.

Now latency; see Figure 5:

 Figure 5  Latency performance

Under random access the latency is basically around 5ms, which is in line with the earlier theoretical estimate. Random access means far more requests actually reach the mechanical parts of the disks.

Now look at IOPS. This metric is also very poor, only about 200. The number agrees with the latency in Figure 5: each request takes about 5ms, so only about 200 requests can be handled per second. Disk vendors love to brag that their disks can reach tens of thousands of IOPS, but they never mention that under random IO the figure is only about 200.

 Figure 6  IOPS performance
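
The arithmetic behind the roughly 200 IOPS figure, assuming requests are served strictly one at a time (a simplification of what the array actually does):

```python
def iops_from_latency(latency_ms: float) -> float:
    """Requests per second when each request takes latency_ms and none overlap."""
    return 1000.0 / latency_ms


def bandwidth_kb_per_s(iops: float, io_size_bytes: int) -> float:
    """Bandwidth in KB/s (1 KB = 1000 bytes)."""
    return iops * io_size_bytes / 1000


if __name__ == "__main__":
    iops = iops_from_latency(5.0)   # 200.0 requests per second
    print(f"{iops:.0f} IOPS at 5 ms per request")
    for size in (512, 4096):
        print(f"{size} B requests: {bandwidth_kb_per_s(iops, size):.1f} KB/s")
```

At 512 bytes per request, 200 IOPS is only about 100KB/s, and even at 4K it stays below 1MB/s.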

So you see: under the best sequential conditions, my RAID5 array of 10,000 RPM mechanical disks reaches more than 1GB/s of bandwidth with very low average latency, as low as 20-odd us. But under random IO the weakness of mechanical disks is fully exposed: a few tenths of a megabyte per second of bandwidth, nearly 5ms of latency, and only about 200 IOPS. The reasons are:

  • Random access renders the RAID card cache useless
  • The disks cannot work in parallel, because the strip size of my RAID array is 128 KB and the tested IO sizes do not exceed it
  • The mechanical heads also have to jump back and forth between tracks

Compared with the tens of MB/s or even 1GB/s of bandwidth under sequential IO, random IO is really pathetic.

Conclusion

The test data above shows the huge performance gap between sequential IO and random IO on mechanical disks. Under sequential IO the disks are doing what they are best at, and the RAID card cache hit rate is high. Bandwidth reaches tens to hundreds of MB/s, and even 1GB/s under the best conditions, with IOPS around 20,000-30,000. Under random IO the heads are forced to jump around seeking, and the RAID card cache stops helping. Bandwidth drops below 1MB/s, as low as 100KB/s, and IOPS is a pathetic 200 or so.

If you really understand the data from these experiments, you can understand a lot of things in engineering practice.

Copying a folder: we all know that copying a folder that contains a large pile of files is very slow. The reason is that the mechanical disk is most likely doing random IO. How do you speed up the copy? It's simple: pack the files into an archive first. After packing, the folder becomes one big file, and copying it lets the disk do the sequential IO it is best at, so it is much faster.
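
A minimal sketch of the pack-first approach using Python's standard tarfile module. The function name and the paths in the usage example are hypothetical, and the archive is left uncompressed to keep the example simple.

```python
import tarfile
from pathlib import Path


def pack_folder(folder: str, archive: str) -> int:
    """Pack every regular file under `folder` into one uncompressed tar archive.

    The archive must be created outside `folder`.
    Returns the number of files packed.
    """
    root = Path(folder)
    count = 0
    with tarfile.open(archive, "w") as tar:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder root.
                tar.add(path, arcname=str(path.relative_to(root)))
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    n = pack_folder("/data/many_small_files", "/tmp/many_small_files.tar")
    print(f"packed {n} files into one archive")
```

The single archive can then be copied as one large file, for example with shutil.copyfile, and unpacked at the destination.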

Database transactions: when implementing transactions, every database must make sure the written data has safely landed on disk before returning. But why do almost all of them return success as soon as the data is in their own transaction log file, instead of writing directly to the table data file? The reason is disk read and write performance. A transaction only needs to guarantee that the data has landed on disk; where it is written doesn't matter. Writing to the data file would most likely be random IO, while writing to the log file is sequential IO, which gives the best performance.
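
The idea can be sketched as an append-only log: a commit returns once its record has been appended and flushed to disk, and the state can be rebuilt from the log later. This is a toy illustration; the class name, the JSON record format, and the file path are hypothetical, and it is not how any particular database implements its log.

```python
import json
import os


class TinyLog:
    """Toy append-only transaction log."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.log = open(log_path, "ab")   # append mode: writes go to the end

    def commit(self, key: str, value: str) -> None:
        record = json.dumps({"k": key, "v": value}).encode() + b"\n"
        self.log.write(record)        # appended at the tail: sequential IO
        self.log.flush()
        os.fsync(self.log.fileno())   # force to disk before reporting success

    def replay(self) -> dict:
        """Rebuild the latest value of every key by scanning the log in order."""
        state = {}
        with open(self.log_path, "rb") as f:
            for line in f:
                rec = json.loads(line)
                state[rec["k"]] = rec["v"]
        return state

    def close(self) -> None:
        self.log.close()


if __name__ == "__main__":
    wal = TinyLog("/tmp/tiny.log")    # hypothetical path
    wal.commit("user:1", "alice")
    wal.commit("user:1", "bob")
    print(wal.replay())
    wal.close()
```

Every commit touches only the end of one file, no matter which keys are updated, which is what keeps the disk in sequential mode.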

MySQL's B+ tree: the data above shows that for both sequential IO and random IO, performance goes up whenever the size of each IO is increased. Once you understand this, you can really understand why MySQL uses a B+ tree for its indexes rather than some other tree (a binary tree, for example). B+ tree nodes are larger, so each IO moves more data and the disk works more comfortably.
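
A rough calculation shows another effect of larger nodes: the tree becomes much shallower, so one lookup needs far fewer disk reads. The sketch assumes a 16KB node (InnoDB's default page size), a hypothetical 16 bytes per key-plus-pointer entry, one random read of about 5ms per level, and nothing cached. Real fan-out depends on key size and page overhead.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= num_keys."""
    levels = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    keys = 1_000_000
    node_bytes = 16 * 1024     # assumption: one node is one 16KB page
    entry_bytes = 16           # assumption: key plus pointer
    fanout = node_bytes // entry_bytes

    random_read_ms = 5         # latency measured in the random read test
    for name, f in (("binary tree", 2), ("B+ tree", fanout)):
        h = levels_needed(keys, f)
        print(f"{name}: fanout {f}, {h} levels, ~{h * random_read_ms} ms per lookup")
```

Under these assumptions a binary tree needs about 20 random reads for a million keys, while the wide-node tree needs 2.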

Finally, I would like to share a performance optimization case from my engineering work 5 years ago. We took over a system that takes the imei of millions of users and queries MySQL for another user id (clientid). The previous developers had implemented it the traditional way, with batched MySQL queries. Leaving aside the cost of many network round trips (RTT), the MySQL queries alone need a lot of random IO even with an index, because the users' imei values are randomly distributed. The optimization I used was very simple: read the entire user table out of MySQL in one pass using sequential IO and load it into memory, organize it in a HashTable in memory, and look records up by hash. In the end the running time was cut by more than 90%.
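
The shape of that optimization, sketched in Python with a dict as the hash table. The row source here is a stand-in generator with made-up values; in the real system it would be one full scan of the MySQL user table. All function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def scan_user_table() -> Iterator[Tuple[str, str]]:
    """Stand-in for one sequential full-table scan yielding (imei, clientid)."""
    for i in range(1000):
        yield f"imei-{i:06d}", f"client-{i:06d}"


def build_index(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Load every row into an in-memory hash table keyed by imei."""
    return dict(rows)


def lookup_many(index: Dict[str, str], imeis: Iterable[str]) -> List[Optional[str]]:
    """Resolve each imei in memory; unknown imeis map to None."""
    return [index.get(imei) for imei in imeis]


if __name__ == "__main__":
    index = build_index(scan_user_table())   # one pass over the table
    print(lookup_many(index, ["imei-000007", "imei-000999", "unknown"]))
```

The disk is read once in table order, and every later lookup is a memory access with no disk IO at all. The approach only works while the whole table fits in memory.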



My official account is 「Develop internal skill and practice」. Here I don't only talk about technical theory, and I don't only talk about practical experience. I combine the two: using practice to deepen the understanding of theory, and using theory to improve practical skills. You are welcome to follow my official account, and please share it with your friends ~~~

Copyright notice
This article was written by [Zhang Yanfei Allen]. Please include a link to the original when reposting. Thanks.