Random IO on a mechanical hard disk is slower than you think
2020-11-08 16:12:40 【Zhang Yanfei Allen】
We all know that random IO on a hard disk is very slow. But how much slower is it than sequential IO? I wonder whether you have ever seen the two compared directly with numbers. Today I'm going to run real stress tests that compare the performance of sequential IO and random IO on disk across different scenarios. With these experimental numbers, you will understand much better why database transactions are implemented with logs, and why indexes use B+ trees with larger nodes.
For any storage system, performance comes down to three metrics: bandwidth, latency, and IOPS. My test machine's storage is a RAID5 array made of seven 300G 10,000 RPM mechanical disks. The benchmarking tool is fio. We fix the following parameters in every test:
- IO engine: libaio
- To keep the operating system's PageCache from skewing the results, use the direct parameter to bypass it
- Enable unified_rw_reporting, so that read and write statistics are summed and reported together (as "mixed")
- To keep the results reasonably accurate, set the runtime to 300s
- The server is sensitive, so we don't benchmark the raw block device. We test against a file instead, which adds a little file-system overhead
- The test file size is 100G. My RAID card cache is 1G, so the goal is to keep most requests from hitting the cache
- For the IO scheduler, we choose the commonly used noop
- Enable refill_buffers, so that fio refills the IO buffers on every IO submission, guaranteeing randomness
- Following the usual recommendation for RAID configurations, turn off the disks' own cache
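The fixed settings above map onto fio command-line options. The sketch below assembles one run's command line under stated assumptions. The file path `/data/fio-test.img` is a hypothetical placeholder, and `--time_based` is my addition; the article doesn't mention it. The noop scheduler is set outside fio (for example through `/sys/block/<dev>/queue/scheduler`), and so is the disk cache (through the RAID tool), so neither appears here.

```python
from __future__ import annotations

import shlex

# Options held fixed in every run, mirroring the list above.
# The default filename below is a hypothetical placeholder.
FIXED_OPTS = {
    "ioengine": "libaio",         # Linux native asynchronous IO
    "direct": "1",                # O_DIRECT: bypass the page cache
    "unified_rw_reporting": "1",  # sum read/write stats into one "mixed" report
    "runtime": "300",             # seconds per run
    "size": "100G",               # far larger than the 1G RAID card cache
}


def fio_argv(rw: str, bs: str, filename: str = "/data/fio-test.img") -> list[str]:
    """Build the fio command line for one run; rw is 'read' or 'randread'."""
    argv = ["fio", f"--name={rw}-{bs}", f"--filename={filename}"]
    argv += [f"--{key}={value}" for key, value in FIXED_OPTS.items()]
    # time_based (assumed addition) keeps the job running for the whole runtime;
    # refill_buffers makes fio refill the IO buffers on every submit.
    argv += ["--time_based", "--refill_buffers", f"--rw={rw}", f"--bs={bs}"]
    return argv


if __name__ == "__main__":
    print(shlex.join(fio_argv("read", "4k")))
```

Building an argument list rather than one shell string keeps the command safe to pass to `subprocess.run` without shell quoting issues.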
Then we vary the remaining parameters and run a series of comparison tests:
- Read/write mode: test sequential reads and random reads separately
- IO size: multiples of the sector size, 512, 1K, 2K ...
- RAID card read-ahead policy: test NORA (read-ahead off) and RA (read-ahead on) separately
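The variable parameters form a small test matrix. The sketch below enumerates it, assuming the IO size doubles from one 512-byte sector up to an upper bound of your choice. The article doesn't state the largest sequential size it tested, so `max_bytes` is an assumption. NORA/RA are only labels here, because the read-ahead policy is switched on the RAID card between runs, not by fio.

```python
from __future__ import annotations

from itertools import product

SECTOR = 512  # bytes


def block_sizes(max_bytes: int) -> list[int]:
    """Block sizes doubling from one sector (512 B) up to max_bytes inclusive."""
    sizes, bs = [], SECTOR
    while bs <= max_bytes:
        sizes.append(bs)
        bs *= 2
    return sizes


def fio_size(nbytes: int) -> str:
    """Spell a byte count with fio's k/m suffixes (base 1024 by default)."""
    for suffix, factor in (("m", 1 << 20), ("k", 1 << 10)):
        if nbytes >= factor and nbytes % factor == 0:
            return f"{nbytes // factor}{suffix}"
    return str(nbytes)


def run_matrix(max_bytes: int) -> list[tuple[str, str, str]]:
    """Every (fio rw mode, block size, RAID read-ahead policy) combination."""
    sizes = [fio_size(b) for b in block_sizes(max_bytes)]
    return list(product(("read", "randread"), sizes, ("NORA", "RA")))
```

For example, `run_matrix(1 << 20)` yields 2 modes × 12 sizes (512 through 1m) × 2 policies = 48 runs. At 300s each, that is about four hours of testing.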
Sequential read test
Let's look at sequential reads first. The bandwidth of the disk array is shown in Figure 1:
As you can see, when the IO size is small, bandwidth is nothing special even though the requests are sequential and contiguous: under 20MB/s. As the IO size increases, bandwidth climbs, peaking at over 1.2GB/s.
Now look at the NORA case. When the IO size goes from 128K to 256K, bandwidth suddenly jumps a lot. Why? The secret is that the strip size of my RAID array is 128K. Only when the IO size reaches 256K do the disks in the array really work in parallel. With smaller IO sizes, the multiple disks are not exploited. The strip size can be checked with MegaCli:
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL
......
Strip Size : 128 KB
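To see why 256K is the turning point, count how many 128 KB strips a single request covers. Each strip lives on one disk, so a request that spans n strips can be served by up to n disks in parallel. The sketch below is simplified: it ignores RAID5 parity rotation and caps at the number of disks, and it assumes request offsets are known. Note that a misaligned 128K request already touches two strips.

```python
STRIP = 128 * 1024  # per-disk strip size reported by MegaCli above


def strips_touched(offset: int, size: int, strip: int = STRIP) -> int:
    """Number of consecutive strips (i.e. potential disks) one request spans."""
    first = offset // strip               # strip holding the first byte
    last = (offset + size - 1) // strip   # strip holding the last byte
    return last - first + 1
```

An aligned 128K read stays on one disk, while an aligned 256K read spreads across two. This matches the bandwidth jump in the NORA curve.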
Another point: with sequential IO, RA read-ahead also helps, and bandwidth already reaches 1.2GB/s at an IO size of just 64K.
Now let's look at latency, shown in Figure 2:
The unit in the graph is microseconds (us). In "Let's talk about disk partitioning", I estimated disk access time theoretically. The time is spent mainly in two places:
- Seek time: 3-15ms, which can be reduced by sensible partitioning
- Rotational latency: about 0-6ms
So why is the latency in Figure 2 so low, averaging only about 30us at an IO size of 512? Because with sequential IO the RAID card's cache hit rate is high, so most read requests never reach the mechanical disks at all.
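The 0-6ms rotational range follows directly from the spindle speed of the 10,000 RPM disks used here. One revolution takes 60,000 ms / RPM. In the worst case the head waits a full turn for the target sector, and on average it waits half a turn. A minimal sketch:

```python
def rotation_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm


def avg_rotational_ms(rpm: float) -> float:
    """On average the target sector is half a revolution away."""
    return rotation_ms(rpm) / 2
```

At 10,000 RPM this gives 6 ms per revolution and 3 ms on average. A 7,200 RPM disk would need about 8.3 ms per revolution.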
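How much the cache hides the disk can be estimated with a weighted average: hit_rate × cache latency + (1 − hit_rate) × disk latency. The sketch below uses hypothetical inputs chosen only for illustration, not measurements from these tests: 20 us for a cache hit and 5,000 us for a trip to the platters. With those assumptions, a 99.5% hit rate gives about 45 us. Even a 0.5% miss rate contributes more than half of the average.

```python
def effective_latency_us(hit_rate: float, hit_us: float, miss_us: float) -> float:
    """Average latency when a fraction hit_rate of reads is served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_us + (1.0 - hit_rate) * miss_us


# Hypothetical numbers, for illustration only.
print(effective_latency_us(0.995, hit_us=20, miss_us=5_000))  # about 44.9
```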
Now let's look at IOPS, shown in Figure 3:
When the IO size is a single sector, the disk array's IOPS is at its highest, over 30,000 per second. As the IO size grows, IOPS gradually declines, but the disk's throughput is actually increasing.
In short, the disk array performs well under sequential IO, for three reasons:
- With sequential IO, the RAID card's cache hit rate is high, especially with RAID read-ahead enabled
- Sequential IO is also the most comfortable working state for a single disk, because it avoids seek latency
- When an IO is larger than the RAID strip size, it is split across multiple disks and processed in parallel
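IOPS and bandwidth are two views of the same measurement: bandwidth = IOPS × IO size. The sketch below plugs in the figures from this section. About 30,000 IOPS at 512 bytes works out to roughly 14.6 MiB/s, consistent with the under-20MB/s bandwidth at the smallest IO size in Figure 1. This is also why IOPS can fall while throughput rises.

```python
def bandwidth_mib_s(iops: float, io_size_bytes: int) -> float:
    """Throughput in MiB/s implied by an IOPS figure at a given IO size."""
    return iops * io_size_bytes / (1 << 20)


print(bandwidth_mib_s(30_000, 512))  # 14.6484375
```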
Random read test
As developers, we can't always guarantee that the disk works in its most comfortable state; sometimes access simply has to be random. So let's test my disk array under random access. With fio, you just set the rw parameter to randread. However, I only tested IO sizes up to 128K, because the larger the IO size, the more it behaves like sequential IO.
Let's start with bandwidth, shown in Figure 4:
Even when mechanical hard disks are combined into a RAID array with a cache, they seem helpless against random IO. Under random IO, bandwidth is terrible: with small IO sizes it is only a few tenths of a MB/s.
Now look at latency, shown in Figure 5:
In the random case, latency is basically around 5ms, which matches our earlier theoretical estimate. Random access causes many more requests to actually reach the mechanical disks.
Next, IOPS. This metric is also very poor, only about 200. That echoes the latency in Figure 5: if each request takes about 5ms, one second can handle only about 200 of them. Hard disk vendors love to boast that their disks reach tens of thousands of IOPS, but they never mention that under random IO the real figure is only about 200.
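The "5ms per request, so about 200 per second" arithmetic is Little's law: requests completed per second = requests in flight / time per request. The article doesn't say which iodepth the runs used. The sketch assumes one outstanding request, which is fio's default iodepth. With a deeper queue, several disks in the array could be seeking at once, and IOPS could rise accordingly.

```python
def iops_from_latency(latency_ms: float, queue_depth: int = 1) -> float:
    """Little's law: completions per second = outstanding requests / seconds per request."""
    return queue_depth / (latency_ms / 1000.0)
```

With `latency_ms=5` and the default depth of 1, this returns 200.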
Look at my RAID5 array of 10,000 RPM mechanical disks. Under the best sequential conditions, bandwidth exceeds 1GB/s and average latency is very low, as little as 20-odd us. But under random IO, the weaknesses of mechanical hard disks are fully exposed: bandwidth of a few tenths of a MB/s, latency of nearly 5ms, and IOPS of only about 200. The reasons are:
- Random access makes the RAID card cache practically useless
- The disks can't work in parallel, because the random IO sizes tested (at most 128K) don't exceed the 128 KB RAID strip size on my machine
- The mechanical heads have to jump back and forth between tracks.
Sequential disk IO delivers tens of MB/s or even 1GB/s of bandwidth; random IO is truly pathetic by comparison.
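A simple model shows why positioning cost dominates random IO. Each request pays a fixed positioning time (seek plus rotation) and then transfers its data at the media rate. Effective throughput is size / (positioning + size / media rate). The sketch takes the measured ~5 ms as the positioning time. The 150 MB/s media rate is a hypothetical value, not something the article measured. At 512 bytes the model gives about 0.1 MB/s, in line with the ~100K/s measured. Only when a request reaches media rate × positioning time (750 KB under these assumptions) does throughput get to half the media rate.

```python
def random_throughput_mb_s(io_size_bytes: int, positioning_ms: float,
                           media_mb_s: float) -> float:
    """Effective MB/s of back-to-back random requests, one at a time."""
    positioning_s = positioning_ms / 1000.0              # seek + rotation, paid per request
    transfer_s = io_size_bytes / (media_mb_s * 1e6)      # time to stream the data itself
    return io_size_bytes / (positioning_s + transfer_s) / 1e6


for size in (512, 4096, 128 * 1024):
    print(size, round(random_throughput_mb_s(size, 5.0, 150.0), 3))
```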
The test data above shows the huge performance gap between sequential and random IO on mechanical hard disks. Under sequential IO, the disks work in their most comfortable mode, and the RAID card cache hit rate is also high. Bandwidth reaches tens to hundreds of MB/s, even 1GB/s under the best conditions, and IOPS can reach 20,000-30,000. Under random IO, the mechanical heads are forced to jump around seeking, and the RAID card cache stops helping. Bandwidth drops below 1MB/s, as low as 100K/s, and IOPS is a pathetic 200 or so.
If you really understand the data from these experiments, many things in engineering practice start to make sense.
Copying folders: as we all know, copying a folder that contains lots of small files is very slow. The reason is that the mechanical hard disk is doing random IO. How can you speed it up? Simple: pack the files first. After packing, the folder becomes one big file, and copying it lets the disk do sequential IO, so it is much faster.
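Packing can be done with the standard `tarfile` module. The sketch below writes an uncompressed tar and reports how many entries it holds. Directory and file names are hypothetical. Creating the archive still reads each small file once. The gain comes when the single archive is then copied, since it is likely to be stored contiguously and read sequentially.

```python
import os
import tarfile
import tempfile


def pack_directory(src_dir: str, archive_path: str) -> int:
    """Pack everything under src_dir into one uncompressed tar; return the entry count."""
    with tarfile.open(archive_path, "w") as tar:
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    with tarfile.open(archive_path, "r") as tar:
        return len(tar.getmembers())  # the directory entry plus one per file


def demo(n_files: int = 100) -> int:
    """Create n_files small files in a temp folder and pack them into one archive."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "many_small_files")
        os.mkdir(src)
        for i in range(n_files):
            with open(os.path.join(src, f"f{i:04d}.txt"), "w") as f:
                f.write("x" * 100)
        return pack_directory(src, os.path.join(tmp, "bundle.tar"))
```

`demo(100)` returns 101: one directory entry plus the 100 files.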
Database transactions: to implement transactions, every database must make sure written data has been persisted to disk before returning success. But why do almost all of them return success once the data is in their own transaction log file, instead of writing it directly into the data table files? The reason is disk read/write performance. A transaction only needs to guarantee that the data is persisted; where it is written doesn't matter. Writing to data files is likely to be random IO, while appending to a log file is purely sequential IO, which gives the best performance.
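The idea can be reduced to a toy write-ahead log. A commit appends one record to the end of a file, calls `fsync`, and only then reports success, so every commit is a sequential write. Recovery reads the log from front to back. This is a minimal sketch for illustration only. It is not how MySQL/InnoDB's redo log is implemented; real logs add checksums, group commit, and checkpoints.

```python
from __future__ import annotations

import json
import os
import tempfile


class AppendOnlyLog:
    """Toy write-ahead log: a commit is one appended line plus fsync."""

    def __init__(self, path: str):
        self._f = open(path, "a", encoding="utf-8")  # "a": writes always go to the end

    def commit(self, record: dict) -> None:
        self._f.write(json.dumps(record) + "\n")
        self._f.flush()             # hand Python's buffer to the OS
        os.fsync(self._f.fileno())  # ask the OS to persist it before returning

    def close(self) -> None:
        self._f.close()


def replay(path: str) -> list[dict]:
    """Recovery: read the log sequentially and return the committed records."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


def demo() -> list[dict]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "txn.log")  # hypothetical log file name
        log = AppendOnlyLog(path)
        for i in range(3):
            log.commit({"txn": i, "key": f"k{i}", "value": i * 10})
        log.close()
        return replay(path)
```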
MySQL's B+ trees: the data above shows that for both sequential and random IO, increasing the size of each IO raises bandwidth. Once you understand this, you can truly understand why MySQL uses B+ trees for its indexes instead of other trees (such as binary trees). B+ tree nodes are larger, so each IO is larger and the disk works in a more comfortable state.
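Large nodes also mean a high fanout, and therefore a shallow tree. Each level on the lookup path costs roughly one random IO when it is not cached. The sketch counts the levels needed to index n keys. The fanout of 1,000 is a hypothetical figure. InnoDB's default page size is 16 KB, and the actual number of entries per page depends on key size. For a billion keys, a binary tree needs 30 levels, while a fanout-1,000 tree needs 3.

```python
def levels_needed(n_keys: int, fanout: int) -> int:
    """Smallest number of levels h (at least 1) with fanout ** h >= n_keys."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout  # each extra level multiplies reachable keys by fanout
        levels += 1
    return levels


print(levels_needed(10**9, 2), levels_needed(10**9, 1000))  # 30 3
```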
Finally, I'd like to share a real performance optimization from my engineering work about five years ago. We took over a system that used the IMEIs of millions of users to query MySQL for another set of user IDs (clientid). The original implementation used traditional batched MySQL queries. With this approach, leaving aside the many network round trips (RTT), the MySQL queries alone needed a lot of random IO even with an index, because user IMEIs are randomly distributed. My optimization was very simple. I read the entire MySQL user table in one pass with sequential IO and loaded it into memory, organized as a hash table, so that lookups became fast hash queries. In the end, the time spent dropped by more than 90%.
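The shape of that optimization is "one full scan into a hash table, then in-memory lookups". In the sketch below, Python's built-in `sqlite3` with an in-memory database stands in for MySQL. The `users` table and its `imei`/`clientid` columns are hypothetical names. With MySQL you would stream the same `SELECT` through a client library. Whether the server reads the table sequentially depends on its storage layout, and the whole mapping must fit in memory.

```python
from __future__ import annotations

import sqlite3


def build_index(conn: sqlite3.Connection) -> dict[str, str]:
    """One full scan of the (hypothetical) users table into an in-memory dict."""
    return dict(conn.execute("SELECT imei, clientid FROM users"))


def lookup_all(index: dict[str, str], imeis: list[str]) -> dict[str, str | None]:
    """Hash lookups in memory: no per-IMEI query and no random disk IO."""
    return {imei: index.get(imei) for imei in imeis}


def demo() -> dict[str, str | None]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (imei TEXT PRIMARY KEY, clientid TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(f"86{i:013d}", f"client-{i}") for i in range(1000)])
    index = build_index(conn)
    conn.close()
    return lookup_all(index, ["860000000000042", "999999999999999"])
```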
Hard disk articles in the 「Develop internal skill and practice」 series:
- 1. Opening up the disk: peeling off the hard shell of a mechanical hard disk!
- 2. Disk partitioning hides technical tricks too
- 3. How can we solve the problem that mechanical hard disks are slow and prone to failure?
- 4. Taking apart the structure of an SSD
- 5. How much disk space does a new empty file take?
- 6. How much disk space does a 1-byte file actually take up?
- 7. Why does the ls command get stuck when there are too many files?
- 8. Understanding how formatting works
- 9. How much disk IO does reading one byte of a file actually cause?
- 10. After writing one byte to a file, when does it actually reach the disk?
- 11. Random IO on mechanical hard disks is slower than you think
- 12. How much faster is a server with an SSD than one with a mechanical hard disk?
My WeChat official account is 「Develop internal skill and practice」. Here I don't just talk about technical theory, and I don't just share practical experience; I combine the two, using practice to deepen your understanding of theory and theory to improve your practical skills. You're welcome to follow my official account, and please share it with your friends ~~~
This article was written by [Zhang Yanfei Allen]. Please include a link to the original when reposting. Thank you.