Cached IO and Direct IO
Cached IO: data first goes from the disk to kernel space via a DMA copy, and then from kernel space to user space via a CPU copy.
Direct IO: data goes from the disk to user space via a DMA copy.
Cached IO is also known as standard IO; by default, most file system IO operations are cached IO.
In Linux's cached IO mechanism, data is first copied from the disk into a kernel-space buffer, and then copied from that kernel buffer into the application's address space.
- Read operations
  - The operating system checks the kernel buffer for the requested data. If it is already cached, it is returned directly from the cache; otherwise the data is read from disk and then cached in the operating system's cache.
- Write operations
  - Data is copied from user space into the kernel-space cache. At that point the write is already complete from the user program's point of view; the operating system decides when to actually write the data to disk, unless the program explicitly calls a sync-style system call such as sync() or fsync().
- Advantages of cached IO
  - It separates kernel space from user space to some extent, protecting the operational safety of the system itself.
  - It can reduce the number of disk reads, improving performance.
- Disadvantages of cached IO
  - In the cached IO mechanism, DMA can transfer data directly from disk into the page cache, or from the page cache back to disk, but not directly between the application's address space and the disk. As a result, data in transit must be copied multiple times between the application address space (user space) and the cache (kernel space), and these copy operations impose significant CPU and memory overhead.
Direct IO means the application accesses disk data directly, bypassing the kernel buffer and managing its own IO buffers. The purpose is to eliminate the copy of data from the kernel buffer to the user program's buffer.
The kernel buffer was introduced to improve the access performance of disk files: when a process reads a disk file whose contents are already in the kernel buffer, no disk access is needed at all; when a process writes data to a file, writing into the kernel buffer is enough for the write to be reported as successful, and the real write to disk is deferred according to some strategy.
However, some more complex applications, such as database servers, want to bypass the kernel buffer in order to maximize performance. They maintain and manage their own IO buffers in user space, including caching and write-delay mechanisms, to support application-specific query mechanisms; for example, a database can raise its query cache hit rate using a strategy tailored to its workload. Bypassing the kernel buffer also reduces system memory overhead, since the kernel buffer itself consumes system memory.
The disadvantage of direct IO is that if the accessed data is not in any cache, every access must load it directly from disk, which is very slow. For this reason, direct IO is usually combined with asynchronous IO to get better performance.
Linux provides support for this need: passing the O_DIRECT flag to the open() system call makes access through the opened file bypass the kernel buffer, effectively avoiding the extra CPU time and memory overhead.
Incidentally, an option similar to O_DIRECT is O_SYNC. The latter affects only writes: it immediately writes the data placed in the kernel buffer to disk, minimizing data loss in case of a machine failure, but the data still passes through the kernel buffer.