
CUDA: Registers and Local Memory

2020-11-09 22:41:04 Li Baqian



Registers

Registers are the GPU's on-chip high-speed memory. Together they form the register file, and each register in it is 32 bits wide.

On hardware of compute capability 1.0/1.1, each SM has 8192 registers; on 1.2/1.3 hardware, each SM has 16384.
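The register counts on a given device can also be read at run time instead of looked up per compute capability. The following is a minimal sketch, assuming device 0 is the GPU of interest; queryRegisterLimits is a hypothetical helper name, while cudaGetDeviceProperties and the regsPerBlock and regsPerMultiprocessor fields come from the CUDA runtime API.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): reads the number of 32-bit registers
// available per block and per SM on the given device.
// Returns false if the device cannot be queried.
bool queryRegisterLimits(int device, int* perBlock, int* perSM)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return false;
    }
    *perBlock = prop.regsPerBlock;
    *perSM = prop.regsPerMultiprocessor;
    return true;
}

int main()
{
    int perBlock = 0, perSM = 0;
    // Assumption: device 0 is the GPU we care about.
    if (!queryRegisterLimits(0, &perBlock, &perSM)) {
        printf("No usable CUDA device found\n");
        return 1;
    }
    assert(perBlock > 0 && perSM > 0);
    printf("32-bit registers per block: %d\n", perBlock);
    printf("32-bit registers per SM:    %d\n", perSM);
    return 0;
}
```

The printed values depend on the hardware generation, so they are best treated as a budget to compare against the per-thread register usage of a kernel.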

In general, simple local variables in a kernel are held in registers.

Local memory

Local memory is also private to each thread. When registers are exhausted, data spills into local memory. A thread's private data may be allocated in local memory if the thread uses too many registers, declares a large structure or array, or uses an array whose size or indexing the compiler cannot determine at compile time. A thread's inputs and intermediate variables are held in either registers or local memory. Local memory physically resides in device memory (video memory), so accessing it is much slower than accessing registers.

For example, the array mt in the following kernel may be stored in local memory:

__global__ void localmemDemo(float* A)
{
  unsigned int mt[3];
}

If a thread-private array is initialized where it is defined, and the array is not large, the compiler may still be able to place it in registers.

__global__ void localmemDemo(float* A)
{
  unsigned int mt[3] = {1, 2, 3};
}

To check where a variable ends up, emit the PTX (Parallel Thread Execution) assembly at compile time by adding the -ptx or -keep option. The PTX shows whether a variable was assigned to local memory in the first stage of compilation: a variable that is declared with the .local mnemonic and accessed with the ld.local and st.local instructions is stored in local memory. However, even if a variable is not in local memory after the first stage, the second stage of compilation may still place it there. To catch that case, add the --ptxas-options=-v compile option and check the reported lmem usage.
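The checks described above can be tried on a small test file. The sketch below is an illustration under stated assumptions: the file name localmem.cu, the kernel localmemIndexed, and the helper reportKernelUsage are all hypothetical. The kernel indexes its array with a value known only at run time, which is a common reason for the compiler to use local memory, and the host code reads the compiled kernel's resource usage through cudaFuncGetAttributes as a run-time complement to the compiler output.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Build commands from the text (the file name localmem.cu is an assumption):
//   nvcc -ptx localmem.cu                     emits PTX; search it for .local
//   nvcc --ptxas-options=-v -c localmem.cu    prints per-kernel resource usage

// Hypothetical kernel: mt is indexed with idx, which is only known at run
// time, so the compiler may not be able to keep the array in registers.
__global__ void localmemIndexed(float* A, int idx)
{
    unsigned int mt[64];
    for (int i = 0; i < 64; ++i) {
        mt[i] = i * threadIdx.x;
    }
    A[threadIdx.x] = (float)mt[idx & 63];
}

// Hypothetical helper: reads the per-thread register count and local memory
// size of the compiled kernel. Returns false if the query fails.
bool reportKernelUsage(int* numRegs, size_t* localBytes)
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, localmemIndexed) != cudaSuccess) {
        return false;
    }
    *numRegs = attr.numRegs;
    *localBytes = attr.localSizeBytes;
    return true;
}

int main()
{
    int numRegs = 0;
    size_t localBytes = 0;
    if (!reportKernelUsage(&numRegs, &localBytes)) {
        printf("Could not query kernel attributes\n");
        return 1;
    }
    assert(numRegs > 0);
    printf("registers per thread:    %d\n", numRegs);
    printf("local memory per thread: %zu bytes\n", localBytes);
    return 0;
}
```

Whether the array actually lands in local memory depends on the compiler version, optimization level, and target architecture, so the reported numbers should be checked rather than assumed.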

If the array is small and must be kept in registers, replace it with individual scalar variables:

__global__ void localmemDemo(float* A)
{
  unsigned int mt0, mt1, mt2;
}
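A fuller version of the scalar replacement can be sketched as a complete program. The kernel name scalarDemo, the helper runScalarDemo, and the arithmetic performed are assumptions made only for illustration; the point is that each value is a named scalar, so no array indexing is involved.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Three scalars stand in for a 3-element array. Every access names a
// specific variable, so there is nothing to index at run time.
__global__ void scalarDemo(float* A)
{
    unsigned int mt0 = 1, mt1 = 2, mt2 = 3;
    unsigned int t = threadIdx.x;
    A[t] = (float)(mt0 * t + mt1 + mt2);  // equals t + 5
}

// Hypothetical helper: launches one block of n threads (n <= 1024) and
// copies the n results into out. Returns false on any CUDA error.
bool runScalarDemo(float* out, int n)
{
    float* dA = nullptr;
    if (cudaMalloc(&dA, n * sizeof(float)) != cudaSuccess) {
        return false;
    }
    scalarDemo<<<1, n>>>(dA);
    cudaError_t err = cudaGetLastError();  // catches launch errors
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(out, dA, n * sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaFree(dA);
    return err == cudaSuccess;
}

int main()
{
    float out[4];
    if (!runScalarDemo(out, 4)) {
        printf("CUDA error while running scalarDemo\n");
        return 1;
    }
    assert(out[0] == 5.0f && out[3] == 8.0f);
    for (int i = 0; i < 4; ++i) {
        printf("A[%d] = %.1f\n", i, out[i]);
    }
    return 0;
}
```

The trade-off is that scalars cannot be indexed in a loop, so this approach only suits arrays small enough to be written out by hand.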

Copyright notice
This article was written by [Li Baqian]. Please include a link to the original when reposting. Thank you.