
Developing Instruction-Level Parallelism with Dynamic Scheduling, Multiple Issue, and Speculation

2020-12-06 20:04:45 Xlgd

Recently I had to write a report on instruction-level parallelism (ILP). My topic was "Developing ILP with dynamic scheduling, multiple issue, and speculation", which corresponds to Chapter 3 of the textbook *Computer Architecture: A Quantitative Approach*, in particular Section 3.9. I am recording my notes here for future review.
The notes are divided into the following parts:

  • Review
    • Pipelining
    • Pipeline hazards
  • Dynamic scheduling
  • Speculation
  • Multiple issue
  • Putting it all together

Review

First, let's review what pipelining is and the problems it introduces.

Pipelining

Pipelining is a technique for overlapping the execution of multiple instructions.

The figure above shows a simple example. Without pipelining, the processor completes one instruction every 800 ps. To improve instruction throughput, execution is divided into five stages (fetch, decode, execute, memory access, write-back), each with its own functional unit, so the processor can overlap instructions. As the second figure shows, the pipelined design completes one instruction every 200 ps, which increases throughput. Note that pipelining only increases the throughput of instructions; it does not reduce the latency of any single instruction — in the second figure each instruction still needs 800 ps from start to finish.
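As a quick sanity check on this arithmetic, here is a small Python sketch (the `total_time` helper and the equal-200 ps-per-stage assumption are mine, not from the textbook figures):

```python
def total_time(n_instructions, n_stages, stage_time_ps):
    """Total time to run n_instructions on an ideal pipeline:
    the first instruction takes n_stages cycles to drain through,
    and each later instruction completes one cycle after the previous."""
    return (n_stages + n_instructions - 1) * stage_time_ps

# A single instruction: pipelining does not reduce latency
# (1000 ps with five equal 200 ps stages -- no better than 800 ps).
print(total_time(1, 5, 200))

# Many instructions: throughput approaches one instruction per 200 ps.
print(total_time(1000, 5, 200) / 1000)
```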
Here is a brief description of each stage of the classic five-stage pipeline:

  • Instruction fetch (IF): read the instruction from instruction memory
  • Instruction decode (ID): decode the instruction and read the registers at the same time
  • Execute (EX): perform the operation or compute an address (for load/store instructions)
  • Memory access (MEM): read or write data memory (for load/store instructions)
  • Write-back (WB): write the result back to the register file
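To see how the stages overlap, here is a minimal sketch (the `pipeline_schedule` helper is hypothetical, and it assumes one instruction enters the pipeline per cycle with no stalls):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(n_instructions):
    """Map each instruction to the cycle in which it occupies each stage."""
    return [{stage: i + k for k, stage in enumerate(STAGES)}
            for i in range(n_instructions)]

# While instruction 0 is in EX (cycle 2), instruction 1 is in ID
# and instruction 2 is in IF: three instructions in flight at once.
for i, row in enumerate(pipeline_schedule(3)):
    print(f"instr {i}: {row}")
```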

Pipeline hazards

But dependences between instructions limit what pipelining can do. The three common kinds of pipeline hazard are introduced next:

  • Structural hazard: the hardware cannot support the particular combination of instructions we want to execute in the same clock cycle.
  • Data hazard: the pipeline must stall because one instruction has to wait for another to complete. There are two underlying causes:
    • Data dependence, also called a true data dependence: a read-after-write (RAW) relationship between two instructions — the first instruction writes data that the second instruction is about to read. Because the first instruction only writes the register in stage 5 (WB) while the second instruction needs the value in stage 3 (EX), a true data dependence arises. It can be handled by bypassing (forwarding) or by stalling (inserting a bubble).
    • Name dependence: two instructions have a write-after-read (WAR) or write-after-write (WAW) relationship on the same register. Unlike read-after-write, no data actually flows between the two instructions; the dependence exists only because both happen to operate on the same register. It can be eliminated by register renaming.
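The three dependence types can be illustrated with a small classifier (the `(writes, reads)` encoding and the function name are my own, for illustration only):

```python
def classify_dependences(first, second):
    """Classify dependences from `first` to a later instruction `second`.
    Each instruction is a (writes, reads) pair of register-name sets."""
    w1, r1 = first
    w2, r2 = second
    kinds = set()
    if w1 & r2:
        kinds.add("RAW")  # true dependence: second reads first's result
    if r1 & w2:
        kinds.add("WAR")  # name dependence: second overwrites what first read
    if w1 & w2:
        kinds.add("WAW")  # name dependence: both write the same register
    return kinds

# add s0, t0, t1  followed by  sub t2, s0, t3  -> a true (RAW) dependence
print(classify_dependences(({"s0"}, {"t0", "t1"}),
                           ({"t2"}, {"s0", "t3"})))  # {'RAW'}
```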


Here is an example of the read-after-write problem: the first instruction modifies the data in register s0, and the second instruction needs to read s0. Bypassing shortens the wait for the s0 data by sending it over a bypass path directly to where it is needed — see the blue line. Normally the first instruction would only update s0 in its WB stage, but with the bypass the value can be obtained in the EX stage and sent directly to the second instruction's decode (ID) stage, avoiding a stall.


Here is another example. Because the first instruction is a load, the data is not available until its MEM stage, but the second instruction needs it in its EX stage. We cannot forward a value backward in time, so bypassing alone cannot help here: a stall — the blue bubble — is inserted between the two instructions.
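The difference between the two examples comes down to when the producer's result is ready. A minimal sketch, assuming full EX→EX and MEM→EX forwarding in the classic five-stage pipeline (`stall_cycles` is a made-up helper):

```python
def stall_cycles(producer_is_load, gap):
    """Bubbles needed before a dependent instruction can execute.
    `gap` is 1 when the consumer immediately follows the producer.
    Assumes full forwarding (EX->EX and MEM->EX bypass paths)."""
    if producer_is_load and gap == 1:
        return 1  # load-use hazard: the value is ready only after MEM
    return 0      # ALU results are bypassed in time; no stall needed

print(stall_cycles(False, 1))  # 0: the bypass covers an ALU producer
print(stall_cycles(True, 1))   # 1: one bubble after a load
print(stall_cycles(True, 2))   # 0: an intervening instruction hides it
```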

  • Control hazard: a decision depends on the result of an instruction that is still in flight. Simply put, with a conditional branch the processor does not know which instruction to execute next until the branch is resolved, which hurts pipeline throughput. Solutions include stalling, branch prediction, and the delayed branch.


This is an example of using branch prediction to handle a control hazard. The second instruction requires a decision whose result is not available immediately. Rather than stall, the processor predicts that the branch will not be taken and continues executing the third instruction. Branch prediction is simply "keep going": guess an outcome and let the processor execute the predicted path first. Branch prediction has many further refinements; read the textbook if you are interested, as they are not the focus here.


In the figure above the branch prediction fails, so the instructions executed under the wrong prediction are stopped — bubbles are simply inserted in their place — and the processor then goes on to execute the correct instructions.
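One of the textbook refinements mentioned above is the 2-bit saturating-counter predictor, which tolerates one surprise outcome without flipping its prediction. A minimal sketch (the function name and the 0–3 state encoding are my own):

```python
def count_mispredictions(outcomes):
    """Simulate one 2-bit saturating counter for a single branch.
    States 0-1 predict 'not taken', states 2-3 predict 'taken'."""
    state = 0            # start in 'strongly not taken'
    mispredictions = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            mispredictions += 1
        # Move toward 'taken' (3) or 'not taken' (0), saturating.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions

# A loop branch taken 9 times, then not taken on loop exit:
print(count_mispredictions([True] * 9 + [False]))  # 3
```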

As for the delayed branch: in short, while the branch is being resolved, the processor executes an instruction that does not depend on the branch, avoiding a pipeline stall.

Dynamic scheduling

I will not say much about dynamic scheduling, since my focus is on speculation and multiple issue. Its main characteristics are:

  • Out-of-order execution
  • Out-of-order completion
  • Imprecise exceptions: a consequence of out-of-order completion — when an exception occurs, some instructions that should not have executed already have, while some that should have executed have not.
  • The Tomasulo algorithm, a classic algorithm for dynamic scheduling

Dynamically scheduled execution proceeds in three steps: issue, execute, and write result.

Speculation

Speculation is mainly about overcoming the limits imposed by control dependences, in order to exploit more ILP.

Key ideas:

  • Use dynamic branch prediction to choose which instructions to execute
  • Use speculation to execute instructions before the control dependences are resolved (with the ability to undo the effects of an incorrectly speculated sequence)
  • Use dynamic scheduling to handle the scheduling of different combinations of basic blocks

Characteristics:

  • Out-of-order execution
  • An additional instruction-commit step
  • A reorder buffer
  • In-order commit

Let's look at the basic structure of hardware-based speculation:

The red boxes mark the parts that change relative to the dynamic-scheduling structure; the yellow boxes mark the key parts of the dynamic-scheduling structure. Each is explained below.

Reorder buffer (ROB): this is the new hardware. Because of the ROB, an extra instruction-commit step is added after the results of speculative execution are produced, so speculative execution has four steps: issue, execute, write result, and commit. The ROB extends the register set: it holds an instruction's result for a period of time — from when the instruction finishes its operation until the instruction commits. Registers and memory are updated only after the instruction commits (that is, once we are certain the instruction really should execute), so the ROB is what supplies operands in the interval between an instruction's execution and its commit. The ROB is similar to the store buffer in the Tomasulo algorithm, so the store buffer that sat alongside the load buffers (the second red box) is integrated into the ROB.

Reservation stations (RS): the reservation stations provide register renaming, which eliminates the WAR and WAW problems. They buffer the operands of instructions waiting to issue. The basic idea is that a reservation station fetches and buffers an operand as soon as it becomes available, eliminating the need to read it from a register later. In addition, instructions waiting to execute designate the reservation station that will supply their input. Finally, when successive, overlapping writes target the same register, only the last one actually updates it. When an instruction issues, the register specifiers of its pending operands are renamed to the names of reservation stations — this is what implements register renaming.
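Register renaming as described above can be sketched by mapping each new destination register to a fresh name (the `p0, p1, ...` names and the `rename` helper are hypothetical; in the real hardware the "fresh names" are reservation-station or ROB entries):

```python
def rename(instructions):
    """Rename destination registers to fresh 'physical' names.
    Each instruction is (dest, [src, ...]). WAR/WAW name dependences
    disappear; RAW dependences are preserved through the mapping."""
    mapping, out = {}, []
    for i, (dest, srcs) in enumerate(instructions):
        srcs = [mapping.get(s, s) for s in srcs]  # read the latest producer
        mapping[dest] = f"p{i}"                   # fresh name for each write
        out.append((mapping[dest], srcs))
    return out

# Two overlapping writes to F0 (a WAW hazard) get distinct names:
print(rename([("F0", ["F2"]), ("F4", ["F0"]), ("F0", ["F6"])]))
# [('p0', ['F2']), ('p1', ['p0']), ('p2', ['F6'])]
```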

Common data bus (CDB): the CDB passes results to all the functional units and reservation stations that need them, without going through the registers, which speeds up instruction execution.


The figure above shows the state of some instructions at one moment during execution. The top shows execution under dynamic scheduling; the arrows point to the corresponding state under speculation. Under dynamic scheduling, because completion is out of order, the two instructions have already executed and written their results to the registers. Under speculation the same instructions have also finished executing, but because the earlier MUL.D instruction has not yet committed, they are not allowed to commit.
This difference is what lets a machine with a ROB execute code dynamically while maintaining precise exceptions.
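The in-order-commit behaviour described above can be sketched with a minimal ROB model (the class and method names are my own; real hardware also tracks store addresses, branch outcomes, and exception state per entry):

```python
from collections import deque

class ReorderBuffer:
    """Entries may finish out of order, but commit strictly in order."""

    def __init__(self):
        self.entries = deque()  # FIFO in program order

    def issue(self, tag):
        self.entries.append({"tag": tag, "done": False, "value": None})

    def write_result(self, tag, value):
        for e in self.entries:  # the instruction finished executing
            if e["tag"] == tag:
                e["done"], e["value"] = True, value

    def commit(self):
        """Retire finished instructions from the head of the queue only."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            e = self.entries.popleft()
            retired.append((e["tag"], e["value"]))
        return retired

rob = ReorderBuffer()
rob.issue("MUL.D")                # older instruction, still executing
rob.issue("ADD.D")
rob.write_result("ADD.D", 1.5)    # younger instruction finishes first
print(rob.commit())               # [] -- blocked behind MUL.D
rob.write_result("MUL.D", 2.5)
print(rob.commit())               # [('MUL.D', 2.5), ('ADD.D', 1.5)]
```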

Multiple issue

The goal of a multiple-issue processor is to issue several instructions in a single clock cycle. Multiple-issue processors fall into three categories:

  • Statically scheduled superscalar processors
  • Dynamically scheduled superscalar processors
  • VLIW (very long instruction word) processors

The differences: statically scheduled superscalars execute in order (that is, in program sequence), dynamically scheduled superscalars execute out of order, and both issue a variable number of instructions each cycle, while a VLIW processor issues a fixed number of instructions per clock cycle.


The figure above shows a simple loop example. Without any of these techniques, executing one loop iteration and producing one result takes 9 cycles.


The code above applies loop unrolling and static scheduling; producing one result now takes 3.5 cycles.


The code above applies loop unrolling, static scheduling, and VLIW; producing one result takes about 1.29 cycles.
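The per-result figures for the three versions are just total cycles divided by iterations completed. The arithmetic below is my back-calculation (14 cycles for 4 unrolled iterations, 9 cycles for 7 iterations under VLIW), not taken directly from the figures:

```python
def cycles_per_result(total_cycles, iterations):
    """Average cycles to produce one loop result."""
    return total_cycles / iterations

print(cycles_per_result(9, 1))            # 9.0  : original loop
print(cycles_per_result(14, 4))           # 3.5  : unrolled + scheduled
print(round(cycles_per_result(9, 7), 2))  # 1.29 : unrolled + VLIW
```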


The figure above summarizes the techniques used by several different processors and their typical applications.

Putting it all together

Finally, dynamic scheduling and speculation are combined with multiple-issue techniques. The basic structure is as follows:

Compared with the speculation structure, this one adds a floating-point multiply unit and an integer unit, supporting a multiple-issue superscalar pipeline with separate integer, load/store, and floating-point units.

This article has given a general introduction to the basic concepts of developing ILP with dynamic scheduling, speculation, and multiple issue. Readers should already understand the basic five-stage pipeline and dynamic scheduling; for more details, please refer to the textbook mentioned at the beginning of the article.

References:

  • Computer Architecture: A Quantitative Approach
  • Computer Organization and Design: The Hardware/Software Interface

Copyright notice: this article was written by Xlgd. Please include a link to the original when reprinting.
https://chowdera.com/2020/12/20201206200345960s.html