当前位置:网站首页>I've been using these tips to improve the efficiency of code!

I've been using these tips to improve the efficiency of code!

2021-05-04 16:57:00 C programming learning base

The purpose of our program is to make it work stably in any situation . A program that runs very fast but gets the wrong result is useless .

In the process of program development and optimization , We have to think about how the code is used , And the key factors that affect it .

Usually , We have to make a trade-off between the simplicity of the program and its speed . Today we will talk about how to optimize the performance of the program .

 

One 、 Reduce the amount of program calculation

1. Sample code


 

2. Analysis of the code

   Code as above , Every time the outer loop is executed , We're going to do a multiplication calculation .i = 0,ni = 0;i = 1,ni = n;i = 2,ni = 2n. therefore , We can replace multiplication with addition , With n Step length , This reduces the amount of code in the outer loop .

3. Improve the code


 

The multiplication instruction in a computer is much slower than the addition instruction .

 

Two 、 Extract the common parts of the code

1. Sample code

   Imagine , We have an image , We represent the image as a two-dimensional array , Array elements represent pixels . We want to get the east of a given pixel 、 south 、 In the west 、 The sum of the four neighbors in the North . And find their average or their sum . The code is as follows .


 

2. Analysis of the code

   After compiling the above code, the assembly code is as follows , Pay attention to 3,4,5 That's ok , There are three times n The multiplication of . Let's take the top one up and down After expansion, you will find that all four lattice expressions have i*n + j. therefore , You can extract the public part , And then through addition and subtraction, we can get up、down Equal value .


 

3. Improve the code


 

   The assembly of the improved code is as follows . There's only one multiplication after compilation . Less 6 Clock cycles ( A multiplication period is about 3 Clock cycles ).


 

   about GCC For compilers , Compiler can be optimized according to different levels , There are different ways to optimize , Will automatically complete the above optimization operation . Let's introduce , Those have to be optimized manually .

 

3、 ... and 、 Eliminate inefficient code in loops

1. Sample code

   There seems to be nothing wrong with the program , A very common case conversion code , But why is it that as the input length of the string gets longer , The execution time of the code will increase exponentially ?


 

2. Analysis of the code

   So let's test the code , Enter a series of strings .


lower1 Code performance testing

   When the length of the input string is less than 100000 when , There is little difference in the running time of programs . however , As the length of the string increases , The running time of the program increases exponentially .

   We convert the code into goto Look at the form .


 

   The above code is divided into initialization ( The first 3 That's ok ), test ( The first 4 That's ok ), to update ( The first 9,10 That's ok ) In the third part of . Initialization will only be performed once . But tests and updates are executed every time . Every time you do a cycle , Will be right. strlen Call once .

   So let's see strlen Function source code is how to calculate the string length .


 

  strlen Function to calculate the length of a string : Traversal string , Until I met ‘\0’ Will stop . therefore ,strlen The time complexity of the function is O(N).lower1 in , For length is N In terms of string ,strlen The number of calls to is N,N-1,N-2 ... 1. For a linear time function call N Time , Its time complexity is close to O(N2).

3. Improve the code

   For such redundant calls in loops , We can move it out of the loop . Apply the results to the cycle . The improved code is as follows .


 

   Compare the two functions , As shown in the figure below .lower2 The execution time of the function is significantly improved .


lower1 and lower2 Code efficiency

 

Four 、 Eliminate unnecessary memory references

1. Sample code

   The following code functions as , Calculation a The sum of all elements in each row of the array exists b[i] in .


 

2. Analysis of the code

   The assembly code is as follows :


 

   This means that each loop needs to be read from memory b[i], Then take it. b[i] Write back to memory .b[i] +=  b[i] + a[i*n + j]; Actually, at the beginning of each cycle ,b[i] It's the last value . Why read it from memory and write it back every time ?

3. Improve the code


 

   The assembly looks like this :


 

   The improved code introduces temporary variables to save intermediate results , Only when the final value is calculated , To store the result in an array or global variable .

 

5、 ... and 、 Reduce unnecessary calls

1. Sample code

   For the convenience of example , We define a structure that contains an array and its length , The main purpose is to prevent array access from crossing the bounds ,data_t It can be int,long Other types . The details are as follows .


 

vec Vector diagram

  get_vec_element Function is used to traverse data Array and stored in val in .


 

   We'll take the following code as an example to start the step-by-step optimization process .

2. Analysis of the code

  get_vec_element The function gets the next element , stay get_vec_element Function , Every cycle has to do with v->len The comparison , To prevent cross-border . It's a good habit to do border checks , But every time it's done, it's inefficient .

3. Improve the code

   We can move the code for the length of the vector to extracorporeal circulation , At the same time, the abstract data type adds a function get_vec_start. This function returns the starting address of the array . So there are no function calls in the loop body , It's direct access to arrays .


 

 

6、 ... and 、  Loop unrolling

1. Sample code

   We are combine2 Improve on the code of .

2. Analysis of the code

Loop unrolling is done by increasing the number of elements calculated for each iteration , Reduce the number of iterations of the loop .

3. Improve the code


 

   In the improved code , The first loop processes two elements of the array at a time . That is, every iteration , Loop index i Add 2, In an iteration , For array elements i and i+1 Use the merge operation . In general, we call it 2×1 Loop unrolling , This transformation can reduce the impact of loop overhead .

Pay attention not to cross the border , Set correctly limit,n Elements , Generally set the boundaries n-1

 

7、 ... and 、 Cumulative variable , Multi channel parallel

1. Sample code

   We are combine3 Improve on the code of .

2. Analysis of the code

   For a combinable and exchangeable merge operation , For example, integer addition or multiplication , We can divide a combination into two or more parts by combining and calculating , And merge the results at the end to improve performance .

Particular attention : Don't combine floating point numbers easily . The encoding format of floating-point numbers is different from other integer numbers .

3. Improve the code


 

   The above code is expanded in two loops , To merge more elements per iteration , Two parallel channels are also used , Accumulate elements with even index values in variables acc0 in , And elements with odd index values accumulate in variables acc1 in .

   therefore , Let's call this ”2×2 Loop unrolling ”. Application 2×2 Loop unrolling . By maintaining multiple cumulative variables , This approach takes advantage of multiple functional units and their pipelining capabilities

 

8、 ... and 、 Recombine transformation

1. Sample code

   We are combine3 Improve on the code of .

2. Analysis of the code

   In fact, the performance of the code is close to the limit , Even if we do more loop deployment, the performance improvement is not obvious .

   We need to think differently , Pay attention to combine3 In the code 12 Lines of code , We can change the order of merging the next vector elements ( Floating point numbers don't apply ). Before reconnection combine3 The critical path of the code is shown in the figure below .


combine3 The critical path to code

3. Improve the code


 

   Re combining transformation can reduce the number of operations on the critical path in computation , This method increases the number of operations that can be performed in parallel , Make better use of the pipeline capability of functional units to get better performance . The critical path is as follows .


combine3 The critical path after recombining

 

Nine 、 Conditional delivery style code

1. Sample code


 

2. Analysis of the code

   The pipeline performance of modern processor makes the processor work far ahead of the current instructions .

   Branch prediction in the processor predicts where to jump next when it encounters a comparison instruction . If the prediction is wrong , We're going to go back to the branch jump . Branch prediction error will seriously affect the execution efficiency of the program .

   therefore , We should write code to improve the prediction accuracy of the processor , That is to use conditional transfer instruction . We use conditional operations to calculate values , Then use these values to update the program state , As shown in the improved code .

3. Improve the code


 

   In the first part of the original code 4 In line , Need to be right a[i] and b[i] Compare , Go on to the next step , The consequence is to make predictions every time .

   The improved code implements this function to calculate each location i Maximum and minimum of , Then assign these values to a[i] and b[i], Instead of branch prediction .

 

* summary

   We introduced several techniques to improve code efficiency , Some are automatically optimized by the compiler , Some of them need to be realized by ourselves . It is summarized as follows :

         Eliminate continuous function calls . It is possible that , Move the calculation out of the loop . Consider selectively compromising the modularity of the program to achieve greater efficiency .

          Eliminate unnecessary memory references . Introduce temporary variables to save intermediate results . Only when the final value is calculated , To store the result in an array or global variable .

          Expand the cycle , Reduce overhead , And make further optimization possible .

          By using techniques such as multiple cumulative variables and recombination , Find ways to improve instruction level parallelism .

          Rewrite conditional operations in a functional style , Make compilation conditional data transfer .


 

Whether you change careers or not , It's better to be a beginner , It's OK to be advanced , If you want to learn programming , Advanced programmers ~

【 Worthy of attention 】 my Programming learning exchange Club !【 Click to enter 】

C Introduction to language :


 

C Language must read books :


 

版权声明
本文为[C programming learning base]所创,转载请带上原文链接,感谢
https://chowdera.com/2021/05/20210504165533161B.html

随机推荐