Storage rules of integers and floating point numbers in memory

2020-12-06 15:22:07

Author | Night breeze

layout | strongerHuang

Why does our code lose precision, or simply produce wrong results, when it casts between floating-point numbers and integers or prints them out?

To figure this out, you need to know the storage rules for integers and floating-point numbers.

Embedded column

1

Floating point storage rules

According to IEEE 754, the floating-point standard of the IEEE (Institute of Electrical and Electronics Engineers), any binary floating-point number NUM can be written as:

NUM = (-1)^S * M * 2^E (S is the sign, E is the exponent, M is the significand)

① When S is 0, the number is positive; when S is 1, the number is negative;

② M is the significand, with 1 <= M < 2;

③ 2^E is the exponent part.

For example, decimal 3.0 is 11.0 in binary, which can be written as (-1)^0 * 1.1 * 2^1 (the 1.1 here is binary, that is, 1.5 in decimal).

Another example: decimal -3.0 is -11.0 in binary, which can be written as (-1)^1 * 1.1 * 2^1.

The standard specifies that the float type has 1 sign bit (S), 8 exponent bits (E), and 23 significand bits (M).

The double type has 1 sign bit (S), 11 exponent bits (E), and 52 significand bits (M).

IEEE 754 has special rules for the significand M and the exponent E (taking float as the example):

1. Because M always satisfies 1 <= M < 2, it can always be written in the form 1.xxxxxxx. The standard therefore omits the leading 1 when storing M and stores only the digits after the binary point.

This saves space. Taking float as an example, 23 fraction bits are stored, and together with the omitted leading 1 those 23 bits represent 24 significant bits.

2. E (the exponent) is stored as an unsigned integer, so the stored value ranges from 0 to 255. But an exponent can be negative, so the standard adds a bias (the middle value, 127) to the real exponent when storing E and subtracts the bias (127) when using it. The real value of E therefore ranges from -127 to 128 (the two end values are reserved for the special cases below).

The stored value of E also falls into three cases:

① E is neither all 0s nor all 1s:

The normal calculation rules apply: the real value of E is the stored value minus 127 (the bias), and the omitted leading 1 is added back to M.

② E is all 0s:

The real exponent is taken as 1-127 (that is, -126), and the omitted leading 1 is not added back to M, so M becomes a fraction of the form 0.xxxxxxxx. This is how ±0 and numbers very close to 0 are represented.

So take care when comparing floating-point numbers with 0.

③ E is all 1s:

When M is all 0s, the number is ± infinity (depending on the sign bit); when M is not all 0s, the value is not a number (NaN).


2

Test

The code is as follows:

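#include <stdio.h>

void test(void)
{
    float m = 134.375f;
    /* View the float's storage one byte at a time */
    unsigned char *a = (unsigned char *)&m;

    /* %p expects a void * argument; its output format is implementation-defined */
    printf("%p:%d\n", (void *)a, *a);
    printf("%p:%d\n", (void *)(a + 1), *(a + 1));
    printf("%p:%d\n", (void *)(a + 2), *(a + 2));
    printf("%p:%d\n", (void *)(a + 3), *(a + 3));
}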

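Code output (on a little-endian machine; the addresses vary from run to run): from the lowest address upward, the four bytes are 0, 96, 6 and 67.

The specific calculation process is as follows:

134.375 in binary is 10000110.011, which is 1.0000110011 * 2^7.

S = 0, because the number is positive.

E = 7 + 127 = 134, which is 10000110 in binary.

M = 00001100110000000000000 (the leading 1 is omitted and the rest is padded with 0s to 23 bits).

Put together: 0 10000110 00001100110000000000000, which is 0x43066000. Stored low byte first, the bytes are 0x00, 0x60, 0x06, 0x43, that is, 0, 96, 6, 67 in decimal.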

3

Loss of accuracy

To convert a decimal fraction to binary, multiply the fractional part by 2 and take the integer part of the result as the next binary digit; then keep multiplying the remaining fractional part by 2 until nothing remains.

For example, 0.2 converts as follows (the last column is the binary digit produced):

0.2 x 2 = 0.4    0

0.4 x 2 = 0.8    0

0.8 x 2 = 1.6    1

0.6 x 2 = 1.2    1

0.2 x 2 = 0.4    0

0.4 x 2 = 0.8    0

0.8 x 2 = 1.6    1

That is: 0.0011001… (the group 0011 repeats forever).

It is an infinitely repeating binary fraction, and only a finite number of bits can be stored. That is why precision is lost when a decimal fraction is converted to a binary fraction.

Not long ago I shared 《What is the difference between single-precision, double-precision, multi-precision and mixed-precision calculation?》 with you. If it was not quite clear at the time, does it make more sense now that you have seen the storage rules of floating-point numbers?


4

Storage rules of integers

Once you understand the storage rules of floating-point numbers, integers are easy to understand.

Integers are stored in memory in complement (two's complement) form, and they can be positive or negative. When a signed number is stored, the highest bit indicates the sign: 0 for positive and 1 for negative.

The inverse code (ones' complement) and the complement (two's complement) of a positive number are the number itself, so the following is mainly about negative numbers. The inverse code keeps the sign bit of the original code (sign-magnitude form) and inverts the remaining bits one by one; the complement is the inverse code plus 1.

Test code ：

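#include <stdint.h>
#include <stdio.h>

void test(void)
{
    int8_t n = -123;
    /* Read the same byte back as an unsigned value */
    uint8_t *p = (uint8_t *)&n;

    printf("%d\n", n);   /* the signed value */
    printf("%d\n", *p);  /* the stored bit pattern read as an unsigned number */
}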

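Output results: -123 and 133.

The calculation process is as follows: the original code of -123 is 1111 1011; inverting every bit except the sign bit gives the inverse code 1000 0100; adding 1 gives the complement 1000 0101, which is 133 when read as an unsigned number.

Material source: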

https://blog.csdn.net/u014470361/article/details/79820892

Disclaimer: This article comes from the internet, and the copyright belongs to the original author. If there is any copyright issue with the work, please contact me to have it deleted.


https://chowdera.com/2020/12/202012061521340189.html