Bitmap Each element in the array represents an element , Commonly used for integer lookup and sorting （ First map to the array, then scan the array ）, Whether the integer is repeated, etc
stay java in , One int Type account 32 A bit , We use one int Array to represent without new int, Total memory consumption 32*32bit, Now if we use int If each bit of the bytecode represents a number , that 32 Only one number is needed int The memory space occupied by the type is enough , This will save a lot of memory in the case of large amount of data .
Specific ideas ：
1 individual int Occupy 4 Byte is 4*8=32 position , Then we only need to apply for one int The array length is int tmp[1+N/32] You can store all this data , among N Represents the total number of searches to be made ,tmp Each element in is in memory 32 Bits can correspond to decimal numbers 0~31, So we can get BitMap surface :
tmp: Can mean 0~31
tmp: Can mean 32~63
tmp Can mean 64~95
Now let's see how the decimal number is converted to the corresponding bit position ：
Assume that this 40 Billion int The data is ：6,3,8,32,36,......, So specific BitMap Expressed as ：
How to determine int The number is tmp Which subscript of the array , This can actually be directly divided by 32 Take the whole part , for example ： Integers 8 Divide 32 Rounding is equal to 0, that 8 It's just tmp On . in addition , How do we know 8 stay tmp Medium 32 Which one of the positions , This situation is direct mod On 32 Just ok, Another example is integers 8, stay tmp No 8 mod On 32 be equal to 8, So integers 8 It's just tmp The eighth in bit position （ Count from the right ）.
1： Look at a little scene > stay 3 Find out the non repeated integers among the hundred million integers , Limited memory is not enough to hold 3 Million integers .
For this scenario, I can use 2-BitMap To solve , That is to assign... To each integer 2bit, Using different 0、1 Combination to identify special meaning , Such as 00 Indicates that this integer has never appeared ,01 Indicates one occurrence ,11 It means that it has happened many times , You can find the repeated integers , The required memory space is normal BitMap Of 2 times , by ：3 Billion *2/8/1024/1024=71.5MB.
The specific process is as follows ：
Scanning 3 Million integers , Group BitMap, To look at first BitMap Corresponding position in , If 00 Has become a 01, yes 01 Has become a 11, yes 11 Keep the same , When will 3 After scanning 100 million integers, that is to say, the whole BitMap It's assembled . Check out BitMap Set the corresponding bit to 11 The integer output of .
2: It is known that a file contains some phone numbers , Each number is 8 Digit number , Count the number of different numbers .
8 Most bits 99 999 999, Probably need 99m individual bit, Probably 10 A few m Bytes of memory . （ Can be understood as from 0-99 999 999 The number of , Each number corresponds to one Bit position , So you just need to 99M individual Bit==1.2MBytes, such , Just a little 1.2M Left and right memory represents all 8 Number of phone numbers ）
BitMap The idea of the interview can be used to solve many problems , Then it will be used in many systems , It's a good way to solve the problem .
however BitMap There are also some limitations , So there will be others based on BitMap To solve these problems .
- Data collision . For example, mapping a string to BitMap There's a collision problem , Then consider using Bloom Filter To solve ,Bloom Filter The use of multiple Hash Function to reduce the probability of conflict .
Data sparseness . Another example is to deposit (10,8887983,93452134) These three data , We need to build a 99999999 Of length BitMap , But in fact, there is only 3 Data , There's a lot of space to waste , In case of such a problem , By introducing Roaring BitMap To solve .