当前位置:网站首页>Interviewer: do you understand bitmap? In what scenarios have you used it? What problems have you encountered?

Interviewer: do you understand bitmap? In what scenarios have you used it? What problems have you encountered?

2020-12-06 07:03:09 Zhu Xiaosi

Click on the above “ Zhu Xiaosi's blog ”, choice “ Set to star ”

The background to reply " book ", obtain

Bit-map The basic idea is to use one bit Bit to mark the corresponding Value, and Key That's the element . Because of the adoption of Bit Store data in units , So in terms of storage space , Can save a lot .(PS: Focus on   Save storage space

Suppose there is such a need : stay 20 Find a number out of 100 million random integers m Whether there is , And suppose 32 Bit operating system ,4G Memory

stay Java in ,int Occupy 4 byte ,1 byte =8 position (1 byte = 8 bit)

If you use int Storage , That's it 20 One hundred million int, So the space occupied is about   (2000000000*4/1024/1024/1024)≈7.45G

If bit by bit storage is different ,20 One hundred million is 20 Billion , The occupied space is about   (2000000000/8/1024/1024/1024)≈0.23G

Degree awarded , Needless to say

that , The problem is coming. , How to express a number ?

Said just now , Each bit represents a number ,0 Does not exist ,1 Indicates presence , That's exactly what binary

In this way, we can easily express {1,2,4,6} These numbers :

The minimum unit of computer memory allocation is bytes , That is to say 8 position , If it means {12,13,15} What shall I do? ?

In the other, of course 8 It means :

In this case , It's like a two-dimensional array

1 individual int Occupy 32 position , Then we only need to apply for one int The array length is int tmp[1+N/32] You can store , among N Represents the maximum of these numbers to store , So :

tmp[0]: Can be said 0~31

tmp[1]: Can be said 32~63

tmp[2]: Can be said 64~95

...

In this way , Given any integer M, that M/32 You get the subscript ,M%32 I know where it is in this subscript

add to

Here's the problem , How can we put a number in it ? for example , Want to put 5 Put this number in , How to do it? ?

First ,5/32=0,5%32=5, It should be in tmp[0] Of the 5 A place , Then we put the 1 Move to the left 5 position , Then press the position or

To binary is to

This is equivalent to 86 | 32 = 118

86 | (1<<5) = 118

b[0] = b[0] | (1<<5)

in other words , To insert a number , take 1 The left shift band represents the digit , Then with the original number by bit or operation

Simplify , Namely 86 + (5/8) | (1<<(5%8))

therefore , The formula can be summed up as :p + (i/8)|(1<<(i%8)) among ,p It means the present value ,i Indicates the number to be inserted

eliminate

Above is the addition of , So what to do if you want to clean it up ?

Or the example above , Suppose we want to 6 remove , How do you do that ?

See from the graph , Just place the number in 0 that will do

1 Move left 6 position , Just get there 6 The digit represented by this number , And then reverse by bit , Finally, it is bitwise with the original number , In this way, the position is 0 了

b[0] = b[0] & (~(1<<6))

b[0] = b[0] & (~(1<<(i%8)))

lookup

We said that before , Each represents a number ,1 Express ( Or existence ),0 It means nothing ( Or it doesn't exist ). By setting it to 1 perhaps 0 To add and remove guys , So to judge whether a data store exists or not is to judge the bit where the number is 0 still 1

hypothesis , We want to know 3 Is here or not , Then just judge  b[0] & (1<<3)  If the value is 0, There is no , If it is 1, It means being

Bitmap What's the usage?

Fast sorting of large amount of data 、 lookup 、 duplicate removal

Quick sort

Suppose we want to be right 0-7 Internal 5 Elements (4,7,2,5,3) Sort ( Let's assume that these elements are not repeated ), We can use Bit-map To achieve the purpose of sorting .

To express 8 Number , We just need 8 individual Bit(1Bytes), First we open up 1Byte Space , Put all of these spaces Bit All bits are set to 0, Then the corresponding position is 1.

Last , Go through it Bit Area , Output the number of the one bit (2,3,4,5,7), In this way, the purpose of sorting is achieved , Time complexity O(n).

advantage :

  • It's very efficient , There is no need to compare and shift ;

  • Less memory , such as N=10000000; Just use... Memory N/8=1250000Byte=1.25M

shortcoming :

  • All the data can't be repeated . That is, duplicate data cannot be sorted and searched .

  • It's only when the data is dense that there are advantages

Get rid of the weight quickly

20 The number of non repeated integers found in billion integers , There's not enough memory for this 20 Million integers .

First , according to “ There's not enough memory for this 05 Million integers ” We can quickly associate with Bit-map. The next key question is how to design our Bit-map To show that 20 The state of a billion numbers . In fact, the problem is very simple , There are only three states of a number , There is no such thing as , only one , Repeat . therefore , We just need 2bits You can store the status of a number , Suppose we set a number that doesn't exist as 00, Once in a while 01, There are two or more times 11. Then we need storage space 2G about .

The next task is to put this 20 Put in a hundred million figures ( Storage ), If the corresponding status bit is 00, Turn it into 01, It means that there is once ; If the corresponding status bit is 01, Turn it into 11, Indicates that there is already one , That is, many times ; If 11, Then the corresponding status bit remains the same , Still means there are many times .

Last , Statistics status bit is 01 The number of , And you get the number of numbers that don't repeat , The time complexity is O(n).

Quick search

This is what we said earlier ,int One element in the array is 4 Bytes take up 32 position , Then divide by 32 We know the subscript of the element , Yes 32 Mod (%32) Just know who it is , If that bit is 1, It means that there is .

Summary & review

Bitmap It is mainly used to quickly retrieve keyword status , The keyword is usually required to be a continuous sequence ( Or keywords are most of a sequential sequence ), The most basic situation , Use 1bit Indicates the status of a keyword ( Two states can be marked ), But you can also use 2bit( Express 4 States ),3bit( Express 8 States ).

Bitmap The main applications of : Means continuous ( Or close to continuous , That is, most of them will appear ) The status of the keyword sequence ( Number of States / Number of keywords   The smaller the better. ).

32 On the bit machine , For an integer number , such as int a=1 In memory 32bit position , This is for the convenience of computer operation . But for some application scenarios , It's a huge waste , Because we can use the corresponding 32bit Bits correspond to decimal bits 0-31 Number , And this is Bit-map The basic idea of .Bit-map The algorithm uses this idea to process a large number of data sorting 、 Inquiry and de duplication .

Add 1

Without overflowing the numbers , For positive and negative numbers , Moving one bit to the left is equivalent to multiplying by 2 Of 1 Power , Move left n Bit is equal to times 2 Of n Power , To shift one bit to the right is to divide 2, Move right n Bits are equal to dividing by 2 Of n Power .

<< Move left , Equivalent to times 2 Of n Power , for example :1<<6    amount to 1×64=64,3<<4 amount to 3×16=48

>> Move right , Equivalent to divided by 2 Of n Power , for example :64>>3 amount to 64÷8=8

^  Exclusive or , It's equivalent to finding the remainder , for example :48^32 amount to 48%32=16

Add 2

Don't use third-party variables , Exchange the values of two variables

//  Mode one 
a = a + b;
b = a - b;
a = a - b;

//  Mode two 
a = a ^ b;
b = a ^ b;
a = a ^ b; 

BitSet

BitSet It implements a bit vector , It can grow as needed . Every bit has a Boolean value . One BitSet The bits of can be indexed by nonnegative integers (PS: It means that every bit can represent a non negative integer ). Can find 、 Set up 、 Clear a person . You can modify another... With logical operators BitSet The content of . By default , All bits have a default value false.

You can see , It's similar to what we thought before

Use one long Array to store , Initial length 64,set Move right first when it's worth 6 position ( Equivalent to divided by 64) Calculate where in the array , Then change the status bit

It doesn't matter if you don't understand anything else , It's enough to understand these two sentences :

int wordIndex = wordIndex(bitIndex);
words[wordIndex] |= (1L << bitIndex);

Bloom Filters

Bloom filter It's a data structure , It can be used to determine whether an element is in a collection , With fast operation , Small memory footprint .

The cost of efficient insertion and query is ,Bloom Filter It's a probability based data structure : It can only tell us that an element is absolutely not or may be in a collection .

Bloom filter The basic data structure of is a Bit vector ( It can be understood as an array ).

It is mainly used in large-scale data without precise filtering , Such as checking spam address , Reptiles URL Address to heavy , Solve the problem of cache penetration

If you want to determine whether an element is in a set , The general idea is to save all the elements in the collection , And then through comparison to determine . Linked list 、 Trees 、 Hash table ( Hashtable ) And so on, the data structure is this way of thinking , But as the number of elements in the collection increases , Need more and more storage space ; At the same time, the retrieval speed is getting slower and slower , The complexity of retrieval time is O(n)、O(log n)、O(1).

The principle of Bloom filter is , When an element is added to a collection , adopt K Hash functions map this element to a set of bits (Bit array) Medium K A little bit , Set them as 1 . Search time , Just to see if all these points are 1 To know if the element is in the collection ; If any of these points 0, The inspected element must not be in ; If it's all 1, Then the inspected element is likely to be ( Reason why “ Probably ” It's the error ).

BloomFilter technological process

1、 The first thing you need to k individual hash function , Every function can put key Hash to 1 It's an integer ;

2、 On initialization , Need a length of n Array of bits , Each bit is initialized to 0;

3、 Some key When you join the assembly , use k individual hash Function calculated k Hash values , And the corresponding bit position in the array is 1;

4、 Judge a certain key Is it in the assembly , use k individual hash Function calculated k Hash values , And query the corresponding bits in the array , If all bits are 1, Think in a collection .

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.1-jre</version> 
</dependency>

source | https://www.cnblogs.com/cjsblog/p/11613708.html

Want to know more ? sweep Trace the QR code below and follow me

The background to reply " technology ", Join the technology group

【 Highlights 】

Point a praise + Looking at , Less bug ????

版权声明
本文为[Zhu Xiaosi]所创,转载请带上原文链接,感谢
https://chowdera.com/2020/12/20201206070037504h.html