当前位置：网站首页>Interviewer: do you understand bitmap? In what scenarios have you used it? What problems have you encountered?
Interviewer: do you understand bitmap? In what scenarios have you used it? What problems have you encountered?
20201206 07:03:09 【Zhu Xiaosi】
Click on the above “ Zhu Xiaosi's blog ”, choice “ Set to star ”
The background to reply " book ", obtain
Bitmap The basic idea is to use one bit Bit to mark the corresponding Value, and Key That's the element . Because of the adoption of Bit Store data in units , So in terms of storage space , Can save a lot .（PS： Focus on Save storage space ）
Suppose there is such a need ： stay 20 Find a number out of 100 million random integers m Whether there is , And suppose 32 Bit operating system ,4G Memory
stay Java in ,int Occupy 4 byte ,1 byte =8 position （1 byte = 8 bit）
If you use int Storage , That's it 20 One hundred million int, So the space occupied is about (2000000000*4/1024/1024/1024)≈7.45G
If bit by bit storage is different ,20 One hundred million is 20 Billion , The occupied space is about (2000000000/8/1024/1024/1024)≈0.23G
Degree awarded , Needless to say
that , The problem is coming. , How to express a number ？
Said just now , Each bit represents a number ,0 Does not exist ,1 Indicates presence , That's exactly what binary
In this way, we can easily express {1,2,4,6} These numbers ：
The minimum unit of computer memory allocation is bytes , That is to say 8 position , If it means {12,13,15} What shall I do? ？
In the other, of course 8 It means ：
In this case , It's like a twodimensional array
1 individual int Occupy 32 position , Then we only need to apply for one int The array length is int tmp[1+N/32] You can store , among N Represents the maximum of these numbers to store , So ：
tmp[0]： Can be said 0~31
tmp[1]： Can be said 32~63
tmp[2]： Can be said 64~95
...
In this way , Given any integer M, that M/32 You get the subscript ,M%32 I know where it is in this subscript
add to
Here's the problem , How can we put a number in it ？ for example , Want to put 5 Put this number in , How to do it? ？
First ,5/32=0,5%32=5, It should be in tmp[0] Of the 5 A place , Then we put the 1 Move to the left 5 position , Then press the position or
To binary is to
This is equivalent to 86  32 = 118
86  (1<<5) = 118
b[0] = b[0]  (1<<5)
in other words , To insert a number , take 1 The left shift band represents the digit , Then with the original number by bit or operation
Simplify , Namely 86 + (5/8)  (1<<(5%8))
therefore , The formula can be summed up as ：p + (i/8)(1<<(i%8)) among ,p It means the present value ,i Indicates the number to be inserted
eliminate
Above is the addition of , So what to do if you want to clean it up ？
Or the example above , Suppose we want to 6 remove , How do you do that ？
See from the graph , Just place the number in 0 that will do
1 Move left 6 position , Just get there 6 The digit represented by this number , And then reverse by bit , Finally, it is bitwise with the original number , In this way, the position is 0 了
b[0] = b[0] & (~(1<<6))
b[0] = b[0] & (~(1<<(i%8)))
lookup
We said that before , Each represents a number ,1 Express （ Or existence ）,0 It means nothing （ Or it doesn't exist ）. By setting it to 1 perhaps 0 To add and remove guys , So to judge whether a data store exists or not is to judge the bit where the number is 0 still 1
hypothesis , We want to know 3 Is here or not , Then just judge b[0] & (1<<3) If the value is 0, There is no , If it is 1, It means being
Bitmap What's the usage?
Fast sorting of large amount of data 、 lookup 、 duplicate removal
Quick sort
Suppose we want to be right 07 Internal 5 Elements (4,7,2,5,3) Sort （ Let's assume that these elements are not repeated ）, We can use Bitmap To achieve the purpose of sorting .
To express 8 Number , We just need 8 individual Bit（1Bytes）, First we open up 1Byte Space , Put all of these spaces Bit All bits are set to 0, Then the corresponding position is 1.
Last , Go through it Bit Area , Output the number of the one bit （2,3,4,5,7）, In this way, the purpose of sorting is achieved , Time complexity O(n).
advantage :
It's very efficient , There is no need to compare and shift ;
Less memory , such as N=10000000; Just use... Memory N/8=1250000Byte=1.25M
shortcoming :
All the data can't be repeated . That is, duplicate data cannot be sorted and searched .
It's only when the data is dense that there are advantages
Get rid of the weight quickly
20 The number of non repeated integers found in billion integers , There's not enough memory for this 20 Million integers .
First , according to “ There's not enough memory for this 05 Million integers ” We can quickly associate with Bitmap. The next key question is how to design our Bitmap To show that 20 The state of a billion numbers . In fact, the problem is very simple , There are only three states of a number , There is no such thing as , only one , Repeat . therefore , We just need 2bits You can store the status of a number , Suppose we set a number that doesn't exist as 00, Once in a while 01, There are two or more times 11. Then we need storage space 2G about .
The next task is to put this 20 Put in a hundred million figures （ Storage ）, If the corresponding status bit is 00, Turn it into 01, It means that there is once ; If the corresponding status bit is 01, Turn it into 11, Indicates that there is already one , That is, many times ; If 11, Then the corresponding status bit remains the same , Still means there are many times .
Last , Statistics status bit is 01 The number of , And you get the number of numbers that don't repeat , The time complexity is O(n).
Quick search
This is what we said earlier ,int One element in the array is 4 Bytes take up 32 position , Then divide by 32 We know the subscript of the element , Yes 32 Mod （%32） Just know who it is , If that bit is 1, It means that there is .
Summary & review
Bitmap It is mainly used to quickly retrieve keyword status , The keyword is usually required to be a continuous sequence （ Or keywords are most of a sequential sequence ）, The most basic situation , Use 1bit Indicates the status of a keyword （ Two states can be marked ）, But you can also use 2bit（ Express 4 States ）,3bit（ Express 8 States ）.
Bitmap The main applications of ： Means continuous （ Or close to continuous , That is, most of them will appear ） The status of the keyword sequence （ Number of States / Number of keywords The smaller the better. ）.
32 On the bit machine , For an integer number , such as int a=1 In memory 32bit position , This is for the convenience of computer operation . But for some application scenarios , It's a huge waste , Because we can use the corresponding 32bit Bits correspond to decimal bits 031 Number , And this is Bitmap The basic idea of .Bitmap The algorithm uses this idea to process a large number of data sorting 、 Inquiry and de duplication .
Add 1
Without overflowing the numbers , For positive and negative numbers , Moving one bit to the left is equivalent to multiplying by 2 Of 1 Power , Move left n Bit is equal to times 2 Of n Power , To shift one bit to the right is to divide 2, Move right n Bits are equal to dividing by 2 Of n Power .
<< Move left , Equivalent to times 2 Of n Power , for example ：1<<6 amount to 1×64=64,3<<4 amount to 3×16=48
>> Move right , Equivalent to divided by 2 Of n Power , for example ：64>>3 amount to 64÷8=8
^ Exclusive or , It's equivalent to finding the remainder , for example ：48^32 amount to 48%32=16
Add 2
Don't use thirdparty variables , Exchange the values of two variables
// Mode one
a = a + b;
b = a  b;
a = a  b;
// Mode two
a = a ^ b;
b = a ^ b;
a = a ^ b;
BitSet
BitSet It implements a bit vector , It can grow as needed . Every bit has a Boolean value . One BitSet The bits of can be indexed by nonnegative integers （PS： It means that every bit can represent a non negative integer ）. Can find 、 Set up 、 Clear a person . You can modify another... With logical operators BitSet The content of . By default , All bits have a default value false.
You can see , It's similar to what we thought before
Use one long Array to store , Initial length 64,set Move right first when it's worth 6 position （ Equivalent to divided by 64） Calculate where in the array , Then change the status bit
It doesn't matter if you don't understand anything else , It's enough to understand these two sentences ：
int wordIndex = wordIndex(bitIndex);
words[wordIndex] = (1L << bitIndex);
Bloom Filters
Bloom filter It's a data structure , It can be used to determine whether an element is in a collection , With fast operation , Small memory footprint .
The cost of efficient insertion and query is ,Bloom Filter It's a probability based data structure ： It can only tell us that an element is absolutely not or may be in a collection .
Bloom filter The basic data structure of is a Bit vector （ It can be understood as an array ）.
It is mainly used in largescale data without precise filtering , Such as checking spam address , Reptiles URL Address to heavy , Solve the problem of cache penetration
If you want to determine whether an element is in a set , The general idea is to save all the elements in the collection , And then through comparison to determine . Linked list 、 Trees 、 Hash table （ Hashtable ） And so on, the data structure is this way of thinking , But as the number of elements in the collection increases , Need more and more storage space ; At the same time, the retrieval speed is getting slower and slower , The complexity of retrieval time is O(n)、O(log n)、O(1).
The principle of Bloom filter is , When an element is added to a collection , adopt K Hash functions map this element to a set of bits （Bit array） Medium K A little bit , Set them as 1 . Search time , Just to see if all these points are 1 To know if the element is in the collection ; If any of these points 0, The inspected element must not be in ; If it's all 1, Then the inspected element is likely to be （ Reason why “ Probably ” It's the error ）.
BloomFilter technological process
1、 The first thing you need to k individual hash function , Every function can put key Hash to 1 It's an integer ;
2、 On initialization , Need a length of n Array of bits , Each bit is initialized to 0;
3、 Some key When you join the assembly , use k individual hash Function calculated k Hash values , And the corresponding bit position in the array is 1;
4、 Judge a certain key Is it in the assembly , use k individual hash Function calculated k Hash values , And query the corresponding bits in the array , If all bits are 1, Think in a collection .
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>28.1jre</version>
</dependency>
source  https://www.cnblogs.com/cjsblog/p/11613708.html
Want to know more ？ sweep Trace the QR code below and follow me
The background to reply " technology ", Join the technology group
【 Highlights 】
original OpenAPI Standard specification
It's so simple  ES The most detailed use of the tutorial
ClickHouse What is it ？ Why is it so cool ！
original ElasticSearch That's understandable
interviewer ：InnoDB One of them B+ How many rows of data can a tree hold ？
How to decouple in microservices ？ How to reconstruct under tight coupling ？
How to build a set of high performance 、 High availability 、 Low cost video processing system ？
The way of Architecture ： Separate business logic from technical details
Starbucks doesn't use twophase commit
Point a praise + Looking at , Less bug ????
版权声明
本文为[Zhu Xiaosi]所创，转载请带上原文链接，感谢
https://chowdera.com/2020/12/20201206070037504h.html
边栏推荐
 C++ 数字、string和char*的转换
 C++学习——centos7上部署C++开发环境
 C++学习——一步步学会写Makefile
 C++学习——临时对象的产生与优化
 C++学习——对象的引用的用法
 C++编程经验（6）：使用C++风格的类型转换
 Won the CKA + CKS certificate with the highest gold content in kubernetes in 31 days!
 C + + number, string and char * conversion
 C + + Learning  capacity() and resize() in C + +
 C + + Learning  about code performance optimization
猜你喜欢

C + + programming experience (6): using C + + style type conversion

Latest party and government work report ppt  Park ppt

在线身份证号码提取生日工具

Online ID number extraction birthday tool

️野指针？悬空指针？️ 一文带你搞懂！

Field pointer? Dangling pointer? This article will help you understand!

HCNA Routing＆Switching之GVRP

GVRP of hcna Routing & Switching

Seq2Seq实现闲聊机器人

【闲聊机器人】seq2seq模型的原理
随机推荐
 LeetCode 91. 解码方法
 Seq2seq implements chat robot
 [chat robot] principle of seq2seq model
 Leetcode 91. Decoding method
 HCNA Routing＆Switching之GVRP
 GVRP of hcna Routing & Switching
 HDU7016 Random Walk 2
 [Code+＃1]Yazid 的新生舞会
 CF1548C The Three Little Pigs
 HDU7033 Typing Contest
 HDU7016 Random Walk 2
 [code + 1] Yazid's freshman ball
 CF1548C The Three Little Pigs
 HDU7033 Typing Contest
 Qt Creator 自动补齐变慢的解决
 HALCON 20.11：如何处理标定助手品质问题
 HALCON 20.11：标定助手使用注意事项
 Solution of QT creator's automatic replenishment slowing down
 Halcon 20.11: how to deal with the quality problem of calibration assistant
 Halcon 20.11: precautions for use of calibration assistant
 “十大科学技术问题”揭晓！青年科学家50²论坛
 "Top ten scientific and technological issues" announced Young scientists 50 ² forum
 求反转链表
 Reverse linked list
 js的数据类型
 JS data type
 记一次文件读写遇到的bug
 Remember the bug encountered in reading and writing a file
 单例模式
 Singleton mode
 在这个 N 多编程语言争霸的世界，C++ 究竟还有没有未来？
 In this world of N programming languages, is there a future for C + +?
 es6模板字符
 js Promise
 js 数组方法 回顾
 ES6 template characters
 js Promise
 JS array method review
 【Golang】️走进 Go 语言️ 第一课 Hello World
 [golang] go into go language lesson 1 Hello World