
How to choose a distributed cache, and how to solve the common problems?

2020-12-06 13:10:23 Magnon blossoms

Caching systems are now widely used to increase concurrency and data throughput and to improve response times. Once the amount of data reaches a certain scale, a single-machine setup starts to fall short, and a distributed caching system becomes necessary.

1. Choosing a caching system

[Figure 1-1: overview of cache classification, distributed cache systems, Redis cluster schemes, and sharding approaches]

1.1 Cache classification

As shown in the figure above, caches can be roughly divided into four categories.

  • CDN cache: a CDN (content delivery network) caches data at its edge nodes.
  • Reverse proxy cache: for example, the cache in Nginx.
  • Local cache: represented by EhCache and Guava Cache.
  • Distributed cache: the dedicated cache systems discussed below.

1.2 Distributed cache

This article focuses on distributed caching systems. As shown in Figure 1-1, there are five kinds:

Among them, EvCache and Aerospike have relatively narrow usage scenarios.

  • EvCache: Netflix's caching solution built on Memcached and Spymemcached.
  • Aerospike: a KV NoSQL database that can run on SSDs.

Beyond those, there are three widely used caching systems.


  • Tair: open-sourced by Alibaba; supports cross-data-center deployment, scales performance linearly as nodes are added, and suits large data volumes. Tair offers three storage engines:

    • LDB: based on Google levelDB; supports KV and HashMap-like structures; slightly lower performance, but the most reliable persistence.
    • MDB: based on Memcache; supports KV and HashMap-like structures; best performance, but no persistent storage.
    • RDB: based on Redis.
  • Memcache: no data replication and weak distributed support.

  • Redis: the most widely used, with an active community.

In summary, considering applicability and stability, Redis is generally the best choice for building a caching system. The rest of this article is based on Redis.

2. Redis Cluster caching scheme

Figure 1-1 above also lists the Redis high-availability cluster schemes; there are basically three of them.

2.1 The master-slave mechanism

This is the most common cluster architecture and is easy to set up. It mainly provides read-write splitting and backup: the Master handles reads and writes, while the Slaves are responsible for backup. Its drawbacks include complicated failure recovery, difficult horizontal scaling, and limited write capacity. The structure is shown below:

[Figure: master-slave architecture]

2.2 Sentinel mechanism

Redis Sentinel is the native high-availability solution in the community edition. Each master-slave group is monitored by one or more Sentinel instances; when the Master goes down, one of its Slaves is automatically promoted to Master, keeping the system available. Sentinel provides monitoring and leader election for master-slave deployments, and its main job is to guarantee HA failover of the Master. The structure is shown below:

[Figure: sentinel architecture]

2.3 " Distributed "

Strictly speaking, the two mechanisms above only qualify as "clusters", not as truly "distributed" systems. Let's look at the distributed solutions next.

A cluster emphasizes high availability; a distributed system builds on top of a cluster.

3. Redis distributed caching schemes

Any distributed storage system must first face the problem of sharding. As shown in Figure 1-1 above, there are three approaches to it.

3.1 Client-side sharding

As the name suggests, the routing of data shards is handled by the client. However, the sharding is static and hard to maintain, so this approach is basically not considered.

3.2 Proxy sharding

A proxy routes requests to specific Redis instances. There are two common solutions.

  • Twemproxy: open-sourced by Twitter; lightweight, but no longer maintained, cannot scale out or shrink smoothly, is not operations-friendly, and offers only average performance.
  • Codis: open-sourced by Wandoujia (Pea Pod); supports smooth horizontal scaling, has a mature operations platform, and is faster than Twemproxy. Codis has been widely used in China, and many companies have built their own solutions on the same idea. However, Codis is no longer maintained either.

In fact, both proxy sharding schemes were born when Redis itself had no good distributed solution; once the official project provided a better one, they stopped being maintained.

3.3 Server-side sharding

This is the official Redis solution: Redis Cluster.

Before Redis 3.0 there was no good official distributed solution, which is why the third-party ones appeared. Starting with 3.0, Redis provides an official, decentralized distributed solution: the cluster contains 16384 hash slots, and each node is responsible for a portion of them.

Let's look at the topology first:

[Figure: Redis Cluster topology]

Each node opens two TCP connections: one serves client requests, and the other handles communication between the nodes.

Now we have to talk about CAP: Consistency, Availability, and Partition tolerance. A distributed system has to sacrifice one of the three. The main design goals of Redis Cluster are high performance, high availability, and high scalability, so some data consistency has to be given up.

  • Data consistency: Redis Cluster uses asynchronous replication, so in some cases, for example when the Master goes down before a write has been replicated to a Slave, writes may be lost. When synchronous writes are absolutely required, the WAIT command can be used, which greatly reduces the chance of losing writes.

  • Availability: when some nodes in the cluster fail, the cluster as a whole can still respond to client reads and writes.

    • Nodes ping each other periodically. When more than half of the Masters decide that a node has failed, it is marked FAIL and the failure is broadcast to the cluster. If the offline node is a Master that owns slots, one of its Slaves is elected to take over.


  • High performance and scalability: when a key is operated on, there is no extra node-lookup step before the request is processed; the client is simply redirected to the node that owns the key. Compared with the proxy approach, this also removes the connection overhead of an extra proxy layer.

    • Multi-key operations, however, require the keys to sit in the same slot. This is what hash tags are for: wrapping part of a key in {} forces those keys to map to the same slot, so multi-key commands can run (see the sketch after this list).
    • As for scaling, Redis Cluster supports linear scaling up to roughly 1000 nodes; after new nodes join the cluster, slots can be evenly reassigned to them from the existing nodes via commands.
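
The following is a minimal hash-tag sketch using Spring Data Redis; the key names and the StringRedisTemplate parameter are illustrative assumptions, not part of the original article.

import java.util.Arrays;
import java.util.List;
import org.springframework.data.redis.core.StringRedisTemplate;

public class HashTagExample {
    // Both keys share the {user:1000} hash tag, so they hash to the same slot
    // and a multi-key command such as MGET works against the cluster.
    public List<String> loadUserData(StringRedisTemplate redisTemplate) {
        List<String> keys = Arrays.asList("{user:1000}:profile", "{user:1000}:settings");
        return redisTemplate.opsForValue().multiGet(keys);
    }
}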

4. Cache FAQ

The sections above briefly introduced the common cache systems and listed the Redis-based cluster schemes. Now let's look at the common problems of a caching system.

As shown in the figure below, there are seven common problems.

[Figure: seven common caching problems]

4.1. Cache penetration

Requests for data that does not exist bypass the cache and go straight to the data source; when there are many such requests, they put pressure on the DB.

  • Empty key: for data that does not exist, store a null value under the key in the cache anyway, so the next request is answered from the cache. This only works when the set of missing keys is limited and the same keys are requested repeatedly; if the missing keys are numerous and rarely repeat, it just creates a lot of useless keys.
  • Bloom filter: a Bloom filter is a long bit vector plus a series of hash functions. It adds a filtering layer that checks whether an element exists in a set before the request goes any further, with high space and time efficiency. However, hash collisions can cause false positives, and because the keys themselves are not stored, entries cannot be deleted. It suits the case where the missing keys vary widely and repeat rarely (see the sketch after this list).
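
As a rough sketch of the Bloom filter approach, the snippet below uses Guava's BloomFilter; the expected size, false-positive rate, and the methods around it are illustrative assumptions.

import java.nio.charset.StandardCharsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class PenetrationGuard {
    // Sized for about one million keys with a ~1% false-positive rate (arbitrary numbers).
    private final BloomFilter<String> existingIds =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    // Call this whenever data is written, so the filter knows which ids exist.
    public void recordId(String id) {
        existingIds.put(id);
    }

    // If this returns false the id definitely does not exist, so skip the cache and DB entirely.
    public boolean mayExist(String id) {
        return existingIds.mightContain(id);
    }
}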

4.2. Cache breakdown

Cache breakdown is essentially a special case of cache avalanche: when a hot key expires, a flood of requests breaks through to the DB.

  • Mutex: when the cache entry expires, do not load from the DB immediately; first use a command such as SETNX to set a mutex key. If the operation succeeds, the thread holds the lock, loads from the DB, and refreshes the cache; otherwise it sleeps briefly and retries the cache get. Watch out for the risk of deadlock (a sketch follows this list).

  • Never expire

    • "Never expire" can mean two things. The first is simply setting no expiration time, so the key really never expires; that case is straightforward.
    • The other is a logical expiry handled in business logic: store the key's expiration timestamp alongside the value, have each request check whether that timestamp has passed, and if so trigger an asynchronous background refresh.
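
A minimal sketch of the mutex approach with Spring Data Redis follows; the lock key format, timeouts, and the loadFromDb helper are assumptions for illustration.

import java.time.Duration;
import org.springframework.data.redis.core.StringRedisTemplate;

public class BreakdownProtectedLoader {
    private final StringRedisTemplate redis;

    public BreakdownProtectedLoader(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public String get(String key) throws InterruptedException {
        String value = redis.opsForValue().get(key);
        if (value != null) {
            return value;
        }
        // Equivalent to SET lock:key 1 NX EX 10 -- only one caller wins the lock.
        String lockKey = "lock:" + key;
        Boolean locked = redis.opsForValue().setIfAbsent(lockKey, "1", Duration.ofSeconds(10));
        if (Boolean.TRUE.equals(locked)) {
            try {
                value = loadFromDb(key);
                redis.opsForValue().set(key, value, Duration.ofMinutes(30));
            } finally {
                redis.delete(lockKey); // always release the lock to limit the deadlock risk
            }
            return value;
        }
        Thread.sleep(50); // did not get the lock: back off briefly, then retry the cache
        return get(key);
    }

    private String loadFromDb(String key) {
        return "value-from-db"; // placeholder for the real database query
    }
}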

4.3. Cache avalanche

A large number of cache entries expire (or the cache itself fails) at the same time, and the requests all reach the DB.

  • Random expiry: when setting expiration times, use a base time plus a random offset, so that keys do not all expire in one batch (a sketch follows this list).
  • Background refresh: move expiry handling to a scheduled background thread that refreshes entries.
  • Rate limiting + local cache: for example an EhCache local cache combined with Hystrix for rate limiting.
  • Double cache: similar to a primary and secondary cache, where the secondary key never expires.
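
A small sketch of the random-expiry idea follows; the base TTL and jitter range are arbitrary assumptions.

import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
import org.springframework.data.redis.core.StringRedisTemplate;

public class JitteredCacheWriter {
    private final StringRedisTemplate redis;

    public JitteredCacheWriter(StringRedisTemplate redis) {
        this.redis = redis;
    }

    // Base TTL of 30 minutes plus 0-300 seconds of jitter, so entries written
    // in the same batch do not all expire at the same moment.
    public void put(String key, String value) {
        long jitterSeconds = ThreadLocalRandom.current().nextLong(0, 300);
        redis.opsForValue().set(key, value, Duration.ofMinutes(30).plusSeconds(jitterSeconds));
    }
}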

4.4. Cache update and consistency

How is data consistency guaranteed? Four update strategies are listed below:

  • Cache Aside: the most commonly used. On a miss, load the data from the source, return it, and fill the cache; on a hit, return the cached data; on an update, update the data source first and then update (or invalidate) the cache (a sketch follows this list).

  • Write Back: on update, only the cache is updated, not the data source; the cache later writes back to the database asynchronously in batches.

  • Read/Write Through

    • Write Through: on an update, if the cache misses, update the database directly and return; if the cache hits, update the cache, and the cache then updates the database itself.
    • Read Through: access to the data source is handled by the caching layer; on a read, if the cache entry is missing or expired, the cache fetches the source data and refreshes itself.
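
A minimal Cache Aside sketch follows; the key prefix, TTL, and the database helpers are assumptions for illustration.

import java.time.Duration;
import org.springframework.data.redis.core.StringRedisTemplate;

public class CacheAsideRepository {
    private final StringRedisTemplate redis;

    public CacheAsideRepository(StringRedisTemplate redis) {
        this.redis = redis;
    }

    // Read path: return cached data on a hit; on a miss, load from the source and fill the cache.
    public String read(String id) {
        String key = "item:" + id;
        String cached = redis.opsForValue().get(key);
        if (cached != null) {
            return cached;
        }
        String fresh = loadFromDb(id);
        redis.opsForValue().set(key, fresh, Duration.ofMinutes(30));
        return fresh;
    }

    // Write path: update the data source first, then invalidate the cache entry.
    public void write(String id, String value) {
        saveToDb(id, value);
        redis.delete("item:" + id);
    }

    private String loadFromDb(String id) { return "db-value"; }      // placeholder read
    private void saveToDb(String id, String value) { /* placeholder write */ }
}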

4.5. Hot data

Ways to handle hot data:

  • Split complex structures: for example, break a large secondary data structure apart, so that one hot key becomes several keys distributed across different nodes.
  • Migrate hotspots: in Redis Cluster, migrate the slot holding the hot key to a dedicated node to relieve pressure on the other nodes.
  • Multiple copies: duplicate the cached data into several copies so that requests spread over multiple nodes and no single cache server bears all the load; suited to read-heavy, write-light workloads (a sketch follows this list).
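
A rough sketch of the multiple-copies idea; the replica count and key naming are assumptions. Each copy gets a numeric suffix, so the copies usually land in different slots and reads spread across nodes.

import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
import org.springframework.data.redis.core.StringRedisTemplate;

public class HotKeyReplicas {
    private static final int COPIES = 5; // illustrative replica count
    private final StringRedisTemplate redis;

    public HotKeyReplicas(StringRedisTemplate redis) {
        this.redis = redis;
    }

    // Write the same value under several suffixed keys.
    public void put(String key, String value) {
        for (int i = 0; i < COPIES; i++) {
            redis.opsForValue().set(key + ":" + i, value, Duration.ofMinutes(10));
        }
    }

    // Read from a random copy so the load spreads across nodes.
    public String get(String key) {
        int index = ThreadLocalRandom.current().nextInt(COPIES);
        return redis.opsForValue().get(key + ":" + index);
    }
}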

4.6. Cache preheating

This means loading certain data into the cache ahead of time, so that, for example, a burst of requests for hot data does not go straight to the database. A minimal sketch follows.
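
Below is a preheating sketch with Spring Boot: an ApplicationRunner that loads hot data into Redis at startup. The hotItemIds list and the loadFromDb helper are assumptions for illustration.

import java.time.Duration;
import java.util.List;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;

@Component
public class CacheWarmer implements ApplicationRunner {
    private final StringRedisTemplate redis;

    public CacheWarmer(StringRedisTemplate redis) {
        this.redis = redis;
    }

    @Override
    public void run(ApplicationArguments args) {
        // Fill the cache with known hot items before traffic arrives.
        for (String id : hotItemIds()) {
            redis.opsForValue().set("item:" + id, loadFromDb(id), Duration.ofHours(1));
        }
    }

    private List<String> hotItemIds() { return List.of("1", "2", "3"); } // placeholder list
    private String loadFromDb(String id) { return "db-value"; }          // placeholder loader
}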

4.7. Cache degradation

When traffic surges, a service has problems, or a non-core service affects the performance of the core flow, the main service must remain available. Degradation can be triggered automatically based on key metrics, or switches can be configured for manual degradation. A small sketch follows.
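
A tiny sketch of a manual degradation switch; the flag handling and fallback value are assumptions (in practice the switch usually comes from a configuration center).

import java.util.concurrent.atomic.AtomicBoolean;

public class DegradableRecommendationService {
    private final AtomicBoolean degraded = new AtomicBoolean(false);

    // Flip this from an admin endpoint or a configuration listener.
    public void setDegraded(boolean on) {
        degraded.set(on);
    }

    public String recommend(String userId) {
        if (degraded.get()) {
            return "default-recommendations"; // cheap fallback that protects the core flow
        }
        return callRecommendationBackend(userId);
    }

    private String callRecommendationBackend(String userId) {
        return "personalized-recommendations"; // placeholder for the real (non-core) call
    }
}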

5. Redis Cluster Use

Setting up and using a Redis Cluster environment is very simple.

However it is deployed, just bring up n Redis services that can reach one another, then on any one of them run:

redis-cli --cluster create IP1:port1 IP2:port2 IP3:port3 IP4:port4 IP5:port5 IP6:port6 ... --cluster-replicas 1

That is all. Afterwards, the cluster nodes and cluster info commands can be used to view cluster and node information.

For typical Java development, Spring Data Redis has supported Redis Cluster since version 1.7; just configure the Master node addresses (and the password):

spring.redis.cluster.nodes=ip1:port1,ip2:port2,ip3:port3

Add the dependency:

compile("org.springframework.boot:spring-boot-starter-data-redis")

After that, it can be used through RedisTemplate.
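
A minimal usage sketch, assuming the cluster nodes above are configured and a StringRedisTemplate bean is auto-configured by Spring Boot; the key names are illustrative.

import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class GreetingCache {
    private final StringRedisTemplate redisTemplate;

    public GreetingCache(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public void save(String name) {
        // The cluster-aware client routes this write to the node that owns the key's slot.
        redisTemplate.opsForValue().set("greeting:" + name, "hello " + name);
    }

    public String load(String name) {
        return redisTemplate.opsForValue().get("greeting:" + name);
    }
}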

6. Summary

This article started from the choice of caching system, introduced several Redis-based cluster schemes with a focus on Redis Cluster, then listed the common problems of caching systems and their solutions, and finally gave a brief note on usage.

Of course, how to put all this into practice and how to solve these problems still has to be worked out for the actual scenario.

Reference material

Original article: https://mp.weixin.qq.com/s/cONKlGQze1WzyC2yzbZ0RA
