
An architect's way of cache training

2020-12-07 13:57:26 · IT People's Career Advancement

A senior architect at Qiniu once said:

Nginx + business logic layer + database + cache layer + message queue: this model can cover almost all business scenarios.

All these years later, that sentence has influenced my technical choices in ways large and small, so I have spent a lot of time on cache-related technologies.

I started using caches about 10 years ago, moving from local caches to distributed caches and then to multi-level caches, and stepping into plenty of pitfalls along the way. Here I'll combine my own experience to talk about how I understand caching.

01 Local cache

1. Page-level caching

I started using caches very early. Around 2010 I used OSCache, mainly on JSP pages to implement page-level caching. The pseudo code looked like this:

<cache:cache key="foobar" scope="session">
      some jsp content
</cache:cache>

The JSP fragment between the tags is cached in the session under key="foobar", so other pages can share the cached content. In the context of JSP, an admittedly ancient technology, introducing OSCache did make pages load noticeably faster.

But with the separation of front end and back end and the rise of distributed caching, server-side page-level caching is rarely used anymore. On the front end, however, page-level caching is still very popular.

2. Object caching

Around 2011, the founder of OSChina (nicknamed "Sweet Potato") wrote many articles about caching. He mentioned that OSChina handled millions of dynamic requests every day with just one 4-core 8G server, thanks to the caching framework Ehcache.

That fascinated me: such single-machine performance from a simple framework, and I wanted to try it myself. So, following his example code, I used Ehcache for the first time, in our company's balance-withdrawal service.

The logic was very simple: cache orders that had already succeeded or failed, so the next lookup would no longer need to query the Alipay service. The pseudo code looked like this:
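A minimal sketch of that idea, with a ConcurrentHashMap standing in for Ehcache; queryAlipay() and its return value are hypothetical stand-ins for the real Alipay query:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OrderStatusCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the remote Alipay query.
    String queryAlipay(String orderId) {
        return "SUCCESS"; // placeholder final state
    }

    // Only orders in a final state (success or failure) are cached, so the
    // next poll for the same order skips the remote call entirely.
    public String getStatus(String orderId) {
        return cache.computeIfAbsent(orderId, this::queryAlipay);
    }
}
```

In the real service the cache only held orders whose state could no longer change; in-flight orders still went to Alipay every time.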

After adding the cache, the optimization effect was obvious: the task's run time dropped from the original 40 minutes to 5~10 minutes.

The example above is a typical case of 「object caching」, the most common application scenario for local caches. Compared with page caching it is finer-grained and more flexible, and it is often used to cache data that rarely changes, such as global configuration or orders in a closed state, to speed up queries overall.

3. Refresh strategy

In 2018, a colleague and I developed a configuration center. To let clients read configuration as fast as possible, the local cache used Guava. The overall structure is shown in the figure below:

How is the local cache updated? There are two mechanisms:

  • The client starts a scheduled task that pulls data from the configuration center.

  • When data changes in the configuration center, it is actively pushed to the client. I did not use websocket here, but the RocketMQ Remoting communication framework.
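The pull half of this model can be sketched roughly like this; fetchFromConfigCenter() and the 30-second interval are assumptions, not the configuration center's real API:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ConfigPuller {
    private final AtomicReference<String> localCache = new AtomicReference<>("");
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    // Hypothetical stand-in for the RPC that reads the configuration center.
    String fetchFromConfigCenter() {
        return "timeout=3000"; // placeholder payload
    }

    public void start() {
        // Prime the cache once, then refresh on a fixed schedule. The push
        // channel (RocketMQ Remoting in the article) covers changes that
        // land between two pulls.
        localCache.set(fetchFromConfigCenter());
        timer.scheduleAtFixedRate(
            () -> localCache.set(fetchFromConfigCenter()),
            30, 30, TimeUnit.SECONDS);
    }

    public String get() { return localCache.get(); }

    public void stop() { timer.shutdownNow(); }
}
```

The pull loop guarantees eventual consistency even if a push message is lost, which is why the pull mode is the baseline and the push is an accelerator.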

Later I read the source code of the Soul gateway. Its local cache update mechanism, shown in the figure below, supports 3 strategies:

▍ zookeeper watch mechanism

At startup, soul-admin writes the full data set into zookeeper; later, when data changes, it incrementally updates the corresponding zookeeper nodes. Meanwhile, soul-web listens on the configuration nodes, and as soon as any information changes it updates its local cache.

▍ websocket mechanism

The websocket mechanism is similar to the zookeeper one: when the gateway establishes a websocket connection with admin for the first time, admin pushes the full data set; afterwards, whenever the configuration data changes, the incremental data is actively pushed to soul-web over websocket.

▍ http long polling mechanism

After an http request reaches the server, it is not answered immediately; instead the response is deferred with Servlet 3.0's asynchronous mechanism. When configuration changes, the server takes the parked long-polling requests out of the queue one by one and tells each which Group's data has changed; on receiving the response, the gateway requests that Group's configuration data again.
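Long polling can be sketched in miniature like this, with CompletableFuture standing in for Servlet 3.0's AsyncContext; the group name and payload are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class LongPollServer {
    // Parked requests, grouped by the config Group they watch.
    private final Map<String, ConcurrentLinkedQueue<CompletableFuture<String>>> waiting =
        new ConcurrentHashMap<>();

    // Client side: "answer me when this Group's config changes".
    public CompletableFuture<String> poll(String group) {
        CompletableFuture<String> f = new CompletableFuture<>();
        waiting.computeIfAbsent(group, g -> new ConcurrentLinkedQueue<>()).add(f);
        return f;
    }

    // Server side: on a change, drain the queue and answer every parked
    // poll with the changed Group's name; the client then re-fetches that
    // Group's full configuration.
    public void publishChange(String group) {
        ConcurrentLinkedQueue<CompletableFuture<String>> q = waiting.remove(group);
        if (q != null) q.forEach(f -> f.complete(group));
    }
}
```

The key property is visible here: the server answers only when something actually changed, so the client is near real time without hammering the server.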

Did you notice anything?

  • The pull mode is essential
  • The incremental push designs are much the same

Long polling is an interesting pattern. It is also used in RocketMQ's consumer model: it gets close to real time while reducing the pressure on the server.

02 Distributed cache

For distributed caching, memcached and Redis are probably the most common choices. Programmers are already quite familiar with both, so I'll just share two cases here.

1. Reasonably control object size and the read strategy

In 2013 I worked for a lottery company, where our live-score module also used distributed caching. At one point I ran into an online problem of frequent Young GC; inspecting with the jstat tool showed that the young generation was filling up every two seconds.

Further analysis located the cause: the cached values of some keys were too large, about 300K on average, the largest being 500K. Objects that large made frequent GC almost inevitable.

After finding the root cause, how should it be fixed? I had no clear idea at the time, so I went to see how peer sites, including 360 Lottery and another lottery site, did the same thing. I found two things:

1. The data format was very compact: only the necessary data was returned to the front end, and part of it was returned as arrays.

2. They used websocket: the full data set was pushed when the page was opened, and incremental data was pushed on change.

Back to my own problem: what was the final solution? At that time, the cache format of our live-score module was a JSON array in which each element contained 20-odd key-value pairs. The JSON below lists only a few of the attributes as an example:

[{
     "guestTeamName": "Mavericks",
     "hostTeamName": "Lakers",
     ...
}]

This data structure is fine in ordinary cases. But once the number of fields reaches 20-plus and there are many matches every day, it can easily cause problems under high-concurrency requests.

Weighing the time available against the risk, we finally adopted a conservative optimization plan:

1) Enlarge the young generation, from the original 2G to 4G.

2) Change the cached data format from JSON to arrays, as shown below:

[["2399","Mavericks","Lakers","123"]]
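The object-to-array compaction can be sketched as follows. The field order is a convention shared with the front end; the field names matchId and score are assumptions, since only two of the 20-odd real fields appear in the example above:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class MatchCompactor {
    // Positional convention shared with the front end:
    // [matchId, guestTeamName, hostTeamName, score]
    public static List<String> compact(Map<String, String> match) {
        return Arrays.asList(
            match.get("matchId"),
            match.get("guestTeamName"),
            match.get("hostTeamName"),
            match.get("score"));
    }
}
```

Dropping the repeated key names is what shrank each cached value: with 20-plus fields per match, the keys themselves were a large share of the 300K payload.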

After the change, cached values shrank from about 300K on average to about 80K, the YGC frequency dropped noticeably, and page responses became much faster.

But after a while, the cpu load would momentarily spike from time to time. Evidently, even though we had reduced the cache size, reading large objects was still a big drain on system resources, and the Full GC frequency was still not low.

3) To solve the problem thoroughly, we adopted a more fine-grained cache read strategy.

We split the cache into two parts: the first part is the full data, the second part the incremental data (a small volume). The page pulls the full data once on first load, and when a score changes, the incremental data is pushed via websocket.

After step 3, page access became extremely fast, the servers consumed very few resources, and the optimization effect was excellent.

This optimization taught me that although caching can improve overall speed, in high-concurrency scenarios the size of cached objects still deserves attention; carelessness invites accidents. We also need to control the read strategy sensibly and minimize the GC frequency to improve overall performance.

2. Paginated list queries

How to cache lists is a skill I'm eager to share with you. It's also something I learned from OSChina back in 2012. I'll take 「querying the blog list」 as the example.

Let's start with scheme 1: cache the page content as a whole. This scheme combines the page number and page size into the cache key, with the list of blog entries as the cache value. If any blog's content changes, we either reload the cache or delete that page's entire cache.
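Scheme 1 might be sketched like this; the key format and the map standing in for the cache are assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PageCache {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    // Page number and page size combined into one cache key.
    private String key(int page, int size) {
        return "blog:page:" + page + ":" + size;
    }

    // fromDb stands in for the database query that runs on a cache miss.
    public List<String> getPage(int page, int size, List<String> fromDb) {
        return cache.computeIfAbsent(key(page, size), k -> fromDb);
    }

    // Coarse-grained invalidation: any blog update clears every cached
    // page, which is exactly why this scheme suffers under frequent updates.
    public void onBlogUpdated() {
        cache.clear();
    }
}
```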

In scheme 1 the cache granularity is relatively coarse: if blogs are updated frequently, the cache is invalidated all the time. Now for scheme 2: cache only the blog objects. The process is as follows:

1) First query the blog id list of the current page from the database; the sql looks like:

select id from blogs limit 0,10 

2) Batch-get the cached data for that id list from the cache, keeping track of the ids that miss. If the miss list is non-empty, query the database again for those blogs; the sql looks like:

select * from blogs where id in (noHitId1, noHitId2)

3) Write the blog objects that missed into the cache.

4) Return the list of blog objects.

In theory, once the cache is warmed up, one simple database query plus one batch cache fetch can return all the data. By the way, how is batch access implemented for each cache store?

  • Local cache: extremely fast, a for loop is enough
  • memcached: use the mget command
  • Redis: if the cached object structure is simple, use the mget or hmget commands; if the structure is complex, consider pipeline or lua scripts
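The four steps of scheme 2 can be sketched like this, with plain maps standing in for Redis (mget) and the database; class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BlogListCache {
    private final Map<Long, String> cache = new ConcurrentHashMap<>(); // stands in for Redis
    private final Map<Long, String> db = new HashMap<>();              // stands in for the database

    public BlogListCache(Map<Long, String> seed) { db.putAll(seed); }

    public List<String> listByIds(List<Long> ids) {
        // 1) batch get from the cache (mget), remembering the misses
        List<Long> misses = new ArrayList<>();
        Map<Long, String> found = new HashMap<>();
        for (Long id : ids) {
            String v = cache.get(id);
            if (v == null) misses.add(id); else found.put(id, v);
        }
        // 2) load the misses from the DB (select * ... where id in (...))
        // 3) and back-fill the cache
        for (Long id : misses) {
            String v = db.get(id);
            if (v != null) { cache.put(id, v); found.put(id, v); }
        }
        // 4) return the blogs in the original id order
        List<String> out = new ArrayList<>();
        for (Long id : ids) if (found.containsKey(id)) out.add(found.get(id));
        return out;
    }
}
```

Once warm, each request costs one id query plus one batch cache fetch, and updating a single blog invalidates only that blog's entry, not a whole page.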

Scheme 1 suits scenarios where the data rarely changes, such as leaderboards or home-page news feeds.

Scheme 2 suits most pagination scenarios and combines well with other systems. For example, in a search system we can obtain the blog id list from the filter conditions and then quickly fetch the blog list in the way described above.

03 Multi level cache

First, why use a multi-level cache at all?

A local cache is extremely fast, but its capacity is limited and its memory cannot be shared across machines. A distributed cache scales out in capacity, but in high-concurrency scenarios, if all data has to be fetched from the remote cache, it is easy to saturate the network bandwidth and see throughput drop.

As the saying goes: the closer the cache is to the user, the more efficient it is!

The benefit of a multi-level cache is that in high-concurrency scenarios it raises the throughput of the whole system and reduces the pressure on the distributed cache.

In 2018, the e-commerce company I worked for needed to optimize the performance of the app's home-page interface. It took me about two days to finish the whole design: a two-level cache model that also takes advantage of Guava's lazy-loading mechanism. The overall structure is shown in the figure below:

The cache read process is as follows:

1. When the service gateway has just started, the local cache holds no data, so it reads the Redis cache. If there is no data in Redis either, it reads the data through an RPC call, then writes it to both the local cache and Redis; if the Redis cache is not empty, it writes the cached data into the local cache.

2. Because step 1 warmed up the local cache, subsequent requests read the local cache directly and return to the client.

3. Guava is configured with a refresh mechanism: at fixed intervals, a custom thread pool behind the LoadingCache (5 max threads, 5 core threads) calls the shopping-guide service and syncs the data into the local cache and Redis.
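Steps 1 and 2 of the read path can be sketched like this; maps stand in for the Guava LoadingCache and for Redis, the loader is the hypothetical RPC to the shopping-guide service, and the scheduled refresh of step 3 is omitted:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class TwoLevelCache {
    private final Map<String, String> local = new ConcurrentHashMap<>(); // level 1
    private final Map<String, String> redis = new ConcurrentHashMap<>(); // level 2
    private final Function<String, String> loader;                       // RPC stand-in

    public TwoLevelCache(Function<String, String> loader) { this.loader = loader; }

    public String get(String key) {
        String v = local.get(key);
        if (v != null) return v;      // warm path: local cache, fastest
        v = redis.get(key);
        if (v == null) {              // cold start: fall through to the RPC
            v = loader.apply(key);
            redis.put(key, v);
        }
        local.put(key, v);            // back-fill the local level
        return v;
    }
}
```

Note that each gateway instance back-fills its own local map independently, which is exactly the seed of the inconsistency problem described next.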

After the optimization, performance was good, with an average latency of about 5ms. At first I thought the chance of problems was tiny, but one evening I suddenly found that the data shown on the app's home page was sometimes the same and sometimes different across requests.

In other words: although the LoadingCache threads kept calling the interface to update the cache, the data in each server's local cache was not fully consistent. This illustrates two points:

1. Lazy loading alone can still leave data inconsistent across machines.

2. The LoadingCache thread pool was not sized reasonably, causing tasks to pile up in it.

In the end, our solution was:

1. Combine lazy loading with a message mechanism to update the cached data: when the shopping-guide service's configuration changes, notify the service gateway to pull the data again and refresh the cache.

2. Appropriately enlarge the LoadingCache thread pool parameters, and instrument the pool so its usage can be monitored: when the threads are busy, an alarm fires and the pool parameters can then be adjusted dynamically.

Final thoughts

Caching is a very important technique. If you can go from principles to practice and keep digging deeper into it, that should be one of the most enjoyable things for an engineer.

This article opens the cache series. It mostly covers typical problems I've run into over 10 years of work, without going very deep into the theory.

One more thing worth discussing with you: how to learn a new technology systematically.

  • Pick the classic books on the technology and understand the basic concepts
  • Build out the knowledge map of the technology
  • Unify knowing and doing: practice it in a production environment, or build your own wheels
  • Keep reviewing: think about whether there is a better solution

I'll serialize more cache-related content later, including caching high-availability mechanisms and the principles of codis. Welcome to stay tuned.

If you have your own caching experience, or topics you'd like to learn more about, leave a comment in the comments section.

About the author: a master's graduate of a 985 university, former Amazon engineer, now a technical director at 58 Zhuanzhuan.

