Troubleshooting FGC in online services: this is all you need!

2020-11-09 11:36:39 Geek Xiaozhi

How GC works

I previously shared an article analyzing a GC problem case.

Analyzing that case drew on a lot of GC principles. If you try to handle such a problem without understanding them, the whole investigation is largely guesswork.

Here I pick a few core concepts to explain how GC works, and I finish with a practical troubleshooting guide.

1. Heap memory structure

GC comes in two kinds, YGC (young GC) and FGC (full GC), and both operate on the JVM heap. Let's first look at the heap memory structure in JDK 8:

As you can see, the heap is generational and consists of the young generation and the old generation. The young generation is further divided into Eden, the From Survivor space (S0 for short) and the To Survivor space (S1 for short), with a default size ratio of 8:1:1. The default size ratio between the young and old generations is 1:2.
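To make these ratios concrete, here is a minimal Java sketch that splits a heap of a given size. It assumes the HotSpot defaults behind the ratios above: -XX:NewRatio=2 (old:young = 2:1) and -XX:SurvivorRatio=8 (Eden:S0:S1 = 8:1:1). It ignores alignment and adaptive sizing, so real region sizes will differ somewhat, and G1 does not use this fixed split at all. The class name HeapLayout and the 600 MB heap are purely illustrative.

```java
/** Sketch: split a heap using NewRatio / SurvivorRatio style ratios (alignment and adaptive sizing ignored). */
final class HeapLayout {
    /** Young generation size when old:young = newRatio:1. */
    static long youngSize(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    /** Eden size when Eden:S0:S1 = survivorRatio:1:1. */
    static long edenSize(long youngMb, int survivorRatio) {
        return youngMb * survivorRatio / (survivorRatio + 2);
    }

    /** Size of one survivor space (S0 or S1). */
    static long survivorSize(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long heap = 600;                    // hypothetical 600 MB heap
        long young = youngSize(heap, 2);    // young:old = 1:2
        long old = heap - young;
        System.out.printf("young=%d old=%d eden=%d s0=%d s1=%d (MB)%n",
                young, old, edenSize(young, 8), survivorSize(young, 8), survivorSize(young, 8));
    }
}
```

For a 600 MB heap this prints young=200 old=400 eden=160 s0=20 s1=20 (MB).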

The heap is split into generations because most objects are short-lived. Placing objects with different lifetimes in different regions lets the JVM apply a different garbage collection algorithm to the young and old generations, which makes GC as efficient as possible.

2. When is YGC triggered?

In most cases, objects are allocated directly in Eden, in the young generation. When Eden runs out of space, a YGC (Minor GC) is triggered, and it processes only the young generation. Most objects become unreachable shortly after creation, so only a very small number survive a YGC. These survivors are copied into S0 (this is a copying algorithm).

At the next YGC, the surviving objects in Eden and S0 are copied into S1, and Eden and S0 are cleared. At the YGC after that, the regions being processed are Eden and S1 (that is, S0 and S1 swap roles). Each time an object survives a YGC, its age increases by 1.
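The YGC cycle just described can be modeled with a toy Java class: copy the survivors, clear Eden and the source survivor space, swap S0/S1 and add 1 to each survivor's age. This is a hypothetical illustration, not how HotSpot is implemented. Reachability is a flag set by hand instead of being traced from GC roots, and promotion to the old generation is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a copying young-generation collector (illustration only, not HotSpot code). */
final class YoungGcSimulation {
    static final class Obj {
        final String name;
        boolean reachable;   // set by hand here; a real collector traces from GC roots
        int age;             // number of YGCs survived

        Obj(String name, boolean reachable) {
            this.name = name;
            this.reachable = reachable;
        }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();   // survivor space currently holding live objects
    List<Obj> to = new ArrayList<>();     // empty survivor space that receives the copies

    /** One YGC: copy live objects from Eden and "from" into "to", then swap the survivor roles. */
    void youngGc() {
        for (List<Obj> source : List.of(eden, from)) {
            for (Obj o : source) {
                if (o.reachable) {
                    o.age++;      // surviving a YGC increments the age by 1
                    to.add(o);    // only survivors are copied; garbage is simply left behind
                }
            }
        }
        eden.clear();             // Eden and the source survivor space are emptied
        from.clear();
        List<Obj> emptied = from; // S0 and S1 switch roles for the next YGC
        from = to;
        to = emptied;
    }

    public static void main(String[] args) {
        YoungGcSimulation heap = new YoungGcSimulation();
        heap.eden.add(new Obj("session", true));
        heap.eden.add(new Obj("tmpBuffer", false));
        heap.youngGc();           // "session" survives with age 1, "tmpBuffer" is dropped
        heap.eden.add(new Obj("request", true));
        heap.youngGc();           // "request" age 1, "session" age 2
        for (Obj o : heap.from) {
            System.out.println(o.name + " age=" + o.age);
        }
    }
}
```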

3. When is FGC triggered?

In the following 4 cases, objects enter the old generation:

  • During a YGC, if the To Survivor space is too small to hold the surviving objects, those objects go directly into the old generation.
  • After surviving multiple YGCs, an object whose age reaches the configured threshold is promoted to the old generation.
  • Dynamic age determination: if the total size of objects of the same age in the To Survivor space exceeds half of that space, objects of that age or older go directly into the old generation without waiting to reach the default tenuring age.
  • Large objects: controlled by the -XX:PretenureSizeThreshold startup parameter. Objects larger than this value bypass the young generation and are allocated directly in the old generation.
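The four promotion rules above can be condensed into a few predicates. This is a simplified Java sketch, not HotSpot source. It assumes these HotSpot flags:
  • -XX:MaxTenuringThreshold (default 15)
  • -XX:TargetSurvivorRatio (default 50, the "half" in the dynamic age rule)
  • -XX:PretenureSizeThreshold (default 0, meaning disabled; in JDK 8 it is honored by the Serial and ParNew young collectors)

HotSpot's age table applies the "half" check to the running total of ages 1..n rather than to a single age, and the sketch follows that cumulative form. The class name PromotionRules is hypothetical.

```java
/** Simplified promotion rules for young objects (illustration, not HotSpot source). */
final class PromotionRules {
    /** Large-object rule: allocate straight into the old generation (threshold 0 = disabled). */
    static boolean pretenure(long objectBytes, long pretenureSizeThreshold) {
        return pretenureSizeThreshold > 0 && objectBytes > pretenureSizeThreshold;
    }

    /**
     * Dynamic age rule. bytesByAge[a] holds the survivor bytes of age a (index 0 unused).
     * Returns the first age at which the running total exceeds targetSurvivorRatio percent
     * of the survivor capacity, capped at maxTenuringThreshold.
     */
    static int tenuringThreshold(long[] bytesByAge, long survivorCapacity,
                                 int targetSurvivorRatio, int maxTenuringThreshold) {
        long desired = survivorCapacity * targetSurvivorRatio / 100;
        long total = 0;
        for (int age = 1; age < bytesByAge.length; age++) {
            total += bytesByAge[age];
            if (total > desired) {
                return Math.min(age, maxTenuringThreshold);
            }
        }
        return maxTenuringThreshold;
    }

    /** A YGC survivor is promoted if To Survivor cannot hold it or its age reached the threshold. */
    static boolean promote(int age, int threshold, boolean fitsInToSurvivor) {
        return !fitsInToSurvivor || age >= threshold;
    }

    public static void main(String[] args) {
        long[] bytesByAge = {0, 10, 45, 5};              // hypothetical survivor occupancy by age
        int threshold = tenuringThreshold(bytesByAge, 100, 50, 15);
        System.out.println("threshold=" + threshold);    // 10 + 45 > 50, so age 2
        System.out.println(promote(2, threshold, true)); // true: age 2 already qualifies
        System.out.println(promote(1, threshold, true)); // false: stays in the survivor space
        System.out.println(pretenure(4 << 20, 1 << 20)); // true: 4 MB object, 1 MB threshold
    }
}
```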

When the objects being promoted are larger than the remaining space in the old generation, an FGC (Major GC) is triggered. An FGC processes both the young and the old generation. FGC is also triggered in the following 4 situations:

  • Old generation usage reaches a certain threshold (adjustable via parameters), which directly triggers an FGC.
  • Space allocation guarantee: before a YGC, the JVM checks whether the largest contiguous free space in the old generation is larger than the total size of all objects in the young generation. If it is smaller, the YGC is not guaranteed to be safe, so the JVM checks whether the HandlePromotionFailure parameter allows guarantee failure. If it does not, a Full GC is triggered directly. If it does, the JVM further checks whether the largest contiguous free space in the old generation is larger than the average size of objects previously promoted to the old generation, and triggers a Full GC if it is smaller.
  • When Metaspace runs short it expands. Once it has grown to the value specified by -XX:MetaspaceSize, an FGC is also triggered.
  • An explicit call to System.gc() or Runtime.getRuntime().gc() triggers an FGC.
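The space allocation guarantee described above reduces to a small decision procedure. The Java sketch below follows that description literally, and the class, enum and parameter names are hypothetical. HotSpot since JDK 6 Update 24 is widely documented to ignore HandlePromotionFailure and to behave as if guarantee failure were always allowed. On JDK 8, then, the outcome effectively depends only on the two size comparisons.

```java
/** Sketch of the pre-YGC space allocation guarantee check as described in the text. */
final class PromotionGuarantee {
    enum Decision { YOUNG_GC, FULL_GC }

    /**
     * @param oldMaxContiguousFree   largest contiguous free block in the old generation (bytes)
     * @param youngUsed              total size of all objects in the young generation (bytes)
     * @param handlePromotionFailure whether guarantee failure is allowed
     * @param avgPromotedBytes       average size promoted to the old generation by past YGCs
     */
    static Decision beforeYoungGc(long oldMaxContiguousFree, long youngUsed,
                                  boolean handlePromotionFailure, long avgPromotedBytes) {
        if (oldMaxContiguousFree > youngUsed) {
            return Decision.YOUNG_GC;   // even if everything survives, it fits: safe
        }
        if (!handlePromotionFailure) {
            return Decision.FULL_GC;    // the risk is not allowed
        }
        // A risky YGC is attempted only if past promotions suggest the survivors will fit.
        return oldMaxContiguousFree > avgPromotedBytes ? Decision.YOUNG_GC : Decision.FULL_GC;
    }

    public static void main(String[] args) {
        System.out.println(beforeYoungGc(500, 300, false, 50)); // YOUNG_GC
        System.out.println(beforeYoungGc(200, 300, false, 50)); // FULL_GC
        System.out.println(beforeYoungGc(200, 300, true, 50));  // YOUNG_GC
        System.out.println(beforeYoungGc(40, 300, true, 50));   // FULL_GC
    }
}
```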

4. Under what circumstances does GC affect the program?

Both YGC and FGC pause the program to some degree. This is the Stop The World problem: while the GC threads work, all other worker threads are suspended. More advanced collectors such as ParNew, CMS and G1 only shorten these pauses; they do not eliminate them.

So when does GC actually hurt the program? From most to least severe, I think there are the following 4 situations:

  • FGC too frequent: an FGC is usually slow, taking at least hundreds of milliseconds and sometimes several seconds. Normally an FGC happens every few hours or even days, and the impact on the system is acceptable. Once FGCs become frequent (for example, one every few tens of minutes), something is definitely wrong: worker threads are stopped over and over, the system appears to stall constantly, and overall performance degrades.
  • YGC takes too long: YGC times of tens to a hundred or so milliseconds are generally normal. The resulting stalls of a few to a few dozen milliseconds are almost imperceptible to users, so the impact is negligible. If a YGC takes 1 second or even several seconds (as long as an FGC), the stalls grow longer. Because YGCs are also more frequent than FGCs, this leads to many more service timeouts.
  • FGC takes too long: as FGC time grows, so does the stall time. High-concurrency services in particular may see many timeouts during the FGC and reduced availability, so this also deserves attention.
  • YGC too frequent: even if YGCs don't cause service timeouts, overly frequent YGCs reduce the service's overall performance, which high-concurrency services also need to watch for.
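To tell which of these four situations you are in, you need per-collector counts and accumulated times. Here is a minimal Java sketch using the standard java.lang.management API. GarbageCollectorMXBean.getCollectionCount() and getCollectionTime() (in milliseconds) are cumulative since JVM start, so sampling them twice gives the frequency and average time per collection for the interval. Collector names depend on the GC in use, for example "PS Scavenge" and "PS MarkSweep" for the JDK 8 default Parallel collector. The 10-second window and the class name GcRateSampler are assumptions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Samples GC counters twice and reports per-collector frequency and average time. */
final class GcRateSampler {
    /** Average time per collection in ms, or 0 when nothing ran in the interval. */
    static double averageMillis(long countDelta, long timeDeltaMillis) {
        return countDelta == 0 ? 0.0 : (double) timeDeltaMillis / countDelta;
    }

    /** Collections per hour over an interval of intervalMillis. */
    static double perHour(long countDelta, long intervalMillis) {
        return countDelta * 3_600_000.0 / intervalMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = 10_000;   // assumption: 10 s window; use minutes or hours in practice
        List<GarbageCollectorMXBean> gcs = ManagementFactory.getGarbageCollectorMXBeans();
        Map<String, long[]> before = new HashMap<>();
        for (GarbageCollectorMXBean gc : gcs) {
            before.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        Thread.sleep(intervalMillis);
        for (GarbageCollectorMXBean gc : gcs) {
            long[] start = before.get(gc.getName());
            long count = gc.getCollectionCount() - start[0];
            long time = gc.getCollectionTime() - start[1];
            System.out.printf("%s: %.1f collections/hour, avg %.1f ms%n",
                    gc.getName(), perHour(count, intervalMillis), averageMillis(count, time));
        }
    }
}
```

These numbers only describe the JVM the code runs in, so in practice the sampler would live inside the service or read the same beans over remote JMX. Externally, jstat -gcutil reports the same counters in its YGC, YGCT, FGC and FGCT columns.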

Of these, "FGC too frequent" and "YGC takes too long" are the typical GC problems and are very likely to hurt the program's quality of service. The other two are less severe, but high-concurrency or high-availability programs still need to watch for them.

A practical guide to troubleshooting FGC problems

Building on the case analysis and theory above, here is a summary of how to troubleshoot FGC problems, as a practical guide for your reference.

1. First, understand what can cause FGC from the program's point of view

  • Large objects: the system loads too much data into memory at once (for example, an SQL query without pagination), so large objects end up in the old generation.
  • Memory leaks: many objects are created but cannot be reclaimed (for example, IO objects whose close method is never called to release resources). This first triggers FGCs and eventually leads to OOM.
  • The program keeps creating long-lived objects. Once these objects exceed the tenuring age they enter the old generation, and eventually they trigger an FGC. (This was the cause in the case analyzed earlier.)
  • A program bug dynamically generates a large number of new classes, so Metaspace keeps filling up. This first triggers FGCs and eventually leads to OOM.
  • The code explicitly calls a gc method, whether in your own code or even in framework code.
  • JVM parameter settings: total heap size, young and old generation sizes, Eden and Survivor sizes, Metaspace size, the garbage collection algorithms, and so on.
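Two of these causes have well-known code-level remedies, sketched below in Java as an illustration (not code from the original case):
  • (a) Process a large result set page by page instead of loading it all at once. fetchPage is a hypothetical stand-in for a paginated query such as SQL LIMIT/OFFSET.
  • (b) Use try-with-resources, which guarantees close() is called on an IO object even when an exception is thrown.

The class name FgcCodeFixes is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

final class FgcCodeFixes {
    /**
     * Processes rows page by page so only one page is held at a time.
     * fetchPage(offset, limit) is a hypothetical paginated query (e.g. SQL LIMIT/OFFSET).
     */
    static long processInPages(BiFunction<Integer, Integer, List<String>> fetchPage,
                               int pageSize, Consumer<String> handler) {
        long processed = 0;
        int offset = 0;
        while (true) {
            List<String> page = fetchPage.apply(offset, pageSize);
            for (String row : page) {
                handler.accept(row);
                processed++;
            }
            if (page.size() < pageSize) {
                return processed;   // a short (or empty) page means there are no more rows
            }
            offset += pageSize;
        }
    }

    /** try-with-resources closes the reader even if reading throws. */
    static long countLines(String text) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.lines().count();
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> table = new ArrayList<>();   // in-memory stand-in for a large table
        for (int i = 0; i < 25; i++) {
            table.add("row-" + i);
        }
        long n = processInPages(
                (offset, limit) -> table.subList(Math.min(offset, table.size()),
                                                 Math.min(offset + limit, table.size())),
                10, row -> { /* handle one row */ });
        System.out.println(n);                     // 25
        System.out.println(countLines("a\nb\nc")); // 3
    }
}
```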

2. Know which tools you can use during troubleshooting

  • The company's monitoring system: most companies have one, and it can monitor every JVM metric.
  • The JDK's built-in tools, such as jmap and jstat:
      # Show the utilization of each heap region and GC statistics (header every 20 rows, sample every 1000 ms)
      jstat -gcutil -h20 pid 1000
      # Histogram of heap objects sorted by space used (jmap -histo:live counts only live objects, but forces a full GC)
      jmap -histo pid | head -n20
      # Dump the heap to a file
      jmap -dump:format=b,file=heap pid
  • Visual heap analysis tools: JVisualVM, MAT, etc.
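When running jstat or jmap on the host is not possible, roughly the same data is available in-process through the management API. The Java sketch below prints the utilization of each heap pool against its current (committed) capacity, similar in spirit to jstat -gcutil. It also wraps HotSpotDiagnosticMXBean.dumpHeap, the programmatic counterpart of jmap -dump. It assumes a HotSpot-based JDK (com.sun.management is HotSpot-specific). Pool names such as "PS Eden Space" vary by collector. The dump file name is a placeholder and must not already exist. The class name HeapInspector is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** In-process counterparts of "jstat -gcutil" and "jmap -dump" (HotSpot-specific). */
final class HeapInspector {
    /** Used space as a percentage of the current (committed) capacity, like jstat -gcutil. */
    static double utilizationPercent(long used, long committed) {
        return committed <= 0 ? 0.0 : used * 100.0 / committed;
    }

    /** Prints one line per heap pool; pool names depend on the collector in use. */
    static void printHeapPools() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();   // null if the pool is no longer valid
            if (pool.getType() != MemoryType.HEAP || usage == null) {
                continue;
            }
            System.out.printf("%-25s %6.2f%%%n", pool.getName(),
                    utilizationPercent(usage.getUsed(), usage.getCommitted()));
        }
    }

    /** Writes an HPROF heap dump; live=true dumps only reachable objects. */
    static void dumpHeap(String file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file, live);
    }

    public static void main(String[] args) throws IOException {
        printHeapPools();
        if (args.length > 0) {
            dumpHeap(args[0], true);               // e.g. "heap.hprof"; the file must not exist yet
        }
    }
}
```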

3. Troubleshooting steps

  • Check the monitoring to find out when the problem started and the current FGC frequency (compare with normal behavior to see whether the frequency is abnormal).
  • Check whether any program release, infrastructure component upgrade, etc. went live before that point in time.
  • Understand the JVM parameter settings: the size of each heap region, and which garbage collectors the young and old generations use. Then assess whether these settings are reasonable.
  • Then work through the possible causes listed in step 1 by elimination. Metaspace filling up, memory leaks and explicit gc calls in code are relatively easy to check.
  • For FGC caused by large objects or long-lived objects, use the jmap -histo command together with a heap dump file for further analysis. The first goal is to locate the suspicious objects.
  • Trace the suspicious objects back to the specific code and analyze it. Combine GC principles with the JVM parameter settings to determine whether those objects meet the conditions for entering the old generation.
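For the last two steps, a common way to spot suspicious objects is to compare two jmap -histo snapshots taken some minutes apart. Classes whose footprint keeps growing are good candidates. The Java sketch below parses saved jmap -histo output and prints the classes with the largest byte growth. It assumes the usual row layout "num: #instances #bytes class name" and skips header, separator and Total lines; the exact layout can vary between JDK versions. The class name HistoDiff is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Compares two saved "jmap -histo <pid>" outputs to find classes whose footprint keeps growing. */
final class HistoDiff {
    /** Parses rows shaped like "num: #instances #bytes className ..." into className -> bytes. */
    static Map<String, Long> parseHisto(List<String> lines) {
        Map<String, Long> bytesByClass = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 4 || !parts[0].endsWith(":")) {
                continue;   // header, separator and "Total" lines
            }
            bytesByClass.merge(parts[3], Long.parseLong(parts[2]), Long::sum);
        }
        return bytesByClass;
    }

    /** Classes whose byte count grew between the snapshots, largest growth first. */
    static List<Map.Entry<String, Long>> growth(Map<String, Long> before,
                                                Map<String, Long> after, int limit) {
        List<Map.Entry<String, Long>> grown = new ArrayList<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long diff = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (diff > 0) {
                grown.add(Map.entry(e.getKey(), diff));
            }
        }
        grown.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return grown.subList(0, Math.min(limit, grown.size()));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: HistoDiff <histo-before.txt> <histo-after.txt>");
            return;
        }
        Map<String, Long> before = parseHisto(Files.readAllLines(Paths.get(args[0])));
        Map<String, Long> after = parseHisto(Files.readAllLines(Paths.get(args[1])));
        for (Map.Entry<String, Long> e : growth(before, after, 20)) {
            System.out.printf("%-60s +%d bytes%n", e.getKey(), e.getValue());
        }
    }
}
```

For example, save jmap -histo pid > before.txt, wait a while, save after.txt the same way, then run the class with the two file names as arguments. The classes at the top of the list are the ones to trace back to code.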

Copyright notice
This article was written by [Geek Xiaozhi]. Please include a link to the original when reposting. Thank you.