
PostgreSQL at 100% CPU: Performance Analysis and Slow SQL Optimization


Check for changes in the number of connections

When CPU utilization hits 100%, the first suspicion is a business peak: the number of active connections surges and the resources provisioned for the database are no longer enough. So the first thing to check is whether the number of active connections at the time of the problem was much higher than usual. For RDS for PG, the history of the connection count is available in the monitoring information on the console. The current number of active connections can be obtained by connecting to the database directly and running the following query:

select count( * ) from pg_stat_activity where state not like '%idle';
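A breakdown by state can also help show what those connections are doing. A small variant of the same query (not from the original article):

-- Count sessions per state (active, idle, idle in transaction, ...):
select state, count(*) from pg_stat_activity group by state order by count(*) desc;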

Tracking down slow SQL

If the number of active connections stayed within its normal range, the incident was most likely caused by mass execution of some poorly performing SQL. Since RDS keeps a slow SQL log, we can use it to locate the time-consuming SQL from the time of the incident and analyze it further. But usually when the problem strikes, the whole system grinds to a halt, all SQL slows down, and a huge amount of slow SQL gets logged, so the real culprit is not easy to find. Below are a few methods for intervening and tracking down slow SQL while the problem is happening.

1. The first method is to locate slow SQL with the pg_stat_statements extension. The steps are as follows.

1.1 If the extension has not been created yet, create it manually. We are going to rely on the cumulative counters kept by the extension and by the database system (such as accumulated SQL execution time). Because these counters keep accumulating and therefore contain historical information, we first reset them so that they reflect only the current CPU-saturation incident.

create extension if not exists pg_stat_statements;  -- skip if already installed
select pg_stat_reset();                 -- reset database-wide statistics counters
select pg_stat_statements_reset();      -- reset pg_stat_statements counters
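One assumption worth stating: pg_stat_statements only works if its library is preloaded. On a self-managed instance that means the following setting (changing it requires a server restart); managed services such as RDS usually ship with it already configured:

# in postgresql.conf (a server restart is required to change this):
shared_preload_libraries = 'pg_stat_statements'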

1.2 Wait for a while (for example, 1 minute) so that the counters accumulate enough information.

1.3 Query the most time-consuming SQL (usually the direct cause of the problem):

select * from pg_stat_statements order by total_time desc limit 5;
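Note that on PostgreSQL 13 and later, pg_stat_statements splits total_time into total_plan_time and total_exec_time, so the equivalent query there is:

-- PostgreSQL 13+ column name:
select * from pg_stat_statements order by total_exec_time desc limit 5;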

1.4 Query the SQL that reads buffers most often. These statements may be querying data that has no index, which leads to an excessive number of buffer reads and, with them, heavy CPU consumption:

select * from pg_stat_statements order by shared_blks_hit+shared_blks_read desc limit 5;

2. The second method is to go directly through the pg_stat_activity view. The query below lists the currently longest-running, still-unfinished SQL. On a CPU-saturated database, these are also prime suspects:

select datname, usename, client_addr, application_name, state,
       backend_start, xact_start, xact_stay, query_start, query_stay,
       replace(query, chr(10), ' ') as query
from (
    select pgsa.datname as datname,
           pgsa.usename as usename,
           pgsa.client_addr client_addr,
           pgsa.application_name as application_name,
           pgsa.state as state,
           pgsa.backend_start as backend_start,
           pgsa.xact_start as xact_start,
           extract(epoch from (now() - pgsa.xact_start)) as xact_stay,
           pgsa.query_start as query_start,
           extract(epoch from (now() - pgsa.query_start)) as query_stay,
           pgsa.query as query
    from pg_stat_activity as pgsa
    where pgsa.state != 'idle'
      and pgsa.state != 'idle in transaction'
      and pgsa.state != 'idle in transaction (aborted)'
) idleconnections
order by query_stay desc
limit 5;


 datname |   usename   |  client_addr  |     application_name     | state  |         backend_start         |          xact_start           |   xact_stay   |          query_start          |  query_stay   | query
---------+-------------+---------------+--------------------------+--------+-------------------------------+-------------------------------+---------------+-------------------------------+---------------+-------
 denali  | denaliadmin | 10.222.16.45  | pgAdmin III - Query Tool | active | 2018-02-26 22:36:05.603781+00 | 2018-02-26 22:36:13.054396+00 | 187614.245395 | 2018-02-26 22:36:13.054396+00 | 187614.245395 | select * from gen3_search_eu_17q2_20171115_epl.place_name \r where place_id not in (select place_id from gen3_search_eu_17q1_20170308_epl.place_name ) \r and name not in (select name from gen3_search_eu_17q1_20170308_epl.place_name)\r and lang = 'ENG'\r limit 50
 denali  | denaliadmin | 10.222.16.45  | pgAdmin III - Query Tool | active | 2018-02-26 23:46:24.442846+00 | 2018-02-26 23:46:34.920261+00 |  183392.37953 | 2018-02-26 23:46:34.920261+00 |  183392.37953 | select * from gen3_search_eu_17q2_20171115_epl.place_name \r where place_id not in (select place_id from gen3_search_eu_17q1_20170308_epl.place_name ) \r and name not in (select name from gen3_search_eu_17q1_20170308_epl.place_name)\r and lang = 'ENG'\r limit 50\r 
 denali  | denaliadmin | 10.222.16.45  | pgAdmin III - Query Tool | active | 2018-02-27 01:19:53.83589+00  | 2018-02-27 01:20:01.519778+00 | 177785.780013 | 2018-02-27 01:20:01.519778+00 | 177785.780013 | select * from gen3_search_eu_17q2_20171115_epl.place_name \r where place_id not in (select place_id from gen3_search_eu_17q1_20170308_epl.place_name ) \r and name not in (select name from gen3_search_eu_17q1_20170308_epl.place_name)\r limit 50
 denali  | denaliadmin | 10.222.16.45  | pgAdmin III - Query Tool | active | 2018-02-27 01:46:05.207888+00 | 2018-02-27 01:47:52.039779+00 | 176115.260012 | 2018-02-27 01:47:52.039779+00 | 176115.260012 | select a.place_id, a.metadata_dictionary_id,a.value, a.lang, b.place_id, b.metadata_dictionary_id, b.value, b.lang\r from gen3_search_eu_17q1_20170308_epl.place_address a \r inner join gen3_search_eu_17q2_20171115_epl.place_address b\r on a.place_id = b.place_id \r where a.metadata_dictionary_id = b.metadata_dictionary_id and a.lang = b.lang and a.value!=b.value and b.place_id not in (select poi_id from gen3_search_eu_17q2_20171115_epl.place_embeded_ids)\r limit 100\r 
 denali  | denaliadmin | 10.224.14.148 | pgAdmin III - Query Tool | active | 2018-02-27 05:05:39.903885+00 | 2018-02-27 05:05:48.827779+00 | 164238.472012 | 2018-02-27 05:05:48.827779+00 | 164238.472012 | select a.place_id, a.metadata_dictionary_id,a.value, a.lang, b.place_id, b.metadata_dictionary_id, b.value, b.lang\r from gen3_search_eu_17q1_20170308_epl.place_address a \r inner join gen3_search_eu_17q2_20171115_epl.place_address b\r on a.place_id = b.place_id \r where a.metadata_dictionary_id = b.metadata_dictionary_id and a.lang = b.lang and a.value!=b.value and b.place_id not in (select poi_id from gen3_search_eu_17q2_20171115_epl.place_embeded_ids)\r limit 100\r 
(5 rows)


3. The third method starts from table scans (sequential scans) and looks for tables that are missing indexes. If a table lacks an index while most of its hot data sits in memory (for example, 8 GB of memory and 6 GB of hot data), the database can only scan the whole table, wading through a large number of irrelevant records that are already in memory, which costs a lot of CPU. In particular, for a table with on the order of a million records, a single table scan is enough to saturate one CPU core, and with many concurrent connections (for example, hundreds) all the CPUs get saturated. The usual fix is to create the missing index, as sketched after step 3.3.

3.1 Use the following query to find the tables with the heaviest sequential-scan activity:

select * from pg_stat_user_tables where n_live_tup > 100000 and seq_scan > 0 order by seq_tup_read desc limit 10;

3.2 Query the currently running slow queries that access these tables:

select * from pg_stat_activity where query ilike '%<table name>%' and now() - query_start > interval '10 seconds';

3.3 Alternatively, use the pg_stat_statements extension to locate the statements that query these tables:

select * from pg_stat_statements where query ilike '%<table>%' order by shared_blks_hit+shared_blks_read desc limit 3;
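If a table found in step 3.1 is being scanned because its filter column has no index, creating that index removes the full scans. A minimal sketch, reusing the place_name table and the lang column from the sample output above as stand-ins for your own table and column:

-- Hypothetical index: the slow queries in the sample output filter
-- place_name on lang. CONCURRENTLY builds the index without blocking
-- writes, but cannot be run inside a transaction block.
create index concurrently idx_place_name_lang
    on gen3_search_eu_17q2_20171115_epl.place_name (lang);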

Dealing with slow SQL

For the slow SQL found with the methods above, the first thing to do may be to cancel or kill it, so that the business can recover first:

-- Cancel the current query but keep the backend connection alive:
select pg_cancel_backend(pid) from pg_stat_activity where query like '%<query text>%' and pid != pg_backend_pid();

-- Terminate the backend connection entirely:
select pg_terminate_backend(pid) from pg_stat_activity where query like '%<query text>%' and pid != pg_backend_pid();

If these SQL statements are necessary for the business, they need to be optimized. There are three standard moves (minimal sketches of each follow the list):

1. For the tables involved in the query, run ANALYZE <table> or VACUUM ANALYZE <table> to refresh the table statistics, so that query plans become more accurate. Note: to avoid impact on the business, it is best to do this during off-peak hours.

2. Run explain (query text) or explain (buffers true, analyze true, verbose true) (query text) to view the SQL execution plan (note: the former does not actually execute the SQL, while the latter does and returns detailed execution information), and build indexes on the tables that show up in table scans.

3. Rewrite the SQL: remove unnecessary subqueries, rewrite UNION ALL, use join clauses to fix the join order, and so on. These are all means of deeper SQL optimization and will not be detailed here.
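Minimal sketches of the three moves, reusing the place_name tables from the sample output above; table names, columns, and predicates are illustrative, not prescriptive:

-- Move 1: refresh planner statistics only (fast), or also reclaim dead tuples:
analyze gen3_search_eu_17q2_20171115_epl.place_name;
vacuum analyze gen3_search_eu_17q2_20171115_epl.place_name;

-- Move 2: inspect the plan. The first form does not execute the statement;
-- the second actually runs it and reports timing and buffer usage.
explain
select * from gen3_search_eu_17q2_20171115_epl.place_name where lang = 'ENG' limit 50;

explain (buffers true, analyze true, verbose true)
select * from gen3_search_eu_17q2_20171115_epl.place_name where lang = 'ENG' limit 50;

-- Move 3: rewrite an anti-join. The slow queries above use NOT IN; the
-- NOT EXISTS form lets the planner choose a hashed anti-join instead of
-- re-executing the subquery, at the cost of slightly different semantics
-- when place_id can be NULL in the subquery.
select a.*
from gen3_search_eu_17q2_20171115_epl.place_name a
where not exists (
    select 1
    from gen3_search_eu_17q1_20170308_epl.place_name b
    where b.place_id = a.place_id
)
and a.lang = 'ENG'
limit 50;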
