Oracle歸檔紀錄檔暴增排查優化

2022-07-11 18:00:17

1、ORACLE歸檔紀錄檔介紹

歸檔紀錄檔暴增是oracle比較常見的問題,遇到歸檔紀錄檔暴增,我們該如何排查:
  • 歸檔紀錄檔暴增一般都是應用或者人為引起的
  • 理解歸檔紀錄檔儲存的是什麼
  • 如何排查歸檔紀錄檔暴增原因
  • 如何優化歸檔紀錄檔暴增

1.1 歸檔紀錄檔是什麼

歸檔紀錄檔(Archive Log)是非活動的重做紀錄檔(redo)備份.
通過使用歸檔紀錄檔,可以保留所有重做歷史記錄,當資料庫處於ARCHIVELOG模式並進行紀錄檔切換式,後臺程序ARCH會將重做紀錄檔的內容儲存到歸檔紀錄檔中.
當資料庫出現媒介失敗時,使用資料檔案備份,歸檔紀錄檔和重做紀錄檔可以完全恢復資料庫。

1.2 歸檔紀錄檔儲存的是什麼

所有重做的歷史記錄,包括DML語句、資料改變等

1.3 歸檔紀錄檔暴增的原因

一般是DML操作大量的資料,導致歸檔紀錄檔暴增

1.4 排查歸檔紀錄檔暴增的方法

1.SQL語句
2.AWR
3.挖掘歸檔紀錄檔

2、歸檔紀錄檔暴增排查實戰

2.1 製造歸檔紀錄檔暴增

create table scott.object as select * from dba_objects;

-- 執行10次
-- insert
insert into scott.object select * from scott.object;
select count(1) from scott.object;
-- 49384448

-- update
update SCOTT.object set owner='aa';

-- delete
delete from SCOTT.object;
truncate table SCOTT.object;

2.2 檢視歸檔紀錄檔切換

SELECT
    THREAD# id,SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5)                          DAY
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'00',1,0)) H00
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'01',1,0)) H01
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'02',1,0)) H02
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'03',1,0)) H03
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'04',1,0)) H04
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'05',1,0)) H05
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'06',1,0)) H06
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'07',1,0)) H07
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'08',1,0)) H08
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'09',1,0)) H09
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'10',1,0)) H10
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'11',1,0)) H11
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'12',1,0)) H12
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'13',1,0)) H13
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'14',1,0)) H14
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'15',1,0)) H15
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'16',1,0)) H16
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'17',1,0)) H17
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'18',1,0)) H18
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'19',1,0)) H19
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'20',1,0)) H20
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'21',1,0)) H21
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'22',1,0)) H22
  , SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'23',1,0)) H23
FROM
  v$log_history  a
GROUP BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5),THREAD#
ORDER BY id,SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) 
/
代表12月19號,H20(20-21時),共切換24個歸檔紀錄檔,如果每一個500M,那麼總共約500M*24,對比其餘時間,可以說該時間產生異常的歸檔紀錄檔,目標排查改時間段

2.3 SQL語句判斷

with aa as 
(SELECT IID,
       USERNAME,
       to_char(BEGIN_TIME,'mm/dd hh24:mi') begin_time,
       SQL_ID,
       decode(COMMAND_TYPE,3,'SELECT',2,'INSERT',6,'UPDATE',7,'DELETE',189,'MERGE INTO','OTH') "SQL_TYPE",
       executions "EXEC_NUM",
       rows_processed "Change_NUM"
  FROM (SELECT s.INSTANCE_NUMBER IID,
               PARSING_SCHEMA_NAME USERNAME,COMMAND_TYPE,
               cast(BEGIN_INTERVAL_TIME as date) BEGIN_TIME,
               s.SQL_ID,
               executions_DELTA executions,
               rows_processed_DELTA rows_processed,
               (IOWAIT_DELTA) /
               1000000 io_time,
               100*ratio_to_report(rows_processed_DELTA) over(partition by s.INSTANCE_NUMBER, BEGIN_INTERVAL_TIME) RATIO,
               sum(rows_processed_DELTA) over(partition by s.INSTANCE_NUMBER, BEGIN_INTERVAL_TIME) totetime,
               elapsed_time_DELTA / 1000000 ETIME,
               CPU_TIME_DELTA / 1000000 CPU_TIME,
               (CLWAIT_DELTA+APWAIT_DELTA+CCWAIT_DELTA+PLSEXEC_TIME_DELTA+JAVEXEC_TIME_DELTA)/1000000 OTIME,
               row_number() over(partition by s.INSTANCE_NUMBER,BEGIN_INTERVAL_TIME order by rows_processed_DELTA desc) TOP_D
                   FROM dba_hist_sqlstat s, dba_hist_snapshot sn,dba_hist_sqltext s2
         where s.snap_id = sn.snap_id
           and s.INSTANCE_NUMBER = sn.INSTANCE_NUMBER 
           and rows_processed_DELTA is not null
           and s.sql_id = s2.sql_id and COMMAND_TYPE in (2,6,7,189)
           and sn.BEGIN_INTERVAL_TIME > sysdate - nvl(180,1)/1440         and PARSING_SCHEMA_NAME<>'SYS')
 WHERE TOP_D <= nvl(20,1)
  )
select aa.*,s.sql_fulltext "FULL_SQL" from aa left join  v$sql s on  aa.sql_id=s.sql_id ORDER BY IID, BEGIN_TIME desc,"Change_NUM" desc
檢視2小時的資料該變數,可以看出Change_NUM資料該變數和執行次數EXEC_NUM和SQL語句,update回滾了,所以沒有該變數。
此時可以判斷大量插入資料導致歸檔紀錄檔暴增,此時並不能判斷update。此語句不一定有資料,只能做參考。

2.4 AWR

 建立AWR報告
建立AWR報告
@?/rdbms/admin/awrrpt.sql
SQL> @?/rdbms/admin/awrrpt.sql

Current Instance
~~~~~~~~~~~~~~~~

   DB Id    DB Name     Inst Num Instance
----------- ------------ -------- ------------
 3830097027 .....        1 .....


Specify the Report Type
~~~~~~~~~~~~~~~~~~~~~~~
Would you like an HTML report, or a plain text report?
Enter 'html' for an HTML report, or 'text' for plain text
Defaults to 'html'
Enter value for report_type: html

Type Specified:  html


Instances in this Workload Repository schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   DB Id     Inst Num DB Name       Instance    Host
------------ -------- ------------ ------------ ------------
* 3830097027        1 .....       .....    dbserver01

Using 3830097027 for database Id
Using           1 for instance number


Specify the number of days of snapshots to choose from
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entering the number of days (n) will result in the most recent
(n) days of snapshots being listed.  Pressing <return> without
specifying a number lists all completed snapshots.

Enter value for num_days: 1

Listing the last day's Completed Snapshots

                            Snap
Instance     DB Name        Snap Id    Snap Started    Level
------------ ------------ --------- ------------------ -----
.....         .....         36 19 Dec 2021 14:03       1
                 37 19 Dec 2021 15:00       1
                 38 19 Dec 2021 16:00       1
                 39 19 Dec 2021 17:00       1
                 40 19 Dec 2021 18:00       1

                 41 19 Dec 2021 20:12       1
                 42 19 Dec 2021 21:03       1

Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 41
Begin Snapshot Id specified: 41

Enter value for end_snap: 42
End   Snapshot Id specified: 42

Specify the Report Name
~~~~~~~~~~~~~~~~~~~~~~~
The default report file name is awrrpt_1_41_42.html.  To use this name,
press <return> to continue, otherwise enter an alternative.

Enter value for report_name: /tmp/awrrpt_1_41_42.html
解析AWR報告
 
可以看出大量redo,該時間段總該變數3762494/1024/1024=3674,每秒約產生3.5M
產生塊最多的是scott使用者,object物件,改變數是44684992,佔比99%,說明是該物件產生的
根據物件可以在AWR報告中檢視是否有懷疑的SQL,發現update語句。
其實根據SQL語句和AWR報告可以排查出大部分歸檔紀錄檔暴增的問題,如果無法排查可以繼續進行挖掘歸檔紀錄檔。

2.5 挖掘歸檔紀錄檔

-rw-r-----. 1 oracle oinstall 794697216 Dec 19 20:37 1_66_1077902149.dbf
-rw-r-----. 1 oracle oinstall 794697216 Dec 19 20:37 1_67_1077902149.dbf
-rw-r-----. 1 oracle oinstall 794697216 Dec 19 21:03 1_68_1077902149.dbf
-rw-r-----. 1 oracle oinstall 733794304 Dec 19 21:03 1_69_1077902149.dbf
-rw-r-----. 1 oracle oinstall 756531200 Dec 19 21:03 1_70_1077902149.dbf
-rw-r-----. 1 oracle oinstall 761492480 Dec 19 21:14 1_71_1077902149.dbf
-rw-r-----. 1 oracle oinstall 794697216 Dec 19 21:14 1_72_1077902149.dbf
-rw-r-----. 1 oracle oinstall 265107968 Dec 19 21:14 1_73_1077902149.dbf
-- 最好sys或相關許可權的使用者,也可以使用toad工具
-- 第一次
@?/rdbms/admin/dbmslm.sql
@?/rdbms/admin/dbmslmd.sql

-- 開始執行
execute dbms_logmnr.add_logfile(logfilename=>'../../1_66_1077902149.dbf',options=>dbms_logmnr.new);
execute dbms_logmnr.add_logfile(logfilename=>'../../1_67_1077902149.dbf',options=>dbms_logmnr.new);
execute dbms_logmnr.add_logfile(logfilename=>'../../1_68_1077902149.dbf',options=>dbms_logmnr.new);
execute dbms_logmnr.add_logfile(logfilename=>'../../1_69_1077902149.dbf',options=>dbms_logmnr.new);
execute dbms_logmnr.add_logfile(logfilename=>'../../1_70_1077902149.dbf',options=>dbms_logmnr.new);
execute dbms_logmnr.start_logmnr(options=>dbms_logmnr.dict_from_online_catalog); 
-- 依次類推小批次解析歸檔紀錄檔

-- 儲存記錄
create table scott.logmnr_contents as select * from v$logmnr_contents;

-- 分批執行...迴圈執行上面記錄
alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss'; 

-- 最後釋放pga
execute dbms_logmnr.end_logmnr;
select sql_redo from scott.logmnr_contents where table_name='OBJECT';
select count(*) from scott.logmnr_contents where table_name='OBJECT';
可以從歸檔紀錄檔中檢視大量的update語句,此時基本可以排查出歸檔紀錄檔暴增原因

2.6 歸檔紀錄檔暴增優化

1.delete是否可以改造成truncate分割區表(ps: truncate需謹慎,無法恢復相關資料)
2.dml可以適量使用臨時表
3.避免大事務
4.避免大量for迴圈dml