a Lost write logfile的恢复如果人为破坏redo log 应该不能恢复

2024 年 9 月 8 日 5点热度 0人点赞 0条评论

APPLIES TO:

Oracle Database - Enterprise Edition - Version 12.2.0.1 and later
Information in this document applies to any platform.

SYMPTOMS

6 Node Rac instance.

All node crashed.

On Starting up Instance 1 Below error is reported

ORA-00322: log 11 of thread 1 is not current copy
ORA-00312: online log 11 thread 1: '+<diskgroupname>/onlinelog/group_11.309.747926829'
ORA-00312: online log 11 thread 1: '+<diksgroup 2name>/onlinelog/group_11.309.747926831'

database fails to open using srvctl

CHANGES

CAUSE

There has been a Lost write

SOLUTION

Mount the instance which is failing .

Run the below queries and find what is the checkpoint SCN of datafiles and whether they are non Fuzzy

select ,count(*) , checkpoint_change# "SCN", fuzzy from v$datafile_header group by fuzzy, checkpoint_change# ;

COUNT(*) SCN FUZZY
---------- -------------------- ---------------
34 13465426360102 NO ============>All datafiles are at this SCN and fuzziness is No.

说明数据是一致的

Verify that the checkpoint_time / checkpoint_change# is in line with checkpoint SCN in the controlfile

select dbid, name, created, open_mode, log_mode,
to_char(checkpoint_change#, '999999999999999') as checkpoint_change#,
controlfile_type,
to_char(controlfile_change#, '999999999999999') as controlfile_change#,
to_char(controlfile_time, 'DD-MON-RRRR HH24:MI:SS') controlfile_time,
resetlogs_change#, resetlogs_time, prior_resetlogs_change#, prior_resetlogs_time
from v$database;

DBID NAME    CREATED             OPEN_MODE LOG_MODE    CHECKPOINT_CHANG CONTROL   CONTROLFILE_CHAN     CONTROLFILE_TIME RESETLOGS_CHANGE# RESETLOGS_TIME PRIOR_RESETLOGS_CHANGE# PRIOR_RESETLOGS_TIME
-------------------------- ----------------- -------------------- ----------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------
2817309534 <DBNAME> 06-JAN-2011 11:56:14 MOUNTED ARCHIVELOG 13465426360102 CURRENT   13465426260077     25-JUN-2019 07:57:28 13464250257478 05-JAN-2019 14:19:32 7907050 27-MAY-2011 16:07:01

-----上面标错了，应该是13465426360102

Controlfile has checkpoint scn 13465426360102

Datafiles are at SCN --->13465426360102

Alert log shows

Before the crash last sequence written to group 11 of thread 1 was 9607

2019-06-24T10:02:37.152693+02:00
Thread 1 advanced to log sequence 9607 (LGWR switch)
Current log# 11 seq# 9607 mem# 0: +<Diskgroup 1name>/onlinedb/onlinelog/group_11.309.747926829
Current log# 11 seq# 9607 mem# 1: +<Diskgroup 1name>/onlinelog/group_11.309.747926831
2019-06-24T10:02:37.202531+02:00

v$log shows group 11 is inactive and sequence 9613 . Infact all the Groups from thread 1 is shown as Inactive

11 1 9613 157286400 512 2 YES INACTIVE 13465423338995 24-JUN-2019 12:37:49 1.3465E+13 24-JUN-2019 12:48:15

Sequence 9613 was not generated as per alert log. However group 11 shows it has sequence 9613

To resolve the issue

Since the files are shown consistent try opening other instance one by one .

> Startup other instance (expect for the one failing one by one)

> Once the other instance starts

For Instance 1 (thread 1 on which we see the error ORA-00322)

Use asmcmd and make a copy of the redo log associated with thread 1(In this case it was reported on thread 1)

asmcmd> cp <redo log > <filesystem path>

once copied to go filesystem

ls -lrt <filesystem path> ----->to ensure they have been copied