Last week I was testing our backups and got this error: ORA-15204: database version 126.96.36.199.0 is incompatible with diskgroup DATA_EDT1. This surprised me because all of our databases and Oracle homes on this Exadata are at version 188.8.131.52. I was testing full restore/recovery without a control file or an spfile. The first step was to restore spfile and controlfile from autobackup to their ASM locations.
I queried V$ASM_DISKGROUP in ASM and saw that all of our diskgroups have compatibility set to 184.108.40.206.0. In order to start RMAN restore I had created a minimal pfile. I didn’t bother setting the compatible parameter. I edited the pfile and set compatible=”220.127.116.11.0″ – that fixed it and the restore/recovery worked fine.
Today we got a request to refresh a development database from a copy of production. I decided to use the canned clone process that ships with OEM 12c instead of the usual scripts or manual processes. I was pleasantly surprised with the results. There was a failure related to an NFS problem and I was able to restart the process easily.
- Minimizes the amount of scratch disk space by backing up and restoring one datafile at a time.
- No scripting or manual process required.
- Uses standard RMAN under the covers.
- Works with older versions (This one was 10.2)
- Works with existing backups or directly from the source database (This database was not previously backed up and was not in archivelog mode)
- Can duplicate a live database if it is in archivelog mode and running at version 11
- Easy to restart after errors.
- Runs as a database job in the background, unattended.
- Works on the same machine or across the network.
- More time required – one datafile at a time, no parallelization. 763G ran for 6.5 hours.
- No email notification for errors or completion.
- Black box – you really need to know how to do the steps manually and this will not teach you.
- Needs scratch space on local disks to avoid NFS performance problems. (size of largest datafile)
This morning I noticed one of the managed standby databases was in a pending shutdown state. I restarted it and verified the logs were getting applied from production. However, I noticed the WAIT_FOR_GAP status on one of the old logs. Some of the logs had not been applied and were no longer on the production system. So I found out which logs were in the gap and used RMAN to recover them in production. Oracle automatically shipped them to the standby and applied them.
Discover a gap:
1* SELECT PROCESS, CLIENT_PROCESS, SEQUENCE#, STATUS FROM V$MANAGED_STANDBY
PROCESS CLIENT_P SEQUENCE# STATUS
--------- -------- ---------- ------------
ARCH ARCH 136626 CLOSING
ARCH ARCH 0 CONNECTED
ARCH ARCH 136627 CLOSING
ARCH ARCH 136628 CLOSING
RFS LGWR 136629 IDLE
RFS UNKNOWN 0 IDLE
RFS UNKNOWN 0 IDLE
MRP0 N/A 130107 WAIT_FOR_GAP
RFS UNKNOWN 0 IDLE
Find out the extent of the gap:
1 SELECT high.thread#, "LowGap#", "HighGap#"
4 SELECT thread#, MIN(sequence#)-1 "HighGap#"
7 SELECT a.thread#, a.sequence#
10 SELECT *
11 FROM v$archived_log
12 ) a,
14 SELECT thread#, MAX(next_change#)gap1
15 FROM v$log_history
16 GROUP BY thread#
17 ) b
18 WHERE a.thread# = b.thread#
19 AND a.next_change# > gap1
21 GROUP BY thread#
22 ) high,
24 SELECT thread#, MIN(sequence#) "LowGap#"
27 SELECT thread#, sequence#
28 FROM v$log_history, v$datafile
29 WHERE checkpoint_change# <= next_change#
30 AND checkpoint_change# >= first_change#
32 GROUP BY thread#
33 ) low
34* WHERE low.thread# = high.thread#
THREAD# LowGap# HighGap#
---------- ---------- ----------
1 130106 130153
Recover the logs in Production:
RMAN> restore archivelog from sequence 130106 until sequence 130153;
I had to repeat this sequence of commands several times because there were multiple gaps.
I was re-running an rman duplicate from active database today and got this error. I had assumed that the duplicate command would overwrite the files from the previous run. Instead it created new filenames and caused the ASM disk group to fill up.
Here’s the rman message on the target system:
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-03002: failure of Duplicate Db command at 11/16/2010 10:08:27
RMAN-03015: error occurred in stored script Memory Script
RMAN-03009: failure of backup command on ORA_DISK_6 channel at 11/16/2010 10:00:25
ORA-17628: Oracle error 19505 returned by remote Oracle server
Here’s the alert log entry on the auxiliary system:
ORA-19505: failed to identify file "+data"
ORA-17502: ksfdcre:4 Failed to create file +data
ORA-15041: diskgroup "DATA" space exhausted
The target (source) is on ASM using OMF (Oracle Managed Files). The auxiliary (destination) is also using ASM and OMF. I looked through the rman log and found a lot of set newname commands like this:
executing command: SET NEWNAME
No filename is specified so it makes up new filenames, which are different from the previous run. So that’s why I ran out of space.
I used asmcmd to remove all of the files and reclaim the space.
Easy fix, but initially confusing.
The retention policy for RMAN defaults to 1. Which means than any more than one backup copy of any file is obsolete. If you run a command like “delete obsolete” RMAN will clean out everything except the last copy. All of our RMAN backups write to Netbackup which has several retention policies already configured. When Netbackup deletes backups according to these policies, RMAN sees the files as expired. So the right way to clean up RMAN’s repository is to use “delete expired”. In order to prevent “delete obsolete” from destroying the wrong backups; issue this command: “configure retention policy to none;”
I spent this week taking the ORACLE ENTERPRISE DBA 1B: BACKUP AND RECOVERY class in Chicago. It was a very good class and I’m ready to start writing RMAN scripts 🙂