Getting ORA-15204 on full database restore

Last week I was testing our backups and got this error: ORA-15204: database version 11.2.0.0.0 is incompatible with diskgroup DATA_EDT1. This surprised me because all of our databases and Oracle homes on this Exadata are at version 11.2.0.3. I was testing a full restore/recovery without a control file or an spfile, so the first step was to restore the spfile and controlfile from autobackup to their ASM locations.
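A rough sketch of that first step (not the exact session; the DBID and pfile path are placeholders, and autobackups stored somewhere other than the default disk location would also need a channel and autobackup format set):

$ rman target /
RMAN> set dbid 1234567890;                        # placeholder; needed when nothing is mounted
RMAN> startup nomount pfile='/tmp/initEDT1.ora';  # the minimal pfile; path is illustrative
RMAN> restore spfile from autobackup;
RMAN> startup force nomount;                      # restart on the restored spfile
RMAN> restore controlfile from autobackup;
RMAN> alter database mount;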
I queried V$ASM_DISKGROUP on the ASM instance and saw that all of our diskgroups have compatibility set to 11.2.0.2.0. To start the RMAN restore I had created a minimal pfile, and I didn't bother setting the compatible parameter. I edited the pfile and set compatible='11.2.0.3.0', which fixed it; the restore/recovery then worked fine.
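The compatibility check itself is a one-liner against the ASM instance (just the query; the values quoted above came from its output):

SQL> select name, compatibility, database_compatibility from v$asm_diskgroup;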


Using OEM 12c canned clone process

Today we got a request to refresh a development database from a copy of production. I decided to use the canned clone process that ships with OEM 12c instead of the usual scripts or manual processes, and I was pleasantly surprised by the results. There was one failure, caused by an NFS problem, and I was able to restart the process easily.

Pro:

  • Minimizes the amount of scratch disk space by backing up and restoring one datafile at a time.
  • No scripting or manual process required.
  • Uses standard RMAN under the covers.
  • Works with older versions (this one was 10.2).
  • Works with existing backups or directly from the source database (this database was not previously backed up and was not in archivelog mode).
  • Can duplicate a live database if it is in archivelog mode and running at version 11.
  • Easy to restart after errors.
  • Runs as a database job in the background, unattended.
  • Works on the same machine or across the network.

Con:

  • More time required: one datafile at a time, no parallelization. A 763 GB database ran for 6.5 hours.
  • No email notification for errors or completion.
  • Black box: you really need to know how to do the steps manually, and this will not teach you (see the sketch after this list).
  • Needs scratch space on local disks, sized for the largest datafile, to avoid NFS performance problems.
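For a sense of what the canned process is wrapping, one way to do the same refresh by hand is a backup-based RMAN duplicate. A minimal sketch, assuming the auxiliary instance is already started NOMOUNT and the source backups are readable from the destination host; the connect strings and the DEVDB name are placeholders:

$ rman target sys@PROD auxiliary sys@DEVDB

run {
  # one disk channel on the auxiliary; add more for parallelism
  allocate auxiliary channel a1 device type disk;
  duplicate target database to DEVDB nofilenamecheck;
}

Per the pro list, the canned process drives standard RMAN toward the same end result, just one datafile at a time and without the scripting.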

 

WAIT_FOR_GAP status on standby

This morning I noticed one of the managed standby databases was in a pending shutdown state. I restarted it and verified that logs were being applied from production. However, one of the old logs showed a WAIT_FOR_GAP status: some logs had not been applied and were no longer on the production system. So I found out which logs were in the gap and used RMAN to restore them on production. Oracle automatically shipped them to the standby and applied them.

Discover a gap:
SELECT PROCESS, CLIENT_PROCESS, SEQUENCE#, STATUS FROM V$MANAGED_STANDBY;
PROCESS   CLIENT_P  SEQUENCE# STATUS
--------- -------- ---------- ------------
ARCH      ARCH         136626 CLOSING
ARCH      ARCH              0 CONNECTED
ARCH      ARCH         136627 CLOSING
ARCH      ARCH         136628 CLOSING
RFS       LGWR         136629 IDLE
RFS       UNKNOWN           0 IDLE
RFS       UNKNOWN           0 IDLE
MRP0      N/A          130107 WAIT_FOR_GAP
RFS       UNKNOWN           0 IDLE

Find out the extent of the gap:

SELECT high.thread#, "LowGap#", "HighGap#"
FROM
(
    SELECT thread#, MIN(sequence#)-1 "HighGap#"
    FROM
    (
        SELECT a.thread#, a.sequence#
        FROM
        (
            SELECT *
            FROM v$archived_log
        ) a,
        (
            SELECT thread#, MAX(next_change#) gap1
            FROM v$log_history
            GROUP BY thread#
        ) b
        WHERE a.thread# = b.thread#
        AND a.next_change# > gap1
    )
    GROUP BY thread#
) high,
(
    SELECT thread#, MIN(sequence#) "LowGap#"
    FROM
    (
        SELECT thread#, sequence#
        FROM v$log_history, v$datafile
        WHERE checkpoint_change# <= next_change#
        AND checkpoint_change# >= first_change#
    )
    GROUP BY thread#
) low
WHERE low.thread# = high.thread#;

   THREAD#    LowGap#   HighGap#
---------- ---------- ----------
         1     130106     130153

Restore the missing logs on production:

RMAN> restore archivelog from sequence 130106 until sequence 130153;

I had to repeat this sequence of commands several times because there were multiple gaps.
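As a quicker check of what the standby thinks it is missing, V$ARCHIVE_GAP on the standby reports the gap it is currently waiting on (this query was not part of the original session; it returns one row per thread that has a gap):

SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap;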

Oracle error 19505 returned by remote Oracle server

I was re-running an RMAN duplicate from active database today and got this error. I had assumed that the duplicate command would overwrite the files from the previous run. Instead it created new filenames and caused the ASM disk group to fill up.

Here's the RMAN message on the target system:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 11/16/2010 10:08:27
RMAN-03015: error occurred in stored script Memory Script
RMAN-03009: failure of backup command on ORA_DISK_6 channel at 11/16/2010 10:00:25
ORA-17628: Oracle error 19505 returned by remote Oracle server

Here’s the alert log entry on the auxiliary system:

ORA-19505: failed to identify file "+data"
ORA-17502: ksfdcre:4 Failed to create file +data
ORA-15041: diskgroup "DATA" space exhausted

The target (source) is on ASM using OMF (Oracle Managed Files). The auxiliary (destination) is also using ASM and OMF. I looked through the RMAN log and found a lot of SET NEWNAME commands like this:

executing command: SET NEWNAME

No filename is specified, so RMAN makes up new OMF filenames that are different from the previous run's, and the files from the failed run were still taking up space in the disk group. That's why it ran out of space.

I used asmcmd to remove all of the files and reclaim the space.
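The cleanup was along these lines; the directory path is only illustrative of the OMF layout (+diskgroup/db_unique_name/DATAFILE), not the exact one on this system:

ASMCMD> cd +DATA/DEVDB/DATAFILE
ASMCMD> ls
ASMCMD> rm -rf *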

Easy fix, but initially confusing.

RMAN and NetBackup retention policy conflicts

The retention policy for RMAN defaults to REDUNDANCY 1, which means anything beyond one backup copy of a file is considered obsolete. If you run a command like "delete obsolete", RMAN will clean out everything except the last copy. All of our RMAN backups write to NetBackup, which has several retention policies of its own already configured. When NetBackup deletes backup pieces according to those policies, a crosscheck in RMAN marks them as expired, so the right way to clean up RMAN's repository is "crosscheck backup" followed by "delete expired". To prevent "delete obsolete" from destroying the wrong backups, issue this command: "configure retention policy to none;"
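Put together, a cleanup pass then looks something like this (just the commands named above, not a copy of our actual script):

RMAN> configure retention policy to none;
RMAN> crosscheck backup;
RMAN> delete expired backup;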