Last week I was testing our backups and got this error: ORA-15204: database version 188.8.131.52.0 is incompatible with diskgroup DATA_EDT1. This surprised me because all of our databases and Oracle homes on this Exadata are at version 184.108.40.206. I was testing full restore/recovery without a control file or an spfile. The first step was to restore spfile and controlfile from autobackup to their ASM locations.
I queried V$ASM_DISKGROUP in ASM and saw that all of our diskgroups have compatibility set to 220.127.116.11.0. To start the RMAN restore I had created a minimal pfile, and I hadn't bothered to set the compatible parameter, so the instance presumably came up with a default value lower than the diskgroup compatibility. I edited the pfile and set compatible="18.104.22.168.0"; that fixed it and the restore/recovery worked fine.
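The check that tripped me up can be sketched in a few lines of Python. This is only an illustration of the comparison, not how Oracle implements it: the function names are made up, and the version strings in the usage note are hypothetical examples rather than the values from this system. It assumes you have already read the diskgroup value from V$ASM_DISKGROUP (for example the DATABASE_COMPATIBILITY column) and the compatible value from your pfile.

```python
def parse_version(text):
    """Turn a dotted version string such as '11.2.0.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_compatible(db_compatible, diskgroup_compat):
    """Return True when the database setting is at or above the diskgroup's."""
    db = parse_version(db_compatible)
    dg = parse_version(diskgroup_compat)
    # Pad the shorter tuple with zeros so '11.2' compares equal to '11.2.0.0.0'.
    width = max(len(db), len(dg))
    db += (0,) * (width - len(db))
    dg += (0,) * (width - len(dg))
    return db >= dg
```

With hypothetical values, is_compatible("10.2.0.0.0", "11.2.0.0.0") is False, which is the situation a minimal pfile without a compatible setting can put you in, while is_compatible("11.2.0.0.0", "11.2.0.0.0") is True.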
This was a last minute request to move five databases on two servers from an old failing storage system to a new one. There was a total of 5.6 terabytes to move and the databases were in heavy use by the development team. The DBA who usually supported these systems was unavailable so I started working with the storage team to get this done.
I was able to get the storage team to create 67 new disks with the same sizes as the original disks and assign them to the correct servers. I researched how to get the disks to show up for ASM on Windows. There were 37 diskgroups and I assigned the new disks to each disk group according to the size of the original disks. So each diskgroup now had double the storage.
The next step shows the power of this technique: I dropped the old disks from each diskgroup and ASM moved all the data from the old disks to the new disks, then released the old disks. I started these late in the day and they finished the next morning. No downtime, no impact to the development team.
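With 37 diskgroups, typing the statements by hand gets tedious, so here is a minimal sketch of how the add-then-drop statements could be generated. The helper name, the disk paths, and the disk names are all hypothetical; in practice the real values would come from V$ASM_DISK on your own system. It only builds the SQL text and does not connect to ASM.

```python
def migration_statements(diskgroup, new_disk_paths, old_disk_names):
    """Build the two ALTER DISKGROUP statements for one diskgroup.

    new_disk_paths: discovery paths of the replacement disks (hypothetical).
    old_disk_names: ASM disk names of the disks to retire (hypothetical).
    """
    # Step 1: add the new disks so the diskgroup temporarily has double the storage.
    quoted_paths = ", ".join("'{}'".format(path) for path in new_disk_paths)
    add = "ALTER DISKGROUP {} ADD DISK {};".format(diskgroup, quoted_paths)
    # Step 2: drop the old disks; ASM rebalances the data off them before releasing them.
    drop = "ALTER DISKGROUP {} DROP DISK {};".format(
        diskgroup, ", ".join(old_disk_names)
    )
    return [add, drop]


if __name__ == "__main__":
    # Hypothetical Windows-style disk labels and ASM disk names.
    for statement in migration_statements(
        "DATA", [r"\\.\ORCLDISKDATA2", r"\\.\ORCLDISKDATA3"], ["DATA_0000", "DATA_0001"]
    ):
        print(statement)
```

The statements would be run from the ASM instance, one diskgroup at a time, and the rebalance progress can be watched in V$ASM_OPERATION.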
Lessons learned: use standard disk sizes and minimize the number of diskgroups.
I just finished the class X0161 Oracle RAC on AIX Systems Workshop, taught by Andrei Socoliuc of IBM Romania. Andrei knew a lot about Oracle RAC and AIX, so it was a good class. I thought the best part was the hands-on labs, especially the pre-installation preparation of the operating system. There was also a lot of good information on hardware and LPAR configuration. This was my first RAC class since 2003, when RAC had just been released, so the overview was a good refresher on RAC internals. My only complaint is that we spent a lot of time on IBM’s shared disk solution (GPFS) and very little on Oracle’s ASM.
I got this error when I was installing Oracle RAC 11.2 on Red Hat Enterprise Linux 5.6. I was installing Clusterware using ASM on VMware shared disks. When I created the independent persistent virtual disks, I left the “allocate all disk space now” option unselected. Oracleasm was happy on both RAC nodes. The Oracle installer was happy when it created +ASM1 on the first RAC node. But when the ASM instance started on node 2, it did not like the “virtually provisioned” disk. The +ASM2 instance did not open and complained that one of the shared disks was corrupt at a certain byte. When I checked the virtual disk files, I saw that Oracle was trying to read past the end of the file. I started all over with new, fully allocated shared disks and that fixed the problem. Everything is up and running now.
I was re-running an RMAN duplicate from active database today and got this error. I had assumed that the duplicate command would overwrite the files from the previous run. Instead it created files with new names, which filled up the ASM disk group.
Here’s the RMAN message on the target system:
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-03002: failure of Duplicate Db command at 11/16/2010 10:08:27
RMAN-03015: error occurred in stored script Memory Script
RMAN-03009: failure of backup command on ORA_DISK_6 channel at 11/16/2010 10:00:25
ORA-17628: Oracle error 19505 returned by remote Oracle server
Here’s the alert log entry on the auxiliary system:
ORA-19505: failed to identify file "+data"
ORA-17502: ksfdcre:4 Failed to create file +data
ORA-15041: diskgroup "DATA" space exhausted
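When the stacks get long, it can help to pull out just the error codes and look at the last one, which in this case was the real cause. The following is a small sketch, not a general-purpose log parser: the function name is my own, and the sample text is simply the alert log entry shown above.

```python
import re

# Matches codes such as ORA-15041 or RMAN-03002 (prefix, dash, five digits).
ERROR_CODE = re.compile(r"\b(ORA|RMAN)-(\d{5})\b")

ALERT_LOG = '''ORA-19505: failed to identify file "+data"
ORA-17502: ksfdcre:4 Failed to create file +data
ORA-15041: diskgroup "DATA" space exhausted'''


def error_codes(log_text):
    """Return the error codes found in log_text, in the order they appear."""
    return ["{}-{}".format(prefix, number)
            for prefix, number in ERROR_CODE.findall(log_text)]


if __name__ == "__main__":
    codes = error_codes(ALERT_LOG)
    # The last entry in this stack is the underlying problem: the diskgroup is full.
    print(codes[-1])
```

Run against the alert log entry above, the last code returned is ORA-15041, the space-exhausted error that the RMAN stack on the target only hints at through ORA-17628.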
The target (source) is on ASM using OMF (Oracle Managed Files). The auxiliary (destination) is also using ASM and OMF. I looked through the RMAN log and found a lot of SET NEWNAME commands like this:
executing command: SET NEWNAME
No filename is specified, so Oracle generates new filenames on each run, and they differ from those of the previous run. The files from the earlier run were still sitting in the disk group, which is why I ran out of space.
I used asmcmd to remove all of the files and reclaim the space.
Easy fix, but initially confusing.