Last week I was testing our backups and got this error: ORA-15204: database version 18.104.22.168.0 is incompatible with diskgroup DATA_EDT1. This surprised me because all of our databases and Oracle homes on this Exadata are at version 22.214.171.124. I was testing full restore/recovery without a control file or an spfile. The first step was to restore spfile and controlfile from autobackup to their ASM locations.
I queried V$ASM_DISKGROUP in ASM and saw that all of our diskgroups have compatibility set to 126.96.36.199.0. In order to start RMAN restore I had created a minimal pfile. I didn’t bother setting the compatible parameter. I edited the pfile and set compatible=”188.8.131.52.0″ – that fixed it and the restore/recovery worked fine.
We have been having performance problems on our test Exadata for several months. I have opened five Oracle service requests for multiple symptoms. While the cpu utilization was fairly low, Oracle background processes would hang, the OEM 12c agent would hang, backup jobs would hang, we would experience slow communications between RAC nodes, half of the cpus would be in 100% IO wait state, the system load average would exceed 6000, etc. We noticed one of the NFS mounts was unreachable and it happened to be the mount point where we keep our DBA scripts. Processes would be in IO wait state “D” and we noticed several were the DBA scripts running from NFS. We could resolve the problem by killing these scripts. So I moved the backup scripts to local drives and eliminated some issues.
The problem kept returning though so I kept opening more SRs with no solution. Yesterday I escalated and had one of my SRs reopened. I was finally able to get to the correct person in Oracle support who gave me two things to try. The first was to add the “noac” option for the NFS mounts. The idea was that this would resolve issues where synchronous writes are induced. Since we are backing up to NFS using RMAN and tar this seemed a good bet. And it did help a lot. But we were still able to bring the problem back by tarring to NFS.
These NFS mounts are across Infiniband to an Exalogic ZFS storage system. The second fix was based on the fact that the new OEL kernel 184.108.40.206.1 update has memory management changes that may result in high TCP/IP traffic causing memory starvation for contiguous memory free space. See Knowledge base article 1546861.1 System hung with large numbers of page allocation failures with “order:5″ : <Future Exadata releases will be changing the MTU size on the InfiniBand Interfaces to 7000 (down from 65520) for new installations, so the 7000 MTU for Exadata environments is known to be appropriate> So I changed the Infiniband MTU from 65520 to 7000 and restarted the network service. That finally fixed the issue.
I was in St. Charles, Mo all last week attending this class. It was taught at Quilogy Services and the instructor Doug Wallin was excellent. We only had two other students in the class and both of them were Oracle employees. I had no ODI experience so it was a great learning experience for me. I finished all the hands on labs and I have referred back to some of them since returning to work this week. It is a fast paced course with a lot of information. I think retention would be low if you couldn’t immediately start applying the learning on the job. My goal is to start using as much of the knowledge possible. Since we had such a small class we were able to get through all the material on time and also have some interesting discussions about “best practices”. I stayed at a hotel less than a mile from the class location so I was able to ride a bike to class every day. The Katy Trail runs alongside the Missouri River and a lot of non-chain restaurants are near the class location. So I biked about 12-14 miles every day and ate well It was a great week and I returned to work with a lot of energy and enthusiasm.
This is a combination of two courses, “Oracle Grid Infrastructure 11g: Manage Clusterware” and “ASM and Oracle Database 11g: RAC Administration”. The two courses are normally taught in five days and three days respectively. This combination was taught in five days by extending the classroom time (8:30AM – 6:30PM). Knowledge Services instructor Pat Frey taught the class in Columbus/Dublin OH. There is some overlap in the two courses so it’s not as bad as it sounds
There are lots of good labs in these courses and we did all of them except the overlapping parts. I was surprised that the virtual machines for our three node clusters worked so well. There were some performance issues and one student’s cluster became unresponsive on the last day. Mine stayed up and I always finished the labs in about half the time other students took.
I couldn’t understand why it took the other students so long to work through the labs until I realized how much more experience I have with Oracle. For some of them it was their first exposure to Clusterware and RAC. I got the training materials the week before and one of the exercises actually solved an issue I was having on our Exadata RAC systems. Also, I have been using RAC since it first came out on 9i. Here’s a review of the RAC class I took back in 2003. So I think my experience and knowledge of Oracle in general made a huge difference in my ability to perform the lab work efficiently and quickly. I also asked a lot more questions for clarification than anyone else because I was able to see how the different features were relevant to production uptime and performance.
This was a very good course for me and I recommend it to anyone who already has a basic grounding in single-instance Oracle.
I spent last week in downtown Chicago attending this class. It was a very good class and the instructor, Andy Fortunak was excellent. We had lots of labs to work through. The computers we used ran virtual database servers and virtual storage servers. This setup caused some problems, mainly slowness. My virtual database machine’s clusterware failed towards the end of the week but the virtual storage servers continued to work fine.
The main message was: invest the time needed to set up services and Database Resource Manager. IO Resource Manager and Instance Caging depends on it.
We also discussed consolidation and using RAC one-node. We are consolidating scores of databases on Exadata. Most of them do not require multiple RAC nodes for workload but we do want high availability and load balancing. This week I tested manually moving one-node databases between nodes and it works great. The client does not get disconnected but does get an error message. Srvctl actually makes it a two node RAC and then stops the original node. I testted automatic failover of rac one-node and compared it with a full rac two-node with one instance down. They both work the way I wanted, a second instance starts up. I can’t see any technical reason for going with one-node.
We recently bought another company and we are in the process of integrating our systems with theirs. One requirement was to import their daily business data into our Business Intelligence system for reporting. We agreed they would provide an Oracle datapump export file for us to import. We wanted to automate this and have a non-privileged application owner perform the import. I created a stored procedure that could run the import as the system user and granted execute on the procedure to the application owner. The dbms_datapump package has lots of options and it was fun exploring them to find the best way of doing this.
Run as sys :
grant execute on dbms_lock to system;
Run as system:
CREATE OR REPLACE DIRECTORY SOURCE AS '/nfsmount/exp/incoming';
CREATE OR REPLACE PROCEDURE SYSTEM.SCHEMA_IMPORT (
dumpfilename IN VARCHAR2)
h1 NUMBER; -- data pump job handle
job_state VARCHAR2 (30);
status ku$_Status; -- data pump status
PRAGMA EXCEPTION_INIT (job_not_exist, -31626);
DBMS_DATAPUMP.open (operation => 'IMPORT',
job_mode => 'SCHEMA',
job_name => NULL);
DBMS_DATAPUMP.set_parameter (h1, 'TABLE_EXISTS_ACTION', 'TRUNCATE');
DBMS_DATAPUMP.add_file (h1, dumpfilename, 'SOURCE');
dumpfilename || '.log',
job_state := 'UNDEFINED';
WHILE (job_state != 'COMPLETED') AND (job_state != 'STOPPED')
handle => h1,
mask => DBMS_DATAPUMP.ku$_status_job_error
timeout => -1);
job_state := status.job_status.state;
DBMS_OUTPUT.put_line ('job finished');
grant execute on schema_import to APP_OWNER;
Run as APP_OWNER:
Today we got a request to refresh a development database from a copy of production. I decided to use the canned clone process that ships with OEM 12c instead of the usual scripts or manual processes. I was pleasantly surprised with the results. There was a failure related to an NFS problem and I was able to restart the process easily.
- Minimizes the amount of scratch disk space by backing up and restoring one datafile at a time.
- No scripting or manual process required.
- Uses standard RMAN under the covers.
- Works with older versions (This one was 10.2)
- Works with existing backups or directly from the source database (This database was not previously backed up and was not in archivelog mode)
- Can duplicate a live database if it is in archivelog mode and running at version 11
- Easy to restart after errors.
- Runs as a database job in the background, unattended.
- Works on the same machine or across the network.
- More time required – one datafile at a time, no parallelization. 763G ran for 6.5 hours.
- No email notification for errors or completion.
- Black box – you really need to know how to do the steps manually and this will not teach you.
- Needs scratch space on local disks to avoid NFS performance problems. (size of largest datafile)