Oracle Application Express 5.1.1 on Exadata cluster

Our DBA team really likes using APEX to develop DBA applications. We are a pretty small team and we like to automate and standardize everything we do. We make a habit of using database tables to store our configuration information and we write scripts that read those DBA tables in order to support our business users.

We recently took control of a new Exadata cluster and of course we created a DBA repository database as we were creating the business databases. Since APEX 5.1.1 was just released, we decided to replace our old APEX system. That system was was a single instance running on a virtual machine and controlled by an outsourced administration team. We are a lot more comfortable having control over installation and maintenance of a system we depend on to get our work done.

The new Exadata is a four node RAC and our new DBA repository has an instance running on each node. We did not want to install software on the Exadata so that ruled out using Rest Data Services (aka APEX Listener). It also ruled out using Oracle HTTP Server with mod_plsql.

Since this APEX system is only used by our DBA team and is not exposed to outside networks we felt comfortable using the Embedded PL/SQL Gateway. However, the Oracle documentation contains this ominous note:”Oracle recommends that you do not select the Embedded PL/SQL Gateway option for Oracle RAC installations. Because the Embedded PL/SQL Gateway uses an HTTP Server built into the database instance, it does not take advantage of the Oracle RAC shared architecture.” (https://docs.oracle.com/database/apex-5.1/HTMIG/choosing-web-listener.htm#HTMIG370)

What to do? We decided to move forward with the Embedded PL/SQL Gateway and see how useable it would be. We could always spin up another virtual machine later if we needed to go the Rest Data Services route. I went through the steps in the APEX documentation to set up the Embedded PL/SQL Gateway. I could not get a response when I hit the database machine APEX URL that I had configured. I poked around on the servers and noticed that the XML db listeners were not running. I had forgotten that when I created the DBA database I intentionally did not install any options like Java and XML db. After installing XML db and restarting the database APEX started working. (https://docs.oracle.com/database/121/ADXDB/appaman.htm#ADXDB2700)

But that’s not all – here’s the real reason I wanted to blog about this. This APEX system is reachable using the scan listener! So that scary quote in the documentation should be ignored. We can connect using a URL like this: http://corpexa-scan.corp.com:8080/apex . No server names are required. So we CAN take advantage of RAC functionality.

This is just what we wanted so we are very satisfied with our new APEX system.

Automating Redo Log Maintenance

We asked a consultant to review our production database and one of the recommendations was to switch our redo logs from 2GB to 4GB. This surprised me at first but after checking the 11.2 documentation, I saw that 4GB is the current recommendation.  This is a 4 node RAC cluster and we had previously added several additional redo logs based on the application vendor’s  recommendations :-).  I decided I would take the opportunity to try out the PL/SQL debugger in SQL Developer 4.0.1.  Doing the development entirely in SQL Developer instead of vi turned out to be a good decision and saved me a lot of time.  In addition to changing the redo sizes, I wanted to move them from our original Exadata X2 ASM disk groups to our new Exadata X3 disk groups.

Manual process:

  • Create a new group like the original group
  • If the old group is inactive and has been archived, drop the old group
  • If the old group is active, current or has not been archived, force a log switch on all RAC nodes, wait until it’s inactive, then drop the old group
  • Repeat until all of the old log groups are gone (I’m using Oracle Managed Files so I don’t need to log on to asmcmd and manually delete the files)

I automated these manual steps in the sys.REDO_MAINT package:

CREATE OR REPLACE PACKAGE REDO_MAINT AS
type log_rec_type is record (log_group number,log_thread number,is_archived varchar2(3),log_status varchar2(10) ,log_size number);
cursor log_records return log_rec_type;
procedure manage_logs (
  new_size      varchar2,
  new_location  varchar2,
  new_location2 varchar2);
function create_new_group (
  newgroup    number,
  newthread   number,
  newfilespec varchar2,
  newfilespec2 varchar2,
  newfilesize varchar2)
    return boolean;
function drop_old_group (
  oldgroup  number)
    return boolean;
function redo_in_use (
  groupnum  number)
    return boolean;
function redo_not_archived (
  groupnum number)
    return boolean;
END REDO_MAINT;
/

CREATE OR REPLACE PACKAGE BODY REDO_MAINT AS

  cursor log_records return log_rec_type IS
    SELECT group# ,thread#,archived,status,BYTES from v$log;

  procedure manage_logs (
  new_size      varchar2,
  new_location  varchar2,
  new_location2 varchar2) AS
  BEGIN
    for existing_logs in log_records
    loop
      dbms_output.put_line('Group '||existing_logs.log_group 
                        || ' Thread ' || existing_logs.log_thread
                        || ' Archived ' || existing_logs.is_archived
                        || ' Status ' || existing_logs.log_status
                        || ' Size ' || existing_logs.log_size);

      if create_new_group(existing_logs.log_group + 100, existing_logs.log_thread,new_location,new_location2,new_size) 
      then
        dbms_output.put_line('create new group success');
        loop
          if drop_old_group(existing_logs.log_group)
          then
            dbms_output.put_line('Dropped old group ' || existing_logs.log_group);
            exit;
          end if;
          dbms_output.put_line('old group was not dropped - retrying ' || existing_logs.log_group);
        end loop;
      else
        dbms_output.put_line('create new group FAILED');
        exit;
      end if;
    end loop;
    NULL;
  END manage_logs;

  function create_new_group (
  newgroup    number,
  newthread   number,
  newfilespec varchar2,
  newfilespec2 varchar2,
  newfilesize varchar2)
    return boolean AS
  create_group varchar(200);
  BEGIN
    create_group:='ALTER DATABASE ADD LOGFILE THREAD ' || newthread 
    || ' GROUP ' || newgroup 
    || ' ( ' || '''' || newfilespec || '''' || ' , ' || '''' || newfilespec2 || '''' 
    || ' ) size ' || newfilesize ;
    dbms_output.put_line(create_group);
    execute immediate create_group;
    RETURN TRUE;
    exception
    when others then 
     dbms_output.put_line('An error was encountered - '||SQLCODE||' -ERROR- '||SQLERRM);
     return FALSE;
  END create_new_group;

  function drop_old_group (
  oldgroup  number)
    return boolean AS
  drop_group varchar(200);
  BEGIN
    drop_group:='alter database drop logfile group ' || oldgroup ;
     if redo_in_use(oldgroup) or redo_not_archived(oldgroup)
     then
      execute immediate 'ALTER SYSTEM ARCHIVE LOG CURRENT' ;
      -- wait until target redo log becomes inactive
      loop
        if redo_in_use(oldgroup) or redo_not_archived(oldgroup)
        then
          dbms_output.put_line('Sleeping - group ' || oldgroup);
          DBMS_LOCK.SLEEP(60);
        else
          -- was active - now inactive 
          dbms_output.put_line(drop_group);
          execute immediate drop_group;
          RETURN TRUE;
          exit;
        end if;
      end loop;
      return FALSE;
    end if;
    -- was not active - no wait required
    dbms_output.put_line(drop_group);
    execute immediate drop_group;
    RETURN TRUE;
        exception
    when others then 
     dbms_output.put_line('An error was encountered - '||SQLCODE||' -ERROR- '||SQLERRM);
     return FALSE;
  END drop_old_group;

  function redo_in_use (
  groupnum  number)
    return boolean AS
  current_status varchar2(10);  
  BEGIN
  select status into current_status from v$log where group# = groupnum;
  if current_status in ( 'ACTIVE','CURRENT') 
  then
    RETURN TRUE;
  else
    return FALSE;
  END IF;
        exception
    when others then 
     dbms_output.put_line('An error was encountered - '||SQLCODE||' -ERROR- '||SQLERRM);
     return FALSE;
  END redo_in_use;

  function redo_not_archived (
  groupnum number)
    return boolean AS
  is_it_archived varchar(3);
  BEGIN
    select archived into is_it_archived from v$log where group# = groupnum;
    if is_it_archived = 'NO'
    then
    -- redo not archived
       RETURN TRUE;
    else
      return FALSE;
    END IF;
      exception
    when others then 
     dbms_output.put_line('An error was encountered - '||SQLCODE||' -ERROR- '||SQLERRM);
     return FALSE;
  END redo_not_archived;

END REDO_MAINT;
/

Example:
SYS>select group#,thread#,bytes,archived,status from v$log order by  group#, thread#;

    GROUP#    THREAD#            BYTES ARC STATUS
---------- ---------- ---------------- --- ----------------
         1          1    2,147,483,648 YES INACTIVE
         2          1    2,147,483,648 YES INACTIVE
         3          2    2,147,483,648 YES INACTIVE
         4          2    2,147,483,648 YES INACTIVE
         5          1    2,147,483,648 YES INACTIVE
         6          2    2,147,483,648 YES INACTIVE
         7          3    2,147,483,648 YES INACTIVE
         8          3    2,147,483,648 YES ACTIVE
         9          3    2,147,483,648 NO  CURRENT
        10          4    2,147,483,648 YES INACTIVE
        11          4    2,147,483,648 YES INACTIVE
        12          4    2,147,483,648 YES INACTIVE
        13          1    2,147,483,648 YES INACTIVE
        14          1    2,147,483,648 YES INACTIVE
        15          1    2,147,483,648 YES INACTIVE
        16          1    2,147,483,648 NO  CURRENT
        17          1    2,147,483,648 YES INACTIVE
        20          2    2,147,483,648 YES ACTIVE
        21          2    2,147,483,648 NO  CURRENT
        22          2    2,147,483,648 YES INACTIVE
        23          2    2,147,483,648 YES INACTIVE
        24          2    2,147,483,648 YES INACTIVE
        30          3    2,147,483,648 YES INACTIVE
        31          3    2,147,483,648 YES INACTIVE
        32          3    2,147,483,648 YES INACTIVE
        33          3    2,147,483,648 YES INACTIVE
        34          3    2,147,483,648 YES INACTIVE
        40          4    2,147,483,648 YES INACTIVE
        41          4    2,147,483,648 YES INACTIVE
        42          4    2,147,483,648 YES INACTIVE
        43          4    2,147,483,648 NO  CURRENT
        44          4    2,147,483,648 YES INACTIVE

32 rows selected.
Kick it off:
BEGIN
  REDO_MAINT.MANAGE_LOGS( NEW_SIZE => '4g', NEW_LOCATION => '+RECO_EDP2', NEW_LOCATION2 => '+DATA_EDP2');
END;
/

on another session:
SYS>select group#,thread#,bytes,archived,status from v$log order by  group#, thread#;

    GROUP#    THREAD#            BYTES ARC STATUS
---------- ---------- ---------------- --- ----------------
         3          2    2,147,483,648 YES INACTIVE
         4          2    2,147,483,648 YES INACTIVE
         5          1    2,147,483,648 YES INACTIVE
         6          2    2,147,483,648 YES INACTIVE
         7          3    2,147,483,648 YES INACTIVE
         8          3    2,147,483,648 YES INACTIVE
         9          3    2,147,483,648 YES INACTIVE
        10          4    2,147,483,648 YES INACTIVE
        11          4    2,147,483,648 NO  CURRENT
        12          4    2,147,483,648 YES INACTIVE
        13          1    2,147,483,648 YES INACTIVE
        14          1    2,147,483,648 YES INACTIVE
        15          1    2,147,483,648 YES INACTIVE
        16          1    2,147,483,648 YES INACTIVE
        17          1    2,147,483,648 YES INACTIVE
        20          2    2,147,483,648 YES INACTIVE
        21          2    2,147,483,648 YES INACTIVE
        22          2    2,147,483,648 YES INACTIVE
        23          2    2,147,483,648 YES INACTIVE
        24          2    2,147,483,648 NO  CURRENT
        30          3    2,147,483,648 YES INACTIVE
        31          3    2,147,483,648 YES INACTIVE
        32          3    2,147,483,648 NO  CURRENT
        33          3    2,147,483,648 YES INACTIVE
        34          3    2,147,483,648 YES INACTIVE
        40          4    2,147,483,648 YES INACTIVE
        41          4    2,147,483,648 YES INACTIVE
        42          4    2,147,483,648 YES INACTIVE
        43          4    2,147,483,648 YES INACTIVE
        44          4    2,147,483,648 YES INACTIVE
       101          1    4,294,967,296 NO  CURRENT
       102          1    4,294,967,296 YES UNUSED

32 rows selected.

Output from PL/SQL:

Group 1 Thread 1 Archived NO Status CURRENT Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 101 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
Sleeping - group 1
Sleeping - group 1
Sleeping - group 1
Sleeping - group 1
Sleeping - group 1
alter database drop logfile group 1
Dropped old group 1
Group 2 Thread 1 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 102 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 2
Dropped old group 2
Group 3 Thread 2 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 103 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 3
Dropped old group 3
Group 4 Thread 2 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 104 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 4
Dropped old group 4
Group 5 Thread 1 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 105 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 5
Dropped old group 5
Group 6 Thread 2 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 106 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 6
Dropped old group 6
Group 7 Thread 3 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 3 GROUP 107 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 7
Dropped old group 7
Group 8 Thread 3 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 3 GROUP 108 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 8
Dropped old group 8
Group 9 Thread 3 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 3 GROUP 109 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 9
Dropped old group 9
Group 10 Thread 4 Archived NO Status CURRENT Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 4 GROUP 110 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 10
Dropped old group 10
Group 11 Thread 4 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 4 GROUP 111 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
Sleeping - group 11
Sleeping - group 11
Sleeping - group 11
Sleeping - group 11
alter database drop logfile group 11
Dropped old group 11
Group 12 Thread 4 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 4 GROUP 112 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 12
Dropped old group 12
Group 13 Thread 1 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 113 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 13
Dropped old group 13
Group 14 Thread 1 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 114 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 14
Dropped old group 14
Group 15 Thread 1 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 115 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 15
Dropped old group 15
Group 16 Thread 1 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 116 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 16
Dropped old group 16
Group 17 Thread 1 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 117 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 17
Dropped old group 17
Group 20 Thread 2 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 120 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 20
Dropped old group 20
Group 21 Thread 2 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 121 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 21
Dropped old group 21
Group 22 Thread 2 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 122 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 22
Dropped old group 22
Group 23 Thread 2 Archived NO Status CURRENT Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 123 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 23
Dropped old group 23
Group 24 Thread 2 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 124 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 24
Dropped old group 24
Group 30 Thread 3 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 3 GROUP 130 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 30
Dropped old group 30
Group 31 Thread 3 Archived NO Status CURRENT Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 3 GROUP 131 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 31
Dropped old group 31
Group 32 Thread 3 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 3 GROUP 132 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 32
Dropped old group 32
Group 33 Thread 3 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 3 GROUP 133 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 33
Dropped old group 33
Group 34 Thread 3 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 3 GROUP 134 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 34
Dropped old group 34
Group 40 Thread 4 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 4 GROUP 140 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 40
Dropped old group 40
Group 41 Thread 4 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 4 GROUP 141 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 41
Dropped old group 41
Group 42 Thread 4 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 4 GROUP 142 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 42
Dropped old group 42
Group 43 Thread 4 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 4 GROUP 143 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 43
Dropped old group 43
Group 44 Thread 4 Archived YES Status INACTIVE Size 2147483648
ALTER DATABASE ADD LOGFILE THREAD 4 GROUP 144 ( '+RECO_EDP2' , '+DATA_EDP2' ) size 4g
create new group success
alter database drop logfile group 44
Dropped old group 44

PL/SQL procedure successfully completed.

Elapsed: 00:15:12.47

SYS>select group#,thread#,bytes,archived,status from v$log order by  group#, thread#;

    GROUP#    THREAD#            BYTES ARC STATUS
---------- ---------- ---------------- --- ----------------
       101          1    4,294,967,296 YES INACTIVE
       102          1    4,294,967,296 NO  CURRENT
       103          2    4,294,967,296 NO  CURRENT
       104          2    4,294,967,296 YES UNUSED
       105          1    4,294,967,296 YES UNUSED
       106          2    4,294,967,296 YES UNUSED
       107          3    4,294,967,296 NO  CURRENT
       108          3    4,294,967,296 YES UNUSED
       109          3    4,294,967,296 YES UNUSED
       110          4    4,294,967,296 NO  CURRENT
       111          4    4,294,967,296 YES UNUSED
       112          4    4,294,967,296 YES UNUSED
       113          1    4,294,967,296 YES UNUSED
       114          1    4,294,967,296 YES UNUSED
       115          1    4,294,967,296 YES UNUSED
       116          1    4,294,967,296 YES UNUSED
       117          1    4,294,967,296 YES UNUSED
       120          2    4,294,967,296 YES UNUSED
       121          2    4,294,967,296 YES UNUSED
       122          2    4,294,967,296 YES UNUSED
       123          2    4,294,967,296 YES UNUSED
       124          2    4,294,967,296 YES UNUSED
       130          3    4,294,967,296 YES UNUSED
       131          3    4,294,967,296 YES UNUSED
       132          3    4,294,967,296 YES UNUSED
       133          3    4,294,967,296 YES UNUSED
       134          3    4,294,967,296 YES UNUSED
       140          4    4,294,967,296 YES UNUSED
       141          4    4,294,967,296 YES UNUSED
       142          4    4,294,967,296 YES UNUSED
       143          4    4,294,967,296 YES UNUSED
       144          4    4,294,967,296 YES UNUSED

32 rows selected.

High load average, high IO wait, slow process startup, slow RAC communications

We have been having performance problems on our test Exadata for several months. I have opened five Oracle service requests for multiple symptoms. While the cpu utilization was fairly low, Oracle background processes would hang, the OEM 12c agent would hang, backup jobs would hang, we would experience slow communications between RAC nodes, half of the cpus would be in 100% IO wait state, the system load average would exceed 6000, etc. We noticed one of the NFS mounts was unreachable and it happened to be the mount point where we keep our DBA scripts. Processes would be in IO wait state “D” and we noticed several were the DBA scripts running from NFS. We could resolve the problem by killing these scripts. So I moved the backup scripts to local drives and eliminated some issues.
The problem kept returning though so I kept opening more SRs with no solution. Yesterday I escalated and had one of my SRs reopened. I was finally able to get to the correct person in Oracle support who gave me two things to try. The first was to add the “noac” option for the NFS mounts. The idea was that this would resolve issues where synchronous writes are induced. Since we are backing up to NFS using RMAN and tar this seemed a good bet. And it did help a lot. But we were still able to bring the problem back by tarring to NFS.
These NFS mounts are across Infiniband to an Exalogic ZFS storage system. The second fix was based on the fact that the new OEL kernel 11.2.3.2.1 update has memory management changes that may result in high TCP/IP traffic causing memory starvation for contiguous memory free space. See Knowledge base article 1546861.1 System hung with large numbers of page allocation failures with “order:5” : <Future Exadata releases will be changing the MTU size on the InfiniBand Interfaces to 7000 (down from 65520) for new installations, so the 7000 MTU for Exadata environments is known to be appropriate>  So I changed the Infiniband MTU from 65520 to 7000 and restarted the network service. That finally fixed the issue.

RAC and Grid Infrastructure Administration Accelerated Ed 3

This is a combination of two courses, “Oracle Grid Infrastructure 11g: Manage Clusterware” and “ASM  and Oracle Database 11g: RAC Administration”. The two courses are normally taught in five days and three days respectively.  This combination was taught in five days by extending the classroom time (8:30AM – 6:30PM). Knowledge  Services instructor Pat Frey taught the class in Columbus/Dublin OH. There is some overlap in the two courses so it’s not as bad as it sounds 🙂

There are lots of good labs in these courses and we did all of them except the overlapping parts. I was surprised that the virtual machines for our three node clusters worked so well. There were some performance issues and one student’s cluster became unresponsive on the last day.  Mine stayed up and I always finished the labs in about half the time other students took.

I couldn’t understand why it took the other students so long to work through the labs until I realized how much more experience I have with Oracle. For some of them it was their first exposure to Clusterware and RAC. I got the training materials the week before and one of the exercises actually solved an issue I was having on our Exadata RAC systems. Also, I have been using RAC since it first came out on 9i. Here’s a review of the RAC class I took back in 2003. So I think my experience and knowledge of Oracle in general made a huge difference in my ability to perform the lab work efficiently and quickly. I also asked a lot more questions for clarification than anyone else because I was able to see how the different features were relevant to production uptime and performance.

This was a very good course for me and I recommend it to anyone who already has a basic grounding in single-instance Oracle.

Exadata Database Machine Administration Workshop

I spent last week in downtown Chicago attending this class. It was a very good class and the instructor, Andy Fortunak was excellent. We had lots of labs to work through. The computers we used ran virtual database servers and virtual storage servers. This setup caused some problems, mainly slowness. My virtual database machine’s clusterware failed towards the end of the week but the virtual storage servers continued to work fine.

The main message was: invest the time needed to set up services and Database Resource Manager.  IO Resource Manager and Instance Caging depends on it.

We also discussed consolidation and using RAC one-node. We are consolidating scores of databases on Exadata. Most of them do not require multiple RAC nodes for workload but we do want high availability and load balancing. This week I tested manually moving one-node databases between nodes and it works great. The client does not get disconnected but does get an error message. Srvctl actually makes it a two node RAC and then stops the original node. I tested automatic failover of rac one-node and compared it with a full rac two-node with one instance down. They both work the way I wanted, a second instance starts up. I can’t see any technical reason for going with one-node.

Installing ASM/RAC without asmlib

I’m installing RAC (using ASM) on Red Hat Linux 6.2. Oracle has decided not to provide asmlib for Red Hat 6 and above. So I needed to figure out how to ensure correct permissions and ownership for the shared disks. Red Hat has an article on the support website which recommends using udev or multipath. I set up multipath on my VMware systems but it did not see any disks. VMware was not reporting the UUID’s of the disks to Linux. A little research on the VMware website gave me a clue, and I inserted this line into the vmx files:

disk.EnableUUID = “TRUE”

After rebooting, the multipath software could see the shared disks and automatically created the multipath devices.

RAC on AIX

I just finished the class: X0161 Oracle RAC on AIX Systems Workshop taught by Andrei Socoliuc of IBM Romania. Andrei knew a lot about Oracle RAC and AIX so it was a good class. I thought the best part was the hands on labs, especially the pre-installation preparation of the operating system. There was also a lot of good information on hardware and LPAR configuration. This is my first RAC class since 2003 when it had just been released, so the overview of RAC was a good way for me to get a refresher on RAC internals. My only complaint  is that we spent a lot of time on IBM’s shared disk solution (GPFS) and very little on Oracle’s ASM .

CRS-2800: Cannot start resource ‘ora.asm’ as it is already in the INTERMEDIATE state on server

I got this error when I was installing Oracle RAC 11.2 on Red Hat Enterprise Linux 5.6.  I was installing Clusterware using ASM on VMware shared disks.  When I created the independent persistent virtual disks, I left the “allocate all disk space now” option unselected. Oracleasm was happy on both RAC nodes. The Oracle installer was happy when it created +ASM1 on the first RAC node. But when the ASM instance started on node 2 it did not like the “virtually provisioned” disk. The +ASM2 instance was not open and was complaining about one of the shared disks being corrupt at a certain byte. When I checked the virtual disk files, I saw that Oracle was trying to read past the end of the file. I started all over with new fully allocated shared disk and that fixed the problem. Everything is up and running now.