When I first heard about Venom, the virtual floppy bug, I wondered whether all Xen guests were affected. I quickly discovered that paravirtualized x86 guests are NOT affected. I was pretty sure the Linux guests running on our Exalogic were paravirtualized, so I didn’t worry about it. Over the weekend I noticed more publicity about Venom and decided I should make sure. I don’t have access to Dom0, so I needed to determine the virtualization mode from within the guest OS. After a little research I found that if you are using the PVHVM drivers (xen-blkfront for disk and xen-netfront for network), you are paravirtualized on Xen. I checked lsmod and verified that my initial assumption was correct.
> lsmod | grep -i xen
xen_netfront 16420 0
xen_blkfront 13602 7
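The same check can be scripted. What follows is a minimal Python sketch, not something from the original check: the function name `xen_frontend_modules` is hypothetical, and it only parses text in the format that lsmod prints. It reports which frontend drivers are loaded and nothing more. An HVM guest with PV drivers may load the same modules, so treat a match as a sanity check rather than proof of the virtualization mode.

```python
def xen_frontend_modules(lsmod_text):
    """Return the Xen frontend modules that appear in lsmod-style output."""
    wanted = {"xen_blkfront", "xen_netfront"}
    found = []
    for line in lsmod_text.splitlines():
        fields = line.split()
        # The first column of lsmod output is the module name.
        if fields and fields[0] in wanted:
            found.append(fields[0])
    return sorted(found)


# Sample text copied from the lsmod output above, plus the usual header row.
sample = """Module                  Size  Used by
xen_netfront 16420 0
xen_blkfront 13602 7
"""
print(xen_frontend_modules(sample))  # ['xen_blkfront', 'xen_netfront']
```

On a real guest the input would be the captured output of lsmod instead of the sample string.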
In order to provide good coverage at work I needed to take this class between the time we hired a new DBA and the time our contract DBA rolled off our project. That narrowed the choice down to classes scheduled for September. The only class available was in San Jose, CA. I had never visited any of the towns south of San Francisco, so this was a good chance to experience the Silicon Valley culture. My hotel was within walking distance of the classroom, and there are several great places to eat along the way.
At work we are using Exalogic to host virtual machines on OVM. I had already dabbled in OVM and had set up some RAC databases on OVM running under VirtualBox on my Toshiba Qosmio before class (I bought the Qosmio with 32 GB specifically to run database VMs). I think this was good preparation for the class, since our labs used a similar setup. We used remote OVM servers that hosted our lab OVM Manager and our lab OVM servers, so we were running two levels of virtualization. We installed OVM Manager and OVM Server in the first few labs (one OVM server was pre-built). There were other labs for configuring storage, networks, guest OS creation, templates, etc. I enjoyed the labs a lot.
The last virtualization class I had attended was for VMware back in 2008. I have used VMware ESX and Workstation over the past several years. I was pleasantly surprised with the performance of our lab systems.
We had an excellent instructor, Hans Forbrich (aka Fuzzy Graybeard). I knew of Hans because he’s an Oracle ACE Director, and Daniel Morgan, another ACE Director, had presented to our local Oracle User’s Group in July. One of Dan’s slides is about Hans. The name was also familiar because Hans has contributed to the Oracle-L mailing list for several years and before that was a frequent contributor to the Usenet group comp.databases.oracle. If you attend “RAC Attack” at Oracle Open World you will probably see him. He is very knowledgeable about Oracle Virtual Machines and was able to answer most questions immediately. If he couldn’t answer, he would research the question during our lab time and was always able to provide a satisfactory answer.
I would recommend this class to anyone who wants to get started administering an Oracle Virtual Machine environment.
We have been having performance problems on our test Exadata for several months. I have opened five Oracle service requests for multiple symptoms. While CPU utilization was fairly low, Oracle background processes would hang, the OEM 12c agent would hang, backup jobs would hang, communication between RAC nodes would be slow, half of the CPUs would be in 100% I/O wait, the system load average would exceed 6000, etc. We noticed that one of the NFS mounts was unreachable, and it happened to be the mount point where we keep our DBA scripts. Processes would be stuck in the uninterruptible “D” state waiting on I/O, and we noticed that several of them were DBA scripts running from NFS. We could resolve the problem by killing these scripts. So I moved the backup scripts to local drives, which eliminated some of the issues.
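Spotting the stuck processes is easy to script. This is a rough Python sketch, not the procedure we actually followed: the function name `uninterruptible_processes` is hypothetical, the process rows in the sample are invented for illustration, and the input is assumed to be text in the format of `ps -eo pid,stat,args`.

```python
def uninterruptible_processes(ps_text):
    """Return (pid, command) pairs for processes whose state starts with 'D'.

    Expects text in the format of `ps -eo pid,stat,args`, header included.
    """
    hits = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[1].startswith("D"):
            hits.append((int(parts[0]), parts[2]))
    return hits


# Hypothetical sample; the PIDs and paths are made up.
sample = """  PID STAT COMMAND
 4121 D    /bin/sh /nfs/dba/scripts/backup.sh
 4200 Ss   /usr/sbin/sshd
 4312 D+   tar cf /nfs/backup/db.tar /u01
"""
for pid, command in uninterruptible_processes(sample):
    print(pid, command)
```

Commands that live on an NFS path stand out quickly in a listing like this.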
The problem kept returning, though, so I kept opening more SRs with no solution. Yesterday I escalated and had one of my SRs reopened. I was finally able to reach the right person in Oracle support, who gave me two things to try. The first was to add the “noac” option to the NFS mounts. The idea was that this would resolve issues where synchronous writes are induced. Since we are backing up to NFS using RMAN and tar, this seemed a good bet, and it did help a lot. But we were still able to bring the problem back by tarring to NFS.
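To confirm that every NFS mount actually carries the option, something like the following Python sketch could be run against the contents of /etc/fstab. The function name `nfs_mounts_missing_option` and the host and path names in the sample are hypothetical, and the sketch assumes the mounts are defined in fstab rather than by an automounter.

```python
def nfs_mounts_missing_option(fstab_text, option="noac"):
    """Return mount points of NFS entries whose option list lacks `option`."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type in ("nfs", "nfs4") and option not in options.split(","):
            missing.append(mount_point)
    return missing


# Hypothetical fstab entries; the server name and paths are made up.
sample = """# device                mount     type  options          dump pass
zfs-host:/export/backup   /backup   nfs   rw,bg,hard,noac  0 0
zfs-host:/export/scripts  /scripts  nfs   rw,bg,hard       0 0
/dev/sda1                 /boot     ext4  defaults         1 2
"""
print(nfs_mounts_missing_option(sample))  # ['/scripts']
```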
These NFS mounts go across InfiniBand to an Exalogic ZFS storage system. The second fix was based on the fact that the new OEL kernel 18.104.22.168.1 update has memory management changes that may result in high TCP/IP traffic starving the system of contiguous free memory. See Knowledge Base article 1546861.1, System hung with large numbers of page allocation failures with “order:5”, which says: “Future Exadata releases will be changing the MTU size on the InfiniBand Interfaces to 7000 (down from 65520) for new installations, so the 7000 MTU for Exadata environments is known to be appropriate.” So I changed the InfiniBand MTU from 65520 to 7000 and restarted the network service. That finally fixed the issue.
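After a change like this it is worth confirming the MTU the kernel is really using. Linux exposes it per interface in /sys/class/net/<interface>/mtu. The Python sketch below reads those files. The function names and the default "ib" name match are assumptions for illustration; interface naming varies between systems, so adjust the match to whatever your InfiniBand interfaces are called.

```python
import os


def interface_mtus(match="ib", sysfs_root="/sys/class/net"):
    """Return {interface: mtu} for interfaces whose name contains `match`."""
    mtus = {}
    for name in sorted(os.listdir(sysfs_root)):
        if match not in name:
            continue
        try:
            with open(os.path.join(sysfs_root, name, "mtu")) as handle:
                mtus[name] = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # interface vanished or the file was unreadable
    return mtus


def interfaces_above(mtus, limit=7000):
    """Return the interface names whose MTU is still larger than `limit`."""
    return sorted(name for name, mtu in mtus.items() if mtu > limit)
```

Calling `interfaces_above(interface_mtus())` on the database node should return an empty list once every matching interface is at 7000 or below.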