Use cx-Oracle in Python to bulk load arbitrary tables using csv files

I recently got a request to load hundreds of database tables in a day and a half using csv files our developers created. I considered using SQL Loader but I didn’t want to spend the time understanding the table structure and creating the control file for each table. Another alternative was to use SQL Developer to import the csv files. That goes against my nature because it would be a manual, repetitive and error prone process.

The csv files were fairly small so plain inserts would work. I looked at writing a BASH script to generate the insert statements and it became very complex very quickly. PL/SQL was an option too but I ruled it out because of the requirement to read flat files. I have been using the Python cx-Oracle module recently so I decided to write a Python script for this task.

Python turned out to be the right choice and the script was very easy to write. The only tricky part was recognizing the date fields. Those are unquoted like a numeric field but contain literals like JAN,FEB, etc. Python and cx-Oracle handled everything with a minimum of effort on my part.

Create the csv files in a dedicated subdirectory like : tablename.csv. The first row of each csv file must contain the column names. The script truncates the target table before inserting the csv records. Any double quoted field is considered a literal and unquoted fields are numeric except date fields which must have the format 01-JAN-16.

Use these GitHub links for more information : ReadmeSource code

Oracle ZFS Storage Appliance Administration Ed 1 LVC

This was a four day course delivered online. The Oracle labs were inaccessible from my work laptop and from my work network so I used my personal laptop from home. This turned out to be a good thing because work interruptions were minimal and I could focus on the class.

The instructor Barry Isaacman has many years of experience with Sun hardware and has done many ZFS Appliance installations. So he had lots of tips on how to set up things optimally. He also provided documents outside the course materials to support further learning. I was really interested in the API and Python integration and he was very helpful.

The labs are excellent and they utilize real ZFS Appliance hardware so performance is pretty good. There is also a ZFS Appliance simulator here.The hardest part was interpreting the class specific instructions for student IP address assignments. After getting the initial settings correct everything worked fine.

I was able to immediately apply my new knowledge as soon as I returned to work this week. We have recently installed ZFS appliances for database backups over Infiniband in two data centers and the installer set up replication between them. I reviewed the configuration and was pleased that the best practices in the class were implemented in the new installation.

I’m just starting to experiment with the API using Python. Here are some links that are promising:




Oracle In Memory Column Store

I attended the Greater Cincinnati Oracle User’s Group meeting yesterday at the local Oracle office. I was expecting a good presentation because Maria Colgan (@sqlmaria) was presenting.

I was not disappointed and this was an excellent presentation and conversation with Maria who used to head up the optimizer team and now runs the In Memory team.  She went into deep dives of a lot of technical innovations they have used for performance and reliability. I took lots of notes.

All of the code is embedded in Oracle database software (12.1) so it’s just a parameter change to turn it on. The only other step is to alter table INMEMORY. Some of the inmemory stuff can be offloaded to Exadata but you have to pay the license when you turn it on.

In Oracle 12.2 the whole thing can run in exadata flash instead of in the db server dram. Oracle 12.2 will be out “sometime this year”.

Setting up Ansible and cx_Oracle

I have been automating and centralizing dba and system administration processes using Python recently. One of the problems I solved was how to create Bash and SQL scripts to run on the remote database servers, copy them to the database servers and then execute them as another Linux account. After I got my Python scripts working perfectly, I discovered that I had re-invented Ansible:-)

My scripts would have been much less complex and easier to write if I had done them with Ansible in mind. So that’s how I will write them going forward. I also wanted to implement a DBA Data Warehouse so I added cx_Oracle to the mix and used it to store the admin data in a database. I have been using a local virtual machine for Python development and the database was just a local installation on the VM.

Before moving to production I needed to figure out an easy way to get this stuff installed on the production Linux box. Setting it up on my VM was an adventure in Googling and trial and error. So here’s the documentation on how to set it up. You need to define the shell variables first.

export http_proxy=http://$my_userid:$my_password@$proxy_server:$port
export https_proxy=https://$my_userid:$my_password@$proxy_server:$port
wget -O - | python
easy_install pip
export ORACLE_HOME=/base/oracle/product/12102
yum install python-devel
pip install --proxy=https://$my_userid:$my_pasword@$proxy_server:$port cx_Oracle
pip install --proxy=https://$my_userid:$my_pasword@$proxy_server:$port ansible


How to tell if your guest VM is vulnerable to Venom

When I first heard about the virtual floppy bug called Venom I wondered if all Xen guests were affected. I quickly discovered that paravirtualized X86 guests are NOT affected. I was pretty sure the Linux guests running on our Exalogic were paravirtualized so I didn’t worry about it. Over the weekend I noticed more publicity about Venom and I decided I should make sure. I don’t have access to Dom0 so I needed to see the virtualization mode from within the guest OS. After a little research I found that if you are using the PVHVM drivers (xen-blkfront for disk, and xen-netfront for network)  you are paravirtualized on Xen. I checked lsmod and verified my initial assumption was correct.

> lsmod | grep -i xen
xen_netfront 16420 0
xen_blkfront 13602 7


Data Masking using SQL Loader

I use Enterprise Manager Cloud Control to mask Oracle databases so they can be shared with our ERP vendor. I intended to do the same with our legacy Informix database but I found that we did not have the Database Gateway licenses required for using Enterprise Manager Data Masking Pack on Informix. We had a requirement to replace customer names and addresses with a set of twenty arbitrary values. Due to referential integrity, the replacement values had to be deterministic/repeatable – not random.

The Informix DBA sent me flat file exports from the Informix tables containing the customer records. I mounted these as external tables and wrote SQL to select from them and write to Oracle tables. This was so slow that I abandoned that idea and decided to load the files into Oracle tables using SQL Loader. It’s been so long since I have used SQL Loader that I had to relearn how to build control files. I was amazed at how fast the data loaded. I had the idea of using SQL Loader to mask the data during the load.

I decided to use the oracle_hash function inside a case statement in the SQL Loader control file specification for the sensitive columns. This let me replace the names and addresses with new values in a repeatable way. I forced the names to uppercase and then produced a hash value which I associated with the fictional name. That fictional name got loaded into the Oracle table.

I am very satisfied with the results and happy I had the opportunity to re-familiarize myself with SQL Loader.

Here’s the content of the control file:

direct=true, rows=200000, multithreading=true
load data
infile customers.unl badfile customers.bad discardfile customers.dis
into table customers
fields terminated by ‘|’
first_name “case
when (ora_hash(upper(:first_name),20,0) = 0) then ‘Andy’
when (ora_hash(upper(:first_name),20,0) = 1) then ‘Brenda’
when (ora_hash(upper(:first_name),20,0) = 2) then ‘Jim’
when (ora_hash(upper(:first_name),20,0) = 3) then ‘Cathy’
when (ora_hash(upper(:first_name),20,0) = 4) then ‘Bob’
when (ora_hash(upper(:first_name),20,0) = 5) then ‘Sue’
when (ora_hash(upper(:first_name),20,0) = 6) then ‘Joe’
when (ora_hash(upper(:first_name),20,0) = 7) then ‘Carmen’
when (ora_hash(upper(:first_name),20,0) = 8) then ‘Jose’
when (ora_hash(upper(:first_name),20,0) = 9) then ‘Claire’
when (ora_hash(upper(:first_name),20,0) = 10) then ‘Ron’
when (ora_hash(upper(:first_name),20,0) = 11) then ‘Misty’
when (ora_hash(upper(:first_name),20,0) = 12) then ‘Jason’
when (ora_hash(upper(:first_name),20,0) = 13) then ‘Glenda’
when (ora_hash(upper(:first_name),20,0) = 14) then ‘Rick’
when (ora_hash(upper(:first_name),20,0) = 15) then ‘Suzy’
when (ora_hash(upper(:first_name),20,0) = 16) then ‘Peter’
when (ora_hash(upper(:first_name),20,0) = 17) then ‘Sara’
when (ora_hash(upper(:first_name),20,0) = 18) then ‘Isaac’
when (ora_hash(upper(:first_name),20,0) = 19) then ‘Helen’
when (ora_hash(upper(:first_name),20,0) = 20) then ‘Nathan’
else ‘ERROR’ end”,
last_name “case
when (ora_hash(upper(:last_name),20,0) = 0) then ‘Jones’
when (ora_hash(upper(:last_name),20,0) = 1) then ‘Hay’
when (ora_hash(upper(:last_name),20,0) = 2) then ‘Smith’
when (ora_hash(upper(:last_name),20,0) = 3) then ‘Gonzales’
when (ora_hash(upper(:last_name),20,0) = 4) then ‘Wu’
when (ora_hash(upper(:last_name),20,0) = 5) then ‘Cho’
when (ora_hash(upper(:last_name),20,0) = 6) then ‘Einstein’
when (ora_hash(upper(:last_name),20,0) = 7) then ‘Ward’
when (ora_hash(upper(:last_name),20,0) = 8) then ‘Magee’
when (ora_hash(upper(:last_name),20,0) = 9) then ‘Morgan’
when (ora_hash(upper(:last_name),20,0) = 10) then ‘Tenkiller’
when (ora_hash(upper(:last_name),20,0) = 11) then ‘Rice’
when (ora_hash(upper(:last_name),20,0) = 12) then ‘Namath’
when (ora_hash(upper(:last_name),20,0) = 13) then ‘DeMarco’
when (ora_hash(upper(:last_name),20,0) = 14) then ‘Obama’
when (ora_hash(upper(:last_name),20,0) = 15) then ‘Bush’
when (ora_hash(upper(:last_name),20,0) = 16) then ‘Grant’
when (ora_hash(upper(:last_name),20,0) = 17) then ‘Lee’
when (ora_hash(upper(:last_name),20,0) = 18) then ‘Blount’
when (ora_hash(upper(:last_name),20,0) = 19) then ‘Bain’
when (ora_hash(upper(:last_name),20,0) = 20) then ‘Adams’
else ‘ERROR’ end”,
address1 “case
when (ora_hash(upper(:address1),20,0) = 0) then ’10 Main’
when (ora_hash(upper(:address1),20,0) = 1) then ’11 1st’
when (ora_hash(upper(:address1),20,0) = 2) then ’22 2nd’
when (ora_hash(upper(:address1),20,0) = 3) then ’33 3rd’
when (ora_hash(upper(:address1),20,0) = 4) then ’44 4th’
when (ora_hash(upper(:address1),20,0) = 5) then ’55 5th’
when (ora_hash(upper(:address1),20,0) = 6) then ’66 6th’
when (ora_hash(upper(:address1),20,0) = 7) then ’77 7th’
when (ora_hash(upper(:address1),20,0) = 8) then ’88 8th’
when (ora_hash(upper(:address1),20,0) = 9) then ’99 9th’
when (ora_hash(upper(:address1),20,0) = 10) then ‘100 10th’
when (ora_hash(upper(:address1),20,0) = 11) then ‘111 11th’
when (ora_hash(upper(:address1),20,0) = 12) then ‘222 22nd’
when (ora_hash(upper(:address1),20,0) = 13) then ‘333 33rd’
when (ora_hash(upper(:address1),20,0) = 14) then ‘444 44th’
when (ora_hash(upper(:address1),20,0) = 15) then ‘555 55th’
when (ora_hash(upper(:address1),20,0) = 16) then ‘666 66th’
when (ora_hash(upper(:address1),20,0) = 17) then ‘777 77th’
when (ora_hash(upper(:address1),20,0) = 18) then ‘888 88th’
when (ora_hash(upper(:address1),20,0) = 19) then ‘999 99th’
when (ora_hash(upper(:address1),20,0) = 20) then ‘2000 20th’
else ‘ERROR’ end”,
phone “upper(‘8005551212’)”,
email “upper(‘me\’)” ,
create_date date ‘mm/dd/yyyy’,