We recently bought another company and we are in the process of integrating our systems with theirs. One requirement was to import their daily business data into our Business Intelligence system for reporting. We agreed they would provide an Oracle datapump export file for us to import. We wanted to automate this and have a non-privileged application owner perform the import. I created a stored procedure that could run the import as the system user and granted execute on the procedure to the application owner. The dbms_datapump package has lots of options and it was fun exploring them to find the best way of doing this.
Run as SYS:
grant execute on dbms_lock to system;
Run as system:
CREATE OR REPLACE DIRECTORY SOURCE AS '/nfsmount/exp/incoming';
CREATE OR REPLACE PROCEDURE SYSTEM.SCHEMA_IMPORT (
   dumpfilename IN VARCHAR2)
AS
   h1              NUMBER;          -- data pump job handle
   job_state       VARCHAR2 (30);
   status          ku$_Status;      -- data pump status
   job_not_exist   EXCEPTION;
   PRAGMA EXCEPTION_INIT (job_not_exist, -31626);
BEGIN
   h1 := DBMS_DATAPUMP.open (operation => 'IMPORT',
                             job_mode  => 'SCHEMA',
                             job_name  => NULL);
   DBMS_DATAPUMP.set_parameter (h1, 'TABLE_EXISTS_ACTION', 'TRUNCATE');
   DBMS_DATAPUMP.add_file (h1, dumpfilename, 'SOURCE');
   DBMS_DATAPUMP.add_file (h1, dumpfilename || '.log', 'SOURCE',
                           filetype => DBMS_DATAPUMP.ku$_file_type_log_file);
   DBMS_DATAPUMP.start_job (h1);
   job_state := 'UNDEFINED';
   WHILE (job_state != 'COMPLETED') AND (job_state != 'STOPPED')
   LOOP
      -- wait until the job state changes, then pick up errors and status
      DBMS_DATAPUMP.get_status (
         handle    => h1,
         mask      => DBMS_DATAPUMP.ku$_status_job_error
                      + DBMS_DATAPUMP.ku$_status_job_status,
         timeout   => -1,
         job_state => job_state,
         status    => status);
      job_state := status.job_status.state;
   END LOOP;
   DBMS_OUTPUT.put_line ('job finished');
   DBMS_DATAPUMP.detach (h1);
EXCEPTION
   WHEN job_not_exist THEN
      DBMS_OUTPUT.put_line ('job no longer exists');
END;
/
grant execute on schema_import to APP_OWNER;
Run as APP_OWNER:
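The application owner only needs a simple anonymous block to kick off the import. A minimal sketch, assuming the incoming file is named daily_export.dmp (the filename here is made up for illustration):

```sql
-- Run as the non-privileged application owner; the procedure
-- executes with SYSTEM's rights (definer's rights by default).
BEGIN
   SYSTEM.SCHEMA_IMPORT ('daily_export.dmp');
END;
/
```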
We have a new DBA in training, and part of my job is to mentor him and demonstrate good Oracle practices. Here’s a write-up I made for him on how I solved a performance issue.
Issue: The BI team complained about a large query in the production BI database that had been running slowly for the past few days.
Steps to resolve:
1. Find the SQL ids from the past three days.
   - Found three different SQL ids, one for each of the past three days.
   - Script: @find_sql_awr (uses the dba_hist_* views)
2. Review the execution plans for the past three days.
   - Because bind variables are not used, the optimizer treated each day's statement as new SQL, yet it produced the same plan every time, which implies the statistics were stable over those three days. SQL Monitor showed a mismatch between estimated rows and actual rows returned at each step (this was apparent to me while the SQL was running).
   - Script: @dplan_awr (uses dbms_xplan.display_awr)
3. Determine the driving tables.
   - Identified three tables and queried dba_tables to find their owner.
4. Check statistics variability over the past 10 days.
   - Two of the tables showed no change in statistics over the past 10 days.
   - Script: @diff_table_stats (uses dbms_stats.diff_table_stats_in_history)
5. Gather fresh statistics on all three tables.
6. Check statistics variability again.
   - One indexed column on one table went from 2 distinct values to 70.
7. Generate a new execution plan, with force matching.
   - The new plan's calculated cost is less than half the original plan's. The real performance improvement is far larger, though, because the original cost was calculated from stale statistics and was reported as much, much lower than the actual cost.
   - Tool: SQL Tuning Advisor
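The statistics checks in steps 4–6 can be sketched directly with dbms_stats; the owner and table names below are hypothetical:

```sql
-- Step 4/6: compare current statistics with those from 10 days ago.
-- BI_OWNER and FACT_SALES are made-up names for illustration.
SELECT report, maxdiffpct
  FROM TABLE (DBMS_STATS.DIFF_TABLE_STATS_IN_HISTORY (
                 ownname => 'BI_OWNER',
                 tabname => 'FACT_SALES',
                 time1   => SYSTIMESTAMP,
                 time2   => SYSTIMESTAMP - INTERVAL '10' DAY));

-- Step 5: gather fresh statistics on the table.
EXEC DBMS_STATS.GATHER_TABLE_STATS (ownname => 'BI_OWNER', tabname => 'FACT_SALES');
```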
The underlying data changed over the past 10 days. Because bind variables are not used, the SQL was being re-parsed every day. The automatic statistics gathering job did not recalculate statistics on two of the tables because of its 10% staleness threshold: less than 10% of each table's rows changed, even though one column changed significantly. Once the optimizer saw the fresh statistics, it changed the indexes it uses. I added the force matching option so that the optimizer wouldn't need to re-parse every day just because bind variables are not used. Quote from the BI team: “It is now running in single digit minutes as opposed to hours.”
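One way to apply force matching here is to accept the SQL profile recommended by the SQL Tuning Advisor with force_match set to TRUE, so the profile also covers statements that differ only in their literal values. A sketch, assuming a tuning task named bi_tuning_task (the task name is made up):

```sql
DECLARE
   profile_name   VARCHAR2 (128);
BEGIN
   -- force_match => TRUE makes the profile match statements that
   -- differ only in literals, so each day's SQL reuses the plan.
   profile_name := DBMS_SQLTUNE.ACCEPT_SQL_PROFILE (
                      task_name   => 'bi_tuning_task',
                      force_match => TRUE);
   DBMS_OUTPUT.put_line ('accepted profile: ' || profile_name);
END;
/
```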
The cardinality presentation by Maria Colgan and the scripts presentation by Kerry Osborne provide more background on this method.
Here’s a good article about scaling data warehouses to infinity.