Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Posts Tagged ‘Execution Plan

Why do you need to gather CBO Statistics?

with 4 comments

As I help a peer with a SQL Tuning engagement, I face the frequent case of: “We do not want to gather CBO schema object statistics because we don’t want our Execution Plans to change”. Well, the bad news is that: not gathering stats only gives you a false sense of safety because your Execution Plans can change anyways. The reason has to do with Predicates referencing values out of range. Typical cases include range of dates, or columns seeded with values out of a sequence (surrogate keys). Most applications use them both. Example: predicate that references last X days of data. Imagine that date column on this predicate actually has statistics with low and high value that are outdated, lets say the high value refers to last time we gather stats (several months old). In such cases, the CBO uses some heuristics starting on 10g, where the cardinality of the Predicate is computed according to range of low/high and how far the value on Predicate is from this low/high range as per the stats. In short, the cardinality changes over time, as the Predicate on the last X days of data changes from one day to the next, and the next, and so on. At some point, the CBO may decide for a different Plan (with lower cost) and the performance of such SQL may indeed change abruptly. Then we scratch our heads and repeat to ourselves: but we did not gather statistics, why did the plan change?

So, if you understand the rationale above, then you would agree with the fact that: not updating CBO schema stats do not offer any real Plan Stability. So, my recommendation is simple: have reasonable CBO statistics and live with the possibility that some Plans will change (they would change anyways, even if you do not gather stats). Keep always in mind this:

The CBO has better chances to produce optimal Plans if we provide reasonable CBO statistics.

Now the good news: if you have some business critical SQL statements and you want them to have stable Plans, then Oracle already provides SQL Plan Management, which is designed exactly for Plan Stability. So, instead of gambling everyday, hoping for your Plans not to change preserving outdated stats, rather face reality, then gather stats, and create SQL Plan Baselines in those few SQL statements that may prove to have an otherwise unstable Plan and are indeed critical for your business. On 10g you can use SQL Profiles instead.

On 10g and 11g, just let the automatic job that gathers CBO schema statistics do its part. In most cases, that is good enough. If you have transient data, for example ETL tasks, then you may want to have the process workflow to gather stats on particular Tables as soon as the data is loaded or transformed and before it is consumed. The trick is simple: “have the stats represent the data at all times”. At the same time, there is no need to over do the stats, just care when the change on the data is sensible.

Written by Carlos Sierra

November 11, 2014 at 11:58 am

Finding SQL with Performance changing over time

with 11 comments

I upgraded my database a couple of weeks ago and now my users complain their application is slower. They do not provide specifics but they “feel” it is running slower. Sounds familiar?

Every once in a while I get a request that goes like this: “how can I find if some SQL on my database is performing worse over time?”

It is very hard to deal with the ambiguities of some problems like “finding SQL that performs worse or better over time”. But if you simplify the problem and consider for example “Elapsed Time per Execution”, then you can easily produce a script like the one below, which returns a small list of SQL statements that seem to experience either a regression or an improvement over time. It uses linear regression on the ratio between “Elapsed Time per Execution” and its Median per SQL.

Then, If you are suspecting you have some SQL that may have regressed and need a hand to identify them, you can try this script below. It is now part of a small collection of scripts that you can download and use for free out of the cscripts link on the right hand side of this page, under “Downloads”.

----------------------------------------------------------------------------------------
--
-- File name:   sql_performance_changed.sql
--
-- Purpose:     Lists SQL Statements with Elapsed Time per Execution changing over time
--
-- Author:      Carlos Sierra
--
-- Version:     2014/10/31
--
-- Usage:       Lists statements that have changed their elapsed time per execution over
--              some history.
--              Uses the ration between "elapsed time per execution" and the median of 
--              this metric for SQL statements within the sampled history, and using
--              linear regression identifies those that have changed the most. In other
--              words where the slope of the linear regression is larger. Positive slopes
--              are considered "improving" while negative are "regressing".
--
-- Example:     @sql_performance_changed.sql
--
-- Notes:       Developed and tested on 11.2.0.3.
--
--              Requires an Oracle Diagnostics Pack License since AWR data is accessed.
--
--              To further investigate poorly performing SQL use sqltxplain.sql or sqlhc 
--              (or planx.sql or sqlmon.sql or sqlash.sql).
--             
---------------------------------------------------------------------------------------
--
SPO sql_performance_changed.txt;
DEF days_of_history_accessed = '31';
DEF captured_at_least_x_times = '10';
DEF captured_at_least_x_days_apart = '5';
DEF med_elap_microsecs_threshold = '1e4';
DEF min_slope_threshold = '0.1';
DEF max_num_rows = '20';

SET lin 200 ver OFF;
COL row_n FOR A2 HEA '#';
COL med_secs_per_exec HEA 'Median Secs|Per Exec';
COL std_secs_per_exec HEA 'Std Dev Secs|Per Exec';
COL avg_secs_per_exec HEA 'Avg Secs|Per Exec';
COL min_secs_per_exec HEA 'Min Secs|Per Exec';
COL max_secs_per_exec HEA 'Max Secs|Per Exec';
COL plans FOR 9999;
COL sql_text_80 FOR A80;

PRO SQL Statements with "Elapsed Time per Execution" changing over time

WITH
per_time AS (
SELECT h.dbid,
       h.sql_id,
       SYSDATE - CAST(s.end_interval_time AS DATE) days_ago,
       SUM(h.elapsed_time_total) / SUM(h.executions_total) time_per_exec
  FROM dba_hist_sqlstat h, 
       dba_hist_snapshot s
 WHERE h.executions_total > 0 
   AND s.snap_id = h.snap_id
   AND s.dbid = h.dbid
   AND s.instance_number = h.instance_number
   AND CAST(s.end_interval_time AS DATE) > SYSDATE - &&days_of_history_accessed. 
 GROUP BY
       h.dbid,
       h.sql_id,
       SYSDATE - CAST(s.end_interval_time AS DATE)
),
avg_time AS (
SELECT dbid,
       sql_id, 
       MEDIAN(time_per_exec) med_time_per_exec,
       STDDEV(time_per_exec) std_time_per_exec,
       AVG(time_per_exec)    avg_time_per_exec,
       MIN(time_per_exec)    min_time_per_exec,
       MAX(time_per_exec)    max_time_per_exec       
  FROM per_time
 GROUP BY
       dbid,
       sql_id
HAVING COUNT(*) >= &&captured_at_least_x_times. 
   AND MAX(days_ago) - MIN(days_ago) >= &&captured_at_least_x_days_apart.
   AND MEDIAN(time_per_exec) > &&med_elap_microsecs_threshold.
),
time_over_median AS (
SELECT h.dbid,
       h.sql_id,
       h.days_ago,
       (h.time_per_exec / a.med_time_per_exec) time_per_exec_over_med,
       a.med_time_per_exec,
       a.std_time_per_exec,
       a.avg_time_per_exec,
       a.min_time_per_exec,
       a.max_time_per_exec
  FROM per_time h, avg_time a
 WHERE a.sql_id = h.sql_id
),
ranked AS (
SELECT RANK () OVER (ORDER BY ABS(REGR_SLOPE(t.time_per_exec_over_med, t.days_ago)) DESC) rank_num,
       t.dbid,
       t.sql_id,
       CASE WHEN REGR_SLOPE(t.time_per_exec_over_med, t.days_ago) > 0 THEN 'IMPROVING' ELSE 'REGRESSING' END change,
       ROUND(REGR_SLOPE(t.time_per_exec_over_med, t.days_ago), 3) slope,
       ROUND(AVG(t.med_time_per_exec)/1e6, 3) med_secs_per_exec,
       ROUND(AVG(t.std_time_per_exec)/1e6, 3) std_secs_per_exec,
       ROUND(AVG(t.avg_time_per_exec)/1e6, 3) avg_secs_per_exec,
       ROUND(MIN(t.min_time_per_exec)/1e6, 3) min_secs_per_exec,
       ROUND(MAX(t.max_time_per_exec)/1e6, 3) max_secs_per_exec
  FROM time_over_median t
 GROUP BY
       t.dbid,
       t.sql_id
HAVING ABS(REGR_SLOPE(t.time_per_exec_over_med, t.days_ago)) > &&min_slope_threshold.
)
SELECT LPAD(ROWNUM, 2) row_n,
       r.sql_id,
       r.change,
       TO_CHAR(r.slope, '990.000MI') slope,
       TO_CHAR(r.med_secs_per_exec, '999,990.000') med_secs_per_exec,
       TO_CHAR(r.std_secs_per_exec, '999,990.000') std_secs_per_exec,
       TO_CHAR(r.avg_secs_per_exec, '999,990.000') avg_secs_per_exec,
       TO_CHAR(r.min_secs_per_exec, '999,990.000') min_secs_per_exec,
       TO_CHAR(r.max_secs_per_exec, '999,990.000') max_secs_per_exec,
       (SELECT COUNT(DISTINCT p.plan_hash_value) FROM dba_hist_sql_plan p WHERE p.dbid = r.dbid AND p.sql_id = r.sql_id) plans,
       REPLACE((SELECT DBMS_LOB.SUBSTR(s.sql_text, 80) FROM dba_hist_sqltext s WHERE s.dbid = r.dbid AND s.sql_id = r.sql_id), CHR(10)) sql_text_80
  FROM ranked r
 WHERE r.rank_num <= &&max_num_rows.
 ORDER BY
       r.rank_num
/

SPO OFF;

Once you get the output of this script above, you can use the one below to actually list the time series for one of the SQL statements of interest:

----------------------------------------------------------------------------------------
--
-- File name:   one_sql_time_series.sql
--
-- Purpose:     Performance History for one SQL
--
-- Author:      Carlos Sierra
--
-- Version:     2014/10/31
--
-- Usage:       Script sql_performance_changed.sql lists SQL Statements with performance
--              improvement or regressed over some History.
--              This script one_sql_time_series.sql lists the Performance Time Series for
--              one SQL.
--
-- Parameters:  SQL_ID
--
-- Example:     @one_sql_time_series.sql
--
-- Notes:       Developed and tested on 11.2.0.3.
--
--              Requires an Oracle Diagnostics Pack License since AWR data is accessed.
--
--              To further investigate poorly performing SQL use sqltxplain.sql or sqlhc 
--              (or planx.sql or sqlmon.sql or sqlash.sql).
--             
---------------------------------------------------------------------------------------
--
SPO one_sql_time_series.txt;
SET lin 200 ver OFF;

COL instance_number FOR 9999 HEA 'Inst';
COL end_time HEA 'End Time';
COL plan_hash_value HEA 'Plan|Hash Value';
COL executions_total FOR 999,999 HEA 'Execs|Total';
COL rows_per_exec HEA 'Rows Per Exec';
COL et_secs_per_exec HEA 'Elap Secs|Per Exec';
COL cpu_secs_per_exec HEA 'CPU Secs|Per Exec';
COL io_secs_per_exec HEA 'IO Secs|Per Exec';
COL cl_secs_per_exec HEA 'Clus Secs|Per Exec';
COL ap_secs_per_exec HEA 'App Secs|Per Exec';
COL cc_secs_per_exec HEA 'Conc Secs|Per Exec';
COL pl_secs_per_exec HEA 'PLSQL Secs|Per Exec';
COL ja_secs_per_exec HEA 'Java Secs|Per Exec';

SELECT h.instance_number,
       TO_CHAR(CAST(s.end_interval_time AS DATE), 'YYYY-MM-DD HH24:MI') end_time,
       h.plan_hash_value, 
       h.executions_total,
       TO_CHAR(ROUND(h.rows_processed_total / h.executions_total), '999,999,999,999') rows_per_exec,
       TO_CHAR(ROUND(h.elapsed_time_total / h.executions_total / 1e6, 3), '999,990.000') et_secs_per_exec,
       TO_CHAR(ROUND(h.cpu_time_total / h.executions_total / 1e6, 3), '999,990.000') cpu_secs_per_exec,
       TO_CHAR(ROUND(h.iowait_total / h.executions_total / 1e6, 3), '999,990.000') io_secs_per_exec,
       TO_CHAR(ROUND(h.clwait_total / h.executions_total / 1e6, 3), '999,990.000') cl_secs_per_exec,
       TO_CHAR(ROUND(h.apwait_total / h.executions_total / 1e6, 3), '999,990.000') ap_secs_per_exec,
       TO_CHAR(ROUND(h.ccwait_total / h.executions_total / 1e6, 3), '999,990.000') cc_secs_per_exec,
       TO_CHAR(ROUND(h.plsexec_time_total / h.executions_total / 1e6, 3), '999,990.000') pl_secs_per_exec,
       TO_CHAR(ROUND(h.javexec_time_total / h.executions_total / 1e6, 3), '999,990.000') ja_secs_per_exec
  FROM dba_hist_sqlstat h, 
       dba_hist_snapshot s
 WHERE h.sql_id = '&sql_id.'
   AND h.executions_total > 0 
   AND s.snap_id = h.snap_id
   AND s.dbid = h.dbid
   AND s.instance_number = h.instance_number
 ORDER BY
       h.sql_id,
       h.instance_number,
       s.end_interval_time,
       h.plan_hash_value
/

SPO OFF;

Written by Carlos Sierra

November 2, 2014 at 4:22 am

Oracle Queries are taking different Plans

with 4 comments

I recently saw a post with this title “Oracle Queries are taking different Plans” and since there was a lot of guessing I suggested to use SQLTXPLAIN and find there the root cause. For some reason or another the thread is still open after 20 days, and the guessing continues. Since the problem of “finding the root cause of a change on an execution plan” is old and still common, I decided to talk about it here.

In Oracle, execution plans do change for one SQL over time, that is a fundamental function of the Cost-based Optimizer (CBO). Not only is normal, but expected and desired in most circumstances. What is not desired, is when a SQL that had a good performing plan suddenly changes its plan to one that we consider sub-optimal in terms of performance. This should be the concern and focus. So WHY the CBO suddenly generates such a plan? The possible reasons are many but I enlist here the most common ones I can think of:

  • Schema object statistics changes
  • Small sample sizes when gathering CBO schema statistics (mostly on 10g)
  • Lack of statistics in some columns referenced by predicates
  • Inconsistent values of bind variables referenced by SQL
  • System wide changes like CBO parameters or system statistics
  • Index modifications including new and dropped indexes
  • Invalid or invisible indexes
  • Large SQL text with many predicates where several plans with similar cost are possible
  • Inconsistent ramp-up process of Adaptive Cursor Sharing (ACS)
  • SQL Profiles generated by the SQL Tuning Advisor (STA)
  • An index changing its blevel (for example from blevel 2 to 3)
  • Volatile values in some columns (for example value “Pending” exists in table column for a short time)
  • Asynchronous gathering of stats on volatile data instead of gathering right between load and consumption

The list above is not comprehensive. It only includes what came to my mind while writing it, so I may have forgotten several cases. And of course there are always some corner cases like special conditions activating uncommon heuristics, or simply hitting a CBO bug.

Guessing WHY a plan has changed is very time consuming and requires multiple queries to the metadata on the database. Since most of the metadata we are interested on is dynamic by nature, we base our guess work on an ever changing environment. Of course the root cause become elusive and we may give up for some time, or jump to false conclusions.

This is where SQLTXPLAIN (MOS 215187.1) or its younger and lighter sister SQLHC (MOS 1366133.1) come to the rescue. These free tools, specially SQLTXPLAIN (SQLT), capture the static and dynamic environment around the one SQL we want to analyze.

By having a consistent view of all elements affecting the execution plan, it becomes much easier to find the root cause of a plan change. And since SQLT also includes current and past details about parameters, statistics and bind variable values, we can even correlate those values to prior execution plans. So I encourage you to try SQLTXPLAIN on a SQL next time you are confronted by the old question: “WHY Oracle Queries are taking different Plans?”.

I am aware SQLTXPLAIN is intimidating. Its output is exuberant, but the point is that it includes what we need to find the root cause of most if not all SQL Tuning cases where a plan has changed.

I have presented this SQLT tool in several conferences during the past year or so. I think it is appropriate now to put together a 3 days workshop to master this tool… Just an idea at this time.