Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Archive for the ‘Health-Checks’ Category

Non-intrusive SQL Trace instrumentation on legacy PL/SQL code

leave a comment »

Problem

Legacy PL/SQL code with intermittent performance degradation.

To improve the performance of this code, the first step is to diagnose it. But this code has no instrumentation whatsoever, it is in Production, and rolling any code into Production usually requires rigorous testing. So, whatever method we implement has to be light-weight and as safe as possible.

Using Oradebug is not a viable solution mainly for two reasons: It misses the “head” of the transaction, so we may not get to know the SQL taking longer; and second, internal procedures from finding about the issue, reporting it, then acting on it may take from several minutes to hours.

Solution

  1. Identify which PL/SQL libraries are known to be problematic in terms of intermittent performance degradation.
  2. At the beginning  of each callable PL/SQL Procedure or Function, add a call to DBMS_APPLICATION_INFO.SET_MODULE to set some appropriate MODULE and ACTION, for example “R252, LOAD”. Call this API also at the end, to NULL out these two parameters. This code change is very small and safe. It introduces practically no overhead. It simply labels every SQL executed by the PL/SQL library with some MODULE and ACTION that uniquely identify the code of concern.
  3. Activate SQL Trace on the module/action that needs to be traced, by calling DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE, passing parameters SERVICE, MODULE and ACTION. With this API request a SQL Trace to be generated with WAITs and BINDs (binds are optional but desirable). Once these traces are no longer needed (reviewed by someone), turn SQL Trace off using API DBMS_MONITOR.SERV_MOD_ACT_TRACE_DISABLE.
  4. Once the SQL Trace is produced, generate a TKPROF on it. You may want to include parameter “sort=exeela fchela”. This way you get the slower SQL at the top of the TKPROF report.
  5. With SQL Trace and TKPROF, identify the slower SQL and use SQL Monitor and/or SQL XTRACT to get more granular diagnostics (you need to identify SQL_ID). On any given PL/SQL library, it is common that 1~5 SQL statements consume > 80% of the Elapsed Time. Focus on these large consumers.

Conclusion

Producing a SQL Trace with EVENT 10046 level 8 or 12 is very useful to properly diagnose the code on a PL/SQL library which performs poorly. A follow-up on the slower SQL with SQL Monitor and/or SQLT XTRACT is in order. The method presented above is very easy and safe to implement.

Advertisements

Written by Carlos Sierra

December 23, 2013 at 12:23 pm

Counting rows fast

with one comment

A friend of mine asked me last night basically this: “How is that SQLTXPLAIN counts rows?”. In particular, he was referring to the use of the SAMPLE clause of the SELECT statement. Look at this SQLT’s log piece:

SQL_ID a9x1kc4ymyhkz
--------------------
SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e4
FROM "XYPZ"."INSTRUMENT" SAMPLE (.01) t

SQL_ID 025v6k1032t69
--------------------
SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e5
FROM "XYPZ"."POSITION_COMPOSITION" SAMPLE (.001) t

SQL_ID 8rby3340xpd9k
--------------------
SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e5
FROM "XYPZ"."POSITION_EVENT" SAMPLE (.001) t

WHY is it that SQLT has to count rows?

SQLT has to count rows so it can report side by side DBA_TABLES.NUM_ROWS and COUNT(*) from each Table. This is an easy way to see if your statistics are way off, and this mechanism exists on SQLT well before DBA_TAB_MODIFICATIONS came to existence. Actually, SQLT uses both methods to health-check how stale are your Table statistics.

The conundrum here is: “I use SQLT because I want to diagnose a performance issue on a QUERY on top of large Tables, but I do not want SQLT to take a long time just to produce a COUNT(*) of my Tables…”.

Fast versus Precise

In Performance tuning, there is always a trade-off. You want X but you sacrifice B. Counting rows is no different. Do you want it faster? Then you sacrifice precision. The SAMPLE clause of the SELECT statement allows you to do exactly that (syntax below):

SAMPLE [ BLOCK ] ( sample_percent ) [ SEED ( seed_value ) ]

So, if you specify a 10% sample size then you have to multiply the COUNT(*) by 10. If you sample 1% you multiply the COUNT(*) by 100. In large Tables if you sample, lets say 0.1%, your multiplier becomes 1,000, which is the same than 1e3 (10**3 or 10^3 depending where you went to school). Sample size can be as small as 0.000,001 and as large as 100 (but without including 100 itself). It represents probabilities more than an actual sample size.

The optional BLOCK clause simple says: use sample blocks instead of rows. And the optional SEED clause tries to provide some consistency in the result of the count when you use the same value for two executions of the exact same count. This SEED clause takes a value between 0 and 4,294,967,295.

How SQLT counts rows?

SQLT has over 40 tool parameters. One of them is count_star_threshold with a seeded value of 10,000.

SQLT includes a small algorithm (below) that determines the size of the SAMPLE according to the estimated size of the Table itself, by looking at its statistics as per DBA_TABLES.NUM_ROWS. No statistics? then skip the sample and do a normal full scan. If the Table is expected to be smaller then the count_star_threshold, then do a full scan. So is up to 10x this threshold. After that, use a sample size proportionally inverse to the Table size. The bigger the Table the smaller the Sample.

SQLT also forces a full Table scan and invokes Parallel Execution (PX) as a method to expedite the count. This count can be really fast on Exadata systems as you can imagine.

 /* -------------------------
 *
 * private perform_count_star
 *
 * called by: sqlt$i.common_calls and sqlt$i.remote_xtract
 *
 * ------------------------- */
 PROCEDURE perform_count_star (p_statement_id IN NUMBER)
 IS
 l_sql VARCHAR2(32767);
 l_number NUMBER;
 l_count NUMBER;
 BEGIN
 write_log('=> perform_count_star');

IF sqlt$a.get_param_n('count_star_threshold') = 0 THEN
 write_log('skip "count_star" as per corresponding parameter');
 ELSE
 FOR i IN (SELECT owner, table_name, num_rows, source
 FROM &&tool_administer_schema..sqlt$_dba_all_tables_v
 WHERE statement_id = p_statement_id
 ORDER BY
 owner, table_name)
 LOOP
 IF i.num_rows IS NULL THEN
 l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*)
FROM "'||i.owner||'"."'||i.table_name||'" t WHERE ROWNUM <= :number';
 l_number := sqlt$a.get_param_n('count_star_threshold');
 ELSIF i.num_rows < sqlt$a.get_param_n('count_star_threshold') THEN
 l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*)
FROM "'||i.owner||'"."'||i.table_name||'" t WHERE ROWNUM <= :number';
 l_number := sqlt$a.get_param_n('count_star_threshold') * 10;
 ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e1) THEN
 l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e1
FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t';
 l_number := 1e1;
 ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e2) THEN
 l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e2
FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t';
 l_number := 1e0;
 ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e3) THEN
 l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e3
FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t';
 l_number := 1/1e1;
 ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e4) THEN
 l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e4
FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t';
 l_number := 1/1e2;
 ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e5) THEN
 l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e5
FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t';
 l_number := 1/1e3;
 ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e6) THEN
 l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e6
FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t';
 l_number := 1/1e4;
 ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e7) THEN
 l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e7
FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t';
 l_number := 1/1e5;
 ELSE
 l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e8
FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t';
 l_number := 1/1e6;
 END IF;

l_sql := REPLACE(l_sql, ':number', l_number);
 write_log('num_rows='||i.num_rows||' sql='||l_sql);
 l_count := NULL;

BEGIN
 EXECUTE IMMEDIATE l_sql INTO l_count;
 write_log(l_count||' rows counted');
 EXCEPTION
 WHEN OTHERS THEN
 write_log('** '||SQLERRM);
 write_log(l_sql||' failed with error above. Process continues.');
 END;

IF l_count IS NOT NULL THEN
 IF i.source = 'DBA_TABLES' THEN
 UPDATE &&tool_repository_schema..sqlt$_dba_tables
 SET count_star = l_count
 WHERE statement_id = p_statement_id
 AND owner = i.owner
 AND table_name = i.table_name;
 ELSIF i.source = 'DBA_OBJECT_TABLES' THEN
 UPDATE &&tool_repository_schema..sqlt$_dba_object_tables
 SET count_star = l_count
 WHERE statement_id = p_statement_id
 AND owner = i.owner
 AND table_name = i.table_name;
 END IF;
 END IF;
 END LOOP;

COMMIT;
 END IF;

write_log('<= perform_count_star');
 END perform_count_star;

Conclusion

Counting rows is like counting beans, you can count one at a time, or you can take some shortcuts. If you are willing to sacrifice some precision for the sake of gaining performance, consider then using the SAMPLE clause of the SELECT statement.

A healthy way to do an Oracle database health-check

with 12 comments

Q: How do I do an Oracle database health-check?

A: It all depends. (hint: we can use this answer for most Oracle related questions in general and Performance related in particular, but don’t try it at home).

This seems like a quite broad question and yes it is. And of course there are many ways to proceed with a database health-check. So at this post I ‘d rather talk about: what I think is a healthy way to approach an Oracle database health-check.

  1. Start with the basics: Listen to the users of this database. If nobody complains then most probably you would have to define the scope by yourself. In any case, go on.
  2. Gather environment information. This includes the understanding of the architecture used, the databases on such architecture and the applications on those databases. Also learn who is who: Users, DBAs and Developers.
  3. Gather metrics. I am referring to OS metrics (CPU, IO and Memory), and also database metrics (AWR) together with alert logs. When gathering these metrics focus on time periods where the pain has been reported, and slice the time as small as possible (i.e. AWR reports for each time slice captured, avoiding the 6-24 hours time frame common mistake).
  4. Let the combination of complains (concerns) and findings on metrics to guide you to the next step. This is where most get lost. So don’t panic and dive in into what you see as contention on your metrics. Keep in mind that the most important metric of all is “user response time”, so anything affecting it must be in your priority list.
  5. There are many more things to check, but they are more in the configuration and sound practices arena. For example: redundancy on control files, archive mode, backups, non-default parameters, PX setup, memory setup, etc. For these, creating a check list would help.
  6. At some point you will have many leads and you will start to lose focus. Do some yoga or go for a walk, then make an A, B, C list with what is really important, what is kind-of and what is mere style.
  7. You are not an expert on every aspect of the database (nobody is). So, do not pretend you can do everything yourself. Rely on your peers and/or contacts. Reach out for help in those areas where you feel insecure (feeling insecure is a good thing, much better than feeling secure without any solid foundation).
  8. Once you think you have it nailed, go to your peers, colleagues, boss(es), friends, partner, or strangers if necessarily, and solicit a review of your findings and recommendations. Accept feedback. This is key. Maybe what you thought was sound it makes absolutely no sense to someone with more experience or simply with a different view.
  9. Reconsider everything. Avoid the pitfall of assuming that what you have learn in your two-digits years of experience can be applied to every case. For example, if you have done mostly SQL Tuning, don’t expect every issue to be SQL Tuning. Health-checks are like fortune cookies, you never know what you will get.
  10. Last but not least: Learn from your new experience, practice listening to others, use your common sense, exercise your knowledge, and work as a team member. Add the word “collaboration” to your daily work and maybe one day you will learn you are not alone.

Cheers — Carlos

Written by Carlos Sierra

November 1, 2013 at 7:27 am

SQL Tuning with SQLTXPLAIN 2-days Workshop

with 6 comments

SQLTXPLAIN is a SQL Tuning tool widely used by the Oracle community. Available through My Oracle Support (MOS) under document 215187.1, this free tool is available for download and use to anyone with MOS access. It has helped thousands of times to expedite the resolution of SQL Tuning issues, and many Oracle DBAs and Developers benefit of its use on a daily basis.

Stelios Charalambides has done an excellent job writing a book on this topic. In his book Stelios covers many aspects about SQLTXPLAIN and some related topics. I highly recommend to get a copy of this book if you want to learn more about SQLTXPLAIN. It is available at Amazon and many other retailers.

The new 2-days SQLTXPLAIN Workshop offered by Enkitec (an Oracle Platinum business partner and my employer) is a completely new course that interleaves “how to use effectively SQLTXPLAIN” with important and related SQL Tuning Topics such as Plan Flexibility and Plan Stability. This hands-on workshop offers participants the unique opportunity to fully understand the contents of SQLTXPLAIN and its vast output through an interactive session. About half the time is dedicated to short guided labs, while the other half uses presentations and demos. This workshop is packed with lots of content. It was a real challenge packaging so much info in only two days, but I am very pleased with the result. It became a 2-days intensive knowledge transfer hands-on workshop on SQLTXPLAIN and SQL Tuning!

The first session of this workshop is scheduled for November 7-8 in Dallas, Texas. I expect this pilot session to fill out fast. Other sessions and onsite ones will be offered during 2014. I hope to meet many of you face to face on November 7!

Finding the Predicates that are different on the same Execution Plan

with 2 comments

Ok, this may not be common but when it happens it becomes very hard to spot. Couple of cases:

  1. You have a SQL with multiple child cursors. They all could have the same Plan Hash Value (PHV) or maybe more than one. Performance wise even the same PHV can exhibit differences for multiple reason, and one of them is the possibility of having the same PHV but not exactly the same filter or access Predicates. If you have two or more predicates on the same Plan Operation, the difference could be the order of these Predicates, and if their execution performance is quite different, one version could perform better than the other.
  2. You have two or more versions of a core query, where some of these versions have slightly different Predicates (typical case where the SQL is dynamically assembled by an application). This family of similar queries would have different SQL_IDs but they may or may not produce the same PHV. When they do, it is hard to find in which Plan Operations the different Predicates are used.

I have seen many cases where we have to scratch our heads thinking: “Why do I have such a different performance when it is clear we have the same Execution Plan?”. If you are having one of those issues, you may want to use one of the two scripts below. The first inputs a SQL_ID and it outputs a list of those Predicates for each PHV where there is more than one version for any Plan Operation ID. The second inputs a PHV and performs the same analysis regardless if the PHV is associated to one or many SQL_IDs. Both are RAC aware. Both are stand-alone and install nothing.

Script to find differences in access/filter Predicates for a given SQL_ID

SPO difpred1.txt;
SET HEA ON LIN 80 NEWP 1 PAGES 100 LIN 300 TRIMS ON TI OFF TIMI OFF;

-- shows which plan lines have different predicates for given sql_id
WITH d AS (
SELECT sql_id,
 plan_hash_value,
 id,
 COUNT(DISTINCT access_predicates) distinct_access_predicates,
 COUNT(DISTINCT filter_predicates) distinct_filter_predicates
 FROM gv$sql_plan_statistics_all
 WHERE sql_id = '&&sql_id.'
 GROUP BY
 sql_id,
 plan_hash_value,
 id
HAVING MIN(NVL(access_predicates, 'X')) != MAX(NVL(access_predicates, 'X'))
 OR MIN(NVL(filter_predicates, 'X')) != MAX(NVL(filter_predicates, 'X'))
)
SELECT v.plan_hash_value,
 v.id,
 'access' type,
 v.inst_id,
 v.child_number,
 v.access_predicates predicates
 FROM d,
 gv$sql_plan_statistics_all v
 WHERE v.sql_id = d.sql_id
 AND v.plan_hash_value = d.plan_hash_value
 AND v.id = d.id
 AND d.distinct_access_predicates > 1
 UNION ALL
SELECT v.plan_hash_value,
 v.id,
 'filter' type,
 v.inst_id,
 v.child_number,
 v.filter_predicates predicates
 FROM d,
 gv$sql_plan_statistics_all v
 WHERE v.sql_id = d.sql_id
 AND v.plan_hash_value = d.plan_hash_value
 AND v.id = d.id
 AND d.distinct_filter_predicates > 1
 ORDER BY
 1, 2, 3, 6, 4, 5;

SET HEA ON LIN 80 NEWP 1 PAGES 14 LIN 80 TRIMS OFF TI OFF TIMI OFF;
SPO OFF;

Output Sample:

old 8: WHERE sql_id = '&&sql_id.'
new 8: WHERE sql_id = 'cy9pxhp4y6u78'

PLAN_HASH_VALUE ID TYPE INST_ID CHILD_NUMBER
--------------- ---------- ------ ---------- ------------
PREDICATES
--------------------------------------------------------------------------------
 3724264953 2 filter 1 0
("C2"='WHATEVER' AND TRANSLATE(TO_CHAR(SIN("N4")),'1','a')='a')

3724264953 2 filter 1 1
(TRANSLATE(TO_CHAR(SIN("N4")),'1','a')='a' AND "C2"='WHATEVER')

Script to find differences in access/filter Predicates for a given PHV

SPO difpred2.txt;
SET HEA ON LIN 80 NEWP 1 PAGES 100 LIN 300 TRIMS ON TI OFF TIMI OFF;

-- shows which plan lines have different predicates for given phv
WITH d AS (
SELECT plan_hash_value,
 id,
 COUNT(DISTINCT access_predicates) distinct_access_predicates,
 COUNT(DISTINCT filter_predicates) distinct_filter_predicates
 FROM gv$sql_plan_statistics_all
 WHERE plan_hash_value = &&plan_hash_value.
 GROUP BY
 plan_hash_value,
 id
HAVING MIN(NVL(access_predicates, 'X')) != MAX(NVL(access_predicates, 'X'))
 OR MIN(NVL(filter_predicates, 'X')) != MAX(NVL(filter_predicates, 'X'))
)
SELECT v.id,
 'access' type,
 v.sql_id,
 v.inst_id,
 v.child_number,
 v.access_predicates predicates
 FROM d,
 gv$sql_plan_statistics_all v
 WHERE v.plan_hash_value = d.plan_hash_value
 AND v.id = d.id
 AND d.distinct_access_predicates > 1
 UNION ALL
SELECT v.id,
 'filter' type,
 v.sql_id,
 v.inst_id,
 v.child_number,
 v.filter_predicates predicates
 FROM d,
 gv$sql_plan_statistics_all v
 WHERE v.plan_hash_value = d.plan_hash_value
 AND v.id = d.id
 AND d.distinct_filter_predicates > 1
 ORDER BY
 1, 2, 6, 3, 4, 5;

SET HEA ON LIN 80 NEWP 1 PAGES 14 LIN 80 TRIMS OFF TI OFF TIMI OFF;
SPO OFF;

Output Sample:

Enter value for plan_hash_value: 2339135578
old 7: WHERE plan_hash_value = &&plan_hash_value.
new 7: WHERE plan_hash_value = 2339135578

ID TYPE SQL_ID INST_ID CHILD_NUMBER
---------- ------ ------------- ---------- ------------
PREDICATES
--------------------------------------------------------------------------------
 5 filter 72jhgzs8nb1p4 1 0
MAX("E2"."HIREDATE")=:B1

5 filter 8a4x4867rajuv 1 0
(MAX("E2"."HIREDATE")=:B1 OR :B2 IS NULL)

5 filter 8a4x4867rajuv 1 1
(MAX("E2"."HIREDATE")=:B1 OR :B2 IS NULL)

Written by Carlos Sierra

June 18, 2013 at 6:10 am

Oracle Queries are taking different Plans

with 4 comments

I recently saw a post with this title “Oracle Queries are taking different Plans” and since there was a lot of guessing I suggested to use SQLTXPLAIN and find there the root cause. For some reason or another the thread is still open after 20 days, and the guessing continues. Since the problem of “finding the root cause of a change on an execution plan” is old and still common, I decided to talk about it here.

In Oracle, execution plans do change for one SQL over time, that is a fundamental function of the Cost-based Optimizer (CBO). Not only is normal, but expected and desired in most circumstances. What is not desired, is when a SQL that had a good performing plan suddenly changes its plan to one that we consider sub-optimal in terms of performance. This should be the concern and focus. So WHY the CBO suddenly generates such a plan? The possible reasons are many but I enlist here the most common ones I can think of:

  • Schema object statistics changes
  • Small sample sizes when gathering CBO schema statistics (mostly on 10g)
  • Lack of statistics in some columns referenced by predicates
  • Inconsistent values of bind variables referenced by SQL
  • System wide changes like CBO parameters or system statistics
  • Index modifications including new and dropped indexes
  • Invalid or invisible indexes
  • Large SQL text with many predicates where several plans with similar cost are possible
  • Inconsistent ramp-up process of Adaptive Cursor Sharing (ACS)
  • SQL Profiles generated by the SQL Tuning Advisor (STA)
  • An index changing its blevel (for example from blevel 2 to 3)
  • Volatile values in some columns (for example value “Pending” exists in table column for a short time)
  • Asynchronous gathering of stats on volatile data instead of gathering right between load and consumption

The list above is not comprehensive. It only includes what came to my mind while writing it, so I may have forgotten several cases. And of course there are always some corner cases like special conditions activating uncommon heuristics, or simply hitting a CBO bug.

Guessing WHY a plan has changed is very time consuming and requires multiple queries to the metadata on the database. Since most of the metadata we are interested on is dynamic by nature, we base our guess work on an ever changing environment. Of course the root cause become elusive and we may give up for some time, or jump to false conclusions.

This is where SQLTXPLAIN (MOS 215187.1) or its younger and lighter sister SQLHC (MOS 1366133.1) come to the rescue. These free tools, specially SQLTXPLAIN (SQLT), capture the static and dynamic environment around the one SQL we want to analyze.

By having a consistent view of all elements affecting the execution plan, it becomes much easier to find the root cause of a plan change. And since SQLT also includes current and past details about parameters, statistics and bind variable values, we can even correlate those values to prior execution plans. So I encourage you to try SQLTXPLAIN on a SQL next time you are confronted by the old question: “WHY Oracle Queries are taking different Plans?”.

I am aware SQLTXPLAIN is intimidating. Its output is exuberant, but the point is that it includes what we need to find the root cause of most if not all SQL Tuning cases where a plan has changed.

I have presented this SQLT tool in several conferences during the past year or so. I think it is appropriate now to put together a 3 days workshop to master this tool… Just an idea at this time.

Understanding SQLTXPLAIN Health-Checks

with 2 comments

SQLTXPLAIN (SQLT) collects diagnostics details about a SQL statement that either performs poorly or it produces unexpected results (also known as wrong results). In addition to that, it also performs over a hundred health-checks around the SQL statement being analyzed. The results of these so called health-checks are displayed on the main HTML report under the “Observations” section. Within each observation there is description of the meaning and in some cases a pointer to known bugs or notes. Since there are so many health-checks, it happens that every once in a while we want to dig further into WHY we get this “observation”. If that is your case and you understand SQL and PL/SQL, I invite you to “hack” into the SQLT source code and see how this health-check is triggered.

Finding WHY SQLT triggered a particular Health-Check

Assume you get this “observation” in your main HTML report “Table contains 2 column(s) referenced in predicates where the number of distinct values does not match the number of buckets.“.

Open sqlt/install/sqcpkgh.pkb (this h is for health-checks) and search for a portion of this text “number of distinct values does not match the number of buckets”. You will find a piece of code that looks like this:

 -- 10174050 frequency histograms with less buckets than ndv
 BEGIN
 SELECT COUNT(*)
 INTO l_count
 FROM sqlt$_dba_all_table_cols_v
 WHERE statement_id = p_tab_rec.statement_id
 AND owner = p_tab_rec.owner
 AND table_name = p_tab_rec.table_name
 AND histogram = 'FREQUENCY'
 AND num_distinct <> num_buckets
 AND in_predicates = 'TRUE';

IF l_count > 0 THEN
 s_obs_rec.type_id := E_TABLE;
 s_obs_rec.object_type := 'TABLE';
 s_obs_rec.object_name := p_tab_rec.owner||'.'||p_tab_rec.table_name;
 s_obs_rec.observation := 'Table contains '||l_count||' column(s) referenced in predicates where the number of distinct values does not match the number of buckets.';
 s_obs_rec.more := 'Review <a href="#tab_cols_cbo_'||p_tab_rec.object_id||'">column statistics</a> for this table and look for "Num Distinct" and "Num Buckets". If there are values missing from the frequency histogram you may have Bug 10174050. If you are referencing in your predicates one of the missing values the CBO can over estimate table cardinality, and this may produce a sub-optimal plan. As a workaround: alter system/session "_fix_control"=''5483301:off'';';
 ins_obs;
 END IF;
 END;

In this case the health-check derives from view sqlt$_dba_all_table_cols_v. You can find all view definitions inside file sqlt/install/sqcvw.sql. This file shows that view sqlt$_dba_all_table_cols_v selects from tables sqlt$_dba_tab_cols and sqlt$_dba_nested_table_cols.

There is a predicate on the health-check that reads “num_distinct <> num_buckets“. So this condition is what triggered this particular health-check.

In some cases, the column driving the health-check is an addition to the base DBA or GV$ views, so in such cases you may have to search for that column in one of these two packages: sqlt/install/sqcpkgd.pkb or sqlt/install/sqcpkgt.pkb. Where the “d” stands for Diagnostics data collection and the “t” for Transformation of diagnostics data.

Conclusion

When you get a SQLT health-check that is of your concern and you need to find out WHY it was raised, you may need to look into SQLT source code. Always start on the “h” module and continue either in view definitions or in the two packages that seed special columns. Those two packages “d” and “t” are the ones responsible for data collection and transformations.

SQLT source code is transparent. It is written in an easy to follow style. Feel free to reverse engineer any part of it. No mysteries, no hidden pieces, no black magic…

Written by Carlos Sierra

April 19, 2013 at 10:43 am