With permission of the Northern California Oracle Users Group (NoCOUG) I am reproducing a warm interview on SQLTXPLAIN and SQLd360. During this interview Mauro Pagano and myself talk about the history behind these two free tools and how the former has evolved into the latter. You can find the full transcript of the interview here: YesSQL(T). If you want to read the entire free online NoCOUG Journal, you will discover other cool articles.
Anyways, I am glad Iggy Fernandez invited us to participate first on this interview, and second to collaborate on the meeting planned for January. On that meeting Mauro and I will conduct a one full day workshop on “Practical SQL Tuning” (January 28) in Northern California. We hope to see many of you guys there, and please bring questions and case studies.
Sometimes you do what you have to do. So here I confess doing something I usually avoid: forcing an Execution Plan (which is not the same as using a more conventional method for Plan stability).
This is a case on 22.214.171.124.0 base release where the application vendor sets the optimizer to 9i, and tweaks other CBO parameters in questionable ways, then some queries produce suboptimal plans (as expected); and you are called to help without changing the obvious.
There is a family of queries from an ad-hoc query generator that permits users to issue queries without a set of selective predicates. These queries join several large tables and their performance is poor (as expected as well!). On top of the previous, all these queries include the /*+ FIRST_ROWS */ CBO Hint and the questionable DISTINCT keyword. Note: it is quite common for developers to throw a DISTINCT keyword “to avoid duplicates” where the mere existence of duplicates would be an indication of an application bug; so “why fix it if I can hide it, right?”.
There is one caveat although: these queries include a generic predicate “rownum <= :b1”, and value passed defaults to 5000, so users rationale is “if I only want the first X rows my query should return fast”. This highlights still another questionable practice since it is hard to imagine a user scrolling 5000 rows and making any sense of such large set, especially when the full “filtered” set would be several million rows long. So the original problem is questionable in several ways. Nevertheless, sometimes we are called to help besides providing advice. And no, we are not allowed to slap hands 😉
The good news is that we can use this extra predicate on rownum and make these queries to return the first X rows really fast; and I mean less than 5 seconds instead of over one hour or more! And if users want not 5000 but 500 or even 50 rows, then we can be in the sub-second range!
You may be thinking FIRST_ROWS optimization, and that was my first try. Unfortunately, on 126.96.36.199.0, even reversing all the suboptimal CBO parameters at the session level, I would consistently get an Execution Plan with a few Hash Joins and a large Cost; and if I were to force a Nested Loop Plan, the cost would be several orders of magnitude larger so the CBO would not pick it! Nevertheless, such a “Nest Loop only” Execution Plan would fulfill the user’s expectations, regardless the validity of the initial request. And yes, CBO statistics are OK, not perfect but simply OK. One more piece of info: this is not Exadata! (if it were Exadata most probably these same Execution Plans with full table scans and Hash Joins would simply fly!).
So, my issue became: How do I force an Execution Plan that only contains Nested Loops? If I could do that, then the COUNT STOP operation could help me to halt my SQL execution once I fetched the first X rows (Hash Join does not allow me do that). Remember: these tables have literally millions of rows. I could pepper these queries with a ton of CBO Hints and I would get my desired “Nested Loop only” Execution Plan… But that would be a lot of work and tricky at best.
SQL Patch to the rescue
I could had used a SQL Profile, but I think this dirty trick of suppressing Hash Joins and Sort Merge Joins, would be better served with a SQL Patch. I also thought Siebel: They do tweak CBO parameters as well, and they suppress Hash Joins, but they change System and Session level parameters… Since I wanted my change to be very localized, SQL Patch could provide me just what I needed.
Under the Downloads section on the margin of this page, there is a “cscripts” link that includes the sqlpch.sql script. I used this script and passed as the second parameter the following string (1st parameter is SQL_ID). With a SQL Patch generated this way, I could systematically produce a “Nested-Loops only” Execution Plan for these few queries. I did not have to change the original SQL, nor change the CBO environment at the System or Session level, neither restrict the query generator, and I did not had to “educate” the users to avoid such unbounded queries.
OPT_PARAM("_optimizer_sortmerge_join_enabled" "FALSE") OPT_PARAM("_hash_join_enabled" "FALSE")
I have to concede doing something questionable, in this case using a SQL Patch to force a desired Execution Plan instead of fixing the obvious, simply because that was the shortest path to alleviate the user’s pain.
I consider this technique above a temporary work-around and not a solution to the actual issue. In this case the right way to handle this issue would be:
- Have the application vendor certify their application to the latest release of the database and reset all CBO related parameters, plus
- Have the application vendor remove CBO Hints and DISTINCT keyword from queries, plus
- Configure the ad-hoc query generator to restrict users from executing queries without selective predicates, then
- Tune those outlier queries that may still need some work to perform as per business requirements, and possibly
- Educate the users to provide as many selective predicates as possible
Anyways, the potential of using a SQL Patch to tweak an Execution Plan in mysterious ways is quite powerful, and something we may want to keep in the back of our minds for a rainy day…
In most cases edb360 takes less than 1hr to execute. But I often hear of cases where it takes a lot longer than that. In a corner case it was taking several days and it had to be killed.
So the question is WHY edb360 takes that long?
Well, edb360 executes thousands of SQL statements sequentially (intentionally). Many of these queries read data from AWR and in particular from ASH. So, lets say your ASH historical table has 2B rows, and on top of that you have not gathered statistics on AWR tables in years, thus CBO under-estimates cardinality and tends to use index access and nested loops. In such extreme cases you may end up with suboptimal execution plans that expect to return a few rows, but actually read a couple of billion rows using index access operations and nested loops. A query like this may take hours to complete!
As of version v1515, edb360 has a shortcut algorithm that ends an execution after 8 hours. So you may get an incomplete output, but it ends normally and the partial output can actually be used. This is not a solution but a workaround for those long executions.
How to troubleshoot edb360 taking long?
1. Review files 00002_edb360_dbname_log.txt, 00003_edb360_dbname_log2.txt, 00004_edb360_dbname_log3.txt and 00005_edb360_dbname_tkprof_sort.txt. First log shows the state of the statistics for AWR Tables. If stats are old then gather them fresh with script edb360/sql/gather_stats_wr_sys.sql
2. If number of rows on WRH$_ACTIVE_SESSION_HISTORY as per 00002_edb360_dbname_log.txt is several millions, then you may not be purging data periodically. There are some known bugs and some blog posts on this regard. Review MOS 387914.1 and proceed accordingly. Execute query below to validate ASH age:
SELECT TRUNC(sample_time, 'MM'), COUNT(*) FROM dba_hist_active_sess_history GROUP BY TRUNC(sample_time, 'MM') ORDER BY TRUNC(sample_time, 'MM') /
3. If edb360 version (first line on its readme) is older than 1 month, download and use latest version: https://github.com/carlos-sierra/edb360/archive/master.zip (link is also provided on the right-hand side of this blog under downloads).
4. Consider suppressing text and or csv reports. Each for an estimated gain of about 20%. Keep in mind that when suppressing reports, you start loosing some functionality. To suppress lets say text and csv reports, place the following two commands at the end of script edb360/sql/edb360_00_config.sql
DEF edb360_conf_incl_text = ‘N’;
DEF edb360_conf_incl_csv = ‘N’;
5. If after going through steps 1-4 above, edb360 still takes longer than a few hours, feel free to email author firstname.lastname@example.org and provide 4 files from step 1.
Every release of edb360 includes some “new goodies”. Latest version v1510 reduces the number of execution parameters from 2 to just 1. It also incorporates a new configuration file which is mostly static, but it can be user-modified for special cases. And new sections below are all now part of edb360. Enjoy!
Quite often I learn of a system where “nobody changed anything” and suddenly the system is experiencing some strange behavior. Then after diligent investigation it turns out someone changed a little parameter at the System level, but somehow disregarded mentioning it since he/she thought it had no connection to the unexpected behavior. As we all know, System parameters are big knobs that we don’t change lightly, still we often see “unknown” changes like the one described.
Script below produces a list of changes to System parameter values, indicating when a parameter was changed and from which value into which value. It does not filter out cache re-sizing operations, or resource manager plan changes. Both would be easy to exclude, but I’d rather see those global changes listed as well.
Note: This script below should only be executed if your site has a license for the Oracle Diagnostics pack (or Tuning pack), since it reads from AWR.
WITH all_parameters AS ( SELECT snap_id, dbid, instance_number, parameter_name, value, isdefault, ismodified, lag(value) OVER (PARTITION BY dbid, instance_number, parameter_hash ORDER BY snap_id) prior_value FROM dba_hist_parameter ) SELECT TO_CHAR(s.begin_interval_time, 'YYYY-MM-DD HH24:MI') begin_time, TO_CHAR(s.end_interval_time, 'YYYY-MM-DD HH24:MI') end_time, p.snap_id, p.dbid, p.instance_number, p.parameter_name, p.value, p.isdefault, p.ismodified, p.prior_value FROM all_parameters p, dba_hist_snapshot s WHERE p.value != p.prior_value AND s.snap_id = p.snap_id AND s.dbid = p.dbid AND s.instance_number = p.instance_number ORDER BY s.begin_interval_time DESC, p.dbid, p.instance_number, p.parameter_name /