Archive for the ‘Statistics’ Category
Sometimes you do what you have to do. So here I confess doing something I usually avoid: forcing an Execution Plan (which is not the same as using a more conventional method for Plan stability).
This is a case on 22.214.171.124.0 base release where the application vendor sets the optimizer to 9i, and tweaks other CBO parameters in questionable ways, then some queries produce suboptimal plans (as expected); and you are called to help without changing the obvious.
There is a family of queries from an ad-hoc query generator that permits users to issue queries without a set of selective predicates. These queries join several large tables and their performance is poor (as expected as well!). On top of the previous, all these queries include the /*+ FIRST_ROWS */ CBO Hint and the questionable DISTINCT keyword. Note: it is quite common for developers to throw a DISTINCT keyword “to avoid duplicates” where the mere existence of duplicates would be an indication of an application bug; so “why fix it if I can hide it, right?”.
There is one caveat although: these queries include a generic predicate “rownum <= :b1”, and value passed defaults to 5000, so users rationale is “if I only want the first X rows my query should return fast”. This highlights still another questionable practice since it is hard to imagine a user scrolling 5000 rows and making any sense of such large set, especially when the full “filtered” set would be several million rows long. So the original problem is questionable in several ways. Nevertheless, sometimes we are called to help besides providing advice. And no, we are not allowed to slap hands 😉
The good news is that we can use this extra predicate on rownum and make these queries to return the first X rows really fast; and I mean less than 5 seconds instead of over one hour or more! And if users want not 5000 but 500 or even 50 rows, then we can be in the sub-second range!
You may be thinking FIRST_ROWS optimization, and that was my first try. Unfortunately, on 126.96.36.199.0, even reversing all the suboptimal CBO parameters at the session level, I would consistently get an Execution Plan with a few Hash Joins and a large Cost; and if I were to force a Nested Loop Plan, the cost would be several orders of magnitude larger so the CBO would not pick it! Nevertheless, such a “Nest Loop only” Execution Plan would fulfill the user’s expectations, regardless the validity of the initial request. And yes, CBO statistics are OK, not perfect but simply OK. One more piece of info: this is not Exadata! (if it were Exadata most probably these same Execution Plans with full table scans and Hash Joins would simply fly!).
So, my issue became: How do I force an Execution Plan that only contains Nested Loops? If I could do that, then the COUNT STOP operation could help me to halt my SQL execution once I fetched the first X rows (Hash Join does not allow me do that). Remember: these tables have literally millions of rows. I could pepper these queries with a ton of CBO Hints and I would get my desired “Nested Loop only” Execution Plan… But that would be a lot of work and tricky at best.
SQL Patch to the rescue
I could had used a SQL Profile, but I think this dirty trick of suppressing Hash Joins and Sort Merge Joins, would be better served with a SQL Patch. I also thought Siebel: They do tweak CBO parameters as well, and they suppress Hash Joins, but they change System and Session level parameters… Since I wanted my change to be very localized, SQL Patch could provide me just what I needed.
Under the Downloads section on the margin of this page, there is a “cscripts” link that includes the sqlpch.sql script. I used this script and passed as the second parameter the following string (1st parameter is SQL_ID). With a SQL Patch generated this way, I could systematically produce a “Nested-Loops only” Execution Plan for these few queries. I did not have to change the original SQL, nor change the CBO environment at the System or Session level, neither restrict the query generator, and I did not had to “educate” the users to avoid such unbounded queries.
OPT_PARAM("_optimizer_sortmerge_join_enabled" "FALSE") OPT_PARAM("_hash_join_enabled" "FALSE")
I have to concede doing something questionable, in this case using a SQL Patch to force a desired Execution Plan instead of fixing the obvious, simply because that was the shortest path to alleviate the user’s pain.
I consider this technique above a temporary work-around and not a solution to the actual issue. In this case the right way to handle this issue would be:
- Have the application vendor certify their application to the latest release of the database and reset all CBO related parameters, plus
- Have the application vendor remove CBO Hints and DISTINCT keyword from queries, plus
- Configure the ad-hoc query generator to restrict users from executing queries without selective predicates, then
- Tune those outlier queries that may still need some work to perform as per business requirements, and possibly
- Educate the users to provide as many selective predicates as possible
Anyways, the potential of using a SQL Patch to tweak an Execution Plan in mysterious ways is quite powerful, and something we may want to keep in the back of our minds for a rainy day…
In most cases edb360 takes less than 1hr to execute. But I often hear of cases where it takes a lot longer than that. In a corner case it was taking several days and it had to be killed.
So the question is WHY edb360 takes that long?
Well, edb360 executes thousands of SQL statements sequentially (intentionally). Many of these queries read data from AWR and in particular from ASH. So, lets say your ASH historical table has 2B rows, and on top of that you have not gathered statistics on AWR tables in years, thus CBO under-estimates cardinality and tends to use index access and nested loops. In such extreme cases you may end up with suboptimal execution plans that expect to return a few rows, but actually read a couple of billion rows using index access operations and nested loops. A query like this may take hours to complete!
As of version v1515, edb360 has a shortcut algorithm that ends an execution after 8 hours. So you may get an incomplete output, but it ends normally and the partial output can actually be used. This is not a solution but a workaround for those long executions.
How to troubleshoot edb360 taking long?
1. Review files 00002_edb360_dbname_log.txt, 00003_edb360_dbname_log2.txt, 00004_edb360_dbname_log3.txt and 00005_edb360_dbname_tkprof_sort.txt. First log shows the state of the statistics for AWR Tables. If stats are old then gather them fresh with script edb360/sql/gather_stats_wr_sys.sql
2. If number of rows on WRH$_ACTIVE_SESSION_HISTORY as per 00002_edb360_dbname_log.txt is several millions, then you may not be purging data periodically. There are some known bugs and some blog posts on this regard. Review MOS 387914.1 and proceed accordingly. Execute query below to validate ASH age:
SELECT TRUNC(sample_time, 'MM'), COUNT(*) FROM dba_hist_active_sess_history GROUP BY TRUNC(sample_time, 'MM') ORDER BY TRUNC(sample_time, 'MM') /
3. If edb360 version (first line on its readme) is older than 1 month, download and use latest version: https://github.com/carlos-sierra/edb360/archive/master.zip (link is also provided on the right-hand side of this blog under downloads).
4. Consider suppressing text and or csv reports. Each for an estimated gain of about 20%. Keep in mind that when suppressing reports, you start loosing some functionality. To suppress lets say text and csv reports, place the following two commands at the end of script edb360/sql/edb360_00_config.sql
DEF edb360_conf_incl_text = ‘N’;
DEF edb360_conf_incl_csv = ‘N’;
5. If after going through steps 1-4 above, edb360 still takes longer than a few hours, feel free to email author firstname.lastname@example.org and provide 4 files from step 1.
A good friend of mine recently asked me if edb360 included a section showing indexes that would benefit of a rebuild. I replied “not yet” and basically committed to add something. This topic of the needs versus the implications of rebuilding an index has been recently discussed in Richard Foote’s Blog. In my opinion, if you want to know more about indexes, Richard’s blog is one of the first stops. To my surprise and delight, I learned that we can actually use a little trick of the EXPLAIN PLAN FOR command and actually use the CBO to estimate what would be the size of an index if we were to create (or rebuild) it. In this latter blog posting Richard explains and demonstrates how it can be done.
My blog posting today is about encapsulating this cool method to quickly and cheaply estimate the size of an index if it were to be rebuilt, and put it on a free script for the Oracle community to use. So, feel free to use script below, but I need to remind you that before you jump to conclusions that this or that index should be rebuilt, read first about the actual need of doing so. You may want to include in your reading Richard Foote’s numerous postings on this topic, and also a couple of Oracle MOS notes: 989093.1 and 989186.1
---------------------------------------------------------------------------------------- -- -- File name: estimate_index_size.sql -- -- Purpose: Reports Indexes with an Actual size > Estimated size for over 1 MB -- -- Author: Carlos Sierra -- -- Version: 2014/07/18 -- -- Description: Script to very quickly and cheaply estimate the size of an index if it -- were to be rebuilt. It uses EXPLAIN PLAN FOR CREATE INDEX technique. -- It can be used on a single index, or all the indexes on a table, or -- a particular application schema, or all application schemas. It does not -- lock indexes and only updates the plan_table, which is usually a global -- temporary table. -- -- Usage: Connect to SQL*Plus as SYS or DBA account and execute without parameters. -- It will ask for optional schema owner, table name and index name. If all -- 3 are given null values then it acts on all application schemas. It -- generates a simple text report with the indexes having an estimated size -- of at least 1 MB over their actual size. -- -- Example: @estimate_index_size.sql -- -- Notes: Developed and tested on 188.8.131.52. -- -- Inspired on blog posts from Richard Foote and Connor MacDonald: -- http://richardfoote.wordpress.com/2014/04/24/estimate-index-size-with-explain-plan-i-cant-explain/#comment-116966 -- http://connormcdonald.wordpress.com/2012/05/30/index-size/ -- -- If considering index rebuilds based on the output of this script, read -- first Richard Foote's numerous blog postings about this topic. Bottom -- line: there are only a few cases where you actually need to manually -- rebuild an index. -- -- This method to estimated size of an index is far from perfect, please -- scrutinize this script before using it. You may also want to read -- Oracle MOS notes: 989093.1 and 989186.1 on this topic. -- --------------------------------------------------------------------------------------- -- SPO estimate_index_size.txt; UNDEF owner table_name index_name exclusion_list exclusion_list2; DEF exclusion_list = "('ANONYMOUS','APEX_030200','APEX_040000','APEX_SSO','APPQOSSYS','CTXSYS','DBSNMP','DIP','EXFSYS','FLOWS_FILES','MDSYS','OLAPSYS','ORACLE_OCM','ORDDATA','ORDPLUGINS','ORDSYS','OUTLN','OWBSYS')"; DEF exclusion_list2 = "('SI_INFORMTN_SCHEMA','SQLTXADMIN','SQLTXPLAIN','SYS','SYSMAN','SYSTEM','TRCANLZR','WMSYS','XDB','XS$NULL')"; VAR random1 VARCHAR2(30); VAR random2 VARCHAR2(30); EXEC :random1 := DBMS_RANDOM.string('A', 30); EXEC :random2 := DBMS_RANDOM.string('X', 30); DELETE plan_table WHERE statement_id IN (:random1, :random2); SET SERVEROUT ON; DECLARE sql_text CLOB; BEGIN FOR i IN (SELECT idx.owner, idx.index_name FROM dba_indexes idx, dba_tables tbl WHERE idx.owner = NVL(UPPER(TRIM('&&owner.')), idx.owner) -- optional schema owner name AND idx.table_name = NVL(UPPER(TRIM('&&table_name.')), idx.table_name) -- optional table name AND idx.index_name = NVL(UPPER(TRIM('&&index_name.')), idx.index_name) -- optional index name AND idx.owner NOT IN &&exclusion_list. -- exclude non-application schemas AND idx.owner NOT IN &&exclusion_list2. -- exclude more non-application schemas AND idx.index_type IN ('NORMAL', 'FUNCTION-BASED NORMAL', 'BITMAP', 'NORMAL/REV') -- exclude domain and lob AND idx.status != 'UNUSABLE' -- only valid indexes AND idx.temporary = 'N' AND tbl.owner = idx.table_owner AND tbl.table_name = idx.table_name AND tbl.last_analyzed IS NOT NULL -- only tables with statistics AND tbl.num_rows > 0 -- only tables with rows as per statistics AND tbl.blocks > 128 -- skip small tables AND tbl.temporary = 'N') LOOP BEGIN sql_text := 'EXPLAIN PLAN SET STATEMENT_ID = '''||:random1||''' FOR '||REPLACE(DBMS_METADATA.get_ddl('INDEX', i.index_name, i.owner), CHR(10), ' '); -- cbo estimates index size based on explain plan for create index ddl EXECUTE IMMEDIATE sql_text; -- index owner and name do not fit on statement_id, thus using object_owner and object_name, using statement_id as processing state DELETE plan_table WHERE statement_id = :random1 AND (other_xml IS NULL OR NVL(DBMS_LOB.instr(other_xml, 'index_size'), 0) = 0); UPDATE plan_table SET object_owner = i.owner, object_name = i.index_name, statement_id = :random2 WHERE statement_id = :random1; EXCEPTION WHEN OTHERS THEN DBMS_OUTPUT.PUT_LINE(i.owner||'.'||i.index_name||': '||SQLERRM); DBMS_OUTPUT.PUT_LINE(DBMS_LOB.substr(sql_text)); END; END LOOP; END; / SET SERVEROUT OFF; WITH indexes AS ( SELECT pt.object_owner, pt.object_name, TO_NUMBER(EXTRACTVALUE(VALUE(d), '/info')) estimated_bytes FROM plan_table pt, TABLE(XMLSEQUENCE(EXTRACT(XMLTYPE(pt.other_xml), '/*/info'))) d WHERE pt.statement_id = :random2 AND pt.other_xml IS NOT NULL -- redundant AND DBMS_LOB.instr(pt.other_xml, 'index_size') > 0 -- redundant AND EXTRACTVALUE(VALUE(d), '/info/@type') = 'index_size' -- grab index_size type ), segments AS ( SELECT owner, segment_name, SUM(bytes) actual_bytes FROM dba_segments WHERE owner = NVL(UPPER(TRIM('&&owner.')), owner) -- optional schema owner name AND segment_name = NVL(UPPER(TRIM('&&index_name.')), segment_name) -- optional index name AND owner NOT IN &&exclusion_list. -- exclude non-application schemas AND owner NOT IN &&exclusion_list2. -- exclude more non-application schemas AND segment_type LIKE 'INDEX%' HAVING SUM(bytes) > POWER(2, 20) -- only indexes with actual size > 1 MB GROUP BY owner, segment_name ), list_bytes AS ( SELECT (s.actual_bytes - i.estimated_bytes) actual_minus_estimated, s.actual_bytes, i.estimated_bytes, i.object_owner, i.object_name FROM indexes i, segments s WHERE i.estimated_bytes > POWER(2, 20) -- only indexes with estimated size > 1 MB AND s.owner = i.object_owner AND s.segment_name = i.object_name ) SELECT ROUND(actual_minus_estimated / POWER(2, 20)) actual_minus_estimated, ROUND(actual_bytes / POWER(2, 20)) actual_mb, ROUND(estimated_bytes / POWER(2, 20)) estimated_mb, object_owner owner, object_name index_name FROM list_bytes WHERE actual_minus_estimated > POWER(2, 20) -- only differences > 1 MB ORDER BY 1 DESC, object_owner, object_name / DELETE plan_table WHERE statement_id IN (:random1, :random2); UNDEF owner table_name index_name exclusion_list exclusion_list2; SPO OFF;
I have this query that references a couple of Global Temporary Tables (GTT). These GTT have no CBO Statistics, thus Dynamic Sampling (DS) is used on them. The problem is default value of 2 at the instance level samples only 64 blocks and these GTT are large. Testing with DS on larger samples provides better quality on these dynamic statistics and a better performing Execution Plan for this particular SQL.
This SQL already uses an imported SQL Profile generated by coe_xfr_sql_profile.sql out of SQLT (MOS 215187.1) under sqlt/utl directory. This SQL Profile contains CBO Hints created out of the Outline Data contained on the other_xml column of the Plan. Still a higher level of DS is needed for these two GTT in order to produce a better performing Plan. Setting DS at the instance level or session level is not an option. Modifying the query is not an option. So the question is: “how to embed a dynamic sampling hint on a SQL that already has a SQL Profile based on an Outline“?
On the script generated by coe_xfr_sql_profile.sql add one extra Hint with the DS level that is needed. Look at sample below, where one extra Hint has added right below DB_VERSION. Since the two GTT have no CBO Statistics, they invoke DS, which would use now level 6 instead of 2.
h := SYS.SQLPROF_ATTR( q'[BEGIN_OUTLINE_DATA]', q'[IGNORE_OPTIM_EMBEDDED_HINTS]', q'[OPTIMIZER_FEATURES_ENABLE('184.108.40.206')]', q'[DB_VERSION('220.127.116.11')]', q'[OPT_PARAM('optimizer_dynamic_sampling' 6)]', q'[ALL_ROWS]',
Every so often I see on a distribution list a posting that starts like this: “I upgraded my application from database release X to release Y and now many queries are performing poorly, can you tell why?”
As everyone else on a distribution list, my first impulse is to make an educated guess permeated by a prior set of experiences. The intentions are always good, but the process is painful and time consuming. Many of us have seen this kind of question, and many of us have good hunches. Still I think our eagerness to help blinds us a bit. The right thing to do is to step back and analyze the facts, and I mean all the diagnostics supporting the observation.
What is needed to diagnose a SQL Tuning issue?
The list is large, but I will enumerate some of the most important pieces:
- SQL Text
- Version of the database (before and after upgrade)
- Database parameters (before and after)
- State of the CBO Statistics (before and after)
- Changes on Histograms
- Basics about the architecture (CPUs, memory, etc.)
- Values of binds if SQL has them
- Indexes compare, including state (visible?, usable?)
- Execution Plan (before and after)
- Plan stability? (Stored Outlines, Profiles, SQL Plan Management)
- Performance history as per evidence on AWR or StatsPack
- Trace from Event 10053 to understand the CBO
- Trace from Event 10046 level 8 or 12 to review Waits
- Active Session History (ASH) if 10046 is not available
I could keep adding bullets to the list, but I think you get the point: There are simply too many things to check! And each takes some time to collect. More important, the state of the system changes overtime, so you may need to re-collect the same diagnostics more than once.
SQLTXPLAIN to the rescue
SQLT or SQLTXPLAIN, has been available on MetaLink (now MOS) under note 215187.1 for over a decade. In short, SQLT collects all the diagnostics listed above and a lot more. That is WHY Oracle Support uses it every day. It simply saves a lot of time! So, I always encourage fellow Oracle users to make use of the FREE tool and expedite their own SQL Tuning analysis. When time permits, I do volunteer to help on an analysis. So, if you get to read this, and you want to help yourself while using SQLT but feel intimidated by this little monster, please give it a try and contact me for assistance. If I can help, I will, if I cannot, I will let you know.
It is fun to guess WHY a SQL is not performing as expected, and trying different guesses is educational but very time consuming. If you want to actually find root causes before trying to fix your SQL, you may want to collect relevant diagnostics. SQLT is there to help, and if installing this tool is not something you can do in a short term, consider then SQL Health-Check SQLHC.
This blog posting is about answering this first question below, which I often get asked:
Can I use SQLTXPLAIN (SQLT) instead of AWR?
The answer is: it depends. If you are doing SQL Tuning and you already know the SQL_ID, then you may want to use SQLT XTRACT (MOS 215187.1) directly on the SQL_ID of concern. But even in that case, keep in mind that SQLT accesses AWR data, so your environment must have a valid license to use the Oracle Diagnostics Pack. In fact, when you install SQLT, it asks if you have the Oracle Tuning Pack, the Oracle Diagnostics Pack or none. Depending how you respond, SQLT access or not the set of views and APIs included on those Oracle Packs. That means you can configure SQLT to access or not AWR data.
What is the difference between AWR and SQLT?
In short, the difference between these two is the scope. You use AWR to diagnose database performance, while you use SQLT to diagnose one SQL. Sometimes I explain this difference by saying: “AWR is to an entire database to what SQLT is to one SQL”. One is for system-wide performance, the other is very centric to one SQL at a time.
Why SQLT exists?
I envisioned SQLT on the late 90’s when I was a road-warrior fighting fires in the area of performance, and in particular SQL performance. I found back then that Oracle-provided tools like TKPROF were excellent, but I always needed something more, like knowing the state of the CBO Statistics, Tables and Indexes, etc.
These days, my good friend Mauro Pagano from Oracle, is keeping the ball rolling. He is keeping SQLT in constant motion, making it a better tool on every new version. So, I would say this: SQLT is filling up some gaps that me, and many others, consider important in order to be diligent on root cause analysis for cases were a SQL performs poorly.
What is DB360?
As SQLT brings to the table several pieces of information that we need for SQL Tuning, and which are not available using out-of-the-box tools like TKPROF or SQL Monitoring, the new DB360 tool is doing something similar for the entire database: It complements what AWR provides by producing a report with meaningful information about an entire database. This DB360 is a tool that installs nothing on the database, and produces an HTML report with sections such as Configuration, Sizing, CBO Statistics, Performance Trends, etc.
Is DB360 a licensed product?
No. This DB360 tool belongs to Enkitec. It is not yet available to the Oracle community, but it will be soon. Same as SQLT, if you have an Oracle Tuning or Diagnostics Pack, then when you execute DB360 you would get to see in your DB360 report some pieces of information generated out of views and APIs covered by those Oracle Packs, else you get only the pieces which require no Oracle Pack license. Besides the restriction to limit your use of DB360 as per your Oracle Pack license, DB360 itself will be available to the Oracle community for free, and with no strings attached, same as SQLT.
Why are SQLT and DB360 free to download and use?
These tools are simply a contribution to the Oracle community. “Sharing tools is like sharing knowledge: it makes our IT experience more pleasurable”. The only payback comes when you share back with the Oracle community some of your knowledge, or some of your tools and scripts. If you have been a speaker in an Oracle Users Groups, then you may relate to this gratifying experience to share with others what you know. At RMOUG these past 3 days, I have had the opportunity to experience once again this special sense of community, that is always eager to share and to learn from each other.
SQLT complements TKPROF and SQL Monitor. DB360 complements AWR. When it comes to diagnostics, either for one SQL or for an entire Database, having to our disposal additional diagnostics in the context of our analysis, improves our chances to do a diligent job, while reducing the time that it would take to assembly those pieces manually; all with the extra benefit of having these extra diagnostics pieces consistent to a point in time. So, I hope you continue enjoying SQLT and in the near future DB360!
Every so often I see systems where there is a good amount of SQL that uses Literals instead of Binds, and executes enough times to create a large number of Cursors. Is this a red flag? As many questions regarding performance, I would say the right answer is: it all depends.
Of course we want to use Binds instead of Literals in order to reduce the frequency of Hard Parses, and in turn reduce CPU consumption and space utilization in the Shared Pool. Does it mean we want to replace all Literals with Binds? Do we declare war on Literals? In my opinion, the answer is simply: NO.
If a SQL has a Predicate on a date column, or a key column, then I would expect the Number of Distinct Values (NDV) for such column to be high, and in some cases as high as the number of rows in the Table (unique values for example). In the other hand, if the Predicate is in one of those columns that denotes a code, like Process Type or Status, and the NDV is small, then I’d rather keep the Literal in place. Specially if the data in such column is skewed and I have (or plan to have) Histograms on it.
What do I propose?
- If the SQL is executed sporadically, then it does not matter (Literals or Binds).
- If the SQL executes frequently, and the Predicate in question is on a Column where the Number of Distinct Values (NDV) is high, then use a Bind instead of a Literal (for this Predicate).
- If the SQL executes frequently, and the Predicate in question is on a Column where the NDV is low, then use a Literal (for this Predicate). This assumes the NDV for these Literals is also small.
- Regardless if using a Literal or a Bind for a particular Predicate: If the data in a Column referenced by a Predicate is heavily skewed, gather Statistics with Histograms on this Column.
Follow-up question: When the NDV is high or low? The answer is also: it all depends. I personally prefer to see Literals if the NDV for this Column (and this Literal) is less than 10 (or so).
Why having Binds and Literals on same SQL is better than having all Binds?
If we have good set of CBO Statistics, and we have Histograms on skewed data, and we are using bind peeking, and we are on 11g, and Adaptive Cursor Sharing (ACS) is enabled, and we plan using SQL Plan Management (SPM), then we are for a treat:
With all the “ands” above, by using Binds on predicates with high NDV and Literals in those with low NDV, then we will end up having a small number of different SQL_IDs for what we consider “the same SQL”. Each incarnation of this SQL could potentially have its own set of optimal Execution Plans created by ACS and the CBO (by making use of Histograms on the data and Selectivity Profiles on ACS). Then, with the aid of SPM we could provide stability to those multiple optimal Execution Plans for each version of the SQL. That means that SQL Q1 with Literal L1 could have a different set of optimal plans than Q1 with Literals L2.
Replacing some Literals with Binds but not all Literals sounds like a lot of work, but actually the extra work may be worth the effort. In my opinion, the end result is better if we replace most, but not all (as per proposal above). WHY? Even when ACS does a good job at finding multiple optimal plans for a SQL by using the selectivity of the predicates; by allowing a small number of cursors for the same SQL given the use of Literals in columns with low NDV, we are basically reducing the times we would have to execute a SQL with a sub-optimal plan due to current ACS ramp-up process. This extra granularity provided by a small number of incarnations of the “otherwise same SQL” could be crucial for tuning complex SQL or corner cases.