Two common mistakes using METHOD_OPT in DBMS_STATS

When gathering Cost-based Optimizer (CBO) statistics for a Table, most probably you use DBMS_STATS directly (unless your application is EBS, PSFT or Siebel). There are a couple of common mistakes we see around the use of the METHOD_OPT parameter on GATHER APIs out of this DBMS_STATS package.

METHOD_OPT => ‘FOR ALL INDEXED COLUMNS SIZE…’

With this parameter value DBMS_STATS gathers statistics ONLY in columns referenced in at least one index. So, any column that is not referenced by an index does NOT get any new statistics. The misconception is that if we specify let’s say SIZE AUTO or SIZE 254, we would get Histograms for indexed columns and regular statistics (with no Histograms) for the rest of the columns. This is not the case. We simple do not get new statistics for those columns which are not referenced by an index.

WHY is this a problem? Well, if your query has predicates referencing a non-indexed column, the CBO still has to figure out the Selectivity for the predicate. With outdated statistics or no statistics at all, this Selectivity may not be accurate and your plan may suffer.

If you are thinking “if I have no statistics on a column then Dynamic Sampling will kick in” you may be overlooking the fact that in most cases Dynamic Sampling ONLY gets invoked if your Table has no statistics. In a case like this the Table has statistics (DBA_TABLES) but the column of interest may not (DBA_TABL_COLUMNS). In other words, Dynamic Sampling is not the answer here. What you really want is to stay away of “FOR ALL INDEXED COLUMNS“. You want to use instead “FOR ALL COLUMNS“.

Note: Siebel is an exception since Siebel requires all columns ever used on a predicate to have at least one index. Not that I agree with this policy, but if actually implemented then gathering statistics in all indexed columns is just fine.

METHOD_OPT => NULL

If you are using a tool or a script to dynamically generate the DBMS_STATS.GATHER_TABLE_STATS command to gather Table statistics, you may have a front-end or set of input parameters to specify HOW the user of your tool/script wants to gather statistics. A common mistake (I made it myself some time back), is to pass parameter “METHOD_OPT => NULL” (when the user does not specify anything for method). Be aware that passing a NULL in this parameter means: gather statistics for the Table (DBA_TABLES) but nothing for the Columns (DBA_TAB_COLUMNS). In other words, the statistics for Table Columns do not get refreshed, preserving their old values.

On 10g and 11g using “METHOD_OPT => NULL” is wrong since we end-up with outdated and out of sync statistics. Of course this has a negative effect in our execution plans. On 12c it seems the behavior is different. I did some testings and on 12c it seems that we get refreshed column statistics even if we pass a NULL. But if you are on 10g or 11g be aware of this common mistake. What you want to do instead is to NOT pass this parameter at. If the parameter is not passed at all, then it gets a default value that basically means: gather statistics in all columns, but in those referenced by at least one predicate, collect Histograms. This default value is usually fine, and in any case it is what you intended when you were passing NULL in the first place.

Conclusion

Gathering CBO statistics calling DBMS_STATS.GATHER_TABLE_STATS with METHOD_OPT => ‘FOR ALL INDEXED COLUMNS…” or passing a value of NULL are common mistakes. You don’t see the detrimental effects right away. But as the column statistics become gradually older and older, some of your execution plans start to change. The symptoms are: unexpected poorly performing plans pop-up once in a while. The remedy is simple: avoid these two common mistakes!

Keep always in mind: statistics that closely represent your data improve the chances for the CBO to generate an optimal plan. I always think of the CBO as an old temperamental lady: be nice to her and she may be nice to you in return 😉

Written by Carlos Sierra

July 27, 2013 at 6:38 am

Posted in Statistics

29 Responses

Subscribe to comments with RSS.

[…] #DATABASE #ORACLE #TUNING by Carlos Sierra : Two common mistakes using METHOD_OPT in DBMS_STATS […]

#DATABASE #ORACLE #TUNING by Carlos Sierra : Two common mistakes using METHOD_OPT in DBMS_STATS | Database Scene

July 27, 2013 at 10:48 am

Reply
I use METHOD_OPT => ‘FOR ALL INDEXED COLUMNS SIZE 254’. What is the problem
with the non-indexed columns not having histograms? What kind of a decision can
optimizer reach for those columns? The optimizer decision usually boils down to
whether to use an index or not. There are not many decisions which can be made
if there are no indexes referencing the column in question. About the only
decision that can be made, based on the data distribution in the non-indexed
column, is whether to use sort/merge or hash join. If that is ever the case,
there is an application design problem, rather than the optimizer problem.

Mladen Gogala

July 27, 2013 at 2:09 pm

Reply
- CBO uses column statistics to estimate Selectivity of predicates referencing such columns, index or not. CBO decides for join order and join method, not only access method. Selectivity influences join order and method. Not sure why you consider the join order an application design problem. Column statistics are important if there are predicates on them (filter or access).
  
  Carlos Sierra
  
  July 27, 2013 at 5:08 pm
  
  Reply
Very nice post Carlos. I always learn something reading your material. Can you help answer a question you made me think of, please. You stated the default method_opt is “gather statistics in all columns, but in those referenced by at least one predicate, collect Histograms”. So … while gathering stats, how does oracle know if a column is referenced by at least one predicate? I’m curious.

Michael Cunningham

July 29, 2013 at 12:11 pm

Reply
- Michael,
  When you call DBMS_STATS passing its default value to METHOD_OPT, the value is “FOR ALL COLUMNS SIZE AUTO”. DBMS_STATS looks into SYS.COL_USAGE$ to determine which columns have been referenced by at least one predicate. That helps to decide if the column gets a Histogram or not. In the latter case, it still gets column statistics.
  
  Carlos Sierra
  
  July 29, 2013 at 8:08 pm
  
  Reply
Thanks for such a good article….
Is there any view/table in database which can tell us, which METHOD_OPT was used while collecting the statistics on a table

AT

August 3, 2013 at 8:18 am

Reply
- AT,
  No, we do not store the method_opt that we used, but we can pretty much figure it out when we look at the statistics. You may want to use SQLT on a SQL of concern, so finding out the method_opt becomes easy.
  
  Carlos Sierra
  
  August 3, 2013 at 8:56 am
  
  Reply
  - Thanks Carlos for a quick reply
    
    AT
    
    August 5, 2013 at 9:18 am
  - no problem 😉
    
    Carlos Sierra
    
    August 6, 2013 at 5:40 am
Thanks for all the good info Carlos. We are big Siebel users. You mentioned that using FOR ALL INDEXED COLUMNS was okay for Siebel databases as Siebel requires that all columns used in a predicate must have at least one index. My question is how does Siebel require that or is it up to me to ensure that the necessary indexes have been built?

rbrieck

January 24, 2014 at 2:52 pm

Reply
- You can use coe_siebel_profile.sql for that effect. Please refer to Metalink (MOS) note 781927.1. The script is in that note.
  
  Carlos Sierra
  
  January 24, 2014 at 4:22 pm
  
  Reply
Hi Carlos , What is the option to avoid generation of histograms but gather statistics on all columns of the table.

Siva

July 10, 2015 at 2:11 am

Reply
- method_opt => ‘for all columns size 1’
  
  Carlos Sierra
  
  July 13, 2015 at 8:13 am
  
  Reply
Thank you for the explanation!

Guilherme Mesquita

August 18, 2015 at 2:06 pm

Reply
Would you recommend, in general, NOT gathering histograms at all except in the few cases where histograms may help with data skew issues?

O.Jock

August 16, 2016 at 2:00 pm

Reply
- Yes!
  
  Carlos Sierra
  
  August 19, 2016 at 2:16 pm
  
  Reply
Hello Carlos,
I’m in doubt how to collect statistics on a EBS database.
In the doc Tuning Performance on E-Business suite (Doc ID 744143.1)
1- Use only FND_STATS or the Gather Schema and Gather Table Statistics Concurrent Programs.
2 – Do NOT USE the ANALYZE command or dbms_stats directly. It is not supported, and results in sub-optimal plans.

When I realize that I have a table with statistical stale, I do:
exec fnd_stats.gather_table_stats (ownname => ” tabname => ‘table_name’, estimate_percent => 0 DEGREE => 10, backup_flag => ‘BACKUP’);

This is correct command? You collect this way for EBS?

Oracle version: 11.2.0.4

Jose Neto

September 23, 2016 at 4:07 pm

Reply
- Just pass schema and table_name. better yet if you can use the GUI interface to submit concurrent request
  
  Carlos Sierra
  
  September 23, 2016 at 4:58 pm
  
  Reply
  - OK Carlos, tks!
    
    Jose Neto
    
    September 23, 2016 at 5:32 pm
Hi Carlos,
I use the below Stats gather statement, please suggest the right approach :

BEGIN
SYS.DBMS_STATS.GATHER_TABLE_STATS (
OwnName => ‘DATA’
,TabName => ‘ACCT_HISTORY’
,PartName => ‘DEC2017’
,Granularity => ‘PARTITION’
,Estimate_Percent => SYS.DBMS_STATS.AUTO_SAMPLE_SIZE
,Degree => 4
,Cascade => FALSE
,No_Invalidate => FALSE);
END;
/

Raghabendra Sinha

May 11, 2017 at 10:26 pm

Reply
- I see you are gathering stats in one partition. What about the others? Why cascade false? Why invalidate? You need to provide some context. For most cases, you only want to pass ownname and tabname. Which database release is this?
  
  Carlos Sierra
  
  May 12, 2017 at 5:25 am
  
  Reply
Hi Carlos,
In my EBS database, i have many custom schema and using DBMS_STATS package for gathering schema stat on those custom schema after confirmation from Oracle. We are using “method_opt for all indexed column size skewonly”. I am not sure but i believe there is change of improvement in it if we use method_opt for all column size skewonly ?
I am also not sure about size whether it should be skewonly or auto? Hope you help me.

Deepak Chandrapal

August 11, 2017 at 9:21 am

Reply
- If you are on 11.2.0.4 or higher, you may want to gather table stats passing only the owner of the table and the table_name. DBMS_STATS is pretty good at deciding values for other parameters of this API.
  
  Carlos Sierra
  
  August 11, 2017 at 9:41 am
  
  Reply
  - Hi Carlos,
    
    Thanks for giving time.
    
    But my question is;
    We are using “method_opt for all indexed column size skewonly”.
    This is going to generate histogram on indexed column only but we are having many other column in SQL query “where” clause. So, generating histogram for all column will help here?
    
    Deepak Chandrapal
    
    August 14, 2017 at 1:57 am
  - Would help with what? If your query has predicates on columns other than indexed columns, then you want to have reasonable statistics on such columns. Not sure if that was your question. If I were you, then I would consider letting DBMS_STATS use default method_opt “FOR ALL COLUMNS SIZE AUTO”, assuming you are on 11.2.0.4 o4 higher.
    
    Carlos Sierra
    
    August 14, 2017 at 7:54 am
Hey Carlos – we just upgraded to Peoplesoft FSC 9.2/Tools 8.55 on Oracle 12.1.0.2 – right after the upgrade, we had run stats (default method_opt) – so no histograms yet on the nVision selector tables or PS_LEDGER. Once users starting running the nVision reports and we reran stats, and COL_USAGE$ was populated, our plans went down the tube and chock full of Cartesian joins. Makes for very deceiving testing as users were ecstatic until stats got updated and histograms reared their ugly head.

Michele Rotyliano

August 13, 2017 at 7:58 pm

Reply
- Histograms are not evil. They help CBO produce better plans most of the time. If you’d rather not have them, then you may want to set table preferences using DBMS_STATS. But in the other hand, if you are suspecting a Histogram is affecting one of your plans, you may actually want to confirm that. I mean, doing some analysis of the execution plan and confirming it is the histogram the root cause and not something else.
  
  Carlos Sierra
  
  August 14, 2017 at 7:57 am
  
  Reply
I have a table with 8 columns and one of the column is of BLOB datatype. I would like to exclude that column from the statistics gathering,
as that column is hardly used. And it takes very long time to gather statistics. This table has about 600 mil records.
I tried with Method_opt => ‘For all columns size auto for columns (col1,col2,col3,…)’
But it giving me error, that col1 is invalid identifier.

Any idea, how i can exclude that.

Thanks.

Mandar K

December 29, 2017 at 7:06 am

Reply
- Try without the parenthesis. With parenthesis you are referring to an extension (column group), and not to a list of columns.
  
  Carlos Sierra
  
  January 2, 2018 at 7:38 am
  
  Reply