Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Two common mistakes using METHOD_OPT in DBMS_STATS

with 11 comments

When gathering Cost-based Optimizer (CBO) statistics for a Table, most probably you use DBMS_STATS directly (unless your application is EBS, PSFT or Siebel). There are a couple of common mistakes we see around the use of the METHOD_OPT parameter on GATHER APIs out of this DBMS_STATS package.

METHOD_OPT => ‘FOR ALL INDEXED COLUMNS SIZE…’

With this parameter value DBMS_STATS gathers statistics ONLY in columns referenced in at least one index. So, any column that is not referenced by an index does NOT get any new statistics. The misconception is that if we specify let’s say SIZE AUTO or SIZE 254,  we would get Histograms for indexed columns and regular statistics (with no Histograms) for the rest of the columns. This is not the case. We simple do not get new statistics for those columns which are not referenced by an index.

WHY is this a problem? Well, if your query has predicates referencing a non-indexed column, the CBO still has to figure out the Selectivity for the predicate. With outdated statistics or no statistics at all, this Selectivity may not be accurate and your plan may suffer.

If you are thinking “if I have no statistics on a column then Dynamic Sampling will kick in” you may be overlooking the fact that in most cases Dynamic Sampling ONLY gets invoked if your Table has no statistics. In a case like this the Table has statistics (DBA_TABLES) but the column of interest may not (DBA_TABL_COLUMNS). In other words, Dynamic Sampling is not the answer here. What you really want is to stay away of “FOR  ALL INDEXED COLUMNS“. You want to use instead “FOR ALL COLUMNS“.

Note: Siebel is an exception since Siebel requires all columns ever used on a predicate to have at least one index. Not that I agree with this policy, but if actually implemented then gathering statistics in all indexed columns is just fine.

METHOD_OPT => NULL

If you are using a tool or a script to dynamically generate the DBMS_STATS.GATHER_TABLE_STATS command to gather Table statistics, you may have a front-end or set of input parameters to specify HOW the user of your tool/script wants to gather statistics. A common mistake (I made it myself some time back), is to pass parameter “METHOD_OPT => NULL” (when the user does not specify anything for method). Be aware that passing a NULL in this parameter means: gather statistics for the Table (DBA_TABLES) but nothing for the Columns (DBA_TAB_COLUMNS). In other words, the statistics for Table Columns do not get refreshed, preserving their old values.

On 10g and 11g using “METHOD_OPT => NULL” is wrong since we end-up with outdated and out of sync statistics. Of course this has a negative effect in our execution plans. On 12c it seems the behavior is different. I did some testings and on 12c  it seems that we get refreshed column statistics even if we pass a NULL. But if you are on 10g or 11g be aware of this common mistake. What you want to do instead is to NOT pass this parameter at. If the parameter is not passed at all, then it gets a default value that basically means: gather statistics in all columns, but in those referenced by at least one predicate, collect Histograms. This default value is usually fine, and in any case it is what you intended when you were passing NULL in the first place.

Conclusion

Gathering CBO statistics calling DBMS_STATS.GATHER_TABLE_STATS with METHOD_OPT => ‘FOR ALL INDEXED COLUMNS…” or passing a value of NULL are common mistakes. You don’t see the detrimental effects right away. But as the column statistics become gradually older and older, some of your execution plans start to change. The symptoms are: unexpected poorly performing plans pop-up once in a while.  The remedy is simple: avoid these two common mistakes!

Keep always in mind: statistics that closely represent your data improve the chances for the CBO to generate an optimal plan. I always think of the CBO as an old temperamental lady: be nice to her and she may be nice to you in return ;-)

Written by Carlos Sierra

July 27, 2013 at 6:38 am

Posted in Statistics

11 Responses

Subscribe to comments with RSS.

  1. […] #DATABASE #ORACLE #TUNING by Carlos Sierra : Two common mistakes using METHOD_OPT in DBMS_STATS […]

  2. I use METHOD_OPT => ‘FOR ALL INDEXED COLUMNS SIZE 254′. What is the problem
    with the non-indexed columns not having histograms? What kind of a decision can
    optimizer reach for those columns? The optimizer decision usually boils down to
    whether to use an index or not. There are not many decisions which can be made
    if there are no indexes referencing the column in question. About the only
    decision that can be made, based on the data distribution in the non-indexed
    column, is whether to use sort/merge or hash join. If that is ever the case,
    there is an application design problem, rather than the optimizer problem.

    Mladen Gogala

    July 27, 2013 at 2:09 pm

    • CBO uses column statistics to estimate Selectivity of predicates referencing such columns, index or not. CBO decides for join order and join method, not only access method. Selectivity influences join order and method. Not sure why you consider the join order an application design problem. Column statistics are important if there are predicates on them (filter or access).

      Carlos Sierra

      July 27, 2013 at 5:08 pm

  3. Very nice post Carlos. I always learn something reading your material. Can you help answer a question you made me think of, please. You stated the default method_opt is “gather statistics in all columns, but in those referenced by at least one predicate, collect Histograms”. So … while gathering stats, how does oracle know if a column is referenced by at least one predicate? I’m curious.

    Michael Cunningham

    July 29, 2013 at 12:11 pm

    • Michael,
      When you call DBMS_STATS passing its default value to METHOD_OPT, the value is “FOR ALL COLUMNS SIZE AUTO”. DBMS_STATS looks into SYS.COL_USAGE$ to determine which columns have been referenced by at least one predicate. That helps to decide if the column gets a Histogram or not. In the latter case, it still gets column statistics.

      Carlos Sierra

      July 29, 2013 at 8:08 pm

  4. Thanks for such a good article….
    Is there any view/table in database which can tell us, which METHOD_OPT was used while collecting the statistics on a table

    AT

    August 3, 2013 at 8:18 am

    • AT,
      No, we do not store the method_opt that we used, but we can pretty much figure it out when we look at the statistics. You may want to use SQLT on a SQL of concern, so finding out the method_opt becomes easy.

      Carlos Sierra

      August 3, 2013 at 8:56 am

  5. Thanks for all the good info Carlos. We are big Siebel users. You mentioned that using FOR ALL INDEXED COLUMNS was okay for Siebel databases as Siebel requires that all columns used in a predicate must have at least one index. My question is how does Siebel require that or is it up to me to ensure that the necessary indexes have been built?

    rbrieck

    January 24, 2014 at 2:52 pm

    • You can use coe_siebel_profile.sql for that effect. Please refer to Metalink (MOS) note 781927.1. The script is in that note.

      Carlos Sierra

      January 24, 2014 at 4:22 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 1,677 other followers

%d bloggers like this: