Interesting case where a full table scan is chosen instead of an index

While I was presenting at Hotsos Symposium 2013, I showed a real-life case where the CBO was choosing a full table scan instead of an index range scan. The root cause was a suspicious “NVL(column, value) = value” predicate seen in the execution plan. But the actual SQL contained only “column = value”. So the question became “WHERE the NVL is coming from”? The selection of index or full scan became a simple side effect of the “function(column) = value” predicate.

On 11g there is a new optimization to improve the performance of DDL operations adding columns to an exiting table with data, where the new column is declared as having a constant as default value while making it not nullable. In other words, all existing rows are logically expected to be updated with default value on this new column. If the table is large, this ALTER TABLE ADD COLUMN operation would take long. This optimization records instead just the metadata for this new column (default value and not null constraint) and returns in no time regardless the size of the table.

What was confusing in this real-life case was the fact that filter predicate looked like this “NVL(USER2.SOME_FLAG,’N’)=’Y'” while SOME_FLAG default value was “Y” instead of “N” according to DBA_TAB_COLS. How was that possible? At some point we suspected the column default value had changed from “N” to “Y” but it was a simple speculation. So HOW do we prove or disprove the unexpected “N” in the filter predicate was the product of changing the default value of the affected column?

I created a small test case in order to reproduce observations from the real-life case. I created a table with data on my 11g test database. I created afterwards a new column which would use this new optimization (having a default constant and making it not nullable during same DDL). Last, I modified the default value for same column with another DDL. Keep in mind that when column was created only the metadata was recorded and the actual rows where not updated (this is the optimization).

DROP TABLE t1;
CREATE TABLE t1 AS
WITH v1 AS (SELECT ROWNUM n FROM dual CONNECT BY LEVEL <= 10000)
SELECT ROWNUM id,
 DBMS_RANDOM.STRING('U', 40) c1
 FROM v1, v1
 WHERE ROWNUM <= 10000;

-- rows are not updated. only metadata is recorded. (first DDL)
ALTER TABLE t1 ADD (c2 VARCHAR2(3) DEFAULT 'NO' NOT NULL);

-- data default is recorded. (second DDL)
ALTER TABLE t1 MODIFY (c2 DEFAULT 'YES');

SELECT COUNT(*) FROM t1 WHERE c2 = 'NO';

/*
 COUNT(*)
----------
 10000 <<<<<<<<<<<<<<<<< expecting 0!
*/

SET PAGES 100 LIN 300;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'PREDICATE'));

/*
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 23 (100)| |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T1 | 11342 | 34026 | 23 (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 2 - filter(NVL("C2",'NO')='NO') <<<< expecting (NVL("C2",'YES')='NO')!
*/

I was expecting different result and I thought this was a bug. I presented the issue to Mauro Pagano for a peer review and he explained that results were expected (and correct). If we hadn’t had the DDL optimization, at the time of the first ALTER TABLE we would had updated all rows with value “NO”. Then at second DDL we would had updated no rows with value “YES”, but future rows may get that value “YES” if c2 was not explicitly assigned a value. It made sense. And the optimization mimics the same behavior. So, rows that existed before the first DDL have no actual value for “c2” but stored metadata will assign “NO” when accessed. Rows created between DDL 1 and 2 would acquire default value of “NO” if needed (and updated with that “NO” value). And rows created after 2nd DDL would get value “YES” if needed (and updated with that “YES” value). So, only rows before DDL 1 would still have an internal NULL but external “NO”, thanks to initial stored metadata from DDL optimization.

Conclusion

The use of DDL optimization improved the performance of the ALTER TABLE ADD COLUMN command but it introduced a filter predicate which disabled the possible use of a normal index on same column, affecting the performance of queries having such filter predicates. Possible solutions include:

Adding such columns without the optimization: Add column (DDL), update all rows to default value (DML), add default value to column (DDL), add not null constraint (DDL)
Issue an ALTER SESSION SET “_add_col_optim_enabled” = FALSE; before the column is created
Create either an expression with function seen in predicate, or a function-based index

Written by Carlos Sierra

March 12, 2013 at 8:59 am

Posted in OUG, Plan Stability

13 Responses

Subscribe to comments with RSS.

Dear Carlos,

Then this means that DDL optimization introduced a slient change to the C2 column so that this one is in fact represented, behind the scene, by NVL(C2, ‘NO’).

In that case we have better to virtual column which will represent the “C2” metadata and index that column as shown in the following example:

mohamed@mhouri> alter table t1 add c2virt varchar2(3) generated always as (NVL(C2,'NO')) virtual;

Table altered.

mohamed@mhouri> create index ind_v on t1(c2virt);

Index created.

mohamed@mhouri> select count(1) from t1 where c2virt='NO';

  COUNT(1)
----------
     10000

1 row selected.

mohamed@mhouri> select * from table (dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  5c71tv7y29cr6, child number 0
-------------------------------------
select count(1) from t1 where c2virt='NO'

Plan hash value: 1690899805

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |       |       |     9 (100)|          |
|   1 |  SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  2 |   INDEX FAST FULL SCAN| IND_V |   106 |   636 |     9   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("C2VIRT"='NO')

Note
-----
   - dynamic sampling used for this statement (level=2)

Note that when I was trying to create the virtual column I observed this bizare thing

mohamed@mhouri> alter table t1 add c2virt number(1) generated always as (NVL(C2,'NO')) virtual;

Table altered.

mohamed@mhouri> create index ind_v on t1(c2virt);
create index ind_v on t1(c2virt)
          *
ERROR at line 1:
ORA-01722: invalid number

Oracle allows the creating of a virutal column having NUMBER datatype on column C2 which is of VARCHAR2 but forturnately the creation of the index on such column is refused

Best Regards
Mohamed Houri
http://www.hourim.wordpress.com

hourim

March 12, 2013 at 2:57 pm

Mohamed,

I see your point. Yes, it looks like validation of data types on expression used to create virtual column happens when the data from it is materialized into the index. I also get “ORA-01722: invalid number” when I create such virtual column then try a simple SELECT * FROM t1.

Cheers — Carlos

Carlos Sierra

March 13, 2013 at 7:00 am

Reply

Hi Carlos,

I’ve created a index on that column and Oracle uses it without problem. I don’t know why you say “it introduced a filter predicate which disabled the possible use of a normal index on same column”.

Nice post!

joaquingonzalez

March 13, 2013 at 4:45 am

Maybe it has to do with statistics rather than default column value.
My test case (Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production):

DROP TABLE t1;

CREATE TABLE t1 AS
WITH v1 AS (SELECT ROWNUM n FROM dual CONNECT BY LEVEL <= 10000)
SELECT ROWNUM id,
DBMS_RANDOM.STRING('U', 40) c1
FROM v1, v1
WHERE ROWNUM <= 10000;

ALTER TABLE t1 ADD (c2 VARCHAR2(3) DEFAULT 'NO' NOT NULL);

ALTER TABLE t1 MODIFY (c2 DEFAULT 'YES');

insert into t1 values (0, DBMS_RANDOM.STRING('U', 40), 'YES');

create index t1_i on t1 (c2);

select /*+index(t1 t1(c2))*/ * from t1 where c2='YES';

The plan I see is:

———————————————————————————————————–
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
———————————————————————————————————–
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 1 |00:00:00.01 | 3 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 3 |
|* 2 | INDEX RANGE SCAN | T1_I | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 2 |
———————————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

2 – access("C2"='YES')

Note
—–
– dynamic sampling used for this statement (level=2)

joaquingonzalez

March 13, 2013 at 6:59 am

Reply
The filter predicate does affect cardinality when table t1 has statistics, if it doesn’t, Oracle uses dynamic sampling, and the cardinality is correct.

select * from t1 where c2=’YES’

Plan hash value: 838529891

—————————————————————–
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)|
—————————————————————–
| 0 | SELECT STATEMENT | | 1 | | 16 (100)|
|* 1 | TABLE ACCESS FULL| T1 | 1 | 1 | 16 (0)|
—————————————————————–

Predicate Information (identified by operation id):
—————————————————

1 – filter(NVL(“C2”,’NO’)=’YES’)

Note
—–
– dynamic sampling used for this statement (level=2)

joaquingonzalez

March 13, 2013 at 7:10 am

Reply

Joaquin,

The real case shows a frequency histogram for this column. This histogram shows only one bucket and it is for value ‘N’. Column stats show this column has two distinct values with low/high of ‘N’ and ‘Y’. Due to fix for bug 10174050 selectivity for value ‘Y’ (which is missing on histogram) is computed as 0.5 of smallest bucket (with value ‘N’ and selectivity of 1.0). With such a large selectivity CBO goes with FTS and we see NVL predicate. If we had had a more accurate selectivity (close to zero), we would had used the index on this column and the predicate from SQL would not had had the NVL on it. Original case had an index on this column. Original table had 3M rows and cardinality was computed as 1.5M. There were only 2 rows with value ‘Y’. DBMS_STATS auto sample size missed value ‘Y’ on histogram (bug 10174050).

So, the suboptimal plan was due to corner case where missing value from histogram made CBO over estimate cardinality and decide on a FTS. Presence of NVL on predicate was then a side effect of CBO selecting a FTS and not the other way around.

Cheers — Carlos

Carlos Sierra

March 13, 2013 at 7:55 am

Thanks for the explanation.

Cheers – Joaquin

joaquingonzalez

March 13, 2013 at 8:46 am

I think that Joaquim’s suggestion is correct. That is, this DDL optimization is not affecting the use of an index created on the column C2. When the CBO sees that the cost of using the index on C2 is better that FULL SCANNING t1 table it will use that index even thought a NVL filter is applied on the indexed C2 column. And vice versa when the cost of FULL SCANNING the table is less than the cost of using an index range scan.

See example below:

mohamed@mhouri> select * from t1 where c2='YES'; ---> 1 row
                                             
------------------------------------------------------------------------
SQL_ID  80c9dzzucv1pt, child number 0                                                                                   
-------------------------------------                                                                                   
select * from t1 where c2='YES'                                                                                         
                                                                                                                        
Plan hash value: 1242897189                                                                                             
                                                                                                                        
-----------------------------------------------------------------------
| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1   |      1 |      1 |      1 |
|*  2 |   INDEX RANGE SCAN          | T1_I |      1 |      1 |      1 |
-----------------------------------------------------------------------
                                                                                                                        
Predicate Information (identified by operation id):                                                                     
---------------------------------------------------                                                                     
                                                                                                                        
   2 - access(NVL("C2",'NO')='YES')                                                                                     
                                                                                                                        
Note                                                                                                                    
-----                                                                                                                   
   - dynamic sampling used for this statement (level=2)                                                                 
                                                                                                                        
mohamed@mhouri> select * from t1 where c2='NO';  ---> 10000 rows selected.

                                           
-------------------------------------------------------------
SQL_ID  cmt4ug5h60dt8, child number 0                                                                                   
-------------------------------------                                                                                   
select * from t1 where c2='NO'                                                                                          
                                                                                                                        
Plan hash value: 3617692013                                                                                             
                                                                                                                        
-------------------------------------------------------------                                 
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |                                
-------------------------------------------------------------                                
|   0 | SELECT STATEMENT  |      |      1 |        |  10000 |                                  
|*  1 |  TABLE ACCESS FULL| T1   |      1 |  10502 |  10000 |                                
-------------------------------------------------------------                               
                                                                                                                        
Predicate Information (identified by operation id):                                                                     
---------------------------------------------------                                                                     
                                                                                                                        
   1 - filter(NVL("C2",'NO')='NO')                                                                                      
                                                                                                                        
Note                                                                                                                    
-----                                                                                                                   
   - dynamic sampling used for this statement (level=2)

hourim

March 13, 2013 at 9:05 am

First bug reference should had been 5483301 and not 10174050 .

Carlos Sierra

March 14, 2013 at 7:04 am

Twist. When FTS then NVL. Index does not need NVL since it contains values for column:

DROP TABLE t1;
CREATE TABLE t1 AS
WITH v1 AS (SELECT ROWNUM n FROM dual CONNECT BY LEVEL <= 10000)
SELECT ROWNUM id,
       DBMS_RANDOM.STRING('U', 40) c1
  FROM v1, v1
 WHERE ROWNUM <= 10000;

-- rows are not updated. only metadata is recorded.
ALTER TABLE t1 ADD (c2 VARCHAR2(3) DEFAULT 'NO' NOT NULL);

-- create index on c2
CREATE INDEX t1_i1 ON t1 (c2);

-- data default is recorded.
ALTER TABLE t1 MODIFY (c2 DEFAULT 'YES');

-- insert one row with YES
INSERT INTO t1 VALUES (0, DBMS_RANDOM.STRING('U', 40), 'YES');

-- distribution
SELECT c2, COUNT(*) FROM t1 GROUP BY c2;

-- gather stats (no histogram since column usage is cold)
EXEC DBMS_STATS.gather_table_stats(USER, 't1');

SELECT COUNT(*) FROM t1 WHERE c2 = 'YES';

/*
  COUNT(*)
----------
         1
*/

SET PAGES 100 LIN 300;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'PREDICATE'));

/*
-------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Cost (|
-------------------------------------------------------
|   0 | SELECT STATEMENT      |       |       |     7 |
|   1 |  SORT AGGREGATE       |       |     1 |       |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  5001 |     7 |
-------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C2"='YES')
*/

SELECT /*+ FULL(t1) */ COUNT(*) FROM t1 WHERE c2 = 'YES';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'PREDICATE'));

/*
---------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (|
---------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |    21 |
|   1 |  SORT AGGREGATE    |      |     1 |       |
|*  2 |   TABLE ACCESS FULL| T1   |  5001 |    21 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NVL("C2",'NO')='YES')
/*

Carlos Sierra

March 13, 2013 at 11:55 am

[…] Joaquín González pointed out to me that Carlos Sierra has blogged about this too . It was his presentation where the topic came up. […]

mandatory columns | Oracle MVA

March 14, 2013 at 4:22 am

Reply
[…] Hier noch der Link zu Carlos Blog: Interesting case where a full table scan is chosen instead of an index […]

Lösung: Eine unerwartete Bedingung in der Where Clausel | Oracle Rätsel Blog

May 30, 2016 at 3:35 am

Reply
[…] Here the link to Carlos’s Blog: Interesting case where a full table scan is chosen instead of an index […]

SOLUTION: A Strange Condition in the Where Clause | Oracle Riddle Blog

May 30, 2016 at 4:11 am

Reply

Carlos Sierra's Tools and Tips