Interesting case where a full table scan is chosen instead of an index
While I was presenting at Hotsos Symposium 2013, I showed a real-life case where the CBO was choosing a full table scan instead of an index range scan. The root cause was a suspicious “NVL(column, value) = value” predicate seen in the execution plan. But the actual SQL contained only “column = value”. So the question became “WHERE the NVL is coming from”? The selection of index or full scan became a simple side effect of the “function(column) = value” predicate.
On 11g there is a new optimization to improve the performance of DDL operations adding columns to an exiting table with data, where the new column is declared as having a constant as default value while making it not nullable. In other words, all existing rows are logically expected to be updated with default value on this new column. If the table is large, this ALTER TABLE ADD COLUMN operation would take long. This optimization records instead just the metadata for this new column (default value and not null constraint) and returns in no time regardless the size of the table.
What was confusing in this real-life case was the fact that filter predicate looked like this “NVL(USER2.SOME_FLAG,’N’)=’Y'” while SOME_FLAG default value was “Y” instead of “N” according to DBA_TAB_COLS. How was that possible? At some point we suspected the column default value had changed from “N” to “Y” but it was a simple speculation. So HOW do we prove or disprove the unexpected “N” in the filter predicate was the product of changing the default value of the affected column?
I created a small test case in order to reproduce observations from the real-life case. I created a table with data on my 11g test database. I created afterwards a new column which would use this new optimization (having a default constant and making it not nullable during same DDL). Last, I modified the default value for same column with another DDL. Keep in mind that when column was created only the metadata was recorded and the actual rows where not updated (this is the optimization).
DROP TABLE t1; CREATE TABLE t1 AS WITH v1 AS (SELECT ROWNUM n FROM dual CONNECT BY LEVEL <= 10000) SELECT ROWNUM id, DBMS_RANDOM.STRING('U', 40) c1 FROM v1, v1 WHERE ROWNUM <= 10000; -- rows are not updated. only metadata is recorded. (first DDL) ALTER TABLE t1 ADD (c2 VARCHAR2(3) DEFAULT 'NO' NOT NULL); -- data default is recorded. (second DDL) ALTER TABLE t1 MODIFY (c2 DEFAULT 'YES'); SELECT COUNT(*) FROM t1 WHERE c2 = 'NO'; /* COUNT(*) ---------- 10000 <<<<<<<<<<<<<<<<< expecting 0! */ SET PAGES 100 LIN 300; SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'PREDICATE')); /* --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 23 (100)| | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL| T1 | 11342 | 34026 | 23 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(NVL("C2",'NO')='NO') <<<< expecting (NVL("C2",'YES')='NO')! */
I was expecting different result and I thought this was a bug. I presented the issue to Mauro Pagano for a peer review and he explained that results were expected (and correct). If we hadn’t had the DDL optimization, at the time of the first ALTER TABLE we would had updated all rows with value “NO”. Then at second DDL we would had updated no rows with value “YES”, but future rows may get that value “YES” if c2 was not explicitly assigned a value. It made sense. And the optimization mimics the same behavior. So, rows that existed before the first DDL have no actual value for “c2” but stored metadata will assign “NO” when accessed. Rows created between DDL 1 and 2 would acquire default value of “NO” if needed (and updated with that “NO” value). And rows created after 2nd DDL would get value “YES” if needed (and updated with that “YES” value). So, only rows before DDL 1 would still have an internal NULL but external “NO”, thanks to initial stored metadata from DDL optimization.
Conclusion
The use of DDL optimization improved the performance of the ALTER TABLE ADD COLUMN command but it introduced a filter predicate which disabled the possible use of a normal index on same column, affecting the performance of queries having such filter predicates. Possible solutions include:
- Adding such columns without the optimization: Add column (DDL), update all rows to default value (DML), add default value to column (DDL), add not null constraint (DDL)
- Issue an ALTER SESSION SET “_add_col_optim_enabled” = FALSE; before the column is created
- Create either an expression with function seen in predicate, or a function-based index
Dear Carlos,
Then this means that DDL optimization introduced a slient change to the C2 column so that this one is in fact represented, behind the scene, by NVL(C2, ‘NO’).
In that case we have better to virtual column which will represent the “C2” metadata and index that column as shown in the following example:
Note that when I was trying to create the virtual column I observed this bizare thing
Oracle allows the creating of a virutal column having NUMBER datatype on column C2 which is of VARCHAR2 but forturnately the creation of the index on such column is refused
Best Regards
Mohamed Houri
http://www.hourim.wordpress.com
hourim
March 12, 2013 at 2:57 pm
Mohamed,
I see your point. Yes, it looks like validation of data types on expression used to create virtual column happens when the data from it is materialized into the index. I also get “ORA-01722: invalid number” when I create such virtual column then try a simple SELECT * FROM t1.
Cheers — Carlos
Carlos Sierra
March 13, 2013 at 7:00 am
Hi Carlos,
I’ve created a index on that column and Oracle uses it without problem. I don’t know why you say “it introduced a filter predicate which disabled the possible use of a normal index on same column”.
Nice post!
joaquingonzalez
March 13, 2013 at 4:45 am
Maybe it has to do with statistics rather than default column value.
My test case (Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production):
DROP TABLE t1;
CREATE TABLE t1 AS
WITH v1 AS (SELECT ROWNUM n FROM dual CONNECT BY LEVEL <= 10000)
SELECT ROWNUM id,
DBMS_RANDOM.STRING('U', 40) c1
FROM v1, v1
WHERE ROWNUM <= 10000;
ALTER TABLE t1 ADD (c2 VARCHAR2(3) DEFAULT 'NO' NOT NULL);
ALTER TABLE t1 MODIFY (c2 DEFAULT 'YES');
insert into t1 values (0, DBMS_RANDOM.STRING('U', 40), 'YES');
create index t1_i on t1 (c2);
select /*+index(t1 t1(c2))*/ * from t1 where c2='YES';
The plan I see is:
———————————————————————————————————–
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
———————————————————————————————————–
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 1 |00:00:00.01 | 3 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 3 |
|* 2 | INDEX RANGE SCAN | T1_I | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 2 |
———————————————————————————————————–
Predicate Information (identified by operation id):
—————————————————
2 – access("C2"='YES')
Note
—–
– dynamic sampling used for this statement (level=2)
joaquingonzalez
March 13, 2013 at 6:59 am
The filter predicate does affect cardinality when table t1 has statistics, if it doesn’t, Oracle uses dynamic sampling, and the cardinality is correct.
select * from t1 where c2=’YES’
Plan hash value: 838529891
—————————————————————–
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)|
—————————————————————–
| 0 | SELECT STATEMENT | | 1 | | 16 (100)|
|* 1 | TABLE ACCESS FULL| T1 | 1 | 1 | 16 (0)|
—————————————————————–
Predicate Information (identified by operation id):
—————————————————
1 – filter(NVL(“C2”,’NO’)=’YES’)
Note
—–
– dynamic sampling used for this statement (level=2)
joaquingonzalez
March 13, 2013 at 7:10 am
Joaquin,
The real case shows a frequency histogram for this column. This histogram shows only one bucket and it is for value ‘N’. Column stats show this column has two distinct values with low/high of ‘N’ and ‘Y’. Due to fix for bug 10174050 selectivity for value ‘Y’ (which is missing on histogram) is computed as 0.5 of smallest bucket (with value ‘N’ and selectivity of 1.0). With such a large selectivity CBO goes with FTS and we see NVL predicate. If we had had a more accurate selectivity (close to zero), we would had used the index on this column and the predicate from SQL would not had had the NVL on it. Original case had an index on this column. Original table had 3M rows and cardinality was computed as 1.5M. There were only 2 rows with value ‘Y’. DBMS_STATS auto sample size missed value ‘Y’ on histogram (bug 10174050).
So, the suboptimal plan was due to corner case where missing value from histogram made CBO over estimate cardinality and decide on a FTS. Presence of NVL on predicate was then a side effect of CBO selecting a FTS and not the other way around.
Cheers — Carlos
Carlos Sierra
March 13, 2013 at 7:55 am
Thanks for the explanation.
Cheers – Joaquin
joaquingonzalez
March 13, 2013 at 8:46 am
I think that Joaquim’s suggestion is correct. That is, this DDL optimization is not affecting the use of an index created on the column C2. When the CBO sees that the cost of using the index on C2 is better that FULL SCANNING t1 table it will use that index even thought a NVL filter is applied on the indexed C2 column. And vice versa when the cost of FULL SCANNING the table is less than the cost of using an index range scan.
See example below:
hourim
March 13, 2013 at 9:05 am
First bug reference should had been 5483301 and not 10174050 .
Carlos Sierra
March 14, 2013 at 7:04 am
Twist. When FTS then NVL. Index does not need NVL since it contains values for column:
Carlos Sierra
March 13, 2013 at 11:55 am
[…] Joaquín González pointed out to me that Carlos Sierra has blogged about this too . It was his presentation where the topic came up. […]
mandatory columns | Oracle MVA
March 14, 2013 at 4:22 am
[…] Hier noch der Link zu Carlos Blog: Interesting case where a full table scan is chosen instead of an index […]
Lösung: Eine unerwartete Bedingung in der Where Clausel | Oracle Rätsel Blog
May 30, 2016 at 3:35 am
[…] Here the link to Carlos’s Blog: Interesting case where a full table scan is chosen instead of an index […]
SOLUTION: A Strange Condition in the Where Clause | Oracle Riddle Blog
May 30, 2016 at 4:11 am