Counting rows fast
A friend of mine asked me last night basically this: “How is that SQLTXPLAIN counts rows?”. In particular, he was referring to the use of the SAMPLE clause of the SELECT statement. Look at this SQLT’s log piece:
SQL_ID a9x1kc4ymyhkz -------------------- SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e4 FROM "XYPZ"."INSTRUMENT" SAMPLE (.01) t SQL_ID 025v6k1032t69 -------------------- SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e5 FROM "XYPZ"."POSITION_COMPOSITION" SAMPLE (.001) t SQL_ID 8rby3340xpd9k -------------------- SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e5 FROM "XYPZ"."POSITION_EVENT" SAMPLE (.001) t
WHY is it that SQLT has to count rows?
SQLT has to count rows so it can report side by side DBA_TABLES.NUM_ROWS and COUNT(*) from each Table. This is an easy way to see if your statistics are way off, and this mechanism exists on SQLT well before DBA_TAB_MODIFICATIONS came to existence. Actually, SQLT uses both methods to health-check how stale are your Table statistics.
The conundrum here is: “I use SQLT because I want to diagnose a performance issue on a QUERY on top of large Tables, but I do not want SQLT to take a long time just to produce a COUNT(*) of my Tables…”.
Fast versus Precise
In Performance tuning, there is always a trade-off. You want X but you sacrifice B. Counting rows is no different. Do you want it faster? Then you sacrifice precision. The SAMPLE clause of the SELECT statement allows you to do exactly that (syntax below):
SAMPLE [ BLOCK ] ( sample_percent ) [ SEED ( seed_value ) ]
So, if you specify a 10% sample size then you have to multiply the COUNT(*) by 10. If you sample 1% you multiply the COUNT(*) by 100. In large Tables if you sample, lets say 0.1%, your multiplier becomes 1,000, which is the same than 1e3 (10**3 or 10^3 depending where you went to school). Sample size can be as small as 0.000,001 and as large as 100 (but without including 100 itself). It represents probabilities more than an actual sample size.
The optional BLOCK clause simple says: use sample blocks instead of rows. And the optional SEED clause tries to provide some consistency in the result of the count when you use the same value for two executions of the exact same count. This SEED clause takes a value between 0 and 4,294,967,295.
How SQLT counts rows?
SQLT has over 40 tool parameters. One of them is count_star_threshold with a seeded value of 10,000.
SQLT includes a small algorithm (below) that determines the size of the SAMPLE according to the estimated size of the Table itself, by looking at its statistics as per DBA_TABLES.NUM_ROWS. No statistics? then skip the sample and do a normal full scan. If the Table is expected to be smaller then the count_star_threshold, then do a full scan. So is up to 10x this threshold. After that, use a sample size proportionally inverse to the Table size. The bigger the Table the smaller the Sample.
SQLT also forces a full Table scan and invokes Parallel Execution (PX) as a method to expedite the count. This count can be really fast on Exadata systems as you can imagine.
/* ------------------------- * * private perform_count_star * * called by: sqlt$i.common_calls and sqlt$i.remote_xtract * * ------------------------- */ PROCEDURE perform_count_star (p_statement_id IN NUMBER) IS l_sql VARCHAR2(32767); l_number NUMBER; l_count NUMBER; BEGIN write_log('=> perform_count_star'); IF sqlt$a.get_param_n('count_star_threshold') = 0 THEN write_log('skip "count_star" as per corresponding parameter'); ELSE FOR i IN (SELECT owner, table_name, num_rows, source FROM &&tool_administer_schema..sqlt$_dba_all_tables_v WHERE statement_id = p_statement_id ORDER BY owner, table_name) LOOP IF i.num_rows IS NULL THEN l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) FROM "'||i.owner||'"."'||i.table_name||'" t WHERE ROWNUM <= :number'; l_number := sqlt$a.get_param_n('count_star_threshold'); ELSIF i.num_rows < sqlt$a.get_param_n('count_star_threshold') THEN l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) FROM "'||i.owner||'"."'||i.table_name||'" t WHERE ROWNUM <= :number'; l_number := sqlt$a.get_param_n('count_star_threshold') * 10; ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e1) THEN l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e1 FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t'; l_number := 1e1; ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e2) THEN l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e2 FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t'; l_number := 1e0; ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e3) THEN l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e3 FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t'; l_number := 1/1e1; ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e4) THEN l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e4 FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t'; l_number := 1/1e2; ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e5) THEN l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e5 FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t'; l_number := 1/1e3; ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e6) THEN l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e6 FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t'; l_number := 1/1e4; ELSIF i.num_rows < (sqlt$a.get_param_n('count_star_threshold') * 1e7) THEN l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e7 FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t'; l_number := 1/1e5; ELSE l_sql := 'SELECT /*+ FULL(t) PARALLEL */ COUNT(*) * 1e8 FROM "'||i.owner||'"."'||i.table_name||'" SAMPLE (:number) t'; l_number := 1/1e6; END IF; l_sql := REPLACE(l_sql, ':number', l_number); write_log('num_rows='||i.num_rows||' sql='||l_sql); l_count := NULL; BEGIN EXECUTE IMMEDIATE l_sql INTO l_count; write_log(l_count||' rows counted'); EXCEPTION WHEN OTHERS THEN write_log('** '||SQLERRM); write_log(l_sql||' failed with error above. Process continues.'); END; IF l_count IS NOT NULL THEN IF i.source = 'DBA_TABLES' THEN UPDATE &&tool_repository_schema..sqlt$_dba_tables SET count_star = l_count WHERE statement_id = p_statement_id AND owner = i.owner AND table_name = i.table_name; ELSIF i.source = 'DBA_OBJECT_TABLES' THEN UPDATE &&tool_repository_schema..sqlt$_dba_object_tables SET count_star = l_count WHERE statement_id = p_statement_id AND owner = i.owner AND table_name = i.table_name; END IF; END IF; END LOOP; COMMIT; END IF; write_log('<= perform_count_star'); END perform_count_star;
Conclusion
Counting rows is like counting beans, you can count one at a time, or you can take some shortcuts. If you are willing to sacrifice some precision for the sake of gaining performance, consider then using the SAMPLE clause of the SELECT statement.
Very Nice Thoughts. Thanks For Sharing with us. I got More information about Java from Besant Technologies. If anyone wants to get Oracle Training in Chennai visit Besant Technologies.
Divya
January 25, 2014 at 3:43 am