I upgraded my database a couple of weeks ago and now my users complain their application is slower. They do not provide specifics but they “feel” it is running slower. Sounds familiar?
Every once in a while I get a request that goes like this: “how can I find if some SQL on my database is performing worse over time?”
It is very hard to deal with the ambiguities of some problems like “finding SQL that performs worse or better over time”. But if you simplify the problem and consider for example “Elapsed Time per Execution”, then you can easily produce a script like the one below, which returns a small list of SQL statements that seem to experience either a regression or an improvement over time. It uses linear regression on the ratio between “Elapsed Time per Execution” and its Median per SQL.
Then, If you are suspecting you have some SQL that may have regressed and need a hand to identify them, you can try this script below. It is now part of a small collection of scripts that you can download and use for free out of the cscripts link on the right hand side of this page, under “Downloads”.
---------------------------------------------------------------------------------------- -- -- File name: sql_performance_changed.sql -- -- Purpose: Lists SQL Statements with Elapsed Time per Execution changing over time -- -- Author: Carlos Sierra -- -- Version: 2014/10/31 -- -- Usage: Lists statements that have changed their elapsed time per execution over -- some history. -- Uses the ration between "elapsed time per execution" and the median of -- this metric for SQL statements within the sampled history, and using -- linear regression identifies those that have changed the most. In other -- words where the slope of the linear regression is larger. Positive slopes -- are considered "improving" while negative are "regressing". -- -- Example: @sql_performance_changed.sql -- -- Notes: Developed and tested on 18.104.22.168. -- -- Requires an Oracle Diagnostics Pack License since AWR data is accessed. -- -- To further investigate poorly performing SQL use sqltxplain.sql or sqlhc -- (or planx.sql or sqlmon.sql or sqlash.sql). -- --------------------------------------------------------------------------------------- -- SPO sql_performance_changed.txt; DEF days_of_history_accessed = '31'; DEF captured_at_least_x_times = '10'; DEF captured_at_least_x_days_apart = '5'; DEF med_elap_microsecs_threshold = '1e4'; DEF min_slope_threshold = '0.1'; DEF max_num_rows = '20'; SET lin 200 ver OFF; COL row_n FOR A2 HEA '#'; COL med_secs_per_exec HEA 'Median Secs|Per Exec'; COL std_secs_per_exec HEA 'Std Dev Secs|Per Exec'; COL avg_secs_per_exec HEA 'Avg Secs|Per Exec'; COL min_secs_per_exec HEA 'Min Secs|Per Exec'; COL max_secs_per_exec HEA 'Max Secs|Per Exec'; COL plans FOR 9999; COL sql_text_80 FOR A80; PRO SQL Statements with "Elapsed Time per Execution" changing over time WITH per_time AS ( SELECT h.dbid, h.sql_id, SYSDATE - CAST(s.end_interval_time AS DATE) days_ago, SUM(h.elapsed_time_total) / SUM(h.executions_total) time_per_exec FROM dba_hist_sqlstat h, dba_hist_snapshot s WHERE h.executions_total > 0 AND s.snap_id = h.snap_id AND s.dbid = h.dbid AND s.instance_number = h.instance_number AND CAST(s.end_interval_time AS DATE) > SYSDATE - &&days_of_history_accessed. GROUP BY h.dbid, h.sql_id, SYSDATE - CAST(s.end_interval_time AS DATE) ), avg_time AS ( SELECT dbid, sql_id, MEDIAN(time_per_exec) med_time_per_exec, STDDEV(time_per_exec) std_time_per_exec, AVG(time_per_exec) avg_time_per_exec, MIN(time_per_exec) min_time_per_exec, MAX(time_per_exec) max_time_per_exec FROM per_time GROUP BY dbid, sql_id HAVING COUNT(*) >= &&captured_at_least_x_times. AND MAX(days_ago) - MIN(days_ago) >= &&captured_at_least_x_days_apart. AND MEDIAN(time_per_exec) > &&med_elap_microsecs_threshold. ), time_over_median AS ( SELECT h.dbid, h.sql_id, h.days_ago, (h.time_per_exec / a.med_time_per_exec) time_per_exec_over_med, a.med_time_per_exec, a.std_time_per_exec, a.avg_time_per_exec, a.min_time_per_exec, a.max_time_per_exec FROM per_time h, avg_time a WHERE a.sql_id = h.sql_id ), ranked AS ( SELECT RANK () OVER (ORDER BY ABS(REGR_SLOPE(t.time_per_exec_over_med, t.days_ago)) DESC) rank_num, t.dbid, t.sql_id, CASE WHEN REGR_SLOPE(t.time_per_exec_over_med, t.days_ago) > 0 THEN 'IMPROVING' ELSE 'REGRESSING' END change, ROUND(REGR_SLOPE(t.time_per_exec_over_med, t.days_ago), 3) slope, ROUND(AVG(t.med_time_per_exec)/1e6, 3) med_secs_per_exec, ROUND(AVG(t.std_time_per_exec)/1e6, 3) std_secs_per_exec, ROUND(AVG(t.avg_time_per_exec)/1e6, 3) avg_secs_per_exec, ROUND(MIN(t.min_time_per_exec)/1e6, 3) min_secs_per_exec, ROUND(MAX(t.max_time_per_exec)/1e6, 3) max_secs_per_exec FROM time_over_median t GROUP BY t.dbid, t.sql_id HAVING ABS(REGR_SLOPE(t.time_per_exec_over_med, t.days_ago)) > &&min_slope_threshold. ) SELECT LPAD(ROWNUM, 2) row_n, r.sql_id, r.change, TO_CHAR(r.slope, '990.000MI') slope, TO_CHAR(r.med_secs_per_exec, '999,990.000') med_secs_per_exec, TO_CHAR(r.std_secs_per_exec, '999,990.000') std_secs_per_exec, TO_CHAR(r.avg_secs_per_exec, '999,990.000') avg_secs_per_exec, TO_CHAR(r.min_secs_per_exec, '999,990.000') min_secs_per_exec, TO_CHAR(r.max_secs_per_exec, '999,990.000') max_secs_per_exec, (SELECT COUNT(DISTINCT p.plan_hash_value) FROM dba_hist_sql_plan p WHERE p.dbid = r.dbid AND p.sql_id = r.sql_id) plans, REPLACE((SELECT DBMS_LOB.SUBSTR(s.sql_text, 80) FROM dba_hist_sqltext s WHERE s.dbid = r.dbid AND s.sql_id = r.sql_id), CHR(10)) sql_text_80 FROM ranked r WHERE r.rank_num <= &&max_num_rows. ORDER BY r.rank_num / SPO OFF;
Once you get the output of this script above, you can use the one below to actually list the time series for one of the SQL statements of interest:
---------------------------------------------------------------------------------------- -- -- File name: one_sql_time_series.sql -- -- Purpose: Performance History for one SQL -- -- Author: Carlos Sierra -- -- Version: 2014/10/31 -- -- Usage: Script sql_performance_changed.sql lists SQL Statements with performance -- improvement or regressed over some History. -- This script one_sql_time_series.sql lists the Performance Time Series for -- one SQL. -- -- Parameters: SQL_ID -- -- Example: @one_sql_time_series.sql -- -- Notes: Developed and tested on 22.214.171.124. -- -- Requires an Oracle Diagnostics Pack License since AWR data is accessed. -- -- To further investigate poorly performing SQL use sqltxplain.sql or sqlhc -- (or planx.sql or sqlmon.sql or sqlash.sql). -- --------------------------------------------------------------------------------------- -- SPO one_sql_time_series.txt; SET lin 200 ver OFF; COL instance_number FOR 9999 HEA 'Inst'; COL end_time HEA 'End Time'; COL plan_hash_value HEA 'Plan|Hash Value'; COL executions_total FOR 999,999 HEA 'Execs|Total'; COL rows_per_exec HEA 'Rows Per Exec'; COL et_secs_per_exec HEA 'Elap Secs|Per Exec'; COL cpu_secs_per_exec HEA 'CPU Secs|Per Exec'; COL io_secs_per_exec HEA 'IO Secs|Per Exec'; COL cl_secs_per_exec HEA 'Clus Secs|Per Exec'; COL ap_secs_per_exec HEA 'App Secs|Per Exec'; COL cc_secs_per_exec HEA 'Conc Secs|Per Exec'; COL pl_secs_per_exec HEA 'PLSQL Secs|Per Exec'; COL ja_secs_per_exec HEA 'Java Secs|Per Exec'; SELECT h.instance_number, TO_CHAR(CAST(s.end_interval_time AS DATE), 'YYYY-MM-DD HH24:MI') end_time, h.plan_hash_value, h.executions_total, TO_CHAR(ROUND(h.rows_processed_total / h.executions_total), '999,999,999,999') rows_per_exec, TO_CHAR(ROUND(h.elapsed_time_total / h.executions_total / 1e6, 3), '999,990.000') et_secs_per_exec, TO_CHAR(ROUND(h.cpu_time_total / h.executions_total / 1e6, 3), '999,990.000') cpu_secs_per_exec, TO_CHAR(ROUND(h.iowait_total / h.executions_total / 1e6, 3), '999,990.000') io_secs_per_exec, TO_CHAR(ROUND(h.clwait_total / h.executions_total / 1e6, 3), '999,990.000') cl_secs_per_exec, TO_CHAR(ROUND(h.apwait_total / h.executions_total / 1e6, 3), '999,990.000') ap_secs_per_exec, TO_CHAR(ROUND(h.ccwait_total / h.executions_total / 1e6, 3), '999,990.000') cc_secs_per_exec, TO_CHAR(ROUND(h.plsexec_time_total / h.executions_total / 1e6, 3), '999,990.000') pl_secs_per_exec, TO_CHAR(ROUND(h.javexec_time_total / h.executions_total / 1e6, 3), '999,990.000') ja_secs_per_exec FROM dba_hist_sqlstat h, dba_hist_snapshot s WHERE h.sql_id = '&sql_id.' AND h.executions_total > 0 AND s.snap_id = h.snap_id AND s.dbid = h.dbid AND s.instance_number = h.instance_number ORDER BY h.sql_id, h.instance_number, s.end_interval_time, h.plan_hash_value / SPO OFF;
Some of you have asked if the “Introducing the eDB360 Tool” session at the Oaktable World 2014 was recorder. Actually it was, and thanks to Kyle Hailey it is now available as well as the slides. Just go to the agenda of this event and click on corresponding link. There you will also find video and/or slides for all other sessions.
Thanks Kyle for making this possible!
The East Cost Oracle Users Group Conference 2014 is next week. To me, ECO is quite special. I am speaking there for my 3rd time!
The ECO group is kind of new (about 4 years old), and it is the clustering of several regional user groups including the Virginia Oracle Users Group, the Eastern States Oracle Applications Users Group, the Hampton Roads Oracle Users Group, and the Southeastern Oracle Users Group.
What I like about ECO is its size: small enough to remain cozy, and large enough to be a good opportunity to have one-on-one conversations with colleagues and friends.
This time at ECO 14, I will be delivering a session on “How a Developer Can Troubleshoot a SQL Performing Poorly on a Production DB” on Tuesday, November 4 at 3:15 PM (see agenda here). I will also co-deliver a 4-hours pre-conference workshop on “Oracle Performance Tuning” on Monday, November 3 at 1:00 PM. I will do this with Mauro Pagano, who is now a regular speaker and he is becoming a blogger. You can read his brand new blog at mauro-pagano.com.
Looking forward to meet old friends at ECO 14, and to make new ones!
Every once in a while it comes to my attention that edb360 takes several hours to run. What can be done? My advice is to let it run for several hours if possible. In most environment it completes in less that 1 hour, but I have seen cases where it may take 5 or 6. The reason is simple: too many SQL statements to execute. And some of those queries are executed on top of large historical sets. The good news is that edb360, as it executes each script, it compresses the output and catalogues it inside the main output ZIP file. So, even if you have to stop edb360 after hours of execution, the output is useful. On top of that, the least relevant collection happens at the end, so within the first hour or so you most probably have the essence of your system. Then, if you find yourself in a situation where edb360 has been in execution for several hours and you decide to kill it, please still use the output ZIP file. Also, within that file there are a couple of logs that can help to determine where exactly it got “stuck” (meaning which query is taking longer in your system). Since we don’t know in advance if edb360 will take more than 1hr to run, the best time to start its execution is at the end of a normal work day, or during the weekend.
How a Developer Can Troubleshoot a SQL Performing Poorly on a Production DB (Nov 4, 2014)
Oracle Performance Tools of the Trade (Nov 4, 2014)
Oracle Performance Tuning Fundamentals (Nov 4, 2014)
Introducing the eDB360 Tool (Sep 30, 2014)
SQLT XPLORE: The SQLT XPLAIN hidden child (Jun 21, 2014)
SQL Tuning Tools of the Trade (Jun 21, 2014)
SQL Tuning made easier with SQLTXPLAIN (SQLT) (Jun 21, 2014)
Using SQL Plan Management (SPM) to balance Plan Flexibility and Plan Stability (Jun 21, 2014)
Understanding How is that Adaptive Cursor Sharing (ACS) produces multiple Optimal Plans (Jun 21, 2014)
Enkitec’s Sizing and Provisioning (eSP) is a new internal tool designed and developed with Oracle Engineered Systems in mind. Thanks to the experience and insights from Randy Johnson, Karl Arao and Frits Hoogland, what began as a pet project for some of us, over time became an actual robust APEX/PLSQL application, developed by Christoph Ruepprich and myself, and ready to debut at Oracle Open World 2014.
This posting is about eSP, what it does, and how it helps on the sizing and provisioning of Oracle Engineered System, or I would rather say, any System where Oracle runs.
We used to size Engineered Systems using a complex and very useful spread sheet developed by Randy Johnson and Karl Arao. Now, it is the turn for eSP to take the next step, and move this effort forward into a more scalable application that sits on top of one of our Exadata machines.
Sizing an Engineered System
Sizing a System can be quite challenging, especially when the current system is composed of several hosts with multiple databases of diverse use, size, versions, workloads, etc. The new target system may also bring some complexities; as the number of possible configurations grows, finding the right choice becomes harder. Then we also have the challenge of disk redundancy, recovery areas, the potential benefits of offloading with their smart scans, just to mention some added complexities.
At a very high level, Sizing a System is about 3 entities: Resources, Capacity and Utilization. Resources define what I call “demand”, which is basically the set of computational resources from your original System made of one or many databases and instances over some hosts. Capacity, which I also call it “supply”, is the set of possible target Systems with their multiple Configurations, in other words Engineered Systems, or any other hardware capable to host Oracle databases. Utilization, which I may also refer as “allocation” is where the magic and challenge resides. It is a clever and unbiassed mapping between databases and configurations, then between instances and nodes. This mapping has to consider at the very least CPU footprint, Memory for SGA and PGA, database disk space, and throughput in terms of IOPS and MBPS. Additional constraints, as mentioned before, include redundancy and offloading among others. CPU can be a bit tricky since each CPU make and model has its own characteristics, so mapping them requires the use of SPEC.
Other challenge a Sizing tool has to consider is the variability of the Resources. The question becomes: Do we see the Resources as a worst case scenario, or shall we rather consider them as time series? In other words, do we compute and use peaks, or do we observe the use of Resources over time, then develop some methods to aggregate them consistently as time series? If we decide to use a reduced set of data points, do we use peaks or percentiles? if the latter, which percentile is well balanced? 99.9, 99, 95 or maybe 90? How conservative are those values? There are so many questions and the answer for most of them, as you may guess is: “it all depends”.
How eSP Works
Without getting into the technical details, I can say that eSP is an APEX application with a repository on an Oracle database, which inputs collected “Requirements” from the databases to be sized, then it processes these Requirements and prepares them to be “Allocated” into one or more defined hardware configurations. The process is for the most part “automated”, meaning this: we execute some tool or script in the set of hosts where the databases reside, then upload the output of these collectors into eSP and we are ready to Plan and apply “what-if” scenarios. Having an Exadata System as our work engine, it allows this eSP application to scale quite well. A “what-if” scenario takes as long as it takes to navigate APEX pages,while all the computations are done in sub-seconds behind scenes, thanks to Exadata!
Once we load the Resources from the eSP collector script, or from the eAdam tool, we can start playing with the metadata. Since eSP’s set of known Configurations (Capacity) include current Engineered Systems (X4), allocating Configurations is a matter of seconds, then mapping databases and instances becomes the next step. eSP contains an auto “allocate” algorithm for databases and instances, where we can choose between a “balanced” allocation or one that is “dense” with several density factors to choose from (100%, 90%, 80%, 70%, 60% and 50%). With all these automated options, we can try multiple sizing and allocation possibilities in seconds, regardless if we are Sizing and Provisioning for one database or a hundred of them.
eSP and OOW
The Enkitec’s Sizing and Provisioning (eSP) tool is an internal application that we created to help our customers to Size their next System or Systems in a sensible manner. The methods we implemented are transparent and unbiassed. We are bringing eSP to Oracle Open World 2014. I will personally demo eSP at our assigned booth, which is #111 at the Moscone South. I will be on and off the booth, so if you are interested on a demo please let me know, or contact your Enkitec/Accenture representative. We do prefer appointments, but walk-ins are welcomed. Hope to see you at OOW!
Oracle Application Express (APEX) is a great tool to rapidly develop applications on top of an Oracle database. While developing an internal application we noticed that some pages were slow, meaning taking a few seconds to refresh. Suspecting there was some poorly performing SQL behind those pages, we tried to generate a SQL Trace so we could review the generated SQL. Well, there is no out-of-the-box instrumentation to turn SQL Trace ON from an APEX page… Thus our challenge became: How can we identify suspected SQL performing poorly, when such SQL is generated by an APEX page?
Active Session History (ASH) requires an Oracle Diagnostics Pack License. If your site has such a License, and you need to identify poorly performing SQL generated by APEX, you may want to use find_apex.sql script below. It asks for an application user and for the APEX session (a list is provided in both cases). It outputs a list of poorly performing SQL indicating the APEX page of origin, the SQL_ID and the SQL text. With the SQL_ID you can use some other tool in order to gather additional diagnostics details, including the Execution Plan. You may want to use for that: planx.sql, sqlmon.sql or sqlash.sql. Note that find_apex.sql script also references sqld360.sql, but this new tool is not yet available, so use one of the other 3 suggestions for the time being (or SQLHC/SQLT).
To find poorly performing SQL, script find_apex.sql uses ASH instead of SQL Trace. If the action on a page takes more than a second, then most probably ASH will capture the poorly performing SQL delaying the page.
---------------------------------------------------------------------------------------- -- -- File name: find_apex.sql -- -- Purpose: Finds APEX poorly performing SQL for a given application user and session -- -- Author: Carlos Sierra -- -- Version: 2014/09/03 -- -- Usage: Inputs APEX application user and session id, and outputs list of poorly -- performing SQL statements for further investigation with other tools. -- -- Example: @find_apex.sql -- -- Notes: Developed and tested on 126.96.36.199. -- -- Requires an Oracle Diagnostics Pack License since ASH data is accessed. -- -- To further investigate poorly performing SQL use sqld360.sql -- (or planx.sql or sqlmon.sql or sqlash.sql). -- --------------------------------------------------------------------------------------- -- WHENEVER SQLERROR EXIT SQL.SQLCODE; ACC confirm_license PROMPT 'Confirm with "Y" that your site has an Oracle Diagnostics Pack License: ' BEGIN IF NOT '&&confirm_license.' = 'Y' THEN RAISE_APPLICATION_ERROR(-20000, 'You must have an Oracle Diagnostics Pack License in order to use this script.'); END IF; END; / WHENEVER SQLERROR CONTINUE; -- COL seconds FOR 999,990; COL appl_user FOR A30; COL min_sample_time FOR A25; COL max_sample_time FOR A25; COL apex_session_id FOR A25; COL page FOR A4; COL sql_text FOR A80; -- SELECT COUNT(*) seconds, SUBSTR(client_id, 1, INSTR(client_id, ':') - 1) appl_user, MIN(sample_time) min_sample_time, MAX(sample_time) max_sample_time FROM gv$active_session_history WHERE module LIKE '%/APEX:APP %' GROUP BY SUBSTR(client_id, 1, INSTR(client_id, ':') - 1) HAVING SUBSTR(client_id, 1, INSTR(client_id, ':') - 1) IS NOT NULL ORDER BY 1 DESC, 2 / -- ACC appl_user PROMPT 'Enter application user: '; -- SELECT MIN(sample_time) min_sample_time, MAX(sample_time) max_sample_time, SUBSTR(client_id, INSTR(client_id, ':') + 1) apex_session_id, COUNT(*) seconds FROM gv$active_session_history WHERE module LIKE '%/APEX:APP %' AND SUBSTR(client_id, 1, INSTR(client_id, ':') - 1) = TRIM('&&appl_user.') GROUP BY SUBSTR(client_id, INSTR(client_id, ':') + 1) ORDER BY 1 DESC / -- ACC apex_session_id PROMPT 'Enter APEX session ID: '; -- SELECT COUNT(*) seconds, SUBSTR(h.module, INSTR(h.module, ':', 1, 2) + 1) page, h.sql_id, SUBSTR(s.sql_text, 1, 80) sql_text FROM gv$active_session_history h, gv$sql s WHERE h.module LIKE '%/APEX:APP %' AND SUBSTR(h.client_id, 1, INSTR(h.client_id, ':') - 1) = TRIM('&&appl_user.') AND SUBSTR(h.client_id, INSTR(h.client_id, ':') + 1) = TRIM('&&apex_session_id.') AND s.sql_id = h.sql_id AND s.inst_id = h.inst_id AND s.child_number = h.sql_child_number GROUP BY SUBSTR(h.module, INSTR(h.module, ':', 1, 2) + 1), h.sql_id, SUBSTR(s.sql_text, 1, 80) ORDER BY 1 DESC, 2, 3 / -- PRO Use sqld360.sql (or planx.sql or sqlmon.sql or sqlash.sql) on SQL_ID of interest
This script as well as some others are now available on GitHub.