Archive for the ‘General’ Category
eDB360 provides a lot of insight about an Oracle database. It executes thousands of SQL statements querying GV$ and DBA views from the Oracle’s data dictionary. It was designed to impose very little load on the system where it executes, thus it consumes only one session and avoids parallel execution. On a system where the state of historical metrics is “normal”, eDB360 may take one or two hours to execute. In the other hand, when the volume of historical metrics is abnormally “large”, then eDB360 may execute for many hours up to default threshold of 24, then quit. No worries here, it can be restarted pretty much where it left…
If you are considering executing eDB360 on one of your databases, before you do please execute first a light-weight “pre-check”, which is basically a small script that reports on the state of Active Session History (ASH) stored into the Automatic Workload Repository (AWR). Bare in mind that ASH and AWR are part of the Oracle Diagnostics Pack, so consider such “pre-check” only if you have a license for such Oracle Pack for the target database.
AWR ASH Pre-Check
You can execute stand-alone the awr_ash_pre_check.sql script available as free software in GitHub within edb360-master or cscripts-master. If downloading eDB360, you can find awr_ash_pre_check.sql under the edb360-master/sql directory, and if you download the cscripts set, then you can find awr_ash_pre_check.sql under cscripts-master/sql.
This pre-check script reports on the following potential concerns:
- Retention period of AHR, frequency of Snapshots, AWR Baselines, and similar.
- Age of CBO Statistics on the main table behind DBA_HIST_ACTIVE_SESS_HISTORY (WRH$_ACTIVE_SESSION_HISTORY) and its partitions.
- State of CBO Statistics for WRH$_ACTIVE_SESSION_HISTORY segments: Are the statistics Locked? Are they Stale?
- Size of WRH$_ACTIVE_SESSION_HISTORY segments, with range of Snapshots per segment (partition). Are they outside the desired retention window?
- Creation date and last DDL date for WRH$_ACTIVE_SESSION_HISTORY segments. So expected partition splitting can be verified.
Take for example result below. It shows that current ASH repository contains 320.6 days of history, even when in this case only 30 were requested (displayed on an upper part of report not shown here). We also see ASH is 16.4 GBs in size, where normally 1-2 GBs are plenty. We can conclude last partition split (or simply last DDL) is 289.7 days old, where normally only a few days-old are expected. This pre-check sample results on a ballpark estimate of 127 hours to execute eDB360. In other words, if we had one month of history (as current AWR configuration requests) instead of 320.6 days of history, then ASH repository would be less than 10% its current size and eDB360 would be estimated to execute in about 13 hours instead of 127. Keep in mind this “pre-check” provides a ballpark estimate, so an actual execution of eDB360 would take more or less the estimated time.
What to do if eDB36o is estimated to run for several days like in sample above? What I recommend is to fix ASH first. This process requires to read and digest at least 3 My Oracle Support (MOS) notes below, but it is not as bad as it sounds. You want to follow this path anyways, so any diagnostics tool that relies on ASH from AWR would benefit of the much needed cleanup.
- WRH$_ACTIVE_SESSION_HISTORY Does Not Get Purged Based Upon the Retention Policy (Doc ID 387914.1)
- Bug 14084247 – ORA-1555 or ORA-12571 Failed AWR purge can lead to continued SYSAUX space use (Doc ID 14084247.8)
- Manually Purge the Optimizer Statistics and AWR Snaphots to Reduce Space Usage of SYSAUX Tablespace (Doc ID 1965061.1)
Diagnosing eDB360 taking long
Let’s say you executed an older version of eDB360 and it took long (newer versions starting on v1620 perform an automatic pre-check, and if execution is estimated to last more than 8 hours, then it pauses and ask for confirmation before executing). If you executed an older version of eDB360, or simply your execution took longer than expected, review next the following files included on the directory from where you executed eDB360 (edb360-master), or stored within the edb360_*.zip file generated by eDB360 as it executed.
If you find one of the SQL statements from eDB360 is taking more than a few minutes to execute, suspect first outdated CBO statistics on the Data Dictionary objects behind such query. Or if volume of data accessed by such query from eDB360 is large, suspect AWR data not getting purged. In either case take action accordingly.
If you cannot make sense of the information provided by these diagnostics files above, then contact eDB360 author at email@example.com.
Keep in mind that execution time for eDB360 really depends on the state of several metrics stored in AWR, where ASH is the most relevant piece but not the only one. Think outdated statistics and outdated metrics!
New tool sqld360 is now available! Mauro Pagano released this cool new tool a few hours ago. And yes, it is free for all!
So, what is sqld360? Well, it is an install-nothing free software that tells you a lot about one SQL statement. And if your site has an Oracle Tuning Pack or Diagnostics Pack license, then you get a lot more from sqld360. Sample snapshot below shows you the dynamic menu for a simple SQL. This sqld360 new tool uses similar techniques than edb360, so it displays information as html, text, csv and in some cases it uses some cool Google charts.
The obvious question is why would I use this sqld360 instead of SQLT or SQLHC? The answer is: you can use them all. What makes sqld360 different to SQLT is that sqld360 installs nothing on the database. And what makes sqld360 different to SQLHC is that sqld360 is available as free software on a GitHub repository, so you don’t have to have a MOS account available. Any Oracle user can download and use sqld360 starting today!
In terms of content, I can say that SQLHC gives you some Observations and sqld360 does not (yet). Besides that, I think sqld360 is superior to SQLHC simply because it is more mature and developed from scratch using newer techniques.
SQLT is a different animal. It provides tons of functionality developed over a decade. This sqld360 is on its first release, but it will certainly grow over time but only in the right directions. From the beginning, sqld360 focuses on what is important for tuners and for those in need to diagnose a SQL misbehaving. So simply put, sqld360 is faster and better focused than SQLT and SQLHC. We can call it “the next generation”.
Why free? The answer is: Why not? When Mauro and I started discussing sqld360, we decided to continue developing cool scripts and tools on our own personal time, on our own hardware, and without taking anything from anyone. So this is kind of our contribution to our Oracle community. I have provided edb360 and Mauro is providing sqld360. Mauro and myself would love to blog more often, but if we have to decide where to put our personal time, we both are more inclined to invest on free tools (after of course our family duties).
It is hard to appreciate with a sample execution from one of our systems, but if you look at chart below you may get to see how a SQL of interest compares to the system load. This kind of chart is helpful when you are trying to understand how a particular SQL affects a given load for a certain time window. Or when you need to documents your findings to business leaders.
I could continue telling more about sqld360, but the best way to appreciate it is by actually trying it on one of your own SQL statements. You can go to Mauro’s blog post about sqld360 and download this tool following instructions there. Then give it a shot, after all you have nothing to loose.
Forgot to mention this: sqld360 as well as edb360 work through a SQL*Plus connection, either on the database server or on a PC client. This is important since quite often we road-warriors and even in-house developers may not have access to OEM or some other persistently-connected tools. Then, with sqld360 and edb360, using a SQL*Plus connection you can extract enough metadata to analyze and comprehend what is happening on the entire database or around one particular SQL, all without having to be persistently-connected to the database. Yes, you can do all your analysis remotely from home! yay!
And, by the way, as of today edb360 calls sqld360 on SQL of interest. So if you are already using edb360 you may want to download version v1504 dated February 15, 2015. This edb360 v1504 contains sqld360.
I hope you enjoy both tools. Life is Good!
As I help a peer with a SQL Tuning engagement, I face the frequent case of: “We do not want to gather CBO schema object statistics because we don’t want our Execution Plans to change”. Well, the bad news is that: not gathering stats only gives you a false sense of safety because your Execution Plans can change anyways. The reason has to do with Predicates referencing values out of range. Typical cases include range of dates, or columns seeded with values out of a sequence (surrogate keys). Most applications use them both. Example: predicate that references last X days of data. Imagine that date column on this predicate actually has statistics with low and high value that are outdated, lets say the high value refers to last time we gather stats (several months old). In such cases, the CBO uses some heuristics starting on 10g, where the cardinality of the Predicate is computed according to range of low/high and how far the value on Predicate is from this low/high range as per the stats. In short, the cardinality changes over time, as the Predicate on the last X days of data changes from one day to the next, and the next, and so on. At some point, the CBO may decide for a different Plan (with lower cost) and the performance of such SQL may indeed change abruptly. Then we scratch our heads and repeat to ourselves: but we did not gather statistics, why did the plan change?
So, if you understand the rationale above, then you would agree with the fact that: not updating CBO schema stats do not offer any real Plan Stability. So, my recommendation is simple: have reasonable CBO statistics and live with the possibility that some Plans will change (they would change anyways, even if you do not gather stats). Keep always in mind this:
The CBO has better chances to produce optimal Plans if we provide reasonable CBO statistics.
Now the good news: if you have some business critical SQL statements and you want them to have stable Plans, then Oracle already provides SQL Plan Management, which is designed exactly for Plan Stability. So, instead of gambling everyday, hoping for your Plans not to change preserving outdated stats, rather face reality, then gather stats, and create SQL Plan Baselines in those few SQL statements that may prove to have an otherwise unstable Plan and are indeed critical for your business. On 10g you can use SQL Profiles instead.
On 10g and 11g, just let the automatic job that gathers CBO schema statistics do its part. In most cases, that is good enough. If you have transient data, for example ETL tasks, then you may want to have the process workflow to gather stats on particular Tables as soon as the data is loaded or transformed and before it is consumed. The trick is simple: “have the stats represent the data at all times”. At the same time, there is no need to over do the stats, just care when the change on the data is sensible.
I upgraded my database a couple of weeks ago and now my users complain their application is slower. They do not provide specifics but they “feel” it is running slower. Sounds familiar?
Every once in a while I get a request that goes like this: “how can I find if some SQL on my database is performing worse over time?”
It is very hard to deal with the ambiguities of some problems like “finding SQL that performs worse or better over time”. But if you simplify the problem and consider for example “Elapsed Time per Execution”, then you can easily produce a script like the one below, which returns a small list of SQL statements that seem to experience either a regression or an improvement over time. It uses linear regression on the ratio between “Elapsed Time per Execution” and its Median per SQL.
Then, If you are suspecting you have some SQL that may have regressed and need a hand to identify them, you can try this script below. It is now part of a small collection of scripts that you can download and use for free out of the cscripts link on the right hand side of this page, under “Downloads”.
---------------------------------------------------------------------------------------- -- -- File name: sql_performance_changed.sql -- -- Purpose: Lists SQL Statements with Elapsed Time per Execution changing over time -- -- Author: Carlos Sierra -- -- Version: 2014/10/31 -- -- Usage: Lists statements that have changed their elapsed time per execution over -- some history. -- Uses the ration between "elapsed time per execution" and the median of -- this metric for SQL statements within the sampled history, and using -- linear regression identifies those that have changed the most. In other -- words where the slope of the linear regression is larger. Positive slopes -- are considered "improving" while negative are "regressing". -- -- Example: @sql_performance_changed.sql -- -- Notes: Developed and tested on 220.127.116.11. -- -- Requires an Oracle Diagnostics Pack License since AWR data is accessed. -- -- To further investigate poorly performing SQL use sqltxplain.sql or sqlhc -- (or planx.sql or sqlmon.sql or sqlash.sql). -- --------------------------------------------------------------------------------------- -- SPO sql_performance_changed.txt; DEF days_of_history_accessed = '31'; DEF captured_at_least_x_times = '10'; DEF captured_at_least_x_days_apart = '5'; DEF med_elap_microsecs_threshold = '1e4'; DEF min_slope_threshold = '0.1'; DEF max_num_rows = '20'; SET lin 200 ver OFF; COL row_n FOR A2 HEA '#'; COL med_secs_per_exec HEA 'Median Secs|Per Exec'; COL std_secs_per_exec HEA 'Std Dev Secs|Per Exec'; COL avg_secs_per_exec HEA 'Avg Secs|Per Exec'; COL min_secs_per_exec HEA 'Min Secs|Per Exec'; COL max_secs_per_exec HEA 'Max Secs|Per Exec'; COL plans FOR 9999; COL sql_text_80 FOR A80; PRO SQL Statements with "Elapsed Time per Execution" changing over time WITH per_time AS ( SELECT h.dbid, h.sql_id, SYSDATE - CAST(s.end_interval_time AS DATE) days_ago, SUM(h.elapsed_time_total) / SUM(h.executions_total) time_per_exec FROM dba_hist_sqlstat h, dba_hist_snapshot s WHERE h.executions_total > 0 AND s.snap_id = h.snap_id AND s.dbid = h.dbid AND s.instance_number = h.instance_number AND CAST(s.end_interval_time AS DATE) > SYSDATE - &&days_of_history_accessed. GROUP BY h.dbid, h.sql_id, SYSDATE - CAST(s.end_interval_time AS DATE) ), avg_time AS ( SELECT dbid, sql_id, MEDIAN(time_per_exec) med_time_per_exec, STDDEV(time_per_exec) std_time_per_exec, AVG(time_per_exec) avg_time_per_exec, MIN(time_per_exec) min_time_per_exec, MAX(time_per_exec) max_time_per_exec FROM per_time GROUP BY dbid, sql_id HAVING COUNT(*) >= &&captured_at_least_x_times. AND MAX(days_ago) - MIN(days_ago) >= &&captured_at_least_x_days_apart. AND MEDIAN(time_per_exec) > &&med_elap_microsecs_threshold. ), time_over_median AS ( SELECT h.dbid, h.sql_id, h.days_ago, (h.time_per_exec / a.med_time_per_exec) time_per_exec_over_med, a.med_time_per_exec, a.std_time_per_exec, a.avg_time_per_exec, a.min_time_per_exec, a.max_time_per_exec FROM per_time h, avg_time a WHERE a.sql_id = h.sql_id ), ranked AS ( SELECT RANK () OVER (ORDER BY ABS(REGR_SLOPE(t.time_per_exec_over_med, t.days_ago)) DESC) rank_num, t.dbid, t.sql_id, CASE WHEN REGR_SLOPE(t.time_per_exec_over_med, t.days_ago) > 0 THEN 'IMPROVING' ELSE 'REGRESSING' END change, ROUND(REGR_SLOPE(t.time_per_exec_over_med, t.days_ago), 3) slope, ROUND(AVG(t.med_time_per_exec)/1e6, 3) med_secs_per_exec, ROUND(AVG(t.std_time_per_exec)/1e6, 3) std_secs_per_exec, ROUND(AVG(t.avg_time_per_exec)/1e6, 3) avg_secs_per_exec, ROUND(MIN(t.min_time_per_exec)/1e6, 3) min_secs_per_exec, ROUND(MAX(t.max_time_per_exec)/1e6, 3) max_secs_per_exec FROM time_over_median t GROUP BY t.dbid, t.sql_id HAVING ABS(REGR_SLOPE(t.time_per_exec_over_med, t.days_ago)) > &&min_slope_threshold. ) SELECT LPAD(ROWNUM, 2) row_n, r.sql_id, r.change, TO_CHAR(r.slope, '990.000MI') slope, TO_CHAR(r.med_secs_per_exec, '999,990.000') med_secs_per_exec, TO_CHAR(r.std_secs_per_exec, '999,990.000') std_secs_per_exec, TO_CHAR(r.avg_secs_per_exec, '999,990.000') avg_secs_per_exec, TO_CHAR(r.min_secs_per_exec, '999,990.000') min_secs_per_exec, TO_CHAR(r.max_secs_per_exec, '999,990.000') max_secs_per_exec, (SELECT COUNT(DISTINCT p.plan_hash_value) FROM dba_hist_sql_plan p WHERE p.dbid = r.dbid AND p.sql_id = r.sql_id) plans, REPLACE((SELECT DBMS_LOB.SUBSTR(s.sql_text, 80) FROM dba_hist_sqltext s WHERE s.dbid = r.dbid AND s.sql_id = r.sql_id), CHR(10)) sql_text_80 FROM ranked r WHERE r.rank_num <= &&max_num_rows. ORDER BY r.rank_num / SPO OFF;
Once you get the output of this script above, you can use the one below to actually list the time series for one of the SQL statements of interest:
---------------------------------------------------------------------------------------- -- -- File name: one_sql_time_series.sql -- -- Purpose: Performance History for one SQL -- -- Author: Carlos Sierra -- -- Version: 2014/10/31 -- -- Usage: Script sql_performance_changed.sql lists SQL Statements with performance -- improvement or regressed over some History. -- This script one_sql_time_series.sql lists the Performance Time Series for -- one SQL. -- -- Parameters: SQL_ID -- -- Example: @one_sql_time_series.sql -- -- Notes: Developed and tested on 18.104.22.168. -- -- Requires an Oracle Diagnostics Pack License since AWR data is accessed. -- -- To further investigate poorly performing SQL use sqltxplain.sql or sqlhc -- (or planx.sql or sqlmon.sql or sqlash.sql). -- --------------------------------------------------------------------------------------- -- SPO one_sql_time_series.txt; SET lin 200 ver OFF; COL instance_number FOR 9999 HEA 'Inst'; COL end_time HEA 'End Time'; COL plan_hash_value HEA 'Plan|Hash Value'; COL executions_total FOR 999,999 HEA 'Execs|Total'; COL rows_per_exec HEA 'Rows Per Exec'; COL et_secs_per_exec HEA 'Elap Secs|Per Exec'; COL cpu_secs_per_exec HEA 'CPU Secs|Per Exec'; COL io_secs_per_exec HEA 'IO Secs|Per Exec'; COL cl_secs_per_exec HEA 'Clus Secs|Per Exec'; COL ap_secs_per_exec HEA 'App Secs|Per Exec'; COL cc_secs_per_exec HEA 'Conc Secs|Per Exec'; COL pl_secs_per_exec HEA 'PLSQL Secs|Per Exec'; COL ja_secs_per_exec HEA 'Java Secs|Per Exec'; SELECT h.instance_number, TO_CHAR(CAST(s.end_interval_time AS DATE), 'YYYY-MM-DD HH24:MI') end_time, h.plan_hash_value, h.executions_total, TO_CHAR(ROUND(h.rows_processed_total / h.executions_total), '999,999,999,999') rows_per_exec, TO_CHAR(ROUND(h.elapsed_time_total / h.executions_total / 1e6, 3), '999,990.000') et_secs_per_exec, TO_CHAR(ROUND(h.cpu_time_total / h.executions_total / 1e6, 3), '999,990.000') cpu_secs_per_exec, TO_CHAR(ROUND(h.iowait_total / h.executions_total / 1e6, 3), '999,990.000') io_secs_per_exec, TO_CHAR(ROUND(h.clwait_total / h.executions_total / 1e6, 3), '999,990.000') cl_secs_per_exec, TO_CHAR(ROUND(h.apwait_total / h.executions_total / 1e6, 3), '999,990.000') ap_secs_per_exec, TO_CHAR(ROUND(h.ccwait_total / h.executions_total / 1e6, 3), '999,990.000') cc_secs_per_exec, TO_CHAR(ROUND(h.plsexec_time_total / h.executions_total / 1e6, 3), '999,990.000') pl_secs_per_exec, TO_CHAR(ROUND(h.javexec_time_total / h.executions_total / 1e6, 3), '999,990.000') ja_secs_per_exec FROM dba_hist_sqlstat h, dba_hist_snapshot s WHERE h.sql_id = '&sql_id.' AND h.executions_total > 0 AND s.snap_id = h.snap_id AND s.dbid = h.dbid AND s.instance_number = h.instance_number ORDER BY h.sql_id, h.instance_number, s.end_interval_time, h.plan_hash_value / SPO OFF;
Enkitec’s Sizing and Provisioning (eSP) is a new internal tool designed and developed with Oracle Engineered Systems in mind. Thanks to the experience and insights from Randy Johnson, Karl Arao and Frits Hoogland, what began as a pet project for some of us, over time became an actual robust APEX/PLSQL application, developed by Christoph Ruepprich and myself, and ready to debut at Oracle Open World 2014.
This posting is about eSP, what it does, and how it helps on the sizing and provisioning of Oracle Engineered System, or I would rather say, any System where Oracle runs.
We used to size Engineered Systems using a complex and very useful spread sheet developed by Randy Johnson and Karl Arao. Now, it is the turn for eSP to take the next step, and move this effort forward into a more scalable application that sits on top of one of our Exadata machines.
Sizing an Engineered System
Sizing a System can be quite challenging, especially when the current system is composed of several hosts with multiple databases of diverse use, size, versions, workloads, etc. The new target system may also bring some complexities; as the number of possible configurations grows, finding the right choice becomes harder. Then we also have the challenge of disk redundancy, recovery areas, the potential benefits of offloading with their smart scans, just to mention some added complexities.
At a very high level, Sizing a System is about 3 entities: Resources, Capacity and Utilization. Resources define what I call “demand”, which is basically the set of computational resources from your original System made of one or many databases and instances over some hosts. Capacity, which I also call it “supply”, is the set of possible target Systems with their multiple Configurations, in other words Engineered Systems, or any other hardware capable to host Oracle databases. Utilization, which I may also refer as “allocation” is where the magic and challenge resides. It is a clever and unbiassed mapping between databases and configurations, then between instances and nodes. This mapping has to consider at the very least CPU footprint, Memory for SGA and PGA, database disk space, and throughput in terms of IOPS and MBPS. Additional constraints, as mentioned before, include redundancy and offloading among others. CPU can be a bit tricky since each CPU make and model has its own characteristics, so mapping them requires the use of SPEC.
Other challenge a Sizing tool has to consider is the variability of the Resources. The question becomes: Do we see the Resources as a worst case scenario, or shall we rather consider them as time series? In other words, do we compute and use peaks, or do we observe the use of Resources over time, then develop some methods to aggregate them consistently as time series? If we decide to use a reduced set of data points, do we use peaks or percentiles? if the latter, which percentile is well balanced? 99.9, 99, 95 or maybe 90? How conservative are those values? There are so many questions and the answer for most of them, as you may guess is: “it all depends”.
How eSP Works
Without getting into the technical details, I can say that eSP is an APEX application with a repository on an Oracle database, which inputs collected “Requirements” from the databases to be sized, then it processes these Requirements and prepares them to be “Allocated” into one or more defined hardware configurations. The process is for the most part “automated”, meaning this: we execute some tool or script in the set of hosts where the databases reside, then upload the output of these collectors into eSP and we are ready to Plan and apply “what-if” scenarios. Having an Exadata System as our work engine, it allows this eSP application to scale quite well. A “what-if” scenario takes as long as it takes to navigate APEX pages,while all the computations are done in sub-seconds behind scenes, thanks to Exadata!
Once we load the Resources from the eSP collector script, or from the eAdam tool, we can start playing with the metadata. Since eSP’s set of known Configurations (Capacity) include current Engineered Systems (X4), allocating Configurations is a matter of seconds, then mapping databases and instances becomes the next step. eSP contains an auto “allocate” algorithm for databases and instances, where we can choose between a “balanced” allocation or one that is “dense” with several density factors to choose from (100%, 90%, 80%, 70%, 60% and 50%). With all these automated options, we can try multiple sizing and allocation possibilities in seconds, regardless if we are Sizing and Provisioning for one database or a hundred of them.
eSP and OOW
The Enkitec’s Sizing and Provisioning (eSP) tool is an internal application that we created to help our customers to Size their next System or Systems in a sensible manner. The methods we implemented are transparent and unbiassed. We are bringing eSP to Oracle Open World 2014. I will personally demo eSP at our assigned booth, which is #111 at the Moscone South. I will be on and off the booth, so if you are interested on a demo please let me know, or contact your Enkitec/Accenture representative. We do prefer appointments, but walk-ins are welcomed. Hope to see you at OOW!