bstract
This document describes how you can monitor an Oracle database server such as:
- Connection time
- A recent restart
- The number of connections
- Cache hit
- Dead locks
- etc ...
Introduction
Steps
There are some steps you’ll need to follow in order to monitor a new database machine. They are:
- Setup the oracle user account
- Update your server host definition for oracle monitoring
What’s Already Done For You
To make your life a bit easier, a few configuration tasks have already been done for you:
Installation of Oracle Instant Client
Code Block oracle-instantclient11.2-sqlplus-11.2.0.4.0-1.x86_64 oracle-lib-compat-11.2.0.12-1.el6.x86_64 oracle-instantclient-sqlplus-selinux-11.2.0.2-1.el6.noarch perl-DBD-Oracle-1.62-3.el6.x86_64 oracle-nofcontext-selinux-0.1.23.36-1.el6.noarch nagios-plugins-oracle-1.4.16-10.el6.x86_64 oracle-instantclient11.2-basic-11.2.0.4.0-1.x86_64 oracle-instantclient-selinux-11.2.0.2-1.el6.noarch
- Installation of check_oracle_health plugin
- /var/lib/shinken/libexec/check_oracle_health
| Tip |
|---|
We suppose here that the Oracle machine you want to monitor is named srv-lin-1 and is a linux. Please change the above lines and commands with the real name of your server of course. |
Setup the oracle user account
Create the database user
| Tip |
|---|
You will need to configure the user for all your oracle databases. |
Connect to your database as sysadmin on the oracle server :
| Code Block |
|---|
srv-lin-1:oracle# sqlplus "/ as sysdba" |
And then create your shinken account on the database:
| Code Block |
|---|
CREATE USER shinken IDENTIFIED BY shinkenpassword;GRANT CREATE SESSION TO shinken; GRANT SELECT any dictionary TO shinken; GRANT SELECT ON V_$SYSSTAT TO shinken; GRANT SELECT ON V_$INSTANCE TO shinken; GRANT SELECT ON V_$LOG TO shinken; GRANT SELECT ON SYS.DBA_DATA_FILES TO shinken; GRANT SELECT ON SYS.DBA_FREE_SPACE TO shinken; |
And for old 8.1.7 database only:
| Code Block |
|---|
---- if somebody still uses Oracle 8.1.7... GRANT SELECT ON sys.dba_tablespaces TO shinken; GRANT SELECT ON dba_temp_files TO shinken; GRANT SELECT ON sys.v_$Temp_extent_pool TO shinken; GRANT SELECT ON sys.v_$TEMP_SPACE_HEADER TO shinken; GRANT SELECT ON sys.v_$session TO shinken; |
Test the connection
To see if the connection to the database named PROD is ok, just launch :
| Code Block |
|---|
/var/lib/shinken/libexec/check_oracle_health --connect "(DESCRIPTION =(ADDRESS_LIST =(ADDRESS = (PROTOCOL = TCP)(HOST = srv-lin-1)(PORT = 1521)))(CONNECT_DATA =(SID = PROD)))" --user "shinken" --password "shinkenpassword" --mode connection-time |
Configuring host template and host for monitoring
The parameters common to every Oracle server monitored shoud be setted in the host template "oracle-connection-method".
The parameters specifics to each server, such as database name for example, have to be done in the host's data itself.
| Tip |
|---|
It's a best practice to use the same database monitoring user on every Oracle server monitored. Doing so, you can configure the database user/password only once in the host template "oracle-connection-method". |
Configuration of Oracle-connection-method host template
- select the Oracle-connection-method host template
- Clic on "Data" tab
- Type the oracle user in the ORACLE_USER field (shinken in the previous example)
- Type the oracle password in the ORACLE_USER field (shinkenpassword in the previous example)
Configuration of a host
- Add the Oracle choosen host template to the Host Templates to inherit field
- Clic on "Data" tab
- Look for the field DATABASES in the "From templates" datas
- Type the name of the database to be monitoring
| Tip |
|---|
You can monitor multiple database on the same host in typing the name of the database separated by a comma. Exemple : DB1,DB2,DB3 |
What is checked with the templates
Oracle
| Check | Detail | Check range | Default Warning | Default Critical |
|---|---|---|---|---|
| tnsping | listener | |||
| process-usage | Percentage of max possible processes | 0% to 100% | 80 | 90 |
| sga-shared-pool-free | Free Memory in the Shared Pool | 0% to 100% | 10: | 5: |
| tablespace-usage | Used diskspace in the tablespace | 0% to 100% | 90 | 98 |
| corrupted-blocks | Number of corrupted blocks in database | 0 to n | 1 | 10 |
| invalid-objects | Sum of faulty Objects, Indices, Partitions | 0 to n | 0.1 | 0.1 |
Oracle-datafile
| Check | Detail | Check range | Default Warning | Default Critical |
|---|---|---|---|---|
| datafile-io-traffic | Sum of IO-Operationes from Datafiles per second | n/sec | 1000 | 5000 |
| datafiles-existing | Percentage of max possible datafiles | 0% to 100% | 80 | 90 |
Oracle-enqueue
| Check | Detail | Check range | Default Warning | Default Critical |
|---|---|---|---|---|
| enqueue-contention | Enqueue wait/request-Ratio | 0% to 100% | 1 | 10 |
| enqueue-waiting | How many percent of the elapsed time since the last run has an Enqueue spend with waiting | 0% to 100% | 0.00033 | 0.00033 |
Oracle-events
| Check | Detail | Check range | Default Warning | Default Critical |
|---|---|---|---|---|
| event-waits | Waits/sec from system events | n/sec | 10100 |
| 100 |
| event-waiting |
| How many percent of the elapsed time has an event spend with waiting | 0% |
| to 100% |
| 0.1 |
| 0.5 |
Oracle-flash-recovery-area
| Check | Detail | Check range | Default Warning | Default Critical | ||||
|---|---|---|---|---|---|---|---|---|
| flash-recovery-area-usage |
| Used diskspace in the flash recovery area | 0% |
| to 100% |
| 90 |
| 98 |
Oracle-latch
| Check | Detail | Check range | Default Warning | Default Critical |
|---|---|---|---|---|
| latch-contention |
| Latch misses/gets-ratio. With –name a Latchname or Latchnumber can be passed over. (See list-latches) | 0% |
| to 100% |
| 1 |
| 2 |
| latch-waiting |
| How many percent of the elapsed time since the last run has a Latch spend with waiting | 0% |
| to 100% |
| 0.1 |
| 1 |
Oracle-pga
| Check | Detail | Check range | Default Warning | Default Critical |
|---|---|---|---|---|
| pga-in-memory-sort-ratio |
| Percentage of sorts in the memory |
| 0% |
| to 100% | 99: | 90: |
Oracle-redo
| Check | Detail | Check range | Default Warning | Default Critical | ||||
|---|---|---|---|---|---|---|---|---|
| switch-interval |
| Interval between RedoLog File Switches | 0 |
| to n |
| 600: |
| 60: |
| retry-ratio |
| Retry-Rate in the RedoLog Buffer | 0% |
| to 100% |
| 1 |
| 10 |
| redo-io-traffic |
| Redolog IO in MB/sec | n/sec |
| 199 |
| 200 |
Oracle-rman
| Check | Detail | Check range | Default Warning | Default Critical | ||||
|---|---|---|---|---|---|---|---|---|
| rman-backup-problems |
| Number of RMAN-errors during the last three days | 0 |
| to n | 1 | 2 |
Oracle-rollback-segment
| Check | Detail | Check range | Default Warning | Default Critical | ||||
|---|---|---|---|---|---|---|---|---|
| roll-header-contention |
| Rollback Segment Header Contention | 0% |
| to 100% | 1 | 2 |
| roll-block-contention |
| Rollback Segment Block Contention | 0% |
| to 100% |
| 1 |
| 2 |
| roll-hit-ratio |
| Rollback Segment gets/waits Ratio | 0% |
| to 100% |
| 99: |
| 98: |
| roll-extends |
| Rollback Segment Extends n |
| n/sec |
| 1 |
| 100 |
| roll-wraps |
| Rollback Segment Wraps n |
| n/sec |
| 1 |
| 100 |
Oracle-sga
| Check | Detail | Check range | Default Warning | Default Critical | ||||
|---|---|---|---|---|---|---|---|---|
| sga-data-buffer-hit-ratio |
| Hitrate in the Data Buffer Cache | 0% |
| to 100% | 98: | 95: |
| sga-library-cache-gethit-ratio |
| Hitrate in the Library Cache (Gets) | 0% |
| to 100% |
| 98: |
| 95: |
| sga-library-cache-pinhit-ratio |
| Hitrate in the Library Cache (Pins) | 0% |
| to 100% |
| 98: |
| 95: |
| sga-library-cache-reloads |
| Reload-Rate in the Library Cache | n/sec |
| 10 |
| 10 |
| sga-dictionary-cache-hit-ratio |
| Hitrate in the Dictionary Cache | 0% |
| to 100% |
| 95: |
| 90: |
| sga-latches-hit-ratio |
| Hitrate of the Latches | 0% |
| to 100% |
| 98: |
| 95: |
| sga-shared-pool-reloads |
| Reload-Rate in the Shared Pool | 0% |
| to 100% |
| 1 |
| 10 |
Oracle-soft-parse
| Check | Detail | Check range | Default Warning | Default Critical | ||||
|---|---|---|---|---|---|---|---|---|
| soft-parse-ratio |
| Percentage of soft-parse-ratio | 0% |
| to 100% | 90: | 98: |
Oracle-stale-statistics
| Check | Detail | Check range | Default Warning | Default Critical |
|---|---|---|---|---|
| stale-statistics |
| Sum of objects with obsolete optimizer statistics |
| n | 10 | 100 |
Oracle-sysstat
| Check | Detail | Check range | Default Warning | Default Critical |
|---|---|---|---|---|
| sysstat | Changes/sec for any value from |
| v$sysstat | n/sec |
| 10 | xx |
Oracle-tablespace
| Check | Detail | Check range | Default Warning | Default Critical | ||||
|---|---|---|---|---|---|---|---|---|
| tablespace-fragmentation |
| Free Space Fragmentation Index | 100 |
| to 1 |
| 30: |
| 20: |
| tablespace-can-allocate-next |
| Checks if there is enough free tablespace for the next Extent |
| tablespace-remaining-time |
| Sum of remaining days until a tablespace is used by 100%. The rate of increase will be calculated with the values from the last 30 days. (With the parameter –lookback different periods can be specified) | Days |
| 90: |
| 30: |
Oracle-top10-stats
| Check | Detail | Check range | Default Warning | Default Critical |
|---|---|---|---|---|
| seg-top10-logical-reads |
| Sum of the userprocesses under the top 10 logical reads | n |
| 1 |
| 9 |
| seg-top10-physical-reads |
| Sum of the userprocesses under the top 10 physical reads | n |
| 1 |
| 9 |
| seg-top10-buffer-busy-waits |
| Sum of the userprocesses under the top 10 buffer busy waits | n |
| 1 |
| 9 |
| seg-top10-row-lock-waits |
| Sum of the userprocesses under the top 10 row lock waits | n |
| 1 |
| 9 |
Oracle-usage
| Check | Detail | Check range | Default Warning | Default Critical |
|---|---|---|---|---|
| session-usage | Percentage of max possible sessions | 0% to 100% | 80 | 90 |
| Connected users | Number of currently connected users | 0 to n | 50 | 100 |
Oracle-full
...