To make your life a bit easier, a few configuration tasks have already been done for you:
Installation of Oracle Instant Client
oracle-instantclient11.2-sqlplus-11.2.0.4.0-1.x86_64 oracle-lib-compat-11.2.0.12-1.el6.x86_64 oracle-instantclient-sqlplus-selinux-11.2.0.2-1.el6.noarch perl-DBD-Oracle-1.62-3.el6.x86_64 oracle-nofcontext-selinux-0.1.23.36-1.el6.noarch nagios-plugins-oracle-1.4.16-10.el6.x86_64 oracle-instantclient11.2-basic-11.2.0.4.0-1.x86_64 oracle-instantclient-selinux-11.2.0.2-1.el6.noarch |
We suppose here that the Oracle machine you want to monitor is named srv-lin-1 and is a linux. Please change the above lines and commands with the real name of your server of course. |
You will need to configure the user for all your oracle databases. |
Connect to your database as sysadmin on the oracle server :
srv-lin-1:oracle# sqlplus "/ as sysdba" |
And then create your shinken account on the database:
CREATE USER shinken IDENTIFIED BY shinkenpassword;GRANT CREATE SESSION TO shinken; GRANT SELECT any dictionary TO shinken; GRANT SELECT ON V_$SYSSTAT TO shinken; GRANT SELECT ON V_$INSTANCE TO shinken; GRANT SELECT ON V_$LOG TO shinken; GRANT SELECT ON SYS.DBA_DATA_FILES TO shinken; GRANT SELECT ON SYS.DBA_FREE_SPACE TO shinken; |
And for old 8.1.7 database only:
---- if somebody still uses Oracle 8.1.7... GRANT SELECT ON sys.dba_tablespaces TO shinken; GRANT SELECT ON dba_temp_files TO shinken; GRANT SELECT ON sys.v_$Temp_extent_pool TO shinken; GRANT SELECT ON sys.v_$TEMP_SPACE_HEADER TO shinken; GRANT SELECT ON sys.v_$session TO shinken; |
To see if the connection to the database named PROD is ok, just launch :
/var/lib/shinken/libexec/check_oracle_health --connect "(DESCRIPTION =(ADDRESS_LIST =(ADDRESS = (PROTOCOL = TCP)(HOST = srv-lin-1)(PORT = 1521)))(CONNECT_DATA =(SID = PROD)))" --user "shinken" --password "shinkenpassword" --mode connection-time |
The parameters common to every Oracle server monitored shoud be setted in the host template "oracle-connection-method".
The parameters specifics to each server, such as database name for example, have to be done in the host's data itself.
It's a best practice to use the same database monitoring user on every Oracle server monitored. Doing so, you can configure the database user/password only once in the host template "oracle-connection-method". |
| Check | Detail | Check range | Default Warning | Default Critical |
|---|---|---|---|---|
| tnsping | listener | |||
| process-usage | Percentage of max possible processes | 0% to 100% | 80 | 90 |
| sga-shared-pool-free | Free Memory in the Shared Pool | 0% to 100% | 10: | 5: |
| tablespace-usage | Used diskspace in the tablespace | 0% to 100% | 90 | 98 |
| corrupted-blocks | XXXXXXXXXXX |
session-usage: Percentage of max possible sessions 0%..100% (80, 90)
rman-backup-problems: Number of RMAN-errors during the last three days 0..n (1, 2)
sga-data-buffer-hit-ratio: Hitrate in the Data Buffer Cache 0%..100% (98:, 95:)
sga-library-cache-gethit-ratio: Hitrate in the Library Cache (Gets) 0%..100% (98:, 95:)
sga-library-cache-pinhit-ratio: Hitrate in the Library Cache (Pins) 0%..100% (98:, 95:)
sga-library-cache-reloads: Reload-Rate in the Library Cache n/sec (10,10)
sga-dictionary-cache-hit-ratio: Hitrate in the Dictionary Cache 0%..100% (95:, 90:)
sga-latches-hit-ratio: Hitrate of the Latches 0%..100% (98:, 95:)
sga-shared-pool-reloads: Reload-Rate in the Shared Pool 0%..100% (1, 10)
pga-in-memory-sort-ratio: Percentage of sorts in the memory. 0%..100% (99:, 90:)
invalid-objects: Sum of faulty Objects, Indices, Partitions
stale-statistics: Sum of objects with obsolete optimizer statistics n (10, 100)
tablespace-free: Free diskspace in the tablespace 0%..100% (5:, 2:)
tablespace-fragmentation: Free Space Fragmentation Index 100..1 (30:, 20:)
tablespace-io-balanc: IO-Distribution under the datafiles of a tablespace n (1.0, 2.0)
tablespace-remaining-time: Sum of remaining days until a tablespace is used by 100%. The rate of increase will be calculated with the values from the last 30 days. (With the parameter –lookback different periods can be specified) Days (90:, 30:)
tablespace-can-allocate-next: Checks if there is enough free tablespace for the next Extent.
flash-recovery-area-usage: Used diskspace in the flash recovery area 0%..100% (90, 98)
flash-recovery-area-free: Free diskspace in the flash recovery area 0%..100% (5:, 2:)
soft-parse-ratio: Percentage of soft-parse-ratio 0%..100%
switch-interval: Interval between RedoLog File Switches 0..n Seconds (600:, 60:)
retry-ratio: Retry-Rate in the RedoLog Buffer 0%..100% (1, 10)
redo-io-traffic: Redolog IO in MB/sec n/sec (199,200)
roll-header-contention: Rollback Segment Header Contention 0%..100% (1, 2)
roll-block-contention: Rollback Segment Block Contention 0%..100% (1, 2)
roll-hit-ratio: Rollback Segment gets/waits Ratio 0%..100% (99:, 98:)
roll-extends: Rollback Segment Extends n, n/sec (1, 100)
roll-wraps: Rollback Segment Wraps n, n/sec (1, 100)
seg-top10-logical-reads: Sum of the userprocesses under the top 10 logical reads n (1, 9)
seg-top10-physical-reads: Sum of the userprocesses under the top 10 physical reads n (1, 9)
seg-top10-buffer-busy-waits: Sum of the userprocesses under the top 10 buffer busy waits n (1, 9)
seg-top10-row-lock-waits: Sum of the userprocesses under the top 10 row lock waits n (1, 9)
event-waits: Waits/sec from system events n/sec (10,100)
event-waiting: How many percent of the elapsed time has an event spend with waiting 0%..100% (0.1,0.5)
enqueue-contention: Enqueue wait/request-Ratio 0%..100% (1, 10)
enqueue-waiting: How many percent of the elapsed time since the last run has an Enqueue spend with waiting 0%..100% (0.00033,0.0033)
latch-contention: Latch misses/gets-ratio. With –name a Latchname or Latchnumber can be passed over. (See list-latches) 0%..100% (1,2)
latch-waiting: How many percent of the elapsed time since the last run has a Latch spend with waiting 0%..100% (0.1,1)
sysstat: Changes/sec for any value from v$sysstat n/sec (10,10)