Page History

Abstract

This document describes how you can monitor an Oracle database server such as:

Connection time
A recent restart
The number of connections
Cache hit
Dead locks
etc ...

Introduction

Steps

There are some steps you’ll need to follow in order to monitor a new database machine. They are:

Setup the oracle user account
Update your server host definition for oracle monitoring

What’s Already Done For You

To make your life a bit easier, a few configuration tasks have already been done for you:

Installation of Oracle Instant Client

Code Block

oracle-instantclient11.2-sqlplus-11.2.0.4.0-1.x86_64
oracle-lib-compat-11.2.0.12-1.el6.x86_64
oracle-instantclient-sqlplus-selinux-11.2.0.2-1.el6.noarch
perl-DBD-Oracle-1.62-3.el6.x86_64
oracle-nofcontext-selinux-0.1.23.36-1.el6.noarch
nagios-plugins-oracle-1.4.16-10.el6.x86_64
oracle-instantclient11.2-basic-11.2.0.4.0-1.x86_64
oracle-instantclient-selinux-11.2.0.2-1.el6.noarch

Installation of check_oracle_health plugin
- /var/lib/shinken/libexec/check_oracle_health

Tip
We suppose here that the Oracle machine you want to monitor is named srv-lin-1 and is a linux. Please change the above lines and commands with the real name of your server of course.

Setup the oracle user account

Create the database user

Tip
You will need to configure the user for all your oracle databases.

Connect to your database as sysadmin on the oracle server :

Code Block
srv-lin-1:oracle# sqlplus "/ as sysdba"

And then create your shinken account on the database:

Code Block

CREATE USER shinken IDENTIFIED BY shinkenpassword;GRANT CREATE SESSION TO shinken;
GRANT SELECT any dictionary TO shinken;
GRANT SELECT ON V_$SYSSTAT TO shinken;
GRANT SELECT ON V_$INSTANCE TO shinken;
GRANT SELECT ON V_$LOG TO shinken;
GRANT SELECT ON SYS.DBA_DATA_FILES TO shinken;
GRANT SELECT ON SYS.DBA_FREE_SPACE TO shinken;

And for old 8.1.7 database only:

Code Block

---- if somebody still uses Oracle 8.1.7...
GRANT SELECT ON sys.dba_tablespaces TO shinken;
GRANT SELECT ON dba_temp_files TO shinken;
GRANT SELECT ON sys.v_$Temp_extent_pool TO shinken;
GRANT SELECT ON sys.v_$TEMP_SPACE_HEADER TO shinken;
GRANT SELECT ON sys.v_$session TO shinken;

Test the connection

To see if the connection to the database named PROD is ok, just launch :

Code Block

/var/lib/shinken/libexec/check_oracle_health --connect "(DESCRIPTION =(ADDRESS_LIST =(ADDRESS = (PROTOCOL = TCP)(HOST = srv-lin-1)(PORT = 1521)))(CONNECT_DATA =(SID = PROD)))" --user "shinken" --password "shinkenpassword" --mode connection-time

Configuring host template and host for monitoring

The parameters common to every Oracle server monitored shoud be setted in the host template "oracle-connection-method".

The parameters specifics to each server, such as database name for example, have to be done in the host's data itself.

Tip
It's a best practice to use the same database monitoring user on every Oracle server monitored. Doing so, you can configure the database user/password only once in the host template "oracle-connection-method".

Configuration of Oracle-connection-method host template

select the Oracle-connection-method host template
Clic on "Data" tab
Type the oracle user in the ORACLE_USER field (shinken in the previous example)
Type the oracle password in the ORACLE_USER field (shinkenpassword in the previous example)

Configuration of a host

Add the Oracle choosen host template to the Host Templates to inherit field
Clic on "Data" tab
Look for the field DATABASES in the "From templates" datas
Type the name of the database to be monitoring

Tip

You can monitor multiple database on the same host in typing the name of the database separated by a comma.

Exemple : DB1,DB2,DB3

What is checked with the templates

Oracle

Check	Detail	Check range	Default Warning	Default Critical
tnsping	listener
process-usage	Percentage of max possible processes	0% to 100%	80	90
sga-shared-pool-free	Free Memory in the Shared Pool	0% to 100%	10:	5:
tablespace-usage	Used diskspace in the tablespace	0% to 100%	90	98
corrupted-blocks	Number of corrupted blocks in database	0 to n	1	10

Oracle-datafile

Check	Detail	Check range	Default Warning	Default Critical
datafile-io-traffic	Sum of IO-Operationes from Datafiles per second	n/sec	1000	5000
datafiles-existing	Percentage of max possible datafiles	0% to 100%	80	90

Oracle-enqueue

Oracle-events

Oracle-flash-recovery-area

Oracle-latch

Oracle-pga

Oracle-redo

Oracle-rman

Oracle-rollback-segment

Oracle-sga

Oracle-soft-parse

Oracle-stale-statistics

Oracle-sysstat

Oracle-tablespace

Oracle-top10-stats

Oracle-usage

Check	Detail	Check range	Default Warning	Default Critical
datafile-io-traffic	Sum of IO-Operationes from Datafiles per second	n/sec	1000	5000
datafiles-existing	Percentage of max possible datafiles	0% to 100%	80	90

session-usage: Percentage of max possible sessions 0%..100% (80, 90)

rman-backup-problems: Number of RMAN-errors during the last three days 0..n (1, 2)

sga-data-buffer-hit-ratio: Hitrate in the Data Buffer Cache 0%..100% (98:, 95:)
sga-library-cache-gethit-ratio: Hitrate in the Library Cache (Gets) 0%..100% (98:, 95:)
sga-library-cache-pinhit-ratio: Hitrate in the Library Cache (Pins) 0%..100% (98:, 95:)
sga-library-cache-reloads: Reload-Rate in the Library Cache n/sec (10,10)
sga-dictionary-cache-hit-ratio: Hitrate in the Dictionary Cache 0%..100% (95:, 90:)
sga-latches-hit-ratio: Hitrate of the Latches 0%..100% (98:, 95:)
sga-shared-pool-reloads: Reload-Rate in the Shared Pool 0%..100% (1, 10)

pga-in-memory-sort-ratio: Percentage of sorts in the memory. 0%..100% (99:, 90:)
invalid-objects: Sum of faulty Objects, Indices, Partitions
stale-statistics: Sum of objects with obsolete optimizer statistics n (10, 100)

tablespace-free: Free diskspace in the tablespace 0%..100% (5:, 2:)
tablespace-fragmentation: Free Space Fragmentation Index 100..1 (30:, 20:)
tablespace-io-balanc: IO-Distribution under the datafiles of a tablespace n (1.0, 2.0)
tablespace-remaining-time: Sum of remaining days until a tablespace is used by 100%. The rate of increase will be calculated with the values from the last 30 days. (With the parameter –lookback different periods can be specified) Days (90:, 30:)
tablespace-can-allocate-next: Checks if there is enough free tablespace for the next Extent.
flash-recovery-area-usage: Used diskspace in the flash recovery area 0%..100% (90, 98)
flash-recovery-area-free: Free diskspace in the flash recovery area 0%..100% (5:, 2:)
soft-parse-ratio: Percentage of soft-parse-ratio 0%..100%
switch-interval: Interval between RedoLog File Switches 0..n Seconds (600:, 60:)
retry-ratio: Retry-Rate in the RedoLog Buffer 0%..100% (1, 10)
redo-io-traffic: Redolog IO in MB/sec n/sec (199,200)
roll-header-contention: Rollback Segment Header Contention 0%..100% (1, 2)
roll-block-contention: Rollback Segment Block Contention 0%..100% (1, 2)
roll-hit-ratio: Rollback Segment gets/waits Ratio 0%..100% (99:, 98:)
roll-extends: Rollback Segment Extends n, n/sec (1, 100)
roll-wraps: Rollback Segment Wraps n, n/sec (1, 100)
seg-top10-logical-reads: Sum of the userprocesses under the top 10 logical reads n (1, 9)
seg-top10-physical-reads: Sum of the userprocesses under the top 10 physical reads n (1, 9)
seg-top10-buffer-busy-waits: Sum of the userprocesses under the top 10 buffer busy waits n (1, 9)
seg-top10-row-lock-waits: Sum of the userprocesses under the top 10 row lock waits n (1, 9)
event-waits: Waits/sec from system events n/sec (10,100)
event-waiting: How many percent of the elapsed time has an event spend with waiting 0%..100% (0.1,0.5)
enqueue-contention: Enqueue wait/request-Ratio 0%..100% (1, 10)
enqueue-waiting: How many percent of the elapsed time since the last run has an Enqueue spend with waiting 0%..100% (0.00033,0.0033)
latch-contention: Latch misses/gets-ratio. With –name a Latchname or Latchnumber can be passed over. (See list-latches) 0%..100% (1,2)
latch-waiting: How many percent of the elapsed time since the last run has a Latch spend with waiting 0%..100% (0.1,1)
sysstat: Changes/sec for any value from v$sysstat n/sec (10,10)

Oracle-full

Setup the oracle user account

Create the database user

Tip
You will need to configure the user for all your oracle databases.

Connect to your database as sysadmin on the oracle server :

Code Block
srv-lin-1:oracle# sqlplus "/ as sysdba"

And then create your shinken account on the database:

Code Block

CREATE USER shinken IDENTIFIED BY shinkenpassword;GRANT CREATE SESSION TO shinken;
GRANT SELECT any dictionary TO shinken;
GRANT SELECT ON V_$SYSSTAT TO shinken;
GRANT SELECT ON V_$INSTANCE TO shinken;
GRANT SELECT ON V_$LOG TO shinken;
GRANT SELECT ON SYS.DBA_DATA_FILES TO shinken;
GRANT SELECT ON SYS.DBA_FREE_SPACE TO shinken;

And for old 8.1.7 database only:

Code Block

---- if somebody still uses Oracle 8.1.7...
GRANT SELECT ON sys.dba_tablespaces TO shinken;
GRANT SELECT ON dba_temp_files TO shinken;
GRANT SELECT ON sys.v_$Temp_extent_pool TO shinken;
GRANT SELECT ON sys.v_$TEMP_SPACE_HEADER TO shinken;
GRANT SELECT ON sys.v_$session TO shinken;

Test the connection

To see if the connection to the database named PROD is ok, just launch :

Code Block

/var/lib/shinken/libexec/check_oracle_health --connect "(DESCRIPTION =(ADDRESS_LIST =(ADDRESS = (PROTOCOL = TCP)(HOST = srv-lin-1)(PORT = 1521)))(CONNECT_DATA =(SID = PROD)))" --user "shinken" --password "shinkenpassword" --mode connection-time

Configuring host template and host for monitoring

The parameters common to every Oracle server monitored shoud be setted in the host template "oracle-connection-method".

The parameters specifics to each server, such as database name for example, have to be done in the host's data itself.

Tip
It's a best practice to use the same database monitoring user on every Oracle server monitored. Doing so, you can configure the database user/password only once in the host template "oracle-connection-method".

Configuration of Oracle-connection-method host template

select the Oracle-connection-method host template
Clic on "Data" tab
Type the oracle user in the ORACLE_USER field (shinken in the previous example)
Type the oracle password in the ORACLE_USER field (shinkenpassword in the previous example)

Configuration of a host

Add the Oracle choosen host template to the Host Templates to inherit field
Clic on "Data" tab
Look for the field DATABASES in the "From templates" datas
Type the name of the database to be monitoring

Tip

You can monitor multiple database on the same host in typing the name of the database separated by a comma.

Exemple : DB1,DB2,DB3

What is checked with the templates

Oracle

Check	Detail	Check range	Default Warning	Default Critical
tnsping	listener
process-usage	Percentage of max possible processes	0% to 100%	80	90
sga-shared-pool-free	Free Memory in the Shared Pool	0% to 100%	10:	5:
tablespace-usage	Used diskspace in the tablespace	0% to 100%	90	98
corrupted-blocks	Number of corrupted blocks in database	0 to n	1	10

Oracle-datafile

Check	Detail	Check range	Default Warning	Default Critical
datafile-io-traffic	Sum of IO-Operationes from Datafiles per second	n/sec	1000	5000
datafiles-existing	Percentage of max possible datafiles	0% to 100%	80	90

Oracle-enqueue

Oracle-events

Oracle-flash-recovery-area

Oracle-latch

Oracle-pga

Oracle-datafile

session-usage: Percentage of max possible sessions 0%..100% (80, 90)
rman-backup-problems: Number of RMAN-errors during the last three days 0..n (1, 2)

sga-data-buffer-hit-ratio: Hitrate in the Data Buffer Cache 0%..100% (98:, 95:)
sga-library-cache-gethit-ratio: Hitrate in the Library Cache (Gets) 0%..100% (98:, 95:)
sga-library-cache-pinhit-ratio: Hitrate in the Library Cache (Pins) 0%..100% (98:, 95:)
sga-library-cache-reloads: Reload-Rate in the Library Cache n/sec (10,10)
sga-dictionary-cache-hit-ratio: Hitrate in the Dictionary Cache 0%..100% (95:, 90:)
sga-latches-hit-ratio: Hitrate of the Latches 0%..100% (98:, 95:)
sga-shared-pool-reloads: Reload-Rate in the Shared Pool 0%..100% (1, 10)
pga-in-memory-sort-ratio: Percentage of sorts in the memory. 0%..100% (99:, 90:)
invalid-objects: Sum of faulty Objects, Indices, Partitions
stale-statistics: Sum of objects with obsolete optimizer statistics n (10, 100)
tablespace-free: Free diskspace in the tablespace 0%..100% (5:, 2:)
tablespace-fragmentation: Free Space Fragmentation Index 100..1 (30:, 20:)
tablespace-io-balanc: IO-Distribution under the datafiles of a tablespace n (1.0, 2.0)
tablespace-remaining-time: Sum of remaining days until a tablespace is used by 100%. The rate of increase will be calculated with the values from the last 30 days. (With the parameter –lookback different periods can be specified) Days (90:, 30:)
tablespace-can-allocate-next: Checks if there is enough free tablespace for the next Extent.
flash-recovery-area-usage: Used diskspace in the flash recovery area 0%..100% (90, 98)
flash-recovery-area-free: Free diskspace in the flash recovery area 0%..100% (5:, 2:)
soft-parse-ratio: Percentage of soft-parse-ratio 0%..100%
switch-interval: Interval between RedoLog File Switches 0..n Seconds (600:, 60:)
retry-ratio: Retry-Rate in the RedoLog Buffer 0%..100% (1, 10)
redo-io-traffic: Redolog IO in MB/sec n/sec (199,200)
roll-header-contention: Rollback Segment Header Contention 0%..100% (1, 2)
roll-block-contention: Rollback Segment Block Contention 0%..100% (1, 2)
roll-hit-ratio: Rollback Segment gets/waits Ratio 0%..100% (99:, 98:)
roll-extends: Rollback Segment Extends n, n/sec (1, 100)
roll-wraps: Rollback Segment Wraps n, n/sec (1, 100)
seg-top10-logical-reads: Sum of the userprocesses under the top 10 logical reads n (1, 9)
seg-top10-physical-reads: Sum of the userprocesses under the top 10 physical reads n (1, 9)
seg-top10-buffer-busy-waits: Sum of the userprocesses under the top 10 buffer busy waits n (1, 9)
seg-top10-row-lock-waits: Sum of the userprocesses under the top 10 row lock waits n (1, 9)
event-waits: Waits/sec from system events n/sec (10,100)
event-waiting: How many percent of the elapsed time has an event spend with waiting 0%..100% (0.1,0.5)
enqueue-contention: Enqueue wait/request-Ratio 0%..100% (1, 10)
enqueue-waiting: How many percent of the elapsed time since the last run has an Enqueue spend with waiting 0%..100% (0.00033,0.0033)
latch-contention: Latch misses/gets-ratio. With –name a Latchname or Latchnumber can be passed over. (See list-latches) 0%..100% (1,2)
latch-waiting: How many percent of the elapsed time since the last run has a Latch spend with waiting 0%..100% (0.1,1)
sysstat: Changes/sec for any value from v$sysstat n/sec (10,10)

Oracle-full

Page tree

Page History

Versions Compared

Old Version 28

New Version 29

Key

Abstract

Introduction

Steps

What’s Already Done For You

Setup the oracle user account

Create the database user

Test the connection

Configuring host template and host for monitoring

Configuration of Oracle-connection-method host template

Configuration of a host

What is checked with the templates

Oracle

Oracle-datafile

Oracle-enqueue

Oracle-events

Oracle-flash-recovery-area

Oracle-latch

Oracle-pga

Oracle-redo

Oracle-rman

Oracle-rollback-segment

Oracle-sga

Oracle-soft-parse

Oracle-stale-statistics

Oracle-sysstat

Oracle-tablespace

Oracle-top10-stats

Oracle-usage

Oracle-full

Setup the oracle user account

Create the database user

Test the connection

Configuring host template and host for monitoring

Configuration of Oracle-connection-method host template

Configuration of a host

What is checked with the templates

Oracle

Oracle-enqueue

Oracle-events

Oracle-flash-recovery-area

Oracle-latch

Oracle-pga

Oracle-datafile