...
The following things occur when hosts or services experience SOFT state changes:
- The SOFT state is logged.
...
- Event handlers are executed to handle the SOFT state.
SOFT states are only logged if you have enabled the :ref:`log_service_retries <configuration/configmain-advanced#log_service_retries>` or :ref:`log_host_retries <configuration/configmain-advanced#log_host_retries>` options in your main configuration file.
The only important thing that really happens during a SOFT state is the execution of event handlers. Event handlers are particularly useful if you want to try to proactively fix a problem before it turns into a HARD state. The :ref:`$HOSTSTATETYPE$ <$HOSTSTATETYPE$>` or :ref:`$SERVICESTATETYPE$ <$SERVICESTATETYPE$>` macros will have a value of "SOFT" when event handlers are executed, which lets your event handler scripts know when they should take corrective action. More information on event handlers can be found :ref:`here <advanced/eventhandlers>`.
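As an illustration of how an event handler script might use the state type, here is a minimal Python sketch. It assumes the handler command passes the state, state type, and attempt number as its three arguments (for example via the ``$SERVICESTATE$``, ``$SERVICESTATETYPE$`` and ``$SERVICEATTEMPT$`` macros). The function names, the ``soft_attempt_threshold`` parameter, and the "restart" policy are hypothetical and only meant to show the decision logic, not a recommended handler.

```python
def choose_action(state, state_type, attempt, soft_attempt_threshold=2):
    """Decide what a (hypothetical) event handler should do.

    Returns "restart" when corrective action should be taken, else "none".
    """
    if state != "CRITICAL":
        # Recoveries, WARNING and UNKNOWN results are left alone in this sketch.
        return "none"
    if state_type == "SOFT" and attempt >= soft_attempt_threshold:
        # Try to fix the problem before it turns into a HARD state.
        return "restart"
    if state_type == "HARD":
        # Last resort: the problem is already confirmed.
        return "restart"
    return "none"


def main(argv):
    """Entry point sketch: argv = [script, state, state_type, attempt]."""
    state, state_type, attempt = argv[1], argv[2], int(argv[3])
    print(choose_action(state, state_type, attempt))
    return 0
```

With these assumptions, a first SOFT CRITICAL result is ignored, while a second one triggers the corrective action before notifications would be sent.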
...
Hard states occur for hosts and services in the following situations:
- When a host or service check results in a non-UP or non-OK state and it has been (re)checked the number of times specified by the max_check_attempts option in the host or service definition. This is a hard error state.
...
- When a host or service transitions from one hard error state to another error state (e.g. WARNING to CRITICAL).
...
- When a service check results in a non-OK state and its corresponding host is either DOWN or UNREACHABLE.
...
- When a host or service recovers from a hard error state. This is considered to be a hard recovery.
...
- When a passive host check is received. Passive host checks are treated as HARD.
The following things occur when hosts or services experience HARD state changes:
- The HARD state is logged.
...
- Event handlers are executed to handle the HARD state.
...
- Contacts are notified of the host or service check problem or recovery.
The :ref:`$HOSTSTATETYPE$ <$HOSTSTATETYPE$>` or :ref:`$SERVICESTATETYPE$ <$SERVICESTATETYPE$>` macros will have a value of "HARD" when event handlers are executed, which lets your event handler scripts know when they should take corrective action. More information on event handlers can be found :ref:`here <advanced/eventhandlers>`.
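The difference between what happens on SOFT and HARD state changes can be summarized in a few lines of Python. This is only a sketch of the rules described in this section: the function name, the ``log_retries`` flag (standing in for the retry logging options) and the action labels are hypothetical, and real notifications are subject to further filters that are ignored here.

```python
def actions_on_state_change(state_type, log_retries=False):
    """List the actions taken when a host or service changes state.

    log_retries mimics the retry logging options: SOFT states are only
    logged when it is enabled.
    """
    if state_type == "SOFT":
        actions = ["log"] if log_retries else []
        actions.append("event_handler")
        return actions
    if state_type == "HARD":
        # HARD changes are always logged and also notify contacts.
        return ["log", "event_handler", "notify"]
    raise ValueError("state_type must be 'SOFT' or 'HARD'")
```

The only action that SOFT and HARD state changes always share is the execution of event handlers, which is why handler scripts need the state type macro to tell the two cases apart.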
Example
Here's an example of how state types are determined, when state changes occur, and when event handlers are executed and notifications are sent out. The table below shows consecutive checks of a service over time. The service check has a max_check_attempts value of 3.

==== ======= ======== ========== ============ ============================================================
Time Check # State    State Type State Change Notes
==== ======= ======== ========== ============ ============================================================
0    1       OK       HARD       No           Initial state of the service
1    1       CRITICAL SOFT       Yes          First detection of a non-OK state. Event handlers execute.
2    2       WARNING  SOFT       Yes          Check continues to be in a non-OK state. Event handlers execute.
3    3       CRITICAL HARD       Yes          Max check attempts has been reached, so the check goes into a HARD state. Event handlers execute and a problem notification is sent out. Check number is reset to 1 immediately after this happens.
4    1       WARNING  HARD       Yes          Check changes to a HARD WARNING state. Event handlers execute and a problem notification is sent out.
5    1       WARNING  HARD       No           Check stabilizes in a HARD problem state. Depending on what the notification interval for the service is, another notification might be sent out.
6    1       OK       HARD       Yes          Check experiences a HARD recovery. Event handlers execute and a recovery notification is sent out.
7    1       OK       HARD       No           Check is still OK.
8    1       UNKNOWN  SOFT       Yes          Check is detected as changing to a SOFT non-OK state. Event handlers execute.
9    2       OK       SOFT       Yes          Check experiences a SOFT recovery. Event handlers execute, but notifications are not sent, as this wasn't a "real" problem. State type is set to HARD and check number is reset to 1 immediately after this happens.
10   1       OK       HARD       No           Check stabilizes in an OK state.
==== ======= ======== ========== ============ ============================================================
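The sequence in this example can be reproduced with a small simulation. The sketch below is a simplified model of the service rules described in this section, not the scheduler's actual implementation: it ignores host state and passive checks, the function name ``simulate`` is hypothetical, and "state change" is assumed to mean that either the state or the state type differs from the previous check.

```python
def simulate(results, max_check_attempts=3):
    """Return one (check number, state, state type, changed) tuple per result."""
    rows = []
    last_state, last_type, attempt = "OK", "HARD", 1
    for state in results:
        if state == "OK":
            if last_state != "OK" and last_type == "SOFT":
                # SOFT recovery: the problem never became HARD.
                attempt += 1
                state_type = "SOFT"
            else:
                # Still OK, or a HARD recovery.
                attempt = 1
                state_type = "HARD"
        elif last_state == "OK":
            # First detection of a non-OK state.
            attempt = 1
            state_type = "HARD" if max_check_attempts == 1 else "SOFT"
        elif last_type == "HARD":
            # Already in a HARD problem state: stays HARD.
            attempt = 1
            state_type = "HARD"
        else:
            # Retry while in a SOFT problem state.
            attempt += 1
            state_type = "HARD" if attempt >= max_check_attempts else "SOFT"

        changed = state != last_state or state_type != last_type
        rows.append((attempt, state, state_type, changed))

        # After a SOFT recovery the state type is set to HARD, and the
        # check number is reset to 1 whenever a HARD state is reached.
        if state == "OK":
            state_type = "HARD"
        if state_type == "HARD":
            attempt = 1
        last_state, last_type = state, state_type
    return rows


checks = ["OK", "CRITICAL", "WARNING", "CRITICAL", "WARNING", "WARNING",
          "OK", "OK", "UNKNOWN", "OK", "OK"]
for time, row in enumerate(simulate(checks)):
    print(time, *row)
```

Feeding it the eleven results from the table yields the same check numbers, state types and state changes, which makes it easy to experiment with other values of max_check_attempts.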
...