View Source

Shinken Entreprise is composed with 7 daemons. Shinken Enterprise's architecture has been designed according to the Unix Way: one tool, one task. Shinken Enterprise has an architecture where each part is isolated and connects to the others via standard interfaces. Shinken Enterprise is based on a HTTP backend. This makes building a highly available or distributed monitoring architecture quite easy.

Shinken Enterprise Daemons

Synopsis

Voici un tableau présentant les différents daemons et leurs ports d’écoute et leurs rôles respectifs.

Daemon	Listening Port	Protocol	Role
Synchronizer	7765	HTTPS	Manage the configuration
Arbiter	7770	HTTPS	Read and dispatch the configuration
Scheduler	7768	HTTPS	Manage monitoring logic
Poller	7771	HTTPS	Launch monitoring checks
Reactionner	7769	HTTPS	Launch notifications plugins
Receiver	7773	HTTPS	Receive external commands
Broker	7772	HTTPS	Get and export data

The synchronizer

The synchronizer role

The synchronizer daemon manage the whole configuration. It use modules to detect new hosts and host modifications. It presents the configuration web interface to the administrators. Here are the optional sources the daemon can use to get information:

Active directory
VSphere (VMWare)
Nagios or Shinken framework configuration files
Network scans

Active directory

The Active Directory discovery is done with a domain account, and only need read access. The connexion can be done in LDAPS to be sure the connexion is secure.

It is possible to define a top level OU in order to list only the elements (servers and users) that are defined below tis OU level.

The information that the module is getting are server names, FQDN, the server OS, and if defined in the LDAP entry, its locality.

VSphere from VMWare

The VSphere discovery is designed ti discovery physical servers (ESX) and their virtual servers. It will also get OS and their IP address, but only if the VMware tools are enabled and running on the virtual server.

The Synchronizer to VMware connexion is always through the VSphere server, and only need a read only access to it. The Shinken Enterprise servers do not ned any direct access to the ESX servers.

The Shinken Enterprise to VSphere communication is done with the SOAP API from VMWare, onto a HTTPS connexion.

Nagios or Shinken Framework configuration files

Shinken Enterprise is able to load any Nagios or Shinken Framework configuration files. It will automatically load the dfined objects into its configuration.

Network scans

The network scan discovery is optional. It's done thanks to the nmap command, launched on the Synchronzier server. It allow to scan networks defined by the Shinken Enterprise administrators.

The scans are done on the TCP and UDP ports. It will also try to get additionnals data from the servers and services running on them (it use the -O option of the nmap command).

Data storage of the configuration

All discovered data from the Synchronizer are saved into a Mongodb database. It possible, it's better for the database to be set aside the Synchronizer daemon. this database do not need to be shared with other daemons, and so its communications should be limited to the local synchronizer server.

Configuration interface and its access

The configuration interface is hosted on the Synchronizer daemon, and use another TCP port than the visualization UI. You can use two different credential systems:

Manage directly on the configuration interface
Manage credentials with Active directory accounts. the daemon will use Ldap connexions to check for the credentials.

The non-admon users will be restricted in the visibility onto the hosts that they are direct contacts, or are in a contact group linked with the hosts.

This interface is using the save Mongodb dtabse than the synchronizer daemon. The default port for this configuration interface is 7766.

Interface	Daemon	Port
Configuration	Synchronizer	7766

Synchonizer connexion summary

Source daemon	Connexion to	Port	Protocol	Note
Synchronizer	Active Directory	636	LDAPS	Read only account
Synchronizer	VSphere	443	HTTPS	Read only account on VSphere

The arbiter

Role

The arbiter daemon reads the configuration from the synchronizer. It divides it into parts (N schedulers = N parts), and distributes them to the appropriate Shinken Enterprise daemons. Additionally, it manages the high availability features: if a particular daemon dies, it re-routes the configuration managed by this failed daemon to the configured spare. Finally, it receives input from users or passive check results and routes them to the appropriate daemon. Passive check results are forwarded to the Scheduler responsible for the check. There can only be one active arbiter with other arbiters acting as hot standby spares in the architecture.

Connexion with the synchronizer

The communication between the arbiter and the synchronizer is done on the standard port of the synchronizer (7765).

Connexion with the other daemons

This daemon is used to check and dispatch configuration to the others daemons, but not the Synchronizer. The connexion is done on the standard port of the others daemons, and will use a HTTPS connexion is the otehrs daemons are defined to use it.

Data

This daemon is hosting the whole system configuration into memory. It have access to all server names, address, types, and also the defined command used to check them.

It also host in memory the defined contacts that should receive notification for the defined hosts and services.

Arbiter connexion summary

Source daemon	Destination	Port	Protocol	Note
Arbiter	Synchronizer	7765	HTTPS
Arbiter	Scheduler	7768	HTTPS
Arbiter	Poller	7771	HTTPS
Arbiter	Reactionner	7769	HTTPS
Arbiter	Receiver	7773	HTTPS
Arbiter	Arbiter	7770	HTTPS	Only if there is a arbiter slave, and only from the the master to theslave
Arbiter	Broker	7772	HTTPS

The scheduler

The scheduler daemon manages the dispatching of checks and actions to the poller and reactionner daemons respectively. The scheduler daemon is also responsible for processing the check result queue, analyzing the results, doing correlation and following up actions accordingly (if a service is down, ask for a host check). It does not launch checks or notifications. It just keeps a queue of pending checks and notifications for other daemons of the architecture (like pollers or reactionners). This permits distributing load equally across many pollers. There can be many schedulers for load-balancing or hot standby roles. Status persistence is achieved using a retention module.

The poller

The poller daemon launches check plugins as requested by schedulers. When the check is finished it returns the result to the schedulers. Pollers can be tagged for specialized checks (ex. Windows versus Unix, customer A versus customer B, DMZ). There can be many pollers for load-balancing or hot standby spare roles.

The reactionner

The reactionner daemon issues notifications and launches event_handlers. This centralizes communication channels with external systems in order to simplify SMTP authorizations or RSS feed sources (only one for all hosts/services). There can be many reactionners for load-balancing and spare roles

The broker

The broker daemon exports and manages data from schedulers. The management can is done exclusively with modules. Multiple broker modules can be enabled simultaneously. Example of broker modules:

Module for centralizing Shinken logs: Simple-log (flat file)

Modules for exporting data: Graphite-Perfdata

Modules for the Livestatus API

The receiver

The receiver daemon receives passive check data and serves as a distributed passive command buffer that will be read by the arbiter daemon. There can be many receivers for load-balancing and hot standby spare roles. The receiver can also use modules to accept data from different protocols.

Module for passive data collection: WS arbiter module

This architecture is fully flexible and scalable: the daemons that require more performance are the poller and the schedulers. The administrator can add as many as he wants.

Shinken Entreprise V02.03.03-U01 > Architecture > shinken-architecture.png

Automatic load balancing

Distribute hosts among schedulers

Shinken Enterprise is able to cut the user configuration into parts and dispatch it to the schedulers. The load balancing is done automatically: the administrator does not need to remember which host is linked with another one to create packs.

The dispatch is a host-based one: that means that all services of a host will be in the same scheduler as this host. That means that the administrator does not need to know all relations among elements like parents, hostdependencies or service dependencies: Shinken Enterprise is able to look at these relations and put these related elements into the same shard.

This action is done in two parts:

create independent shards of elements

paste shards to create N configurations for the N schedulers

Creating independent shards

The cutting action is done by looking at two elements: hosts and services. Services are linked with their host so they will be in the same shard. Other relations are taken into account :

Network relationship for hosts (like a distant server and its router)

Host logicial dependencies

Shinken Enterprise looks at all these relations and creates a graph with it. A graph is a relation shard. This can be illustrated by the following picture :

Shinken Entreprise V02.03.03-U01 > Architecture > pack-creation.png

In this example, we will have two shards:

shard 1: Host-1 to host-5 and all their services

shard 2: Host-6 to Host-8 and all their services

The shard aggregation into the schedulers

When all shards are created, the Arbiter aggregates them into N configurations if the administrator has defined N active schedulers (no spares). Shards are aggregated into configurations (it's like "Big packs"). The dispatch looks at the weight property of schedulers: the higher weight a scheduler has, the more packs it will have. This can be shown in the following picture :

Shinken Entreprise V02.03.03-U01 > Architecture > pack-agregation.png

The configurations sending to satellites

When all configurations are created, the Arbiter sends them to the N active Schedulers. A Scheduler can start processing checks once it has received and loaded it's configuration without having to wait for all schedulers to be ready(v1.2). For larger configurations, having more than one Scheduler, even on a single server is highly recommended, as they will load their configurations (new or updated) faster. The Arbiter also creates configurations for satellites (pollers, reactionners and brokers) with links to Schedulers so they know where to get jobs to do. After sending the configurations, the Arbiter begins to watch for orders from the users and is responsible for monitoring the availability of the satellites.

The high availability

The Shinken Enterprise architecture is a high availability one. Before looking at how this works,let's take a look at how the load balancing works if it's now already done.

Nobody is perfect. A server can crash, an application too. That is why administrators have spares: they can take configurations of failing elements and reassign them. For the moment the only daemon that does not have a spare is the Arbiter, but this will be added in the future. The Arbiter regularly checks if everyone is available. If a scheduler or another satellite is dead, it sends its conf to a spare node, defined by the administrator. All satellites are informed by this change so they can get their jobs from the new element and do not try to reach the dead one. If a node was lost due to a network interruption and it comes back up, the Arbiter will notice and ask the old system to drop its configuration.

The availability parameters can be modified from the default settings when using larger configurations as the Schedulers or Brokers can become busy and delay their availability responses. The timers are aggressive by default for smaller installations. See daemon configuration parameters for more information on the three timers involved.

External commands dispatching

The administrator needs to send orders to the schedulers (like a new status for passive checks). In Shinken Enterprise the administrator just sends the order to the Arbiter, that's all. External commands can be divided into two types :

commands that are global to all schedulers

commands that are specific to one element (host/service)

For each command, Shinken knows if it is global or not. If global, it just sends orders to all schedulers. For specific ones instead it searches which scheduler manages the element referred by the command (host/service) and sends the order to this scheduler. When the order is received by schedulers they just need to apply them.

Different types of Pollers: poller_tag

The current Shinken Enterprise architecture is useful for someone that uses the same type of poller for checks. But it can be useful to have different types of pollers. We already saw that all pollers talk to all schedulers. In fact, pollers can be "tagged" so that they will execute only some checks.

This is useful when the user needs to have hosts in the same scheduler (like with dependencies) but needs some hosts or services to be checked by specific pollers (see usage cases below).

These checks can in fact be tagged on 3 levels :

Host

Service

Command

The parameter to tag a command, host or service, is "poller_tag". If a check uses a "tagged" or "untagged" command in a untagged host/service, it takes the poller_tag of this host/service. In a "untagged" host/service, it's the command tag that is taken into account.

The pollers can be tagged with multiple poller_tags. If they are tagged, they will only take checks that are tagged, not the untagged ones, unless they defined the tag "None".

It's mainly used when you have a DMZ network, you need to have a dedicated poller that is in the DMZ, and return results to a scheduler in LAN. With this, you can still have dependencies between DMZ hosts and LAN hosts, and still be sure that checks are done in a DMZ-only poller.

Advanced architectures: Realms

Shinken's architecture allows the administrator to have a unique point of administration with numerous schedulers, pollers, reactionners and brokers. Hosts are dispatched with their own services to schedulers and the satellites (pollers/reactionners/brokers) get jobs from them. Everyone is happy.

Or almost everyone. Think about an administrator who has a distributed architecture around the world. With the current Shinken Enterprise architecture the administrator can put a couple scheduler/poller daemons in Europe and another set in Asia, but he cannot "tag" hosts in Asia to be checked by the asian scheduler . Also trying to check an asian server with an european scheduler can be very sub-optimal, read very sloooow. The hosts are dispatched to all schedulers and satellites so the administrator cannot be sure that asian hosts will be checked by the asian monitoring servers.

In the normal Shinken Enterprise Architecture is useful for load balancing with high availability, for single site.

Shinken Enterprise provides a way to manage different geographic or organizational sites.

We will use a generic term for this site management, Realms.

Realms in few words

A realm is a pool of resources (scheduler, poller, reactionner, receiver and broker) that hosts or hostgroups can be attached to. A host or hostgroup can be attached to only one realm. All "dependancies" or parents of this hosts must be in the same realm. A realm can be tagged "default"' and realm untagged hosts will be put into it. In a realm, pollers, reactionners and brokers will only get jobs from schedulers of the same realm.

Realms are different than the poller_tags

Make sure to undestand when to use realms and when to use poller_tags.

realms are used to segregate schedulers

poller_tags are used to segregate pollers

For some cases poller_tag functionality could also be done using Realms. The question you need to ask yourself: Is a poller_tag "enough", or do you need to fully segregate a the scheduler level and use Realms. In realms, schedulers do not communicate with schedulers from other Realms:

If you just need a poller in a DMZ network: use poller_tag
If you need a scheduler/poller in a customer LAN: use realms

Sub realms

A realm can contain another realm. It does not change anything for schedulers: they are only responsible for hosts of their realm not the ones of the sub realms. The realm tree is useful for satellites like reactionners or brokers: they can get jobs from the schedulers of their realm, but also from schedulers of sub realms. Pollers can also get jobs from sub realms, but it's less useful so it's disabled by default. Warning: having more than one broker in a scheduler is not a good idea. The jobs for brokers can be taken by only one broker. For the Arbiter it does not change a thing: there is still only one Arbiter and one configuration whatever realms you have.

Example of realm usage

Let's take a look at two distributed environnements. In the first case the administrator wants totally distinct daemons. In the second one he just wants the schedulers/pollers to be distincts, but still have one place to send notifications (reactionners) and one place for database export (broker).

Distincts realms :

Shinken Entreprise V02.03.03-U01 > Architecture > shinken-architecture-isolated-realms.png

More common usage, the global realm with reactionner/broker, and sub realms with schedulers/pollers :

Shinken Entreprise V02.03.03-U01 > Architecture > shinken-architecture-global-realm.png

Satellites can be used for their realm or sub realms too. It's just a parameter in the configuration of the element.