Overview
Architecture summary
This architecture is fully flexible and scalable. To increase Shinken Enterprise capacity, the best approach is to add more daemons of the same role.
Automatic load balancing
Distribute hosts among schedulers
Shinken Enterprise is able to cut the user configuration into parts and dispatch them to the schedulers:
- The load balancing is done automatically: the administrator does not need to remember which host is linked to another one to create packs.
- The dispatch is host-based: all checks of a host go to the same scheduler as the host itself. The administrator therefore does not need to know all the relations among elements, like parents, host dependencies or check dependencies: Shinken Enterprise looks at these relations and puts the related elements into the same shard.
This action is done in two parts:
- create independent shards of elements
- paste shards to create N configurations for the N schedulers
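As an illustration of how relations pull elements into the same shard, a host dependency like the following (the host names are hypothetical) would force both hosts, and all their checks, into the same shard and therefore onto the same scheduler:

```cfg
# db-1 and web-1 are made-up names; because web-1 depends on db-1,
# the Arbiter will keep both hosts in one shard.
define hostdependency {
    host_name                       db-1
    dependent_host_name             web-1
    notification_failure_criteria   d,u
}
```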
Creating independent shards
The cutting is done by looking at two kinds of elements: hosts and checks. Checks are linked to their host, so they will be in the same shard.
Other relations are also taken into consideration:
- Network relationships between hosts (like a distant server and its router).
- Host logical dependencies.
Shinken Enterprise looks at all these relations and builds a graph from them. Each resulting graph is a relation shard.
In this example, we will have two shards:
- Shard 1: Host-1 to Host-5 and all their checks
- Shard 2: Host-6 to Host-8 and all their checks
Aggregating shards into the schedulers
When all shards are created, the Arbiter aggregates them into N configurations if the administrator has defined N active schedulers (no spares).
Shards are aggregated into configurations (think of them as "big packs").
The dispatch takes each scheduler's weight property into account: the higher a scheduler's weight, the more shards it receives.
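For example, scheduler weights might be declared like this (the names, addresses and values are illustrative, not taken from a real setup):

```cfg
define scheduler {
    scheduler_name   scheduler-big
    address          10.0.0.1
    port             7768
    weight           2     ; gets roughly twice as many shards
}

define scheduler {
    scheduler_name   scheduler-small
    address          10.0.0.2
    port             7768
    weight           1
}
```

With these weights, scheduler-big would receive about two thirds of the shards.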
Sending the configurations to the satellites
When all configurations are created, the Arbiter sends them to the N active Schedulers.
A Scheduler can start processing checks as soon as it has received and loaded its configuration, without waiting for the other Schedulers to be ready.
For larger configurations, having more than one Scheduler, even on a single server, is highly recommended, as each will load its (new or updated) configuration faster.
The Arbiter also creates configurations for the satellites (pollers, reactionners and brokers) with links to the Schedulers, so they know where to get jobs.
After sending the configurations, the Arbiter starts watching for orders from the users (called external commands) and is responsible for monitoring the availability of the satellites.
High availability
Nobody is perfect: a server can crash, and so can an application. That is why administrators have spares, which can take over the configurations of failing elements.
The Shinken Enterprise architecture is a high-availability one.
- The Arbiter regularly checks that everyone is available. If a scheduler or another satellite is dead, the Arbiter sends its configuration to a spare node defined by the administrator.
- All satellites are informed of this change, so they get their tasks from the new element and stop trying to reach the dead one.
- If a node was lost due to a network interruption and it comes back up, the Arbiter will notice and ask the old system to drop its configuration.
- The availability parameters can be tuned away from the defaults for larger configurations, as busy Schedulers or Brokers may delay their availability responses (see Daemons configuration parameters for more information on the three timers involved).
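As a sketch, a spare scheduler and the availability timers could be declared as follows (all names and values are illustrative; see the Daemons configuration parameters page for the authoritative list):

```cfg
define scheduler {
    scheduler_name      scheduler-master
    address             10.0.0.1
    port                7768
    spare               0
    check_interval      60    ; seconds between Arbiter availability checks
    timeout             3     ; seconds to wait for a ping answer
    data_timeout        120   ; seconds allowed for large data exchanges
    max_check_attempts  3     ; failed pings before the node is declared dead
}

define scheduler {
    scheduler_name      scheduler-spare
    address             10.0.0.2
    port                7768
    spare               1     ; receives scheduler-master's conf if it dies
}
```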
The only daemon that does not have a spare is the Synchronizer, because its interruption has no critical impact on your monitoring.
External commands dispatching
The administrator needs to send orders to the schedulers (like a new status for passive checks).
In Shinken Enterprise, the administrator just sends the order to the Arbiter, that's all. External commands can be divided into two types:
- commands that are global to all schedulers.
- commands that are specific to one element (host/check).
For each command, Shinken Enterprise knows if it is global or not:
- If global, it just sends orders to all schedulers.
- If specific, it searches for the scheduler that manages the element referred to by the command (host/check) and sends the order to that scheduler.
When the order is received by a scheduler, it just needs to apply it.
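External commands use the classic Nagios text format: one command per line, prefixed with a Unix timestamp. In this sketch the host and service names are made up:

```
[1700000000] PROCESS_SERVICE_CHECK_RESULT;web-1;Disk;0;OK - 42% used
[1700000000] DISABLE_NOTIFICATIONS
```

The first command is element-specific: the Arbiter routes it to the scheduler that manages web-1. The second one is global and is sent to all schedulers.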
Different types of Pollers: poller_tag
The Shinken Enterprise architecture described so far is useful when the same type of poller runs every check. But it can also be useful to have different types of pollers. We already saw that all pollers talk to all schedulers; in fact, pollers can be "tagged" so that they execute only some checks.
This is useful when the user needs to have hosts in the same scheduler (like with dependencies) but needs some hosts or services to be checked by specific pollers (see usage cases below).
Checks can in fact be tagged at three levels:
- Host
- Service
- Command
The parameter to tag a command, host or service is "poller_tag". If a check uses a tagged or untagged command in a tagged host/service, it takes the poller_tag of that host/service. In an untagged host/service, it is the command's tag that is taken into account.
A poller can be tagged with multiple poller_tags. A tagged poller will only take checks that carry one of its tags, not untagged ones, unless it also defines the tag "None".
This is mainly used when you have a DMZ network: you need a dedicated poller inside the DMZ that returns its results to a scheduler in the LAN. With this, you can still have dependencies between DMZ hosts and LAN hosts, while being sure that DMZ checks are done by a DMZ-only poller.
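Such a DMZ setup could be tagged like this (host, command and poller names are hypothetical):

```cfg
define host {
    host_name    dmz-web-1
    address      192.168.50.10
    poller_tag   DMZ          ; all checks of this host inherit the tag
}

define poller {
    poller_name  poller-dmz
    address      192.168.50.2
    port         7771
    poller_tags  DMZ          ; this poller only takes DMZ-tagged checks
}
```

Untagged pollers will never take the DMZ-tagged checks, and poller-dmz will take nothing else.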
Advanced architectures: Realms
The architectures seen so far fit most needs: everyone is happy with them.
Or almost everyone. Think about an administrator with an architecture distributed around the world. With the current Shinken Enterprise architecture, the administrator can put one set of scheduler/poller daemons in Europe and another in Asia, but he cannot "tag" hosts in Asia to be checked by the Asian scheduler. Besides, checking an Asian server with a European scheduler can be very sub-optimal, read very slow. Hosts are dispatched to all schedulers and satellites, so the administrator cannot be sure that Asian hosts will be checked by the Asian monitoring servers.
The normal Shinken Enterprise architecture is useful for load balancing with high availability on a single site.
Shinken Enterprise also provides a way to manage different geographic or organizational sites.
We will use a generic term for this site management: Realms.
A realm is a pool of resources (scheduler, poller, reactionner, receiver and broker) to which hosts or hostgroups can be attached. A host or hostgroup can be attached to only one realm. All dependencies and parents of these hosts must be in the same realm. A realm can be tagged "default", and hosts without a realm will be put into it. Within a realm, pollers, reactionners and brokers will only get jobs from the schedulers of that same realm.
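A minimal sketch, with hypothetical names and addresses:

```cfg
define realm {
    realm_name   Europe
    default      1            ; hosts without a realm land here
}

define realm {
    realm_name   Asia
}

define host {
    host_name    srv-tokyo-1
    address      10.8.0.1
    realm        Asia         ; checked only by Asia's schedulers and pollers
}
```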
Make sure to understand when to use realms and when to use poller_tags:
- Realms are used to segregate schedulers.
- poller_tags are used to segregate pollers.
For example:
- If you just need a poller in a DMZ network: use a poller_tag.
- If you need a scheduler/poller in a customer LAN: use realms.
A realm can contain other realms. This changes nothing for schedulers: they are only responsible for the hosts of their own realm, not those of the sub-realms. The realm tree is useful for satellites like reactionners and brokers: they can get jobs from the schedulers of their realm, but also from the schedulers of sub-realms. Pollers can also get jobs from sub-realms, but this is less useful, so it is disabled by default. Warning: having more than one broker on a scheduler is not a good idea, as a scheduler's broker jobs can be taken by only one broker. For the Arbiter nothing changes: there is still only one Arbiter and one configuration, whatever realms you have.
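A realm tree with a shared broker might be sketched like this (names are illustrative; manage_sub_realms lets a satellite also serve the sub-realms):

```cfg
define realm {
    realm_name     World
    realm_members  Europe,Asia   ; Europe and Asia become sub-realms of World
}

define broker {
    broker_name        broker-central
    address            10.0.0.5
    port               7772
    realm              World
    manage_sub_realms  1         ; also takes jobs from Europe and Asia schedulers
}
```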
Let's take a look at two distributed environments. In the first case, the administrator wants totally distinct daemons. In the second one, he just wants the schedulers/pollers to be distinct, but still wants one place to send notifications (reactionners) and one place for database export (brokers).
Distinct realms:




