General principles

This architecture is fully flexible and scalable. The best way to increase Shinken Enterprise capacity is to add more daemons of the same role.

[Image]

Automatic load balancing

Distribute hosts among schedulers

Shinken Enterprise is able to cut the user configuration into parts and dispatch them to the schedulers.

The load balancing is done automatically: the administrator does not need to remember which host is linked to which other host in order to create packs.

The dispatch is host-based: all checks of a host are placed in the same scheduler as the host itself. The administrator therefore does not need to know every relation between elements, such as parents, host dependencies or check dependencies. Shinken Enterprise reads these relations and puts the related elements into the same shard.

This is done in two steps:

Create independent shards of elements.
Aggregate the shards to create N configurations for the N schedulers.

Creating independent shards

The cutting action is done by looking at two elements: hosts and checks. Checks are linked with their host so they will be in the same shard.

Other relations are taken into consideration:

Network relationship for hosts (like a distant server and its router).

Host logical dependencies.

Shinken Enterprise looks at all these relations and builds a graph from them. Each group of elements connected in this graph forms a shard.

This can be illustrated by the following picture:

[Image]

In this example, we will have two shards:

Shard 1: Host-1 to Host-5 and all their checks

Shard 2: Host-6 to Host-8 and all their checks
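
The shard creation described above amounts to finding the connected groups of a relation graph. The following is a minimal sketch of that idea, not Shinken Enterprise's actual code: the function name `build_shards` and the links between hosts are hypothetical, chosen only so that the result matches the two shards listed above. Checks are not modelled because they always travel with their host.

```python
from collections import defaultdict


def build_shards(hosts, relations):
    """Group hosts into independent shards (connected groups of the relation graph)."""
    # Each host starts as its own group representative.
    parent = {h: h for h in hosts}

    def find(h):
        # Walk up to the representative, shortening the path on the way.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    # Every relation (network parent, logical dependency) merges two groups.
    for a, b in relations:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(list)
    for h in hosts:
        groups[find(h)].append(h)
    return sorted(groups.values())


hosts = [f"host-{i}" for i in range(1, 9)]
# Hypothetical relations: any link between two hosts forces them into the same shard.
relations = [
    ("host-1", "host-2"), ("host-2", "host-3"),
    ("host-3", "host-4"), ("host-4", "host-5"),
    ("host-6", "host-7"), ("host-7", "host-8"),
]
shards = build_shards(hosts, relations)
for number, shard in enumerate(shards, start=1):
    print(f"Shard {number}: {shard[0]} to {shard[-1]}")
```

Because no relation crosses from one group to the other, the two shards can be given to different schedulers without splitting any dependency.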

Shard aggregation into the schedulers

When all shards are created, the Arbiter aggregates them into N configurations if the administrator has defined N active schedulers (no spares).

Shards are aggregated into configurations, which act like "big packs".

The dispatch looks at the weight property of the schedulers: the higher a scheduler's weight, the more shards it receives.

This can be shown in the following picture:

[Image]
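
One way to picture weight-based aggregation is a greedy assignment, sketched below. This is an assumption made for illustration and not the Arbiter's documented algorithm: the function `dispatch`, the scheduler names and the weights are all hypothetical. Each shard goes to the scheduler whose load, divided by its weight, would stay lowest.

```python
def dispatch(shards, weights):
    """Aggregate shards into one configuration per scheduler, proportionally to weight."""
    configs = {name: [] for name in weights}
    load = {name: 0 for name in weights}
    # Place the largest shards first so small ones can even out the result.
    for shard in sorted(shards, key=len, reverse=True):
        # Pick the scheduler with the lowest load-to-weight ratio after adding the shard.
        # The scheduler name breaks ties so the result is deterministic.
        target = min(
            weights,
            key=lambda name: ((load[name] + len(shard)) / weights[name], name),
        )
        configs[target].append(shard)
        load[target] += len(shard)
    return configs


# Six hypothetical single-host shards and two schedulers, one with twice the weight.
shards = [[f"host-{i}"] for i in range(1, 7)]
weights = {"scheduler-1": 2, "scheduler-2": 1}
configs = dispatch(shards, weights)
for name, parts in configs.items():
    print(name, len(parts), "shards")
```

With these inputs, the scheduler with weight 2 ends up with four shards and the scheduler with weight 1 with two.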

Sending the configurations to the satellites

When all configurations are created, the Arbiter sends them to the N active Schedulers.

A Scheduler can start processing checks as soon as it has received and loaded its configuration, without waiting for all the other Schedulers to be ready.

For larger configurations, running more than one Scheduler, even on a single server, is highly recommended, because each one will load its configuration (new or updated) faster.

The Arbiter also creates the configurations for its satellites (pollers, reactionners and brokers), including the links that tell them where to carry out their tasks.

After sending the configurations, the Arbiter begins to watch for orders from the users (called external commands) and is responsible for monitoring the availability of the satellites.

High availability

The Shinken Enterprise architecture is highly available. It relies on the load balancing mechanism described above.

Nobody is perfect: a server can crash, and so can an application. That is why administrators have spares, which can take over the configurations of failing elements.

For the moment, the only daemon that does not have a spare is the Arbiter, but this will be added in the future. The Arbiter regularly checks that every daemon is available. If a scheduler or another satellite is dead, the Arbiter sends its configuration to a spare node defined by the administrator.

All satellites are informed of this change, so they can get their tasks from the new element and do not try to reach the dead one.
If a node was lost because of a network interruption and then comes back up, the Arbiter notices and asks the old system to drop its configuration.

On larger installations, the availability parameters can be changed from their default settings, because busy Schedulers or Brokers may be slow to answer availability checks.

The timers are aggressive by default, which suits smaller installations (see Daemons configuration parameters for more information on the three timers involved).
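
The spare mechanism can be sketched as follows. This is a simplified illustration under stated assumptions, not the Arbiter's real logic: the function `reassign_dead`, the daemon names and the configuration labels are hypothetical, and the availability timers are reduced to a plain alive/dead flag.

```python
def reassign_dead(assignments, alive, spares):
    """Move the configuration of each dead daemon to an available spare.

    Returns the new assignments and the configurations left without a daemon.
    """
    free_spares = [s for s in spares if alive.get(s, False)]
    new_assignments = {}
    unassigned = []
    for daemon, conf in assignments.items():
        if alive.get(daemon, False):
            # The daemon answered its availability check: nothing changes.
            new_assignments[daemon] = conf
        elif free_spares:
            # The daemon is dead: hand its configuration to the next free spare.
            new_assignments[free_spares.pop(0)] = conf
        else:
            # No spare left: the configuration waits for a daemon to come back.
            unassigned.append(conf)
    return new_assignments, unassigned


assignments = {"scheduler-1": "conf-A", "scheduler-2": "conf-B"}
alive = {"scheduler-1": True, "scheduler-2": False, "scheduler-spare": True}
new_assignments, unassigned = reassign_dead(assignments, alive, ["scheduler-spare"])
print(new_assignments)
```

Here `scheduler-spare` takes over `conf-B`. In the real system the other satellites would then be told to fetch their tasks from the spare instead of the dead scheduler.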

External command dispatching

The administrator sometimes needs to send orders to the schedulers (for example, a new status for a passive check).

In Shinken Enterprise, the administrator only sends the order to the Arbiter. External commands can be divided into two types:

commands that are global to all schedulers.

commands that are specific to one element (host/check).

For each command, Shinken Enterprise knows whether it is global or specific:

If it is global, the order is sent to all schedulers.
If it is specific, Shinken Enterprise looks up which scheduler manages the element referred to by the command (host/check) and sends the order to that scheduler only.

When a scheduler receives an order, it simply applies it.
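
The routing rule above can be sketched in a few lines. The dictionary representation of a command, the command names and the function `route_command` are hypothetical and serve only to illustrate the global versus specific distinction.

```python
def route_command(command, host_to_scheduler, schedulers):
    """Return the names of the schedulers that must receive the command."""
    host = command.get("host")
    if host is None:
        # Global command: every scheduler gets a copy.
        return sorted(schedulers)
    # Specific command: only the scheduler managing this host gets it.
    return [host_to_scheduler[host]]


schedulers = {"scheduler-1", "scheduler-2"}
# Assumed mapping, kept by the Arbiter after dispatching the configurations.
host_to_scheduler = {"host-1": "scheduler-1", "host-6": "scheduler-2"}

global_cmd = {"name": "hypothetical-global-command"}
specific_cmd = {"name": "hypothetical-passive-result", "host": "host-6", "status": 0}

global_targets = route_command(global_cmd, host_to_scheduler, schedulers)
specific_targets = route_command(specific_cmd, host_to_scheduler, schedulers)
print(global_targets)
print(specific_targets)
```

The lookup works because the dispatch is host-based: a host and all its checks live in exactly one scheduler, so a specific command never has more than one destination.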
