On receiving the SIGUSR1 signal, the module dumps its memory. On any other signal, the module stops:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ MANAGE SIGNAL ] The worker with the pid XXXX received a signal XX
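The signal policy described above can be sketched in Python as follows. This is a minimal sketch and assumes a POSIX worker that handles SIGUSR1, SIGTERM and SIGINT. The class `WorkerSignalPolicy` and its counters are hypothetical. Judging from the error messages below, the real module relies on guppy or meliae for the memory dump, whereas this sketch only counts dumps.

```python
import os
import signal


class WorkerSignalPolicy:
    """Hypothetical illustration: SIGUSR1 -> memory dump, any other signal -> stop."""

    def __init__(self):
        self.dumps = 0                # number of (simulated) memory dumps
        self.stop_requested = False   # set when a non-SIGUSR1 signal arrives
        self.messages = []            # log lines worded like the documented one

    def handle(self, signum, frame=None):
        self.messages.append(
            "The worker with the pid %d received a signal %d" % (os.getpid(), signum)
        )
        if signum == signal.SIGUSR1:
            self.dumps += 1  # placeholder for the real memory dump
        else:
            self.stop_requested = True

    def install(self, signums=(signal.SIGUSR1, signal.SIGTERM, signal.SIGINT)):
        # signal.signal() must be called from the main thread
        for signum in signums:
            signal.signal(signum, self.handle)
```

Once `install()` has run, `kill -USR1 <pid>` exercises the dump branch and `kill -TERM <pid>` exercises the stop branch.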
When the controlling process stops unexpectedly:
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER:XXXX ] I am a worker with pid: XXXX and my master process YYYY is dead, I exit.
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER pid=XXXX ] (support-only) MEMORY DUMP (to be sent to the support): xxxxxxxx xxxxxxxx xxxxxxxx
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER pid=XXXX ] (support-only) Memory information dumped to file FFFFFFF (to be sent to the support)
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER pid=XXXX ] MEMORY DUMP: FAIL check if guppy lib is installed
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER pid=XXXX ] (support-only) MEMORY DUMP: FAIL check if meliae lib is installed
| Section | Description |
|---|---|
| LOAD RETENTION | Loading of the retention |
| DELETE OLD RETENTION | Deletion of old retention data |
| SAVE | Saving of the retention |
| RETENTION STATUS | Check of the retention status, performed before the retention is loaded |
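In the logs, the sections listed above appear as bracketed tags that follow the daemon and module names. To filter logs by section, the Python function below splits one log line into its timestamp, level, daemon, module, section tags and message. It is a hedged sketch: the regular expressions are inferred from the examples on this page rather than from the module's source, and they expect one log entry per line.

```python
import re

# Inferred from the examples on this page: "[date] LEVEL : [ daemon ] [ module ] ..."
HEADER_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+"
    r"(?P<level>[A-Z_]+)\s*:\s+"
    r"\[ (?P<daemon>[^\]]+?) \]\s+"
    r"\[ (?P<module>[^\]]+?) \]\s*"
    r"(?P<rest>.*)$"
)
# A section tag is "[ TEXT ]" with spaces inside the brackets; "[1/3]" is not one.
TAG_RE = re.compile(r"\[ ([^\]]+?) \]\s*")


def parse_log_line(line):
    """Return a dict describing the line, or None if it does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    rest = match.group("rest")
    sections = []
    # Consume consecutive "[ ... ]" tags at the start of the remaining text
    while True:
        tag = TAG_RE.match(rest)
        if tag is None:
            break
        sections.append(tag.group(1))
        rest = rest[tag.end():]
    return {
        "timestamp": match.group("ts"),
        "level": match.group("level"),
        "daemon": match.group("daemon"),
        "module": match.group("module"),
        "sections": sections,
        "message": rest.strip(),
    }
```

With this function, `"RETENTION STATUS" in parse_log_line(line)["sections"]` selects the retention status lines, for example.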
Opening the Mongo connection (SOUS-SECTION is one of the sections above), here through an SSH tunnel:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] Try to open a Mongodb connection to [ mongodb://192.168.1.120/?w=1&safe=false ] database [ shinken ]
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ INITIALISATION ] [ MONGO ] [ SSH TUNNEL ] Connection to mongodb://192.168.1.120/?w=1&fsync=false with a ssh tunnel:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ INITIALISATION ] [ MONGO ] [ SSH TUNNEL ] - searching a random local port available for the tunnel binding (trying 15978): localhost:15978 =(ssh tunnel)=> bastdev2:22 =(mongodb)=> 192.168.1.120:27017 (search try:1)
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ INITIALISATION ] [ MONGO ] [ SSH TUNNEL ] - tunnel creation SUCCESS: localhost:15978 =(ssh tunnel)=> 192.168.1.120:22 =(mongodb)=> 192.168.1.120:27017 (search try:1, ssh pid=22096)
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ INITIALISATION ] [ MONGO ] [ SSH TUNNEL ] - SUCCESS mongo connection is OPENED with the SSH tunnel: localhost:15978 =(ssh tunnel)=> 192.168.1.120:22 =(mongodb)=> 192.168.1.120:27017
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] Mongo connection established in 0.200s
If the connection fails, the module logs:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ INITIALISATION ] [ MONGO ] [ SSH TUNNEL ] Connection to mongo failed, closing the SSH tunnel
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ MONGO ] Mongo raised ERROR_MESSAGE on the operation get_connection. Operation failed : 1/5
...
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ INITIALISATION ] [ MONGO ] [ SSH TUNNEL ] Connection to mongo failed, closing the SSH tunnel
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ MONGO ] Mongo raised ERROR_MESSAGE on the operation get_connection. Operation failed : 2/5
...
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ INITIALISATION ] [ MONGO ] [ SSH TUNNEL ] Connection to mongo failed, closing the SSH tunnel
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ MONGO ] Mongo raised ERROR_MESSAGE on the operation get_connection. Operation failed : 3/5
...
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ INITIALISATION ] [ MONGO ] [ SSH TUNNEL ] Connection to mongo failed, closing the SSH tunnel
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ MONGO ] Mongo raised ERROR_MESSAGE on the operation get_connection. Operation failed : 4/5
...
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ INITIALISATION ] [ MONGO ] [ SSH TUNNEL ] Connection to mongo failed, closing the SSH tunnel
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ MONGO ] Mongo raised ERROR_MESSAGE on the operation get_connection. Operation failed : 5/5. We tried 5 times but it kept failing.
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] FAILED Retention could not be loaded from mongodb: Mongo raised ERROR_MESSAGE on the operation get_connection. Operation failed : 5/5. We tried 5 times but it kept failing.
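The logs above follow a five-attempt pattern, from `Operation failed : 1/5` to `5/5. We tried 5 times but it kept failing.` That pattern can be sketched as a generic retry helper. The sketch is illustrative only: `with_retries`, `RetriesExhausted` and the optional `delay` are hypothetical, and the module's real retry policy (pause between attempts, exception types caught) is not documented here.

```python
import logging
import time

log = logging.getLogger("MongodbRetention")


class RetriesExhausted(Exception):
    """Raised when every attempt failed (hypothetical)."""


def with_retries(operation, name, max_tries=5, delay=0.0):
    """Call operation() up to max_tries times and return its first result."""
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return operation()
        except Exception as exc:  # a real implementation would narrow this
            last_error = exc
            if attempt < max_tries:
                # Intermediate failures are logged at INFO level, as in the logs above
                log.info("Mongo raised %s on the operation %s. Operation failed : %d/%d",
                         exc, name, attempt, max_tries)
                time.sleep(delay)
    # Final failure is logged at ERROR level and propagated to the caller
    message = ("Mongo raised %s on the operation %s. Operation failed : %d/%d. "
               "We tried %d times but it kept failing."
               % (last_error, name, max_tries, max_tries, max_tries))
    log.error(message)
    raise RetriesExhausted(message) from last_error
```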
If several Mongo URLs are specified:
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MODULES-MANAGER ] The instance MongodbRetention raised an error: Multiples urls were found in the module's configuration file. I disable it and set it to restart it later
For the retention save, there are three SOUS-SECTION values:
| Section | Description |
|---|---|
| SAVE GLOBAL | Overall save process |
| SAVE WORKERS | Sub-process of SAVE GLOBAL that manages the queue of the save workers |
| SAVE WORKER X | Sub-process of SAVE WORKERS: worker number X, which saves part of the Scheduler's data to the database. The number of workers can be set in the module parameters (see Module MongodbRetention (centralized database retention per realm)). |
SAVE GLOBAL logs give information about the overall operation of the module and its configuration.
Before saving the retention, the module logs the URI in use and the total number of hosts and checks to save.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE GLOBAL ] Starting to save retention with VV worker(s). [ XX:hosts/clusters ] [ YY:checks ] ( Database used = mongodb://127.0.0.1/?safe=false, use ssh = 0 ), max time allowed for the save ZZ seconds
For example:
[2025-02-11 09:53:59] INFO : [ scheduler-master ] [ MongodbRetention ] [ SAVE GLOBAL ] Starting to save retention with 4 worker(s). [ 10:hosts/clusters ] [ 100:checks ] ( Database used = mongodb://192.168.1.56/?w=1&fsync=false, use ssh = 1 ), max time allowed for the save 120 seconds
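The figures in the SAVE GLOBAL start line can be extracted with a regular expression, which is useful for tracking the size of each save over time. This Python sketch is inferred only from the two lines above, and `parse_save_start` is a hypothetical helper.

```python
import re

SAVE_START_RE = re.compile(
    r"Starting to save retention with (?P<workers>\d+) worker\(s\)\. "
    r"\[ (?P<hosts>\d+):hosts/clusters \] \[ (?P<checks>\d+):checks \] "
    r"\( Database used = (?P<uri>\S+), use ssh = (?P<ssh>\d) \), "
    r"max time allowed for the save (?P<max_time>\d+) seconds"
)


def parse_save_start(line):
    """Return the SAVE GLOBAL start figures as a dict, or None."""
    match = SAVE_START_RE.search(line)
    if match is None:
        return None
    return {
        "workers": int(match.group("workers")),
        "hosts_clusters": int(match.group("hosts")),
        "checks": int(match.group("checks")),
        "uri": match.group("uri"),          # may itself contain commas
        "use_ssh": match.group("ssh") == "1",
        "max_time_s": int(match.group("max_time")),
    }
```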
Errors during the retention save are also logged in this form:
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MODULES-MANAGER ] The instance MongodbRetention raised an error: ERROR MESSAGE. Total time XX.XXs. I disable it and set it to restart it later
[2025-02-11 09:56:50] ERROR : [ scheduler-master ] [ MODULES-MANAGER ] The instance MongodbRetention raised an error: [ SAVE GLOBAL ] FAILED Retention could not be saved in mongodb. Total time 194.80s. I disable it and set it to restart it later
[2025-02-11 09:56:50] ERROR : [ scheduler-master ] [ MODULES-MANAGER ] The instance MongodbRetention raised an error: [ SAVE GLOBAL ] FAILED Retention could not be saved in mongodb because mongo is unreachable. Total time 194.80s. I disable it and set it to restart it later
SAVE WORKERS logs give the status of each worker, from its creation to its success or failure.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Starting worker X with pid XXXX. Try: [ Y ], max time allowed [ ZZs ]
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] The worker X successfully ended ( after Y tries )
When preparing the data to save took a long time:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] [ PERF ] [ X.XXXs ] atomization duration
When errors prevent the save from completing normally:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] some workers did fail to exit or encountered an error. The retention save can be incomplete.
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Too many tries failed
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Cannot start the XXXX worker process as there is not enough memory
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Cannot start the worker XXXX process: XX. Exiting the retention save, killing all currently launched workers
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] ERROR MESSAGE
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] "EXCEPTION PYTHON"
SAVE WORKER X logs give statistics about the saves performed by the worker with identifier X.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Preparing elements to save
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Took X.XXms to prepare XXX hosts/clusters and XXXX checks
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Took X.XXms to connect to Mongo
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] hosts/clusters will be saved in groups of maximum 1000
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Saved XXX/XXX hosts/clusters ( took X.XXms )
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Took X.XXms to save XXX hosts/clusters
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] checks will be saved in groups of maximum 1000
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Saved XXXX/XXXX checks ( took X.XXms )
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Saved XXXX/XXXX checks ( took X.XXms )
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Took X.XXms to save XXXX checks
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Worker ended in X.XXms
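The worker lines above suggest that hosts/clusters and checks are written in groups of at most 1000, each followed by a `Saved n/total` line. The Python sketch below shows that batching and assumes the count is cumulative. `save_in_batches` and its `write_batch` callback are hypothetical. With pymongo, the callback could for instance call `Collection.bulk_write`, but that choice is not taken from the module.

```python
def save_in_batches(elements, write_batch, max_batch=1000):
    """Write elements in slices of at most max_batch items.

    Returns the (saved, total) progress after each batch, i.e. the figures
    a "Saved n/total" log line would show (assumed cumulative).
    """
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    total = len(elements)
    progress = []
    saved = 0
    for start in range(0, total, max_batch):
        batch = elements[start:start + max_batch]
        write_batch(batch)          # one database round trip per batch
        saved += len(batch)
        progress.append((saved, total))
    return progress
```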
When the worker cannot connect to Mongo:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] [ MONGO ] Mongo raised ( Mongo connection failure to xxxxxxx ) on the operation get_connection. Operation failed : 1/5
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] [ MONGO ] Mongo raised ( Mongo connection failure to xxxxxxx ) on the operation get_connection. Operation failed : 2/5
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] [ MONGO ] Mongo raised ( Mongo connection failure to xxxxxxx ) on the operation get_connection. Operation failed : 3/5
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] [ MONGO ] Mongo raised ( Mongo connection failure to xxxxxxx ) on the operation get_connection. Operation failed : 4/5
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] [ MONGO ] Mongo raised ( Mongo connection failure to xxxxxxx ) on the operation get_connection. Operation failed : 5/5. We tried 5 times but it kept failing.
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] After 5 tries, worker could not connect to mongo :[Mongo raised ( Mongo connection failure to xxxxxxx ) on the operation get_connection. Operation failed : 5/5. We tried 5 times but it kept failing.]
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Worker has an error:[ ERROR MESSAGE ]
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] (pid=XXXX) "EXCEPTION PYTHON"
[YYYY-MM-DD HH:MM:SS] LOG_LEVEL: [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] [ OVERSIZED DATA ] [ DETAILS ] oversized data of XXXXB for ELEMENT_TYPE ELEMENT_UUID may cause database query to fail. Detail of potential expensive content: ELEMENT_DETAILS
[YYYY-MM-DD HH:MM:SS] LOG_LEVEL: [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] [ OVERSIZED DATA ] [ SIZE ] oversized data of XXXXB for ELEMENT_TYPE ELEMENT_UUID may cause database query to fail. Size of potential expensive content: ELEMENT_SIZE_DETAILS
The retention save can fail if at least one element exceeds the maximum size the database supports. The module logs the elements that may cause this error, according to thresholds defined in its configuration.
| Module parameter | Log level |
|---|---|
|  | WARNING |
|  | ERROR |
[2025-07-23 10:29:45] WARNING: [ scheduler-master ] [ MongodbRetention ] [ SAVE WORKER 0 ] [ OVERSIZED DATA ] [ DETAILS ] oversized data of 12845B for service 80e69ea445e111f0abb10800270aacd1-97373c2245e111f080950800270aacd1 may cause database query to fail. Detail of potential expensive content: total notifications nb:141384, notified contacts uuid list nb:1, incident nb:1, notifications in progress nb:0, downtimes nb:0, checks in progress nb:0
[2025-07-23 10:29:45] WARNING: [ scheduler-master ] [ MongodbRetention ] [ SAVE WORKER 0 ] [ OVERSIZED DATA ] [ SIZE ] oversized data of 12845B for service 80e69ea445e111f0abb10800270aacd1-97373c2245e111f080950800270aacd1 may cause database query to fail. Size of potential expensive content: outputs size:167B, current and last perf data size:98B, downtimes user content size:0B, acknowledgement user content size:0B
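The two-threshold logic described above can be sketched in Python as follows. The parameter names `warn_bytes` and `error_bytes` are hypothetical, because the real module parameter names are not shown in the table. Whether the comparison is `>=` or `>` is also an assumption. The size is approximated by the UTF-8 length of a JSON dump, whereas MongoDB actually stores BSON and limits a single document to 16 MB.

```python
import json


def oversize_level(element, warn_bytes, error_bytes):
    """Return (log_level_or_None, approximate_size_in_bytes) for one element.

    The size is an approximation: JSON length, not the BSON size MongoDB stores.
    """
    size = len(json.dumps(element, default=str).encode("utf-8"))
    if size >= error_bytes:
        return "ERROR", size
    if size >= warn_bytes:
        return "WARNING", size
    return None, size
```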
Before loading the retention, the module makes sure that the other Schedulers have finished saving it, so that the data is up to date.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Setting state to INIT, last_ping:CURRENT_TIME should be updated after SSs
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Register in database done ( arbiter_uuid:ARBITER_ID scheduler_name:SCHEDULERNAME shard:SHARD_ID conf_date:CONF_TIME conf_uuid:CONF_ID )
First, the Scheduler registers itself in the retention status tracking system (INIT state, for initialization). It then records the data about the part of the configuration it will manage.
[2025-02-11 16:06:05] INFO : [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Setting state to INIT, last_ping:2025-02-11 16:06:05 should be updated after 30s
[2025-02-11 16:06:05] INFO : [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Register in database done ( arbiter_uuid:f52f1c69-f9cf-45ad-ae8d-1c05bcf533bf scheduler_name:scheduler-master shard:514 conf_date:2025-02-11 15:16:24 conf_uuid:6371f59afcdd4f899ade96920ca6bdb5 )
The possible states of a Scheduler in the retention status tracking system are:
| State | Module parameter defining the maximum duration of the state | Description | Maximum duration |
|---|---|---|---|
| INIT |  | The Scheduler registers itself in the retention status tracking system before it can load the retention. | This operation is performed in order to load the retention, so this state must not last longer than the timeout applied to retention loading. |
| LOAD |  | The Scheduler reports that it is loading the retention. |  |
| DEL |  | The Scheduler reports that it is deleting obsolete retention entries. |  |
| IDLE |  | The Scheduler reports that it is doing nothing with the retention (normal operation). | In normal operation, the Scheduler saves the retention periodically, so this state must not last longer than the interval between two saves. |
| SAVE |  | The Scheduler reports that it is saving the retention. | This state cannot last longer than the time allowed for the workers to save the retention. |
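The "since XXXs with expiry in YYYs" figures in the RETENTION STATUS logs can be recomputed from a status record such as the one printed in the FOR_SUPPORT example further down this page. That record contains `state`, `last_ping`, `save_timeout`, `load_timeout`, `save_interval` and `del_timeout`. In the examples, the IDLE figures add up to `save_interval` (603 s + 297 s = 900 s). The Python sketch below follows the table: INIT is bounded by the load timeout, IDLE by the save interval and SAVE by the save timeout. Mapping LOAD to `load_timeout` and DEL to `del_timeout` is an assumption, and so is measuring the time in state from `last_ping`.

```python
# Timeout field bounding each state; the LOAD and DEL mappings are assumptions.
STATE_LIMIT_FIELD = {
    "INIT": "load_timeout",
    "LOAD": "load_timeout",
    "DEL": "del_timeout",
    "IDLE": "save_interval",
    "SAVE": "save_timeout",
}


def state_timing(record, now):
    """Return (seconds_in_state, seconds_before_expiry) for a status record.

    Assumes the time in state is measured from record["last_ping"] (epoch seconds).
    """
    elapsed = int(now - record["last_ping"])
    limit = record[STATE_LIMIT_FIELD[record["state"]]]
    return elapsed, limit - elapsed


def is_expired(record, now):
    """True once the state has reached its maximum duration."""
    return state_timing(record, now)[1] <= 0
```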
If another Scheduler has already registered as responsible for this part of the configuration, the module waits for that Scheduler to release it:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Shard SHARD_ID is already held by SCHEDULERNAME_2 ( reachable and in STATE state since XXXs with expiry in YYYs, remaining wait time: ZZZs )
[2025-02-11 18:42:59] INFO : [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Shard 513 is already held by scheduler-spare ( reachable and in IDLE state since 603s with expiry in 297s, remaining wait time: 55s )
If the Scheduler cannot become responsible for the part of the configuration it wants to manage within a given delay (module parameter scheduler__retention_mongo__load_retention_chunk_timeout), it gives up and waits for a new configuration from the Arbiter.
An additional log line is written for the support team, in case this situation needs to be analyzed.
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Shard SHARD_ID is not available after XXXs, aborting retention loading and going to idle mode
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] [ FOR_SUPPORT ] retention_status_management with arbiter_uuid: ARBITER_ID, conf_date: CONF_TIME, shard_id: SHARD_ID => { DATA }
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Deregister from database done ( arbiter_uuid: ARBITER_ID, scheduler_name: SCHEDULERNAME )
[2025-02-11 18:43:56] WARNING: [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Shard 513 is not available after 301s, aborting retention loading and going to idle mode
[2025-02-11 18:43:56] WARNING: [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] [ FOR_SUPPORT ] retention_status_management with arbiter_uuid: f52f1c69-f9cf-45ad-ae8d-1c05bcf533bf, conf_date: 1739290980, shard_id: 513 => {'scheduler_name': 'scheduler-spare', 'state': 'IDLE', 'save_timeout': 120, 'load_timeout': 300, 'save_interval': 900, 'del_timeout': 20, 'last_ping': 1739295176, 'conf_uuid': '51b98e90c89c440485da93a53665eac2'}
[2025-02-11 18:43:56] INFO : [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Deregister from database done ( arbiter_uuid: f52f1c69-f9cf-45ad-ae8d-1c05bcf533bf, scheduler_name: scheduler-null-3 )
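The waiting behaviour described above can be sketched as a polling loop: the Scheduler polls until the shard is released or until the `scheduler__retention_mongo__load_retention_chunk_timeout` delay runs out. `wait_for_shard`, the `is_shard_free` callback and the 5-second poll interval are hypothetical. The clock and sleep functions are injectable so the loop can be tested without actually waiting.

```python
import time


def wait_for_shard(is_shard_free, timeout, poll_interval=5.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until the shard is free (True) or the timeout expires (False)."""
    deadline = clock() + timeout
    while True:
        if is_shard_free():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # give up: the Scheduler goes back to idle mode
        sleep(min(poll_interval, remaining))
```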
If the other Scheduler that is supposed to manage this part of the configuration is unreachable, this Scheduler replaces it in the retention status tracking system:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] [1/3] Connection to SCHEDULERNAME_2 ( uri="CONNECTION_URL" ) failed with error: ERROR_MESSAGE
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] [2/3] Connection to SCHEDULERNAME_2 ( uri="CONNECTION_URL" ) failed with error: ERROR_MESSAGE
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] [3/3] Connection to SCHEDULERNAME_2 ( uri="CONNECTION_URL" ) failed with error: ERROR_MESSAGE
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Removing ( deregister ) unreachable Scheduler SCHEDULERNAME_2 because it held shard SHARD_ID in STATE state since XXs with expiry in YYYs
Before it can load the retention, the module makes sure that no Scheduler is active on another version of the configuration, and that no Scheduler on the same version of the configuration is currently saving the retention.
If a Scheduler that is supposed to manage another part of the configuration cannot be contacted, it is ignored and considered stopped. It does not block the retention loading:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] [1/3] Connection to SCHEDULERNAME_2 ( uri="CONNECTION_URL" ) failed with error: ERROR_MESSAGE
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] [2/3] Connection to SCHEDULERNAME_2 ( uri="CONNECTION_URL" ) failed with error: ERROR_MESSAGE
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] [3/3] Connection to SCHEDULERNAME_2 ( uri="CONNECTION_URL" ) failed with error: ERROR_MESSAGE
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Ignoring unreachable Scheduler SCHEDULERNAME_2 in STATE state since XXs with expiry in YYYs
[2025-02-11 18:17:27] WARNING: [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] [1/3] Connection to scheduler-other ( uri="http://192.168.1.56:12768/" ) failed with error: ( Connection error to http://192.168.1.56:12768/ : Failed to connect to 192.168.1.56 port 12768 after 2 ms: Could not connect to server )
[2025-02-11 18:17:29] WARNING: [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] [2/3] Connection to scheduler-other ( uri="http://192.168.1.56:12768/" ) failed with error: ( Connection error to http://192.168.1.56:12768/ : Failed to connect to 192.168.1.56 port 12768 after 7 ms: Could not connect to server )
[2025-02-11 18:17:31] WARNING: [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] [3/3] Connection to scheduler-other ( uri="http://192.168.1.56:12768/" ) failed with error: ( Connection error to http://192.168.1.56:12768/ : Failed to connect to 192.168.1.56 port 12768 after 5 ms: Could not connect to server )
[2025-02-11 18:17:33] INFO : [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Ignoring unreachable Scheduler scheduler-other in IDLE state since 24s with expiry in 876s
When another Scheduler is still running an old version of the configuration, the module waits until the Arbiter has sent that Scheduler the new version of the configuration and it has saved its data to the retention:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Scheduler SCHEDULERNAME_2 is reachable and is running an old configuration ( conf_uuid:OLD_CONF_ID created at OLD_CONF_CREATION_TIME, mine with conf_uuid:CONF_ID was created at CONF_CREATION_TIME ), waiting for its configuration update, remaining wait time: SSSs
[2025-02-11 14:38:12] WARNING: [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Scheduler scheduler-other is reachable and is running an old configuration ( conf_uuid:3bf39bebc2b34e3ba6c6e89cff524e50 created at 2025-02-11 11:57:19, mine with conf_uuid:6371f59afcdd4f899ade96920ca6bdb5 was created at 2025-02-11 14:38:03 ), waiting for its configuration update, remaining wait time: 298s
If the Arbiter sends a new configuration, the connection data for the other Schedulers may be unavailable while it is being received:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Unknown Scheduler SCHEDULERNAME_2 is running an old configuration ( conf_uuid:OLD_CONF_ID created at OLD_CONF_CREATION_TIME, mine with conf_uuid:CONF_ID was created at CONF_CREATION_TIME ), waiting for its configuration update, remaining wait time: SSSs
When another Scheduler is running a more recent version of the configuration, loading of the current configuration is cancelled, and the Scheduler waits for the Arbiter to send it this new configuration:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Scheduler SCHEDULERNAME_2 is reachable and has received a more recent configuration ( conf_uuid:NEW_CONF_ID created at NEW_CONF_CREATION_TIME, mine with conf_uuid:CONF_ID was created at CONF_CREATION_TIME ), going to idle mode waiting for this new configuration
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Deregister from database done ( arbiter_uuid: ARBITER_ID, scheduler_name: SCHEDULERNAME )
[2025-02-11 16:46:57] WARNING: [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Scheduler scheduler-other is reachable and has received a more recent configuration ( conf_uuid:e2044759f24a41c39b02656f313fccda created at 2025-02-11 16:45:33, mine with conf_uuid:0694cc6bebea4e4ca7ab4c40b828ccbb was created at 2025-02-11 15:16:24 ), going to idle mode waiting for this new configuration
[2025-02-11 17:23:34] INFO : [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Deregister from database done ( arbiter_uuid: f52f1c69-f9cf-45ad-ae8d-1c05bcf533bf, scheduler_name: scheduler-master )
If the Arbiter sends a new configuration, the connection data for the other Schedulers may be unavailable while it is being received:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Unknown Scheduler SCHEDULERNAME_2 has received a more recent configuration ( conf_uuid:NEW_CONF_ID created at NEW_CONF_CREATION_TIME, mine with conf_uuid:CONF_ID was created at CONF_CREATION_TIME ), going to idle mode waiting for this new configuration
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Deregister from database done ( arbiter_uuid: ARBITER_ID, scheduler_name: SCHEDULERNAME )
When another Scheduler is saving its data to the retention, the module holds the retention loading until that Scheduler has finished:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Scheduler SCHEDULERNAME_2 is reachable and is saving its retention, waiting for its completion, remaining wait time: XXXs
[2025-02-11 16:46:57] WARNING: [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Scheduler scheduler-other is reachable and is saving its retention, waiting for its completion, remaining wait time: 110s
If the Arbiter sends a new configuration, the connection data for the other Schedulers may be unavailable while it is being received:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Unknown Scheduler SCHEDULERNAME_2 is saving its retention, waiting for its completion, remaining wait time: XXXs
[2025-02-11 16:46:57] WARNING: [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Unknown Scheduler scheduler-other is saving its retention, waiting for its completion, remaining wait time: 110s
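The rules for other Schedulers described above can be summarised as a decision function:

- An unreachable Scheduler is ignored.
- An older configuration or a save in progress means waiting.
- A more recent configuration means aborting and going to idle mode.

This Python sketch is an interpretation of the logs, not the module's code. `OtherScheduler`, `decide` and `overall_decision` are hypothetical. Configurations are compared by the creation times printed in the logs, and the precedence between the rules is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OtherScheduler:
    name: str
    reachable: Optional[bool]  # None: "Unknown Scheduler" (connection data unavailable)
    conf_created_at: float     # creation time of the configuration it runs
    state: str                 # INIT, LOAD, DEL, IDLE or SAVE


def decide(my_conf_created_at, other):
    """Decision regarding one other Scheduler: IGNORE, ABORT, WAIT or OK."""
    if other.reachable is False:
        return "IGNORE"  # considered stopped, does not block the loading
    if other.conf_created_at > my_conf_created_at:
        return "ABORT"   # it has a more recent configuration: go to idle mode
    if other.conf_created_at < my_conf_created_at:
        return "WAIT"    # wait for its configuration update
    if other.state == "SAVE":
        return "WAIT"    # wait for its save to complete
    return "OK"


def overall_decision(my_conf_created_at, others):
    """ABORT wins over WAIT, WAIT wins over LOAD (assumed precedence)."""
    decisions = {decide(my_conf_created_at, o) for o in others}
    if "ABORT" in decisions:
        return "ABORT"
    if "WAIT" in decisions:
        return "WAIT"
    return "LOAD"
```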
If the Arbiter sends a new configuration during the retention status check that precedes loading, the Scheduler startup is cancelled so that it can load this new configuration:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] A new configuration has been received, aborting retention loading and going to idle mode
If the Scheduler had already registered as responsible for its part of the configuration, the following log is written, showing that it releases this responsibility:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Deregister from database done ( arbiter_uuid: ARBITER_ID, scheduler_name: SCHEDULERNAME )
[2025-02-11 17:23:34] INFO : [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] A new configuration has been received, aborting retention loading and going to idle mode
[2025-02-11 17:23:34] INFO : [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Deregister from database done ( arbiter_uuid: f52f1c69-f9cf-45ad-ae8d-1c05bcf533bf, scheduler_name: scheduler-master )
If the daemon is stopped during the retention status check:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Daemon has been stopped, aborting retention loading and going to idle mode
If the Scheduler had already registered as responsible for its part of the configuration, the following log is written, showing that it releases this responsibility:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ RETENTION STATUS ] Deregister from database done ( arbiter_uuid: ARBITER_ID, scheduler_name: SCHEDULERNAME )
[2025-02-11 18:35:27] INFO : [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Daemon has been stopped, aborting retention loading and going to idle mode
[2025-02-11 18:35:27] INFO : [ scheduler-master ] [ MongodbRetention ] [ RETENTION STATUS ] Deregister from database done ( arbiter_uuid: f52f1c69-f9cf-45ad-ae8d-1c05bcf533bf, scheduler_name: scheduler-master )
The LOAD RETENTION logs give information about retention loading, so that you can follow its progress and the state of the Mongo connection:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] Try to open a Mongodb connection to [ mongodb://127.0.0.1/?safe=false ] database [ shinken ]
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] Mongo connection established in 4.94ms
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ HOSTS/CLUSTERS ] Scheduler has XXX/XXX hosts/clusters in its cache and need load retention for XXX/XXX
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ HOSTS/CLUSTERS ] Took 3.52ms to load XX/XX hosts/clusters
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ CHECKS ] Scheduler has YYY/YYY checks in its cache and need load retention for YYY/YYY
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ CHECKS ] Took 28.00ms to load YYY/YYY checks
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] Took 32.07ms to load ZZZ/ZZZ elements
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] Took 5.99ms to restore data to Scheduler
Errors during retention loading are also logged in this form:
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] FAILED Retention could not be loaded from mongodb: ERROR MESSAGE DETAILS
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] error querying hosts/clusters entries: ERROR MESSAGE. Module exiting.
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] error querying checks entries: ERROR MESSAGE. Module exiting.
The deletion logs show the number of deleted objects (split between hosts/clusters and checks) and the date from which the retention is kept:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] Checking old elements ( hosts/clusters/checks ) not updated since 7 days -> YYYY-MM-DD HH:MM UTC
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] - XXX hosts/clusters deleted in 377.65ms
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] - YYY checks deleted in 184.476ms
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] Total time for deleting X old elements = 562.126ms
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] Checking old elements ( hosts/clusters/checks ) not updated since 7 days -> YYYY-MM-DD HH:MM UTC
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] - There is no data to delete
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] Total time for deleting X old elements = 1.17ms
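The cutoff printed in `not updated since 7 days -> YYYY-MM-DD HH:MM UTC` is the current UTC time minus the retention period. The Python sketch below computes it, assuming the database stores epoch-second timestamps (as `last_ping` does in the FOR_SUPPORT example above). The `last_update` field name and the query shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now, days=7):
    """UTC datetime before which elements are considered old."""
    return now.astimezone(timezone.utc) - timedelta(days=days)


def format_cutoff(cutoff):
    """Format the cutoff the way the log line shows it."""
    return cutoff.strftime("%Y-%m-%d %H:%M UTC")


def old_elements_filter(cutoff, field="last_update"):
    """MongoDB-style filter for elements not updated since the cutoff.

    The field name is hypothetical; timestamps are assumed to be epoch seconds.
    """
    return {field: {"$lt": cutoff.timestamp()}}
```

With pymongo, such a filter could be passed to `Collection.delete_many`. Its result's `deleted_count` would then give a figure comparable to the `- XXX hosts/clusters deleted` lines.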
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] We have been disconnected of mongo. Will retry [1/3]
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] We have been disconnected of mongo. Will retry [2/3]
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] We have been disconnected of mongo. Will retry [3/3]
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] After 3 tries, we couldn't connect to mongo
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] We have an error:[ERROR MESSAGE]
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] "EXCEPTION PYTHON"