On receiving the SIGUSR1 signal, the module dumps its memory; on any other signal, the module shuts down:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ MANAGE SIGNAL ] The worker with the pid XXXX received a signal XX
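The dispatch described above (SIGUSR1 triggers a dump, anything else stops the worker) can be sketched as follows. This is only an illustration for POSIX systems: the class `WorkerSignalState`, the function `install_handlers`, and the list of registered signals are assumptions, not the module's actual code.

```python
import signal


class WorkerSignalState(object):
    """Hypothetical holder for what the worker should do after a signal."""

    def __init__(self):
        self.dump_requested = False
        self.must_stop = False

    def handle(self, signum, frame=None):
        # SIGUSR1 asks for a memory dump; any other signal asks for a shutdown.
        if signum == signal.SIGUSR1:
            self.dump_requested = True
        else:
            self.must_stop = True


def install_handlers(state, signums=(signal.SIGUSR1, signal.SIGTERM, signal.SIGINT)):
    # Assumption: the real module may register a different set of signals.
    for signum in signums:
        signal.signal(signum, state.handle)
```

The handler only sets flags, so the worker's main loop can perform the dump or the shutdown at a safe point.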
When the master (controlling) process stops unexpectedly:
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER:XXXXX ] I am a worker with pid: XXXX and my master process YYYY is dead, I exit.
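How the worker detects that its master is dead is not described here. One common POSIX approach, shown as a sketch under that assumption, is to compare the current parent PID with the PID recorded at start-up: when the parent dies, the worker is re-parented and the value changes. `master_is_alive` and `orphan_message` are hypothetical helper names.

```python
import os


def master_is_alive(master_pid, getppid=os.getppid):
    # Assumption: the worker remembered master_pid when it was started.
    # If the parent PID changed, the original master no longer exists.
    return getppid() == master_pid


def orphan_message(worker_pid, master_pid):
    # Message text copied from the log line documented above.
    return ("I am a worker with pid: %d and my master process %d is dead, I exit."
            % (worker_pid, master_pid))
```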
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER pid=XXXX ] (support-only) Memory information dumped to file FFFFFFF (to be sent to the support)
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER pid=XXXX ] (support-only) MEMORY DUMP: FAIL check if meliae lib is installed
In the following logs, the keyword SOUS-SECTION can take one of the following values:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] We are creating mongo connection [uri=mongodb://192.168.1.120/?safe=false] [database=shinken] [ssh=True]
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] Connection created in : 0.200s

[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] Mongo connection failed 1/X time, we will try again
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] Mongo connection failed Y/X times, we will try again
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] Mongo connection failed X/X times, we stop trying
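The sequence of WARNING lines followed by a final ERROR corresponds to a bounded retry loop. The sketch below reproduces the message pattern only; `connect_with_retries`, its `connect` callable, and the `log` callback are hypothetical names, and the real module may wait between attempts or handle only specific exceptions.

```python
def connect_with_retries(connect, max_tries, log=print):
    """Call connect() up to max_tries times; return its result or None."""
    for attempt in range(1, max_tries + 1):
        try:
            return connect()
        except Exception:
            # The documented messages use "time" for the first failure only.
            unit = "time" if attempt == 1 else "times"
            prefix = "Mongo connection failed %d/%d %s" % (attempt, max_tries, unit)
            if attempt < max_tries:
                log("WARNING", prefix + ", we will try again")
            else:
                log("ERROR", prefix + ", we stop trying")
    return None
```

With `max_tries=3` and a connection that always fails, this emits two WARNING messages and one ERROR message, matching the 1/X, Y/X, X/X pattern above.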
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] We need to create a mongo connection
followed by the logs of a normal connection
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] Could not create mongo connection
If several Mongo URLs are specified:
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MODULES-MANAGER ] The instance MongodbRetention raised an error: Multiples urls were found in the module's configuration file. I disable it and set it to restart it later
For the retention save, three types of logs exist:
| Section | Description |
|---|---|
| SAVE GLOBAL | The overall save process |
| SAVE WORKERS | A sub-process of SAVE GLOBAL that manages the queue of save workers |
| SAVE WORKER X | A sub-process of SAVE WORKERS: the worker numbered X, which saves part of the scheduler's information to the database. The number of workers is configurable in the module parameters (see the page "Rétention en base de données centralisée par royaume ( Module MongodbRetention )") |
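To sort log lines into the three sections of the table, a small parser can be enough. The sketch below assumes every line follows the `[timestamp] LEVEL : [ tag ] [ tag ] ... message` layout used on this page; `parse_line` and `save_section` are hypothetical helper names, and the scheduler name and date in the usage note are made-up sample values.

```python
import re

# "[timestamp] LEVEL :" prefix; tolerates both "INFO :" and "WARNING:".
HEADER = re.compile(r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>[A-Z]+)\s*:\s*(?P<rest>.*)$")
# One "[ tag ]" group, with optional padding inside the brackets.
TAG = re.compile(r"\[\s*(?P<tag>[^\]]*?)\s*\]\s*")


def parse_line(line):
    """Split a log line into timestamp, level, leading tags and message."""
    header = HEADER.match(line.strip())
    if header is None:
        return None
    rest = header.group("rest")
    tags = []
    pos = 0
    # Every leading "[ ... ]" group is treated as a tag (this includes
    # durations such as "[ X.XXXs ]" when they precede the message text).
    while True:
        match = TAG.match(rest, pos)
        if match is None:
            break
        tags.append(match.group("tag"))
        pos = match.end()
    return {
        "timestamp": header.group("timestamp"),
        "level": header.group("level"),
        "tags": tags,
        "message": rest[pos:],
    }


def save_section(tags):
    """Return the save section of the table, or None for other lines."""
    for tag in tags:
        if tag in ("SAVE GLOBAL", "SAVE WORKERS"):
            return tag
        if re.fullmatch(r"SAVE WORKER \d+", tag):
            return "SAVE WORKER X"
    return None
```

For example, calling `parse_line` on a line tagged `[ sched-1 ] [ MongodbRetention ] [ SAVE WORKER 0 ]` and passing the resulting tags to `save_section` yields `"SAVE WORKER X"`.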
The SAVE GLOBAL logs give information about the overall operation of the module or its configuration.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE GLOBAL ] Starting to save retention data. [XXX:hosts] [XXX:checks] (Database used = mongodb://HOST/?safe=false, use ssh = False)
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE GLOBAL ] SUCCESS Retention data was saved into mongodb. Total time X.XXs
Errors during the retention save are also recorded in the logs, in this form:
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MODULES-MANAGER ] The instance MongodbRetention raised an error: ERROR MESSAGE. Total time XX.XXs. I disable it and set it to restart it later
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MODULES-MANAGER ] The instance MongodbRetention raised an error: [ SAVE GLOBAL ] FAILED Retention data could not be saved in mongodb. Total time 22.20s. I disable it and set it to restart it later
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MODULES-MANAGER ] The instance MongodbRetention raised an error: [ SAVE GLOBAL ] FAILED Retention data could not be saved in mongodb because mongo is unreachable. Total time 2.11s. I disable it and set it to restart it later
The SAVE WORKERS logs give the state of each worker, from its creation to its success or failure.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Starting worker X with pid XXXXX. Try: X/X
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] The worker X did SUCCESS (after X try)
Preparing the data to save took a long time:
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ PERF ] [ X.XXXs ] atomization duration
Errors prevent the save from completing properly:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] some workers did fail to exit or encountered an error. The retention save can be incomplete.

[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Too many tries failed

[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Cannot start the XXXXX worker process as there is not enough memory

[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Cannot start the worker X process: XX. Exiting the retention save, killing all currently launched workers

[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] ERROR MESSAGE
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] "EXCEPTION PYTHON"
The SAVE WORKER X logs give, for the worker with identifier X, statistics on the saves it performed: the number of elements, the result, and the execution time.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER 0 ] Updating retention with elements: checks [ XXX ] -- hosts [ XX ] in mongodb
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER 0 ] Retention data saved into mongodb in X.XXX seconds

[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] The worker (pid:XXXX | try:XX) did not exit on time (XX s). We are restarting it.

[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Failed connection with the following message : ERROR MESSAGE

[YYYY-MM-DD HH:MM:SS] WARNING: [SCHEDULERNAME] [ MongodbRetention ] [ SAVE WORKER X ] worker has been disconnected of mongo. Will retry [1/X]
[YYYY-MM-DD HH:MM:SS] WARNING: [SCHEDULERNAME] [ MongodbRetention ] [ SAVE WORKER X ] worker has been disconnected of mongo. Will retry [Y/X]
[YYYY-MM-DD HH:MM:SS] WARNING: [SCHEDULERNAME] [ MongodbRetention ] [ SAVE WORKER X ] worker has been disconnected of mongo. Will retry [X/X]
[YYYY-MM-DD HH:MM:SS] ERROR : [SCHEDULERNAME] [ MongodbRetention ] [ SAVE WORKER X ] After X tries, worker could not connect to mongo :[ERROR MESSAGE]
[YYYY-MM-DD HH:MM:SS] ERROR : [SCHEDULERNAME] [ MongodbRetention ] [ SAVE WORKER X ] (pid=XXXX) "EXCEPTION PYTHON"

[YYYY-MM-DD HH:MM:SS] ERROR : [SCHEDULERNAME] [ MongodbRetention ] [ SAVE WORKER X ] Worker has an error: [ ERROR MESSAGE ]
[YYYY-MM-DD HH:MM:SS] ERROR : [SCHEDULERNAME] [ MongodbRetention ] [ SAVE WORKER X ] (pid=XXXX) "EXCEPTION PYTHON"
These logs provide information about the retention load, making it possible to follow its progress and the state of the connection to Mongo.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ HOSTS / CLUSTERS ] [ X.XXXs ] We took X hosts/clusters from the retention [ in scheduler hosts/clusters : without retention=X / total=1 ]
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ HOSTS / CLUSTERS ] No host/cluster are needed for retention load (scheduler already have all X hosts retention data).
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ CHECKS ] [ X.XXXs ] We took X checks from the retention [ in scheduler checks : without retention=XX / total=XX ]
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ CHECKS ] No checks are needed for retention load (scheduler already have all X checks retention data).
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ X.XXXs] Total number of elements load from mongo database: X ( scheduler have a total of XX elements )
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ X.XXXs ] SUCCESS Retention data loaded successfully.
Errors during the retention load are also recorded in the logs, in this form:
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] FAILED Retention data could not be loaded from mongodb: ERROR MESSAGE DETAILS
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] error querying host entries: ERROR MESSAGE. Module exiting.
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] error querying checks entries: ERROR MESSAGE. Module exiting.
The deletion logs show the number of objects deleted (broken down by hosts and checks) as well as the date from which retention data is kept.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] We will delete all retention data that were saved before the XXXX-XX-XX XX:XX UTC (X days)
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] - Deleting XXX hosts from old retention [XXXX by XXXX]
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] - XXX - hosts deleted in X.XXXs
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] - Deleting XXX services from old retention [XXXX by XXXX]
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] - XXX - services deleted in X.XXXs
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] Total time for deleting XXXX entries = X.XXXs

[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] We will delete all retention data that were saved before the XXXX-XX-XX XX:XX UTC (X days)
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] There is no data to delete
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] Total time for deleting 0 entries = X.XXXs

[YYYY-MM-DD HH:MM:SS] WARNING: [SCHEDULERNAME] [ MongodbRetention ] [ DELETE OLD RETENTION ] We have been disconnected of mongo. Will retry [1/3]
[YYYY-MM-DD HH:MM:SS] WARNING: [SCHEDULERNAME] [ MongodbRetention ] [ DELETE OLD RETENTION ] We have been disconnected of mongo. Will retry [2/3]
[YYYY-MM-DD HH:MM:SS] WARNING: [SCHEDULERNAME] [ MongodbRetention ] [ DELETE OLD RETENTION ] We have been disconnected of mongo. Will retry [3/3]
[YYYY-MM-DD HH:MM:SS] ERROR : [SCHEDULERNAME] [ MongodbRetention ] [ DELETE OLD RETENTION ] After 3 tries, we couldn't connect to mongo

[YYYY-MM-DD HH:MM:SS] ERROR : [SCHEDULERNAME] [ MongodbRetention ] [ DELETE OLD RETENTION ] We have an error:[ERROR MESSAGE]
[YYYY-MM-DD HH:MM:SS] ERROR : [SCHEDULERNAME] [ MongodbRetention ] [ DELETE OLD RETENTION ] "EXCEPTION PYTHON"
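The cutoff date announced in the first DELETE OLD RETENTION line can be illustrated with a short calculation. This sketch assumes the cutoff is simply the current UTC time minus the configured number of days; the source does not show how the module computes it, and `retention_cutoff` and `cutoff_message` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now_utc, days):
    # Assumption: data saved before (now - days) is considered old.
    return now_utc - timedelta(days=days)


def cutoff_message(now_utc, days):
    cutoff = retention_cutoff(now_utc, days)
    # Same layout as the documented line: "XXXX-XX-XX XX:XX UTC (X days)".
    return ("We will delete all retention data that were saved before the %s UTC (%d days)"
            % (cutoff.strftime("%Y-%m-%d %H:%M"), days))
```

In real use, `now_utc` would be `datetime.now(timezone.utc)`; passing it as a parameter keeps the calculation easy to check.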
Before saving the retention, the module reports the URI used as well as the total number of hosts and checks to save.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE GLOBAL ] Starting to save retention data with X worker(s). [YY:hosts] [ZZ:checks] (Database used = mongodb://127.0.0.1/?safe=false, use ssh = True/False)
In the example:
The save to the Mongo database is done with several workers; one log per worker reports its PID when it is created.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Starting worker 0 with pid XXXXX. Try: 1/3
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Starting worker 1 with pid YYYYY. Try: 1/3
When the Mongo module connects to a database, the following log appears:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] We are creating mongo connection [uri=mongodb://127.0.0.1/?safe=false] [database=shinken_retention] [ssh=False]
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Connection created in : 0.006s
It indicates:
Once the workers have been created, each worker reports its progress with the following logs:
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Updating retention with elements: checks [ YY ] -- hosts [ ZZ ] in mongodb
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Retention data saved into mongodb in 0.018 seconds
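Each worker saves only part of the scheduler's elements, which is why the counts in the progress line differ from the global totals. The source does not say how elements are distributed, so the round-robin split below is purely an assumption for illustration; `split_for_workers` and `progress_message` are hypothetical names.

```python
def split_for_workers(items, worker_count):
    # Assumption: round-robin distribution; worker i gets items i, i+n, i+2n, ...
    return [items[i::worker_count] for i in range(worker_count)]


def progress_message(checks, hosts):
    # Message text copied from the progress line documented above.
    return ("Updating retention with elements: checks [ %d ] -- hosts [ %d ] in mongodb"
            % (len(checks), len(hosts)))
```

With three hosts and two workers, worker 0 would receive two hosts and worker 1 one host, and each would report its own counts.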
We are therefore informed of:
If the save goes well, the logs report the success of each worker, then the overall success once all workers have saved their data to the Mongo database, along with the total time taken.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] The worker 0 did SUCCESS (after 1 try)
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] The worker 1 did SUCCESS (after 1 try)
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE GLOBAL ] Retention data was saved into mongodb. Total time 0.28s
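When reviewing a save after the fact, the per-worker success lines and the global total time can be pulled out of the log with two regular expressions. This is a sketch based only on the message texts shown on this page; `summarize_save` is a hypothetical helper, and the sample lines in the usage note use a made-up scheduler name.

```python
import re

WORKER_OK = re.compile(r"The worker (\d+) did SUCCESS \(after (\d+) try\)")
TOTAL_TIME = re.compile(r"Total time (\d+(?:\.\d+)?)s")


def summarize_save(lines):
    """Return ({worker_id: tries}, total_seconds or None) for one save run."""
    tries_by_worker = {}
    total_seconds = None
    for line in lines:
        ok = WORKER_OK.search(line)
        if ok:
            tries_by_worker[int(ok.group(1))] = int(ok.group(2))
            continue
        total = TOTAL_TIME.search(line)
        # Only the SAVE GLOBAL success line carries the overall duration here.
        if total and "SAVE GLOBAL" in line and "raised an error" not in line:
            total_seconds = float(total.group(1))
    return tries_by_worker, total_seconds
```

A run where `total_seconds` stays `None` did not log a global success, which is a hint to look for the SAVE WORKERS or MODULES-MANAGER error lines.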