Gestion du module
Sur réception du signal SIGUSR1 le module va effectuer un dump de sa mémoire, pour tout autre signal, le module va s'arrêter :
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ MANAGE SIGNAL ] The worker with the pid XXXX received a signal XX
Arrêt critique
Quand le processus de pilotage s'arrête de façon inopinée :
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER:XXXX ] I am a worker with pid: XXXX and my master process YYYY is dead, I exit.
Demande d'un dump de la mémoire
Le dump est fait
Python 2.6
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER pid=XXXX ] (support-only) MEMORY DUMP (to be sent to the support): xxxxxxxx xxxxxxxx xxxxxxxx
Python 2.7
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER pid=XXXX ] (support-only) Memory information dumped to file FFFFFFF (to be sent to the support)
Le dump a échoué
Python 2.6
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER pid=XXXX ] MEMORY DUMP: FAIL check if guppy lib is installed
Python 2.7
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ WORKER pid=XXXX ] (support-only) MEMORY DUMP: FAIL check if meliae lib is installed
Connexion à la base de données
| Section | Description |
|---|---|
| LOAD RETENTION | Correspond au chargement de la rétention |
| DELETE OLD RETENTION | Correspond à la suppression des anciennes rétentions |
| SAVE | Correspond à la sauvegarde |
Connexion normale
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] We are creating mongo connection [uri=mongodb://192.168.1.120/?safe=false] [database=shinken] [ssh=True] [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] Connection created in : 0.200s
Il y indique :
- L'URL utilisée
- La base de données (peut être différente du défaut "shinken" comme ici)
- Si un tunnel SSH va être utilisé ou pas
- Le temps prit pour se connecter à la base mongo
La connexion échoue
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] Mongo connection failed 1/X time, we will try again [YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] Mongo connection failed Y/X times, we will try again [YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] Mongo connection failed X/X times, we stop trying
La connexion a été perdue ou n'existe pas
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] We need to create a mongo connection
La connexion n'a pas pu être établie
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SOUS-SECTION ] Could not create mongo connection
Erreur de configuration du module
Si plusieurs url mongo sont précisées
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MODULES-MANAGER ] The instance MongodbRetention raised an error: Multiples urls were found in the module's configuration file. I disable it and set it to restart it later
Sauvegarde en rétention
Pour la sauvegarde de la rétention, trois SOUS-SECTION existent:
| Section | Description |
|---|---|
| SAVE GLOBAL | Correspond au processus global de la sauvegarde |
| SAVE WORKERS | Correspond à un sous-processus de SAVE GLOBAL, qui s'occupe de la file d'attente des différents workers de la sauvegarde |
| SAVE WORKER X | C'est un sous-processus de SAVE WORKERS, correspondant à un worker numéroté X qui permet de sauvegarder une partie des informations du scheduler en base. Le nombre de workers est paramétrable dans les paramètres du module. ( voir Rétention en base de données centralisée par royaume ( Module MongodbRetention ) ) |
SAVE GLOBAL
Les logs SAVE GLOBAL donnent des informations relatives au fonctionnement global du module ou de sa configuration.
Avant de faire la rétention, le module nous informe de l'URI utilisé ainsi que du nombre total d'hôtes et de checks à sauvegarder.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE GLOBAL ] Starting to save retention with VV worker(s). [ XX:hosts/clusters ] [ YY:checks ] ( Database used = mongodb://127.0.0.1safe=false, use ssh = 0 ), max time allowed for the save ZZ seconds
Dans l'exemple :
- VV : Le nombre de workers lancés en parallèle pour effectuer la sauvegarde.
- XX : Le nombre d'hôtes et clusters qui vont être sauvegardés.
- YY : Le nombre de checks qui vont être sauvegardés.
- ZZ : Le temps défini pour que la sauvegarde de la rétention se réalise
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE GLOBAL ] Retention was saved into mongodb. Total time X.XXs
Erreurs
Les erreurs lors de la sauvegarde de la rétention sont aussi enregistrées dans les logs sous cette forme:
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MODULES-MANAGER ] The instance MongodbRetention raised an error: ERROR MESSAGE. Total time XX.XXs. I disable it and set it to restart it later
Exemples
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MODULES-MANAGER ] The instance MongodbRetention raised an error: [ SAVE GLOBAL ] FAILED Retention data could not be saved in mongodb. Total time 22.20s. I disable it and set it to restart it later
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MODULES-MANAGER ] The instance MongodbRetention raised an error: [ SAVE GLOBAL ] FAILED Retention could not be saved in mongodb because mongo is unreachable. Total time 123.46s. I disable it and set it to restart it later
SAVE WORKERS
Les logs SAVE WORKERS donnent l'état de chaque worker de sa création à son succès/échec.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Starting worker X with pid XXXX. Try: [ Y ], max time allowed [ ZZs ] [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] The worker X successfully ended ( after Y tries )
La préparation des données à sauvegarder a été longue :
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ PERF ] [ X.XXXs ] atomization duration
Des erreurs empêchent le bon déroulé de la sauvegarde :
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] some workers did fail to exit or encountered an error. The retention save can be incomplete.
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Too many tries failed
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Cannot start the XXXX worker process as there is not enough memory
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] Cannot start the worker XXXX process: XX. Exiting the retention save, killing all currently launched workers
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] ERROR MESSAGE [YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKERS ] "EXCEPTION PYTHON"
SAVE WORKER X
Les logs SAVE WORKER X donnent pour le worker ayant l'identifiant X, les statistiques sur les sauvegardes qu'il a effectuées.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Preparing elements to save [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Took X.XXms to prepare XXX hosts/clusters and XXXX checks [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Took X.XXms to connect to Mongo [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] hosts/clusters will be saved in groups of maximum 1000 [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Saved XXX/XXX hosts/clusters ( took X.XXms ) [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Took X.XXms to save XXX hosts/clusters [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] checks will be saved in groups of maximum 1000 [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Saved XXXX/XXXX checks ( took X.XXms ) [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Saved XXXX/XXXX checks ( took X.XXms ) [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Took X.XXms to save XXXX checks [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Worker ended in X.XXms
Nous sommes donc informés :
- Du démarrage du worker
- Du temps que le worker met a préparer les éléments ( sélection, sérialisation)
- Du temps prit pour se connecter à la base Mongo
- De la taille des groupes d'éléments sauvegardés
- De l'avancement de chaque groupe et du temps prit
- Du temps total pris par le worker
Erreurs
Perte de connexion à la base de données
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] [ MONGO ] Mongo raised ( Mongo connection failure to xxxxxxx ) on the operation get_connection. Operation failed : 1/5 [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] [ MONGO ] Mongo raised ( Mongo connection failure to xxxxxxx ) on the operation get_connection. Operation failed : 2/5 [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] [ MONGO ] Mongo raised ( Mongo connection failure to xxxxxxx ) on the operation get_connection. Operation failed : 3/5 [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] [ MONGO ] Mongo raised ( Mongo connection failure to xxxxxxx ) on the operation get_connection. Operation failed : 4/5 [YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] [ MONGO ] Mongo raised ( Mongo connection failure to xxxxxxx ) on the operation get_connection. Operation failed : 5/5. We tried 5 times but it kept failing. [YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] After 5 tries, worker could not connect to mongo :[Mongo raised ( Mongo connection failure to xxxxxxx ) on the operation get_connection. Operation failed : 5/5. We tried 5 times but it kept failing.]
Erreur Inconnue
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] Worker has an error:[ ERROR MESSAGE ] [YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ SAVE WORKER X ] (pid=XXXX) "EXCEPTION PYTHON"
Chargement de la rétention
Les logs fournissent des informations liées au chargement de la rétention, permettant de suivre son avancée et l'état sur la connexion à Mongo.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] Try to open a Mongodb connection to [ mongodb://127.0.0.1/?safe=false ] database [ shinken ] [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] Mongo connection established in 4.94ms [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ HOSTS/CLUSTERS ] Scheduler has XXX/XXX hosts/clusters in its cache and need load retention for XXX/XXX [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ HOSTS/CLUSTERS ] Took 3.52ms to load XX/XX hosts/clusters [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ CHECKS ] Scheduler has YYY/YYY checks in its cache and need load retention for YYY/YYY [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] [ CHECKS ] Took 28.00ms to load YYY/YYY checks [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] Took 32.07ms to load ZZZ/ZZZ elements [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] Took 5.99ms to restore data to Scheduler
Erreurs
Les erreurs lors du chargement de la rétention sont aussi enregistrées dans les logs sous cette forme:
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] FAILED Retention could not be loaded from mongodb: ERROR MESSAGE DETAILS
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] error querying hosts/clusters entries: ERROR MESSAGE. Module exiting.
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ LOAD RETENTION ] error querying checks entries: ERROR MESSAGE. Module exiting.
Suppression des anciennes rétentions
Les logs de suppression permettent de voir le nombre d'objets supprimés (triés par hôtes et checks) ainsi que la date à partir de laquelle la rétention est conservée.
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] Checking old elements ( hosts/clusters/checks ) not updated since 7 days -> YYYY-MM-DD HH:MM UTC [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] - XXX hosts/clusters deleted in 377.65ms [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] - YYY checks deleted in 184.476ms [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] Total time for deleting X old elements = 562.126ms
[YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] Checking old elements ( hosts/clusters/checks ) not updated since 7 days -> YYYY-MM-DD HH:MM UTC [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] - There is no data to delete [YYYY-MM-DD HH:MM:SS] INFO : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] Total time for deleting X old elements = 1.17ms
Erreur : perte de connexion à la base de données
[YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] We have been disconnected of mongo. Will retry [1/3] [YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] We have been disconnected of mongo. Will retry [2/3] [YYYY-MM-DD HH:MM:SS] WARNING: [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] We have been disconnected of mongo. Will retry [3/3] [YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] After 3 tries, we couldn't connect to mongo
[YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] We have an error:[ERROR MESSAGE] [YYYY-MM-DD HH:MM:SS] ERROR : [ SCHEDULERNAME ] [ MongodbRetention ] [ DELETE OLD RETENTION ] "EXCEPTION PYTHON"