
Releasing resources when creating a process (fork)

To run certain tasks (modules, workers, ...), the Broker must create new processes on the system.

On UNIX environments, and Linux in particular, a new process is created by duplicating the current process. The associated system call is named fork, so this action is often called a fork.

On Linux, the fork operation is fast, because the memory of the new process is only actually allocated when it is written to (copy-on-write).

Python has a garbage collector, a tool that walks through memory to identify unused areas and return them to the system. Because this bookkeeping touches the inherited memory, the memory inherited from the parent ends up being fully copied into the new process.

To avoid wasting resources, unneeded data inherited from the parent process (mainly memory) is cleaned up after the fork.

More concretely, the Broker deletes the data produced by the modules it has instantiated.

Modules can provide a method to handle the release of their own resources.

In all cases, the following log lets you follow the resource cleanup:

Code Block
languagejs
themeConfluence
[YYYY-MM-DD HH:MM:SS] INFO: [ MODULE_NAME ] [ CLEAN AFTER FORK ] [ pid:PID] cleanup is starting
[YYYY-MM-DD HH:MM:SS] INFO: [ MODULE_NAME ] [ CLEAN AFTER FORK ] [ pid:PID] cleanup done in X.XXXs
Code Block
languagetext
themeEmacs
titleExample
[2021-10-11 06:10:52] INFO   : [ WebUI           ] [ CLEAN AFTER FORK ] [ pid:3736 ] cleanup is starting
[2021-10-11 06:10:52] INFO   : [ WebUI           ] [ CLEAN AFTER FORK ] [ pid:3736 ] cleanup done in 0.143s
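
The cleanup step described above can be pictured with the following minimal sketch. It is not the Shinken implementation: the names do_after_fork_cleanup and after_fork_cleanup are borrowed from the traceback shown on this page, while the CachingModule and FailingModule classes, the logger name and the return value are assumptions made for illustration. It shows an optional per-module hook being called, timed, and protected so that an error in one module does not stop the new process.

```python
import logging
import time
import traceback

logger = logging.getLogger("clean_after_fork")  # hypothetical logger name


class CachingModule(object):
    """Hypothetical module whose cache is useless in a forked child."""

    def __init__(self, name):
        self.name = name
        self.cache = {"results": list(range(1000))}

    def after_fork_cleanup(self):
        # Drop the data inherited from the parent process.
        self.cache.clear()


class FailingModule(object):
    """Hypothetical module whose cleanup hook raises an error."""

    name = "sla"

    def after_fork_cleanup(self):
        raise IOError("IOError occurred")


def do_after_fork_cleanup(modules):
    """Call each module's cleanup hook; return the names of modules that failed."""
    failed = []
    for module in modules:
        cleanup = getattr(module, "after_fork_cleanup", None)
        if cleanup is None:
            continue  # the hook is optional
        start = time.time()
        logger.info("[ %s ] cleanup is starting", module.name)
        try:
            cleanup()
        except Exception:
            # Log the error and carry on: some memory may stay allocated,
            # but the new process keeps running.
            logger.error("[ %s ] cleanup raised:\n%s", module.name, traceback.format_exc())
            failed.append(module.name)
        logger.info("[ %s ] cleanup done in %.3fs", module.name, time.time() - start)
    return failed
```

Calling do_after_fork_cleanup([CachingModule("WebUI"), FailingModule()]) empties the first module's cache and reports the second one as failed without interrupting the loop.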

An error occurs while cleaning up a module's resources

If a module's cleanup method encounters an error, the following log is produced.

Some resources (memory in particular) may not have been released, but if the system has enough available, Shinken will run normally.

Code Block
languagejs
themeConfluence
[YYYY-MM-DD HH:MM:SS] ERROR : [ MODULE_NAME ] [ CLEAN AFTER FORK ] [ pid:PID] [ MODULE_NAME2 ] On linux system, the forking mechanism (process creation) is fast but create a copy of the father process. So we have to release unnecessary resources inherited from father. The cleaning has been performed but we encountered an error:
[YYYY-MM-DD HH:MM:SS] ERROR : [ MODULE_NAME ] [ CLEAN AFTER FORK ] [ pid:PID] [ MODULE_NAME2 ] Cleanup of data from module 〖 MODULE_NAME2 〗 raised error 〖 IOError occurred 〗
[YYYY-MM-DD HH:MM:SS] ERROR : [ MODULE_NAME ] [ CLEAN AFTER FORK ] [ pid:PID] [ MODULE_NAME2 ] Some memory may have not been freed. Shinken will still run if enough memory remains available.
[YYYY-MM-DD HH:MM:SS] ERROR : [ MODULE_NAME ] [ CLEAN AFTER FORK ] [ pid:PID] [ MODULE_NAME2 ] You can report this message to support in order to optimize Shinken memory consumption
[YYYY-MM-DD HH:MM:SS] ERROR : [ MODULE_NAME ] [ CLEAN AFTER FORK ] [ pid:PID] [ MODULE_NAME2 ] ERROR stack : Traceback (most recent call last):
[YYYY-MM-DD HH:MM:SS] ERROR : [ MODULE_NAME ] [ CLEAN AFTER FORK ] [ pid:PID] [ MODULE_NAME2 ] PYTHON_ERROR
Code Block
languagetext
themeEmacs
titleExample
[2021-09-29 15:05:58] ERROR  : [ Livestatus       ] [ CLEAN AFTER FORK ] [ pid:20404 ] [ sla ] On linux system, the forking mechanism (process creation) is fast but create a copy of the father process. So we have to release unnecessary resources inherited from father. The cleaning has been performed but we encountered an error:
[2021-09-29 15:05:58] ERROR  : [ Livestatus       ] [ CLEAN AFTER FORK ] [ pid:20404 ] [ sla ] Cleanup of data from module 〖 sla 〗 raised error 〖 IOError occurred 〗
[2021-09-29 15:05:58] ERROR  : [ Livestatus       ] [ CLEAN AFTER FORK ] [ pid:20404 ] [ sla ] Some memory may have not been freed. Shinken will still run if enough memory remains available.
[2021-09-29 15:05:58] ERROR  : [ Livestatus       ] [ CLEAN AFTER FORK ] [ pid:20404 ] [ sla ] You can report this message to support in order to optimize Shinken memory consumption
[2021-09-29 15:05:58] ERROR  : [ Livestatus       ] [ CLEAN AFTER FORK ] [ pid:20404 ] [ sla ] ERROR stack : Traceback (most recent call last):
[2021-09-29 15:05:58] ERROR  : [ Livestatus       ] [ CLEAN AFTER FORK ] [ pid:20404 ] [ sla ]   File "/usr/lib/python2.7/site-packages/shinken/modulesmanager.py", line 877, in do_after_fork_cleanup
[2021-09-29 15:05:58] ERROR  : [ Livestatus       ] [ CLEAN AFTER FORK ] [ pid:20404 ] [ sla ]     inst_cleanup()
[2021-09-29 15:05:58] ERROR  : [ Livestatus       ] [ CLEAN AFTER FORK ] [ pid:20404 ] [ sla ]   File "/var/lib/shinken/modules/sla/sla_module_broker.py", line 93, in after_fork_cleanup
[2021-09-29 15:05:58] ERROR  : [ Livestatus       ] [ CLEAN AFTER FORK ] [ pid:20404 ] [ sla ]     raise IOError(u'IOError occurred')
[2021-09-29 15:05:58] ERROR  : [ Livestatus       ] [ CLEAN AFTER FORK ] [ pid:20404 ] [ sla ] IOError: IOError occurred

Sending and receiving commands between processes

The following logs let you follow commands being sent and received between processes.

There are two types of command-based communication:

  • Communication between the Broker and one of its modules
  • Communication between a module and one of its workers

Sending a command

Warning

If the command times out a first time, the following log is displayed. The command is then sent a second time.

Code Block
languagejs
themeConfluence
[YYYY-MM-DD HH:MM:SS] WARNING : [ BROKER_NAME ] [ MODULE_NAME ] [ COMMAND CALL ] [ PID:XXXX ] [ THREAD_NAME ] The command call [ COMMAND_NAME ] for module MODULE_NAME was sent, but the call timed out (TIMEOUT_TIMEs). Will retry one time.
Code Block
languagetext
themeEmacs
titleExample
[2020-11-17 09:12:11] WARNING  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ] The command call [get_module_info] for module Livestatus was sent but the call timed out (1s). We will retry one time.

When a command is sent, a different answer may be received, for example if the previous command did not work. In that case, the following log is displayed, and the command is sent again to get the right answer.

Code Block
languagejs
themeConfluence
[YYYY-MM-DD HH:MM:SS] WARNING : [ BROKER_NAME ] [ MODULE_NAME ] [ COMMAND CALL ] [ PID:XXXX ] [ THREAD_NAME ] The command call [ COMMAND_NAME ] was sent but another answer was received. Retrying.
Code Block
languagetext
themeEmacs
titleExample
[2020-11-17 09:12:11] WARNING  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ] The command call [get_module_info] for module Livestatus was sent but another answer was received. Retrying.

Error

If sending the command has already timed out once, a second timeout puts it in error and it is not sent again. The following log is displayed:

Code Block
languagejs
themeConfluence
[YYYY-MM-DD HH:MM:SS] ERROR : [ BROKER_NAME ] [ MODULE_NAME ] [ COMMAND CALL ] [ PID:XXXX ] [ THREAD_NAME ] Failed to send command call [ COMMAND_NAME ] for module MODULE_NAME because of timeout (TIMEOUT_TIMEs).
Code Block
languagetext
themeEmacs
titleExample
[2020-11-17 09:12:11] ERROR  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ] Failed to send command call [get_module_info] for module Livestatus because of timeout (1s).

A command may also fail because of a problem on the module/worker side. In that case, the following log is displayed after the stack trace.

Code Block
languagejs
themeConfluence
[YYYY-MM-DD HH:MM:SS] ERROR : [ BROKER_NAME ] [ MODULE_NAME ] [ COMMAND CALL ] [ PID:XXXX ] [ THREAD_NAME ] Failed to send command call [ COMMAND_NAME ] for module MODULE_NAME because of error: ERROR_MESSAGE
Code Block
languagetext
themeEmacs
titleExample
[2020-11-17 09:12:11] ERROR  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ] Failed to send command call [get_module_info] for module Livestatus because of error: 'int' object is not iterable

Debug

When the command is sent successfully, all the way to its reception, the following log is displayed:

Code Block
languagejs
themeConfluence
[YYYY-MM-DD HH:MM:SS] DEBUG : [ BROKER_NAME ] [ MODULE_NAME ] [ COMMAND CALL ] [ PID:XXXX ] [ THREAD_NAME ] The command call [ COMMAND_NAME ] was executed by the module MODULE_NAME in RUNNING_TIMEs
Code Block
languagetext
themeEmacs
titleExample
[2020-11-17 09:12:11] DEBUG  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ] The command call [get_module_info] was executed by the module Livestatus in 0.143s
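
The sending behaviour described in this section (a retry after a first timeout or after an unexpected answer, then an error) can be sketched as follows. This is only an illustration under stated assumptions: the function call_command, the CommandFailed exception, the use of a queue.Queue of (command name, result) pairs and the limit of two attempts for both failure cases are hypothetical and do not describe the actual Shinken code.

```python
import queue


class CommandFailed(Exception):
    """Raised when no valid answer was obtained after all attempts."""


def call_command(send, answers, command_name, timeout=1.0, attempts=2):
    """Send a command and wait for its answer, retrying after a failed attempt.

    send    -- callable that posts the command to the other process
    answers -- queue.Queue of (command_name, result) tuples coming back
    """
    last_problem = "no attempt made"
    for _ in range(attempts):
        send(command_name)
        try:
            answered_name, result = answers.get(timeout=timeout)
        except queue.Empty:
            # No answer in time: remember why, then retry if attempts remain.
            last_problem = "timeout (%ss)" % timeout
            continue
        if answered_name == command_name:
            return result
        # Answer to an earlier command: send again to get the right one.
        last_problem = "another answer was received"
    raise CommandFailed(
        "Failed to send command call [%s] because of %s" % (command_name, last_problem)
    )
```

With a queue that already holds a stale answer followed by the expected one, the first attempt is discarded and the second returns the result, which mirrors the "another answer was received. Retrying." case above.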

Receiving a command

Warning

If an unknown command is received, the following log is displayed:

Code Block
languagejs
themeConfluence
[YYYY-MM-DD HH:MM:SS] WARNING : [ BROKER_NAME ] [ MODULE_NAME ] [ COMMAND CALL ] [ PID:XXXX ] [ THREAD_NAME ] Received unknown command [ COMMAND_NAME ] to execute !
Code Block
languagetext
themeEmacs
titleExample
[2020-11-17 09:12:11] WARNING  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ] Received unknown command [get_module_info] to execute !

Error

If the command crashes, the following log is displayed:

Code Block
languagejs
themeConfluence
[YYYY-MM-DD HH:MM:SS] ERROR : [ BROKER_NAME ] [ MODULE_NAME ] [ COMMAND CALL ] [ PID:XXXX ] [ THREAD_NAME ] Failed to execute received command [ COMMAND_NAME ] with error: ERROR
Code Block
languagetext
themeEmacs
titleExample
[2020-11-17 09:12:11] ERROR  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ] Failed to execute received command [get_module_info] with error: Exception 
[2020-11-17 09:12:11] ERROR  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ] Traceback (most recent call last):
[2020-11-17 09:12:11] ERROR  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ]   File "C:\dev\workspace\shinken-enterprise\sources\framework\shinken\shinken\basesubprocess.py", line 117, in get_and_execute_command_from_master
[2020-11-17 09:12:11] ERROR  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ]     result = f()
[2020-11-17 09:12:11] ERROR  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ]   File "C:\dev\workspace\shinken-enterprise\testing\test_command_queue_handler.py", line 43, in fail_command
[2020-11-17 09:12:11] ERROR  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ]     raise Exception
[2020-11-17 09:12:11] ERROR  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ] Exception

Debug

When a command is received and is executable, the following log is displayed:

Code Block
languagejs
themeConfluence
[YYYY-MM-DD HH:MM:SS] DEBUG : [ BROKER_NAME ] [ MODULE_NAME ] [ COMMAND CALL ] [ PID:XXXX ] [ THREAD_NAME ] Executing command [ COMMAND_NAME ] with param  PARAMETER_LIST
Code Block
languagetext
themeEmacs
titleExample
[2020-11-17 09:12:11] DEBUG  : [ broker-master   ] [ Livestatus       ] [ COMMAND CALL ] [ PID:29341 ] [ CP Server Thread-46 ] Executing command [get_module_info] with param []
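
The three reception cases above (unknown command, crash, normal execution) can be summarised by the sketch below. It is an assumption-laden illustration, not the code of basesubprocess.py: the function execute_received_command, the handlers dictionary and the (ok, result) return value are hypothetical, and only the log messages are taken from this page.

```python
import logging
import traceback

logger = logging.getLogger("command_call")  # hypothetical logger name


def execute_received_command(handlers, command_name, params=()):
    """Run a received command and return (ok, result).

    handlers -- dict mapping a command name to a callable (assumed layout)
    """
    handler = handlers.get(command_name)
    if handler is None:
        logger.warning("Received unknown command [%s] to execute !", command_name)
        return False, None
    logger.debug("Executing command [%s] with param %s", command_name, list(params))
    try:
        return True, handler(*params)
    except Exception as exc:
        logger.error(
            "Failed to execute received command [%s] with error: %s", command_name, exc
        )
        # Log the stack line by line, as in the example above.
        for line in traceback.format_exc().splitlines():
            logger.error(line)
        return False, None
```

Returning a status instead of re-raising keeps the receiving process alive after a failing command, which matches the behaviour suggested by the logs.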

Sending a brok takes too long ( serialization ) when sending it to the module ( and its workers )

If the serialization time exceeds the WARNING threshold ( broker__manage_brok__oversized_data_warning_threshold__serialization_time ), the log is emitted at WARNING level.
If it exceeds the ERROR threshold ( broker__manage_brok__oversized_data_error_threshold__serialization_time ), the log is emitted at ERROR level.

( See the page Le Broker )
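
The threshold rule above can be expressed as a small function. This is a sketch only: the function name oversized_log_level is hypothetical, the threshold values used in the usage note are made up for the example, and the strict comparison is an assumption based on the word "exceeds".

```python
def oversized_log_level(serialization_time, warning_threshold, error_threshold):
    """Return the log level for a brok serialization time, or None if no log is needed.

    The two thresholds stand for the values of
    broker__manage_brok__oversized_data_warning_threshold__serialization_time and
    broker__manage_brok__oversized_data_error_threshold__serialization_time.
    """
    # Check the ERROR threshold first so that it takes precedence.
    if serialization_time > error_threshold:
        return "ERROR"
    if serialization_time > warning_threshold:
        return "WARNING"
    return None
```

For instance, with hypothetical thresholds of 0.001s ( WARNING ) and 0.1s ( ERROR ), a serialization time of 0.004s gives WARNING and 0.300s gives ERROR.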

Log with the size of the variable-size elements

Code Block
languagejs
themeConfluence
[YYYY-MM-DD HH:MM:SS] LOG_LEVEL: [ BROKER_NAME ] [  MODULE_NAME  ] [ MANAGE BROKS ] [ OVERSIZED DATA ] [ SIZE ] The brok of type "BROK_TYPE" (item uuid: ITEM_UUID) took too much time to be serialized [X.XXXs] (with size XXXXXXB) and may cause Brok management slow down. Size of potential expensive content: outputs size:XXXXXXB, current perf data size:XXXXXXB, downtimes user content size:XXXXXXB, acknowledgement user content size:XXXXXXB
Code Block
languagetext
themeEmacs
titleExample
[2025-07-23 17:57:54] WARNING: [ broker-master   ] [ event-manager-writer ] [ MANAGE BROKS ] [ OVERSIZED DATA ] [ SIZE ] The brok of type "update_service_status" (item uuid: f2f32d781a324394aa9eab104b239b6e-b8c98c745b3411eb97fd080027d2cb3b) took too much time to be serialized [0.004s] (with size 170528B) and may cause Brok management slow down. Size of potential expensive content: outputs size:3518B, current perf data size:301B, downtimes user content size:0B, acknowledgement user content size:0B
[2025-07-23 17:57:54] ERROR: [ broker-master   ] [ event-manager-writer ] [ MANAGE BROKS ] [ OVERSIZED DATA ] [ SIZE ] The brok of type "update_service_status" (item uuid: f2f32d781a324394aa9eab104b239b6e-b8c98c745b3411eb97fd080027d2cb3b) took too much time to be serialized [0.300s] (with size 250528B) and may cause Brok management slow down. Size of potential expensive content: outputs size:3518B, current perf data size:301B, downtimes user content size:0B, acknowledgement user content size:0B
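
When analysing these logs, it can help to extract the per-field sizes from a SIZE line. The helper below is a hypothetical sketch, not a Shinken tool: the function name parse_expensive_content is an assumption, and it relies only on the message layout shown in the examples above.

```python
def parse_expensive_content(log_line):
    """Return {field name: size in bytes} from an OVERSIZED DATA / SIZE log line."""
    marker = "Size of potential expensive content:"
    _, _, tail = log_line.partition(marker)
    sizes = {}
    for part in tail.split(","):
        # Each part looks like "current perf data size:301B".
        name, sep, size = part.strip().rpartition(" size:")
        if sep and size.endswith("B") and size[:-1].isdigit():
            sizes[name] = int(size[:-1])
    return sizes
```

A line without the marker yields an empty dictionary, so the helper can be applied to a whole log file without filtering first.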

Log with the number of the variable-count elements

Code Block
languagejs
themeConfluence
[YYYY-MM-DD HH:MM:SS] LOG_LEVEL: [ BROKER_NAME ] [  MODULE_NAME  ] [ MANAGE BROKS ] [ OVERSIZED DATA ] [ DETAILS ] The brok of type "BROK_TYPE" (item uuid: ITEM_UUID) took too much time to be serialized [X.XXXs] (with size XXXXXXB) and may cause Brok management slow down. Detail of potential expensive content: total notifications nb:XXXXXX, incident nb:XXXXXX, parent dependencies (hosts) nb:XXXXXX, source problems nb:XXXXXX, parent dependencies (services) nb:XXXXXX, impacts (services) nb:XXXXXX, impacts (hosts) nb:XXXXXX, downtimes nb:XXXXXX, child dependencies (services) nb:XXXXXX, child dependencies (hosts) nb:XXXXXX
Code Block
languagetext
themeEmacs
titleExample
[2025-07-23 17:57:54] WARNING: [ broker-master   ] [ event-manager-writer ] [ MANAGE BROKS ] [ OVERSIZED DATA ] [ DETAILS ] The brok of type "update_service_status" (item uuid: f2f32d781a324394aa9eab104b239b6e-b8c98c745b3411eb97fd080027d2cb3b) took too much time to be serialized [0.004s] (with size 170528B) and may cause Brok management slow down. Detail of potential expensive content: total notifications nb:3226, incident nb:2, parent dependencies (hosts) nb:1, source problems nb:0, parent dependencies (services) nb:0, impacts (services) nb:0, impacts (hosts) nb:0, downtimes nb:0, child dependencies (services) nb:0, child dependencies (hosts) nb:0
[2025-07-23 17:57:54] ERROR: [ broker-master   ] [ event-manager-writer ] [ MANAGE BROKS ] [ OVERSIZED DATA ] [ DETAILS ] The brok of type "update_service_status" (item uuid: f2f32d781a324394aa9eab104b239b6e-b8c98c745b3411eb97fd080027d2cb3b) took too much time to be serialized [0.300s] (with size 250528B) and may cause Brok management slow down. Detail of potential expensive content: total notifications nb:3226, incident nb:2, parent dependencies (hosts) nb:1, source problems nb:0, parent dependencies (services) nb:0, impacts (services) nb:0, impacts (hosts) nb:0, downtimes nb:0, child dependencies (services) nb:0, child dependencies (hosts) nb:0