
How should I log while using multiprocessing in Python?


I just wrote a log handler of my own that feeds everything to the parent process via a pipe. I've only been testing it for ten minutes, but it seems to work pretty well.

(Note: This is hardcoded to RotatingFileHandler, which is my own use case.)


Update: @javier now maintains this approach as a package on PyPI - see multiprocessing-logging on PyPI, GitHub at https://github.com/jruere/multiprocessing-logging
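
For reference, a hedged sketch of using that package; I believe its entry point is install_mp_handler(), which wraps the handlers already attached to a logger, but verify against the project's README (it also relies on a fork-based start method, and the file name and worker function here are placeholders):

# Sketch: configure logging normally, then let multiprocessing-logging
# wrap the existing handlers so worker processes forward records safely.
import logging
import multiprocessing

import multiprocessing_logging

def worker(i):
    logging.info("working on item %d", i)

if __name__ == "__main__":
    logging.basicConfig(filename="combined.log", level=logging.INFO)
    multiprocessing_logging.install_mp_handler()  # wraps handlers on the root logger

    with multiprocessing.Pool(4) as pool:
        pool.map(worker, range(8))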


Update: Implementation!

This now uses a queue for correct handling of concurrency, and also recovers from errors correctly. I've been using it in production for several months, and the current version below works without issue.

from logging.handlers import RotatingFileHandler
import multiprocessing, threading, logging, sys, traceback

class MultiProcessingLog(logging.Handler):
    def __init__(self, name, mode, maxsize, rotate):
        logging.Handler.__init__(self)

        self._handler = RotatingFileHandler(name, mode, maxsize, rotate)
        self.queue = multiprocessing.Queue(-1)

        t = threading.Thread(target=self.receive)
        t.daemon = True
        t.start()

    def setFormatter(self, fmt):
        logging.Handler.setFormatter(self, fmt)
        self._handler.setFormatter(fmt)

    def receive(self):
        while True:
            try:
                record = self.queue.get()
                self._handler.emit(record)
            except (KeyboardInterrupt, SystemExit):
                raise
            except EOFError:
                break
            except:
                traceback.print_exc(file=sys.stderr)

    def send(self, s):
        self.queue.put_nowait(s)

    def _format_record(self, record):
        # ensure that exc_info and args
        # have been stringified.  Removes any chance of
        # unpickleable things inside and possibly reduces
        # message size sent over the pipe
        if record.args:
            record.msg = record.msg % record.args
            record.args = None
        if record.exc_info:
            dummy = self.format(record)
            record.exc_info = None

        return record

    def emit(self, record):
        try:
            s = self._format_record(record)
            self.send(s)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

    def close(self):
        self._handler.close()
        logging.Handler.close(self)
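
For context, a minimal usage sketch that attaches the handler above to the root logger; it assumes a fork-based start method (so workers inherit the handler and its queue), and the file name, sizes, and worker function are arbitrary:

import logging
import multiprocessing

def worker(i):
    # Workers log through the root logger; emit() pushes the record onto the
    # queue, and the parent's background thread writes it to the rotating file.
    logging.getLogger().info("message %d from worker", i)

if __name__ == "__main__":
    handler = MultiProcessingLog("combined.log", "a", 1024 * 1024, 5)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(processName)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)

    with multiprocessing.Pool(4) as pool:
        pool.map(worker, range(10))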


The only way to deal with this non-intrusively is to:

  1. Spawn each worker process such that its log goes to a different file descriptor (to disk or to a pipe). Ideally, all log entries should be timestamped.
  2. Your controller process can then do one of the following:
    • If using disk files: Coalesce the log files at the end of the run, sorted by timestamp (see the sketch after this list)
    • If using pipes (recommended): Coalesce log entries on the fly from all pipes into a central log file (e.g., periodically select on the pipes' file descriptors, merge-sort the available log entries, and flush them to the centralized log; repeat).
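
To make the disk-file option concrete, here is a minimal sketch of the merge step; the directory layout, the worker-<pid>.log naming, and the ISO-8601 timestamp prefix are assumptions chosen so that a plain lexicographic merge sorts records chronologically:

# Each worker writes to its own file keyed by PID, so writes never interleave;
# the controller merges the per-process files by their leading timestamps.
import glob
import heapq
import logging
import multiprocessing
import os

LOG_DIR = "worker_logs"  # hypothetical directory for per-process logs

def worker(n):
    logging.basicConfig(
        filename=os.path.join(LOG_DIR, f"worker-{os.getpid()}.log"),
        level=logging.INFO,
        format="%(asctime)s %(processName)s %(levelname)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )
    logging.info("processing item %d", n)

def merge_logs(out_path="combined.log"):
    # heapq.merge assumes each input is already sorted, which holds because
    # every process appends its own file in time order.
    files = [open(p) for p in glob.glob(os.path.join(LOG_DIR, "worker-*.log"))]
    try:
        with open(out_path, "w") as out:
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()

if __name__ == "__main__":
    os.makedirs(LOG_DIR, exist_ok=True)
    with multiprocessing.Pool(4) as pool:
        pool.map(worker, range(8))
    merge_logs()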


QueueHandler is native in Python 3.2+, and does exactly this. It is easily replicated in previous versions.

Python docs have two complete examples: Logging to a single file from multiple processes

For those using Python < 3.2, just copy QueueHandler into your own code from: https://gist.github.com/vsajip/591589 or alternatively import logutils.

Each process (including the parent process) puts its log records on the Queue, and then a listener thread or process (one example of each is provided) picks them up and writes them all to a file - no risk of corruption or garbling.
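
A minimal sketch of that pattern using the stdlib QueueHandler and QueueListener (the worker function, queue, and log file name here are placeholders, not the docs' examples):

import logging
import logging.handlers
import multiprocessing

def worker(queue, i):
    # Every process gets a QueueHandler; records go onto the shared queue
    # instead of being written to the file directly.
    root = logging.getLogger()
    root.handlers[:] = [logging.handlers.QueueHandler(queue)]
    root.setLevel(logging.INFO)
    root.info("hello from worker %d", i)

if __name__ == "__main__":
    queue = multiprocessing.Queue(-1)

    # A single listener in the parent drains the queue and hands records to
    # the real handler, so only one process ever touches the log file.
    file_handler = logging.FileHandler("combined.log")
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(processName)s %(levelname)s %(message)s"))
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()

    procs = [multiprocessing.Process(target=worker, args=(queue, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    listener.stop()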