How should I log while using multiprocessing in Python?
I just now wrote a log handler of my own that just feeds everything to the parent process via a pipe. I've only been testing it for ten minutes but it seems to work pretty well.
(Note: This is hardcoded to
RotatingFileHandler, which is my own use case.)
Update: @javier now maintains this approach as a package available on Pypi - see multiprocessing-logging on Pypi, github at https://github.com/jruere/multiprocessing-logging
This now uses a queue for correct handling of concurrency, and also recovers from errors correctly. I've now been using this in production for several months, and the current version below works without issue.
from logging.handlers import RotatingFileHandlerimport multiprocessing, threading, logging, sys, tracebackclass MultiProcessingLog(logging.Handler): def __init__(self, name, mode, maxsize, rotate): logging.Handler.__init__(self) self._handler = RotatingFileHandler(name, mode, maxsize, rotate) self.queue = multiprocessing.Queue(-1) t = threading.Thread(target=self.receive) t.daemon = True t.start() def setFormatter(self, fmt): logging.Handler.setFormatter(self, fmt) self._handler.setFormatter(fmt) def receive(self): while True: try: record = self.queue.get() self._handler.emit(record) except (KeyboardInterrupt, SystemExit): raise except EOFError: break except: traceback.print_exc(file=sys.stderr) def send(self, s): self.queue.put_nowait(s) def _format_record(self, record): # ensure that exc_info and args # have been stringified. Removes any chance of # unpickleable things inside and possibly reduces # message size sent over the pipe if record.args: record.msg = record.msg % record.args record.args = None if record.exc_info: dummy = self.format(record) record.exc_info = None return record def emit(self, record): try: s = self._format_record(record) self.send(s) except (KeyboardInterrupt, SystemExit): raise except: self.handleError(record) def close(self): self._handler.close() logging.Handler.close(self)
The only way to deal with this non-intrusively is to:
- Spawn each worker process such that its log goes to a different file descriptor (to disk or to pipe.) Ideally, all log entries should be timestamped.
- Your controller process can then do one of the following:
- If using disk files: Coalesce the log files at the end of the run, sorted by timestamp
- If using pipes (recommended): Coalesce log entries on-the-fly from all pipes, into a central log file. (E.g., Periodically
selectfrom the pipes' file descriptors, perform merge-sort on the available log entries, and flush to centralized log. Repeat.)
QueueHandler is native in Python 3.2+, and does exactly this. It is easily replicated in previous versions.
Python docs have two complete examples: Logging to a single file from multiple processes
Each process (including the parent process) puts its logging on the
Queue, and then a
listener thread or process (one example is provided for each) picks those up and writes them all to a file - no risk of corruption or garbling.