
Multiple Python Processes slow


Check the ulimit and quota for the box and the user running the scripts. /etc/security/limits.conf may also contain resource restrictions that you might want to modify.

ulimit -n will show the max number of open file descriptors allowed.

  • Might this have been exceeded with all of the open sockets?
  • Is the script closing each socket when it's done with it?

You can also check the fd's with ls -l /proc/[PID]/fd/ where [PID] is the process id of one of the scripts.
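If you would rather check those numbers from inside Python, the resource module exposes the same fd limit, and on Linux you can count the open fds through /proc (a minimal sketch, not part of the original answer):

import os
import resource

# Soft/hard limits on open file descriptors for this process --
# the soft limit is the same number "ulimit -n" reports.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('fd limit: soft=%d, hard=%d' % (soft, hard))

# Count the fds this process currently holds
# (the in-process equivalent of ls -l /proc/self/fd/).
print('open fds: %d' % len(os.listdir('/proc/self/fd')))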

We'd need to see some code to tell what's really going on.


Edit (incorporating comments and more troubleshooting ideas):

Can you show the code where you're opening and closing the connections?
When just a few script processes are running, do they also start to go idle after a while? Or does it only happen when several hundred or more are running at once?
Is there a single parent process that starts all of these scripts?

If you're using s = urllib2.urlopen(someURL), make sure to s.close() when you're done with it. Python can often close things down for you (like when you do x = urllib2.urlopen(someURL).read()), but it will leave that to you when you hold on to the response object (such as assigning a variable to the return value of .urlopen()). Double check your opening and closing of urllib calls (or all I/O code, to be safe). If each script is designed to have only one open socket at a time, but your /proc/PID/fd is showing multiple active/open sockets per script process, then there is definitely a code issue to fix.
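One way to make sure the close always happens, even if a read throws, is to wrap the response in contextlib.closing (a sketch, assuming Python 2.6+ where urlopen() also accepts a per-call timeout):

import contextlib
import urllib2

some_url = 'http://example.com/'  # placeholder URL

# closing() guarantees s.close() runs even if read() raises.
with contextlib.closing(urllib2.urlopen(some_url, timeout=15)) as s:
    data = s.read()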

If ulimit -n shows 1024, that's the limit on open sockets/fds that the mysql user can have. You can change this with ulimit -S -n [LIMIT_#], but check out this article first:
Changing process.max-file-descriptor using 'ulimit -n' can cause MySQL to change table_open_cache value.

You may need to log out and shell back in afterwards, and/or add it to /etc/bashrc (don't forget to source /etc/bashrc if you change it and don't want to log out and back in).
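If editing shell startup files is awkward, the soft limit can also be raised from inside the script itself at startup; this only affects that process (and its children) and can't go above the hard limit without root (a sketch, not from the original answer):

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
# Raise the soft limit as far as the hard limit allows, capped at 4096 here.
new_soft = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))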

Disk space is another thing I have found (the hard way) can cause very weird issues. I have had processes act like they were running (not zombied) but not do what was expected, because they had open handles to a log file on a partition with no disk space left.

netstat -anpTee | grep -i mysql will also show if these sockets are connected/established/waiting to be closed/waiting on timeout/etc.

watch -n 0.1 'netstat -anpTee | grep -i mysql' to see the sockets open/close/change state/etc in real time in a nice table output (may need to export GREP_OPTIONS= first if you have it set to something like --color=always).

lsof -u mysql or lsof -U will also show you open FD's (the output is quite verbose).
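For a quick in-Python equivalent of those checks, you can count how many of a process's fds are sockets by reading the /proc/[PID]/fd symlinks (Linux only, and you need permission to inspect the target process; a rough sketch):

import os

def count_sockets(pid):
    # Each fd in /proc/<pid>/fd is a symlink; sockets point at "socket:[inode]".
    fd_dir = '/proc/%d/fd' % pid
    count = 0
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        if target.startswith('socket:'):
            count += 1
    return count

print(count_sockets(os.getpid()))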


import urllib2
import socket

some_url = 'http://example.com/'  # the URL to fetch

# Make every new socket time out after 15 seconds instead of blocking forever.
# (settimeout(0) would mean non-blocking: in non-blocking mode, if a recv()
# call doesn't find any data, or a send() call can't immediately dispose of
# the data, an error exception is raised.)
socket.setdefaulttimeout(15)

# ......

try:
    s = urllib2.urlopen(some_url)
    # do stuff with s like s.read(), s.headers, etc.
except (urllib2.HTTPError, urllib2.URLError):
    pass  # myLogger.exception("Error opening: %s!", some_url)
finally:
    try:
        s.close()
        # del s - although I don't know if deleting s will help things any.
    except Exception:
        pass

Some man pages and reference links:


Solved! - with massive help from Chown - thank you very much!

The slowdown was because I was not setting a socket timeout, so over time the robots were hanging while trying to read data that did not exist. Adding a simple

import socket

timeout = 5
socket.setdefaulttimeout(timeout)

solved it (shame on me - but in my defence I am still learning python)

The memory leak is down to urllib and the version of Python I am using. After a lot of googling it appears to be a problem with nested urlopens; there are lots of posts online about it once you work out how to ask Google the right question.
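For anyone hitting the same thing: the workaround usually suggested is to un-nest the calls so each response can be closed explicitly before the next one is opened. A hypothetical sketch (the URLs and flow are made up, not my actual code):

import contextlib
import urllib2

index_url = 'http://example.com/index'  # made-up URL

# Instead of something like urllib2.urlopen(urllib2.urlopen(index_url).read()),
# open and close one response at a time.
with contextlib.closing(urllib2.urlopen(index_url)) as idx:
    next_url = idx.read().strip()  # made-up: assume the body is the next URL

with contextlib.closing(urllib2.urlopen(next_url)) as page:
    page_data = page.read()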

Thanks all for your help.

EDIT:

Something that also helped with the memory leak (although it did not solve it completely) was doing manual garbage collection:

import gc

gc.collect()
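A small addition worth knowing: gc.collect() returns the number of unreachable objects it found, which makes it easy to log whether the manual collection is actually reclaiming anything (a sketch):

import gc

# collect() returns how many unreachable objects it found this pass,
# so logging it shows whether manual collection is doing any real work.
unreachable = gc.collect()
print('gc.collect() found %d unreachable objects' % unreachable)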

Hope it helps someone else.


Another system resource to take into account is ephemeral ports, /proc/sys/net/ipv4/ip_local_port_range (on Linux). Together with /proc/sys/net/ipv4/tcp_fin_timeout, they limit the number of concurrent connections.
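Both of those live under /proc and are easy to read (or at least log) from the scripts themselves; a minimal Linux-only sketch:

def read_sysctl(path):
    with open(path) as f:
        return f.read().strip()

# e.g. "32768  60999" -- the range of local ports available for outgoing connections
print(read_sysctl('/proc/sys/net/ipv4/ip_local_port_range'))
# e.g. "60" -- seconds an orphaned connection stays in FIN-WAIT-2
print(read_sysctl('/proc/sys/net/ipv4/tcp_fin_timeout'))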

From Benchmark of Python WSGI Servers:

This basically enables the server to open LOTS of concurrent connections.

echo "10152 65535" > /proc/sys/net/ipv4/ip_local_port_range
sysctl -w fs.file-max=128000
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.core.somaxconn=250000
sysctl -w net.ipv4.tcp_max_syn_backlog=2500
sysctl -w net.core.netdev_max_backlog=2500
ulimit -n 10240