Multiple Python Processes slow
Check the ulimit and quota for the box and the user running the scripts. /etc/security/limits.conf may also contain resource restrictions that you might want to modify. ulimit -n will show the max number of open file descriptors allowed.
- Might this have been exceeded with all of the open sockets?
- Is the script closing each socket when it's done with it?
You can also check the fds with ls -l /proc/[PID]/fd/ where [PID] is the process id of one of the scripts. Would need to see some code to tell what's really going on.
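The same check can be done from inside a script; a minimal sketch (Linux-only, since it counts entries in /proc/self/fd) comparing the process's open descriptor count against its ulimit -n soft limit:

```python
import os
import resource

# The soft limit is what the process is held to right now (what `ulimit -n`
# shows); the hard limit is the ceiling the soft limit can be raised to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Count this process's currently open descriptors via /proc (Linux-only).
open_fds = len(os.listdir("/proc/self/fd"))

# True while there is still headroom under the soft limit.
print(soft == resource.RLIM_INFINITY or open_fds < soft)
```

If a script's fd count creeps toward the soft limit over time, that points at a descriptor leak rather than a system-wide bottleneck.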
Edit (Importing comments and more troubleshooting ideas):
Can you show the code where you're opening and closing the connections?
When just a few script processes are running, do they too start to go idle after a while? Or does this only happen when there are several hundred or more running at once?
Is there a single parent process that starts all of these scripts?
If you're using s = urllib2.urlopen(someURL), make sure to s.close() when you're done with it. Python can often close things down for you (like if you're doing x = urllib2.urlopen(someURL).read()), but it will leave that to you if told to (such as when assigning a variable to the return value of .urlopen()). Double check your opening and closing of urllib calls (or all I/O code, to be safe). If each script is designed to only have one open socket at a time, and your /proc/PID/fd is showing multiple active/open sockets per script process, then there is definitely a code issue to fix.
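One way to make the close unconditional is contextlib.closing(), which calls .close() for you even if the read raises part-way through. A sketch, with a hypothetical fake_urlopen standing in for urllib2.urlopen so it runs anywhere:

```python
import contextlib
import io

def fake_urlopen(url):
    # Hypothetical stand-in for urllib2.urlopen(); anything with .close() works.
    return io.StringIO(u"response body for " + url)

# closing() guarantees .close() runs when the block exits, even on an
# exception, so no socket is left dangling if .read() blows up mid-way.
with contextlib.closing(fake_urlopen("http://example.com")) as s:
    body = s.read()

print(body)
```

The same pattern works on the real urlopen return value, since it only relies on the object having a .close() method.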
If ulimit -n is showing 1024, that is the limit of open sockets/fds that the mysql user can have. You can change this with ulimit -S -n [LIMIT_#], but check out this article first:
Changing process.max-file-descriptor using 'ulimit -n' can cause MySQL to change table_open_cache value.
You may need to log out and shell back in afterwards, and/or add it to /etc/bashrc (don't forget to source /etc/bashrc if you change bashrc and don't want to log out/in).
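The same soft-limit bump can also be done per-process from Python via the resource module (a sketch; the 4096 target is an arbitrary example value, and only root can raise the soft limit past the hard limit):

```python
import resource

# Current limits on open file descriptors: soft is enforced now, hard is
# the most an unprivileged process may raise the soft limit to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Pick a new soft limit, staying within the hard limit (4096 is illustrative).
new_soft = soft if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

print(resource.getrlimit(resource.RLIMIT_NOFILE)[0] == new_soft)
```

Doing it in the script avoids depending on the shell environment each robot was launched from.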
Disk space is another thing that I have found (the hard way) can cause very weird issues. I have had processes act like they are running (not zombied) but not doing what is expected, because they had open handles to a log file on a partition with zero disk space left.
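A quick way to rule that out from Python is os.statvfs() on the partition holding the log files (the /tmp path below is just an assumption; substitute your actual log directory):

```python
import os

# statvfs reports filesystem stats for the given path; f_bavail is the
# number of free blocks available to unprivileged users, f_frsize the
# fragment size, so their product is the usable free space in bytes.
st = os.statvfs("/tmp")
free_bytes = st.f_bavail * st.f_frsize

# Zero here would explain "alive but silent" processes holding log handles.
print(free_bytes > 0)
```
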
netstat -anpTee | grep -i mysql will also show whether these sockets are connected/established/waiting to be closed/waiting on timeout/etc.
watch -n 0.1 'netstat -anpTee | grep -i mysql' will show the sockets open/close/change state/etc. in real time in a nice table output (you may need to export GREP_OPTIONS= first if you have it set to something like --color=always).
lsof -u mysql or lsof -U will also show you open FDs (the output is quite verbose).
import urllib2
import socket

socket.setdefaulttimeout(15)  # or setdefaulttimeout(0) for non-blocking:
# In non-blocking mode (blocking is the default), if a recv() call
# doesn't find any data, or if a send() call can't immediately
# dispose of the data, an error exception is raised.
# ......
try:
    s = urllib2.urlopen(some_url)
    # do stuff with s like s.read(), s.headers, etc..
except urllib2.HTTPError:
    # myLogger.exception("Error opening: %s!", some_url)
    pass
finally:
    try:
        s.close()
        # del s - although, I don't know if deleting s will help things any.
    except:
        pass
Solved! - with massive help from Chown - thank you very much!
The slow down was because I was not setting a socket timeout, and as such over a period of time the robots were hanging trying to read data that did not exist. Adding a simple

timeout = 5
socket.setdefaulttimeout(timeout)

solved it (shame on me - but in my defence I am still learning Python).
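The effect of that fix can be demonstrated against a local socket that never answers (a self-contained sketch; the 1-second timeout and the stalling server are illustrative only):

```python
import socket
import threading
import time

# A local server that accepts the connection but never sends anything,
# mimicking the dead reads that hung the robots.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

def accept_and_stall():
    conn, _ = server.accept()
    time.sleep(3)  # hold the connection open, send nothing
    conn.close()

threading.Thread(target=accept_and_stall).start()

timeout = 1
socket.setdefaulttimeout(timeout)  # applies to all sockets created from here on

client = socket.create_connection(server.getsockname())
try:
    client.recv(1024)  # would block indefinitely without the default timeout
    result = "read data"
except socket.timeout:
    result = "timed out"
finally:
    client.close()
    server.close()

print(result)  # timed out
```

Without the setdefaulttimeout() call, the recv() would sit there forever, which is exactly the idle-process symptom described in the question.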
The memory leak is down to urllib and the version of Python I am using. After a lot of googling it appears to be a problem with nested urlopens - there are lots of posts online about it once you work out how to ask Google the right question.
Thanks all for your help.
EDIT:
Something that also helped the memory leak issue (although not solved it completely) was doing manual garbage collection:
import gc
gc.collect()
Hope it helps someone else.
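For reference, gc.collect() (note the parentheses - gc.collect without them is just an attribute lookup that collects nothing) returns the number of unreachable objects it found; a reference cycle shows it doing real work:

```python
import gc

class Node(object):
    def __init__(self):
        self.ref = None

# Build a reference cycle, then drop the only outside references to it.
a, b = Node(), Node()
a.ref, b.ref = b, a
del a, b

# Reference counting alone cannot free the cycle; the collector sweeps it
# up and reports how many unreachable objects it found.
collected = gc.collect()
print(collected >= 2)  # True
```
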
Another system resource to take into account is ephemeral ports: /proc/sys/net/ipv4/ip_local_port_range (on Linux). Together with /proc/sys/net/ipv4/tcp_fin_timeout they limit the number of concurrent connections.
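Back-of-the-envelope, assuming the common defaults of a 32768-61000 port range and a 60 second tcp_fin_timeout (both assumptions - read your own /proc values), the sustainable rate of new outbound connections to a single destination is roughly:

```python
port_low, port_high = 32768, 61000  # typical ip_local_port_range default
tcp_fin_timeout = 60                # typical default, in seconds

usable_ports = port_high - port_low + 1  # ports available for outbound sockets
# Each closed connection parks its local port for roughly tcp_fin_timeout
# seconds, so new connections per second to one destination is bounded by:
max_rate = usable_ports // tcp_fin_timeout
print(max_rate)  # 470
```

Widening the port range or lowering the FIN timeout (as in the tuning block below) raises that ceiling.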
From Benchmark of Python WSGI Servers:
This basically enables the server to open LOTS of concurrent connections.
echo "10152 65535" > /proc/sys/net/ipv4/ip_local_port_range
sysctl -w fs.file-max=128000
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.core.somaxconn=250000
sysctl -w net.ipv4.tcp_max_syn_backlog=2500
sysctl -w net.core.netdev_max_backlog=2500
ulimit -n 10240