Redis Queue + python-rq: Right pattern to prevent high memory usage?

After two more days of playing around, I have found the problem. I would like to share this with you, along with the tools that were helpful:

Core Problem

The actual problem was that we had forgotten to cast an object to a string before saving it to the PostgreSQL database. Without this cast, the string representation still ended up in the DB (because the __str__() method of the respective object returns exactly the representation we wanted); to Redis, however, the whole object was passed. There, the associated task crashed with an UnpickleError exception, and each crash consumed about 5 MB of RAM that was not freed after the crash.
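
To make the difference concrete, here is a minimal sketch (build_report, save_to_db, and the queue q are hypothetical stand-ins, not our actual code):

    # `report` is some rich object whose __str__() returns the text we want to store.
    report = build_report()

    # What we did: the DB layer stringified the object on the way into PostgreSQL,
    # so the table looked fine, but RQ pickled the entire object into Redis.
    q.enqueue(save_to_db, report)

    # What we should have done: cast explicitly, so only a small string is pickled.
    q.enqueue(save_to_db, str(report))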

Additional Actions

To reduce the memory footprint further, we implemented the following supplementary actions (note that we save everything to a separate DB, so the results that Redis stores are not used at all in our application):

  • We set the TTL of the task result to 0 with the call enqueue_call([...], result_ttl=0) (see the sketch after this list)
  • We defined a custom exception handler, black_hole, that swallows all exceptions and returns False. This prevents RQ from moving a task to the failed queue, where it would still use a bit of memory. Exceptions are emailed to us beforehand so we can keep track of them.
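
A minimal sketch wiring up both actions (my_task and send_error_mail are hypothetical placeholders; note that the keyword for registering handlers differs between RQ versions):

    from redis import Redis
    from rq import Queue, Worker

    conn = Redis()
    q = Queue(connection=conn)

    # Discard the return value immediately; we persist results in our own DB.
    q.enqueue_call(func=my_task, args=(some_arg,), result_ttl=0)

    def black_hole(job, *exc_info):
        send_error_mail(job, exc_info)  # our own helper that mails us the exception
        return False                    # swallow it; the job is not moved to the failed queue

    # Newer RQ versions accept exception_handlers=[...]; older ones use exc_handler=.
    worker = Worker([q], connection=conn, exception_handlers=[black_hole])
    worker.work()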

Useful tools along the way:

We just worked with redis-cli.

  • redis-cli info | grep used_memory_human --> shows current memory usage. Ideal for comparing the memory footprint before and after a task was executed.
  • redis-cli keys '*' --> lists all existing keys. This overview led me to the insight that some tasks were not deleted even though they should have been (as written above, they crashed with an UnpickleError and were therefore never removed).
  • redis-cli monitor --> shows a realtime overview of everything happening in Redis. This helped me discover that the objects being moved back and forth were far too large.
  • redis-cli debug object <key> --> shows a dump of the key's value.
  • redis-cli hgetall <key> --> shows a more readable dump of the key's value (especially useful for our use case of Redis purely as a task queue, since python-rq stores its jobs as hashes in this format).
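
The before/after memory comparison can also be scripted; here is a small sketch using redis-py (run_task_and_wait is a hypothetical placeholder for however you trigger and await a job):

    import redis

    r = redis.Redis()

    def used_memory():
        # Same figure that `redis-cli info` reports as used_memory
        return r.info()['used_memory']

    before = used_memory()
    run_task_and_wait()
    print('memory delta: %d bytes' % (used_memory() - before))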

Furthermore, I can answer some of the questions I had posted above:

From the docs I know that the 500 sec TTL means that a key is then "expired", but not really deleted. Does the key still consume memory at this point? Can I somehow change this behavior?

Actually, they are deleted, just as the docs imply.

Does it have something to do with the failed queue (which apparently does not have a TTL attached to the jobs, meaning (I think) that these are kept forever)?

Surprisingly, the jobs that crashed with the UnpickleError were not moved to the failed queue; they were just "abandoned", meaning the values remained in Redis, but RQ did not handle them the way it normally handles failed jobs.

If you are using the "Black Hole" exception handler from http://python-rq.org/docs/exceptions/, you should also add job.cancel() there:

def black_hole(job, *exc_info):
    # Delete the job hash on redis, otherwise it will stay on the queue forever
    job.cancel()
    return False


A thing that wasn't immediately obvious to me is that an RQ job has both 'description' and 'data' properties. If not specified, the description is set to a string representation of the data, which in my case was unnecessarily verbose. Explicitly setting the description to a short summary saved me that overhead:

enqueue(func, longdata, description='short job summary')
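
To see the difference, you can compare what ends up in job.description with and without the explicit argument (a quick sketch, with func and longdata as above):

    job = q.enqueue(func, longdata)
    print(job.description)   # default: a rendering of the call, including a repr of longdata

    job = q.enqueue(func, longdata, description='short job summary')
    print(job.description)   # 'short job summary'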