
How do I determine the right number of Puma workers and threads to run on a Heroku Performance dyno?


The roadmap I have currently figured out looks like this:

  1. heroku run "cat /proc/cpuinfo" --size performance-m --app yourapp
  2. heroku run "cat /proc/cpuinfo" --size performance-l --app yourapp
  3. Write down the processor information you get.
  4. Google the model type, family, and stepping number of the Intel processor, and look up how many cores this processor has (or simulates via hyper-threading).
  5. Take a look at https://devcenter.heroku.com/articles/dynos#process-thread-limits
  6. Do some small experiments with standard-2X / standard-1X dynos to determine your baseline `PUMA_WORKER` value.
  7. Do your math like this:

(max threads your desired dyno type supports) / (max threads the baseline dyno supports) x (your experimental `PUMA_WORKER` value on the baseline dyno) - (number of CPU cores)

For example, if `PUMA_WORKER` is 3 on my standard-2X dyno as the baseline, then the `PUMA_WORKER` value I would start testing with on performance-m would be:

16384 / 512 * 3 - 4 = 92
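If it helps, here is the same math as a small Ruby helper (the method name and keyword arguments are just mine for illustration; the thread limits come from the Heroku page linked in step 5):

```ruby
# Estimate a starting PUMA_WORKER value for a bigger dyno,
# scaled from a value you measured experimentally on a baseline dyno.
def suggested_puma_workers(target_thread_limit:, baseline_thread_limit:,
                           baseline_workers:, cpu_cores:)
  target_thread_limit / baseline_thread_limit * baseline_workers - cpu_cores
end

# performance-m (16384 threads) vs. a standard-2X baseline (512 threads),
# with PUMA_WORKER=3 measured on the baseline and 4 CPU cores:
suggested_puma_workers(target_thread_limit: 16_384,
                       baseline_thread_limit: 512,
                       baseline_workers: 3,
                       cpu_cores: 4) # => 92
```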

You should also consider how much memory your app consumes and pick the lower of the two numbers (the thread-based estimate and what your memory budget allows).
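For reference, a minimal `config/puma.rb` sketch that reads both knobs from the environment, so you can tune them per dyno without code changes; `WEB_CONCURRENCY` and `RAILS_MAX_THREADS` are the usual Heroku conventions, but substitute whatever variables your app uses:

```ruby
# config/puma.rb
# Number of worker processes -- the value this answer is trying to estimate.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 3))

# Threads per worker; min and max set to the same value is common on Heroku.
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count

# Load the app before forking workers to reduce per-worker memory.
preload_app!

port ENV.fetch("PORT", 3000)
environment ENV.fetch("RACK_ENV", "development")
```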

EDIT: My answer was originally written before `heroku ps:exec` became available. You can read the official documentation to learn how to SSH into running dyno(s); it should be much easier now.
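For example, something like this shows you what the dyno actually sees (assuming you have a running web dyno named web.1):

```
heroku ps:exec --dyno=web.1 --app yourapp
# then, inside the dyno:
nproc              # number of cores the dyno sees
cat /proc/cpuinfo  # full processor details
```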


Currently facing the same issue for an application running in production on AWS (we are using ECS), and trying to find the right fit between:

  • Quantity of vCPU / Ram per instance
  • Number of instances
  • Number of puma_threads running per instance (each instance runs a single puma process)

In order to better understand how our application consumes the pool of puma_threads, we did the following:

  • Exported puma metrics to CloudWatch (threads running + backlog); we then saw that at around 15 concurrent threads, the backlog starts to grow (see the sketch after this list for one way to export these stats).
  • Compared this with vCPU usage; we saw that our vCPU never went above 25%.
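One way to get those numbers out of puma (a sketch, not our exact setup; the port, token, and CloudWatch namespace below are placeholders I made up) is to enable puma's built-in control/status server and poll its `/stats` endpoint:

```ruby
# config/puma.rb -- expose puma's stats on a local control server
activate_control_app "tcp://127.0.0.1:9293", auth_token: "my-secret-token"
```

```ruby
# poll_puma_stats.rb -- read puma's stats and push them to CloudWatch
require "json"
require "net/http"
require "aws-sdk-cloudwatch"

stats = JSON.parse(
  Net::HTTP.get(URI("http://127.0.0.1:9293/stats?token=my-secret-token"))
)

cloudwatch = Aws::CloudWatch::Client.new
cloudwatch.put_metric_data(
  namespace: "MyApp/Puma", # hypothetical namespace
  metric_data: [
    { metric_name: "ThreadsRunning", value: stats.fetch("running") },
    { metric_name: "Backlog",        value: stats.fetch("backlog") }
  ]
)
```

Running the poller on a schedule (cron, a sidecar, etc.) gives you the two time series compared above.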

Using these two pieces of information together, we decided to take the actions described above.

Finally, I would like to share this article, which I found very interesting on this topic.