
Best practice for creating millions of small temporary objects


Run the application with verbose garbage collection:

java -verbose:gc

And it will tell you when it collects. You will see two types of collections: a fast minor GC and a slower full GC.

[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]

The arrow shows the occupied heap size before and after the collection; the number in parentheses is the total heap size.

As long as you only see minor GCs and not full GCs, you are home safe. The minor GC is a copying collector in the young generation: objects that are no longer referenced are simply never copied, which is exactly what you want for short-lived objects.

Reading Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning is probably helpful.


Since version 6, the server mode of the JVM employs escape analysis. When the JIT can prove an object never escapes its method, it can eliminate the heap allocation entirely (via scalar replacement), so such objects avoid the GC altogether.


Well, there are several questions in one here!

1 - How are short-lived objects managed?

As previously stated, the JVM deals perfectly well with a huge number of short-lived objects, since it follows the weak generational hypothesis: most objects die young, and the young-generation collector is optimised for exactly that case.

Note that we are speaking of objects that reach the main memory (heap). That is not always the case: many of the values you create never even leave a CPU register. For instance, consider this for-loop:

for (int i = 0; i < max; i++) {
    // stuff that uses i
}

Let's set aside loop unrolling (an optimisation the JVM performs heavily on your code). If max equals Integer.MAX_VALUE, your loop might take some time to execute. However, the i variable never escapes the loop block, so the JVM will keep it in a CPU register, incrementing it there without ever writing it back to main memory.

So, creating millions of objects is not a big deal if they are only used locally. They will be dead before they are even stored in Eden, so the GC won't notice them.
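As a sketch of that allocation pattern (the Point class and the loop are illustrative, not from the question): each Point below is used only inside one iteration and never escapes, which makes it a prime candidate for the JIT's scalar replacement.

```java
// Illustrative: millions of tiny objects that never escape the loop body.
final class ScratchAllocation {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        int manhattan() { return Math.abs(x) + Math.abs(y); }
    }

    static long sumDistances(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, -i);   // short-lived, never escapes
            sum += p.manhattan();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Try running with: java -verbose:gc ScratchAllocation
        System.out.println(sumDistances(10_000_000));
    }
}
```

Run it with -verbose:gc and you will typically see few or no collections attributable to these objects, depending on JIT warm-up and heap settings.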

2 - Is it useful to reduce the overhead of the GC?

As usual, it depends.

First, you should enable GC logging to have a clear view about what is going on. You can enable it with -Xloggc:gc.log -XX:+PrintGCDetails.
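For reference, the full command line might look like this (app.jar is a placeholder for your own application; these are the pre-Java 9 HotSpot flags):

```shell
# Hypothetical invocation; app.jar stands in for your application.
java -Xloggc:gc.log -XX:+PrintGCDetails -jar app.jar
```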

If your application is spending a lot of time in GC cycles, then yes, tune the GC; otherwise it is probably not worth it.

For instance, if a young GC runs every 100 ms and takes 10 ms, you are spending 10% of your time in GC and running 10 collections per second (which is huge). In such a case I would not spend any time on GC tuning: those 10 GC/s are driven by the allocation rate, and tuning alone will not make them go away.

3 - Some experience

I had a similar problem with an application that was creating a huge number of instances of a given class. In the GC logs, I noticed that the application's allocation rate was around 3 GB/s, which is way too much (come on... 3 gigabytes of data every second?!).

The problem: too-frequent GCs, caused by too many objects being created.

In my case, I attached a memory profiler and noticed that one class accounted for a huge percentage of all my objects. I tracked down the instantiations and found that this class was basically a pair of booleans wrapped in an object. Two solutions were available:

  • Rework the algorithm so that I do not return a pair of booleans but instead I have two methods that return each boolean separately

  • Cache the objects, knowing that there were only 4 different instances

I chose the second one, as it had the least impact on the application and was easy to introduce. It took me minutes to write a factory with a non-thread-safe cache (I did not need thread safety, since I would eventually have only 4 distinct instances).
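A minimal sketch of such a factory (the names are mine, not the original code): the cache is lazily filled and deliberately not thread-safe, so in the worst case a race creates a duplicate early on, after which only 4 distinct instances remain in use.

```java
// Hypothetical reconstruction of the "cache the 4 instances" approach:
// a pair of booleans whose factory hands out at most 4 distinct objects.
final class BooleanPair {
    private static final BooleanPair[] CACHE = new BooleanPair[4];

    private final boolean first;
    private final boolean second;

    private BooleanPair(boolean first, boolean second) {
        this.first = first;
        this.second = second;
    }

    // Factory: after warm-up, no new allocations happen here.
    static BooleanPair of(boolean first, boolean second) {
        int index = (first ? 2 : 0) + (second ? 1 : 0);
        BooleanPair cached = CACHE[index];
        if (cached == null) {
            cached = new BooleanPair(first, second);
            CACHE[index] = cached;   // benign race: same values either way
        }
        return cached;
    }

    boolean first()  { return first; }
    boolean second() { return second; }
}
```

Callers simply replace new BooleanPair(a, b) with BooleanPair.of(a, b); the allocation rate for this class then drops to essentially zero.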

The allocation rate went down to 1 GB/s, and so did the frequency of young GC (divided by 3).

Hope that helps!