Performance and memory usage in Java arrays vs C++ arrays Performance and memory usage in Java arrays vs C++ arrays database database

Performance and memory usage in Java arrays vs C++ arrays


why C/C++ is always preferable on large database/data-structure over Java ? Because, C may be, but C++ is also an OOP. So, how it get advantage over Java ?

Remember that a java array (of objects)1 is actually an array of references. For simplicity let's look at a 1D array:

java:

[ref1,ref2,ref3,...,refN]ref1 -> object1ref2 -> object2...refN -> objectN

c++:

[object1,object2,...,objectN]

The overhead of references is not needed in the array when using the C++ version, the array holds the objects themselves - and not only their references. If the objects are small - this overhead might indeed be significant.

Also, as I already stated in comments - there is another issue when allocating small objects in C++ in arrays vs java. In C++, you allocate an array of objects - and they are contiguous in the memory, while in java - the objects themselves aren't. In some cases, it might cause the C++ to have much better performance, because it is much more cache efficient then the java program. I once addressed this issue in this thread

2) Should I stay on Java or their suggestion (switch to C++) will be helpful in future on large database/data-structure environment ? Any suggestion ?

I don't believe we can answer it for you. You should be aware of all pros and cons (memory efficiency, libraries you can use, development time, ...) of each for your purpose and make a decision. Don't be afraid to get advises from seniors developers in your company who have more information about the system then we are.
If there was a simple easy and generic answer to this questions - we engineers were not needed, wouldn't we?

You can also profile your code with the expected array size and a stub algorithm before implementing the core and profile it to see what the real difference is expected to be. (Assuming the array is indeed the expected main space consumer)


1: The overhead I am describing next is not relevant for arrays of primitives. In these cases (primitives) the arrays are arrays of values, and not of references, same as C++, with minor overhead for the array itself (length field, for example).


It sounds like you are in inexperienced programmer in a new job. The chances are that "they" have been in the business a long time, and know (or at least think they know) more about the domain and its programming requirements than you do.

My advice is to just do what they insist that you do. If they want the code in C or C++, just write it in C or C++. If you think you are going to have difficulties because you don't know much C / C++ ... warn them up front. If they still insist, they can wear the responsibility for any problems and delays their insistence causes. Just make sure that you do your best ... and try not to be a "squeaky wheel".


1) They have seen Array [Int-Max] [Int-Max] in Java will take nearly 1.5 times more memory than C and C++ takes some what reasonable memory footprint than Java.

That is feasible, though it depends on what is in the arrays.

  • Java can represent large arrays of most primitive types using close to optimal amounts of memory.

  • On the other hand, arrays of objects in Java can take considerably more space than in C / C++. In C++ for example, you would typically allocate a large array using new Foo[largeNumber] so that all of the Foo instances are part of the array instance. In Java, new Foo[largeNumber] is actually equivalent to new Foo*[largeNumber]; i.e. an array of pointers, where each pointer typically refers to a different object / heap node. It is easy to see how this can take a lot more space.

2) C/C++ can handle arbitrarily large file where as Java can't.

There is a hard limit to the number of elements in a single 1-D Java array ... 2^31. (You can work around this limit, but it will make your code more complicated.)

On the other hand if you are talking about simply reading and writing files, Java can handle individual files up to 2^63 bytes ... which is more than you could possibly ever want.

1) why C/C++ is always preferable on large database/data-structure over Java ? Because, C may be, but C++ is also an OOP. So, how it get advantage over Java ?

Because of the hard limit. The limit is part of the JLS and the JVM specification. It is nothing to do with OOP per se.

2) Should I stay on Java or their suggestion (switch to C++) will be helpful in future on large database/data-structure environment ? Any suggestion ?

Go with their suggestion. If you are dealing with in-memory datasets that are that large, then their concerns are valid. And even if their concerns are (hypothetically) a bit overblown it is not a good thing to be battling your superiors / seniors ...


1) They have seen Array [Int-Max] [Int-Max] in Java will take nearly 1.5 times more memory than C and C++ takes some what reasonable memory footprint than Java.

That depends on the situation. If you create an new int[1] or new int[1000] there is almost no difference in Java or C++. If you allocate data on the stack, it has a high relative difference as Java doesn't use the stack for such data.

I would first ensure this is not micro-tuning the application. Its worth remembering that one day of your time is worth (assuming you get minimum wage) is about 2.5 GB. So unless you are saving 2.5 GB per day by doing this, suspect its not worth chasing.

2) C/C++ can handle arbitrarily large file where as Java can't.

I have memory mapped a 8 TB file in a pure Java program, so I have no idea what this is about.

There is a limit where you cannot map more than 2 GB or have more than 2 billion elements in an array. You can work around this by having more than one (e.g. up to 2 billion of those)

As we have to work on such large database/data-structure, C/C++ is always preferable.

I regularly load 200 - 800 GB of data with over 5 billion entries into a single Java process (sometime more than one at a time on the same machine)

1) why C/C++ is always preferable on large database/data-structure over Java ?

There is more experience on how to do this in C/C++ than there is in Java, and their experience of how to do this is only in C/C++.

Because, C may be, but C++ is also an OOP. So, how it get advantage over Java ?

When using large datasets, its more common to use a separate database in the Java world (embedded databases are relatively rare)

Java just calls the same system calls you can in C, so there is no real difference in terms of what you can do.

2) Should I stay on Java or their suggestion (switch to C++) will be helpful in future on large database/data-structure environment ? Any suggestion ?

At the end of the day, they pay you and sometimes technical arguments are not really what matters. ;)