Process Json Array concurrently as well as in order as fast in Java


This seems like a very good use case for parallel streams. Java will do all the hard work of splitting into separate threads and reassembling in order and you don't need to work on concurrency or threading at all.

Your code could be as simple as:

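inputList.parallelStream()
    .flatMap(in -> createOutputLines(in))
    // forEach gives no ordering guarantee on a parallel stream; forEachOrdered keeps the encounter order
    .forEachOrdered(out -> output(out));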
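To make the ordering point concrete, here is a minimal, self-contained sketch. The class name OrderedParallelDemo, the process helper, and this version of createOutputLines are made up for illustration and are not taken from the question. It assumes each input can be turned into its output lines independently of the others. On a parallel stream, forEach may run the action in any order, whereas forEachOrdered runs it one element at a time in encounter order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class OrderedParallelDemo {

    // Hypothetical stand-in for the per-element work: every input yields
    // two lines, and odd inputs yield a third one.
    static Stream<String> createOutputLines(int n) {
        return n % 2 != 0
                ? Stream.of(n + "_a", n + "_b", n + "_c")
                : Stream.of(n + "_a", n + "_b");
    }

    // Maps the inputs in parallel, but hands the results to the sink
    // one at a time and in the encounter order of the source list.
    static List<String> process(List<Integer> inputList) {
        List<String> sink = new ArrayList<>();
        inputList.parallelStream()
                .flatMap(OrderedParallelDemo::createOutputLines)
                .forEachOrdered(sink::add);
        return sink;
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.rangeClosed(1, 5)
                .boxed()
                .collect(Collectors.toList());
        process(input).forEach(System.out::println);
    }
}
```

Note that a running sequence number shared across all lines cannot be computed inside the parallel part; it would have to be assigned in the ordered terminal step.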

Having said that, I would be very surprised if anything other than your IO has a material impact on performance. You would need to be doing very complex processing of your input for it to be more than a rounding error.


As other people noticed, you cannot gain much parallel performance (if any) from processing a sequential stream. What I'd naively do to improve your current solution:

  • prefer byte arrays wherever possible (this matters when intermediate strings are created along the way: int -> String -> char[] -> byte[] -> output);
  • avoid intermediate conversions wherever possible and save on to-string conversions (e.g., String.valueOf feeds the chain in the item above; a "save-into-byte-array" version of Integer.toString, like sprintf in C, would probably be great);
  • avoid intermediate strings while producing a result, especially through concatenation (javac may replace simple ... + ... + ... expressions with a more efficient StringBuilder, if I'm not wrong);
  • write elements to the output stream / writer directly, without intermediate buffers that cost extra objects on the heap;
  • use the appropriate overloaded methods (for instance, print(...) and println(...));
  • it is probably worth unrolling the output generation instead of looping (I don't know whether a given JVM optimizes small loops);
  • replace Gson JsonReader with a more efficient JSON parser.
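The "save-into-byte-array" idea from the list above could be sketched roughly as follows. The class IntToBytes and the method writeInt are hypothetical names, not an existing JDK or Gson API, and whether this actually beats String.valueOf on a given JVM is an assumption that would need measuring.

```java
import java.nio.charset.StandardCharsets;

public class IntToBytes {

    // Writes the decimal ASCII form of value into buffer, starting at index 0,
    // and returns the number of bytes written. The buffer must hold at least
    // 11 bytes, the length of "-2147483648".
    static int writeInt(byte[] buffer, int value) {
        // Work on a non-positive number so Integer.MIN_VALUE needs no special case.
        int negative = value < 0 ? value : -value;

        // Count the characters: an optional sign plus one per digit.
        int length = value < 0 ? 1 : 0;
        int probe = negative;
        do {
            length++;
            probe /= 10;
        } while (probe != 0);

        // Fill in the digits from right to left; negative % 10 is in -9..0.
        int pos = length;
        do {
            buffer[--pos] = (byte) ('0' - negative % 10);
            negative /= 10;
        } while (negative != 0);

        if (value < 0) {
            buffer[0] = '-';
        }
        return length;
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[11];
        int length = writeInt(buffer, 1234567);
        // Decoded into a String here only to display the result.
        System.out.println(new String(buffer, 0, length, StandardCharsets.US_ASCII));
    }
}
```

With a helper like this, out.write(buffer, 0, writeInt(buffer, n)) would avoid allocating a String per number, since the same buffer is reused for every element.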

Here is an example supposing you've provided the most realistic example you can:

import com.google.common.base.Stopwatch;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonWriter;

import javax.annotation.Nullable;
import javax.annotation.WillNotClose;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.util.concurrent.TimeUnit;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public static void main(final String... args)
        throws IOException {
    // generate a sample ZIP file first
    try ( final ZipOutputStream zipOutputStream = new ZipOutputStream(new FileOutputStream("./in.zip"));
            final JsonWriter jsonWriter = new JsonWriter(new OutputStreamWriter(zipOutputStream)) ) {
        zipOutputStream.putNextEntry(new ZipEntry("n_array.json"));
        jsonWriter.beginArray();
        for ( int i = 1; i <= 1_000_000; i++ ) {
            jsonWriter.value(i);
        }
        jsonWriter.endArray();
    }
    // process the file
    final Stopwatch stopwatch = Stopwatch.createStarted();
    try ( final ZipInputStream zipInputStream = new ZipInputStream(new FileInputStream("./in.zip"));
            final ZipOutputStream zipOutputStream = new ZipOutputStream(new FileOutputStream("./out.zip")) ) {
        @Nullable
        final ZipEntry nextEntry = zipInputStream.getNextEntry();
        if ( nextEntry == null || !nextEntry.getName().equals("n_array.json") ) {
            throw new AssertionError();
        }
        zipOutputStream.putNextEntry(new ZipEntry("n_array.lst"));
        processJsonArray(zipInputStream, zipOutputStream);
    }
    System.out.println("Done in " + stopwatch.elapsed(TimeUnit.MILLISECONDS) + "ms");
}

private static final byte[] newLine = System.getProperty("line.separator")
        .getBytes();

private static void processJsonArray(@WillNotClose final InputStream in, @WillNotClose final OutputStream out)
        throws IOException {
    final JsonReader jsonReader = new JsonReader(new InputStreamReader(in));
    jsonReader.beginArray();
    // both buffers are reused for every element to avoid per-line allocations
    final byte[] nBuffer = new byte[16];
    final byte[] seqBuffer = new byte[16];
    for ( int seq = 0; jsonReader.hasNext(); ) {
        final int n = jsonReader.nextInt();
        final int nLength = toBytes(nBuffer, String.valueOf(n));
        // #1 of twice/three times
        out.write(seqBuffer, 0, toBytes(seqBuffer, String.valueOf(++seq)));
        out.write('_');
        out.write(nBuffer, 0, nLength);
        out.write(newLine);
        // #2 of twice/three times
        out.write(seqBuffer, 0, toBytes(seqBuffer, String.valueOf(++seq)));
        out.write('_');
        out.write(nBuffer, 0, nLength);
        out.write(newLine);
        if ( n % 2 == 1 ) {
            // #3 of three times
            out.write(seqBuffer, 0, toBytes(seqBuffer, String.valueOf(++seq)));
            out.write('_');
            out.write(nBuffer, 0, nLength);
            out.write(newLine);
        }
    }
    jsonReader.endArray();
}

// copies the characters of s into buffer as single bytes (ASCII digits only here) and returns the length
private static int toBytes(final byte[] buffer, final String s) {
    final int length = s.length();
    for ( int i = 0; i < length; i++ ) {
        buffer[i] = (byte) s.charAt(i);
    }
    return length;
}

The code above takes ~5s on my machine, without proper benchmarking or warm-up (whereas your version without the intermediate ByteArrayOutputStream takes about 25s).