
Issues when reading a string from TCP socket in Node.js


Thanks everyone for the explanations, they helped me better understand how data is sent and received over TCP sockets. Below is a brief overview of the code I ended up using:

var chunk = "";
client.on('data', function(data) {
    chunk += data.toString(); // Add string on the end of the variable 'chunk'
    var d_index = chunk.indexOf(';'); // Find the delimiter

    // Keep going until no delimiter can be found
    while (d_index > -1) {
        try {
            var string = chunk.substring(0, d_index); // Create string up until the delimiter
            var json = JSON.parse(string); // Parse the current string
            process(json); // Function that does something with the current chunk of valid JSON
        } catch (e) {
            // The string was not valid JSON; handle or log the error here
        }
        chunk = chunk.substring(d_index + 1); // Cut off the processed chunk
        d_index = chunk.indexOf(';'); // Find the next delimiter
    }
});

Comments welcome...


You're on the right track with using a delimiter. However, you can't just extract the stuff before the delimiter, process it, and then discard what came after it. You have to buffer up whatever you got after the delimiter and then concatenate what comes next to it. This means that you could end up with any number (including 0) of JSON "chunks" after a given data event.

Basically you keep a buffer, which you initialize to "". On each data event you concatenate whatever you receive onto the end of the buffer and then split the buffer on the delimiter. The result will be one or more entries, but the last one might not be complete, so you need to test the buffer to make sure it ends with your delimiter. If it doesn't, you pop the last result and set your buffer to it. You then process whatever results remain (which may be none).
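As a rough illustration, here's a minimal sketch of that split-based buffering; the socket object and the handleMessage callback are assumed names, and parse errors are left unhandled for brevity:

var buffer = "";
socket.on('data', function(data) {
    buffer += data.toString();
    var parts = buffer.split(';');
    // The last entry is either "" (the buffer ended with the delimiter)
    // or an incomplete message; either way it becomes the new buffer.
    buffer = parts.pop();
    parts.forEach(function(part) {
        if (part.length > 0) {
            handleMessage(JSON.parse(part)); // May throw on malformed JSON
        }
    });
});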


Be aware that TCP does not make any guarantees about where it divides the chunks of data you receive. All it guarantees is that all the bytes you send will be received in order, unless the connection fails entirely.

I believe Node data events come in whenever the socket says it has data for you. Technically you could get a separate data event for each byte in your JSON data and it would still be within the limits of what the OS is allowed to do. Nobody does that, but to be robust, your code needs to be written as if it could suddenly start happening at any time. It's up to you to combine data events and then re-split the data stream along boundaries that make sense to you.

To do that, you need to buffer any data that isn't "complete", including data appended to the end of a chunk of "complete" data. If you're using a delimiter, never throw away any data after the delimiter -- keep it around as a prefix for whatever arrives next, until you see either another delimiter or the end event.

Another common choice is to prefix all data with a length field. Say you use a fixed 64-bit binary value. Then you always wait for 8 bytes, plus however many more the value in those bytes indicates, to arrive. Say you had a chunk of ten bytes of data incoming. You might get 2 bytes in one event, then 5, then 4 -- at which point you can parse the length and know you need 7 more, since the last 3 bytes of the third chunk were payload. If the next event actually contains 25 bytes, you'd take the first 7 along with the 3 from before and parse that, then look for another length field in bytes 8-15.
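Here's a hedged sketch of that length-prefixed framing in Node.js; the socket and handleMessage names are assumptions, the length is read as a 64-bit big-endian unsigned integer, and readBigUInt64BE requires Node 12+:

var buffer = Buffer.alloc(0);
socket.on('data', function(data) {
    buffer = Buffer.concat([buffer, data]);
    // Extract as many complete frames as the buffer currently holds
    while (buffer.length >= 8) {
        var length = Number(buffer.readBigUInt64BE(0)); // 8-byte length prefix
        if (buffer.length < 8 + length) break; // payload hasn't fully arrived yet
        handleMessage(buffer.subarray(8, 8 + length));
        buffer = buffer.subarray(8 + length); // drop the consumed frame
    }
});

Note that nothing is ever thrown away: a half-received length field or payload simply stays in the buffer until the next data event.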

That's a contrived example, but be aware that at low traffic rates, the network layer will generally send your data out in whatever chunks you give it, so this sort of thing only really starts to show up as you increase the load. Once the OS starts building packets from multiple writes at once, it will start splitting on a granularity that is convenient for the network and not for you, and you have to deal with that.