
Fastest way to store and retrieve a large stream of small unstructured messages


I assume that messages only contain a few named attributes of basic types (defined at runtime), and that these basic types are, for example, strings, integers, and floating-point numbers.

For the implementation to be fast, it is better to:

  • avoid text parsing (slow because sequential and full of conditionals);
  • avoid checking if messages are ill-formed (not needed here as they should all be well-formed);
  • avoid allocations as much as possible;
  • work on message chunks.

Thus, we first need to design a simple and fast binary message protocol:

A binary message contains the number of its attributes (encoded on 1 byte) followed by the list of attributes. Each attribute consists of its key (a string prefixed by its size, encoded on 1 byte), followed by the attribute's type (the index of the type in the std::variant, encoded on 1 byte), followed by the attribute value (a size-prefixed string, a native-endian 64-bit integer, or an IEEE-754 64-bit floating-point number).
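For concreteness, here is how the single-attribute message used in the benchmark further below is laid out byte by byte (the key "value" maps to the string "hello"; the annotations just restate the layout described above):

// One message with a single attribute: "value" -> "hello"
const char* example =
    "\x01"          // attribute count: 1
    "\x05" "value"  // key: size-prefixed string (5 bytes)
    "\x00"          // type index: 0 (std::string_view in the variant)
    "\x05" "hello"; // value: size-prefixed string (5 bytes)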

Each encoded message is a stream of bytes that can fit in a large buffer (allocated once and reused for multiple incoming messages).

Here is the code to decode a message from a raw binary buffer:

#include <climits>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <unordered_map>
#include <variant>

// Define the possible types here
using AttrType = std::variant<std::string_view, int64_t, double>;

// Decode the `msgData` buffer and write the decoded message into `result`.
// Assume the message is not ill-formed!
// msgData must not be freed or modified while the resulting map is being used.
void decode(const char* msgData, std::unordered_map<std::string_view, AttrType>& result)
{
    static_assert(CHAR_BIT == 8);

    // Cast to unsigned char so that sizes/counts above 127 are read correctly
    // even on platforms where char is signed.
    const size_t attrCount = static_cast<unsigned char>(msgData[0]);
    size_t cur = 1;

    result.clear();

    for(size_t i=0 ; i<attrCount ; ++i)
    {
        const size_t keyLen = static_cast<unsigned char>(msgData[cur]);
        std::string_view key(msgData+cur+1, keyLen);
        cur += 1 + keyLen;

        const size_t attrType = static_cast<unsigned char>(msgData[cur]);
        cur++;

        // A switch could be better if there are more types
        if(attrType == 0) // std::string_view
        {
            const size_t valueLen = static_cast<unsigned char>(msgData[cur]);
            std::string_view value(msgData+cur+1, valueLen);
            cur += 1 + valueLen;
            result[key] = AttrType(value);
        }
        else if(attrType == 1) // Native-endian 64-bit integer
        {
            int64_t value;
            // memcpy is required to not break the strict aliasing rule
            std::memcpy(&value, msgData+cur, sizeof(int64_t));
            cur += sizeof(int64_t);
            result[key] = AttrType(value);
        }
        else // IEEE-754 double
        {
            double value;
            // memcpy is required to not break the strict aliasing rule
            std::memcpy(&value, msgData+cur, sizeof(double));
            cur += sizeof(double);
            result[key] = AttrType(value);
        }
    }
}

You probably need to write the encoding function too (based on the same idea).
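For reference, here is a minimal sketch of what such an encoder could look like, based on the same protocol (the name `encode`, the raw `outBuffer` output parameter, and the returned byte count are illustrative choices, not part of the original answer; it assumes at most 255 attributes and keys/string values of at most 255 bytes):

#include <cstdint>
#include <cstring>
#include <string_view>
#include <unordered_map>
#include <variant>

// Hypothetical encoder matching the decoder above: writes `msg` into
// `outBuffer` (assumed to be large enough) and returns the written size.
size_t encode(const std::unordered_map<std::string_view, AttrType>& msg, char* outBuffer)
{
    size_t cur = 0;
    outBuffer[cur++] = static_cast<char>(msg.size()); // Attribute count

    for(const auto& [key, attr] : msg)
    {
        // Size-prefixed key
        outBuffer[cur++] = static_cast<char>(key.size());
        std::memcpy(outBuffer+cur, key.data(), key.size());
        cur += key.size();

        // Type index in the std::variant, then the raw value
        outBuffer[cur++] = static_cast<char>(attr.index());
        if(const auto* str = std::get_if<std::string_view>(&attr))
        {
            outBuffer[cur++] = static_cast<char>(str->size());
            std::memcpy(outBuffer+cur, str->data(), str->size());
            cur += str->size();
        }
        else if(const auto* i = std::get_if<int64_t>(&attr))
        {
            std::memcpy(outBuffer+cur, i, sizeof(int64_t));
            cur += sizeof(int64_t);
        }
        else
        {
            const double d = std::get<double>(attr);
            std::memcpy(outBuffer+cur, &d, sizeof(double));
            cur += sizeof(double);
        }
    }

    return cur;
}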

Here is an example of usage (based on your json-related code):

#include <iostream>

const char* message = "\x01\x05value\x00\x05hello";

void bench()
{
    std::unordered_map<std::string_view, AttrType> decodedMsg;
    decodedMsg.reserve(16);

    decode(message, decodedMsg);
    for(size_t i=0; i<1000*1000; ++i)
    {
        decode(message, decodedMsg);
    }

    std::visit([](const auto& v) { std::cout << "Result: " << v << std::endl; }, decodedMsg["value"]);
}

On my machine (with an Intel i7-9700KF processor) and based on your benchmark, I get 2.7M messages/s with the code using the nlohmann json library and 35.4M messages/s with the new code.

Note that this code can be made much faster. Indeed, most of the time is spent in hashing and allocations. You can mitigate the problem by using a faster hash-map implementation (e.g. boost::container::flat_map or ska::bytell_hash_map) and/or a custom allocator. An alternative is to build your own carefully tuned hash-map implementation. Yet another alternative is to use a vector of key-value pairs and perform lookups with a linear search, as sketched below (this should be fast because your messages should not have many attributes and because you said you only need a small fraction of the attributes per message).

However, the larger the messages, the slower the decoding. Thus, you may need to leverage parallelism to decode message chunks faster. With all of that, it is possible to reach more than 100M messages/s.
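As an illustration of the linear-search alternative, here is a minimal sketch (the names `FlatMessage` and `find` are hypothetical; the idea is simply a std::vector of key-value pairs instead of a hash map):

#include <string_view>
#include <utility>
#include <vector>

// Hypothetical flat alternative to the unordered_map: decoding appends
// pairs with push_back (no hashing), and lookups scan linearly, which is
// fast when messages only have a handful of attributes.
using FlatMessage = std::vector<std::pair<std::string_view, AttrType>>;

const AttrType* find(const FlatMessage& msg, std::string_view key)
{
    for(const auto& [k, v] : msg)
        if(k == key)
            return &v;
    return nullptr; // Attribute not found
}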