
Synchronize Data across multiple occasionally-connected-clients using EventSourcing (NodeJS, MongoDB, JSON)


This is a very complex subject, but I'll attempt some form of answer.

My first reflex upon seeing your diagram is to think of how distributed databases replicate data between themselves and recover in the event that one node goes down. This is most often accomplished via gossiping.

Gossip rounds make sure that data stays in sync. Time-stamped revisions are kept on both ends and merged on demand, say when a node reconnects, or simply at a given interval (publishing bulk updates via sockets or the like).

Database engines like Cassandra or Scylla use 3 messages per merge round.

Demonstration:

Data in Node A

{ id: 1, timestamp: 10, data: { foo: '84' } }
{ id: 2, timestamp: 12, data: { foo: '23' } }
{ id: 3, timestamp: 12, data: { foo: '22' } }

Data in Node B

{ id: 1, timestamp: 11, data: { foo: '50' } }
{ id: 2, timestamp: 11, data: { foo: '31' } }
{ id: 3, timestamp: 8, data: { foo: '32' } }

Step 1: SYN

Node A lists the ids and last upsert timestamps of all its documents (feel free to change the structure of these packets; here I'm using verbose JSON to better illustrate the process).

Node A -> Node B

[ { id: 1, timestamp: 10 }, { id: 2, timestamp: 12 }, { id: 3, timestamp: 12 } ]

Step 2: ACK

Upon receiving this packet, Node B compares the received timestamps with its own. For each document, if its local timestamp is older, it puts just the id and timestamp in the ACK payload; if its local timestamp is newer, it puts them in along with its data. And if the timestamps are equal, it does nothing, obviously.

Node B -> Node A

[ { id: 1, timestamp: 11, data: { foo: '50' } }, { id: 2, timestamp: 11 }, { id: 3, timestamp: 8 } ]

Step 3: ACK2

Node A updates its documents where ACK data was provided, then sends back its latest data to Node B for those entries where no ACK data was provided.

Node A -> Node B

[ { id: 2, timestamp: 12, data: { foo: '23' } }, { id: 3, timestamp: 12, data: { foo: '22' } } ]

That way, both nodes now have the latest data merged both ways (in case the client did offline work), without having to send all your documents.
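The three steps above can be sketched in plain Node.js, using in-memory `Map`s to stand in for each node's collection (the function names `buildSyn`, `buildAck`, `buildAck2`, and `applyAck2` are just illustrative):

```javascript
// Each node's store: id -> { timestamp, data }
const nodeA = new Map([
  [1, { timestamp: 10, data: { foo: '84' } }],
  [2, { timestamp: 12, data: { foo: '23' } }],
  [3, { timestamp: 12, data: { foo: '22' } }],
]);
const nodeB = new Map([
  [1, { timestamp: 11, data: { foo: '50' } }],
  [2, { timestamp: 11, data: { foo: '31' } }],
  [3, { timestamp: 8,  data: { foo: '32' } }],
]);

// Step 1 (SYN): advertise ids + timestamps only, never the data.
function buildSyn(store) {
  return [...store].map(([id, doc]) => ({ id, timestamp: doc.timestamp }));
}

// Step 2 (ACK): if the receiver's copy is newer, include its data;
// if it is older, send back only id + timestamp to request the data.
function buildAck(store, syn) {
  const ack = [];
  for (const { id, timestamp } of syn) {
    const local = store.get(id);
    if (!local || local.timestamp === timestamp) continue; // already in sync
    if (local.timestamp > timestamp) {
      ack.push({ id, timestamp: local.timestamp, data: local.data });
    } else {
      ack.push({ id, timestamp: local.timestamp });
    }
  }
  return ack;
}

// Step 3 (ACK2): apply newer data from the ACK, and reply with data
// for the entries the peer flagged as stale.
function buildAck2(store, ack) {
  const ack2 = [];
  for (const entry of ack) {
    if (entry.data) {
      store.set(entry.id, { timestamp: entry.timestamp, data: entry.data });
    } else {
      const local = store.get(entry.id);
      ack2.push({ id: entry.id, timestamp: local.timestamp, data: local.data });
    }
  }
  return ack2;
}

function applyAck2(store, ack2) {
  for (const { id, timestamp, data } of ack2) store.set(id, { timestamp, data });
}

// One merge round:
const syn = buildSyn(nodeA);        // A -> B
const ack = buildAck(nodeB, syn);   // B -> A
const ack2 = buildAck2(nodeA, ack); // A -> B
applyAck2(nodeB, ack2);
```

After the round, both maps hold `foo: '50'`, `'23'`, `'22'`, matching the demonstration above.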

In your case, your source of truth is your server, but you could easily implement peer-to-peer gossiping between your clients with WebRTC, for example.

Hope this helps in some way.

Cassandra training video

Scylla explanation


I think that the best solution to avoid all the event ordering and duplication issues is to use the pull method. That way every client maintains its last imported event state (with a tracker, for example) and asks the server for the events generated after that last one.
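A minimal sketch of that pull method, assuming a server-side ordered event log with monotonically increasing sequence numbers (the names `eventsSince` and `lastSeen` are hypothetical):

```javascript
// Server side: an append-only, ordered event log.
const serverLog = []; // [{ seq, type, payload }]

function appendEvent(type, payload) {
  const event = { seq: serverLog.length + 1, type, payload };
  serverLog.push(event);
  return event;
}

// What a GET /events?since=N endpoint would return.
function eventsSince(seq) {
  return serverLog.filter((e) => e.seq > seq);
}

// Client side: apply new events and advance the tracker.
const client = { lastSeen: 0, docs: new Map() };

function pull(client) {
  for (const e of eventsSince(client.lastSeen)) {
    if (e.type === 'upsert') client.docs.set(e.payload.id, e.payload.data);
    if (e.type === 'delete') client.docs.delete(e.payload.id);
    client.lastSeen = e.seq; // monotonic tracker: no duplicates, no gaps
  }
}

appendEvent('upsert', { id: 1, data: { foo: '84' } });
appendEvent('upsert', { id: 2, data: { foo: '23' } });
pull(client);
appendEvent('delete', { id: 1 });
pull(client); // only fetches the one new delete event
```

Because the client only ever asks for events after `lastSeen`, events are applied exactly once and in server order, which sidesteps the ordering and duplication problems.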

An interesting problem will be detecting broken business invariants. For that, you could also store the log of applied commands on the client, and in case of a conflict (events were generated by other clients) you could retry the execution of the commands from the command log. You need to do that because some commands will not succeed after re-execution; for example, a client saves a document after another user deleted that document in the meantime.
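The replay idea can be sketched like this, assuming each command is re-executed against the fresh server state and rejected when its invariant no longer holds (all names here are hypothetical):

```javascript
// Server state after the other user's delete has been applied.
const serverDocs = new Map([[1, { body: 'hello' }]]);

// Commands the offline client queued up in its command log.
const commandLog = [
  { type: 'update', id: 1, body: 'hello world' }, // edits an existing doc
  { type: 'create', id: 2, body: 'new doc' },
];

// Meanwhile, another user deleted document 1 on the server.
serverDocs.delete(1);

function execute(cmd) {
  if (cmd.type === 'create') {
    if (serverDocs.has(cmd.id)) return { ok: false, reason: 'id already taken' };
    serverDocs.set(cmd.id, { body: cmd.body });
    return { ok: true };
  }
  if (cmd.type === 'update') {
    // Invariant: you can only update a document that still exists.
    if (!serverDocs.has(cmd.id)) return { ok: false, reason: 'document was deleted' };
    serverDocs.set(cmd.id, { body: cmd.body });
    return { ok: true };
  }
}

// Replay: each command either succeeds against the fresh state or
// surfaces a conflict the client must handle (e.g. show it to the user).
const results = commandLog.map(execute);
```

Here the `update` of document 1 fails on replay while the `create` of document 2 still succeeds, which is exactly the partial-failure case the command log lets you detect and report instead of silently losing.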