How to ensure ordered processing of events using spark streaming?


Amazon Kinesis uses shards as the data containers of a stream, and within a single shard, records are guaranteed to be processed in the order they were put.

You can exploit this guarantee for your use case: choose your "Partition Key" values deliberately when putting records into the stream.

For example, if you are dealing with per-user events, you can use the user's ID as the partition key on the producer side, so that all events for one user land in the same shard:

  • User #1: first makes a purchase, then updates their score, then browses to page X, etc.
  • User #2: first does X, then does Y, then event Z occurs, etc.

That way, you can be sure that the events of a single user are processed in order, while still getting parallelism across different users' events (i.e., across Kinesis records in different shards).
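A minimal producer-side sketch with boto3 is below; the stream name, region, and event fields are illustrative assumptions, not anything from the question:

```python
import json

import boto3

# Stream name and region are assumptions for illustration.
STREAM_NAME = "user-events"

kinesis = boto3.client("kinesis", region_name="us-east-1")

def put_user_event(user_id, event):
    """Put one event, keyed by user ID, so that all of a user's
    events hash to the same shard and keep their relative order."""
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(user_id),  # same key -> same shard -> ordered
    )

# User #1's events reach their shard in exactly this order:
put_user_event(1, {"type": "purchase", "item": "book"})
put_user_event(1, {"type": "score_update", "score": 42})
put_user_event(1, {"type": "page_view", "page": "X"})
```

Within a shard, records are delivered in sequence-number order, so per-user ordering is preserved on the read side as well (though your Spark job must still avoid reordering them across batches).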


You can also have just one partition (a single shard), and by that give up parallelism in exchange for a total ordering of all events.
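For instance, with boto3 (the stream name is again a made-up placeholder):

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# A single shard gives one global sequence of records: a total
# order over all events, at the cost of throughput and parallelism.
kinesis.create_stream(StreamName="user-events-ordered", ShardCount=1)
```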

Also, in my opinion, Apache Kafka is a better choice for a scenario like this, since it offers the same per-key ordering guarantee within a partition.
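The same keyed-producer pattern in Kafka would look roughly like this with the kafka-python client; the broker address and topic name are assumptions:

```python
import json

from kafka import KafkaProducer

# Broker address and topic are placeholders for illustration.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Messages with the same key go to the same partition, and Kafka
# guarantees ordering within a partition.
producer.send("user-events", key="user-1", value={"type": "purchase"})
producer.send("user-events", key="user-1", value={"type": "score_update"})
producer.flush()
```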