
Token balancing in a brand-new Cassandra cluster


The approach described in https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html seems to be the solution, at least for the distribution of tokens and data for a keyspace. I take the following steps to get a balanced system (a command sketch follows the list):

  1. Set up cassandra.yaml for the seed node with num_tokens=8 (for my test case) and leave the other parameters at their defaults.
  2. Start the seed node and wait until it is ready.
  3. Connect via cqlsh or a programmatic client and create the keyspace (for my test case with replication factor 1).
  4. Shut down the seed node.
  5. Edit the seed node's cassandra.yaml and uncomment/add the parameter allocate_tokens_for_keyspace: [your_keyspace_name_from_step_3].
  6. Start the seed node again and wait until it is ready.
  7. Edit the cassandra.yaml of the second node in the cluster: apply step 5 in this file as well and set num_tokens to the same value as on the seed node.
  8. Start the second node and wait until it is ready.
  9. Repeat steps 7-8 for every other node in your cluster.
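The same procedure as a minimal shell sketch. The container names (docker-compose style, modeled on the nodetool call below), the keyspace name my_keyspace, the replication settings (NetworkTopologyStrategy with the tc1 datacenter from the output below), and the bind-mounted cassandra.yaml paths are assumptions; adapt them to your own setup.

    # Steps 1-2: the seed node is already configured with num_tokens: 8 and started (not shown).

    # Step 3: create the keyspace on the running seed node with replication factor 1.
    docker exec -ti docker_cassandra-seed_1 cqlsh -e \
      "CREATE KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'tc1': 1};"

    # Steps 4-5: stop the seed node and enable token allocation for that keyspace
    # (assumes cassandra.yaml is bind-mounted from the host at ./seed/cassandra.yaml).
    docker stop docker_cassandra-seed_1
    echo "allocate_tokens_for_keyspace: my_keyspace" >> ./seed/cassandra.yaml

    # Step 6: start the seed node again and wait until it reports itself as Up/Normal.
    docker start docker_cassandra-seed_1
    until docker exec docker_cassandra-seed_1 nodetool status 2>/dev/null | grep -q '^UN'; do
      sleep 5
    done

    # Steps 7-8: give the second node the same num_tokens and keyspace setting, then start it.
    sed -i 's/^num_tokens:.*/num_tokens: 8/' ./node2/cassandra.yaml
    echo "allocate_tokens_for_keyspace: my_keyspace" >> ./node2/cassandra.yaml
    docker start docker_cassandra-node2_1

    # Step 9: repeat steps 7-8 for every further node.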

With that in place, after a test run that inserts 2,000,000 rows into a test table in the keyspace, I see the following result:

    docker exec -ti docker_cassandra-seed_1 nodetool status

    Datacenter: tc1
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
    UN  172.30.10.4  36.03 MiB  8            33.3%             1e0d781f-d71f-4704-bcd1-efb5d4caff0e  rack1
    UN  172.30.10.2  36.75 MiB  8            33.3%             56287b3c-b0f1-489f-930e-c7b00df896f3  rack1
    UN  172.30.10.3  36.03 MiB  8            33.3%             943acc5f-7257-414a-b36c-c06dcb53e67d  rack1

Even the token distribution is better than before:

    172.30.10.2    6.148.914.691.236.510.000
    172.30.10.3    6.148.914.691.236.520.000
    172.30.10.4    5.981.980.531.853.070.000
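For reference, comparable test data can be bulk-loaded via cqlsh's COPY command. The table name, schema and row values below are purely illustrative; the exact test table behind the numbers above is not shown here.

    # Hypothetical test table; keyspace name and schema are illustrative only.
    docker exec -ti docker_cassandra-seed_1 cqlsh -e \
      "CREATE TABLE IF NOT EXISTS my_keyspace.testtable (id bigint PRIMARY KEY, val text);"

    # Generate 2,000,000 rows as CSV and bulk-load them with cqlsh COPY.
    seq 1 2000000 | awk '{print $1 ",row-" $1}' > /tmp/testdata.csv
    docker cp /tmp/testdata.csv docker_cassandra-seed_1:/tmp/testdata.csv
    docker exec -ti docker_cassandra-seed_1 cqlsh -e \
      "COPY my_keyspace.testtable (id, val) FROM '/tmp/testdata.csv';"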

For now, this clarifies the problem of the uneven distribution, so thank you again, Chris Lohfink, for the link with the solution.


I have tested a bit more around the above scenario. My test cluster consists of 5 nodes (1 seed node, 4 normal nodes).

The first 5 steps from above remain valid:

  1. Set up cassandra.yaml for the seed node with num_tokens=8 (for my test case) and leave the other parameters at their defaults.
  2. Start the seed node and wait until it is ready.
  3. Connect via cqlsh or a programmatic client and create the keyspace (for my test case with replication factor 1).
  4. Shut down the seed node, edit its cassandra.yaml and uncomment/add the parameter allocate_tokens_for_keyspace: [your_keyspace_name_from_step_3].
  5. Start the seed node again and wait until it is ready.

Then you can start all the other nodes (in my case 4) at the same time (or with a 1-minute delay between each node startup), in an automated way, as sketched below. It is important that all nodes have allocate_tokens_for_keyspace: [your_keyspace....] set.
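A minimal sketch of such an automated startup, assuming docker-compose style container names for the four remaining nodes and that each node's cassandra.yaml already contains num_tokens: 8 and the allocate_tokens_for_keyspace setting:

    # Start the remaining nodes one after another with a short delay.
    # Container names are assumptions -- adjust them to your compose project.
    for node in docker_cassandra-node2_1 docker_cassandra-node3_1 \
                docker_cassandra-node4_1 docker_cassandra-node5_1; do
      docker start "$node"
      sleep 60   # roughly one minute between node startups
    done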

After all nodes are up and filled with 1,000,000 rows, there is an even balance of 20% per node.

That scenario makes life easier if you start a cluster with a lot of nodes.