Token balancing in a brand-new Cassandra cluster
As described in https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html, this seems to be the solution, at least for the distribution of tokens and data for a keyspace. I take the following steps to get a balanced system:
- Set up cassandra.yaml for the seed node (for my test case, num_tokens: 8) and leave the other parameters at their defaults.
- Start the seed node and wait until it is ready.
- Connect via cqlsh (or programmatically) and create the keyspace (for my test case, with replication factor 1).
- Shut down the seed node.
- Edit the cassandra.yaml of the seed node and uncomment/add the parameter
allocate_tokens_for_keyspace: [your_keyspace_name_from_step_3]
- Start the seed node again and wait until it is ready.
- Edit the cassandra.yaml of the second node: set the same allocate_tokens_for_keyspace parameter as in step 5, and set num_tokens equal to the seed node's num_tokens.
- Start the second node and wait until it is ready.
- Repeat the two previous steps (edit cassandra.yaml, then start the node) for every other node in your cluster.
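The two cassandra.yaml edits from the steps above can be sketched as a small shell script. The sample file contents, paths, keyspace name, and the replication strategy in the comment are assumptions for illustration; adapt them to your installation.

```shell
#!/bin/sh
# Sketch of the per-node cassandra.yaml edits, applied to a sample file.
# CONF path, keyspace name, and file contents are assumptions.
set -e
CONF=$(mktemp)
KEYSPACE=my_keyspace   # the keyspace created in step 3, e.g. via:
# cqlsh -e "CREATE KEYSPACE my_keyspace WITH replication =
#           {'class': 'NetworkTopologyStrategy', 'tc1': 1};"

# Relevant lines as they ship in a default cassandra.yaml
cat > "$CONF" <<'EOF'
num_tokens: 256
# allocate_tokens_for_keyspace: KEYSPACE_NAME
EOF

# Step 1: small, fixed token count (the same value on every node)
sed -i -E 's/^num_tokens:.*/num_tokens: 8/' "$CONF"

# Step 5: uncomment allocate_tokens_for_keyspace and point it at the keyspace
sed -i -E "s/^#? *allocate_tokens_for_keyspace:.*/allocate_tokens_for_keyspace: $KEYSPACE/" "$CONF"

cat "$CONF"
```

After running, the sample file contains `num_tokens: 8` and `allocate_tokens_for_keyspace: my_keyspace`, which is exactly the state each node's config should be in before it joins the cluster.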
With that, after a test run inserting 2,000,000 rows into a test table in the keyspace, I see the following result:
docker exec -ti docker_cassandra-seed_1 nodetool status
Datacenter: tc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.30.10.4  36.03 MiB  8       33.3%             1e0d781f-d71f-4704-bcd1-efb5d4caff0e  rack1
UN  172.30.10.2  36.75 MiB  8       33.3%             56287b3c-b0f1-489f-930e-c7b00df896f3  rack1
UN  172.30.10.3  36.03 MiB  8       33.3%             943acc5f-7257-414a-b36c-c06dcb53e67d  rack1
Even the token distribution is better than before:
172.30.10.2  6.148.914.691.236.510.000
172.30.10.3  6.148.914.691.236.520.000
172.30.10.4  5.981.980.531.853.070.000
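As a sanity check, these per-node totals can be related to the full Murmur3 token space of 2^64: each node owns roughly a third of it, matching the 33.3% shown by nodetool status. A small shell/awk sketch (the numbers are copied from the run above):

```shell
#!/bin/sh
# Sanity check: each node's owned token-range total, divided by the full
# 2^64 Murmur3 token space, should come out near 1/3 for three nodes.
result=$(awk 'BEGIN {
  total = 2 ^ 64
  owned["172.30.10.2"] = 6148914691236510000
  owned["172.30.10.3"] = 6148914691236520000
  owned["172.30.10.4"] = 5981980531853070000
  for (n in owned)
    printf "%s %.1f%%\n", n, 100 * owned[n] / total
}' | sort)
echo "$result"
# 172.30.10.2 33.3%
# 172.30.10.3 33.3%
# 172.30.10.4 32.4%
```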
This clarifies the problem of the uneven distribution, so thanks again to Chris Lohfink for the link to the solution.
I have tested a bit more around the above scenario. My test cluster consists of 5 nodes (1 seed, 4 normal nodes).
The first 5 steps from above remain valid:
- Set up cassandra.yaml for the seed node (for my test case, num_tokens: 8) and leave the other parameters at their defaults.
- Start the seed node and wait until it is ready.
- Connect via cqlsh (or programmatically) and create the keyspace (for my test case, with replication factor 1).
- Shut down the seed node, edit its cassandra.yaml, and uncomment/add the parameter
allocate_tokens_for_keyspace: [your_keyspace_name_from_step_3]
- Start the seed node again and wait until it is ready.
Then you can start all the other nodes (in my case 4) at the same time (or with a 1-minute delay between each node's startup), in an automated way. It is important that all nodes have allocate_tokens_for_keyspace: [your_keyspace....] set.
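The automated startup could be sketched like this. The container names are assumptions based on the docker naming above, and the real start command is left as a comment so the script runs as a dry run:

```shell
#!/bin/sh
# Dry-run sketch: start the four non-seed nodes one after another with a
# delay in between. Container names are assumptions; replace the echo with
# your real start command, e.g. docker start "$node".
DELAY=${DELAY:-1}   # use ~60 seconds in practice; kept short here
started=""
for node in docker_cassandra-node_1 docker_cassandra-node_2 \
            docker_cassandra-node_3 docker_cassandra-node_4; do
  echo "starting $node"      # docker start "$node"
  started="$started $node"
  sleep "$DELAY"
done
echo "started:$started"
```

Each node picks up allocate_tokens_for_keyspace from its own cassandra.yaml when it bootstraps, so the only orchestration needed is the staggered start.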
After all nodes are up and the keyspace is filled with 1,000,000 rows, there is an even balance of 20% per node.
That scenario makes life easier if you start a cluster with a lot of nodes.