Row count of a column family in Cassandra
If you are working on a large data set and are okay with a pretty good approximation, I highly recommend using the command:
nodetool --host <hostname> cfstats
This will dump out a list for each column family looking like this:
Column Family: widgets
SSTable count: 11
Space used (live): 4295810363
Space used (total): 4295810363
Number of Keys (estimate): 9709824
Memtable Columns Count: 99008
Memtable Data Size: 150297312
Memtable Switch Count: 434
Read Count: 9716802
Read Latency: 0.036 ms.
Write Count: 9716806
Write Latency: 0.024 ms.
Pending Tasks: 0
Bloom Filter False Postives: 10428
Bloom Filter False Ratio: 1.00000
Bloom Filter Space Used: 18216448
Compacted row minimum size: 771
Compacted row maximum size: 263210
Compacted row mean size: 1634
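The "Number of Keys (estimate)" line is a good approximation, and getting it is much faster than an explicit count. Keep in mind that cfstats reports statistics for the node you connect to, so the estimate reflects that node's data rather than the whole cluster.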
I found an excellent article on this here: http://www.planetcassandra.org/blog/post/counting-keys-in-cassandra
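If you want to pull the estimates out programmatically, a minimal sketch in Python is to run the command and scan its output. It assumes the output format shown above ("Column Family:" header lines followed by a "Number of Keys (estimate):" line); other Cassandra releases may label these lines differently, so adjust the patterns for your version. The function names are hypothetical helpers, not part of any Cassandra tool.

```python
import re
import subprocess
from typing import Dict, Optional

# Assumed line formats, taken from the cfstats output shown above.
_CF_RE = re.compile(r"^\s*Column Family:\s*(\S+)\s*$")
_KEYS_RE = re.compile(r"^\s*Number of Keys \(estimate\):\s*(\d+)\s*$")


def parse_key_estimates(cfstats_output: str) -> Dict[str, int]:
    """Map each column family name to its 'Number of Keys (estimate)' value."""
    estimates: Dict[str, int] = {}
    current_cf: Optional[str] = None
    for line in cfstats_output.splitlines():
        cf_match = _CF_RE.match(line)
        if cf_match:
            # Remember which column family the following stats belong to.
            current_cf = cf_match.group(1)
            continue
        keys_match = _KEYS_RE.match(line)
        if keys_match and current_cf is not None:
            estimates[current_cf] = int(keys_match.group(1))
    return estimates


def key_estimates_from_host(hostname: str) -> Dict[str, int]:
    """Run 'nodetool --host <hostname> cfstats' and parse its output.

    The figures describe only the node you connect to.
    """
    output = subprocess.run(
        ["nodetool", "--host", hostname, "cfstats"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_key_estimates(output)
```

Calling `key_estimates_from_host("<hostname>")` for the column family above would give a dictionary containing `"widgets": 9709824`. Parsing is kept separate from running the command so it can be checked against saved output.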
select count(*) from cf limit 1000000
The above statement can be used if you know an approximate upper bound on the row count beforehand; the LIMIT needs to be larger than the number of rows you expect. I found this useful in my case.
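One way to act on that upper bound is to treat a result equal to the LIMIT as a sign that the real count may be larger. The sketch below assumes, as the answer implies, that LIMIT caps how many rows are counted; `build_count_query` and `interpret_capped_count` are hypothetical helper names, and the resulting string can be run with any CQL client (for example `cqlsh` or the DataStax Python driver's `session.execute`).

```python
from typing import NamedTuple


class CappedCount(NamedTuple):
    count: int
    may_be_truncated: bool  # True when the count reached the LIMIT


def build_count_query(table: str, limit: int) -> str:
    """Build the bounded count statement.

    The table name is interpolated directly, so only pass trusted names.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"SELECT COUNT(*) FROM {table} LIMIT {limit}"


def interpret_capped_count(count: int, limit: int) -> CappedCount:
    """A count that reaches the limit may hide additional rows."""
    return CappedCount(count=count, may_be_truncated=count >= limit)
```

If `may_be_truncated` comes back true, rerun the query with a larger limit.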