Can Spark Cassandra Connector resolve hostnames from a headless service in a K8S environment?


By default, SCC resolves all provided contact points into IP addresses on the first connect and then uses only these IP addresses for reconnection. After the initial connection is established, it discovers the rest of the cluster. Usually this is not a problem, as SCC should receive notifications about nodes going up and down and track the nodes' IP addresses. But in practice, nodes can restart too fast and the notifications are not received, so Spark jobs that use SCC can get stuck trying to connect to IP addresses that aren't valid anymore. I hit this multiple times on DC/OS.

This problem is solved in SCC 2.5.0, which includes a fix for SPARKC-571. It introduced a new configuration parameter, spark.cassandra.connection.resolveContactPoints. When it is set to false (it is true by default), SCC always uses the hostnames of the contact points for both the initial connection and reconnection, avoiding the problems with changed IP addresses.

So on K8S I would try this configuration parameter with a regular Cassandra deployment.
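As a rough illustration of how these settings could be passed to a job, here is a minimal Python sketch that builds `--conf` arguments for spark-submit. The helper names `scc_conf` and `to_spark_submit_args` are hypothetical and not part of SCC; only the two property names come from the connector, and the contact point `cassandra` is an assumed service name.

```python
# Hypothetical helpers (not part of SCC): build spark-submit arguments
# that keep contact points as hostnames instead of resolved IP addresses.

def scc_conf(contact_point, resolve_contact_points=False):
    """Return connector settings as a dict of Spark property strings."""
    return {
        "spark.cassandra.connection.host": contact_point,
        # "false" keeps hostnames for both initial connection and reconnection
        "spark.cassandra.connection.resolveContactPoints":
            str(resolve_contact_points).lower(),
    }


def to_spark_submit_args(conf):
    """Flatten the settings into a list of --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "cassandra" is an assumed hostname or service name for the contact point
    print(" ".join(to_spark_submit_args(scc_conf("cassandra"))))
```

The resulting arguments would be appended to the usual spark-submit command line; the same two properties could equally be set in the Spark configuration of the application itself.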


Yes, why not. There is a good example in the official Kubernetes documentation. You create a headless service with a selector:

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: cassandra
      name: cassandra
    spec:
      clusterIP: None
      ports:
      - port: 9042
      selector:
        app: cassandra

and basically, when you specify spark.cassandra.connection.host=cassandra (in the same K8s namespace; otherwise you have to provide `cassandra.<namespace>.svc.cluster.local`), it will resolve to the Cassandra contact points (the Pod IP addresses where Cassandra is running).
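To check what the headless service name resolves to, a small Python sketch like the following could be run from a Pod in the cluster. The function names are hypothetical, the `data` namespace is only an example, and `cluster.local` is assumed to be the cluster domain (the Kubernetes default, but configurable).

```python
import socket


def headless_service_fqdn(service, namespace, cluster_domain="cluster.local"):
    """Build the in-cluster DNS name of a service in another namespace."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve_addresses(host, port=9042):
    """Return the sorted, de-duplicated IP addresses that a hostname resolves to.

    For a headless service, DNS is expected to return one record per
    ready Pod rather than a single cluster IP.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "data" is an assumed namespace used only for illustration
    print(headless_service_fqdn("cassandra", "data"))
```

Calling `resolve_addresses("cassandra")` from a Pod in the same namespace should then list the Cassandra Pod IPs, which is a quick way to confirm that the service selector matches the Pods.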

✌️