Best practices for data storage with Elasticsearch and Kubernetes


A good pattern for deploying an Elasticsearch cluster in Kubernetes is to define a StatefulSet.

Because the StatefulSet runs more than one Pod replica, you cannot simply reference a single persistent volume claim. Instead, you need to add a persistent volume claim template (volumeClaimTemplates) to the StatefulSet definition, so that each Pod gets its own claim.

For these per-replica persistent volumes to work, you need dynamic volume provisioning: a StorageClass that allows storage volumes to be created on demand.
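As a rough illustration of what such a StorageClass can look like, here is a minimal sketch. It assumes the AWS EBS CSI driver is installed in the cluster; the class name `es-block-storage` and the `gp3` volume type are placeholders chosen for this example, not something from the guide. Managed clusters often ship with a default class already, in which case you may not need to create one at all.

```yaml
# Sketch only: the name and parameters are assumptions for illustration.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: es-block-storage          # hypothetical name
provisioner: ebs.csi.aws.com      # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3                       # assumed EBS volume type
reclaimPolicy: Retain             # keep the underlying volume if the claim is deleted
allowVolumeExpansion: true        # allow growing the volume later
volumeBindingMode: WaitForFirstConsumer  # provision once a Pod is scheduled
```

`WaitForFirstConsumer` delays provisioning until a Pod using the claim is scheduled, which helps the volume end up in the same zone as the Pod.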

In the DigitalOcean tutorial, the persistent volume claim template is as follows:

    volumeClaimTemplates:
    - metadata:
        name: data
        labels:
          app: elasticsearch
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: do-block-storage
        resources:
          requests:
            storage: 100Gi

Here, the StorageClass is do-block-storage. You can replace it with your own storage class.
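To show where the claim template sits in the overall manifest, here is a stripped-down StatefulSet sketch. The StatefulSet name `es-cluster`, the Service name, the replica count, and the image tag are assumptions for illustration, and the Elasticsearch discovery settings a real cluster needs are left out to keep the focus on storage.

```yaml
# Sketch only: names, image tag, and replica count are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster                # hypothetical name
spec:
  serviceName: elasticsearch      # headless Service, assumed to exist
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0  # assumed version
        volumeMounts:
        - name: data              # must match the claim template name below
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: do-block-storage
      resources:
        requests:
          storage: 100Gi
```

The `volumeMounts` entry refers to the claim template by name, so every replica mounts its own volume at the Elasticsearch data path.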


Very interesting question,

Think of each Elasticsearch node as one Elasticsearch Pod in Kubernetes.

Kubernetes needs to keep a stable identity for each Pod so that it can reattach the Pod to the correct PersistentVolumeClaim after an outage. This is where the StatefulSet comes in.

A StatefulSet will ensure the same PersistentVolumeClaim stays bound to the same Pod throughout its lifetime.
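The binding works through predictable names: each claim is named `<template name>-<StatefulSet name>-<ordinal>`. As a sketch, assuming a StatefulSet called `es-cluster` (a hypothetical name) with a claim template called `data`, the claim created for the first Pod would look roughly like this:

```yaml
# Roughly what the StatefulSet controller creates for Pod es-cluster-0.
# "es-cluster" is a hypothetical StatefulSet name; size and class are examples.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-es-cluster-0         # <template name>-<StatefulSet name>-<ordinal>
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: do-block-storage
  resources:
    requests:
      storage: 100Gi
```

If Pod `es-cluster-0` is rescheduled, the replacement Pod gets the same name and therefore picks up `data-es-cluster-0` again. By default these claims are also kept when the Pod or the StatefulSet is deleted, so the data survives until you remove the claims yourself.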

A PersistentVolume (PV) is the Kubernetes abstraction for a piece of storage provided by the underlying infrastructure. This can be AWS EBS, DigitalOcean Volumes, etc.

I'd recommend having a look at the official Elasticsearch Helm chart: https://github.com/elastic/helm-charts/tree/master/elasticsearch

Also the Elasticsearch operator, Elastic Cloud on Kubernetes (ECK): https://operatorhub.io/operator/elastic-cloud-eck
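With the operator, storage is declared in much the same way, only inside an Elasticsearch custom resource instead of a hand-written StatefulSet. A minimal sketch, assuming ECK is already installed in the cluster; the resource name, Elasticsearch version, node count, size, and storage class are placeholders:

```yaml
# Sketch only: assumes the ECK operator and its CRDs are installed.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es-cluster                # hypothetical name
spec:
  version: 8.13.0                 # assumed Elasticsearch version
  nodeSets:
  - name: default
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data  # name ECK uses for the data volume
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: do-block-storage
        resources:
          requests:
            storage: 100Gi
```

The operator then manages the underlying StatefulSets and claims for you.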