
How to configure long term retention of logs for EFK stack using S3?


If you haven't already installed the EFK stack, you can do so like this:

helm repo add cryptexlabs https://helm.cryptexlabs.com
helm install my-efk-stack cryptexlabs/efk

Or add it to your Chart.yaml dependencies:

  - name: efk
    version: 7.8.0
    repository: https://helm.cryptexlabs.com
    condition: efk.enabled
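
If you went the Chart.yaml route, remember to pull the dependency down before deploying. A minimal sketch, assuming your umbrella chart lives in the current directory:

helm dependency update .
helm upgrade --install my-efk-stack .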

Next, create a ConfigMap, which will also contain your AWS secrets:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-extra-config
data:
  s3.conf: |-
    <match **>
      @type copy
      copy_mode deep
      <store>
        @type s3
        aws_key_id xxx
        aws_sec_key xxx
        s3_bucket "#{ENV['AWS_S3_BUCKET']}"
        s3_region "#{ENV['AWS_REGION']}"
        path "#{ENV['S3_LOGS_BUCKET_PREFIX']}"
        buffer_path /var/log/fluent/s3
        s3_object_key_format %{path}%{time_slice}/cluster-log-%{index}.%{file_extension}
        time_slice_format %Y%m%d-%H
        time_slice_wait 10m
        flush_interval 60s
        buffer_chunk_limit 256m
      </store>
    </match>

Optionally, create a Secret with your AWS access key ID and secret access key (see below for more info). Don't forget that the values in an Opaque secret must be base64 encoded:

apiVersion: v1
kind: Secret
metadata:
  name: s3-log-archive-secret
type: Opaque
data:
  AWS_ACCESS_KEY_ID: xxx
  AWS_SECRET_ACCESS_KEY: xxx
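
The base64 encoding can be done on the command line; a minimal sketch, using the placeholder credentials from the AWS documentation:

echo -n 'AKIAIOSFODNN7EXAMPLE' | base64
echo -n 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY' | base64

Alternatively, kubectl create secret generic s3-log-archive-secret --from-literal=AWS_ACCESS_KEY_ID=... --from-literal=AWS_SECRET_ACCESS_KEY=... handles the encoding for you.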

If you're wondering why I didn't use an environment variable for the AWS access key and ID, it's because it doesn't work: https://github.com/fluent/fluent-plugin-s3/issues/340. If you're using kube2iam or kiam then this wouldn't matter. See the documentation for the fluentd s3 plugin to configure it to assume a role instead of using credentials, as sketched below.
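
For reference, the fluent-plugin-s3 documentation describes an <assume_role_credentials> section that replaces the static keys in the <store> block above; a minimal sketch, with a placeholder role ARN:

<store>
  @type s3
  # Assume an IAM role instead of embedding static credentials.
  # The ARN below is a placeholder -- substitute your own role.
  <assume_role_credentials>
    role_arn arn:aws:iam::123456789012:role/log-archiver
    role_session_name fluentd-s3-archive
  </assume_role_credentials>
  s3_bucket "#{ENV['AWS_S3_BUCKET']}"
  s3_region "#{ENV['AWS_REGION']}"
</store>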

These values will allow you to run the s3 plugin with the ConfigMap. Some important things to note:

  • I use antiAffinity of "soft" because I run a single-instance bare-metal cluster.
  • S3_LOGS_BUCKET_PREFIX is empty because I use a separate bucket for each environment, but you could share a bucket across environments and set the prefix to the environment name.
  • You need a Docker image that extends the fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch image and has the s3 plugin installed on it (a sample Dockerfile is shown below).
  • If you skipped the step of creating a secret for the access key and ID, you can remove the envFrom that imports the secret as environment variables.
efk:
  enabled: true
  elasticsearch:
    antiAffinity: "soft"
  fluentd:
    env:
      - name: FLUENT_ELASTICSEARCH_HOST
        value: "elasticsearch-master"
      - name: FLUENT_ELASTICSEARCH_PORT
        value: "9200"
      - name: AWS_REGION
        value: us-east-1
      - name: AWS_S3_BUCKET
        value: your_bucket_name_goes_here
      - name: S3_LOGS_BUCKET_PREFIX
        value: ""
    envFrom:
      - secretRef:
          name: s3-log-archive-secret
    extraVolumeMounts:
      - name: extra-config
        mountPath: /fluentd/etc/conf.d
    extraVolumes:
      - name: extra-config
        configMap:
          name: fluentd-extra-config
          items:
            - key: s3.conf
              path: s3.conf
    image:
      repository: docker.io/cryptexlabs/fluentd
      tag: k8s-daemonset-elasticsearch-s3
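
To apply these values, pass the file to helm on install or upgrade; a minimal sketch, assuming the block above is saved as values.yaml:

helm upgrade --install my-efk-stack cryptexlabs/efk -f values.yaml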

If you want to build your own Docker image, you can do so like this:

FROM fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
RUN fluent-gem install \
    fluent-plugin-s3
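
Then build the image, push it to a registry your cluster can pull from, and point image.repository and image.tag in the values above at it; a minimal sketch with a hypothetical image name:

docker build -t docker.io/your-org/fluentd:k8s-daemonset-elasticsearch-s3 .
docker push docker.io/your-org/fluentd:k8s-daemonset-elasticsearch-s3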

The next thing you probably want is a retention period for the S3 data: either delete it after a certain period of time or transition it to Glacier, depending on your requirements. Both are handled by an S3 lifecycle rule on the bucket, as sketched below.
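
A minimal sketch of such a lifecycle rule, applied with the AWS CLI; the bucket name, day counts, and rule ID are placeholders for your own requirements. Save the following as lifecycle.json:

{
  "Rules": [
    {
      "ID": "archive-then-expire-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}

Then apply it to the bucket:

aws s3api put-bucket-lifecycle-configuration \
  --bucket your_bucket_name_goes_here \
  --lifecycle-configuration file://lifecycle.json

This example moves objects to Glacier after 90 days and deletes them after a year.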

Finally, since we have longer-term retention of our logs in S3, we can safely set a shorter retention period, such as 30 days, for the data sent to Elasticsearch, using Elasticsearch Curator.

You can install Curator like so:

helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm install curator stable/elasticsearch-curator

Or add it to your Chart.yaml dependencies:

  - name: elasticsearch-curator
    version: 2.1.5
    repository: https://kubernetes-charts.storage.googleapis.com

values.yaml:

elasticsearch-curator:
  configMaps:
    action_file_yml: |-
      ---
      actions:
        1: &delete
          action: delete_indices
          description: "Delete selected indices"
          options:
            ignore_empty_list: True
            continue_if_exception: True
            timeout_override: 300
          filters:
          - filtertype: pattern
            kind: prefix
            value: 'logstash-'
          - filtertype: age
            source: name
            direction: older
            # Matches fluentd's default logstash_dateformat (%Y.%m.%d)
            timestring: '%Y.%m.%d'
            unit: days
            unit_count: 30
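
Once deployed, the chart runs Curator as a Kubernetes CronJob, so you can sanity-check it by listing the CronJob and triggering a one-off run; a minimal sketch, assuming the release above names the CronJob curator-elasticsearch-curator:

kubectl get cronjobs
kubectl create job --from=cronjob/curator-elasticsearch-curator curator-manual-run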