How to configure long-term retention of logs for an EFK stack using S3?
If you haven't already installed the EFK stack, you can do so like this:
helm repo add cryptexlabs https://helm.cryptexlabs.com
helm install my-efk-stack cryptexlabs/efk
Or add to your Chart.yaml:
dependencies:
  - name: efk
    version: 7.8.0
    repository: https://helm.cryptexlabs.com
    condition: efk.enabled
Next, create a ConfigMap, which will also contain your AWS secrets:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-extra-config
data:
  s3.conf: |-
    <match **>
      @type copy
      copy_mode deep
      <store>
        @type s3
        aws_key_id xxx
        aws_sec_key xxx
        s3_bucket "#{ENV['AWS_S3_BUCKET']}"
        s3_region "#{ENV['AWS_REGION']}"
        path "#{ENV['S3_LOGS_BUCKET_PREFIX']}"
        buffer_path /var/log/fluent/s3
        s3_object_key_format %{path}%{time_slice}/cluster-log-%{index}.%{file_extension}
        time_slice_format %Y%m%d-%H
        time_slice_wait 10m
        flush_interval 60s
        buffer_chunk_limit 256m
      </store>
    </match>
Optionally, create a Secret with your AWS access key ID and secret access key; see below for more info. Don't forget that the data values in Opaque secrets must be base64-encoded (or let kubectl encode them for you, as shown after the manifest):
apiVersion: v1
kind: Secret
metadata:
  name: s3-log-archive-secret
type: Opaque
data:
  AWS_ACCESS_KEY_ID: xxx
  AWS_SECRET_ACCESS_KEY: xxx
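Alternatively, you can skip the manual base64 step entirely. A quick sketch (the xxx values are placeholders for your real credentials):

# kubectl base64-encodes --from-literal values automatically
kubectl create secret generic s3-log-archive-secret \
  --from-literal=AWS_ACCESS_KEY_ID=xxx \
  --from-literal=AWS_SECRET_ACCESS_KEY=xxx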
If you're wondering why I didn't use environment variables for the AWS access key and ID, it's because that doesn't work: https://github.com/fluent/fluent-plugin-s3/issues/340. If you're using kube2iam or kiam then this wouldn't matter. See the documentation for the fluentd S3 plugin to configure it to assume a role instead of using credentials.
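For reference, the role-based setup replaces the aws_key_id and aws_sec_key lines in the s3.conf above with an assume_role_credentials section. A minimal sketch, with a placeholder role ARN:

<store>
  @type s3
  # replaces aws_key_id / aws_sec_key; the ARN below is a placeholder
  <assume_role_credentials>
    role_arn arn:aws:iam::123456789012:role/log-archiver
    role_session_name fluentd-s3-archive
  </assume_role_credentials>
  s3_bucket "#{ENV['AWS_S3_BUCKET']}"
  s3_region "#{ENV['AWS_REGION']}"
</store>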
These values will allow you to run the S3 plugin with the ConfigMap (the deploy command follows the values block). Some important things to note:
- I use antiAffinity of "soft" because I run a single-instance bare-metal cluster.
- S3_LOGS_BUCKET_PREFIX is empty because I use a separate bucket for each environment, but you could share a bucket across environments and set the prefix to the environment name.
- You need a Docker image that extends the fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch image and has the S3 plugin installed on it.
- If you skipped the step to create a secret for the access key and ID, you can remove the envFrom section that imports the secret as environment variables.
efk:
  enabled: true
  elasticsearch:
    antiAffinity: "soft"
  fluentd:
    env:
      - name: FLUENT_ELASTICSEARCH_HOST
        value: "elasticsearch-master"
      - name: FLUENT_ELASTICSEARCH_PORT
        value: "9200"
      - name: AWS_REGION
        value: us-east-1
      - name: AWS_S3_BUCKET
        value: your_bucket_name_goes_here
      - name: S3_LOGS_BUCKET_PREFIX
        value: ""
    envFrom:
      - secretRef:
          name: s3-log-archive-secret
    extraVolumeMounts:
      - name: extra-config
        mountPath: /fluentd/etc/conf.d
    extraVolumes:
      - name: extra-config
        configMap:
          name: fluentd-extra-config
          items:
            - key: s3.conf
              path: s3.conf
    image:
      repository: docker.io/cryptexlabs/fluentd
      tag: k8s-daemonset-elasticsearch-s3
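With these values in place, deploy or upgrade the release. A minimal sketch, assuming the release name my-efk-stack from the install step and that the values above live in a values.yaml file:

helm upgrade --install my-efk-stack cryptexlabs/efk -f values.yaml

Note that the top-level efk: key applies when the chart is pulled in as a dependency of your own chart; if you installed cryptexlabs/efk directly, move the nested values up to the top level of values.yaml.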
If you want to make your own Docker image, you can do so like this:
FROM fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
RUN fluent-gem install \
    fluent-plugin-s3
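Then build and push the image somewhere your cluster can pull from. A quick sketch; the registry below is a placeholder, and whatever you choose has to match the image.repository and image.tag values above:

docker build -t docker.io/your-org/fluentd:k8s-daemonset-elasticsearch-s3 .
docker push docker.io/your-org/fluentd:k8s-daemonset-elasticsearch-s3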
Next, you'll probably want to set a retention policy for the S3 data: either delete it after a certain period of time or move it to Glacier, depending on your requirements.
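This is handled by an S3 lifecycle rule on the bucket itself rather than anything in the cluster. A minimal sketch using the AWS CLI, assuming you want logs transitioned to Glacier after 30 days and deleted after a year (the rule ID, prefix, and day counts are illustrative; adjust to your requirements):

aws s3api put-bucket-lifecycle-configuration \
  --bucket your_bucket_name_goes_here \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-then-expire-cluster-logs",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
      "Expiration": {"Days": 365}
    }]
  }'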
Finally, since we have longer-term retention of our logs in S3, we can safely set a shorter retention period, such as 30 days, for the data that is sent to Elasticsearch, using Elasticsearch Curator.
You can install Curator like so:
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm install curator stable/elasticsearch-curator
Or add to your Chart.yaml:
dependencies:
  - name: elasticsearch-curator
    version: 2.1.5
    repository: https://kubernetes-charts.storage.googleapis.com
values.yaml:
elasticsearch-curator:
  configMaps:
    action_file_yml: |-
      actions:
        1: &delete
          action: delete_indices
          description: "Delete selected indices"
          options:
            ignore_empty_list: True
            continue_if_exception: True
            timeout_override: 300
          filters:
            - filtertype: pattern
              kind: prefix
              value: 'logstash-'
            - filtertype: age
              source: name
              direction: older
              timestring: '%Y-%m-%d'
              unit: days
              unit_count: 30
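The chart runs Curator as a Kubernetes CronJob, so once it's deployed you can sanity-check it. A sketch, assuming the release name curator from the install step; the generated CronJob name follows the usual release-chart naming convention, so confirm it with the first command before running the second:

kubectl get cronjobs
# trigger a one-off run instead of waiting for the schedule; the job name is arbitrary
kubectl create job curator-test-run --from=cronjob/curator-elasticsearch-curator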