
Filebeat Kubernetes Processor and filtering


The conditions need to be a list:

- drop_event.when.or:
    - regexp:
        kubernetes.pod.name: "weave-net.*"
    - regexp:
        kubernetes.pod.name: "external-dns.*"
    - regexp:
        kubernetes.pod.name: "nginx-ingress-controller.*"
    - regexp:
        kubernetes.pod.name: "filebeat.*"

I'm not sure if your order of parameters works. One of my working examples looks like this:

- drop_event:
    when:
      or:
        # Exclude traces from Zipkin
        - contains.path: "/api/v"
        # Exclude Jolokia calls
        - contains.path: "/jolokia/?"
        # Exclude pinging metrics
        - equals.path: "/metrics"
        # Exclude pinging health
        - equals.path: "/health"


This worked for me in Filebeat 6.1.3:

- drop_event.when:
    or:
    - equals:
        kubernetes.container.name: "filebeat"
    - equals:
        kubernetes.container.name: "prometheus-kube-state-metrics"
    - equals:
        kubernetes.container.name: "weave-npc"
    - equals:
        kubernetes.container.name: "nginx-ingress-controller"
    - equals:
        kubernetes.container.name: "weave"


I am using a different approach, which is less efficient in terms of the number of logs that transit through the logging pipeline.

Similarly to what you did, I deployed one instance of Filebeat on my nodes using a DaemonSet. Nothing special here; this is the configuration I am using:

apiVersion: v1
data:
  filebeat.yml: |-
    filebeat.config:
      prospectors:
        # Mounted `filebeat-prospectors` configmap:
        path: ${path.config}/prospectors.d/*.yml
        # Reload prospectors configs as they change:
        reload.enabled: false
      modules:
        path: ${path.config}/modules.d/*.yml
        # Reload module configs as they change:
        reload.enabled: false
    processors:
      - add_cloud_metadata:
    output.logstash:
      hosts: ['logstash.elk.svc.cluster.local:5044']
kind: ConfigMap
metadata:
  labels:
    k8s-app: filebeat
    kubernetes.io/cluster-service: "true"
  name: filebeat-config

And this one for the prospectors:

apiVersion: v1
data:
  kubernetes.yml: |-
    - type: log
      paths:
        - /var/lib/docker/containers/*/*.log
      json.message_key: log
      json.keys_under_root: true
      processors:
        - add_kubernetes_metadata:
            in_cluster: true
            namespace: ${POD_NAMESPACE}
kind: ConfigMap
metadata:
  labels:
    k8s-app: filebeat
    kubernetes.io/cluster-service: "true"
  name: filebeat-prospectors

The DaemonSet spec:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    k8s-app: filebeat
    kubernetes.io/cluster-service: "true"
  name: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: filebeat
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - args:
        - -c
        - /etc/filebeat.yml
        - -e
        command:
        - /usr/share/filebeat/filebeat
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: docker.elastic.co/beats/filebeat:6.0.1
        imagePullPolicy: IfNotPresent
        name: filebeat
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        securityContext:
          runAsUser: 0
        volumeMounts:
        - mountPath: /etc/filebeat.yml
          name: config
          readOnly: true
          subPath: filebeat.yml
        - mountPath: /usr/share/filebeat/prospectors.d
          name: prospectors
          readOnly: true
        - mountPath: /usr/share/filebeat/data
          name: data
        - mountPath: /var/lib/docker/containers
          name: varlibdockercontainers
          readOnly: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          name: filebeat-config
        name: config
      - hostPath:
          path: /var/lib/docker/containers
          type: ""
        name: varlibdockercontainers
      - configMap:
          defaultMode: 384
          name: filebeat-prospectors
        name: prospectors
      - emptyDir: {}
        name: data

Basically, all logs from all containers get forwarded to Logstash, reachable at the service endpoint logstash.elk.svc.cluster.local:5044 (a Service called "logstash" in the "elk" namespace).
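For completeness, here is a minimal sketch of what that Service could look like. Only the name "logstash", the "elk" namespace, and port 5044 come from the setup above; the selector label is an assumption and has to match whatever labels your Logstash pods actually carry:

apiVersion: v1
kind: Service
metadata:
  name: logstash
  namespace: elk
spec:
  selector:
    k8s-app: logstash   # assumption: must match your Logstash pod labels
  ports:
  - name: beats
    port: 5044
    targetPort: 5044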

For brevity, I'm only going to give you the Logstash configuration (if you need more specific help with Kubernetes, please ask in the comments):

The logstash.yml file is very basic:

http.host: "0.0.0.0"
path.config: /usr/share/logstash/pipeline

It just indicates the directory where I mounted the pipeline config files.
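As an illustration, the pipeline directory could be wired into the Logstash pod roughly like this; the ConfigMap name "logstash-pipeline" and the volume name are hypothetical, not taken from my actual manifests:

# Fragment of a Logstash pod spec (hypothetical names):
        volumeMounts:
        - mountPath: /usr/share/logstash/pipeline
          name: pipeline
          readOnly: true
      volumes:
      - configMap:
          name: logstash-pipeline   # holds 10-beats.conf, 49-filter-logs.conf, 99-output.conf
        name: pipeline

The pipeline config files themselves are the following: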

10-beats.conf: declares an input for Filebeat (port 5044 has to be exposed through a Service called "logstash"):

input {
  beats {
    port => 5044
    ssl => false
  }
}

49-filter-logs.conf: this filter basically drops logs coming from pods that don't have the "elk" label. For the pods that do have the "elk" label, it keeps the logs from the containers named in the pod's "elk" label. For instance, if a pod has two containers, called "nginx" and "python", putting a label "elk" with value "nginx" will keep only the logs coming from the nginx container and drop the python ones. The type of the log is set to the namespace the pod is running in. This might not be a good fit for everybody (you're going to have a single index in Elasticsearch for all logs belonging to a namespace), but it works for me because my logs are homogeneous.

filter {
    if ![kubernetes][labels][elk] {
        drop {}
    }
    if [kubernetes][labels][elk] {
        # check if kubernetes.labels.elk contains this container name
        mutate {
          split => { "kubernetes[labels][elk]" => "." }
        }
        if [kubernetes][container][name] not in [kubernetes][labels][elk] {
          drop {}
        }
        mutate {
          replace => { "@metadata[type]" => "%{kubernetes[namespace]}" }
          remove_field => [ "beat", "host", "kubernetes[labels][elk]", "kubernetes[labels][pod-template-hash]", "kubernetes[namespace]", "kubernetes[pod][name]", "offset", "prospector[type]", "source", "stream", "time" ]
          rename => { "kubernetes[container][name]" => "container" }
          rename => { "kubernetes[labels][app]" => "app" }
        }
    }
}
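To make the labeling concrete, here is a hypothetical pod for the nginx/python example above. Only the "elk" label convention comes from the filter; the pod name and images are illustrative. Note that the filter splits the label value on ".", so elk: "nginx.python" would keep both containers:

apiVersion: v1
kind: Pod
metadata:
  name: webapp            # hypothetical pod
  labels:
    elk: "nginx"          # keep only the nginx container's logs
spec:
  containers:
  - name: nginx
    image: nginx:1.13
  - name: python
    image: python:3.6     # logs from this container get dropped by the filter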

The rest of the configuration is about log parsing and is not relevant in this context. The only other important part is the output:

99-output.conf: sends data to Elasticsearch:

output {
  elasticsearch {
    hosts => ["http://elasticsearch.elk.svc.cluster.local:9200"]
    manage_template => false
    index => "%{[@metadata][type]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

Hope you got the point here.

PROs of this approach

  • Once Filebeat and Logstash are deployed, as long as you don't need to parse a new type of log, you don't need to update the Filebeat or Logstash configuration to get a new log into Kibana. You just need to add a label to the pod template.
  • All logs get dropped by default, unless you explicitly set the label.

CONs of this approach

  • ALL logs from ALL pods go through Filebeat and Logstash, and get dropped only in Logstash. This is a lot of work for Logstash and can be resource-consuming, depending on the number of pods in your cluster (a possible mitigation is sketched below).
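If that becomes a problem, the coarse filtering could be moved to the Filebeat side, in the spirit of the drop_event processors shown in the other answers. An untested sketch, extending the prospector's processors to drop events from pods without the "elk" label (the finer per-container filtering would stay in Logstash):

processors:
  - add_kubernetes_metadata:
      in_cluster: true
      namespace: ${POD_NAMESPACE}
  - drop_event:
      when:
        not:
          has_fields: ['kubernetes.labels.elk']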

I am sure there are better approaches to this problem, but I think this solution is quite handy, at least for my use case.