Parse nginx ingress logs in fluentd
Pipelines are quite different in Logstash and Fluentd, and it took me some time to build a working Kubernetes -> Fluentd -> Elasticsearch -> Kibana solution.
The short answer to my question is to install the fluent-plugin-parser plugin (I wonder why it doesn't ship with the standard package) and put this rule after the kubernetes_metadata filter:
<filter kubernetes.var.log.containers.nginx-ingress-controller-**.log>
  type parser
  format /^(?<host>[^ ]*) (?<domain>[^ ]*) \[(?<x_forwarded_for>[^\]]*)\] (?<server_port>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+[^\"])(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")? (?<request_length>[^ ]*) (?<request_time>[^ ]*) (?:\[(?<proxy_upstream_name>[^\]]*)\] )?(?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*)$/
  time_format %d/%b/%Y:%H:%M:%S %z
  key_name log
  types server_port:integer,code:integer,size:integer,request_length:integer,request_time:float,upstream_response_length:integer,upstream_response_time:float,upstream_status:integer
  reserve_data yes
</filter>
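As far as I know, the parser filter has been part of Fluentd core since v0.14, so on newer versions the separate plugin install should not be needed. Below is a rough sketch of the same rule in the newer `<parse>` section syntax. It assumes Fluentd v1 and the same log layout as above, and it has not been tested against this cluster:

```
<filter kubernetes.var.log.containers.nginx-ingress-controller-**.log>
  @type parser
  # Field of the record that holds the raw nginx line
  key_name log
  # Keep the original fields (kubernetes metadata etc.) next to the parsed ones
  reserve_data true
  <parse>
    @type regexp
    # Same expression as in the rule above, moved into the parse section
    expression /^(?<host>[^ ]*) (?<domain>[^ ]*) \[(?<x_forwarded_for>[^\]]*)\] (?<server_port>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+[^\"])(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")? (?<request_length>[^ ]*) (?<request_time>[^ ]*) (?:\[(?<proxy_upstream_name>[^\]]*)\] )?(?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*)$/
    time_format %d/%b/%Y:%H:%M:%S %z
    types server_port:integer,code:integer,size:integer,request_length:integer,request_time:float,upstream_response_length:integer,upstream_response_time:float,upstream_status:integer
  </parse>
</filter>
```

The main differences are `@type` instead of `type`, `expression` instead of `format /.../`, and `time_format` and `types` living inside `<parse>`.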
The long answer, with lots of examples, is here: https://github.com/kayrus/elk-kubernetes/
<match fluent.**>
  @type null
</match>

<source>
  @type tail
  path /var/log/containers/nginx*.log
  pos_file /data/fluentd/pos/fluentd-nginxlog1.log.pos
  tag nginxlogs
  format none
  read_from_head true
</source>

<filter nginxlogs>
  @type parser
  format json
  key_name message
</filter>

<filter nginxlogs>
  @type parser
  format /^(?<host>[^ ]*) (?<domain>[^ ]*) \[(?<x_forwarded_for>[^\]]*)\] (?<server_port>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+[^\"])(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)" (?<request_length>[^ ]*) (?<request_time>[^ ]*) (?:\[(?<proxy_upstream_name>[^\]]*)\] )?(?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) \w*$/
  time_format %d/%b/%Y:%H:%M:%S %z
  key_name log
  # types server_port:integer,code:integer,size:integer,request_length:integer,request_time:float,upstream_response_length:integer,upstream_response_time:float,upstream_status:integer
</filter>

<match nginxlogs>
  @type stdout
</match>
You can use the multi-format-parser plugin: https://github.com/repeatedly/fluent-plugin-multi-format-parser
<match>
  format multi_format
  <pattern>
    format json
  </pattern>
  <pattern>
    format regexp...
    time_key timestamp
  </pattern>
  <pattern>
    format none
  </pattern>
</match>
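For the filter parser case, something along these lines might work. This is only a sketch: it assumes Fluentd v1 `<parse>` syntax and the `multi_format` parser from the plugin linked above. The `nginxlogs` tag and the `log` key are borrowed from the earlier answer as placeholders, and the regexp pattern is left out on purpose because it depends on your log format:

```
<filter nginxlogs>
  @type parser
  # Placeholder: the record field that holds the raw line
  key_name log
  reserve_data true
  <parse>
    @type multi_format
    # Patterns are tried in order; the first one that parses wins
    <pattern>
      format json
    </pattern>
    # Fallback: keep the line unparsed instead of dropping it
    <pattern>
      format none
    </pattern>
  </parse>
</filter>
```

Putting `format none` last means lines that match nothing else still pass through as plain text.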
Note: I'm curious what the final conf looked like, including the filter parser.