Logstash: split an event from an XML file into multiple documents, keeping information from the root tags
Try this filter:
    filter {
      xml {
        source => "message"
        target => "xml_content"
      }
      split {
        field => "xml_content[EVENTLIST]"
      }
      split {
        field => "xml_content[EVENTLIST][EVENT]"
      }
      mutate {
        add_field => { "number" => "%{xml_content[number]}" }
        add_field => { "name" => "%{xml_content[EVENTLIST][EVENT][name]}" }
        remove_field => ['xml_content', 'message', 'path']
      }
    }

    output {
      stdout {
        codec => rubydebug
      }
    }
It returns these events:
    {
            "number" => "34",
        "@timestamp" => 2016-12-23T12:01:17.888Z,
          "@version" => "1",
              "host" => "xubuntu",
              "name" => "hey"
    }
    {
            "number" => "34",
        "@timestamp" => 2016-12-23T12:01:17.888Z,
          "@version" => "1",
              "host" => "xubuntu",
              "name" => "you"
    }
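To make the double split easier to follow, here is a rough plain-Ruby sketch of the idea, not the actual split filter code. It assumes the xml filter produced a nested structure in which `EVENTLIST` and `EVENT` are arrays; the sample data is hypothetical, inferred from the output above, and `split_field` and `deep_copy` are helper names made up for this illustration.

```ruby
# Sketch only: events are plain hashes, not LogStash::Event objects.

# Deep-copies a nested hash/array structure.
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

# Mimics the idea behind the split filter: emit one copy of the event per
# array element, with the array at `path` replaced by that single element.
def split_field(event, path)
  *parents, last = path
  container = parents.empty? ? event : event.dig(*parents)
  values = container[last]
  return [event] unless values.is_a?(Array)

  values.each_index.map do |i|
    copy = deep_copy(event)
    target = parents.empty? ? copy : copy.dig(*parents)
    target[last] = target[last][i]
    copy
  end
end

# Hypothetical parsed XML (assumed shape, not taken from the question).
event = {
  "xml_content" => {
    "number" => "34",
    "EVENTLIST" => [
      { "EVENT" => [{ "name" => "hey" }, { "name" => "you" }] }
    ]
  }
}

# First split unwraps EVENTLIST, second split yields one event per EVENT.
events = split_field(event, ["xml_content", "EVENTLIST"])
events = events.flat_map do |e|
  split_field(e, ["xml_content", "EVENTLIST", "EVENT"])
end

# Equivalent of the mutate step: copy the root and child values to the top level.
flattened = events.map do |e|
  {
    "number" => e["xml_content"]["number"],
    "name"   => e["xml_content"]["EVENTLIST"]["EVENT"]["name"]
  }
end

flattened.each { |e| puts e.inspect }
```

Because every copy carries the whole `xml_content` tree, the root attribute `number` is still available on each event after both splits, which is why the mutate can read it.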
If your structure is as simple as you show, you can use a memorize
plugin that I wrote.
Your configuration would look something like this:
    filter {
      if ([message] =~ /<ROOT/) {
        grok {
          match => [ "message", 'number="(?<number>\d+)" number2="(?<number1>\d+)"' ]
        }
      } else if ([message] =~ /<EVENT /) {
        grok {
          match => [ "message", 'name="(?<name>[^"]+)"' ]
        }
      }
      memorize {
        fields => ["number","number1"]
      }
      if ([message] !~ /<EVENT /) {
        drop {}
      } else {
        mutate {
          remove_field => ["message"]
        }
      }
    }
My example captures several attributes from the ROOT element, based on your comments below. Here is the version of the plugin that supports memorizing multiple fields:
    # encoding: utf-8
    require "logstash/filters/base"
    require "logstash/namespace"
    require "set"
    #
    # This filter will look for fields from an event and record the last value
    # of them. If any are not present, their last value will be added to the
    # event
    #
    # The config looks like this:
    #
    #     filter {
    #       memorize {
    #         fields => ["time"]
    #         default => { "time" => "00:00:00.000" }
    #       }
    #     }
    #
    # The `fields` is an array of the field NAMES that you want to memorize
    # The `default` is a map of field names to field values that you want
    # to use if the field isn't present and has no memorized value (optional)
    class LogStash::Filters::Memorize < LogStash::Filters::Base

      config_name "memorize"
      milestone 2

      # An array of the field names to memorize
      config :fields, :validate => :array, :required => true
      # A map of default values to use if a field has not been seen before we need it
      config :default, :validate => :hash, :required => false

      # The stream identity is how the filter determines which stream an
      # event belongs to. See the multiline plugin if you want more details on how
      # this might work
      config :stream_identity, :validate => :string, :default => "%{host}.%{path}.%{type}"

      public
      def initialize(config = {})
        super

        @threadsafe = false

        # This filter needs to keep state.
        @memorized = Hash.new
      end # def initialize

      public
      def register
        # nothing needed
      end # def register

      public
      def filter(event)
        return unless filter?(event)

        any = false
        @fields.each do |field|
          if event[field].nil?
            map = @memorized[@stream_identity]
            val = map.nil? ? nil : map[field]
            if val.nil?
              val = @default.nil? ? nil : @default[field]
            end
            if !val.nil?
              event[field] = val
              any = true
            end
          else
            map = @memorized[@stream_identity]
            if map.nil?
              map = @memorized[@stream_identity] = Hash.new
            end
            val = event[field]
            map[field] = event[field]
          end #if
          if any
            filter_matched(event)
          end
        end #field.each
      end
    end
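To see what the memorize step does to a stream of lines, here is a simplified standalone sketch that runs outside Logstash. It is an approximation under stated assumptions: events are plain hashes, stream identity is ignored, the `Memorizer` class name is made up for this illustration, and the three sample lines are hypothetical input shaped to match the grok patterns in the configuration above.

```ruby
# Simplified stand-in for the memorize filter (assumption: single stream,
# plain hashes instead of LogStash::Event objects).
class Memorizer
  def initialize(fields, default = {})
    @fields = fields
    @default = default
    @memorized = {}
  end

  # Records fields that are present on the event; fills in missing ones
  # from the last memorized value, falling back to the default if given.
  def apply(event)
    @fields.each do |field|
      if event[field].nil?
        val = @memorized.fetch(field, @default[field])
        event[field] = val unless val.nil?
      else
        @memorized[field] = event[field]
      end
    end
    event
  end
end

# Hypothetical input, one XML element per line.
lines = [
  '<ROOT number="34" number2="7">',
  '<EVENT name="hey"/>',
  '<EVENT name="you"/>'
]

memo = Memorizer.new(["number", "number1"])
out = []

lines.each do |line|
  event = { "message" => line }

  # Stand-ins for the two grok filters.
  if line =~ /<ROOT/
    m = line.match(/number="(?<number>\d+)" number2="(?<number1>\d+)"/)
    event.merge!(m.named_captures) if m
  elsif line =~ /<EVENT /
    m = line.match(/name="(?<name>[^"]+)"/)
    event.merge!(m.named_captures) if m
  end

  memo.apply(event)

  # Stand-in for `drop {}` on non-EVENT lines and the mutate on EVENT lines.
  next unless line =~ /<EVENT /
  event.delete("message")
  out << event
end

out.each { |e| puts e.inspect }
```

The ROOT line itself is dropped, but its values were memorized before the drop, so each following EVENT line picks up `number` and `number1`. This only works if lines arrive in order, which is why the real filter is marked as not thread-safe.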
For Logstash 1.5 and later, this plugin is available for installation via:

    bin/plugin install logstash-filter-memorize