
Logstash, split event from an xml file in multiples documents keeping information from root tags


Try this filter:

filter {
  xml {
    source => "message"
    target => "xml_content"
  }
  split {
    field => "xml_content[EVENTLIST]"
  }
  split {
    field => "xml_content[EVENTLIST][EVENT]"
  }
  mutate {
    add_field => { "number" => "%{xml_content[number]}" }
    add_field => { "name" => "%{xml_content[EVENTLIST][EVENT][name]}" }
    remove_field => ['xml_content', 'message', 'path']
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

It returns these events:

{
        "number" => "34",
    "@timestamp" => 2016-12-23T12:01:17.888Z,
      "@version" => "1",
          "host" => "xubuntu",
          "name" => "hey"
}
{
        "number" => "34",
    "@timestamp" => 2016-12-23T12:01:17.888Z,
      "@version" => "1",
          "host" => "xubuntu",
          "name" => "you"
}
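To see why the two split filters keep the root attribute on every resulting event, here is a rough plain-Ruby sketch of the same steps on ordinary hashes. It is not Logstash code: the helper names (`deep_copy`, `split_event`, `flatten_events`) and the `SAMPLE_EVENT` structure are made up for illustration, and the sample assumes the xml filter wraps child elements in arrays, which is why two splits are needed.

```ruby
# Illustration only: mimics xml -> split -> split -> mutate with plain hashes.

# Assumed shape of the event after the xml filter (hypothetical sample data).
SAMPLE_EVENT = {
  "xml_content" => {
    "number" => "34",
    "EVENTLIST" => [
      { "EVENT" => [{ "name" => "hey" }, { "name" => "you" }] }
    ]
  }
}

# Deep copy so each clone can be changed independently.
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

# Mimics the split filter: one clone of the whole event per array element
# found at `path`, with the array replaced by that single element.
def split_event(event, path)
  values = event.dig(*path)
  return [event] unless values.is_a?(Array)

  values.map do |value|
    clone = deep_copy(event)
    parent = path.size > 1 ? clone.dig(*path[0...-1]) : clone
    parent[path.last] = value
    clone
  end
end

# Runs both splits, then copies the wanted values to top-level fields,
# as the mutate block does.
def flatten_events(event)
  events = split_event(event, %w[xml_content EVENTLIST])
  events = events.flat_map { |e| split_event(e, %w[xml_content EVENTLIST EVENT]) }
  events.map do |e|
    {
      "number" => e.dig("xml_content", "number"),
      "name"   => e.dig("xml_content", "EVENTLIST", "EVENT", "name")
    }
  end
end

flatten_events(SAMPLE_EVENT).each { |e| p e }
```

Because each clone is a copy of the whole event, the root-level number travels along with every EVENT, and the sketch yields one hash per EVENT element.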


If your structure is as simple as you show, you can use a memorize plugin that I wrote.

Your configuration would look something like this:

filter {
  if ([message] =~ /<ROOT/) {
    grok {
      match => [ "message",
        'number="(?<number>\d+)" number2="(?<number1>\d+)"'
      ]
    }
  } else if ([message] =~ /<EVENT /) {
    grok {
      match => [ "message", 'name="(?<name>[^"]+)"']
    }
  }
  memorize {
    fields => ["number","number1"]
  }
  if ([message] !~ /<EVENT /) {
    drop {}
  } else {
    mutate { remove_field => ["message"] }
  }
}

Based on your comments below, my example extracts multiple attributes from the ROOT element. And here's the version of the plugin that supports memorizing multiple fields:

# encoding: utf-8
require "logstash/filters/base"
require "logstash/namespace"
require "set"

#
# This filter will look for fields from an event and record the last value
# of them.  If any are not present, their last value will be added to the
# event
#
# The config looks like this:
#
#     filter {
#       memorize {
#         fields => ["time"]
#         default => { "time" => "00:00:00.000" }
#       }
#     }
#
# The `fields` is an array of the field NAMES that you want to memorize
# The `default` is a map of field names to field values that you want
# to use if the field isn't present and has no memorized value (optional)
class LogStash::Filters::Memorize < LogStash::Filters::Base
  config_name "memorize"
  milestone 2

  # An array of the field names to memorize
  config :fields, :validate => :array, :required => true

  # A map of default values to use if a field has not been seen before we need it
  config :default, :validate => :hash, :required => false

  # The stream identity is how the filter determines which stream an
  # event belongs to. See the multiline plugin if you want more details on how
  # this might work
  config :stream_identity, :validate => :string, :default => "%{host}.%{path}.%{type}"

  public
  def initialize(config = {})
    super
    @threadsafe = false
    # This filter needs to keep state.
    @memorized = Hash.new
  end # def initialize

  public
  def register
    # nothing needed
  end # def register

  public
  def filter(event)
    return unless filter?(event)

    any = false
    @fields.each do |field|
      if event[field].nil?
        # Field is missing: fall back to the memorized value, then the default.
        map = @memorized[@stream_identity]
        val = map.nil? ? nil : map[field]
        if val.nil?
          val = @default.nil? ? nil : @default[field]
        end
        if !val.nil?
          event[field] = val
          any = true
        end
      else
        # Field is present: remember its value for later events.
        map = @memorized[@stream_identity]
        if map.nil?
          map = @memorized[@stream_identity] = Hash.new
        end
        map[field] = event[field]
      end # if
    end # fields.each

    # Signal a match once per event, after all fields have been checked.
    filter_matched(event) if any
  end # def filter
end

For Logstash 1.5 and later, this plugin can be installed with the command below (newer releases rename the script to bin/logstash-plugin):

bin/plugin install logstash-filter-memorize
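To check the carry-forward logic of the memorize approach without running Logstash, the grok, memorize and drop steps can be approximated in plain Ruby. This is only a sketch: `SAMPLE_LINES` is invented input that assumes one XML element per line (the layout the conditionals in the config rely on), and `process_lines` is a hypothetical helper, not part of the plugin.

```ruby
# Illustration only: approximates the grok + memorize + drop pipeline.

# Hypothetical input, one XML element per line.
SAMPLE_LINES = [
  '<ROOT number="34" number2="7">',
  '<EVENTLIST>',
  '<EVENT name="hey"/>',
  '<EVENT name="you"/>',
  '</EVENTLIST>',
  '</ROOT>'
].freeze

def process_lines(lines, fields = %w[number number1])
  memorized = {} # last seen value per field, shared across lines

  lines.each_with_object([]) do |line, out|
    event = { "message" => line }

    # grok step: pull attributes out of ROOT or EVENT lines.
    if line =~ /<ROOT/
      m = line.match(/number="(?<number>\d+)" number2="(?<number1>\d+)"/)
      m.names.each { |name| event[name] = m[name] } if m
    elsif line =~ /<EVENT /
      m = line.match(/name="(?<name>[^"]+)"/)
      event["name"] = m[:name] if m
    end

    # memorize step: remember present fields, fill in missing ones.
    fields.each do |field|
      if event[field].nil?
        event[field] = memorized[field] unless memorized[field].nil?
      else
        memorized[field] = event[field]
      end
    end

    # drop step: keep only EVENT lines, without the raw message.
    next unless line =~ /<EVENT /
    event.delete("message")
    out << event
  end
end

process_lines(SAMPLE_LINES).each { |e| p e }
```

Only the two EVENT lines survive, and each carries the number and number1 values remembered from the ROOT line. Like the plugin, the sketch depends on lines arriving in order, so it only makes sense with a single worker processing the stream.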