
Java - Logging best practices in multi-node environment


Here is one of those complicated answers :) The ELK stack is definitely something that can make your life much easier in distributed environments. However, in order to benefit from it, consider the following practices:

  • In every log message that reaches Elasticsearch you should see the following (besides the obvious time, level, and the message itself):

    • The server that produced the log message
    • The user that originated the request
    • If your application is multi-tenant: the tenant under which the request has been processed
  • All messages should have the same structure (layout)

  • Since you use Java, exceptions can become a potential issue (they're multi-line), so they need special treatment. Logstash can deal with this, though.

  • If your flow can spread over different servers (you say you have 3 and potentially more), consider generating a special correlation id per request: some random value that can identify a flow (see the filter sketch below).

All this helps you apply filters and benefit from Elasticsearch even more.

Consider using TTLs for logs. You probably won't need to keep logs that are older than a week or two.

Now regarding HTTP requests. Logging just everything is usually a security issue, because you can't be sure that some sensitive information won't be logged, so you'll want to keep this protected (at least your security guy will want to :) ). Logging the URL, server, HTTP method and some user identifier (or tenant if needed) will probably be sufficient, but that's solely my opinion.
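Tying these points together, here is a minimal sketch (my own assumption of how it could look, not a prescribed solution) of a servlet filter that creates the correlation id and puts it, the server name and the user into SLF4J's MDC, and logs only the HTTP method and URL. The header name "X-Correlation-Id" and the MDC keys are made up for illustration.

```java
import java.io.IOException;
import java.util.UUID;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

// Hypothetical filter: every log line written while the request is processed
// will carry correlationId, server and user as MDC fields (a tenant, if you
// have one, can be added the same way).
public class RequestLoggingFilter implements Filter {

    private static final Logger log = LoggerFactory.getLogger(RequestLoggingFilter.class);

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest http = (HttpServletRequest) request;

        // Reuse the correlation id created by an upstream node, or start a new flow here.
        String correlationId = http.getHeader("X-Correlation-Id");
        if (correlationId == null) {
            correlationId = UUID.randomUUID().toString();
        }

        MDC.put("correlationId", correlationId);
        MDC.put("server", request.getLocalName());
        MDC.put("user", http.getRemoteUser() != null ? http.getRemoteUser() : "anonymous");
        try {
            // Log only method + URL; no headers or bodies, to avoid leaking sensitive data.
            log.info("{} {}", http.getMethod(), http.getRequestURI());
            chain.doFilter(request, response);
        } finally {
            MDC.clear(); // the thread goes back to the pool, so always clean up
        }
    }

    @Override
    public void init(FilterConfig filterConfig) { }

    @Override
    public void destroy() { }
}
```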

Now regarding the appender vs. Logstash (files) approach. Both approaches have pros and cons. For instance: with the Logstash approach you'll have to parse every single line of your log files. If your application produces many logs this can hurt performance, since parsing can be CPU-costly (especially if you use the grok filter in Logstash). On the other hand, appenders let you avoid parsing altogether (you already have all the information in memory in Java).

On the other hand, appenders have to be set up carefully. I don't have experience with a logback Elasticsearch appender, but I think it should be at least:

  • asynchronous (otherwise your business flow can get stuck)
  • able to cope with failures (you won't want to throw an exception just because ES is currently unavailable or something; the end user shouldn't feel this)
  • probably maintaining some queue / using a disruptor under the hood, because you can produce way more logs than your appender is able to ship to ES, and eventually log messages will be lost. For example, if you have a queue of size 1000 and more than 1000 messages are waiting in it, you can imagine what FIFO will do (see the sketch below).
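To make the last point concrete, here is a purely illustrative sketch (not a real logback appender) of the bounded-buffer behaviour: the application thread never blocks, and once the buffer is full new events are simply dropped.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustration of an async appender's bounded buffer. A background thread is
// expected to drain the queue and ship batches to Elasticsearch.
public class BoundedLogBuffer {

    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

    /** Called on the application thread; must never block the business flow. */
    public void append(String event) {
        // offer() returns false instead of blocking when the queue is full, so
        // under sustained overload the newest events are silently discarded.
        if (!queue.offer(event)) {
            // a real appender would at least increment a "discarded events" counter here
        }
    }

    /** Called by the background shipping thread. */
    public String take() throws InterruptedException {
        return queue.take();
    }
}
```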

Yet another thing to consider: let's imagine that for some reason there is an issue with one of the application servers, so you'll probably want to restart it (gracefully or not). If you use an in-memory appender, what will happen to those messages? Wouldn't you like to see them in Elasticsearch to analyze the failure post mortem? So, bottom line: the in-memory approach can't deal with restarts.

On the other hand, whatever is stored in a file will be happily processed by the Logstash process.

As for alternative approaches to appender vs. Logstash, you may consider using Apache Flume as a transport. If you go with the appender approach, you can use an embedded Flume agent and write a very good appender on top of it. Flume provides disk-based persistence, a transaction-like API and so forth.

Having said that, as far as I know many people just go with the Logstash approach.

One more thing, probably the last one that comes to mind:

  • You shouldn't really write directly to Elasticsearch. Instead, use some intermediate server (with Logstash it can be Redis or RabbitMQ). In the Flume approach you can just use yet another Flume process (with scale-out support out of the box).

This will let you abstract Elasticsearch away architecturally and apply additional processing on the Logstash server (it can pull data from Redis or receive messages from RabbitMQ). Similar behaviour is achievable with Flume as well.

Hope this helps.


Is there a best practice […] format [/protocol]?

I am not aware of any logging standard that already has the fields that you want, so you'll need a format that lets you store custom metadata. You can add metadata to syslog messages using the RFC 5424 format. I have also seen various log services accept JSON-formatted messages over a socket connection.
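As an example of the JSON route: the logstash-logback-encoder library (also mentioned in another answer here) lets you attach custom metadata as structured JSON fields. This sketch assumes that library is on the classpath and that a JSON encoder (or the Logstash TCP appender) is configured in logback.xml; the field and class names are made up for illustration.

```java
import net.logstash.logback.argument.StructuredArguments;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderService {

    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    void shipOrder(String tenant, String userId, String orderId) {
        // Each keyValue() pair becomes its own JSON field in the emitted event,
        // e.g. {"message":"order shipped","tenant":"acme","userId":"u42","orderId":"o7",...}
        log.info("order shipped",
                StructuredArguments.keyValue("tenant", tenant),
                StructuredArguments.keyValue("userId", userId),
                StructuredArguments.keyValue("orderId", orderId));
    }
}
```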

should I use an elasticsearch appender?

I recommend sending logs to Logstash rather than sending them directly to Elasticsearch.

  1. Logstash is designed to receive & parse messages in a variety of formats, so it will be easier to send the message in a format logstash understands than in a format ElasticSearch understands.
  2. As your logging requirements evolve, you will be able to make the change in one place (Logstash) instead of reconfiguring every application instance.

    • This includes operational changes, such as changing the address of your ElasticSearch cluster.
  3. Logstash can do things such as censor logs (remove things that look like passwords or addresses)

  4. Logstash can send logs to a variety of downstream services. For example: it can trigger PagerDuty notifications or Slack messages if you encounter an important error.
  5. Logstash can enrich log messages with additional metadata (e.g. decipher geo-coordinates from IP addresses)

There are likely scale concerns as well. I am not knowledgeable enough to comment on these, but here's my gut feeling: I expect that Logstash is designed to handle a large number of connections well (and to gracefully handle connection failures). I don't know whether this is a similar priority in the design of an ElasticSearch cluster, or whether ElasticSearch's search performance would be impacted by having a large number of agents connected to it at once. I am more confident that Logstash is designed with this kind of use in mind.

You may also find that there are limitations of the ElasticSearch appender. The appender needs to have good support for a number of things. The first things that come to mind are:

  • choice of protocol, encryption
  • choice of compression
  • full control over the format of the log message (including custom fields)
  • control over how special messages such as exceptions are sent

You can avoid the limitations of a technology-specific appender by sticking to a well-supported standard (e.g. the syslog appender).

Are there valid alternatives to logback and elasticsearch technologies to fulfill my requirement?

Do you mean Logstash (i.e. "is there an alternative to the ELK stack?")? If that's your intention, then I don't have an answer.

But in terms of alternatives to logback… I use log4j2. It provides async logging, to reduce the performance burden on your application. Maybe logback has this feature too. Sending custom fields in log4j2 log messages is hard (currently there is poor support for escaping JSON; plugins are available, but your build needs to be set up correctly to support them). The easiest route for me was to use the RFC 5424 syslog appender.

Consider designing your Java application to invoke a logging facade (i.e. SLF4J), rather than directly invoking logback. This enables you to trivially switch to a different logging provider in future.
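A minimal sketch of what that looks like: the class below compiles against slf4j-api only, so the concrete provider (logback-classic, a log4j2 bridge, ...) is chosen by whatever binding is on the runtime classpath, and swapping it is a build change rather than a code change. The class and method names are placeholders.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class InvoiceService {

    // Only the SLF4J facade is referenced; no logback classes appear in application code.
    private static final Logger log = LoggerFactory.getLogger(InvoiceService.class);

    public void send(String invoiceId) {
        // Parameterized messages are only formatted if the level is enabled.
        log.debug("sending invoice {}", invoiceId);
        log.info("invoice {} sent", invoiceId);
    }
}
```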


I have the same problem as you and decided to avoid any intermediate log gatherer (like Logstash/Flume).

https://github.com/internetitem/logback-elasticsearch-appender is not ideal at present, but its configuration is more flexible than that of https://github.com/logstash/logstash-logback-encoder

For example, logstash-logback-encoder fixes the names of the standard fields of https://logback.qos.ch/apidocs/ch/qos/logback/classic/spi/ILoggingEvent.html

logback-elasticsearch-appender currently lacks persistence to local FS storage when its ring buffer is full, and it cannot iterate over available ES servers (only one can be specified).

Please note that Logstash is not fail-safe by default. From https://www.elastic.co/guide/en/logstash/current/persistent-queues.html:

By default, Logstash uses in-memory bounded queues between pipeline stages (inputs → pipeline workers) to buffer events. The size of these in-memory queues is fixed and not configurable

So you need to invent some scheme with Redis, RabbitMQ, or Kafka. In my view an ES cluster is much safer than Logstash (ES resilience is part of Elastic's advertising).

Also note that Logstash is implemented in Ruby and is a single-threaded app! We can't talk about scalability here. Expect up to 10,000 req/s (a typical number from the performance reports I found on the Internet).

Flume has better performance, but I found that it lacks documentation. Get ready to ask questions on the mailing lists ))

There are a lot of commercial offers:

  • Splunk <http://www.splunk.com/en_us/products/splunk-light.html>
  • Scalyr <https://www.scalyr.com/pricing>
  • Graylog <https://www.graylog.org/support-packages/>
  • Loggly <https://www.loggly.com/product/>
  • Motadata <https://www.motadata.com/elk-stack-alternative/>

They cost thousands of dollars per year, for good reasons.

You can see how hard it is to design a good appender in this post from one of the log-gathering vendors: https://logz.io/blog/lessons-learned-writing-new-logback-appender/

With a centralized logging solution you should change the way you log:

  • Add context to the MDC (https://www.slf4j.org/api/org/slf4j/MDC.html). That can be a client phone number, IP address, ticket number or whatever else you have. You need a way to quickly filter the important data (see the sketch after this list).

  • Start using Markers (https://www.slf4j.org/api/org/slf4j/Marker.html) for unexpected incidents that require immediate reaction. Don't hide or ignore problems!

  • Plan how to name MDC params and Markers, and document them so that the operations team knows what happened without calling you at midnight.

  • Set up replication in the ES cluster. That allows you to shut down some ES nodes for maintenance.
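As a rough illustration of the MDC and Marker points above (the MDC keys and the marker name are assumptions; they should come from your own documented naming plan):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import org.slf4j.Marker;
import org.slf4j.MarkerFactory;

public class TicketService {

    private static final Logger log = LoggerFactory.getLogger(TicketService.class);

    // A marker the centralized logging pipeline can alert on (e.g. page the on-call engineer).
    private static final Marker OPS_ALERT = MarkerFactory.getMarker("OPS_ALERT");

    public void handle(String ticketNumber, String clientPhone) {
        // Context that lets the operations team filter all lines belonging to this ticket.
        MDC.put("ticketNumber", ticketNumber);
        MDC.put("clientPhone", clientPhone);
        try {
            log.info("processing ticket"); // MDC fields travel with every statement below
            // ... business logic ...
        } catch (RuntimeException e) {
            // Unexpected incident that needs immediate reaction: mark it, don't hide it.
            log.error(OPS_ALERT, "ticket processing failed", e);
            throw e;
        } finally {
            MDC.remove("ticketNumber");
            MDC.remove("clientPhone");
        }
    }
}
```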