
Configuring Hadoop logging to avoid too many log files


I had this same problem. Set the environment variable "HADOOP_ROOT_LOGGER=WARN,console" before starting Hadoop.

export HADOOP_ROOT_LOGGER="WARN,console"
hadoop jar start.jar
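If you want the quieter logger to stick rather than exporting it in every shell, one option is to put the export in hadoop-env.sh; a minimal sketch, assuming the file lives at conf/hadoop-env.sh (or etc/hadoop/hadoop-env.sh on newer releases):

# Sketch: make the quieter root logger the default for Hadoop commands.
# File location is version-dependent (conf/hadoop-env.sh or etc/hadoop/hadoop-env.sh).
export HADOOP_ROOT_LOGGER="WARN,console"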


Unfortunately, there isn't a configurable way to prevent that. Every task for a job gets its own directory under history/userlogs, which holds that task's stdout, stderr, and syslog output files. The retain-hours setting (mapred.userlog.retain.hours) helps keep those from accumulating indefinitely, but you'd have to write a decent log rotation tool to auto-tar them.

We had this problem too when we were writing to an NFS mount, because all of the nodes shared the same history/userlogs directory. That meant a single job with 30,000 tasks was enough to overwhelm the filesystem. Logging locally is really the way to go once your cluster actually starts processing a lot of data.

If you are already logging locally and still manage to run 30,000+ tasks on one machine in less than a week, then you are probably creating too many small input files, which makes each job spawn far more mappers than it needs (by default, every file gets at least one mapper).
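As for the "auto-tar" remark above, a rough sketch of a cron-driven cleanup, assuming a hypothetical local userlogs path and a 7-day retention window:

#!/bin/sh
# Sketch only: archive task log directories older than 7 days, then remove the originals.
# USERLOGS and ARCHIVE are placeholder paths; adjust to your own log layout.
USERLOGS=/var/log/hadoop/userlogs
ARCHIVE=/var/log/hadoop/userlogs-archive
mkdir -p "$ARCHIVE"
find "$USERLOGS" -mindepth 1 -maxdepth 1 -type d -mtime +7 | while read dir; do
  name=$(basename "$dir")
  tar czf "$ARCHIVE/$name.tar.gz" -C "$USERLOGS" "$name" && rm -rf "$dir"
done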


Configuring Hadoop to use log4j and setting

log4j.appender.FILE_AP1.MaxFileSize=100MB
log4j.appender.FILE_AP1.MaxBackupIndex=10

as described on this wiki page doesn't work?
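For reference, a complete rolling-file appender stanza would look roughly like this (FILE_AP1 is just the appender name from the snippet above; the file path and layout are placeholders):

# Sketch of a rolling appender in log4j.properties; names and paths are illustrative.
log4j.rootLogger=INFO, FILE_AP1
log4j.appender.FILE_AP1=org.apache.log4j.RollingFileAppender
log4j.appender.FILE_AP1.File=${hadoop.log.dir}/hadoop.log
log4j.appender.FILE_AP1.MaxFileSize=100MB
log4j.appender.FILE_AP1.MaxBackupIndex=10
log4j.appender.FILE_AP1.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE_AP1.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n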

Looking at the LogLevel source code, it seems Hadoop uses Commons Logging, which will try log4j by default and fall back to the JDK logger if log4j is not on the classpath.

Btw, it's possible to change log levels at runtime; take a look at the commands manual.
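For example, using the daemonlog command described there (the host, port, and class name below are placeholders for your own TaskTracker):

# Check and change a running daemon's log level without restarting it.
hadoop daemonlog -getlevel tasktracker-host:50060 org.apache.hadoop.mapred.TaskTracker
hadoop daemonlog -setlevel tasktracker-host:50060 org.apache.hadoop.mapred.TaskTracker WARN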