Hadoop MapReduce log4j - log messages to a custom file in userlogs/job_ dir?


You can configure log4j directly in your code, for example by calling PropertyConfigurator.configure(properties) in the mapper/reducer setup method.

Here is an example with the properties stored on HDFS:

        // Load the log4j properties file from HDFS and apply it to this task JVM
        InputStream is = fs.open(log4jPropertiesPath);
        Properties properties = new Properties();
        properties.load(is);
        is.close();
        PropertyConfigurator.configure(properties);

where fs is a FileSystem object and log4jPropertiesPath is a path on HDFS.
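Just as an illustration, here is a minimal sketch of how this could be wired into a mapper's setup method (the HDFS path /conf/log4j-job.properties and the class name CustomLoggingMapper are assumptions, not anything prescribed by Hadoop):

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Properties;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PropertyConfigurator;

    public class CustomLoggingMapper extends Mapper<LongWritable, Text, Text, Text> {

        private static final Logger LOG = Logger.getLogger(CustomLoggingMapper.class);

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            FileSystem fs = FileSystem.get(conf);
            // Assumed location of the log4j properties file on HDFS
            Path log4jPropertiesPath = new Path("/conf/log4j-job.properties");

            Properties properties = new Properties();
            try (InputStream is = fs.open(log4jPropertiesPath)) {
                properties.load(is);
            }
            // Re-initialise log4j for this task JVM with the loaded configuration
            PropertyConfigurator.configure(properties);
            LOG.info("log4j reconfigured from " + log4jPropertiesPath);
        }

        // map(...) omitted; loggers obtained via Logger.getLogger(...) pick up the new configuration
    }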

With this you can also direct the log output to a directory containing the job id. For example, you can modify the properties before calling PropertyConfigurator.configure(properties):

        Enumeration propertiesNames = properties.propertyNames();
        while (propertiesNames.hasMoreElements()) {
            String propertyKey = (String) propertiesNames.nextElement();
            String propertyValue = properties.getProperty(propertyKey);
            if (propertyValue.indexOf(JOB_ID_PATTERN) != -1) {
                properties.setProperty(propertyKey,
                        propertyValue.replace(JOB_ID_PATTERN, context.getJobID().toString()));
            }
        }
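Note that JOB_ID_PATTERN is simply a constant you define yourself. A possible definition, together with a matching (purely hypothetical) appender entry in the properties file it would be substituted into, might look like this:

        // Hypothetical placeholder token to be replaced with the actual job id
        private static final String JOB_ID_PATTERN = "{job.id}";

        // The log4j properties on HDFS would then reference the same token, e.g. (hypothetical):
        //   log4j.rootLogger=INFO, jobfile
        //   log4j.appender.jobfile=org.apache.log4j.FileAppender
        //   log4j.appender.jobfile.File=${hadoop.log.dir}/userlogs/{job.id}/custom.log
        //   log4j.appender.jobfile.layout=org.apache.log4j.PatternLayout
        //   log4j.appender.jobfile.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

After the substitution loop runs, the token is replaced with the value of context.getJobID().toString(), so the appender writes into a job-specific directory.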


  1. There is no straightforward way to override the log4j properties at the individual job level.

  2. The MapReduce job itself doesn't store its logs in HDFS; it writes them to the local file system (${hadoop.log.dir}/userlogs) of the datanodes. A separate YARN process called log aggregation collects those logs and combines them.

Use yarn logs --applicationId <appId> to fetch the full log, then use a unix command such as grep to extract the part of the log you need.