YARN API: Getting Yarn Aggregated Logs for application by API


As you can see in the source code (https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java), this is not trivial. Clearly, a log API is missing from the YARN API.

Via the ResourceManager REST API (https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Application_API), the application resource contains an amContainerLogs link:

```
curl http://yarn.infra/ws/v1/cluster/apps/application_1502112083252_1001
...
<amContainerLogs>http://node-1.infra:8042/node/containerlogs/container_e41_1502112083252_1001_01_000001/hdfs</amContainerLogs>
...
```
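The same request can be made from Java. This is a minimal sketch using only the JDK (java.net.http from Java 11, plus the built-in DOM parser). The class and method names are hypothetical, and it assumes the ResourceManager returns XML when the request carries an `Accept: application/xml` header.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class AmContainerLogsLink {

    // GET {rmBase}/ws/v1/cluster/apps/{appId}, explicitly asking for XML.
    static String fetchAppXml(String rmBase, String appId) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create(rmBase + "/ws/v1/cluster/apps/" + appId))
            .header("Accept", "application/xml")
            .GET()
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("HTTP " + response.statusCode() + " from " + request.uri());
        }
        return response.body();
    }

    // Returns the text of the first <amContainerLogs> element, or null if there is none.
    static String extractAmContainerLogs(String appXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(appXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("amContainerLogs");
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse application XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: ResourceManager base URL, args[1]: application ID
        System.out.println(extractAmContainerLogs(fetchAppXml(args[0], args[1])));
    }
}
```

For example, run it with `http://yarn.infra` and `application_1502112083252_1001` as arguments to print the same link the curl call shows.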

The application attempts resource also exposes a logs link for each attempt (if that is useful for you):

```
curl http://yarn.infra/ws/v1/cluster/apps/application_1502112083252_1001/appattempts
..
<logsLink>http://node-3.infra:8042/node/containerlogs/container_e41_1502112083252_1001_01_000001/hdfs</logsLink>
..
```
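An application can have several attempts, so the appattempts response may hold more than one logsLink. This sketch (JDK only, hypothetical class name) collects all of them from an XML response fetched from `/ws/v1/cluster/apps/{appId}/appattempts`, assuming each attempt carries its own `<logsLink>` element as in the excerpt.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class AttemptLogLinks {

    // Collects the text of every <logsLink> element (one per attempt), in document order.
    static List<String> extractLogsLinks(String appAttemptsXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(appAttemptsXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("logsLink");
            List<String> links = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                links.add(nodes.item(i).getTextContent().trim());
            }
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse appattempts XML", e);
        }
    }
}
```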

Curling these links again lets you download the logs stored locally on the NodeManager. However, this is not the full log (I didn't find exactly how to get it; feel free to complete my answer if you find out).
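To script that second request, the sketch below builds a URL for a single log file under one of those container links and downloads it. Two things are assumptions on my side, not verified against every Hadoop version: that the NodeManager web UI serves individual files (such as stdout, stderr or syslog) at `{link}/{file}/`, and that the `start=0` query parameter asks for the file from its first byte rather than only its tail. The class name is hypothetical, and the response is the NodeManager's HTML page, not raw text.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class ContainerLogFetcher {

    // Builds the URL of one log file ("stdout", "stderr", "syslog", ...) under a
    // container log link. ASSUMPTION: {link}/{file}/?start=0 returns the whole file.
    static String fullLogUrl(String logsLink, String fileName) {
        String base = logsLink.endsWith("/") ? logsLink : logsLink + "/";
        return base + fileName + "/?start=0";
    }

    // Downloads the page as a string (HTML wrapping the log text).
    static String fetch(String url) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient().send(
            HttpRequest.newBuilder(URI.create(url)).GET().build(),
            HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: a container log link, args[1]: a file name such as "stdout"
        System.out.println(fetch(fullLogUrl(args[0], args[1])));
    }
}
```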


As far as I know, YARN writes the logs to a file system, possibly HDFS (in my case: hdfs:hadoopsrv:9000/var/log/hadoop/app-logs/), and any user with access rights to these files can read them directly. From what I understand, yarn logs -applicationId simply reads them from there.
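To find a given application's files in that directory, this sketch computes the per-application path. It assumes the Hadoop 2.x layout `{yarn.nodemanager.remote-app-log-dir}/{user}/{yarn.nodemanager.remote-app-log-dir-suffix}/{appId}`, with the suffix left at its default of `logs`, and that `/var/log/hadoop/app-logs/` above is the configured remote-app-log-dir. Newer releases may insert extra bucket directories. The class name is hypothetical; the user is presumably the application owner, which is the last segment (`hdfs`) of the container log links above.

```java
final class AggregatedLogPaths {

    // ASSUMPTION: Hadoop 2.x layout
    //   {remote-app-log-dir}/{user}/{remote-app-log-dir-suffix}/{appId}
    static String aggregatedLogDir(String remoteAppLogDir, String user, String suffix, String appId) {
        // Drop a trailing slash so the join does not produce "//".
        String root = remoteAppLogDir.endsWith("/")
            ? remoteAppLogDir.substring(0, remoteAppLogDir.length() - 1)
            : remoteAppLogDir;
        return String.join("/", root, user, suffix, appId);
    }
}
```

With the values from this thread, the result can be listed with `hdfs dfs -ls /var/log/hadoop/app-logs/hdfs/logs/application_1502112083252_1001`. Expect one file per NodeManager, written in Hadoop's TFile-based aggregated log format rather than plain text, which is why going through yarn logs (or LogsCLI) is the easiest way to read them.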


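One way to collect the logs is to use LogsCLI, the class behind the yarn logs command, directly from your Java code. It prints the application logs to stdout. Note that LogsCLI.main() ends by calling System.exit(), which would also terminate your own JVM, so it is safer to run the tool through ToolRunner:

```java
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.yarn.client.cli.LogsCLI;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// appId (an ApplicationId) and LOG are fields of the enclosing class.
private void collectLogs() {
    String[] args = {"-applicationId", appId.toString()};
    try {
        // Equivalent to "yarn logs -applicationId <appId>"; output goes to stdout.
        // Unlike LogsCLI.main(), this does not call System.exit().
        int exitCode = ToolRunner.run(new YarnConfiguration(), new LogsCLI(), args);
        if (exitCode != 0) {
            LOG.warn("LogsCLI returned exit code " + exitCode);
        }
    } catch (Exception e) {
        LOG.warn("Error when collecting Yarn Application logs", e);
    }
}
```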
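If pulling the YARN client jars into your own application is inconvenient, a different technique is to run the yarn logs command as a child process and capture its output as a string. This sketch assumes the `yarn` launcher is on the PATH of the calling process and that log aggregation has completed for the application; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class YarnLogsProcess {

    // Argument list for the yarn CLI. ASSUMPTION: "yarn" is on the PATH.
    static List<String> command(String appId) {
        return List.of("yarn", "logs", "-applicationId", appId);
    }

    // Runs the command and returns its combined stdout/stderr.
    static String fetchLogs(String appId) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(appId))
            .redirectErrorStream(true)
            .start();
        String output;
        // Read all output before waitFor() so a full pipe buffer cannot block the child.
        try (InputStream in = process.getInputStream()) {
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("yarn logs exited with code " + exitCode + ":\n" + output);
        }
        return output;
    }
}
```

Capturing the output this way also avoids redirecting System.out when you need the logs as a string rather than on stdout.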