Is it possible to use a Conda environment as a "virtualenv" for a Hadoop Streaming job (in Python)?



I don't know of a way to package a conda environment in a tar/zip, untar it on a different box, and have it ready to use, as in the example you mention; that might not be possible, at least not without Anaconda installed on all the worker nodes. There might also be issues when moving between different operating systems.

Anaconda Cluster was created to solve that problem (disclaimer: I am an Anaconda Cluster developer), but it uses a more complicated approach: basically, we use a configuration management system (Salt) to install Anaconda on all the nodes in the cluster and to control the conda environments.

We use a configuration management system because we also deploy the Hadoop stack (Spark and its friends) and we need to target big clusters. In reality, if you only need to deploy Anaconda and don't have too many nodes, you should be able to do that with just Fabric (which Anaconda Cluster also uses in some parts) and run it from a regular laptop.

If you are interested Anaconda Cluster docs are here: http://continuumio.github.io/conda-cluster/


Update 2019:

The answer is yes, and the way to do it is with conda-pack:

https://conda.github.io/conda-pack/
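A minimal sketch of that workflow, following the pattern shown in the conda-pack docs: pack the environment into a relocatable archive, then ship it with the streaming job via `-archives` so Hadoop unpacks it on each worker node. The environment name `example`, the script names `mapper.py`/`reducer.py`, the HDFS paths, and the exact location of the streaming jar are all placeholders you would adapt to your setup.

```shell
# Pack the local conda environment "example" into a relocatable tarball.
# (Assumes conda-pack is installed, e.g. via: conda install -c conda-forge conda-pack)
conda pack -n example -o environment.tar.gz

# Submit the streaming job. Hadoop distributes the archive to every node
# and unpacks it under the alias after '#' ("environment"), so the mapper
# and reducer can invoke the packed interpreter directly.
hadoop jar /path/to/hadoop-streaming.jar \
  -archives environment.tar.gz#environment \
  -files mapper.py,reducer.py \
  -mapper "environment/bin/python mapper.py" \
  -reducer "environment/bin/python reducer.py" \
  -input /user/me/input \
  -output /user/me/output
```

Because the tarball bundles the Python interpreter and all dependencies, the worker nodes don't need Anaconda (or even Python) preinstalled; they only need to share the OS/architecture the environment was packed on.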