Packages availability on Anaconda upgrade


I assume you installed Anaconda on hadoop nodes using Cloudera parcels?

https://www.cloudera.com/downloads/partner/anaconda.html

If so, then you're correct: after an upgrade you'd have to reinstall everything you had installed on top of the parcel on the Hadoop nodes.

Cloudera treats parcels as "immutable": their state shouldn't change. So when you install something on top of a parcel, or change it in any way, expect that your changes can be lost (e.g. on parcel redeployment, as it will untar the parcel again). The same applies to upgrades: a new Anaconda version comes as just a new tar file (that's basically what a parcel is, plus some metadata).

If you're interested in managing Python environments, look at conda environments - https://conda.io/docs/user-guide/overview.html - or at conda-pack specifically, as an example for Spark on YARN - https://conda.github.io/conda-pack/spark.html
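To give a rough idea of what the conda-pack route looks like for Spark on YARN, here is a minimal sketch that only builds the spark-submit command line. It assumes you have already packed an environment into an archive (for example with `conda pack -n <env> -o environment.tar.gz`). The helper name `build_spark_submit`, the file name `script.py` and the alias `environment` are placeholders of mine, not something from your setup.

```python
import shlex


def build_spark_submit(app, archive="environment.tar.gz", alias="environment"):
    """Return (env, argv) for submitting `app` with a packed conda environment.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    # Interpreter inside the unpacked archive, relative to the YARN
    # container's working directory.
    python_path = f"./{alias}/bin/python"
    # Environment variable telling PySpark which interpreter to use.
    env = {"PYSPARK_PYTHON": python_path}
    argv = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # Same interpreter for the application master in cluster mode.
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={python_path}",
        # "archive#alias" asks YARN to unpack the archive under ./alias.
        "--archives", f"{archive}#{alias}",
        app,
    ]
    return env, argv


env, argv = build_spark_submit("script.py")
print(shlex.join(argv))
```

To actually submit, you could pass `argv` to `subprocess.run` with `env` merged into `os.environ`. The point is that the environment travels with the job as an archive, so nothing has to be installed on top of the parcel on the nodes.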

We're currently migrating all of our Spark jobs to conda environments, instead of relying on Anaconda parcels.

PS. I noticed you're using the python-2.7 tag for this question. Note that the free Anaconda parcels for Cloudera (starting with the Anaconda 5 release) no longer provide Python 2; they ship with Python 3 instead. Beware! That change caught us off guard, and it was another reason to migrate to conda environments, as we can now easily switch between Python 2 and Python 3 per project.
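As an illustration of per-project Python versions, here is a small sketch that builds one `conda create` command per project. The project names, the versions and the `numpy` package are made up for the example, and `conda_create_argv` is a hypothetical helper, not part of conda.

```python
def conda_create_argv(env_name, python_version, packages=()):
    """Build a `conda create` command pinning the Python version of one env."""
    return [
        "conda", "create", "--yes",
        "--name", env_name,
        f"python={python_version}",
        *packages,
    ]


# Hypothetical projects, each pinned to its own Python version.
projects = {"legacy_etl": "2.7", "new_model": "3.6"}

for name, version in projects.items():
    print(" ".join(conda_create_argv(name, version, ["numpy"])))
```

Each environment can then be packed and shipped with its own job, so a Python 2 project and a Python 3 project can run on the same cluster side by side.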