Package availability on Anaconda upgrade
I assume you installed Anaconda on the Hadoop nodes using Cloudera parcels?
https://www.cloudera.com/downloads/partner/anaconda.html
If so, you're correct: you'd have to reinstall everything you installed on top of it on the Hadoop nodes.
Cloudera treats parcels as "immutable": their state shouldn't change. So when you install something on top of a parcel, or change it in any way, expect that your changes can be lost (e.g. on parcel redeployment, which untars the parcel again). The same applies to upgrades: a new Anaconda version comes as just a new tar file (that's basically what a parcel is, plus some metadata).
If you're interested in managing Python environments, look at conda virtual environments (https://conda.io/docs/user-guide/overview.html) or conda-pack, specifically its example for Spark on YARN: https://conda.github.io/conda-pack/spark.html
We're currently migrating all of our Spark jobs to conda environments, instead of relying on Anaconda parcels.
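To make the conda-pack route a bit more concrete, here is a minimal sketch of the two commands involved, written as small Python helpers that only build the argument lists. The flags follow the Spark on YARN page linked above as far as I recall it; the names environment.tar.gz, the environment alias, example and job.py are placeholders I made up, so adjust them to your setup.

```python
def conda_pack_command(env_name, archive="environment.tar.gz"):
    """Build the `conda pack` command that archives an existing conda environment."""
    return ["conda", "pack", "-n", env_name, "-o", archive]


def spark_submit_command(script, archive="environment.tar.gz", alias="environment"):
    """Build a spark-submit call that ships the packed environment to YARN.

    Returns (extra_env, argv). The alias after '#' is the directory name the
    archive is unpacked under in each YARN container (placeholder name).
    """
    python_path = "./{}/bin/python".format(alias)
    extra_env = {"PYSPARK_PYTHON": python_path}
    argv = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # Point the application master at the Python inside the unpacked archive.
        "--conf", "spark.yarn.appMasterEnv.PYSPARK_PYTHON=" + python_path,
        "--archives", "{}#{}".format(archive, alias),
        script,
    ]
    return extra_env, argv


if __name__ == "__main__":
    print(" ".join(conda_pack_command("example")))
    extra_env, argv = spark_submit_command("job.py")
    print(extra_env, " ".join(argv))
```

To actually run it, you could pass argv to subprocess.run with extra_env merged into os.environ. Since the environment travels with the job, nothing has to be installed on top of the parcel on the nodes.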
PS. I noticed you're using the python-2.7 tag for this topic. Note that the free Anaconda parcels for Cloudera (starting with the Anaconda 5 release) no longer provide Python 2; they come with Python 3 instead. Beware! That change caught us off guard, and it was another reason to migrate to conda, as we can now easily switch between Python 2 and Python 3 per project.
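As a rough illustration of switching Python versions per project, the sketch below builds a conda create command for each project from a small mapping. The project names and version pins are hypothetical; only the conda create flags (-y, -n and the python=X.Y spec) are standard conda usage.

```python
# Hypothetical mapping of project name to the Python version it needs.
PROJECT_PYTHON = {
    "legacy_etl": "2.7",
    "new_model": "3.6",
}


def conda_create_command(project, packages=()):
    """Build a `conda create` command for the project's own environment."""
    version = PROJECT_PYTHON[project]
    return ["conda", "create", "-y", "-n", project, "python=" + version] + list(packages)


if __name__ == "__main__":
    for name in sorted(PROJECT_PYTHON):
        print(" ".join(conda_create_command(name, ["numpy"])))
```

Each environment can then be packed and shipped with its job, so a Python 2 project and a Python 3 project can run on the same cluster regardless of what the parcel provides.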