Why are packages installed rather than just linked to a specific environment?
Conda already does this. However, because it leverages hardlinks, it is easy to overestimate the space really being used, especially if one only looks at the size of a single env at a time.
To illustrate the case, let's use du to inspect the real disk usage. First, if I count each environment directory individually, I get the uncorrected per-env usage
    $ for d in envs/*; do du -sh $d; done
    2.4G    envs/pymc36
    1.7G    envs/pymc3_27
    1.4G    envs/r-keras
    1.7G    envs/stan
    1.2G    envs/velocyto
which is what it might look like from a GUI.
Instead, if I let du count them together (i.e., correcting for the hardlinks), I get
    $ du -sh envs/*
    2.4G    envs/pymc36
    326M    envs/pymc3_27
    820M    envs/r-keras
    927M    envs/stan
    548M    envs/velocyto
One can see that a significant amount of space is already being saved here.
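If you want to convince yourself of this without touching your real envs, here is a minimal sketch that reproduces the effect in a scratch directory. The pkgs, env, and lib.bin names are made up for illustration, and it assumes a filesystem that supports hardlinks.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the hardlink effect on du in a scratch directory.
# All paths below are hypothetical stand-ins, not real conda directories.
demo=$(mktemp -d)
mkdir "$demo/pkgs" "$demo/env"

# A 1 MiB file standing in for an extracted package file.
dd if=/dev/zero of="$demo/pkgs/lib.bin" bs=1024 count=1024 2>/dev/null

# Hardlink it into the "env": a second name for the same data, not a copy.
ln "$demo/pkgs/lib.bin" "$demo/env/lib.bin"

# Link count, as reported in the second column of ls -l.
links=$(ls -l "$demo/pkgs/lib.bin" | awk '{print $2}')

# Measured one directory at a time, the file is charged to both directories.
separate_kb=$(( $(du -sk "$demo/pkgs" | cut -f1) + $(du -sk "$demo/env" | cut -f1) ))

# Measured in a single invocation, du charges each hardlinked file only once.
together_kb=$(du -sk "$demo/pkgs" "$demo/env" | awk '{sum += $1} END {print sum}')

echo "links=$links separate=${separate_kb}K together=${together_kb}K"
rm -rf "$demo"
```

On a typical filesystem the "together" figure should come out at roughly half of the "separate" one, which is the same correction du applies to the envs above.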
Most of the hardlinks go back to the pkgs directory, so if we include that as well:
    $ du -sh pkgs envs/*
    8.2G    pkgs
    400M    envs/pymc36
    116M    envs/pymc3_27
     92M    envs/r-keras
     62M    envs/stan
    162M    envs/velocyto
one can see that outside of the shared packages, the envs are fairly light. If you're concerned about the size of my pkgs, note that I have never run conda clean on this system, so my pkgs directory is full of tarballs and superseded packages, plus some infrastructure I keep in base (e.g., Jupyter, Git).
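To check how much of a given env is actually shared, one option is to count its files by link count: a file with more than one link is shared with something else (typically pkgs), while a file with exactly one link exists only in that env. This is a rough sketch. link_summary is a hypothetical helper, not a conda command, and it assumes no file names contain newlines.

```shell
#!/usr/bin/env bash
# Sketch: count hardlinked vs. unlinked files under a directory.
# link_summary is a hypothetical helper, not part of conda.
# Prints "<shared> <unique>", where shared = files with more than one link.
link_summary() {
  dir=$1
  # -links +1 matches files whose link count is greater than 1.
  shared=$(find "$dir" -type f -links +1 | wc -l)
  # -links 1 matches files that have no other name anywhere.
  unique=$(find "$dir" -type f -links 1 | wc -l)
  # Arithmetic expansion strips the padding some wc implementations add.
  echo "$((shared)) $((unique))"
}

# Run only when an existing directory is passed, e.g. ./script.sh envs/stan
if [ -d "${1:-}" ]; then
  link_summary "$1"
fi
```

A high shared count relative to the unique count would be consistent with the small corrected sizes shown above. Note that a link count above one only says the file has another name somewhere on the same filesystem, not that the other name is necessarily in pkgs.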