
Why are packages installed rather than just linked to a specific environment?


Conda already does this. However, because it leverages hardlinks, it is easy to overestimate the space really being used, especially if one only looks at the size of a single env at a time.
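As a minimal sketch of what a hardlink is (not Conda's actual code), the snippet below creates a file in a made-up `pkgs` directory and hardlinks it into a made-up `envs/demo` directory. The directory names, the `lib.bin` file name, and the helper names `link_info` and `demo_hardlink` are all assumptions for illustration. Both paths should report the same inode and a link count of 2, meaning the data exists on disk only once.

```python
import os
import tempfile


def link_info(path):
    """Return (inode, link_count, size_in_bytes) for a file."""
    st = os.stat(path)
    return st.st_ino, st.st_nlink, st.st_size


def demo_hardlink():
    """Create a file in a fake package cache and hardlink it into a fake env."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout, loosely mirroring pkgs/ and envs/<name>/.
        pkgs = os.path.join(tmp, "pkgs")
        env = os.path.join(tmp, "envs", "demo")
        os.makedirs(pkgs)
        os.makedirs(env)

        original = os.path.join(pkgs, "lib.bin")
        with open(original, "wb") as fh:
            fh.write(b"\0" * 4096)

        linked = os.path.join(env, "lib.bin")
        os.link(original, linked)  # a second name for the same data on disk

        return link_info(original), link_info(linked)


if __name__ == "__main__":
    in_pkgs, in_env = demo_hardlink()
    print("pkgs copy:", in_pkgs)
    print("env copy: ", in_env)
```

A tool that measures `envs/demo` alone sees the full file size there, even though no extra space was used.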

To illustrate, let's use du to inspect the real disk usage. First, if I measure each environment directory in a separate du call, I get the uncorrected per-env usage:

$ for d in envs/*; do du -sh $d; done
2.4G    envs/pymc36
1.7G    envs/pymc3_27
1.4G    envs/r-keras
1.7G    envs/stan
1.2G    envs/velocyto

which is what it might look like from a GUI.

Instead, if I let du count them together in a single call (du counts a hardlinked file only once per invocation, which corrects for the hardlinks), I get:

$ du -sh envs/*
2.4G    envs/pymc36
326M    envs/pymc3_27
820M    envs/r-keras
927M    envs/stan
548M    envs/velocyto

One can see that a significant amount of space is already being saved here.
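The difference between the two measurements can be sketched in Python. This is a rough approximation of the idea, not how du is implemented: it sums apparent file sizes (`st_size`) rather than allocated blocks, ignores directories, and the function names `tree_size`, `sizes_separately`, and `sizes_together` are made up for illustration. The key point is whether the set of already-seen inodes is reset for every path or shared across all of them.

```python
import os


def tree_size(root, seen):
    """Sum file sizes under root, skipping inodes already recorded in `seen`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)  # identifies the data, not the name
            if key in seen:
                continue  # another hardlink to a file that was already counted
            seen.add(key)
            total += st.st_size
    return total


def sizes_separately(paths):
    """Like one du call per path: each path starts with an empty `seen` set."""
    return {p: tree_size(p, set()) for p in paths}


def sizes_together(paths):
    """Like a single du call over all paths: `seen` is shared between them."""
    seen = set()
    return {p: tree_size(p, seen) for p in paths}
```

With a shared set, whichever path is listed first is charged for the shared files and later paths only for what is new to them. That would be consistent with the first environment showing the same size in both listings above.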

Most of the hardlinks go back to the pkgs directory, so if we include that as well:

$ du -sh pkgs envs/*
8.2G    pkgs
400M    envs/pymc36
116M    envs/pymc3_27
 92M    envs/r-keras
 62M    envs/stan
162M    envs/velocyto

one can see that outside of the shared packages, the envs are fairly light. If you're concerned about the size of my pkgs, note that I have never run conda clean on this system, so my pkgs directory is full of tarballs and superseded packages, plus some infrastructure I keep in base (e.g., Jupyter and Git).