
How to avoid reinstalling packages when building Docker image for Python projects?


Try to build a Dockerfile which looks something like this:

FROM my/base
WORKDIR /srv
ADD ./requirements.txt /srv/requirements.txt
RUN pip install -r requirements.txt
ADD . /srv
RUN python setup.py install
ENTRYPOINT ["run_server"]

Docker reuses the cached layer for the pip install step as long as requirements.txt is unchanged, regardless of whether other files in . have changed. Here's an example.


Here's a simple Hello, World! program:

$ tree
.
├── Dockerfile
├── requirements.txt
└── run.py

0 directories, 3 files

# Dockerfile
FROM dockerfile/python
WORKDIR /srv
ADD ./requirements.txt /srv/requirements.txt
RUN pip install -r requirements.txt
ADD . /srv
CMD python /srv/run.py

# requirements.txt
pytest==2.3.4

# run.py
print("Hello, World")

The output of docker build:

Step 1 : WORKDIR /srv
---> Running in 22d725d22e10
---> 55768a00fd94
Removing intermediate container 22d725d22e10
Step 2 : ADD ./requirements.txt /srv/requirements.txt
---> 968a7c3a4483
Removing intermediate container 5f4e01f290fd
Step 3 : RUN pip install -r requirements.txt
---> Running in 08188205e92b
Downloading/unpacking pytest==2.3.4 (from -r requirements.txt (line 1))
  Running setup.py (path:/tmp/pip_build_root/pytest/setup.py) egg_info for package pytest
....
Cleaning up...
---> bf5c154b87c9
Removing intermediate container 08188205e92b
Step 4 : ADD . /srv
---> 3002a3a67e72
Removing intermediate container 83defd1851d0
Step 5 : CMD python /srv/run.py
---> Running in 11e69b887341
---> 5c0e7e3726d6
Removing intermediate container 11e69b887341
Successfully built 5c0e7e3726d6

Let's modify run.py:

# run.py
print("Hello, Python")

Try to build again, below is the output:

Sending build context to Docker daemon  5.12 kB
Sending build context to Docker daemon
Step 0 : FROM dockerfile/python
---> f86d6993fc7b
Step 1 : WORKDIR /srv
---> Using cache
---> 55768a00fd94
Step 2 : ADD ./requirements.txt /srv/requirements.txt
---> Using cache
---> 968a7c3a4483
Step 3 : RUN pip install -r requirements.txt
---> Using cache
---> bf5c154b87c9
Step 4 : ADD . /srv
---> 9cc7508034d6
Removing intermediate container 0d7cf71eb05e
Step 5 : CMD python /srv/run.py
---> Running in f25c21135010
---> 4ffab7bc66c7
Removing intermediate container f25c21135010
Successfully built 4ffab7bc66c7

As the output shows, this time Docker reused the cached layers for steps 1 to 3, including pip install, and rebuilt only from ADD . /srv onward. Now, let's update requirements.txt:

# requirements.txt
pytest==2.3.4
ipython

Below is the output of docker build:

Sending build context to Docker daemon  5.12 kB
Sending build context to Docker daemon
Step 0 : FROM dockerfile/python
---> f86d6993fc7b
Step 1 : WORKDIR /srv
---> Using cache
---> 55768a00fd94
Step 2 : ADD ./requirements.txt /srv/requirements.txt
---> b6c19f0643b5
Removing intermediate container a4d9cb37dff0
Step 3 : RUN pip install -r requirements.txt
---> Running in 4b7a85a64c33
Downloading/unpacking pytest==2.3.4 (from -r requirements.txt (line 1))
  Running setup.py (path:/tmp/pip_build_root/pytest/setup.py) egg_info for package pytest
Downloading/unpacking ipython (from -r requirements.txt (line 2))
Downloading/unpacking py>=1.4.12 (from pytest==2.3.4->-r requirements.txt (line 1))
  Running setup.py (path:/tmp/pip_build_root/py/setup.py) egg_info for package py
Installing collected packages: pytest, ipython, py
  Running setup.py install for pytest
Installing py.test script to /usr/local/bin
Installing py.test-2.7 script to /usr/local/bin
  Running setup.py install for py
Successfully installed pytest ipython py
Cleaning up...
---> 23a1af3df8ed
Removing intermediate container 4b7a85a64c33
Step 4 : ADD . /srv
---> d8ae270eca35
Removing intermediate container 7f003ebc3179
Step 5 : CMD python /srv/run.py
---> Running in 510359cf9e12
---> e42fc9121a77
Removing intermediate container 510359cf9e12
Successfully built e42fc9121a77

Notice that Docker did not use the cache for pip install this time. If this doesn't work for you, check your Docker version:

Client version: 1.1.2
Client API version: 1.13
Go version (client): go1.2.1
Git commit (client): d84a070
Server version: 1.1.2
Server API version: 1.13
Go version (server): go1.2.1
Git commit (server): d84a070
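The three builds above follow from how the layer cache is keyed: roughly, a step can be reused only if its parent layer, its instruction text and, for ADD, the contents of the copied files are all unchanged. The following is a simplified Python model of that rule, not Docker's actual implementation. The function names and the key format are hypothetical and exist only to show why editing run.py leaves the pip step cached while editing requirements.txt does not.

```python
import hashlib


def digest(*parts):
    """Hash the given strings into a short hex key."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()[:12]


def layer_keys(steps, files):
    """Return one cache key per build step.

    steps: list of (instruction, copied_paths); copied_paths names the
           files whose contents an ADD step depends on.
    files: dict mapping file name -> content (the build context).
    """
    keys = []
    parent = "base-image"
    for instruction, copied_paths in steps:
        contents = [files[path] for path in sorted(copied_paths)]
        # Each key chains on the parent key, so one miss invalidates
        # every later step as well.
        parent = digest(parent, instruction, *contents)
        keys.append(parent)
    return keys


def cached_steps(old_keys, new_keys):
    """Per step: True if the layer from the previous build can be reused."""
    return [old == new for old, new in zip(old_keys, new_keys)]


STEPS = [
    ("WORKDIR /srv", []),
    ("ADD ./requirements.txt /srv/requirements.txt", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("ADD . /srv", ["Dockerfile", "requirements.txt", "run.py"]),
    ("CMD python /srv/run.py", []),
]

context = {
    "Dockerfile": "(Dockerfile contents)",
    "requirements.txt": "pytest==2.3.4\n",
    "run.py": 'print("Hello, World")\n',
}
first = layer_keys(STEPS, context)

# Editing run.py only invalidates "ADD . /srv" and the step after it.
edited_code = {**context, "run.py": 'print("Hello, Python")\n'}
print(cached_steps(first, layer_keys(STEPS, edited_code)))
# [True, True, True, False, False]

# Editing requirements.txt invalidates everything from the first ADD on.
edited_reqs = {**context, "requirements.txt": "pytest==2.3.4\nipython\n"}
print(cached_steps(first, layer_keys(STEPS, edited_reqs)))
# [True, False, False, False, False]
```

The two printed lists line up with the "Using cache" markers in the second and third build logs.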


I understand this question has some popular answers already. But there is a newer way to cache files for package managers. I think it could be a good answer in the future when BuildKit becomes more standard.

As of Docker 18.09 there is experimental support for BuildKit. BuildKit adds support for some new features in the Dockerfile including experimental support for mounting external volumes into RUN steps. This allows us to create caches for things like $HOME/.cache/pip/.

We'll use the following requirements.txt file as an example:

Click==7.0
Django==2.2.3
django-appconf==1.0.3
django-compressor==2.3
django-debug-toolbar==2.0
django-filter==2.2.0
django-reversion==3.0.4
django-rq==2.1.0
pytz==2019.1
rcssmin==1.0.6
redis==3.3.4
rjsmin==1.1.0
rq==1.1.0
six==1.12.0
sqlparse==0.3.0

A typical example Python Dockerfile might look like:

FROM python:3.7
WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app/
RUN pip install -r requirements.txt
COPY . /usr/src/app

With BuildKit enabled using the DOCKER_BUILDKIT environment variable we can build the uncached pip step in about 65 seconds:

$ export DOCKER_BUILDKIT=1
$ docker build -t test .
[+] Building 65.6s (10/10) FINISHED
 => [internal] load .dockerignore  0.0s
 => => transferring context: 2B  0.0s
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 120B  0.0s
 => [internal] load metadata for docker.io/library/python:3.7  0.5s
 => CACHED [1/4] FROM docker.io/library/python:3.7@sha256:6eaf19442c358afc24834a6b17a3728a45c129de7703d8583392a138ecbdb092  0.0s
 => [internal] load build context  0.6s
 => => transferring context: 899.99kB  0.6s
 => CACHED [internal] helper image for file operations  0.0s
 => [2/4] COPY requirements.txt /usr/src/app/  0.5s
 => [3/4] RUN pip install -r requirements.txt  61.3s
 => [4/4] COPY . /usr/src/app  1.3s
 => exporting to image  1.2s
 => => exporting layers  1.2s
 => => writing image sha256:d66a2720e81530029bf1c2cb98fb3aee0cffc2f4ea2aa2a0760a30fb718d7f83  0.0s
 => => naming to docker.io/library/test  0.0s
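If you want to repeat this kind of timing from a script rather than an interactive shell, a minimal Python sketch might look like the following. The helper names buildkit_env and timed_build are hypothetical; the only things taken from the text above are the DOCKER_BUILDKIT=1 variable and the docker build -t invocation.

```python
import os
import subprocess
import time


def buildkit_env(base_env=None):
    """Return a copy of the environment with BuildKit switched on."""
    env = dict(os.environ if base_env is None else base_env)
    env["DOCKER_BUILDKIT"] = "1"
    return env


def timed_build(tag, context_dir="."):
    """Run `docker build -t TAG CONTEXT` with BuildKit; return seconds taken.

    Assumes the docker CLI is on PATH; raises CalledProcessError if the
    build fails.
    """
    start = time.monotonic()
    subprocess.run(
        ["docker", "build", "-t", tag, context_dir],
        env=buildkit_env(),
        check=True,
    )
    return time.monotonic() - start


# Example (not executed here, since it needs a Docker daemon):
#     seconds = timed_build("test")
#     print(f"build took {seconds:.1f}s")
```

Setting the variable on a copy of the environment keeps the change local to the build command instead of altering the calling process.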

Now, let us add the experimental header and modify the RUN step to cache the Python packages:

# syntax=docker/dockerfile:experimental
FROM python:3.7
WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app/
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
COPY . /usr/src/app

Go ahead and do another build now. It should take the same amount of time. But this time it is caching the Python packages in our new cache mount:

$ docker build -t pythontest .
[+] Building 60.3s (14/14) FINISHED
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 120B  0.0s
 => [internal] load .dockerignore  0.0s
 => => transferring context: 2B  0.0s
 => resolve image config for docker.io/docker/dockerfile:experimental  0.5s
 => CACHED docker-image://docker.io/docker/dockerfile:experimental@sha256:9022e911101f01b2854c7a4b2c77f524b998891941da55208e71c0335e6e82c3  0.0s
 => [internal] load .dockerignore  0.0s
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 120B  0.0s
 => [internal] load metadata for docker.io/library/python:3.7  0.5s
 => CACHED [1/4] FROM docker.io/library/python:3.7@sha256:6eaf19442c358afc24834a6b17a3728a45c129de7703d8583392a138ecbdb092  0.0s
 => [internal] load build context  0.7s
 => => transferring context: 899.99kB  0.6s
 => CACHED [internal] helper image for file operations  0.0s
 => [2/4] COPY requirements.txt /usr/src/app/  0.6s
 => [3/4] RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt  53.3s
 => [4/4] COPY . /usr/src/app  2.6s
 => exporting to image  1.2s
 => => exporting layers  1.2s
 => => writing image sha256:0b035548712c1c9e1c80d4a86169c5c1f9e94437e124ea09e90aea82f45c2afc  0.0s
 => => naming to docker.io/library/test  0.0s

About 60 seconds. Similar to our first build.

Make a small change to the requirements.txt (such as adding a new line between two packages) to force a cache invalidation and run again:

$ docker build -t pythontest .
[+] Building 15.9s (14/14) FINISHED
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 120B  0.0s
 => [internal] load .dockerignore  0.0s
 => => transferring context: 2B  0.0s
 => resolve image config for docker.io/docker/dockerfile:experimental  1.1s
 => CACHED docker-image://docker.io/docker/dockerfile:experimental@sha256:9022e911101f01b2854c7a4b2c77f524b998891941da55208e71c0335e6e82c3  0.0s
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 120B  0.0s
 => [internal] load .dockerignore  0.0s
 => [internal] load metadata for docker.io/library/python:3.7  0.5s
 => CACHED [1/4] FROM docker.io/library/python:3.7@sha256:6eaf19442c358afc24834a6b17a3728a45c129de7703d8583392a138ecbdb092  0.0s
 => CACHED [internal] helper image for file operations  0.0s
 => [internal] load build context  0.7s
 => => transferring context: 899.99kB  0.7s
 => [2/4] COPY requirements.txt /usr/src/app/  0.6s
 => [3/4] RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt  8.8s
 => [4/4] COPY . /usr/src/app  2.1s
 => exporting to image  1.1s
 => => exporting layers  1.1s
 => => writing image sha256:fc84cd45482a70e8de48bfd6489e5421532c2dd02aaa3e1e49a290a3dfb9df7c  0.0s
 => => naming to docker.io/library/test  0.0s

Only about 16 seconds!

We are getting this speedup because we are no longer downloading all the Python packages. They were cached by the package manager (pip in this case) and stored in a cache volume mount. The volume mount is provided to the run step so that pip can reuse our already downloaded packages. This happens outside any Docker layer caching.

The gains should be even larger with a longer requirements.txt, since more downloads are skipped.
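To make the difference between the two caches concrete, here is a toy Python model. It is only a sketch: run_pip_step is a hypothetical stand-in for the pip step re-running after a layer-cache miss, and the set stands in for the cache mount at /root/.cache/pip that survives between builds.

```python
def run_pip_step(requirements, download_cache):
    """Simulate a re-run of the pip install step.

    requirements:   list of pinned requirement strings.
    download_cache: set that persists between builds (the cache mount).
    Returns the requirements that had to be downloaded this time.
    """
    downloaded = []
    for requirement in requirements:
        if requirement not in download_cache:
            downloaded.append(requirement)   # would be a network fetch
            download_cache.add(requirement)  # kept for later builds
    return downloaded


cache_mount = set()  # empty before the very first build
requirements = ["Django==2.2.3", "redis==3.3.4", "rq==1.1.0"]

# First build: nothing is cached, so every package is downloaded.
first = run_pip_step(requirements, cache_mount)

# The layer is invalidated (for example by a blank line added to
# requirements.txt), so the step runs again, but nothing is downloaded.
second = run_pip_step(requirements, cache_mount)

# Adding one package only fetches that package.
third = run_pip_step(requirements + ["pytz==2019.1"], cache_mount)

print(first)   # ['Django==2.2.3', 'redis==3.3.4', 'rq==1.1.0']
print(second)  # []
print(third)   # ['pytz==2019.1']
```

In the real build pip still has to install the packages from its cache, which is presumably why the step above took 8.8 seconds rather than close to zero.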

Notes:

  • This is experimental Dockerfile syntax and should be treated as such. You may not want to build with this in production at the moment.
  • When this answer was first written, BuildKit did not work under Docker Compose or other tools that use the Docker API directly. Docker Compose has supported it since 1.25.0. See How do you enable BuildKit with docker-compose?
  • There isn't any direct interface for managing the cache at the moment. It is purged when you do a docker system prune -a.

Hopefully, these features will make it into Docker for building and BuildKit will become the default. If / when that happens I will try to update this answer.


To minimise the network activity, you could point pip to a cache directory on your host machine.

Run your Docker container with the host's pip cache directory bind-mounted onto the container's pip cache directory. The docker run command should look like this:

docker run -v $HOME/.cache/pip-docker/:/root/.cache/pip image_1

Then, in your Dockerfile, install your requirements as part of the ENTRYPOINT (or CMD) statement instead of in a RUN command. This is important because (as pointed out in the comments) the mount is not available during image building, when RUN statements are executed. The Dockerfile should look like this:

FROM my/base
ADD . /srv
ENTRYPOINT ["sh", "-c", "pip install -r requirements.txt && python setup.py install && run_server"]
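If the container is started from a script, the docker run invocation above can be assembled programmatically. The following Python sketch uses a hypothetical helper, docker_run_command; the default host path and the container path mirror the command shown earlier, and /home/alice is a made-up example home directory.

```python
import shlex
from pathlib import Path


def docker_run_command(image, host_cache=None):
    """Build the argument list for `docker run` with the pip cache mounted.

    host_cache defaults to ~/.cache/pip-docker, as in the command above.
    """
    if host_cache is None:
        host_cache = str(Path.home() / ".cache" / "pip-docker")
    # host path : container path, the same form the -v flag takes above
    mount = f"{host_cache}:/root/.cache/pip"
    return ["docker", "run", "-v", mount, image]


cmd = docker_run_command("image_1", "/home/alice/.cache/pip-docker")
print(shlex.join(cmd))
# docker run -v /home/alice/.cache/pip-docker:/root/.cache/pip image_1
```

The list form can be passed straight to subprocess.run, which avoids shell-quoting problems if the cache path contains spaces.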