
Installing pandas in docker Alpine


If you're not bound to Alpine 3.6, using Alpine 3.7 (or later) should work.

On Alpine 3.6, installing matplotlib failed for me with the following:

Collecting matplotlib
  Downloading https://files.pythonhosted.org/packages/26/04/8b381d5b166508cc258632b225adbafec49bbe69aa9a4fa1f1b461428313/matplotlib-3.0.3.tar.gz (36.6MB)
    Complete output from command python setup.py egg_info:
    Download error on https://pypi.org/simple/numpy/: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833) -- Some packages may not be found!
    Couldn't find index page for 'numpy' (maybe misspelled?)
    Download error on https://pypi.org/simple/: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833) -- Some packages may not be found!
    No local packages or working download links found for numpy>=1.10.0

However, on Alpine 3.7, it worked. This may be due to a numpy versioning issue (see here), but I can't tell for sure. Once past that problem, the packages were built and installed successfully, though it took a good while: about 30 minutes. Alpine uses musl libc, which is not compatible with the manylinux wheels published on PyPI (those are built against glibc), so every package with compiled extensions that you install with pip has to be built from source.
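A quick way to tell whether a base image is musl-based, and therefore unlikely to get prebuilt wheels for numpy or pandas, is sketched below. It relies on an assumption that is not part of the original answer: musl systems expose their dynamic loader as /lib/ld-musl-<arch>.so.1, which is the case on Alpine. The function name running_on_musl is hypothetical.

```python
import glob
import platform


def running_on_musl(root: str = "/") -> bool:
    """Heuristic: musl-based systems such as Alpine ship their dynamic
    loader as /lib/ld-musl-<arch>.so.1. `root` lets the check point at
    another directory tree, e.g. an unpacked image or a test fixture."""
    pattern = root.rstrip("/") + "/lib/ld-musl-*.so.1"
    return bool(glob.glob(pattern))


if __name__ == "__main__":
    if running_on_musl():
        print("musl detected: pip will likely compile numpy/pandas from source")
    else:
        # On glibc systems this typically prints something like ('glibc', '2.28').
        print("libc:", platform.libc_ver())
```

Running it inside the container before a long pip install gives an early hint that the build will compile from source.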

Note that one important change is needed: remove the build-runtime virtual package (apk del build-runtime) only after pip install has finished. Also, if applicable, you could replace numpy 1.16.1 with 1.16.2, which is the shipped version; otherwise 1.16.2 will be uninstalled and 1.16.1 built from source, further increasing the build time. I haven't tried this, though.
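The numpy remark comes down to comparing each name==version pin in requirements.txt with what is already installed. A mismatch means pip replaces the existing copy and, on Alpine, compiles the pinned version from source. Below is a minimal sketch of that comparison, assuming plain name==version lines (other specifiers are ignored). The helper names are hypothetical, and installed versions are passed in as a dict so the logic can be tried without a real environment.

```python
from importlib import metadata  # Python 3.8+


def parse_pins(requirements_text: str) -> dict:
    """Collect name==version pins, ignoring comments and other specifiers."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = (part.strip() for part in line.split("==", 1))
            pins[name.lower()] = version
    return pins


def pins_that_force_rebuild(pins: dict, installed: dict) -> dict:
    """Return {name: (installed, pinned)} for packages already present at a
    different version; pip would replace (and on Alpine, rebuild) these."""
    return {
        name: (installed[name], wanted)
        for name, wanted in pins.items()
        if name in installed and installed[name] != wanted
    }


def installed_versions(names) -> dict:
    """Look up the versions of `names` in the current environment."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed, so nothing would be replaced
    return found


if __name__ == "__main__":
    pins = parse_pins("numpy==1.16.1\n")
    print(pins_that_force_rebuild(pins, installed_versions(pins)))
```

For example, pins_that_force_rebuild({"numpy": "1.16.1"}, {"numpy": "1.16.2"}) returns {'numpy': ('1.16.2', '1.16.1')}, which is exactly the situation described above.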

For reference, here are my slightly modified Dockerfile and docker build output.

Note:

Usually Alpine is chosen as the base image to minimize image size (Alpine is also otherwise very slick, but has compatibility issues with mainstream Linux applications because it uses musl instead of glibc). Having to build Python packages from source rather defeats that purpose, since you end up with a very bloated image (900MB before any cleanup) that also takes ages to build. The image could be compacted considerably by removing all intermediate compilation artifacts, build dependencies etc., but it would still be far from small.

If you can't get the Python package versions you need to work on Alpine without building them from source, I would suggest trying other small but more compatible base images such as debian-slim, or even ubuntu.

Edit:

Following "Edit 3" with added requirements, here are the updated Dockerfile and Docker build output. The following packages were added to satisfy build dependencies:

postgresql-dev libffi-dev libressl-dev libxml2 libxml2-dev libxslt libxslt-dev libjpeg-turbo-dev zlib-dev

For packages that failed to build because of missing headers, I used Alpine's package contents search to locate the package that provides them. Specifically, for cffi the ffi.h header was missing, which requires the libffi-dev package: https://pkgs.alpinelinux.org/contents?file=ffi.h&path=&name=&branch=v3.7.
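Going from a failed build log to that search can be partly scripted. The sketch below pulls the first missing header out of a gcc error of the form "fatal error: ffi.h: No such file or directory" and builds a contents-search URL with the same query fields as the ffi.h link above. The function names and the sample log line are illustrative assumptions, and the search page itself still has to be opened and read manually.

```python
import re
from typing import Optional
from urllib.parse import urlencode

CONTENTS_SEARCH = "https://pkgs.alpinelinux.org/contents"
# gcc reports a missing #include as "...: fatal error: <header>: No such file or directory"
MISSING_HEADER = re.compile(r"fatal error: ([^\s:]+): No such file or directory")


def missing_header(build_log: str) -> Optional[str]:
    """Return the file name of the first missing header, or None."""
    match = MISSING_HEADER.search(build_log)
    if match is None:
        return None
    # "libxml/xmlversion.h" -> "xmlversion.h": search by file name only
    return match.group(1).rsplit("/", 1)[-1]


def contents_search_url(filename: str, branch: str = "v3.7") -> str:
    """Alpine package-contents search URL for a given file name."""
    query = {"file": filename, "path": "", "name": "", "branch": branch}
    return f"{CONTENTS_SEARCH}?{urlencode(query)}"


if __name__ == "__main__":
    # Illustrative log line, not real output from the build above.
    log = "example.c:1:10: fatal error: ffi.h: No such file or directory"
    header = missing_header(log)
    if header:
        print(contents_search_url(header))
```

With the sample line, this prints the same ffi.h search URL quoted in the paragraph above.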

Alternatively, when the cause of a build failure is not very clear, refer to that package's own installation instructions, for example Pillow's.

The new image size, before any compaction, is 1.04GB. To cut it down a bit, you could remove the build dependencies along with the Python and pip caches:

RUN apk del build-runtime && \
    find -type d -name __pycache__ -prune -exec rm -rf {} \; && \
    rm -rf ~/.cache/pip

This brings the image size down to 661MB when using docker build --squash.
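The same __pycache__ cleanup can be written in Python if you also want to see how much space the caches actually take. This is a minimal sketch that mirrors the find command's removal of __pycache__ directories. The function names are hypothetical. As with find run without a path, which directory to scan is your choice, for example the interpreter's site-packages.

```python
import shutil
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of the regular files under `path`."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def remove_pycache(root: str) -> int:
    """Delete every __pycache__ directory below `root` and return the
    number of bytes freed."""
    freed = 0
    # Materialize the matches first so deleting does not disturb the walk.
    for cache_dir in list(Path(root).rglob("__pycache__")):
        if cache_dir.is_dir():
            freed += dir_size(cache_dir)
            shutil.rmtree(cache_dir)
    return freed
```

Like the RUN line, this should only run after the build dependencies are gone and pip install has finished. Otherwise the caches are simply regenerated.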


Try adding this to your requirements.txt file:

numpy==1.16.0
pandas==0.23.4

I've been facing the same error since yesterday and this change solved it for me.


This may not be completely relevant, but since this is the first answer that pops up when searching for failed numpy/pandas installations on Alpine, I am adding this answer.

The following fix worked for me (but installing pandas/numpy takes longer):

apk update
apk --no-cache add curl gcc g++
ln -s /usr/include/locale.h /usr/include/xlocale.h

Now try installing pandas/numpy.
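The ln -s line presumably helps because some source builds include <xlocale.h>, a header that musl does not provide. Pointing that name at locale.h lets them compile. Below is the same step as a hypothetical Python helper (ensure_xlocale_link) that is safe to run twice, because it only creates the link when xlocale.h does not already exist.

```python
from pathlib import Path


def ensure_xlocale_link(include_dir: str = "/usr/include") -> bool:
    """Create include_dir/xlocale.h as a symlink to locale.h if it is missing.
    Returns True if a link was created, False if nothing was needed."""
    inc = Path(include_dir)
    target, link = inc / "locale.h", inc / "xlocale.h"
    if link.exists() or link.is_symlink():
        return False  # already present (real header or earlier link)
    if not target.exists():
        raise FileNotFoundError(f"{target} not found")
    link.symlink_to(target)  # same as: ln -s <dir>/locale.h <dir>/xlocale.h
    return True
```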