TesseractNotFound issue when containerizing in docker
Edit 3:
Some of the Python packages in requirements.txt have other prerequisites. With this Dockerfile, the build went through the entire process successfully. The trickiest part was building OpenCV.
Credits to https://github.com/janza/docker-python3-opencv/blob/master/Dockerfile
.
├── Dockerfile
└── requirements.txt
Dockerfile:
FROM python:3.7

RUN apt-get update \
    && apt-get install -y \
        build-essential \
        cmake \
        git \
        wget \
        unzip \
        yasm \
        pkg-config \
        libswscale-dev \
        libtbb2 \
        libtbb-dev \
        libjpeg-dev \
        libpng-dev \
        libtiff-dev \
        libavformat-dev \
        libpq-dev \
    && rm -rf /var/lib/apt/lists/*

RUN pip install numpy

WORKDIR /
ENV OPENCV_VERSION="4.1.1"
RUN wget https://github.com/opencv/opencv/archive/${OPENCV_VERSION}.zip \
&& unzip ${OPENCV_VERSION}.zip \
&& mkdir /opencv-${OPENCV_VERSION}/cmake_binary \
&& cd /opencv-${OPENCV_VERSION}/cmake_binary \
&& cmake -DBUILD_TIFF=ON \
  -DBUILD_opencv_java=OFF \
  -DWITH_CUDA=OFF \
  -DWITH_OPENGL=ON \
  -DWITH_OPENCL=ON \
  -DWITH_IPP=ON \
  -DWITH_TBB=ON \
  -DWITH_EIGEN=ON \
  -DWITH_V4L=ON \
  -DBUILD_TESTS=OFF \
  -DBUILD_PERF_TESTS=OFF \
  -DCMAKE_BUILD_TYPE=RELEASE \
  -DCMAKE_INSTALL_PREFIX=$(python3.7 -c "import sys; print(sys.prefix)") \
  -DPYTHON_EXECUTABLE=$(which python3.7) \
  -DPYTHON_INCLUDE_DIR=$(python3.7 -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
  -DPYTHON_PACKAGES_PATH=$(python3.7 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") \
  .. \
&& make install \
&& rm /${OPENCV_VERSION}.zip \
&& rm -r /opencv-${OPENCV_VERSION}

RUN ln -s \
  /usr/local/python/cv2/python-3.7/cv2.cpython-37m-x86_64-linux-gnu.so \
  /usr/local/lib/python3.7/site-packages/cv2.so

RUN apt-get --fix-missing update && apt-get --fix-broken install && apt-get install -y poppler-utils && apt-get install -y tesseract-ocr && \
    apt-get install -y libtesseract-dev && apt-get install -y libleptonica-dev && ldconfig && apt install -y libsm6 libxext6 && apt install -y python-opencv

COPY ./requirements.txt ./

RUN pip3 install --upgrade pip

# install dependencies
RUN pip3 install -r requirements.txt
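The cv2 symlink in the Dockerfile depends on the exact path the OpenCV build produced, so it can be worth checking inside the image that the Python modules can actually be located. The following is only a minimal sketch: `missing_modules` is a hypothetical helper, and the module names `cv2` and `pytesseract` are assumptions, since the contents of requirements.txt are not shown here. It only checks that a module can be found, not that it imports without errors.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed module names; adjust them to match your requirements.txt.
    absent = missing_modules(["cv2", "pytesseract"])
    print("missing:", absent if absent else "none")
```

If the script reports `cv2` as missing, the symlink target is the first thing to inspect.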
Build:
docker image build -t my-awesome-py .
Run:
docker run --rm my-awesome-py tesseract
Usage:
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.
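The same check can be done from Python, which is closer to what the application sees: a TesseractNotFound error generally means the `tesseract` executable is not on the PATH of the process. A minimal sketch using only the standard library follows; `tesseract_version` is a hypothetical helper name, and the assumption is that the binary is called `tesseract`, as in the run above.

```python
import shutil
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line of `<cmd> --version`, or None if cmd is not on PATH."""
    path = shutil.which(cmd)
    if path is None:
        return None
    # Merge stderr into stdout, since some versions print the banner to stderr.
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract not found on PATH")
    else:
        print("found:", version)
```

A `None` result inside the container points at the apt-get install step for tesseract-ocr rather than at the Python packages.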