
How to deal with persistent storage (e.g. databases) in Docker


Docker 1.9.0 and above

Use volume API

docker volume create --name hello
docker run -d -v hello:/container/path/for/volume container_image my_command

This means that the data-only container pattern must be abandoned in favour of the new volumes.

Actually, the volume API is just a better way to achieve what the data-container pattern provided.

If you create a container with a -v volume_name:/container/fs/path, Docker will automatically create a named volume for you that can:

  1. Be listed through docker volume ls
  2. Be identified through docker volume inspect volume_name
  3. Be backed up as a normal directory
  4. Be backed up as before through a --volumes-from connection
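
For example, a minimal sketch of points 1 and 2 (the volume name hello and the container path are illustrative):

docker run -d -v hello:/container/path busybox top   # auto-creates the named volume "hello" if it does not exist
docker volume ls                                     # the volume shows up in the list
docker volume inspect hello                          # shows its driver and host mountpoint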

The new volume API adds a useful command that lets you identify dangling volumes:

docker volume ls -f dangling=true

And then remove one by name:

docker volume rm <volume name>

As @mpugach underlines in the comments, you can get rid of all the dangling volumes with a nice one-liner:

docker volume rm $(docker volume ls -f dangling=true -q)

# Or, using Docker 1.13.x:
docker volume prune
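
Relatedly, here is a minimal sketch of point 3 above, backing up a named volume as a normal directory by mounting it into a throwaway container (the volume name hello is assumed from the earlier example):

# Archive the volume's contents into a tar file in the current directory
docker run --rm -v hello:/volume -v $(pwd):/backup busybox tar cvf /backup/hello-backup.tar -C /volume .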

Docker 1.8.x and below

The approach that seems to work best for production is to use a data-only container.

The data-only container is run on a barebones image and actually does nothing except expose a data volume.
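
For example, a minimal sketch of creating such a container (the name data-container and the path /data are illustrative):

# Create a data-only container that exposes /data and exits immediately
docker run --name data-container -v /data busybox true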

Then you can run any other container to have access to the data container volumes:

docker run --volumes-from data-container some-other-container command-to-execute
  • Here you can get a good picture of how to arrange the different containers.
  • Here is a good insight into how volumes work.

This blog post has a good description of the so-called container-as-volume pattern, which clarifies the main point of having data-only containers.

The Docker documentation now has the definitive description of the container-as-volume pattern.

Following is the backup/restore procedure for Docker 1.8.x and below.

BACKUP:

sudo docker run --rm --volumes-from DATA -v $(pwd):/backup busybox tar cvf /backup/backup.tar /data
  • --rm: remove the container when it exits
  • --volumes-from DATA: attach to the volumes shared by the DATA container
  • -v $(pwd):/backup: bind mount the current directory into the container, to write the tar file to
  • busybox: a small, simple image - good for quick maintenance
  • tar cvf /backup/backup.tar /data: creates an uncompressed tar file of all the files in the /data directory

RESTORE:

# Create a new data container
$ sudo docker run -v /data --name DATA2 busybox true

# Untar the backup files into the new container's data volume
$ sudo docker run --rm --volumes-from DATA2 -v $(pwd):/backup busybox tar xvf /backup/backup.tar
data/
data/sven.txt

# Compare to the original container
$ sudo docker run --rm --volumes-from DATA -v `pwd`:/backup busybox ls /data
sven.txt

Here is a nice article from the excellent Brian Goff explaining why it is good to use the same image for a container and a data container.


Since Docker v1.0, you can bind mount a file or directory on the host machine with the following command:

$ docker run -v /host:/container ...

The above volume can be used as persistent storage on the host running Docker.
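
For instance, here is a hedged sketch of persisting a database this way (the host path /srv/pgdata, the container name, and the image tag are illustrative):

# Keep PostgreSQL's data directory on the host so it survives container removal
docker run -d --name pg -v /srv/pgdata:/var/lib/postgresql/data postgres:9.4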


As of Docker Compose 1.6, there is improved support for data volumes in Docker Compose. The following compose file will create a data volume that persists between restarts (or even removal) of the parent containers:

Here is the blog announcement: Compose 1.6: New Compose file for defining networks and volumes

Here's an example compose file:

version: "2"services:  db:    restart: on-failure:10    image: postgres:9.4    volumes:      - "db-data:/var/lib/postgresql/data"  web:    restart: on-failure:10    build: .    command: gunicorn mypythonapp.wsgi:application -b :8000 --reload    volumes:      - .:/code    ports:      - "8000:8000"    links:      - dbvolumes:  db-data:

As far as I can understand: this will create a named data volume (db-data) which will persist between restarts.

If you run: docker volume ls you should see your volume listed:

local               mypthonapp_db-data...

You can get some more details about the data volume:

docker volume inspect mypthonapp_db-data
[
  {
    "Name": "mypthonapp_db-data",
    "Driver": "local",
    "Mountpoint": "/mnt/sda1/var/lib/docker/volumes/mypthonapp_db-data/_data"
  }
]

Some testing:

# Start the containers
docker-compose up -d

# .. input some data into the database
docker-compose run --rm web python manage.py migrate
docker-compose run --rm web python manage.py createsuperuser
...

# Stop and remove the containers:
docker-compose stop
docker-compose rm -f

# Start it back up again
docker-compose up -d

# Verify the data is still there
...
(it is)

# Stop and remove with the -v (volumes) flag:
docker-compose stop
docker-compose rm -f -v

# Up again ..
docker-compose up -d

# Check the data is still there:
...
(it is).

Notes:

  • You can also specify various drivers in the volumes block. For example, you could specify the Flocker driver for db-data:

    volumes:
      db-data:
        driver: flocker
  • As they improve the integration between Docker Swarm and Docker Compose (and possibly start integrating Flocker into the Docker ecosystem; I heard a rumor that Docker has bought Flocker), I think this approach should become increasingly powerful.

Disclaimer: This approach is promising, and I'm using it successfully in a development environment. I would be apprehensive about using it in production just yet!