
Docker Data Management

This chapter provides an in-depth explanation of Docker data management, covering data volumes, bind mounts, temporary file systems (tmpfs), and data persistence strategies, to help you manage container data effectively.

Data Management Overview

Characteristics of Container Data

Docker containers are stateless by default - data inside containers is lost when containers are deleted. To achieve data persistence and sharing, Docker provides various data management solutions:

┌─────────────────────────────────────────────────────────┐
│                     Host Filesystem                     │
├───────────────────┬──────────────────┬──────────────────┤
│   Data Volumes    │   Bind Mounts    │   tmpfs Mounts   │
│                   │                  │                  │
│  Docker managed   │  Host path       │  Memory storage  │
│  Good portability │  Good performance│  Temporary data  │
│  High security    │  Host dependent  │  Deleted on stop │
└───────────────────┴──────────────────┴──────────────────┘
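The default statelessness is easy to demonstrate directly. A throwaway sketch (assumes a running Docker daemon; `demo-data` is an illustrative volume name):

```shell
# Without a volume: data dies with the container
docker run --name scratch alpine sh -c 'echo hello > /data.txt'
docker rm scratch                  # /data.txt is gone for good

# With a named volume: data survives container removal
docker volume create demo-data
docker run --rm -v demo-data:/data alpine sh -c 'echo hello > /data/msg.txt'
docker run --rm -v demo-data:/data alpine cat /data/msg.txt   # prints "hello"
docker volume rm demo-data
```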

Data Management Method Comparison

Feature           Data Volumes       Bind Mounts       tmpfs Mounts
Management        Docker managed     User managed      System managed
Storage location  Docker directory   Any host path     Memory
Performance       Good               Best              Best
Portability       High               Low               N/A
Security          High               Medium            High
Persistence       Yes                Yes               No
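The same three mount types can be expressed with the unified `--mount` syntax, which is covered in detail below (container names and paths here are illustrative):

```shell
# Named volume (Docker managed)
docker run -d --name v-demo --mount type=volume,source=app-data,target=/data nginx

# Bind mount (the host path must already exist)
docker run -d --name b-demo --mount type=bind,source=/srv/app,target=/data nginx

# tmpfs mount (memory only, Linux hosts)
docker run -d --name t-demo --mount type=tmpfs,destination=/data,tmpfs-size=64m nginx
```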

Data Volumes

Basic Volume Operations

bash
# Create data volume
docker volume create my-volume

# List all data volumes
docker volume ls

# View data volume details
docker volume inspect my-volume

# Delete data volume
docker volume rm my-volume

# Delete all unused data volumes
docker volume prune

# Prune unused data volumes without the confirmation prompt
docker volume prune -f

Using Data Volumes

bash
# Use data volume in container
docker run -d --name web-server -v my-volume:/usr/share/nginx/html nginx

# Use anonymous data volume
docker run -d --name app -v /app/data nginx

# Multiple containers sharing data volume
docker run -d --name app1 -v shared-data:/data nginx
docker run -d --name app2 -v shared-data:/data nginx

# Read-only data volume
docker run -d --name app -v my-volume:/data:ro nginx

# Use data volume container pattern
docker create --name data-container -v /data busybox
docker run -d --volumes-from data-container --name app1 nginx
docker run -d --volumes-from data-container --name app2 nginx

Data Volume Drivers

bash
# Use local driver (default)
docker volume create --driver local my-local-volume

# Use NFS driver
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.100,rw \
  --opt device=:/path/to/nfs/share \
  nfs-volume

# Use CIFS/SMB driver
docker volume create --driver local \
  --opt type=cifs \
  --opt o=username=user,password=pass,uid=1000,gid=1000 \
  --opt device=//192.168.1.100/share \
  cifs-volume

# View available drivers
docker info | grep "Volume:"

Data Volume Configuration Options

bash
# Create labeled data volume
docker volume create --label environment=production --label team=backend my-volume

# Create data volume with driver options
docker volume create \
  --driver local \
  --opt type=none \
  --opt o=bind \
  --opt device=/host/path \
  my-bind-volume

# Limit volume size (this example is tmpfs-backed, so data is held in memory)
docker volume create \
  --driver local \
  --opt type=tmpfs \
  --opt device=tmpfs \
  --opt o=size=100m \
  tmp-volume

Bind Mounts

Basic Bind Mounts

bash
# Bind mount host directory
docker run -d --name web -v /host/path:/container/path nginx

# Use absolute path
docker run -d --name web -v $(pwd)/html:/usr/share/nginx/html nginx

# Read-only bind mount
docker run -d --name web -v /host/path:/container/path:ro nginx

# Bind mount single file
docker run -d --name app -v /host/config.json:/app/config.json nginx

# Use --mount syntax (recommended)
docker run -d --name web \
  --mount type=bind,source=/host/path,target=/container/path \
  nginx

Bind Mount Options

bash
# Read-only mount
docker run -d --name app \
  --mount type=bind,source=/host/path,target=/container/path,readonly \
  nginx

# Bind propagation settings
docker run -d --name app \
  --mount type=bind,source=/host/path,target=/container/path,bind-propagation=shared \
  nginx

# Consistency settings (legacy Docker Desktop for Mac option; ignored by current versions)
docker run -d --name app \
  --mount type=bind,source=/host/path,target=/container/path,consistency=cached \
  nginx

Development Environment Examples

bash
# Node.js development environment
docker run -it --rm \
  --name node-dev \
  -v $(pwd):/workspace \
  -v node_modules:/workspace/node_modules \
  -w /workspace \
  -p 3000:3000 \
  node:16 \
  bash

# Python development environment
docker run -it --rm \
  --name python-dev \
  -v $(pwd):/app \
  -w /app \
  -p 8000:8000 \
  python:3.9 \
  bash

# Database development environment
docker run -d \
  --name postgres-dev \
  -v $(pwd)/data:/var/lib/postgresql/data \
  -v $(pwd)/init.sql:/docker-entrypoint-initdb.d/init.sql \
  -e POSTGRES_PASSWORD=password \
  -p 5432:5432 \
  postgres:13

tmpfs Mounts

Basic tmpfs Usage

bash
# Create tmpfs mount
docker run -d --name app --tmpfs /tmp nginx

# Specify tmpfs options
docker run -d --name app \
  --tmpfs /tmp:rw,size=100m,mode=1777 \
  nginx

# Use --mount syntax
docker run -d --name app \
  --mount type=tmpfs,destination=/tmp,tmpfs-size=100m \
  nginx

# Multiple tmpfs mounts
docker run -d --name app \
  --tmpfs /tmp \
  --tmpfs /var/run \
  nginx

tmpfs Use Cases

bash
# Temporary file processing
docker run -d --name processor \
  --tmpfs /tmp:size=1g \
  --tmpfs /var/tmp:size=500m \
  my-data-processor

# Cache directories
docker run -d --name web-app \
  --tmpfs /app/cache:size=200m \
  --tmpfs /app/sessions:size=100m \
  my-web-app

# Sensitive data processing
docker run -d --name secure-app \
  --tmpfs /secure:noexec,nosuid,size=50m \
  my-secure-app

Data Management in Docker Compose

Data Volume Configuration

yaml
version: '3.8'

services:
  web:
    image: nginx
    volumes:
      # Named data volume
      - web-content:/usr/share/nginx/html
      # Bind mount
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      # Anonymous data volume
      - /var/log/nginx

  db:
    image: postgres:13
    volumes:
      # Named data volume
      - postgres-data:/var/lib/postgresql/data
      # Initialization script
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro
    environment:
      POSTGRES_PASSWORD: password

  app:
    image: myapp
    volumes:
      # Development code mount
      - .:/app
      # Prevent node_modules from being overwritten
      - /app/node_modules
    tmpfs:
      # Temporary files
      - /tmp
      - /app/cache

# Define named data volumes
volumes:
  web-content:
    driver: local
  postgres-data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /host/postgres/data

External Data Volumes

yaml
version: '3.8'

services:
  app:
    image: myapp
    volumes:
      - existing-volume:/data

volumes:
  existing-volume:
    external: true
    # To reference a volume under a different name (the nested
    # `external: name:` form is deprecated):
    # name: my-existing-volume
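Compose never creates an external volume; it must exist before the stack is started, for example:

```shell
# Create the volume first, then bring the stack up
docker volume create existing-volume
docker compose up -d
```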

Data Volume Configuration Options

yaml
version: '3.8'

services:
  app:
    image: myapp
    volumes:
      # Long format configuration
      - type: volume
        source: app-data
        target: /data
        read_only: false
        volume:
          nocopy: true

      # Bind mount long format
      - type: bind
        source: ./config
        target: /app/config
        read_only: true
        bind:
          propagation: shared

      # tmpfs long format
      - type: tmpfs
        target: /tmp
        tmpfs:
          size: 100M
          mode: 1777

volumes:
  app-data:
    driver: local
    driver_opts:
      type: nfs
      o: addr=nfs-server,rw
      device: ":/path/to/nfs/share"
    labels:
      - "environment=production"
      - "backup=daily"

Data Backup and Recovery

Data Volume Backup

bash
# Backup data volume to tar file
docker run --rm \
  -v my-volume:/data \
  -v $(pwd):/backup \
  ubuntu \
  tar czf /backup/backup-$(date +%Y%m%d-%H%M%S).tar.gz -C /data .

# Use dedicated backup container
docker run --rm \
  -v my-volume:/source:ro \
  -v $(pwd):/backup \
  --name backup-container \
  alpine \
  sh -c "cd /source && tar czf /backup/volume-backup.tar.gz ."

# Backup to remote storage
docker run --rm \
  -v my-volume:/data:ro \
  -e AWS_ACCESS_KEY_ID=your-key \
  -e AWS_SECRET_ACCESS_KEY=your-secret \
  amazon/aws-cli \
  s3 sync /data s3://your-bucket/backup/

Data Volume Recovery

bash
# Restore from tar file
docker run --rm \
  -v my-volume:/data \
  -v $(pwd):/backup \
  ubuntu \
  tar xzf /backup/backup.tar.gz -C /data

# Copy from another data volume
docker run --rm \
  -v source-volume:/source:ro \
  -v target-volume:/target \
  ubuntu \
  cp -a /source/. /target/

# Restore from remote storage
docker run --rm \
  -v my-volume:/data \
  -e AWS_ACCESS_KEY_ID=your-key \
  -e AWS_SECRET_ACCESS_KEY=your-secret \
  amazon/aws-cli \
  s3 sync s3://your-bucket/backup/ /data

Automated Backup Scripts

bash
#!/bin/bash
# backup-volumes.sh

BACKUP_DIR="/backup"
DATE=$(date +%Y%m%d-%H%M%S)

# Get all data volumes
VOLUMES=$(docker volume ls -q)

for volume in $VOLUMES; do
    echo "Backing up data volume: $volume"

    # Create backup directory
    mkdir -p "$BACKUP_DIR/$volume"

    # Backup data volume
    docker run --rm \
        -v "$volume":/source:ro \
        -v "$BACKUP_DIR/$volume":/backup \
        alpine \
        tar czf "/backup/$volume-$DATE.tar.gz" -C /source .

    # Keep backups from last 7 days
    find "$BACKUP_DIR/$volume" -name "*.tar.gz" -mtime +7 -delete

    echo "Backup complete: $volume-$DATE.tar.gz"
done

echo "All data volumes backup complete"
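To run the script above on a schedule, a cron entry along these lines could be used (the install path and log file are assumptions):

```shell
# /etc/cron.d/docker-volume-backup: run nightly at 02:30 as root
30 2 * * * root /usr/local/bin/backup-volumes.sh >> /var/log/volume-backup.log 2>&1
```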

Data Synchronization and Migration

Container-to-Container Data Synchronization

bash
# Use rsync to synchronize data
docker run --rm \
  -v source-volume:/source:ro \
  -v target-volume:/target \
  instrumentisto/rsync \
  rsync -av --delete /source/ /target/

# Real-time synchronization (using inotify)
docker run -d \
  -v source-volume:/source:ro \
  -v target-volume:/target \
  --name sync-container \
  alpine \
  sh -c "
    apk add --no-cache inotify-tools rsync
    while inotifywait -r -e modify,create,delete /source; do
      rsync -av --delete /source/ /target/
    done
  "

Cross-Host Data Migration

bash
# Export data volume
docker run --rm \
  -v my-volume:/data:ro \
  alpine \
  tar czf - -C /data . > volume-export.tar.gz

# Transfer to target host
scp volume-export.tar.gz user@target-host:/tmp/

# Import on target host
docker volume create my-volume
docker run --rm \
  -v my-volume:/data \
  -i alpine \
  tar xzf - -C /data < /tmp/volume-export.tar.gz

Performance Optimization

Storage Driver Optimization

json
// /etc/docker/daemon.json
{
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.size=20G"
  ]
}

Note: overlay2.size requires an xfs backing filesystem mounted with the pquota option; the old overlay2.override_kernel_check option has been removed from current Docker releases.

Data Volume Performance Tuning

bash
# Use local SSD storage
docker volume create \
  --driver local \
  --opt type=none \
  --opt o=bind \
  --opt device=/ssd/path \
  fast-volume

# Memory file system
docker volume create \
  --driver local \
  --opt type=tmpfs \
  --opt device=tmpfs \
  --opt o=size=1G \
  memory-volume

# Network storage optimization
docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=nfs-server,rw,tcp,hard,intr,timeo=600 \
  --opt device=:/fast/nfs/share \
  nfs-volume

Monitor Storage Usage

bash
# View Docker storage usage
docker system df

# Detailed view
docker system df -v

# View data volume usage
docker volume ls --format "table {{.Name}}\t{{.Driver}}\t{{.Scope}}"

# Monitoring script
#!/bin/bash
while true; do
    echo "=== Docker Storage Usage ==="
    docker system df
    echo ""
    echo "=== Data Volume List ==="
    docker volume ls
    echo ""
    sleep 60
done

Security Considerations

Data Volume Security

bash
# Set data volume permissions
docker run --rm \
  -v my-volume:/data \
  alpine \
  chown -R 1000:1000 /data

# Encrypted volumes: encrypt at the host level (e.g. LUKS/dm-crypt),
# then point a local volume at the decrypted device-mapper target
docker volume create \
  --driver local \
  --opt type=ext4 \
  --opt device=/dev/mapper/encrypted-volume \
  encrypted-volume

# Read-only mount sensitive data
docker run -d \
  -v /host/secrets:/secrets:ro \
  --security-opt no-new-privileges \
  myapp

Access Control

bash
# Limit container user permissions
docker run -d \
  --user 1000:1000 \
  -v app-data:/data \
  myapp

# Use SELinux labels (:Z relabels content for exclusive use by this container)
docker run -d \
  --security-opt label=type:container_file_t \
  -v /host/data:/data:Z \
  myapp

# AppArmor configuration
docker run -d \
  --security-opt apparmor=docker-default \
  -v app-data:/data \
  myapp

Troubleshooting

Common Issue Diagnosis

bash
# Check data volume mount
docker inspect container_name | grep -A 10 "Mounts"

# View data volume content
docker run --rm -v my-volume:/data alpine ls -la /data

# Check permission issues
docker run --rm -v my-volume:/data alpine \
  sh -c "ls -la /data && id"

# Test read/write permissions
docker run --rm -v my-volume:/data alpine \
  sh -c "echo 'test' > /data/test.txt && cat /data/test.txt"

# View storage driver information
docker info | grep -A 20 "Storage Driver"

Performance Issue Troubleshooting

bash
# Monitor I/O performance
docker run --rm -v my-volume:/data alpine \
  sh -c "dd if=/dev/zero of=/data/test bs=1M count=100 oflag=direct"

# Check disk space
docker run --rm -v my-volume:/data alpine df -h /data

# View inode usage
docker run --rm -v my-volume:/data alpine df -i /data

Chapter Summary

This chapter comprehensively introduced various aspects of Docker data management:

Key Points:

  • Data volumes: Docker-managed persistent storage, recommended for production
  • Bind mounts: Direct host path mounting, suitable for development
  • tmpfs mounts: Memory storage, suitable for temporary data
  • Backup and recovery: Regularly backup important data, establish recovery strategies
  • Performance optimization: Choose appropriate storage drivers and configurations
  • Security considerations: Permission control and access restrictions

Best Practices:

  • Prioritize data volumes in production environments
  • Regularly backup important data
  • Monitor storage usage
  • Set reasonable permissions and security policies
  • Choose appropriate storage drivers

In the next chapter, we will learn about Docker networking configuration, including network modes, custom networks, and service discovery.
