This document captures fundamental questions and answers about Docker concepts, organized by topic.
UNIX Socket (like /var/run/docker.sock):
- File-based communication on the same machine
- Lives in the filesystem (you can ls it!)
- Fast, local-only (no network involved)
- Used for inter-process communication (IPC)
- Example: Docker CLI talks to Docker daemon via socket
Network Port (like port 80, 8080):
- Number-based communication over network
- Can be local (localhost:8080) or remote (server.com:80)
- Uses TCP/IP protocol
- Used for network communication
- Example: Web browser talks to nginx via port 80
Analogy:
- Socket = internal phone line between offices
- Port = external phone number for customers
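The difference between the two can be seen with Python's standard socket module. This is a minimal sketch, assuming a Unix-like system (Linux, macOS, or WSL); the helper name round_trip and the temporary socket path are made up for illustration. The same send/receive code works for both, and only the address changes: a filesystem path for the UNIX socket, an (IP, port) pair for TCP.

```python
import os
import socket
import tempfile


def round_trip(family, address):
    """Start a listener at `address`, connect to it, and send b"ping" across."""
    server = socket.socket(family, socket.SOCK_STREAM)
    server.bind(address)
    server.listen(1)

    client = socket.socket(family, socket.SOCK_STREAM)
    client.connect(server.getsockname())
    conn, _ = server.accept()

    client.sendall(b"ping")
    data = conn.recv(16)

    for s in (conn, client, server):
        s.close()
    return data


# UNIX socket: the address is a path, and binding creates a filesystem entry.
sock_path = os.path.join(tempfile.mkdtemp(), "demo.sock")
unix_reply = round_trip(socket.AF_UNIX, sock_path)
socket_file_exists = os.path.exists(sock_path)

# TCP: the address is (IP, port); port 0 asks the OS for any free port.
tcp_reply = round_trip(socket.AF_INET, ("127.0.0.1", 0))
```

After this runs, sock_path is still visible in the temporary directory, which is the "you can ls it" property of /var/run/docker.sock.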
Yes! The flow is:
WSL Bash
↓
docker CLI (client binary in WSL)
↓ (via socket: /var/run/docker.sock)
Docker Desktop VM
↓
dockerd (Docker daemon)
↓
Executes command (creates container, builds image, etc.)
The docker command in bash is just a remote control - all the real work happens inside Docker Desktop's VM.
The Docker Engine API is EXPOSED BY the daemon:
docker CLI
↓ (uses Docker Engine API)
dockerd (daemon)
↓ (exposes Docker Engine API)
↓ (uses lower-level APIs)
containerd
↓
runc (creates actual containers)
↓
Linux kernel (namespaces, cgroups)
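Because the Engine API is plain HTTP carried over the UNIX socket, it can be called without the docker CLI at all. The sketch below uses only the Python standard library; the names UnixHTTPConnection and engine_get are hypothetical helpers, and it assumes the daemon listens on /var/run/docker.sock and that your user may read that socket.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a UNIX socket path instead of host:port."""

    def __init__(self, socket_path, timeout=5):
        # "localhost" is only used for the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def engine_get(path, socket_path="/var/run/docker.sock"):
    """Send GET <path> to the daemon and return (status_code, raw_body_bytes)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

On a machine with Docker running, engine_get("/_ping") or engine_get("/version") goes through the same API the CLI uses. If the socket is missing, the call raises an OSError.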
On Windows/Mac with Docker Desktop:
- Images and containers exist inside the Docker Desktop VM
- Stored at /var/lib/docker/ within the VM
- NOT on your Windows filesystem
- Managed by Docker
What docker system df shows:
- Images: Stored in VM (layered filesystem)
- Containers: Writable layer on top of images, in VM
- Local Volumes: Docker-managed, inside VM at /var/lib/docker/volumes/
- Build Cache: Inside VM at /var/lib/docker/buildkit/
Exception: Bind mounts directly link to host folders.
Yes! That's Docker's key design principle - abstraction.
As CLI users, we think in terms of:
- Image names: ubuntu, nginx, python:3.9
- Container names/IDs: my-web-server, abc123def456
- Logical concepts: ports, volumes, networks
Docker handles the "where" for us - same commands work on Windows, Mac, Linux.
Exception: When you mount volumes (-v /host/path:/container/path), you DO specify host locations.
The Image is an intermediate template between Dockerfile and Container:
- Dockerfile = Recipe (text instructions)
- Image = Frozen meal (ready to cook, but not cooking yet)
- Container = Hot meal on your plate (actively running)
Key benefits:
- Build the Image once (slow)
- Run many Containers from it (fast)
- Share the Image - everyone gets identical environments
- The Image remains unchanged when you run containers
Each Dockerfile instruction creates a layer:
# Layer 1
FROM python:3.9
# Layer 2 (cached!)
RUN pip install flask
# Layer 3
COPY app.py .
If you change only app.py and rebuild:
- Layers 1 and 2 are reused from cache
- Only Layer 3 is rebuilt
- Result: Much faster development!
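The caching behaviour can be modelled in a few lines of Python. This is a deliberately simplified sketch, not Docker's real cache algorithm: it assumes each layer's cache key is a hash of the parent key plus the instruction text, and that COPY also hashes the bytes of the copied file. The function names are hypothetical.

```python
import hashlib


def layer_keys(instructions, file_contents):
    """Return one chained cache key per instruction.

    file_contents maps a file name to its bytes; it is consulted for COPY lines.
    """
    keys = []
    parent = ""
    for instruction in instructions:
        h = hashlib.sha256((parent + instruction).encode())
        if instruction.startswith("COPY "):
            source = instruction.split()[1]
            h.update(file_contents[source])
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(old_keys, new_keys):
    """1-based numbers of the layers whose cache key changed."""
    return [i + 1 for i, (a, b) in enumerate(zip(old_keys, new_keys)) if a != b]


dockerfile = ["FROM python:3.9", "RUN pip install flask", "COPY app.py ."]
before = layer_keys(dockerfile, {"app.py": b"print('v1')"})
after = layer_keys(dockerfile, {"app.py": b"print('v2')"})
changed = rebuilt_layers(before, after)
```

Because every key includes its parent's key, a change invalidates that layer and everything after it. That is why moving COPY above the pip install line would force the install to rerun on every code change.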
From user experience: Yes, it feels like a VM.
Technically: No, fundamentally different.
What makes it FEEL like a VM:
- Own filesystem
- Own process tree
- Own network interface
- Can install packages, run commands
What reveals it's NOT a VM:
- Shared kernel: Container and host use the same kernel
- No boot process: Starts in milliseconds
- Can't load kernel modules: VMs can, containers can't
- Resource sharing: No hardware emulation layer
The illusion: Docker creates namespaces that make it appear isolated:
- PID namespace: Process 1 inside, but actually process 54321 on host
- Mount namespace: Own / filesystem view
- Network namespace: Own IP address
Analogy:
- VM: Separate computer running inside your computer
- Container: Clever trick making one process think it's alone
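On Linux, the namespaces a process belongs to are visible under /proc. The following sketch (the helper name namespace_ids is made up) lists them for the current process. Running it once on the host and once inside a container should show different identifiers for namespaces such as pid, mnt, and net, assuming the container was started with Docker's default isolation.

```python
import os


def namespace_ids(pid="self"):
    """Return {namespace_name: identifier} read from /proc/<pid>/ns.

    Returns an empty dict when /proc is unavailable (for example, not on Linux).
    """
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    if not os.path.isdir(ns_dir):
        return ids
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target looks like "pid:[<number>]".
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return ids


for name, ident in namespace_ids().items():
    print(f"{name:18s} {ident}")
```

Two processes that print the same identifier for a namespace share it; the kernel itself is the same in every case.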
Port Mapping (forwarding):
-p 8080:8080
Host port 8080 → Container port 8080
Traffic is forwarded/proxied between two separate network spaces.
The mapping chain:
Browser (Windows)
↓
localhost:8080 (Windows)
↓ (Docker Desktop forwards)
Docker VM port 8080
↓ (Docker daemon routes)
Container port 8080
↓
Flask app listening on 0.0.0.0:8080
Yes! It uses an ephemeral (temporary) port.
How it works:
- OS assigns a random high-numbered port (on Linux, from 32768-60999 by default)
- prime-web uses this as the source port
- Connection: prime-web:54321 → prime-api:5000 (54321 is an example)
- prime-api responds back: prime-api:5000 → prime-web:54321
- After response, ephemeral port is released
Port Roles:
- Listening ports (servers): Fixed (8080, 5000)
- Ephemeral ports (clients): Random, temporary
Analogy:
- Listening port = business phone number (fixed, published)
- Ephemeral port = your cell phone when you call them (temporary)
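The ephemeral port can be observed directly with two sockets on one machine. This is a sketch using only the standard library; observe_ports is a hypothetical helper, and the server binds to port 0 (any free port) purely so the example never collides with something already running, whereas a real server such as prime-api would bind a fixed port like 5000.

```python
import socket


def observe_ports():
    """Return (listening_port, client_source_port, source_port_seen_by_server)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    listening_port = server.getsockname()[1]

    # The client never chooses its own port; the OS picks one at connect time.
    client = socket.create_connection(("127.0.0.1", listening_port))
    conn, peer = server.accept()

    client_source_port = client.getsockname()[1]
    seen_by_server = peer[1]

    for s in (conn, client, server):
        s.close()
    return listening_port, client_source_port, seen_by_server


listening, source, seen = observe_ports()
print(f"client :{source} -> server :{listening} (server saw peer port {seen})")
```

The server replies to whatever source port it sees on the incoming connection, which is why the client port does not need to be known in advance.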
The virtual network exists in the Linux kernel's network namespace managed by the Docker daemon.
On Windows + WSL2:
- Docker Desktop runs a hidden Linux VM
- The network exists inside that VM's kernel
- Managed by Linux kernel networking features (bridge, iptables, network namespaces)
What it physically is:
- A software bridge (like a virtual network switch)
- Network namespace isolation
- Routing rules in the kernel
You can inspect it:
docker network inspect prime-net
Key point: It's not a "place" you can navigate to - it's a kernel data structure that routes packets between containers.
Yes! Docker has 5 main network types:
- Bridge (default): Containers on same bridge can talk, isolated from others
- Host: Container uses host's network directly (no isolation)
- None: No network at all (completely isolated)
- Overlay: Spans multiple Docker hosts (used by Docker Swarm)
- Macvlan: Container gets its own MAC address (appears as physical device)
Most common: Bridge networks cover the large majority of single-host use cases.
Exactly right! Perfect way to think about it.
Bind mount:
docker run -v /home/rob/mydata:/app/data ubuntu
- Host folder: /home/rob/mydata
- Container sees it as: /app/data
- Same folder, two names (alias)
- Write to /app/data/file.txt inside → appears at /home/rob/mydata/file.txt instantly
It's like a symbolic link - the container path is just another name for the host folder.
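The "same folder, two names" idea can be tried without Docker by using an actual symbolic link. This is only an analogy in code: a real bind mount is set up by the kernel's mount mechanism, not by a symlink, and the directory names below are stand-ins created in a temporary folder.

```python
import os
import tempfile


def alias_demo():
    """Write through one name of a directory and read through the other."""
    root = tempfile.mkdtemp()
    host_dir = os.path.join(root, "mydata")         # stands in for /home/rob/mydata
    container_view = os.path.join(root, "app_data")  # stands in for /app/data
    os.mkdir(host_dir)
    os.symlink(host_dir, container_view)

    # "Inside the container": write via the alias.
    with open(os.path.join(container_view, "file.txt"), "w") as f:
        f.write("written via /app/data")

    # "On the host": the file is already there, with no copy step in between.
    with open(os.path.join(host_dir, "file.txt")) as f:
        return f.read()


print(alias_demo())
```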
Yes, very similar concept with an important distinction:
Port Mapping: Data is copied/forwarded
- Packet arrives at host:8080
- Docker forwards it to container:8080
- Two separate network stacks
Volume Mapping: Data is shared directly
- File written in container at /app/data/file.txt
- Immediately visible on host at /home/rob/data/file.txt
- Same underlying filesystem
Analogy:
- Port mapping = Mail forwarding (copy and send)
- Volume mapping = Shared folder (same files, different doors)
Without version numbers:
flask
torch
Gets the latest version available at build time.
The Problem:
- Today: Builds with flask==3.1.0 - works!
- 6 months later: Rebuilds with flask==4.0.0 - breaking changes!
- "But it worked before!"
Best Practice for Docker:
flask==3.0.0 # Lock version, reproducible
Trade-offs:
| Approach | Pros | Cons |
|---|---|---|
| No versions | Always latest | Breaks unexpectedly |
| Exact versions | Reproducible | Miss security updates |
| pip freeze | Fully reproducible | Very rigid |
For Docker: Exact versions are best - containers should be reproducible!
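A small check can catch unpinned entries before they reach a Docker build. The sketch below is a hypothetical helper, not part of pip; it treats any requirement line without == as unpinned and ignores comments, blank lines, and option lines such as -r other.txt.

```python
def unpinned_requirements(text):
    """Return the requirement lines in `text` that lack an exact == pin."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("-"):
            continue                          # blank line or pip option
        if "==" not in line:
            unpinned.append(line)
    return unpinned


print(unpinned_requirements("flask\ntorch\n"))
print(unpinned_requirements("flask==3.0.0 # Lock version, reproducible\n"))
```

Running it in CI against requirements.txt and failing the build on a non-empty result is one way to enforce the exact-version rule.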
g4dn.xlarge (AWS EC2) specs:
- 4 vCPUs, 16 GB RAM
- 1x NVIDIA T4 GPU (16 GB VRAM)
Capacity by workload:
- Light containers (APIs): 20-40 containers
- Medium containers (ML inference): 4-8 containers
- GPU-accelerated ML: 1-2 containers (GPU can't be easily shared)
Calculation for ML:
Available RAM: 16 GB - 2 GB (OS) = 14 GB
Per container: 2 GB
Max: 14 / 2 = 7 containers
But GPU is the bottleneck - only 1 container can use GPU at a time typically.
Best practice: Run 1 GPU container + 5-10 CPU-only support containers.
N indicates the number of layers in the network architecture, NOT categories.
- ResNet-18: 18 layers deep
- ResNet-50: 50 layers deep
- ResNet-152: 152 layers deep
The number of categories is always 1000 (for ImageNet pre-trained models).
The depth (N) affects:
- Model capacity
- Accuracy (deeper usually = more accurate)
- Speed (deeper = slower)
- Parameters (deeper = more weights)
Yes! "Res" = Residual.
The Problem ResNet Solved: Before ResNet (2015), stacking more than ~20 layers often made networks worse, not better, because very deep plain networks were hard to optimize (vanishing gradients are one commonly cited cause).
Residual Connections - The Solution:
Traditional network:
Input → Layer 1 → Layer 2 → Output
Each layer learns: Output = F(Input)
ResNet with skip connection:
Input ──────────────────┐
│ │
└→ Layer 1 → Layer 2 ─┴→ Add → Output
Each block learns: Output = F(Input) + Input
The + Input is the skip connection (or residual connection).
Why This Works:
- Learning the residual (difference) is easier
- Skip connection provides "highway" for gradients
- Even if F(x) learns poorly, you still have input passing through
Analogy:
- Traditional: "Describe this person completely"
- Residual: "Describe how this person differs from average" (easier!)
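The formula Output = F(Input) + Input can be illustrated with plain Python lists. This is a toy sketch with made-up function names; real ResNet blocks use convolutions, batch normalization, and activations, none of which are modelled here.

```python
def plain_block(x, f):
    """Traditional block: the output is whatever the layers produce."""
    return f(x)


def residual_block(x, f):
    """Residual block: the layers' output is added to the unchanged input."""
    return [fx_i + x_i for fx_i, x_i in zip(f(x), x)]


def useless_layers(v):
    """Stand-in for layers that have learned nothing useful: all zeros."""
    return [0.0 for _ in v]


x = [1.0, 2.0, 3.0]
print(plain_block(x, useless_layers))     # the input signal is lost
print(residual_block(x, useless_layers))  # the input passes through unchanged
```

With the skip connection, the worst case for a block is the identity function, so adding more blocks does not have to make the network worse.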
It's reasonable, but the answer varies wildly - and "neurons" means something different in CNNs.
Convolutional layers have:
- Filters (channels): Number of feature maps
- Spatial dimensions: Height × Width
ResNet-50 layer sizes (examples):
- Input: 224×224×3 = ~150K values (about 50K pixels × 3 color channels)
- Early: 112×112×64 = ~800K activations
- Middle: 28×28×512 = ~400K activations
- Late: 7×7×2048 = ~100K activations
- Final: 2048 → 1000 (fully-connected layer)
Better question: "How many channels per layer?"
- ResNet-50: 64 → 256 → 512 → 1024 → 2048 channels as you go deeper
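The activation counts quoted above are just height × width × channels. A short check of the arithmetic, using the shapes listed in this answer (the stage labels are informal):

```python
def activations(height, width, channels):
    """Number of values in one feature-map tensor."""
    return height * width * channels


stages = {
    "input": (224, 224, 3),
    "early": (112, 112, 64),
    "middle": (28, 28, 512),
    "late": (7, 7, 2048),
}

for name, shape in stages.items():
    print(f"{name:7s} {shape} -> {activations(*shape):,}")
```

The spatial size shrinks as the channel count grows, which is why the totals fall only gradually after the early layers.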
Yes to both!
Standard preprocessing for ResNet:
- Convert to RGB: Grayscale → 3-channel RGB, RGBA → drop alpha
- Resize so the shorter side is 256 pixels: Maintains aspect ratio
- Center crop to 224×224: Standard ImageNet input size
- Normalize: Using ImageNet dataset statistics
Result: Any input image → 224×224×3 RGB, normalized
This is why you can upload any size/format image and it works!
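The geometry of these steps can be worked out without any image library. The sketch below assumes the common convention of scaling the shorter side to 256 before cropping, and uses the widely published ImageNet channel means and standard deviations. The function names are hypothetical, and a real pipeline would normally use an image library's transforms instead.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def resize_shorter_side(width, height, target=256):
    """New (width, height) after scaling so the shorter side equals `target`."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width, height, size=224):
    """(left, top, right, bottom) of a centered size x size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def normalize_pixel(rgb):
    """Map one (R, G, B) pixel with 0-255 values to normalized floats."""
    return tuple((v / 255 - m) / s
                 for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


new_size = resize_shorter_side(640, 480)
crop = center_crop_box(*new_size)
print(new_size, crop)
```

For a 640×480 upload, the image is first scaled to 341×256 and then the central 224×224 region is kept.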
Key Resource Constraints:
1. Memory (usually the bottleneck):
Max containers = (Available RAM - OS overhead) / Memory per container
Example:
- Host: 16 GB RAM
- OS + Docker: 2 GB
- Each container: 512 MB
- Max: (16 - 2) / 0.5 = 28 containers
2. CPU:
Max containers = CPU cores × oversubscription / CPU per container
Rule of Thumb:
- Light containers (nginx, APIs): 50-100+ per host
- Medium containers (web apps): 10-30 per host
- Heavy containers (databases, ML): 5-10 per host
Best practice: Leave 20-30% headroom for spikes.
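The two formulas above translate directly into code. In this sketch the function names are made up, and the CPU figures in the example (2× oversubscription, 0.25 CPU per container) are illustrative assumptions rather than recommendations.

```python
import math


def max_by_memory(total_gb, overhead_gb, per_container_gb):
    """(Available RAM - OS overhead) / memory per container, rounded down."""
    return math.floor((total_gb - overhead_gb) / per_container_gb)


def max_by_cpu(cores, oversubscription, cpu_per_container):
    """CPU cores x oversubscription / CPU per container, rounded down."""
    return math.floor(cores * oversubscription / cpu_per_container)


def capacity(total_gb, overhead_gb, per_container_gb,
             cores, oversubscription, cpu_per_container):
    """The tighter of the two limits decides how many containers fit."""
    return min(
        max_by_memory(total_gb, overhead_gb, per_container_gb),
        max_by_cpu(cores, oversubscription, cpu_per_container),
    )


print(max_by_memory(16, 2, 0.5))             # the 512 MB example above
print(capacity(16, 2, 0.5, 4, 2.0, 0.25))    # memory vs. CPU limit
```

Whichever limit is lower is the bottleneck; the headroom rule then means planning for somewhat fewer containers than this number.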
When you need the build number:
- Troubleshooting: Reporting bugs or asking for help
- Compatibility: Some features require specific versions
- Production: Ensure consistent environments across teams
- Security: Check if you have patched versions
For learning Docker basics, you rarely need it.
Excellent explanation structure:
"There are three related, central nouns in the Docker ecosystem: Dockerfile, Image, and Container.
- Dockerfile is the starting point of a recipe for the Container
- Image is an intermediate template that does not execute code. It is a snapshot that includes everything needed to create and run the Container
- Container will execute a task when it runs
The Image is built using docker build. The subsequent command docker run creates a Container from the Image and then runs it. The image remains unchanged so the process can be repeated.
An Image is actually a layered filesystem. If we change only the Python code and do a new build: Docker will rebuild just that layer, not the entire Image: Faster development."
- Abstraction is key: Docker hides complexity (filesystem locations, networking details)
- Images are templates: Build once, run many times
- Containers are isolated processes: Not VMs, but feel like them
- Networks enable communication: Container-to-container via names
- Volumes persist data: Bind mounts for development, named volumes for production
- Layers enable caching: Put stable dependencies first, changing code last
- Reproducibility matters: Lock versions in requirements.txt
-p stands for "parents".
What it does:
- Creates parent directories as needed
- Doesn't error if directory already exists
mkdir -p ~/a/b/c/d # Creates all intermediate directories
mkdir -p ~/dwdata # Safe even if dwdata already exists
Without -p:
mkdir ~/a/b/c/d # Fails if ~/a/b/c doesn't exist
mkdir ~/dwdata # Errors if dwdata already exists
For workshops: mkdir -p is safer, but plain mkdir is simpler and teaches students to handle errors.
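Python has the same pair of behaviours: os.mkdir acts like plain mkdir, and os.makedirs with exist_ok=True acts like mkdir -p. The sketch below runs in a temporary directory so it leaves your home folder alone; mkdir_demo is a made-up name.

```python
import os
import tempfile


def mkdir_demo():
    """Compare plain mkdir with the mkdir -p equivalent."""
    base = tempfile.mkdtemp()
    nested = os.path.join(base, "a", "b", "c", "d")
    results = {}

    # Like `mkdir a/b/c/d`: fails because a/b/c does not exist yet.
    try:
        os.mkdir(nested)
        results["plain_mkdir"] = "ok"
    except FileNotFoundError:
        results["plain_mkdir"] = "missing parent"

    # Like `mkdir -p a/b/c/d`: creates parents, and repeating it is harmless.
    os.makedirs(nested, exist_ok=True)
    os.makedirs(nested, exist_ok=True)
    results["makedirs"] = os.path.isdir(nested)

    # Like a second plain `mkdir`: errors because the directory now exists.
    try:
        os.mkdir(nested)
    except FileExistsError:
        results["mkdir_again"] = "already exists"
    return results


print(mkdir_demo())
```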
Flask is a Python web application framework for building web apps.
nginx is a production web server for serving web traffic.
| Aspect | Flask | nginx |
|---|---|---|
| Purpose | Build web apps | Serve web traffic |
| Language | Python | C |
| Performance | ~1000 req/sec | ~50,000+ req/sec |
| Static files | Slow | Fast |
| Concurrency | Limited | Excellent |
| Use case | Development, APIs | Production serving |
In production: nginx sits in front of Flask:
Browser → nginx:80 → Flask:5000
For workshops: Flask alone is perfect—students see the full stack in Python.
Both use Flask:
- prime-checker (backend API): Flask exposes /check/<number> endpoint, returns JSON
- prime-frontend (proxy + web): Flask serves HTML page AND forwards requests to prime-checker
Flask is used for both HTTP servers—one serves an API, the other serves web pages and proxies requests.
It's not needed in the current setup.
CORS (Cross-Origin Resource Sharing) would be needed if the browser's JavaScript directly called prime-checker's API from a different origin. But since prime-frontend acts as a proxy, the browser only talks to prime-frontend (same origin). The backend call from prime-frontend to prime-checker is server-to-server, which doesn't trigger CORS restrictions.
Result: flask-cors was removed to reduce complexity.
EXPOSE is documentation only. It tells users "this container listens on port X" but doesn't actually open or publish the port.
EXPOSE 8080
The -p 8080:8080 flag does the actual port mapping regardless of EXPOSE.
EXPOSE is useful for:
- Documentation (what ports does this app use?)
- docker run -P (publish all exposed ports to random host ports)
For workshops: You could remove all EXPOSE lines and everything would still work with -p.
Container port. EXPOSE 8080 means "the application inside this container listens on port 8080."
It's about the container's internal network namespace, not the Docker VM or host.
COPY copies files from the build context (host) to the image.
WORKDIR /app
COPY app.py .
- Source: app.py in build context (the . from docker build .)
- Destination: Current WORKDIR in the image (. means /app)
Result: app.py ends up at /app/app.py in the image.
RUN: Executes during docker build. Modifies the image. Creates a new layer.
CMD: Specifies what runs when container starts. Doesn't execute during build. If a Dockerfile contains several CMD instructions, only the last one takes effect.
# Happens at build time
RUN pip install flask
# Happens at container start
CMD ["python", "app.py"]
Summary:
- RUN = build the image
- CMD = run the container
Execute the following string as Python code, then exit.
python -c "print('hello')"
Runs the code without needing a .py file.
Used in ResNet Dockerfile to download model during build:
RUN python -c "from transformers import AutoModelForImageClassification; AutoModelForImageClassification.from_pretrained('microsoft/resnet-50')"
The ~98MB download includes:
- Model architecture definition (small, KB range)
- Trained weights (the bulk, ~98MB)
- Configuration files (small)
The weights ARE the model for inference purposes. There's no separate "model" file—the weights contain the learned parameters from training on ImageNet.
Not with !ls — that's IPython/Jupyter syntax, not standard Python REPL.
From Python REPL, use:
import os
os.system('ls')
or
import subprocess
subprocess.run(['ls'])
The ! shortcut only works in IPython/Jupyter notebooks.
exit() or quit() or Ctrl+D will all exit the Python REPL and stop the container.
The CMD from the base image is inherited.
For example, FROM python:3.11-slim has CMD ["python3"] in its Dockerfile. So docker run fu-image (without specifying bash) would start a Python REPL.
If there's no CMD or ENTRYPOINT anywhere in the chain: docker run fails with a "no command specified" error unless you pass a command on the command line.
docker run [options] IMAGE [COMMAND]
Two expected arguments:
- Image name or identifier (required)
- Command (optional—uses CMD from Dockerfile if omitted)
Example:
docker run -it fu-image bash
- -it: Interactive terminal flags
- fu-image: Image name
- bash: Command to run (overrides CMD)
The command is optional—if omitted, Docker uses the CMD from the Dockerfile.
Not quite. bash doesn't run "in perpetuity"—it runs interactively because of the -it flags.
Without -it: bash would start and immediately exit (no stdin to read from).
With -it: bash stays alive by giving it a terminal to interact with.
When you type exit: bash terminates, and the container stops.
docker run -p 8080:8080 prime-frontend
Format: -p HOST_PORT:CONTAINER_PORT
- Left (8080): Port on your host machine
- Right (8080): Port inside container where Flask is listening
Traffic flow:
- Browser connects to localhost:8080 (host)
- Docker forwards to container port 8080
- Flask app inside container receives request
- Response flows back the same path
If different: -p 9090:8080 means host port 9090 maps to container port 8080.
Docker creates three default networks:
- bridge: Default network. Containers can communicate by IP, but not by name. Used when you don't specify --network.
- host: Container uses host's network directly. No isolation. Container ports are host ports (no -p mapping needed).
- none: No networking. Completely isolated container.
Custom networks (like prime-net) provide DNS resolution—containers can reach each other by name.
Same speed—both are bridge networks under the hood.
To get IPs on default bridge:
docker inspect prime-api | grep IPAddress
Then hardcode that IP: http://172.17.0.2:5000
Bad idea because:
- IPs change between runs
- Requires manual lookup
- Hardcoded IPs in code
Custom network with DNS (container names as hostnames) is simpler and more robust.
Use cases:
- Network performance critical (eliminates NAT overhead)
- Need to bind to specific host interfaces
- Network monitoring/scanning tools
- Legacy apps that can't handle port mapping
Command:
docker run --network host my-image
Code inside container:
app.run(host='0.0.0.0', port=8080)
App binds to port 8080, immediately available on host at localhost:8080. No -p flag needed.
Downside: Port conflicts. If host already uses 8080, container fails.
-c means "command" (execute the following string as a command).
bash -c "echo hello"
It's not "clobber", though the name does sound incongruously violent!
/app is a convention for where you put application code in production containers.
In Dockerfiles:
WORKDIR /app
COPY app.py .
WORKDIR /app creates the directory and sets it as the current working directory. When the container starts, you're in /app by default.
It's not required—just a widely-adopted convention. You could use /code, /src, or anything else, but /app is the de facto standard.
This Q&A document captures the conceptual foundation of Docker from our workshop session.