🐳 docker-model-runner-lab

Run LLMs locally with Docker Model Runner. No cloud, no API keys, no data leaving your machine.

What is Docker Model Runner

Docker Model Runner (DMR) pulls and runs LLMs locally with the same workflow you already use for containers. Models are OCI artifacts on Docker Hub under the ai/ namespace, served through an OpenAI-compatible API, so your existing SDKs and tools work unchanged. Everything stays on your machine.

```mermaid
flowchart LR
    subgraph machine["Your machine"]
        app["Your app or curl<br/>localhost:12434"]
        subgraph desktop["Docker Desktop"]
            dmr["Docker Model Runner<br/>OpenAI-compatible API"]
            engine["llama.cpp / vLLM<br/>Metal, CUDA, Vulkan"]
            container["Containerized app<br/>model-runner.docker.internal"]
        end
    end
    hub[("Docker Hub<br/>ai/ models<br/>OCI artifacts")]

    app -->|"/engines/v1"| dmr
    container -->|"/engines/v1"| dmr
    dmr --> engine
    dmr -.->|"docker model pull"| hub
```

Quickstart in 3 commands

Prerequisite: Docker Desktop 4.40+ with Model Runner enabled (Settings -> AI -> Enable Docker Model Runner). The `curl` step also needs host-side TCP support, enabled on the same screen with the default port 12434.

```bash
docker model pull ai/gemma3
docker model run ai/gemma3 "Hello in one sentence"
curl http://localhost:12434/engines/v1/models
```

The lab

Five hands-on modules, from zero to a Blazor web chat backed by a local model. Each one is self-contained.

| Module | What you build | Time |
|---|---|---|
| 01-quickstart | Pull, list and run a model with idempotent bash and PowerShell scripts | 5 min |
| 02-openai-api | Raw API calls with curl and the VS Code REST Client, including streaming | 5 min |
| 03-dotnet-chat | A .NET 10 streaming chat console app using the official OpenAI SDK | 10 min |
| 04-compose | Model + web API provisioned together with the Compose `models` element | 10 min |
| 05-blazor-chat | A .NET 10 Blazor Server web UI that streams chat responses from the model | 10 min |

All examples default to ai/gemma3 and accept a MODEL environment variable to swap models without touching code.
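A minimal sketch of how a script can honour the `MODEL` variable, assuming POSIX `sh`. `resolve_model` and `ask` are hypothetical helper names for illustration, not functions from the lab's own scripts.

```sh
# Hypothetical helpers illustrating the MODEL override; not taken from the lab.

# Print the model to use: $MODEL if set and non-empty, else the lab default.
resolve_model() {
  printf '%s\n' "${MODEL:-ai/gemma3}"
}

# Send a one-shot prompt to the selected model through the docker model CLI.
ask() {
  docker model run "$(resolve_model)" "$1"
}
```

After sourcing these, `ask "Hello in one sentence"` targets ai/gemma3; running `export MODEL=ai/smollm2` first switches every later call without editing the script.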

Every module was tested end to end with .NET 10 and ai/gemma3 on Docker Desktop, running on the llama.cpp backend.

Running in Docker

The app runs as a container next to the model, which Docker Model Runner provisions and serves. The app reaches the model at model-runner.docker.internal, and a single docker compose up starts everything.

The web apps are also published as ready-to-run images on GitHub Container Registry, built by the publish workflow:

| Image | Module | Pull |
|---|---|---|
| `compose-api` | 04-compose | `docker pull ghcr.io/ppiova/docker-model-runner-lab/compose-api:latest` |
| `blazor-chat` | 05-blazor-chat | `docker pull ghcr.io/ppiova/docker-model-runner-lab/blazor-chat:latest` |

Endpoints

| Where your code runs | Base URL |
|---|---|
| On the host | `http://localhost:12434/engines/v1` |
| Inside a container | `http://model-runner.docker.internal/engines/v1` |

No API key is required. If your client library demands one, pass any placeholder string.
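The sketch below turns the endpoint table into a function and sends a single chat completion with `curl` and `jq` (both assumed to be installed). `DMR_BASE_URL` is a hypothetical override, and checking for `/.dockerenv` is only a common heuristic for "inside a container", not something DMR provides.

```sh
# Pick the base URL for where this script runs.
# DMR_BASE_URL is a hypothetical override; /.dockerenv is a heuristic.
dmr_base_url() {
  if [ -n "${DMR_BASE_URL:-}" ]; then
    printf '%s\n' "$DMR_BASE_URL"
  elif [ -f /.dockerenv ]; then
    printf '%s\n' "http://model-runner.docker.internal/engines/v1"
  else
    printf '%s\n' "http://localhost:12434/engines/v1"
  fi
}

# Send one user message to the OpenAI-compatible chat completions route and
# print the reply text. No Authorization header: DMR does not need a key.
dmr_chat() {
  body=$(jq -n --arg m "${MODEL:-ai/gemma3}" --arg p "$1" \
    '{model: $m, messages: [{role: "user", content: $p}]}')
  curl -s "$(dmr_base_url)/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$body" | jq -r '.choices[0].message.content'
}
```

Building the body with `jq --arg` keeps quotes and newlines in the prompt correctly escaped, which hand-written JSON in a shell string would not.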

`docker model` cheat sheet

| Command | What it does |
|---|---|
| `docker model pull ai/gemma3` | Download a model from Docker Hub |
| `docker model list` | List local models |
| `docker model run ai/gemma3 "prompt"` | One-shot prompt |
| `docker model run ai/gemma3` | Interactive chat (`/bye` to exit) |
| `docker model ps` | Show models loaded in memory |
| `docker model status` | Check the runner and its backends |
| `docker model inspect ai/gemma3` | Show model details |
| `docker model rm ai/gemma3` | Remove a local model |
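Module 01 describes its scripts as idempotent. One way to get that behaviour from the commands above is to pull only when `docker model inspect` fails. This is a sketch, not the lab's script, and it assumes `inspect` exits non-zero for a model that is not present locally; `ensure_model` is a hypothetical name.

```sh
# Hypothetical helper: pull a model only if it is not already local.
# Assumes `docker model inspect` fails for models that were never pulled.
ensure_model() {
  model="${1:-ai/gemma3}"
  if docker model inspect "$model" >/dev/null 2>&1; then
    echo "present: $model"
  else
    echo "pulling: $model"
    docker model pull "$model"
  fi
}
```

Running `ensure_model ai/gemma3` twice should pull on the first call and report `present` on the second.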

llama.cpp vs vLLM

DMR ships two inference engines:

| | llama.cpp | vLLM |
|---|---|---|
| Model format | GGUF | Safetensors |
| Sweet spot | Laptops, CPU or mixed CPU/GPU, single user | High-throughput GPU serving, many concurrent requests |
| GPU support | Metal, CUDA, Vulkan | CUDA (Linux) |

Rule of thumb: start with llama.cpp for local development, reach for vLLM when you need throughput at scale.

Troubleshooting

docker: 'model' is not a docker command

Your Docker Desktop is older than 4.40. Update it, then enable Model Runner under Settings -> AI.

Docker Model Runner is not running

Enable it from the CLI with docker desktop enable model-runner, or in Docker Desktop under Settings -> AI -> Enable Docker Model Runner.
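A script can tell the two problems above apart before starting a module. This sketch assumes `docker model status` exits non-zero when something is wrong, and it matches on the "is not a docker command" text quoted above; `dmr_preflight` is a hypothetical name and the return codes are arbitrary.

```sh
# Hypothetical preflight: 0 = ready, 1 = runner not running, 2 = CLI plugin missing.
dmr_preflight() {
  if out=$(docker model status 2>&1); then
    echo "ok"
    return 0
  fi
  case "$out" in
    *"is not a docker command"*)
      echo "CLI plugin missing: update Docker Desktop to 4.40+"
      return 2 ;;
    *)
      echo "runner not running: try 'docker desktop enable model-runner'"
      return 1 ;;
  esac
}
```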

Connection refused on localhost:12434

Host-side TCP support is disabled. Turn it on in Docker Desktop under Settings -> AI (Enable host-side TCP support), keeping the default port 12434.
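To confirm the TCP endpoint is the problem, probe the same `/models` route used in the quickstart. A minimal sketch: `dmr_reachable` is a hypothetical helper and the 3-second timeout is an arbitrary choice.

```sh
# Hypothetical helper: succeed only if the endpoint answers.
# -f makes curl fail on HTTP error statuses (400 and above);
# --max-time bounds how long we wait for a response.
dmr_reachable() {
  url="${1:-http://localhost:12434/engines/v1}"
  curl -sf --max-time 3 "$url/models" >/dev/null
}
```

For example, `dmr_reachable || echo "Enable host-side TCP support under Settings -> AI"`.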

The first response takes a long time

The model loads into memory on the first request. Subsequent requests are much faster. Check what is loaded with docker model ps.
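If the load delay gets in the way of a demo, you can send a throwaway prompt first. This is a sketch assuming that a one-shot `docker model run` is enough to trigger the load; `warm_up` is a hypothetical helper, and its timing has only one-second resolution.

```sh
# Hypothetical helper: send a throwaway prompt so the model is loaded before
# real traffic arrives, then print how many whole seconds the call took.
warm_up() {
  model="${1:-${MODEL:-ai/gemma3}}"
  start=$(date +%s)
  docker model run "$model" "Reply with one word." >/dev/null 2>&1 || return 1
  end=$(date +%s)
  echo "$((end - start))"
}
```

Calling `warm_up` twice should show a smaller number the second time, and `docker model ps` should then list the model.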

Resources

Pablo Piovano | Docker Captain · Microsoft MVP
