KeepGPU keeps shared GPUs from being reclaimed while you prep data, debug, or coordinate multi-stage pipelines. It allocates just enough VRAM and issues lightweight CUDA work so schedulers observe an “active” device, without running a full training job.
- 🧾 License: MIT
- 📚 Docs: https://keepgpu.readthedocs.io
On many clusters, idle GPUs are reaped or silently shared after a short grace period. The cost of losing your reservation (or discovering another job has taken your card) can dwarf the cost of a tiny keep-alive loop. KeepGPU is a minimal, auditable guardrail:
- Predictable – Single-purpose controller with explicit resource knobs (VRAM size, interval, utilization backoff).
- Polite – Uses telemetry to read utilization and backs off when the GPU is busy or utilization is unavailable.
- Portable – Typer/Rich CLI for humans; Python API for orchestrators and notebooks.
- Observable – Structured logging and optional file logs for auditing what kept the GPU alive.
- Power-aware – Uses intervalled elementwise ops instead of heavy matmul floods to present “busy” utilization while keeping power and thermals lower (see `CudaGPUController._run_relu_batch` for the loop).
- Telemetry-aware – GPU telemetry comes from `nvidia-ml-py` (the `pynvml` module), optional `rocm-smi`, and best-effort MPS memory counters on Mac M series.
pip install keep-gpu
# Hold GPU 0 with 1 GiB VRAM and throttle if utilization exceeds 25%
keep-gpu --gpu-ids 0 --vram 1GiB --busy-threshold 25 --interval 60
# Non-blocking mode for agent workflows (auto-starts local service)
keep-gpu start --gpu-ids 0 --vram 1GiB --busy-threshold 25 --interval 60
keep-gpu status
keep-gpu stop --all
keep-gpu service-stop
Open the dashboard while service mode is running:
http://127.0.0.1:8765/
- CUDA (example: cu121)
pip install --index-url https://download.pytorch.org/whl/cu121 torch
pip install keep-gpu
- ROCm (example: rocm6.1)
pip install --index-url https://download.pytorch.org/whl/rocm6.1 torch
pip install keep-gpu[rocm]
- CPU-only
pip install torch
pip install keep-gpu
- Mac M series (M1/M2/M3/M4)
Uses Metal Performance Shaders (MPS) backend on Apple Silicon.
pip install torch
pip install keep-gpu[macm]
Flags that matter:
- Blocking mode knobs:
  - `--vram` (`1GiB`, `750MB`, or bare bytes like `1073741824`): how much memory to pin.
  - `--interval` (finite positive seconds): sleep between keep-alive bursts.
  - `--busy-threshold`: defaults to `25`; `0..100` skips work when telemetry reports higher utilization or cannot report utilization; `-1` disables utilization backoff.
  - `--gpu-ids`: target unique non-negative visible device ordinals after user-supplied visibility filtering (`CUDA_VISIBLE_DEVICES` on CUDA, `ROCR_VISIBLE_DEVICES`/`HIP_VISIBLE_DEVICES`/`CUDA_VISIBLE_DEVICES` on ROCm); when omitted, all visible GPUs are guarded. Empty, duplicate, or out-of-range selections are invalid, and startup fails if no GPUs resolve.
- Service mode commands:
  - `keep-gpu serve`: run local service (HTTP + dashboard).
  - `keep-gpu start`: create keep session and return immediately.
  - `keep-gpu status`: inspect tracked sessions, including in-progress or failed releases.
  - `keep-gpu stop --job-id <id>` or `keep-gpu stop --all`: release sessions.
  - `keep-gpu service-stop`: stop the ownership-verified auto-started local daemon.
  - `keep-gpu list-gpus`: fetch telemetry from local service. Each listed `id` is the visible ordinal accepted by `--gpu-ids`; optional `physical_id`/`uuid` fields are metadata only.
  - `status`, `stop`, and `list-gpus` print structured JSON objects that tools such as `jq` can parse directly, including `{"error": "..."}` for service/runtime errors after CLI parsing succeeds.
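The value rules for the blocking-mode knobs can be illustrated with a small validation sketch. This is not KeepGPU's own parser: the function names are hypothetical, and the unit table (decimal `MB`, binary `GiB`) is an assumption that the README does not confirm.

```python
import math
import re

# Assumed unit table (not taken from KeepGPU's source): decimal and binary suffixes.
_UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,
}


def parse_vram(text: str) -> int:
    """Convert '1GiB', '750MB', or bare bytes like '1073741824' to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    factor = _UNITS.get(unit.upper() or "B")  # no suffix means bare bytes
    if factor is None:
        raise ValueError(f"unknown unit: {unit!r}")
    size = int(float(number) * factor)
    if size <= 0:
        raise ValueError("size must be positive")
    return size


def validate_interval(seconds: float) -> float:
    """Accept only finite, positive sleep intervals."""
    if not math.isfinite(seconds) or seconds <= 0:
        raise ValueError("interval must be a finite positive number of seconds")
    return seconds


def validate_busy_threshold(value: int) -> int:
    """Accept -1 (backoff disabled) or a percentage in 0..100."""
    if value != -1 and not 0 <= value <= 100:
        raise ValueError("busy threshold must be -1 or within 0..100")
    return value
```

Under these assumptions, `parse_vram("1GiB")` yields the same byte count as the bare value `1073741824` used in the flag description.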
from keep_gpu.single_gpu_controller.cuda_gpu_controller import CudaGPUController
with CudaGPUController(rank=0, interval=0.5, vram_to_keep="1GiB", busy_threshold=20):
    preprocess_dataset()  # GPU is marked busy while you run CPU-heavy code
train_model()  # GPU freed after exiting the context
Need multiple devices?
from keep_gpu.global_gpu_controller.global_gpu_controller import GlobalGPUController
with GlobalGPUController(gpu_ids=[0, 1], vram_to_keep="750MB", interval=90, busy_threshold=30):
    run_pipeline_stage()
Pass `gpu_ids=None` to use all visible GPUs. Explicit values are visible device ordinals, not physical NVML/ROCm SMI IDs. Passing an empty, duplicate, or out-of-range list is invalid, and startup raises an error if discovery resolves to zero devices.
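The `gpu_ids` rules above can be sketched as a standalone check. The helper name `resolve_gpu_ids`, its `visible_count` parameter, and the exception types are hypothetical and chosen for illustration; KeepGPU's actual validation may be structured differently.

```python
from typing import List, Optional, Sequence


def resolve_gpu_ids(gpu_ids: Optional[Sequence[int]], visible_count: int) -> List[int]:
    """Return the visible ordinals to guard, or raise if the selection is invalid."""
    if visible_count <= 0:
        # Discovery resolved to zero devices: nothing can be guarded.
        raise RuntimeError("no visible GPUs were discovered")
    if gpu_ids is None:
        # None means "all visible GPUs".
        return list(range(visible_count))
    ids = list(gpu_ids)
    if not ids:
        raise ValueError("gpu_ids must not be empty")
    if len(set(ids)) != len(ids):
        raise ValueError("gpu_ids must not contain duplicates")
    for gpu_id in ids:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(gpu_id, bool) or not isinstance(gpu_id, int):
            raise ValueError(f"gpu id must be an integer: {gpu_id!r}")
        if not 0 <= gpu_id < visible_count:
            raise ValueError(f"gpu id out of range: {gpu_id}")
    return ids
```

With two visible devices, `resolve_gpu_ids(None, 2)` and `resolve_gpu_ids([0, 1], 2)` select the same ordinals, while `[]`, `[0, 0]`, and `[2]` are rejected.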
- Battle-tested keep-alive loop built on PyTorch.
- NVML-based utilization monitoring (by way of `nvidia-ml-py`) to avoid hogging busy GPUs; optional ROCm SMI support by way of `pip install keep-gpu[rocm]`.
  - Public entry points default `busy_threshold` to `25`. Valid values are `-1` or `0..100`; if utilization is unavailable and the threshold is non-negative, KeepGPU sleeps before allocating keep tensors or running compute.
  - CUDA utilization checks use visible CUDA ordinals, so with `CUDA_VISIBLE_DEVICES=3,5`, rank `1` reads NVML telemetry for physical GPU `5`; duplicate or ambiguous CUDA masks are treated as unavailable telemetry.
  - ROCm utilization similarly resolves visible ranks through `ROCR_VISIBLE_DEVICES` and one matching `HIP_VISIBLE_DEVICES`/`CUDA_VISIBLE_DEVICES` overlay before querying ROCm SMI. Ambiguous mappings are treated as unavailable telemetry.
- CLI + API parity: same controllers power both code paths.
- Continuous docs + CI: mkdocs + mkdocstrings build in CI to keep guidance up to date.
- Install dev extras: `pip install -e ".[dev]"` (add `.[rocm]` if you need ROCm SMI).
- Fast CUDA checks: `pytest tests/cuda_controller tests/global_controller tests/utilities/test_platform_manager.py tests/test_cli_thresholds.py`
- ROCm visibility tests use mocks and run without hardware; ROCm-only hardware tests carry `@pytest.mark.rocm` and run with `pytest --run-rocm tests/rocm_controller`.
- Markers: `rocm` (needs ROCm stack) and `large_memory` (opt-in locally).
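The utilization backoff and the visible-ordinal mapping described in the feature list can be sketched in plain Python. Both function names are hypothetical, and the mask handling is a simplification: this sketch only understands comma-separated integer indices and treats anything else (UUID-style entries, empty entries, duplicates) as unavailable telemetry, which may be stricter than what KeepGPU actually does.

```python
from typing import Optional


def physical_index(mask: Optional[str], rank: int) -> Optional[int]:
    """Map a visible rank to a physical GPU index, or None when the mapping is unclear."""
    if rank < 0:
        return None
    if mask is None:
        # Assumption: with no mask set, visible ordinals follow physical order.
        return rank
    entries = [entry.strip() for entry in mask.split(",")]
    if not all(entry.isdecimal() for entry in entries):
        return None  # non-integer or empty entries: treated as ambiguous here
    indices = [int(entry) for entry in entries]
    if len(set(indices)) != len(indices):
        return None  # duplicate entries: treated as ambiguous
    if rank >= len(indices):
        return None  # rank is not visible under this mask
    return indices[rank]


def should_run_keepalive(utilization: Optional[float], busy_threshold: int) -> bool:
    """Decide whether a keep-alive burst should run for this cycle."""
    if busy_threshold == -1:
        return True  # backoff disabled: always run
    if utilization is None:
        return False  # telemetry unavailable: sleep instead of working
    return utilization <= busy_threshold  # skip only when the GPU is busier than the threshold
```

With `CUDA_VISIBLE_DEVICES=3,5`, `physical_index("3,5", 1)` resolves to `5`, matching the example in the feature list. A platform that cannot report utilization stays in sleep-only mode under the default threshold of `25` and runs compute only with `-1`.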
- Start an MCP server on stdin/stdout (default):
keep-gpu-mcp-server
- Or expose it over HTTP (JSON-RPC + REST + dashboard):
keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765
- MCP clients use the standard `initialize`, `tools/list`, and `tools/call` protocol methods over stdio. KeepGPU exposes four tools: `start_keep`, `stop_keep`, `status`, and `list_gpus`.
- Legacy direct JSON-RPC method calls remain supported for scripts:
{"id": 1, "method": "start_keep", "params": {"gpu_ids": [0], "vram": "512MB", "interval": 60, "busy_threshold": 20}}
- Stdio stdout is reserved for JSON protocol messages; diagnostics and logs are written to stderr.
- HTTP mode is KeepGPU's local JSON-RPC/REST/dashboard service. It accepts the same JSON-RPC messages at `/rpc`, but it is not a Streamable HTTP MCP endpoint.
- REST examples:
curl http://127.0.0.1:8765/health
curl http://127.0.0.1:8765/api/sessions
- Methods: `start_keep`, `stop_keep` (optional `job_id`, default stops all), `status` (optional `job_id`), `list_gpus` (basic info).
  - REST session creation accepts a JSON object body, not arrays or scalar values.
  - Omitting `gpu_ids` uses all visible GPUs, and omitting `busy_threshold` uses the eco-safe default `25`; explicit `gpu_ids` values must be unique visible ordinals in the service process environment.
  - `list_gpus` returns those same start-compatible ordinals as `id`/`visible_id`; `physical_id` and `uuid` are informational metadata, not valid substitutes for `gpu_ids`.
  - Empty, duplicate, or out-of-range lists are invalid, and startup fails if no GPUs resolve.
  - Custom `job_id` values must be unique across active and starting sessions, and only `null`/omitted means generated or all-sessions; custom IDs must be non-empty strings containing only letters, digits, `.`, `_`, `-`, or `~`.
  - Status responses include reserved jobs as `state="starting"` while controller startup is still in progress.
- Supported REST route/method failures remain machine-readable: validation errors use JSON `400` responses, unknown API routes use JSON `404`, and unexpected service/runtime failures use JSON `500` instead of closing the connection without a response.
- Stop responses distinguish completed cleanup from partial cleanup: `stopped` means released, while `timed_out` sessions remain visible as `stopping` until background cleanup completes and `failed` sessions remain visible with `state` and `last_error`.
- Status and stop requests both account for in-progress starts: status reports them as `starting`, and stop waits for startup to settle so a session is not reported as missing or skipped by stop-all.
- Stop-all only covers sessions active or already starting when that request begins; later concurrent starts belong to a later stop request.
- Stop-all releases independent sessions concurrently and reports outcomes in deterministic snapshot order with the same `stopped`, `timed_out`, `failed`, and `errors` fields.
- Dashboard GPU cards show the visible ordinal to type into the start form first, with physical/vendor metadata shown only as secondary context.
- Dashboard cards mirror lifecycle state so a retained session shows `Releasing` or `Release failed` instead of being presented as a fully active keepalive.
- Dashboard: http://127.0.0.1:8765/
- Mac M series limitations:
  - GPU utilization monitoring is not available on macOS.
  - The default `busy_threshold=25` keeps MPS in conservative sleep-only mode; set `busy_threshold=-1` to opt into unconditional keepalive compute.
  - `list-gpus` reports best-effort MPS memory counters and `null` for unsupported telemetry fields.
- Minimal client config (stdio MCP):
servers:
  keepgpu:
    command: ["keep-gpu-mcp-server"]
    adapter: stdio
- Remote/SSH tunnel example (HTTP):
keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765
Use `http://gpu-box.example.com:8765/` for the dashboard and `http://gpu-box.example.com:8765/rpc` for JSON-RPC scripts. For untrusted networks, put the server behind your own auth/reverse-proxy or tunnel by way of SSH (for example, `ssh -L 8765:localhost:8765 gpu-box`).
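For scripts that talk to the `/rpc` endpoint, a minimal client might look like the sketch below. It reuses the legacy JSON-RPC message shape shown earlier and the documented `job_id` character set. The helper names are hypothetical, and several details are assumptions not confirmed here: that the endpoint takes a `POST` with a JSON `Content-Type`, that "letters" means ASCII letters, and that `job_id` is passed inside `params`.

```python
import json
import re
import urllib.request
from typing import Any, Dict, Optional

# Documented character set for custom job IDs; ASCII letters are an assumption.
_JOB_ID = re.compile(r"[A-Za-z0-9._~-]+")


def is_valid_job_id(job_id: Optional[str]) -> bool:
    """None means 'generate one' or 'all sessions'; otherwise require a non-empty safe string."""
    if job_id is None:
        return True
    return isinstance(job_id, str) and _JOB_ID.fullmatch(job_id) is not None


def build_rpc_request(method: str, params: Dict[str, Any], request_id: int = 1) -> bytes:
    """Encode a message in the same shape as the legacy JSON-RPC example."""
    return json.dumps({"id": request_id, "method": method, "params": params}).encode("utf-8")


def call_rpc(base_url: str, method: str, params: Dict[str, Any], timeout: float = 10.0) -> Dict[str, Any]:
    """Send one request to <base_url>/rpc (assumed: POST with a JSON body)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/rpc",
        data=build_rpc_request(method, params),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Through the SSH tunnel from the example above, a call such as `call_rpc("http://127.0.0.1:8765", "status", {})` would reach the remote service without exposing port 8765 to the network.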
Contributions are welcome—especially around ROCm support, platform fallbacks, and scheduler-specific recipes. Open an issue or PR if you hit edge cases on your cluster. See docs/contributing.md for dev setup, test commands, and PR tips.
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
If you find KeepGPU useful in your research or work, please cite it as:
@software{Wangmerlyn_KeepGPU_2025,
author = {Wang, Siyuan and Shi, Yaorui and Liu, Yida and Yin, Yuqi},
title = {KeepGPU: a simple CLI app that keeps your GPUs running},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.17129114},
url = {https://github.com/Wangmerlyn/KeepGPU},
note = {GitHub repository},
keywords = {ai, hpc, gpu, cluster, cuda, torch, debug}
}