The only proxy you need for Claude Code CLI
Quick Start • Features • Web Dashboard • Crosstalk • Compression Stack • Roadmap • Changelog
Route Claude Code to any provider. Save 90% on API costs. Run locally for free.
The Ultimate Proxy sits between Claude Code CLI and your chosen API provider. It translates Anthropic's API format to OpenAI-compatible format, letting you use any model with Claude Code:
```
Claude Code CLI → The Ultimate Proxy → Any Provider
                                        ├── OpenRouter
                                        ├── Gemini / VibeProxy
                                        ├── OpenAI
                                        ├── Azure
                                        ├── Ollama (local)
                                        └── LM Studio (local)
```
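The translation step can be pictured as a simple payload mapping. The sketch below is illustrative only, assuming minimal request shapes: `translate_request` is a hypothetical helper, not the proxy's actual code, though the field names mirror the public Anthropic Messages and OpenAI chat-completions APIs.

```python
def translate_request(anthropic_req: dict) -> dict:
    """Map an Anthropic Messages API payload onto an OpenAI
    chat-completions payload (illustrative subset only)."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first message.
    if "system" in anthropic_req:
        messages.append({"role": "system", "content": anthropic_req["system"]})
    messages.extend(anthropic_req.get("messages", []))
    return {
        "model": anthropic_req["model"],  # remapped by the model router in practice
        "messages": messages,
        "max_tokens": anthropic_req.get("max_tokens", 1024),
    }
```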
Single command installs everything with automatic GPU detection:
```bash
curl -fsSL https://raw.githubusercontent.com/aaaronmiller/claude-code-proxy/main/install-all.sh | bash
```

The installer will:

- Auto-detect your GPU (NVIDIA CUDA, Intel Arc/iGPU, AMD ROCm, or CPU-only)
- Prompt you to confirm or override the detected GPU backend
- Clone claude-code-proxy
- Install Headroom (compression layer)
- Install RTK (command compression)
- Install the GPU compute stack (Level Zero for Intel, CUDA for NVIDIA, ROCm for AMD)
- Configure environment variables (`ONEAPI_DEVICE_SELECTOR`, `CUDA_VISIBLE_DEVICES`, etc.)
- Add compression aliases (`cc`, `qw`, `qw-resume`)
- Start all services
- Show health status
Supported GPU backends:
| Backend | Hardware | Env Vars Set |
|---|---|---|
| NVIDIA CUDA | RTX 30/40/50 series, datacenter GPUs | `CUDA_VISIBLE_DEVICES=0` |
| Intel Arc / iGPU | Arc A370M/A580/A770, Iris Xe, UHD | `ONEAPI_DEVICE_SELECTOR=level_zero:0`, `LIBVA_DRIVER_NAME=iHD` |
| AMD ROCm | RX 6000/7000, Instinct MI series | `HSA_OVERRIDE_GFX_VERSION=10.3.0` |
| CPU-only | No GPU available | `HEADROOM_DEVICE=cpu` |
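The detection idea behind the table can be sketched as a probe for vendor tooling. This is an assumption about how such a detector might work, not the installer's actual shell logic; only the env values mirror the table above.

```python
import shutil

def detect_gpu_backend() -> tuple[str, dict]:
    """Pick a backend by probing for vendor tools on PATH, falling
    back to CPU-only. Illustrative only; the real install-all.sh
    is a shell script with richer hardware checks."""
    if shutil.which("nvidia-smi"):
        return "cuda", {"CUDA_VISIBLE_DEVICES": "0"}
    if shutil.which("sycl-ls") or shutil.which("clinfo"):
        return "level_zero", {"ONEAPI_DEVICE_SELECTOR": "level_zero:0",
                              "LIBVA_DRIVER_NAME": "iHD"}
    if shutil.which("rocminfo"):
        return "rocm", {"HSA_OVERRIDE_GFX_VERSION": "10.3.0"}
    return "cpu", {"HEADROOM_DEVICE": "cpu"}
```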
Manual installation:
```bash
# Clone & install
git clone https://github.com/aaaronmiller/claude-code-proxy.git ~/code/claude-code-proxy
cd ~/code/claude-code-proxy
./install-all.sh
```

In a new terminal, run:

```bash
export ANTHROPIC_BASE_URL=http://localhost:8082
export ANTHROPIC_API_KEY=pass
claude
```

If you want the proxy itself to require a client key, set `PROXY_AUTH_KEY`. `ANTHROPIC_API_KEY` is for the Claude Code client side and no longer enables proxy auth by default.

💡 Pro Tip: Add aliases to your `~/.zshrc` for easier access (see Setup Guide)
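For example, a `~/.zshrc` fragment along these lines works; the alias names are suggestions and the paths assume the default clone location from the manual install above.

```shell
# ~/.zshrc — example aliases (names are suggestions, adjust to taste)
alias cc-proxy='ANTHROPIC_BASE_URL=http://localhost:8082 ANTHROPIC_API_KEY=pass claude'
alias proxy-up='python ~/code/claude-code-proxy/start_proxy.py'
alias proxy-doctor='python ~/code/claude-code-proxy/start_proxy.py --doctor'
```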
The easiest way to use Claude Code with premium models for free:
- Install VibeProxy: Download
- Authenticate: Sign in with Google (Antigravity OAuth)
- Setup: `python start_proxy.py --setup` → Select "VibeProxy/Antigravity"
What you get:
- Claude Opus 4.5 with 128k thinking tokens
- Gemini 3 Pro/Flash
- BIG/MIDDLE/SMALL model routing
- Usage tracking & analytics
- No API keys or billing required
| Feature | Description |
|---|---|
| Multi-Provider | OpenRouter, Gemini, OpenAI, Azure, Ollama, LM Studio |
| Model Routing | BIG/MIDDLE/SMALL tiers with intelligent routing |
| Web Dashboard | Real-time monitoring with 2025 glassmorphism UI |
| Crosstalk | Model-to-model conversations (up to 8 models) |
| Usage Tracking | Cost analytics and token metrics |
| Extended Thinking | Up to 128k thinking tokens for reasoning models |
| Prompt Injection | Add custom prompts showing routing info |
| Model Cascade | Automatic fallback on provider errors |
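The BIG/MIDDLE/SMALL routing can be sketched as a name-based tier map. The env var names and default model IDs below are assumptions for illustration (the real proxy manages tiers via `.env` and `--set-big` / `--select-models`), but the opus/sonnet/haiku heuristic shows the shape of the idea.

```python
import os

# Hypothetical tier table; variable names and defaults are illustrative.
TIERS = {
    "BIG": os.environ.get("BIG_MODEL", "anthropic/claude-opus-4.5"),
    "MIDDLE": os.environ.get("MIDDLE_MODEL", "google/gemini-3-pro"),
    "SMALL": os.environ.get("SMALL_MODEL", "google/gemini-3-flash"),
}

def route(requested_model: str) -> str:
    """Map the model Claude Code asks for onto a configured tier:
    opus-class names go BIG, haiku-class SMALL, everything else MIDDLE."""
    name = requested_model.lower()
    if "opus" in name:
        return TIERS["BIG"]
    if "haiku" in name:
        return TIERS["SMALL"]
    return TIERS["MIDDLE"]
```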
Access at http://localhost:8082 when running.
- 2025 Glassmorphism UI with aurora gradients
- Real-time request monitoring
- Provider configuration
- Model selection with hybrid routing
- WebSocket live log streaming
- Profile management
Multi-model conversations where AI models talk to each other:
```bash
# Launch visual TUI
python start_proxy.py --crosstalk-studio

# Quick setup
python start_proxy.py --crosstalk "claude-opus,gemini-pro" --topic "Explore consciousness"
```

Features:
- Up to 8 models in a conversation
- Multiple paradigms: relay, debate, memory, report
- Jinja templates for message formatting
- Backrooms-compatible import/export
- MCP integration for programmatic access
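The relay paradigm can be sketched as a round-robin loop where each model's reply becomes the next model's input. The stub "models" here are plain functions standing in for provider calls; this is a conceptual sketch, not Crosstalk's implementation.

```python
from typing import Callable

def relay(models: dict[str, Callable[[str], str]],
          topic: str, rounds: int = 2) -> list[tuple[str, str]]:
    """Round-robin relay: each model sees the latest message and
    replies; the reply is handed to the next model in turn."""
    transcript: list[tuple[str, str]] = []
    message = topic
    for _ in range(rounds):
        for name, model in models.items():
            message = model(message)
            transcript.append((name, message))
    return transcript

# Usage with trivial echo stubs in place of real models:
stubs = {"a": lambda m: m + "-a", "b": lambda m: m + "-b"}
print(relay(stubs, "x", rounds=1))
```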
```bash
# Configuration
python start_proxy.py --setup          # First-time wizard
python start_proxy.py --settings       # Unified settings TUI
python start_proxy.py --doctor         # Health check + auto-fix

# Model management
python start_proxy.py --select-models  # Interactive model selector
python start_proxy.py --set-big MODEL  # Quick set BIG model
python start_proxy.py --show-models    # List available models

# Diagnostics
python start_proxy.py --config         # Show configuration
python start_proxy.py --dry-run        # Validate without starting
python start_proxy.py --analytics      # View usage stats
```

```
the-ultimate-proxy/
├── start_proxy.py        # Main entry point
├── .env                  # Configuration
├── CROSSTALK.md          # Multi-model chat docs
│
├── src/
│   ├── core/             # Proxy core logic
│   ├── api/              # FastAPI routes + WebSocket
│   ├── services/         # Providers, models, prompts
│   └── cli/              # CLI tools and TUIs
│
├── configs/crosstalk/    # Crosstalk presets & templates
├── web-ui/               # Svelte + bits-ui dashboard
└── docs/                 # Extended documentation
```
401 Unauthorized

```bash
python start_proxy.py --doctor  # Auto-fix API keys
```

Model Not Found

```bash
python start_proxy.py --show-models  # List available models
```

Connection Refused

- Ensure the proxy is running: `python start_proxy.py`
- Check that port 8082 is not in use
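A quick way to check whether anything is listening on the proxy port, using only the standard library:

```python
import socket

def port_open(host: str = "127.0.0.1", port: int = 8082) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        return False
```

If this returns `False` the proxy isn't up; if `True` but Claude Code still fails, another process may hold the port.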
NEW (April 2026): Integrated compression layer for 95-99% cost savings
```bash
# Start compression stack
cs-start

# Use Claude with auto-compression
csi  # or csr to resume

# Quick stats
cs-stats-quick
```

- 97% compression rate (900 → 26 tokens)
- GPU acceleration: NVIDIA CUDA, Intel Arc (Level Zero), AMD ROCm, or CPU-only
- Multi-CLI support (Claude, Qwen, Codex, OpenCode, OpenClaw, Hermes)
- Real-time dashboard (terminal + web at `:8899`)
- Multi-tier compression: default (`:8787`) + small model (`:8790`)
- Semantic caching: deduplicates repeated context across turns
- RTK command compression: 2-line summaries for CLI output
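The deduplication idea behind semantic caching can be sketched with plain content hashing. This exact-match version is a deliberate simplification for illustration; the real Headroom layer is assumed to do fuzzier, semantic matching.

```python
import hashlib

class ContextCache:
    """Replace context blocks already sent this session with a short
    reference token, so repeated context isn't re-sent each turn.
    (Exact-hash stand-in for semantic matching.)"""
    def __init__(self) -> None:
        self.seen: dict[str, str] = {}

    def compress(self, blocks: list[str]) -> list[str]:
        out = []
        for block in blocks:
            key = hashlib.sha256(block.encode()).hexdigest()[:12]
            if key in self.seen:
                out.append(f"[cached:{key}]")  # reference, not the full text
            else:
                self.seen[key] = block
                out.append(block)
        return out
```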
```
Claude Code → Proxy (:8082) → Headroom (:8787) → OpenRouter
                                   │
                              GPU: 92% VRAM
                              97% compression
```
- Full Guide: docs/COMPRESSION-STACK.md
- Performance Analysis: docs/PERFORMANCE-ANALYSIS.md
- GPU Optimization: docs/GPU-OPTIMIZATION.md
- Roadmap: ROADMAP.md
- Changelog: changelog.md
The installer handles this automatically:
- Intel Arc A370M (4GB) → Level Zero runtime, `ONEAPI_DEVICE_SELECTOR=level_zero:0`
- CPU-only → fallback when no GPU is detected, runs entirely on CPU
- WSL2 passthrough → GPU exposed via `/dev/dxg` from the Windows host
For manual configuration, see docs/GPU-OPTIMIZATION.md
See full roadmap: ROADMAP.md
- Single install script with GPU auto-detection (`install-all.sh` v2.0)
- Unified start/stop commands
- Low-VRAM mode (Intel Arc A370M 4GB via Level Zero)
- No-GPU mode (CPU-only fallback)
- Network proxy access (HTTP/HTTPS)
- Compression monitoring dashboard (`:8899`)
- RTK integration (v0.34.3)
- Shared state management
- Unified health monitoring
- Log aggregation
- Headroom as proxy middleware
- Single unified proxy process
- Resource optimization
The Ultimate Proxy • Made with ❤️ for the Claude Code community