
⚡ The Ultimate Proxy

The only proxy you need for Claude Code CLI

Python 3.9+ • License: MIT • 2025 Ready

Quick Start • Features • Web Dashboard • Crosstalk • Compression Stack • Roadmap • Changelog


[Screenshot: The Ultimate Proxy Dashboard]

Route Claude Code to any provider. Save 90% on API costs. Run locally for free.


🌟 What Is It?

The Ultimate Proxy sits between Claude Code CLI and your chosen API provider. It translates Anthropic's API format to OpenAI-compatible format, letting you use any model with Claude Code:

Claude Code CLI  →  The Ultimate Proxy  →  Any Provider
                                           ├─ OpenRouter
                                           ├─ Gemini / VibeProxy
                                           ├─ OpenAI
                                           ├─ Azure
                                           ├─ Ollama (local)
                                           └─ LM Studio (local)
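In practice the translation is a schema mapping between the two chat formats. The sketch below is illustrative only (it is not the proxy's code and covers just the common fields); field names follow the public Anthropic Messages and OpenAI chat completion formats.

```python
# Illustrative sketch of the Anthropic -> OpenAI request translation.
# Not the proxy's actual implementation; it only covers the common fields.

def anthropic_to_openai(body: dict, target_model: str) -> dict:
    """Map an Anthropic Messages API request to an OpenAI chat request."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first message.
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    for msg in body.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of blocks; flatten the text blocks.
        if isinstance(content, list):
            content = "".join(
                b.get("text", "") for b in content if b.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": target_model,  # e.g. the proxy's BIG/MIDDLE/SMALL choice
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
        "stream": body.get("stream", False),
    }
```

The response travels the opposite direction: the proxy converts the OpenAI-style completion (or stream chunks) back into Anthropic's response shape before handing it to Claude Code.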

🚀 Quick Start

Option 1: Unified Installation (Recommended)

A single command installs everything, with automatic GPU detection:

curl -fsSL https://raw.githubusercontent.com/aaaronmiller/claude-code-proxy/main/install-all.sh | bash

The installer will:

  • ๐Ÿ” Auto-detect your GPU (NVIDIA CUDA, Intel Arc/iGPU, AMD ROCm, or CPU-only)
  • ๐Ÿ’ฌ Prompt you to confirm or override the detected GPU backend
  • โœ… Clone claude-code-proxy
  • โœ… Install Headroom (compression layer)
  • โœ… Install RTK (command compression)
  • โœ… Install GPU compute stack (Level Zero for Intel, CUDA for NVIDIA, ROCm for AMD)
  • โœ… Configure environment variables (ONEAPI_DEVICE_SELECTOR, CUDA_VISIBLE_DEVICES, etc.)
  • โœ… Add compression aliases (cc, qw, qw-resume)
  • โœ… Start all services
  • โœ… Show health status

Supported GPU backends:

| Backend | Hardware | Env Vars Set |
|---|---|---|
| NVIDIA CUDA | RTX 30/40/50 series, datacenter GPUs | CUDA_VISIBLE_DEVICES=0 |
| Intel Arc / iGPU | Arc A370M/A580/A770, Iris Xe, UHD | ONEAPI_DEVICE_SELECTOR=level_zero:0, LIBVA_DRIVER_NAME=iHD |
| AMD ROCm | RX 6000/7000, Instinct MI series | HSA_OVERRIDE_GFX_VERSION=10.3.0 |
| CPU-only | No GPU available | HEADROOM_DEVICE=cpu |
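The detection itself can be approximated by probing for each vendor's tooling. Below is a minimal Python sketch of that idea; the helper names are hypothetical, and the real installer is a shell script with more checks.

```python
# Hypothetical sketch of GPU backend detection, approximating what
# install-all.sh does. Each branch probes for vendor tooling on PATH.
import shutil

# Environment variables per backend, matching the table above.
ENV_BY_BACKEND = {
    "cuda": {"CUDA_VISIBLE_DEVICES": "0"},
    "levelzero": {"ONEAPI_DEVICE_SELECTOR": "level_zero:0",
                  "LIBVA_DRIVER_NAME": "iHD"},
    "rocm": {"HSA_OVERRIDE_GFX_VERSION": "10.3.0"},
    "cpu": {"HEADROOM_DEVICE": "cpu"},
}

def detect_gpu_backend() -> str:
    """Best-effort backend detection; falls back to CPU-only."""
    if shutil.which("nvidia-smi"):   # NVIDIA driver tooling present
        return "cuda"
    if shutil.which("rocminfo"):     # AMD ROCm runtime present
        return "rocm"
    if shutil.which("sycl-ls"):      # Intel oneAPI / Level Zero tooling
        return "levelzero"
    return "cpu"
```

The installer then prompts you to confirm or override whatever this kind of probe reports, so a misdetection is never silently applied.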

Manual installation:

# Clone & install
git clone https://github.com/aaaronmiller/claude-code-proxy.git ~/code/claude-code-proxy
cd ~/code/claude-code-proxy
./install-all.sh

Using with Claude Code

In a new terminal, run:

export ANTHROPIC_BASE_URL=http://localhost:8082
export ANTHROPIC_API_KEY=pass
claude

If you want the proxy itself to require a client key, set PROXY_AUTH_KEY. ANTHROPIC_API_KEY is read only by the Claude Code client and no longer enables proxy authentication by default.

💡 Pro Tip: Add aliases to your ~/.zshrc for easier access (see Setup Guide)


🌌 VibeProxy + Antigravity (Free Premium Models)

The easiest way to use Claude Code with premium models for free:

  1. Install VibeProxy: Download
  2. Authenticate: Sign in with Google (Antigravity OAuth)
  3. Setup: python start_proxy.py --setup → Select "VibeProxy/Antigravity"

What you get:

  • 🧠 Claude Opus 4.5 with 128k thinking tokens
  • ⚡ Gemini 3 Pro/Flash
  • 📊 BIG/MIDDLE/SMALL model routing
  • 📈 Usage tracking & analytics
  • 💸 No API keys or billing required

✨ Features

| Feature | Description |
|---|---|
| Multi-Provider | OpenRouter, Gemini, OpenAI, Azure, Ollama, LM Studio |
| Model Routing | BIG/MIDDLE/SMALL tiers with intelligent routing |
| Web Dashboard | Real-time monitoring with 2025 glassmorphism UI |
| Crosstalk | Model-to-model conversations (up to 8 models) |
| Usage Tracking | Cost analytics and token metrics |
| Extended Thinking | Up to 128k thinking tokens for reasoning models |
| Prompt Injection | Add custom prompts showing routing info |
| Model Cascade | Automatic fallback on provider errors |
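Tiered routing of the kind described can be sketched as a small lookup keyed off the requested model name and the prompt size. The tier mapping and thresholds below are invented for illustration; they are not the proxy's shipped defaults, which you configure via --select-models.

```python
# Illustrative BIG/MIDDLE/SMALL routing sketch. The tier-to-model mapping
# and the 500-token threshold are made up for this example.

TIER_MODELS = {
    "BIG": "anthropic/claude-opus-4.5",
    "MIDDLE": "google/gemini-3-pro",
    "SMALL": "google/gemini-3-flash",
}

def route(requested_model: str, prompt_tokens: int) -> str:
    """Pick a tier from the requested Claude model name and prompt size."""
    if "haiku" in requested_model or prompt_tokens < 500:
        tier = "SMALL"          # cheap model for light requests
    elif "opus" in requested_model:
        tier = "BIG"            # strongest model for opus-class requests
    else:
        tier = "MIDDLE"         # sensible default for everything else
    return TIER_MODELS[tier]
```

Model Cascade then layers on top of this: if the routed provider errors, the request falls through to the next configured model.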

🎨 Web Dashboard

Access at http://localhost:8082 when running.

  • 2025 Glassmorphism UI with aurora gradients
  • Real-time request monitoring
  • Provider configuration
  • Model selection with hybrid routing
  • WebSocket live log streaming
  • Profile management

๐Ÿ—ฃ๏ธ Crosstalk

Multi-model conversations where AI models talk to each other:

# Launch visual TUI
python start_proxy.py --crosstalk-studio

# Quick setup
python start_proxy.py --crosstalk "claude-opus,gemini-pro" --topic "Explore consciousness"

Features:

  • Up to 8 models in a conversation
  • Multiple paradigms: relay, debate, memory, report
  • Jinja templates for message formatting
  • Backrooms-compatible import/export
  • MCP integration for programmatic access
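The relay paradigm, for example, amounts to passing a growing transcript around the ring of models. A minimal sketch, with `ask` standing in for a real provider call through the proxy:

```python
# Minimal sketch of the "relay" crosstalk paradigm: each model sees the
# transcript so far and appends one turn. `ask(model, context)` is a
# hypothetical stand-in for a real provider call through the proxy.

def relay(models, topic, rounds, ask):
    transcript = [("topic", topic)]
    for _ in range(rounds):
        for model in models:        # up to 8 models per conversation
            # Render the transcript as "speaker: text" lines for context.
            context = "\n".join(f"{who}: {text}" for who, text in transcript)
            reply = ask(model, context)
            transcript.append((model, reply))
    return transcript
```

The other paradigms vary what each model sees: debate pits positions against each other, memory carries per-model state across turns, and report has one model summarize the exchange.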

Read the Crosstalk Guide →


🛠️ CLI Commands

# Configuration
python start_proxy.py --setup           # First-time wizard
python start_proxy.py --settings        # Unified settings TUI
python start_proxy.py --doctor          # Health check + auto-fix

# Model management
python start_proxy.py --select-models   # Interactive model selector
python start_proxy.py --set-big MODEL   # Quick set BIG model
python start_proxy.py --show-models     # List available models

# Diagnostics
python start_proxy.py --config          # Show configuration
python start_proxy.py --dry-run         # Validate without starting
python start_proxy.py --analytics       # View usage stats

๐Ÿ“ Project Structure

the-ultimate-proxy/
├── start_proxy.py          # Main entry point
├── .env                    # Configuration
├── CROSSTALK.md            # Multi-model chat docs
│
├── src/
│   ├── core/              # Proxy core logic
│   ├── api/               # FastAPI routes + WebSocket
│   ├── services/          # Providers, models, prompts
│   └── cli/               # CLI tools and TUIs
│
├── configs/crosstalk/     # Crosstalk presets & templates
├── web-ui/                # Svelte + bits-ui dashboard
└── docs/                  # Extended documentation

๐Ÿ› Troubleshooting

401 Unauthorized

python start_proxy.py --doctor  # Auto-fix API keys

Model Not Found

python start_proxy.py --show-models  # List available models

Connection Refused

  • Ensure proxy is running: python start_proxy.py
  • Check port 8082 is not in use
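A quick programmatic way to check whether anything is listening on the proxy port (a hypothetical helper; `lsof -i :8082` does the same from a shell):

```python
# Hypothetical helper: report whether something is listening on a port.
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0
```

If this returns True but Claude Code still cannot connect, check that ANTHROPIC_BASE_URL points at the same host and port the proxy bound to.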

🗜️ Compression Stack

NEW (April 2026): Integrated compression layer for 95-99% cost savings

Quick Start

# Start compression stack
cs-start

# Use Claude with auto-compression
csi  # or csr to resume

# Quick stats
cs-stats-quick

Features

  • 97% compression rate (900 → 26 tokens)
  • GPU acceleration: NVIDIA CUDA, Intel Arc (Level Zero), AMD ROCm, or CPU-only
  • Multi-CLI support (Claude, Qwen, Codex, OpenCode, OpenClaw, Hermes)
  • Real-time dashboard (terminal + web at :8899)
  • Multi-tier compression: default (:8787) + small model (:8790)
  • Semantic caching: deduplicates repeated context across turns
  • RTK command compression: 2-line summaries for CLI output
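Semantic caching can be illustrated with a content-hash dedup: a context block that was already sent gets replaced by a short reference on later turns. This is a simplified sketch, not Headroom's actual scheme:

```python
# Simplified illustration of context deduplication by content hash.
# Headroom's real semantic cache is more sophisticated than this.
import hashlib

class ContextCache:
    def __init__(self):
        self.seen = {}  # hash key -> original block

    def dedupe(self, block: str) -> str:
        """Return the block the first time, a short reference after that."""
        key = hashlib.sha256(block.encode()).hexdigest()[:12]
        if key in self.seen:
            return f"[cached:{key}]"   # block was already sent upstream
        self.seen[key] = block
        return block
```

For large repeated blocks (file contents re-read across turns, unchanged tool output), the savings compound quickly, which is where the headline compression numbers come from.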

Architecture

Claude Code → Proxy (:8082) → Headroom (:8787) → OpenRouter
                           ↓
                    GPU: 92% VRAM
                    97% compression

Documentation

Low-VRAM / No-GPU Support

The installer handles this automatically:

  • Intel Arc A370M (4GB): Level Zero runtime, ONEAPI_DEVICE_SELECTOR=level_zero:0
  • CPU-only: fallback when no GPU is detected, runs entirely on CPU
  • WSL2 passthrough: GPU exposed via /dev/dxg from the Windows host

For manual configuration, see docs/GPU-OPTIMIZATION.md


🔮 Roadmap

See full roadmap: ROADMAP.md

Phase 1: Parallel Installation (April 2026) ✅ COMPLETE

  • Single install script with GPU auto-detection (install-all.sh v2.0)
  • Unified start/stop commands
  • Low-VRAM mode (Intel Arc A370M 4GB via Level Zero)
  • No-GPU mode (CPU-only fallback)
  • Network proxy access (HTTP/HTTPS)
  • Compression monitoring dashboard (:8899)
  • RTK integration (v0.34.3)

Phase 2: Tight Integration (May 2026)

  • Shared state management
  • Unified health monitoring
  • Log aggregation

Phase 3: Full Merger (Q3 2026)

  • Headroom as proxy middleware
  • Single unified proxy process
  • Resource optimization

The Ultimate Proxy • Made with ❤️ for the Claude Code community

Report Bug • Request Feature

About

ENHANCED Claude Code to OpenAI API Proxy
