docs: major update - models, DOSafe detection, changelog, smart routing

JOY · claude · JOY · commit e7e29169cd24 · 2026-04-12T21:05:03.000+07:00
- Fix changelog.md case (CHANGELOG.md -> changelog.md) for GitBook Linux
- Add 2026-04-09 to 2026-04-12 changelog entries (DOSRouter, cache-aware routing, logout fix)
- Update model catalog: add Llama 4 Maverick/Scout, dos-auto smart routing, embeddings
- Update pricing to include all current models
- Update DOSafe overview: add video/audio detection, face/voice verification endpoints
- Update DOSafe partner-api: add detect-video, detect-audio endpoints with schemas, update data source stats (1.2M -> 3.93M, 11 -> 19 scrapers)
- Rewrite README: add DOSClaw agents, DOSafe section, smart routing, full model table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/README.md b/README.md
@@ -1,12 +1,13 @@
 # DOS AI
 
-**Fast, affordable AI inference for open-source models.**
+**Fast, affordable AI inference and agent platform for open-source models.**
 
-DOS AI is an inference platform that lets you run leading open-source language models through a simple, OpenAI-compatible API. No GPU management, no infrastructure headaches -- just an API key and a few lines of code.
+DOS AI is an inference platform that lets you run leading open-source language models through a simple, OpenAI-compatible API. Deploy AI agents with DOSClaw, protect your users with DOSafe, and route intelligently with smart model selection -- all from a single platform.
 
 ## Why DOS AI?
 
 - **OpenAI-compatible** -- Swap your base URL and you're done. Works with the OpenAI Python SDK, Node.js SDK, LangChain, LlamaIndex, and any HTTP client.
+- **Smart routing** -- Use `dos-auto` to let our 15-dimension classifier pick the best model for each request automatically.
 - **Low latency** -- Models served on dedicated GPUs with optimized inference (vLLM). No cold starts, no queues.
 - **Pay-as-you-go** -- Only pay for the tokens you use. Every new account gets **$5 in free credits** to get started.
 - **Open-source models** -- Access the best open-source models without managing your own infrastructure.
@@ -24,7 +25,7 @@ client = OpenAI(
 )
 
 response = client.chat.completions.create(
-    model="dos-ai",
+    model="dos-auto",  # Smart routing picks the best model
     messages=[
         {"role": "user", "content": "Explain quantum computing in one paragraph."}
     ],
@@ -33,24 +34,58 @@ response = client.chat.completions.create(
 print(response.choices[0].message.content)
 ```
 
-## Documentation
+## Platform
+
+### LLM Inference API
+
+OpenAI-compatible API with smart routing, streaming, function calling, and structured outputs.
 
 | Section | Description |
 | --- | --- |
 | [Quickstart](getting-started/quickstart.md) | Create an account, get an API key, and make your first request |
 | [Authentication](getting-started/authentication.md) | API key management, rate limits, and security best practices |
+| [Available Models](models/available-models.md) | Full model catalog with pricing |
 | [OpenAI Compatibility](getting-started/openai-compatibility.md) | Migration guide and compatibility details |
 
+### DOSClaw Agents
+
+Deploy AI agents powered by [OpenClaw](https://github.com/nicejoy/openclaw) with Telegram, Discord, and WhatsApp integration. Each agent runs in its own container with web search, memory, video/music generation, and 5,000+ installable skills.
+
+- Create agents from the [dashboard](https://app.dos.ai/agents)
+- Choose from templates: Personal Assistant, Sales, Customer Support, Content Creator
+- Credit-based pricing with a free trial
+
+### DOSafe
+
+Safety and threat intelligence engine with AI detection capabilities.
+
+| Feature | Description |
+| --- | --- |
+| [Entity/URL Check](dosafe/overview.md) | Risk assessment against 3.93M+ threat intelligence entries |
+| [AI Text Detection](dosafe/partner-api.md) | Detect AI-generated text |
+| [AI Image Detection](dosafe/partner-api.md) | Detect AI-generated or manipulated images |
+| [AI Video Detection](dosafe/partner-api.md) | 7-layer pipeline for AI video detection |
+| [AI Audio Detection](dosafe/partner-api.md) | Detect AI-generated speech and voice clones |
+| [Face/Voice Verification](dosafe/partner-api.md) | Liveness detection and biometric matching |
+
 ## Available models
 
-| Model ID | Base model | Context length | Pricing |
+| Model ID | Base model | Context | Price per 1M tokens (input / output) |
 | --- | --- | --- | --- |
-| `dos-ai` | Qwen3.5-35B-A3B | 32,768 tokens | See [dashboard](https://app.dos.ai) |
+| `dos-auto` | Smart routing (auto-select) | Varies | Varies |
+| `dos-ai` | Qwen3.5-35B-A3B | 128K | $0.15 / $0.15 |
+| `llama-4-maverick` | Llama 4 Maverick 17B-128E | 1M | $0.17 / $0.66 |
+| `llama-4-scout` | Llama 4 Scout 17B-16E | 640K | $0.11 / $0.38 |
+| `deepseek-v3` | DeepSeek V3 | 128K | $0.25 / $0.25 |
+| `llama-3.3-70b` | Llama 3.3 70B | 128K | $0.20 / $0.20 |
+| `llama-3.1-8b` | Llama 3.1 8B | 128K | $0.05 / $0.05 |
 
-More models are added regularly. Check the [models endpoint](https://api.dos.ai/v1/models) or your dashboard for the latest list.
+More models are added regularly. Check the [catalog endpoint](https://api.dos.ai/v1/catalog) or the [dashboard](https://app.dos.ai/models) for the latest list.
 
 ## Links
 
 - **Dashboard**: [app.dos.ai](https://app.dos.ai)
 - **API base URL**: `https://api.dos.ai/v1`
+- **DOSafe**: [dosafe.io](https://dosafe.io)
 - **Status**: [status.dos.ai](https://status.dos.ai)
+- **Community**: [Telegram](https://t.me/dosai_community) | [Discord](https://discord.gg/dosai)
diff --git a/changelog.md b/changelog.md
@@ -9,6 +9,18 @@ Products: `dosclaw`, `dashboard`, `gateway`, `dosafe`, `inference`
 
 ---
 
+## 2026-04-12
+
+- **feature** [gateway] Cache-Aware Sticky Routing -- DOSRouter pins the routed model to the session once a single message exceeds 3K tokens or the cumulative context exceeds 5K tokens, maximizing provider-side prefix cache hits; the sticky TTL is set per provider (5 min for API providers, 10 min for self-hosted vLLM)
+- **feature** [gateway] Per-Provider Cache TTL -- Sticky routing TTL matches each provider's prefix cache lifetime: Anthropic/OpenAI/DeepSeek (5 min), vLLM/self-hosted (10 min); configurable via `providerCacheTTLMs` map
+- **fix** [dashboard] Cross-Account Logout Loop -- Logout now passes `prompt=login` to id.dos.me, forcing the login form instead of automatic SSO and preventing cross-account session loops
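The cache-aware sticky routing rule can be sketched as a small per-session pin table. This is a minimal illustration, not DOSRouter's code (which is written in Go): the class name, the provider keys, the `classify` callable, and the choice to extend the TTL on every pinned hit are all assumptions. Only the 3K/5K token thresholds and the 5/10-minute TTLs come from the entries above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Thresholds and TTLs from the changelog entries; all names here are hypothetical.
SINGLE_MESSAGE_THRESHOLD = 3_000  # tokens in the current message
CUMULATIVE_THRESHOLD = 5_000      # tokens across the whole session
# Seconds for readability; the gateway's `providerCacheTTLMs` map is in milliseconds.
PROVIDER_CACHE_TTL_S = {"anthropic": 300, "openai": 300, "deepseek": 300, "vllm": 600}


@dataclass
class StickyRouter:
    # session_id -> (model, provider, expires_at)
    pins: Dict[str, Tuple[str, str, float]] = field(default_factory=dict)

    def route(
        self,
        session_id: str,
        message_tokens: int,
        cumulative_tokens: int,
        classify: Callable[[], Tuple[str, str]],
        now: Optional[float] = None,
    ) -> str:
        now = time.monotonic() if now is None else now
        pin = self.pins.get(session_id)
        if pin is not None and pin[2] > now:
            model, provider, _ = pin
            # Assumption: each pinned hit keeps the provider's prefix cache warm,
            # so the sticky window is extended by one provider TTL.
            self.pins[session_id] = (model, provider, now + PROVIDER_CACHE_TTL_S[provider])
            return model
        # No live pin: let the (hypothetical) classifier pick a fresh model/provider.
        model, provider = classify()
        if message_tokens > SINGLE_MESSAGE_THRESHOLD or cumulative_tokens > CUMULATIVE_THRESHOLD:
            self.pins[session_id] = (model, provider, now + PROVIDER_CACHE_TTL_S[provider])
        return model
```

Short sessions below both thresholds are never pinned in this sketch, so they are re-classified on every request.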
+
+## 2026-04-11
+
+- **feature** [gateway] DOSRouter Upstream Sync to v0.12.146 -- 17/19 ClawRouter releases ported; includes usage cost breakdown, eco/premium tier fallback, session pinning, agentic 3-state, model roster updates
+- **feature** [gateway] DOSRouter Full Port Expansion -- Wallet module (EVM + Solana), payment module (x402 protocol), image generation endpoint, full CLI (serve, classify, models, stats, logs, cache, report, wallet, chain, doctor)
+- **feature** [gateway] DOSRouter Open-Sourced -- Standalone Go LLM router at github.com/DOS/DOSRouter with 15-dimension scoring, tier-based routing, structured fallback chains
+
 ## 2026-04-08
 
 - **feature** [dosclaw] OpenClaw v2026.4.5 — Major engine upgrade with video/music generation, enhanced memory, and improved channel experience
diff --git a/dosafe/overview.md b/dosafe/overview.md
@@ -9,6 +9,9 @@ DOSafe is the safety and threat intelligence engine for the DOS ecosystem. It ag
 - **URL check** -- Analyze a URL for phishing, scam, and malware indicators.
 - **AI text detection** -- Determine whether a piece of text was generated by AI.
 - **AI image detection** -- Determine whether an image was generated or manipulated by AI.
+- **AI video detection** -- Analyze video for AI-generated content using a 7-layer pipeline whose layers include frame analysis, temporal consistency, audio-visual sync, and LLM visual reasoning.
+- **AI audio detection** -- Detect AI-generated speech and voice clones using an ensemble of BEATs and mHuBERT models (AUROC 0.88).
+- **Face and voice verification** -- Liveness detection and face matching, plus voice enrollment and speaker verification, for identity checks.
 
 ## Supported Entity Types
 
@@ -33,6 +36,12 @@ All DOSafe endpoints use the base URL `https://api.dos.ai/v1/dosafe`.
 | POST | `/v1/dosafe/url-check` | URL/domain safety check |
 | POST | `/v1/dosafe/detect` | AI text detection |
 | POST | `/v1/dosafe/detect-image` | AI image detection |
+| POST | `/v1/dosafe/detect-video` | AI video detection |
+| POST | `/v1/dosafe/detect-audio` | AI audio/voice detection |
+| POST | `/v1/dosafe/face/enroll` | Face enrollment for verification |
+| POST | `/v1/dosafe/face/verify` | Face liveness + match verification |
+| POST | `/v1/dosafe/voice/enroll` | Voice enrollment for speaker ID |
+| POST | `/v1/dosafe/voice/verify` | Voice speaker verification |
 
 ## Authentication
 
diff --git a/dosafe/partner-api.md b/dosafe/partner-api.md
@@ -8,15 +8,16 @@
 
 ## Overview
 
-The DOSafe API is the unified safety gateway for the DOS ecosystem. A single API key grants access to all DOSafe services — entity/URL safety checks, AI text/image detection, and community reporting — with scopes controlling which capabilities are available.
+The DOSafe API is the unified safety gateway for the DOS ecosystem. A single API key grants access to all DOSafe services — entity/URL safety checks, AI text/image/video/audio detection, face and voice verification, and community reporting — with scopes controlling which capabilities are available.
 
 ### Data Sources (Safety Check)
 
 | Source | Weight | Description |
 |--------|--------|-------------|
-| DOSafe DB | Highest | 1.2M+ entries from 11 scrapers (phishing, scam, malware, wallets) |
+| DOSafe DB | Highest | 3.93M+ entries from 19 scrapers (phishing, scam, malware, wallets) |
 | DOS Chain | High | Immutable on-chain attestations via EAS |
 | DOS.Me Identity | Moderate | Member trust score, verified providers, flagged status |
+| Web Analysis | Moderate | Real-time web search + LLM-powered risk analysis |
 
 **Architecture:** DOSafe is the safety engine and public gateway. DOS.Me is an identity data provider — external services call DOSafe, not DOS.Me.
 
@@ -49,9 +50,11 @@ Keys are stored as SHA-256 hashes in `dosafe.api_keys`. Plaintext is never persi
 | `check` | `POST /check` |
 | `bulk` | `POST /check/bulk` |
 | `report` | `POST /report` |
-| `detect` | `POST /detect`, `POST /detect-image` |
+| `detect` | `POST /detect`, `POST /detect-image`, `POST /detect-video`, `POST /detect-audio` |
 | `url-check` | `POST /url-check` |
 | `entity-check` | `POST /entity-check` |
+| `face` | `POST /face/enroll`, `POST /face/verify` |
+| `voice` | `POST /voice/enroll`, `POST /voice/verify` |
 
 A key can have multiple scopes. Contact the DOSafe team to provision a key with required scopes.
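The scope table reduces to a lookup from scope to endpoints. Below is a minimal sketch of how a gateway could authorize a request, assuming a key store indexed by the SHA-256 hex digest of the plaintext key (in line with keys being stored only as SHA-256 hashes). `hash_key`, `is_authorized`, the hex encoding, and the store's shape are hypothetical, not the DOSafe implementation.

```python
import hashlib
from typing import Dict, Set

# Scope -> endpoints, transcribed from the scope table.
SCOPE_ENDPOINTS: Dict[str, Set[str]] = {
    "check": {"POST /check"},
    "bulk": {"POST /check/bulk"},
    "report": {"POST /report"},
    "detect": {"POST /detect", "POST /detect-image", "POST /detect-video", "POST /detect-audio"},
    "url-check": {"POST /url-check"},
    "entity-check": {"POST /entity-check"},
    "face": {"POST /face/enroll", "POST /face/verify"},
    "voice": {"POST /voice/enroll", "POST /voice/verify"},
}


def hash_key(plaintext: str) -> str:
    # Only the digest is ever stored or compared; the plaintext is discarded.
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def is_authorized(plaintext_key: str, endpoint: str, key_store: Dict[str, Set[str]]) -> bool:
    """key_store maps SHA-256 hex digest -> granted scopes (hypothetical shape)."""
    scopes = key_store.get(hash_key(plaintext_key))
    if scopes is None:
        return False  # unknown key
    # A key with several scopes may call any endpoint covered by one of them.
    return any(endpoint in SCOPE_ENDPOINTS.get(scope, set()) for scope in scopes)
```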
 
@@ -273,6 +276,59 @@ AI image detection. Combines C2PA, EXIF/DCT metadata, reverse image search, and
 
 ---
 
+### `POST /detect-video`
+
+**Scope:** `detect`
+
+AI video detection. Uses a 7-layer pipeline whose layers include frame-level AI detection, temporal consistency analysis, audio-visual synchronization, and LLM visual reasoning.
+
+**Request:** `multipart/form-data` with `video` field (MP4/MOV/WEBM, max 100MB), or JSON `{ "url": "..." }`.
+
+**Response:**
+```json
+{
+  "aiProbability": 78,
+  "verdict": "AI",
+  "confidence": "medium",
+  "signals": {
+    "frameAnalysis": 0.82,
+    "temporalConsistency": 0.71,
+    "audioSync": 0.65,
+    "llmVisual": 0.85
+  },
+  "framesAnalyzed": 24,
+  "duration": 15.2
+}
+```
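For the JSON form of the request, a call can be assembled with the Python standard library. This sketch assumes Bearer authentication as in the inference API (check the DOSafe Authentication section for the exact header) and builds the URL from the documented base `https://api.dos.ai/v1/dosafe`; `build_detect_request` and `send` are hypothetical helpers. `/detect-audio` documents the same `{ "url": "..." }` body, so the helper covers both.

```python
import json
import urllib.request

DOSAFE_BASE_URL = "https://api.dos.ai/v1/dosafe"


def build_detect_request(kind: str, media_url: str, api_key: str) -> urllib.request.Request:
    """Build a JSON-body POST for /detect-video or /detect-audio."""
    if kind not in ("video", "audio"):
        raise ValueError("kind must be 'video' or 'audio'")
    body = json.dumps({"url": media_url}).encode("utf-8")
    return urllib.request.Request(
        f"{DOSAFE_BASE_URL}/detect-{kind}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    # Generous timeout on the assumption that media analysis can be slow.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

Calling `send(build_detect_request("video", url, key))` requires a key with the `detect` scope.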
+
+---
+
+### `POST /detect-audio`
+
+**Scope:** `detect`
+
+AI audio/voice detection. Uses an ensemble of BEATs and mHuBERT models to detect AI-generated speech and voice clones.
+
+**Request:** `multipart/form-data` with `audio` field (WAV/MP3/OGG/FLAC, max 50MB), or JSON `{ "url": "..." }`.
+
+**Response:**
+```json
+{
+  "aiProbability": 91,
+  "verdict": "AI",
+  "confidence": "high",
+  "signals": {
+    "beats": 0.93,
+    "mhubert": 0.89,
+    "ensemble": 0.91
+  },
+  "hasSpeech": true,
+  "duration": 8.5
+}
+```
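The video and audio detection responses share the top-level fields `aiProbability`, `verdict`, `confidence`, and `signals`. A client usually has to turn those into an action. The sketch below does that with a hypothetical policy: the action names and the 50-point threshold are illustrative choices, not part of the API, and treating `hasSpeech: false` as inconclusive is an assumption based on the field name.

```python
import json
from typing import Optional


def summarize_detection(raw: str) -> dict:
    """Map a /detect-video or /detect-audio response to a triage action (hypothetical policy)."""
    result = json.loads(raw)
    if result.get("hasSpeech") is False:
        action = "inconclusive"  # audio with no speech: nothing to judge
    elif result.get("verdict") == "AI" and result.get("confidence") == "high":
        action = "block"
    elif result.get("aiProbability", 0) >= 50:
        action = "review"  # likely AI, but not high confidence
    else:
        action = "allow"
    signals = result.get("signals", {})
    # Name of the individual detector with the highest score, useful for reviewers.
    strongest: Optional[str] = max(signals, key=signals.get) if signals else None
    return {"action": action, "aiProbability": result.get("aiProbability"), "strongestSignal": strongest}
```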
+
+---
+
 ### `POST /url-check`
 
 **Scope:** `url-check`
diff --git a/models/available-models.md b/models/available-models.md
@@ -1,35 +1,69 @@
 # Available Models
 
-DOS AI serves high-quality open-source LLMs via an OpenAI-compatible API. All models run on dedicated RTX Pro 6000 GPUs with 96 GB VRAM, ensuring fast inference and low latency from our Asia-Southeast 1 region.
+DOS AI serves high-quality open-source LLMs via an OpenAI-compatible API. Self-hosted models run on dedicated RTX Pro 6000 GPUs with 96 GB VRAM in Asia-Southeast 1. Cloud models are served via partner providers for maximum coverage.
+
+## Smart Routing
+
+Use `dos-auto` as the model ID to let DOS AI automatically select the best model for each request. Smart routing uses a 15-dimension classifier to analyze your prompt and route to the optimal model based on task complexity, cost, and latency.
+
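+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="https://api.dos.ai/v1", api_key="YOUR_API_KEY")
+
+response = client.chat.completions.create(
+    model="dos-auto",  # Smart routing picks the best model
+    messages=[{"role": "user", "content": "..."}],
+)
+print(response.choices[0].message.content)
+```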
 
 ## Model Catalog
 
-| Model | Provider | Type | Context Window | Input Price | Output Price |
-| ----- | -------- | ---- | -------------- | ----------- | ------------ |
-| **Qwen3.5-35B-A3B** | Alibaba | Chat | 128K tokens | $0.15 / 1M tokens | $0.15 / 1M tokens |
-| **Llama 3.3 70B** | Meta | Chat | 128K tokens | $0.20 / 1M tokens | $0.20 / 1M tokens |
-| **DeepSeek V3** | DeepSeek | Chat | 128K tokens | $0.25 / 1M tokens | $0.25 / 1M tokens |
-| **Llama 3.1 8B** | Meta | Chat | 128K tokens | $0.05 / 1M tokens | $0.05 / 1M tokens |
+### Self-Hosted (Lowest Latency)
+
+| Model | Provider | Context | Input | Output | Model ID |
+| ----- | -------- | ------- | ----- | ------ | -------- |
+| **Qwen3.5-35B-A3B** | Alibaba | 128K | $0.15 / 1M | $0.15 / 1M | `dos-ai` |
+
+### Cloud Models
+
+| Model | Provider | Context | Input | Output | Model ID |
+| ----- | -------- | ------- | ----- | ------ | -------- |
+| **Llama 4 Maverick 17B-128E** | Meta / DeepInfra | 1M | $0.17 / 1M | $0.66 / 1M | `llama-4-maverick` |
+| **Llama 4 Scout 17B-16E** | Meta / DeepInfra | 640K | $0.11 / 1M | $0.38 / 1M | `llama-4-scout` |
+| **DeepSeek V3** | DeepSeek | 128K | $0.25 / 1M | $0.25 / 1M | `deepseek-v3` |
+| **Llama 3.3 70B** | Meta | 128K | $0.20 / 1M | $0.20 / 1M | `llama-3.3-70b` |
+| **Llama 3.1 8B** | Meta | 128K | $0.05 / 1M | $0.05 / 1M | `llama-3.1-8b` |
 
-> All prices are in USD. See [Pricing](pricing.md) for details on billing, free tier, and volume discounts.
+> All prices are in USD. The catalog is database-driven and updated independently of these docs; new models are added regularly. Check `GET /v1/catalog` or the [dashboard](https://app.dos.ai/models) for the latest list. See [Pricing](pricing.md) for billing details.
+
+### Embedding Models
+
+| Model | Provider | Dimensions | Model ID |
+| ----- | -------- | ---------- | -------- |
+| **Qwen3-Embedding-4B AWQ** | Alibaba / Self-hosted | 2560 | `qwen3-embedding-4b` |
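Embedding vectors are typically compared with cosine similarity. A minimal standard-library sketch, assuming you have already retrieved `qwen3-embedding-4b` vectors (how they are requested is not covered here); the 2560 dimension comes from the table, while `cosine_similarity` and `check_embedding` are hypothetical helper names.

```python
import math
from typing import Sequence

EMBEDDING_DIM = 2560  # qwen3-embedding-4b, per the embedding table


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def check_embedding(vec: Sequence[float]) -> None:
    # Catch vectors from a different model before mixing them in one index.
    if len(vec) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vec)}")
```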
 
 ## Model Details
 
-### Qwen3.5-35B-A3B
+### Qwen3.5-35B-A3B (default)
 
 Alibaba's Mixture-of-Experts model with 35 billion total parameters and 3 billion active parameters per forward pass. This architecture delivers excellent quality at remarkably low cost and latency, making it our **recommended default model** for most use cases.
 
 - **Best for**: General-purpose chat, code generation, reasoning, multilingual tasks
 - **Strengths**: Outstanding cost-efficiency, fast response times, strong multilingual support (especially CJK languages)
 - **Model ID**: `dos-ai`
 
-### Llama 3.3 70B
+### Llama 4 Maverick 17B-128E
 
-Meta's flagship 70-billion-parameter dense model. Offers top-tier reasoning and instruction-following capabilities.
+Meta's Mixture-of-Experts model with 17 billion active parameters and 128 experts. Strong reasoning and multilingual capabilities, served here with a 1 million token context window.
 
-- **Best for**: Complex reasoning, long-form content, detailed analysis
-- **Strengths**: Strong English performance, excellent instruction following, robust safety tuning
-- **Model ID**: `llama-3.3-70b`
+- **Best for**: Complex reasoning, long-context analysis, multilingual tasks
+- **Strengths**: Largest context window in this catalog (1M tokens), efficient MoE architecture
+- **Model ID**: `llama-4-maverick`
+
+### Llama 4 Scout 17B-16E
+
+Meta's efficient MoE model with 17 billion active parameters and 16 experts. Fast and cost-effective for everyday tasks with a 640K context window.
+
+- **Best for**: Everyday tasks, fast responses, cost-sensitive workloads
+- **Strengths**: Good balance of speed and quality, large context window
+- **Model ID**: `llama-4-scout`
 
 ### DeepSeek V3
 
@@ -39,6 +73,14 @@ DeepSeek's latest Mixture-of-Experts model, known for strong performance across
 - **Strengths**: Competitive benchmark scores, good at structured/JSON output, strong code capabilities
 - **Model ID**: `deepseek-v3`
 
+### Llama 3.3 70B
+
+Meta's 70-billion-parameter dense model. Offers top-tier reasoning and instruction-following capabilities.
+
+- **Best for**: Complex reasoning, long-form content, detailed analysis
+- **Strengths**: Strong English performance, excellent instruction following, robust safety tuning
+- **Model ID**: `llama-3.3-70b`
+
 ### Llama 3.1 8B
 
 Meta's efficient 8-billion-parameter model. An excellent choice when you need fast, affordable responses and the task does not require the full capability of a larger model.
@@ -51,11 +93,13 @@ Meta's efficient 8-billion-parameter model. An excellent choice when you need fa
 
 | Use Case | Recommended Model | Why |
 | -------- | ----------------- | --- |
+| Let DOS AI decide | `dos-auto` | Smart routing picks the best model per request |
 | General assistant / chatbot | Qwen3.5-35B-A3B | Best balance of quality, speed, and cost |
-| Complex analysis / long documents | Llama 3.3 70B | Strongest reasoning for demanding tasks |
+| Long-context analysis (beyond 128K tokens) | Llama 4 Maverick | 1M context window, strong reasoning |
+| Complex reasoning / analysis | Llama 3.3 70B | Dense model, top reasoning capability |
 | Code generation / math | DeepSeek V3 | Top coding and math benchmark scores |
 | High-volume / low-cost tasks | Llama 3.1 8B | Fastest and cheapest option |
-| Multilingual (especially Asian languages) | Qwen3.5-35B-A3B | Superior CJK language performance |
+| Multilingual (CJK languages) | Qwen3.5-35B-A3B | Superior CJK language performance |
 
 ## Listing Models via API
 
@@ -66,8 +110,11 @@ curl https://api.dos.ai/v1/models \
   -H "Authorization: Bearer YOUR_API_KEY"
 ```
 
-See the [Models API reference](../api-reference/models.md) for the full response schema.
+For the full retail catalog with pricing and metadata:
 
-## Coming Soon
+```bash
+curl https://api.dos.ai/v1/catalog \
+  -H "Authorization: Bearer YOUR_API_KEY"
+```
 
-We are continuously evaluating and adding new models. Upcoming additions may include vision models, embedding models, and larger reasoning models. Check back regularly or follow our announcements for updates.
+See the [Models API reference](../api-reference/models.md) for the full response schema.
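The same listing from Python, using only the standard library. This assumes `/v1/models` returns the standard OpenAI list shape (`{"object": "list", "data": [{"id": ...}, ...]}`), which follows from the API's OpenAI compatibility but should be checked against the Models API reference; `parse_model_ids` and `list_model_ids` are hypothetical helper names.

```python
import json
import urllib.request
from typing import List

BASE_URL = "https://api.dos.ai/v1"


def parse_model_ids(payload: dict) -> List[str]:
    """Extract sorted model IDs from an OpenAI-style list response."""
    return sorted(item["id"] for item in payload.get("data", []))


def list_model_ids(api_key: str) -> List[str]:
    request = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_model_ids(json.load(response))
```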
diff --git a/models/pricing.md b/models/pricing.md
@@ -21,11 +21,15 @@ Pricing is calculated per **1 million tokens** (both input and output).
 
 | Model | Input Price (per 1M tokens) | Output Price (per 1M tokens) |
 | ----- | --------------------------- | ---------------------------- |
-| **Qwen3.5-35B-A3B** | $0.15 | $0.15 |
-| **Llama 3.3 70B** | $0.20 | $0.20 |
+| **Qwen3.5-35B-A3B** (default) | $0.15 | $0.15 |
+| **Llama 4 Maverick 17B-128E** | $0.17 | $0.66 |
+| **Llama 4 Scout 17B-16E** | $0.11 | $0.38 |
 | **DeepSeek V3** | $0.25 | $0.25 |
+| **Llama 3.3 70B** | $0.20 | $0.20 |
 | **Llama 3.1 8B** | $0.05 | $0.05 |
 
+> Prices are database-driven and may change independently of these docs. Check the [dashboard](https://app.dos.ai/models) or `GET /v1/catalog` for the latest pricing.
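Because input and output are priced separately per 1M tokens, a request costs `input_tokens * input_price / 1M + output_tokens * output_price / 1M`. A minimal sketch using the prices in the table, keyed by the model IDs from the model catalog; it hard-codes a snapshot, so prefer the live catalog for real estimates. `dos-auto` is left out because its price depends on the routed model, and `estimate_cost` is a hypothetical helper.

```python
from typing import Dict, Tuple

# (input, output) USD per 1M tokens, snapshot of the pricing table.
PRICES_PER_1M: Dict[str, Tuple[float, float]] = {
    "dos-ai": (0.15, 0.15),
    "llama-4-maverick": (0.17, 0.66),
    "llama-4-scout": (0.11, 0.38),
    "deepseek-v3": (0.25, 0.25),
    "llama-3.3-70b": (0.20, 0.20),
    "llama-3.1-8b": (0.05, 0.05),
}


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request; raises KeyError for unknown models."""
    input_price, output_price = PRICES_PER_1M[model_id]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, 10,000 input and 2,000 output tokens on `llama-4-maverick` come to $0.0017 + $0.00132 = $0.00302.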
+
 ### What is a Token?
 
 A token is roughly 3-4 characters of English text, or about 0.75 words. For example: