Commit 2286e4b

JOY and claude committed
docs: add two-layer billing architecture and LLM fallback details
Document the billing separation: application layer (request-based via consume_quota) vs api.dos.ai gateway (token-based for external consumers). Add INTERNAL_API_KEY and LLM fallback cost tracking documentation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent b3e191a commit 2286e4b

1 file changed

Lines changed: 36 additions & 4 deletions

DOSafe-Architecture.md

@@ -1,6 +1,6 @@
 # DOSafe - System Architecture

-**Updated:** 2026-03-10
+**Updated:** 2026-03-14
 **Status:** AI Detection COMPLETE. Threat Intel Pipeline COMPLETE (1.52M+ entries, 13 sources). Risk Scoring V2 COMPLETE. Chrome Extension Protection v0.5.4 COMPLETE. Audio/Video TODO.

 **Implementation ownership:** Claude is the primary coding agent for architecture changes; this document is the handoff/reference source for Claude-first implementation.
@@ -103,7 +103,7 @@ Written to by: DOS-Me Trust API (`/trust/flags/:id/attest`). Read by: DOSafe (`d
 | `api.dos.ai` | Qwen3.5-35B-A3B-GPTQ-Int4 | RTX Pro 6000 (96GB) | Scorer - LLM rubric + image analysis |
 | `inference-ref.dos.ai` | Qwen3-8B base | RTX 5090 (32GB) | Observer - Binoculars cross-entropy |

-Both models are natively multimodal (text + image). Auth: `DOS_INFERENCE_API_KEY`.
+Both models are natively multimodal (text + image). Auth: `INTERNAL_API_KEY` via `api.dos.ai` gateway (bypasses billing). Fallback: Alibaba Cloud `qwen3.5-flash` when vLLM is unavailable.

 ---
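
The fallback this commit documents (self-hosted vLLM first, Alibaba Cloud `qwen3.5-flash` when it is unavailable, with a structured `llm_fallback_used` log entry) could look roughly like the TypeScript sketch below. It is not taken from `entity-web-search.ts`: the backend function types, every log field other than `event`, the provider label, and the price parameter are assumptions made for illustration.

```typescript
// Sketch of primary-then-fallback LLM calls with structured cost logging.
// Hypothetical names throughout; only the `event` value comes from the commit.

interface LlmResult {
  text: string;
  totalTokens: number;
}

// Any function that turns a prompt into a result (vLLM client, cloud client, ...).
type LlmBackend = (prompt: string) => Promise<LlmResult>;

interface FallbackLogEntry {
  event: "llm_fallback_used";
  provider: string;
  totalTokens: number;
  estimatedCostUsd: number;
}

// Build the structured log entry; the price is an input because the real rate
// is not stated in the source.
function buildFallbackLog(
  provider: string,
  totalTokens: number,
  usdPerMillionTokens: number,
): FallbackLogEntry {
  return {
    event: "llm_fallback_used",
    provider,
    totalTokens,
    estimatedCostUsd: (totalTokens / 1000000) * usdPerMillionTokens,
  };
}

// Try the primary backend; on any error, use the fallback and log its cost.
async function completeWithFallback(
  prompt: string,
  primary: LlmBackend,
  fallback: LlmBackend,
  log: (line: string) => void,
  usdPerMillionTokens: number,
): Promise<LlmResult> {
  try {
    return await primary(prompt);
  } catch (_err) {
    const result = await fallback(prompt);
    const entry = buildFallbackLog("alibaba-qwen3.5-flash", result.totalTokens, usdPerMillionTokens);
    log(JSON.stringify(entry));
    return result;
  }
}
```

Passing the logger in as a parameter keeps the sketch testable; in a Vercel function it would presumably be `console.log`. The log entry is for internal cost monitoring only and does not feed back into what the user is charged.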

@@ -144,6 +144,38 @@ DOS.AI exception: uses Firebase Auth (legacy) + Supabase JWT for billing

 DOS.AI dashboard manages API keys via `api.dos.ai/dashboard/*` (internal `X-Dashboard-Secret`). Enterprise clients use `dos_sk_*` keys to call `api.dos.ai/v1/*` which routes to inference.

+### Billing Architecture - Two-Layer Model
+
+```
+┌─────────────────────────────────────────────────────────┐
+│  Application Layer (request-based billing)              │
+│                                                         │
+│  dosafe-telegram  → consume_quota() per request         │
+│  dosafe.io web    → dosafe_usage per request            │
+│  chrome extension → anonymous quota (IP-based)          │
+│                                                         │
+│  Knows: who the user is, what tier/plan they're on      │
+│  Charges: per request, regardless of token count        │
+└──────────────────────┬──────────────────────────────────┘
+                       ↓
+┌─────────────────────────────────────────────────────────┐
+│  api.dos.ai Gateway (token-based billing)               │
+│                                                         │
+│  INTERNAL_API_KEY → skip billing (app layer handles)    │
+│  dos_sk_xxx       → deductBalance per token             │
+│                                                         │
+│  Knows: key type, token usage, model used               │
+│  Charges: per token for external API key holders        │
+└─────────────────────────────────────────────────────────┘
+```
+
+**Key principle:** Application-layer products (Telegram bot, web app, extension) handle their own user-facing billing via Supabase RPC (`consume_quota`). They access `api.dos.ai` with `INTERNAL_API_KEY` which bypasses the gateway's billing - because billing already happened upstream.
+
+External API consumers (`dos_sk_xxx` keys) are billed per-token at the gateway level, since they call `api.dos.ai` directly without an application layer.
+
+**LLM Fallback & Cost Tracking:**
+When self-hosted vLLM is unavailable, `entity-web-search.ts` falls back to paid providers (Alibaba Cloud qwen3.5-flash). Fallback usage is logged to Vercel console as structured JSON (`event: llm_fallback_used`) with token count and estimated cost for internal cost monitoring. User-facing billing remains request-based regardless of which LLM backend served the request.
+
 ### DOSafe consumes Supabase directly

 DOSafe reads/writes `dosafe.*` schema tables directly with `SUPABASE_SERVICE_ROLE_KEY`. The `dosai.dosafe_usage` table (quota tracking) is also written by DOSafe.
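
The gateway half of the two-layer model in this hunk reduces to a decision on key type: the internal key skips billing, and `dos_sk_*` keys are charged per token. The TypeScript sketch below illustrates that rule only. `decideBilling`, the `Usage` shape, and the flat price constant are hypothetical, and the gateway's real `deductBalance` logic is not shown in the source.

```typescript
// Hypothetical sketch of the gateway-side billing decision.
// Only INTERNAL_API_KEY, the dos_sk_ prefix and the per-token rule come from
// the commit; everything else is an assumption.

interface Usage {
  promptTokens: number;
  completionTokens: number;
}

type BillingDecision =
  | { mode: "skip"; reason: string }
  | { mode: "per_token"; tokensBilled: number; costUsd: number };

// Assumed flat price; the real rate is not given in the source.
const ASSUMED_USD_PER_MILLION_TOKENS = 0.5;

function decideBilling(apiKey: string, internalKey: string, usage: Usage): BillingDecision {
  // Internal products already charged the user per request upstream.
  // (A production gateway would compare secrets in constant time.)
  if (apiKey === internalKey) {
    return { mode: "skip", reason: "billed at application layer" };
  }
  // External consumers call the gateway directly and pay per token.
  if (apiKey.indexOf("dos_sk_") === 0) {
    const tokensBilled = usage.promptTokens + usage.completionTokens;
    const costUsd = (tokensBilled / 1000000) * ASSUMED_USD_PER_MILLION_TOKENS;
    return { mode: "per_token", tokensBilled, costUsd };
  }
  throw new Error("unrecognized API key type");
}
```

Keeping the decision a pure function of key and usage means the same request is never charged at both layers: the application layer charges per request, and the gateway only charges callers that have no application layer in front of them.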
@@ -419,8 +451,8 @@ Input URL
 └── 5. Cache runtime result (fire-and-forget, 7-day TTL)

 Extension fast path (X-Client-Type: extension):
-Skips: WHOIS, web search, LLM, on-chain, session check, quota
-Only: DB lookup + Safe Browsing + trusted domain + typosquatting
+Skips: Safe Browsing API, WHOIS, web search, LLM, on-chain, session check, quota
+Only: DB lookup + trusted domain whitelist + typosquatting detection + cached scores
 ```

 **Response:** `riskScore`, `riskLevel`, `confidence`, `riskSignals[]`, `webAnalysis`, `llmSummary`, `typosquatting`, `threatIntel`, `onChain`
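
As a rough illustration of the updated extension fast path, the TypeScript sketch below picks the set of checks from the `X-Client-Type` header. The check identifiers paraphrase the new `Skips:`/`Only:` lines; `selectChecks` and its types are hypothetical. It also assumes that the full path runs every check in both lists, which the visible part of the diff does not confirm.

```typescript
// Sketch: choose which checks run for a request, based on X-Client-Type.
// Hypothetical names; not the project's actual implementation.

type Check =
  | "db_lookup"
  | "trusted_domain_whitelist"
  | "typosquatting"
  | "cached_scores"
  | "safe_browsing"
  | "whois"
  | "web_search"
  | "llm"
  | "on_chain"
  | "session_check"
  | "quota";

// What the extension fast path still does after this commit.
const FAST_PATH_CHECKS: readonly Check[] = [
  "db_lookup",
  "trusted_domain_whitelist",
  "typosquatting",
  "cached_scores",
];

// What the fast path skips. Assumption: the full path runs these as well.
const SKIPPED_ON_FAST_PATH: readonly Check[] = [
  "safe_browsing",
  "whois",
  "web_search",
  "llm",
  "on_chain",
  "session_check",
  "quota",
];

function selectChecks(headers: Record<string, string | undefined>): Check[] {
  // Assumes header names are already lower-cased, as Node's http module does.
  const clientType = (headers["x-client-type"] || "").toLowerCase();
  if (clientType === "extension") {
    return FAST_PATH_CHECKS.slice();
  }
  return FAST_PATH_CHECKS.concat(SKIPPED_ON_FAST_PATH);
}
```

With this change Safe Browsing moves from the "only" list to the "skips" list, so under these assumptions the fast path makes no external API call at all and relies on local lookups and cached scores.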
