Commit 5c86a97

JOY and claude committed
docs: update threat intel stats for ScamSniffer integration (636k+ entries)
- Add ScamSniffer as active source (343k domains + 2.5k wallets)
- Update DB stats from 255k to 636k+ entries
- Add ChongLuaDao as static import source
- Add Caller ID / iCallMe-like feature to roadmap
- Update architecture diagram and performance table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent ae7e4e7 commit 5c86a97

2 files changed

Lines changed: 15 additions & 7 deletions

DOSafe-Architecture.md

Lines changed: 5 additions & 4 deletions
@@ -197,7 +197,7 @@ Quota: 20 checks/day per chat
 Detailed in [threat-intel.md](threat-intel.md).
 
 **Summary:**
-- **255k+ entries** from MetaMask (233k domains), URLhaus (22k URLs), OpenPhish (300 URLs)
+- **636k+ entries** from ScamSniffer (343k domains + 2.5k wallets), MetaMask (233k domains), ChongLuaDao (34k domains, static), URLhaus (22k URLs), OpenPhish (300 URLs)
 - **Schema:** `dosafe.threat_intel` in `dosafe` schema (separate from `public`)
 - **Sync:** Edge Function via pg_cron every 6 hours, DB-side SHA-256 hashing
 - **Lookup:** Hash entity value → query `entity_hash` index → aggregate multi-source signals
@@ -225,7 +225,7 @@ Other products (Rate.Box, Bexly) call `api.dos.me/trust/check` — they never qu
 
 | Table | Schema | Purpose |
 |-------|--------|---------|
-| `threat_intel` | dosafe | Unified threat data (255k+ entries) |
+| `threat_intel` | dosafe | Unified threat data (636k+ entries) |
 | `threat_clusters` | dosafe | Scammer group linking |
 | `sync_log` | dosafe | Sync health monitoring |
 | `bot_quota` | public | Telegram bot daily limits |
@@ -313,7 +313,7 @@ DOSAFE_API_URL=https://dosafe.io
 - [x] URL/domain scam check with on-chain integration
 - [x] Telegram bot with bilingual support
 - [x] Chrome extension
-- [x] Threat intelligence pipeline (255k+ entries, 6h sync)
+- [x] Threat intelligence pipeline (636k+ entries, 6h sync)
 - [x] Quota system (anonymous + authenticated)
 
 ### In Progress
@@ -324,6 +324,7 @@ DOSAFE_API_URL=https://dosafe.io
 - [ ] User report command (/report) with LLM entity extraction
 - [ ] Entity clustering (auto-link related scammer identities)
 - [ ] Sync confirmed flags to DOS.Me Trust API
+- [ ] Caller ID / spam phone lookup (iCallMe-like feature)
 - [ ] Audio detection pipeline (TTS/voice cloning)
 - [ ] Video detection pipeline (deepfake)
-- [ ] Vietnamese-specific threat sources (chongluadao.vn)
+- [ ] Vietnamese-specific threat sources (kiemtraluadao.vn, checkscam.vn)
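
The lookup path named in the **Lookup** bullet of this file's diff (hash entity value → query `entity_hash` index → aggregate multi-source signals) can be sketched in Python. This is a minimal illustration under stated assumptions, not DOSafe's implementation: an in-memory dict stands in for the `entity_hash` index, and `ThreatRow`, `hash_entity`, `build_index`, `check`, the strip/lowercase normalization, and the max-risk aggregation are all hypothetical. Only the use of SHA-256 and the risk values (90 for MetaMask, 85 for ScamSniffer) come from the commit.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatRow:
    """One row of a threat table (hypothetical shape, for illustration)."""
    entity_hash: str
    source: str
    category: str
    risk: int


def hash_entity(value: str) -> str:
    # Assumption: entities are trimmed and lowercased before hashing.
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(rows):
    # Stand-in for the database index on entity_hash: hash -> matching rows.
    index = {}
    for row in rows:
        index.setdefault(row.entity_hash, []).append(row)
    return index


def check(index, value):
    # Hash the value, look it up, then combine signals from every source.
    hits = index.get(hash_entity(value), [])
    return {
        "listed": bool(hits),
        "sources": sorted({hit.source for hit in hits}),
        # Assumption: the aggregate risk is the highest per-source risk.
        "risk": max((hit.risk for hit in hits), default=0),
    }


if __name__ == "__main__":
    h = hash_entity("evil.example")
    index = build_index([
        ThreatRow(h, "metamask", "phishing", 90),
        ThreatRow(h, "scamsniffer", "scam", 85),
    ])
    print(check(index, "  EVIL.example "))
    print(check(index, "benign.example"))
```

Keying the lookup on a fixed-length hash means one index serves domains, URLs, and wallets alike, whatever the length of the original value.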

threat-intel.md

Lines changed: 10 additions & 3 deletions
@@ -8,7 +8,8 @@
 DOSafe aggregates threat data from multiple external sources into a unified Supabase database (`dosafe.threat_intel`), enabling instant DB-first lookups for URL/phone/entity scam checks with runtime fallback and automatic caching.
 
 **Key stats:**
-- **255,000+ entries** from 3 sources (MetaMask, URLhaus, OpenPhish)
+- **636,000+ entries** from 5 sources (ScamSniffer, MetaMask, ChongLuaDao, URLhaus, OpenPhish)
+- **Entity types:** domains (611k), URLs (22k), wallets (2.5k)
 - **Sync cadence:** Every 6 hours via pg_cron → Edge Function
 - **Lookup speed:** <10ms (SHA-256 hash index)
 
@@ -19,6 +20,7 @@ DOSafe aggregates threat data from multiple external sources into a unified Supa
 │ DATA INGESTION │
 │ │
 │ pg_cron (every 6h) → Edge Function: sync-threats │
+│ ├── ScamSniffer scam-db (343k+2.5k) │
 │ ├── MetaMask eth-phishing (233k) │
 │ ├── URLhaus abuse.ch (22k) │
 │ ├── OpenPhish community (300) │
@@ -145,7 +147,9 @@ Monitoring table for sync health.
 
 | Source | Type | Size | Sync | Mapping |
 |--------|------|------|------|---------|
+| ScamSniffer scam-database | GitHub JSON | 343k domains + 2.5k wallets | Full replace / 6h | `domain`/`wallet`, `scam`, risk 85 |
 | MetaMask eth-phishing-detect | GitHub JSON | 233k domains | Full replace / 6h | `domain`, `phishing`, risk 90 |
+| ChongLuaDao blocklist | GitHub JSON | 34k domains (static) | One-time import | `domain`, `phishing`, risk 85 |
 | URLhaus (abuse.ch) | Text feed | 22k URLs | Upsert / 6h | `url`, `malware`, risk 85 |
 | OpenPhish community | Text feed | 300 URLs | Full replace / 6h | `url`, `phishing`, risk 80 |
 | Runtime cache | Auto-generated | Growing | On each check | Various, risk varies, 7-day TTL |
@@ -157,7 +161,8 @@ Monitoring table for sync health.
 | DOS Chain on-chain | EAS attestations | Incremental sync of Schema 6 attestations |
 | User reports | Telegram bot / Web | `/report` command, LLM entity extraction, initial risk 50 |
 | PhishStats | CSV API | ~5k URLs/day, free |
-| chongluadao.vn | Vietnamese-specific | Community scam reports |
+| kiemtraluadao.vn | Vietnamese scam checker | Investigating API access |
+| checkscam.vn | Vietnamese scam checker | Investigating API access |
 
 ## Sync Infrastructure
 
@@ -201,7 +206,9 @@ The Edge Function is deployed with `--no-verify-jwt`, so any Bearer token works
 | OpenPhish | 300 | ~1s |
 | URLhaus | 22k | ~5s |
 | MetaMask | 233k | ~37s |
-| **Total** | **255k** | **~43s** |
+| ScamSniffer domains | 343k | ~40s |
+| ScamSniffer wallets | 2.5k | ~1s |
+| **Total** | **~636k** | **~109s** |
 
 ## Check Flow Integration
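
The rounded per-source figures in this file's diff can be cross-checked against the stated totals with a few lines of Python. This is only a sanity-check sketch: the `SOURCES` list and the helper names are hypothetical, and the counts and durations are the rounded values copied from the source and performance tables in the diff (ChongLuaDao has no sync duration because it is a one-time import).

```python
# (name, entity type, approximate entries, approximate sync seconds or None)
SOURCES = [
    ("ScamSniffer domains", "domain", 343_000, 40),
    ("ScamSniffer wallets", "wallet", 2_500, 1),
    ("MetaMask", "domain", 233_000, 37),
    ("ChongLuaDao", "domain", 34_000, None),  # static import, not synced
    ("URLhaus", "url", 22_000, 5),
    ("OpenPhish", "url", 300, 1),
]


def totals_by_type(sources):
    # Sum the approximate entry counts per entity type.
    totals = {}
    for _name, entity_type, count, _seconds in sources:
        totals[entity_type] = totals.get(entity_type, 0) + count
    return totals


def total_entries(sources):
    # Sum the approximate entry counts across all sources.
    return sum(count for _name, _type, count, _seconds in sources)


def synced_seconds(sources):
    # Sum the durations of the sources that take part in the 6-hourly sync.
    return sum(
        seconds for _name, _type, _count, seconds in sources
        if seconds is not None
    )


if __name__ == "__main__":
    print(totals_by_type(SOURCES))
    print(total_entries(SOURCES))
    print(synced_seconds(SOURCES))
```

With these rounded inputs the entries sum to about 635k and the itemized sync durations to 84 s. The stated 636k+ and ~109s totals may therefore reflect unrounded counts and time that is not itemized per source; that reading is an assumption, not something the commit states.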
