dwgx/vrchat-il2cpp-re

VRChat IL2CPP Reverse Engineering

2026-06-29 build, Unity 6 (6000.0.60f1): 64,773 classes, 569,859 methods, 188,384 fields

GameAssembly.dll (222 MB) | IL2CPP | Unity 6 6000.0.60f1 | Beebyte Obfuscation

🏁 Project complete / archived (2026-07-01), no longer maintained. Final state frozen below. See BASELINE.md.

Coverage (Unity 6 baseline, official criterion — FINAL)

| Metric | Count | Coverage |
| --- | --- | --- |
| Classes (semantic) | 5,646 / 9,928 obfuscated | 56.9% semantic class names |
| Methods (semantic) | 533,436 / 569,859 | 93.5% semantic |
| Methods (hash remaining) | 36,423 | 6.5% fallback (m_XXX) |
| Fields (semantic) | 160,256 / 188,384 | 85.1% semantic |
| cross_version entries | 40,223 | reusable across builds |
| Pipeline runtime | ~30s | full run |

Final result: structural ceiling reached (2026-07-01). All three axes are at their structural limits: methods (93.5%) and fields (85.1%) are effectively saturated, and classes plateau at 56.9%. This is not effort-limited; it is a proven limit. Four independent type-signal approaches (field-type, parent, interface, method-return-type) each failed the decisive gate2 test, reaching only ~9% recovery accuracy against a 15% acceptance line. The reason is that a class relates to its field types by HAS-A: type signals reveal what a class holds, not the role it plays. Metadata decryption was also confirmed a dead end: Beebyte destroys the original class names at compile time (they are stored as ÌÍÎÏ garbage in the metadata itself), so decrypting the structure tables yields no new real names. The one signal left unexhausted is runtime instance values (strings/JSON keys), which static analysis cannot capture; it was scoped but not pursued to completion. Every assigned name here is an evidence-backed inference, not a recovered original.

Canonical numbers live in output/coverage_stats.json (regenerated every pipeline run via tools/compute_final_stats.py — the single authoritative criterion, which delegates to tools/name_quality.py). The 1,212 field-signature class names (workflow + A1 parallel-agent passes) are re-applied reproducibly by pipeline stage 2d (tools/apply_class_names.py, idempotent), so a rerun never drops them.

Coverage criterion fix (2026-06). The previously reported 62.7% class coverage was inflated: the official weak-name test missed a whole class of structural placeholders (for example BaseClass290ImplImpl_31B9 and BackingFieldBase_16D7, names synthesized from class topology with no semantics) and counted them as semantic. The corrected criterion (strip synthetic tokens; if nothing meaningful remains, the name is a fallback) lives in tools/name_quality.py, shared by the pipeline, stats, apply, and grader so they can never drift. Under it the corrected baseline was 45.4%. With the unified criterion, the real field-signature names correctly override placeholders and 8 evidence-synthesized names are added, which brought coverage to 46.7% (4,641/9,928) at the time of the fix. The final 56.9% figure in the coverage table is measured under this same corrected criterion. Each obfuscated class also carries a deterministic evidence grade (A/B/C/D, see tools/grade_evidence.py) so every assigned name is auditable. Note: the original names of obfuscated classes were destroyed at compile time and are unrecoverable.
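The "strip synthetic tokens; if nothing meaningful remains, it is fallback" rule can be illustrated with a minimal sketch. This is not the code in tools/name_quality.py: the token list, the hex-suffix pattern, and the function names below are assumptions chosen only so that the two placeholder examples from the paragraph above are rejected.

```python
import re

# Hypothetical list of topology-only tokens; the real list lives in
# tools/name_quality.py and is not reproduced in this README.
SYNTHETIC_TOKENS = {"base", "class", "impl", "backing", "field"}


def split_tokens(name):
    """Split an identifier into lowercase words, dropping digits and a hash suffix."""
    name = re.sub(r"_[0-9A-Fa-f]{4}$", "", name)  # trailing "_31B9"-style suffix
    words = re.findall(r"[A-Z][a-z]+|[A-Z]+(?![a-z])|[a-z]+", name)
    return [w.lower() for w in words]


def is_semantic(name):
    """True if any token survives after the synthetic tokens are stripped."""
    return any(tok not in SYNTHETIC_TOKENS for tok in split_tokens(name))
```

With this token list, `is_semantic("BaseClass290ImplImpl_31B9")` is false because only Base, Class, and Impl remain after the suffix and digits are dropped, while a name such as VRCPlayer keeps the tokens "vrc" and "player" and counts as semantic.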

Why Unity 6 was a full re-crack. VRChat upgraded from Unity 2022 to Unity 6, which reshuffled the entire IL2CPP runtime layout — the old extractor failed completely. Metadata is encrypted and export symbols are stripped, so static tools (Il2CppDumper) do not work. The extractor uses reverse MethodInfo enumeration: scan all MethodInfo in the heap, resolve their klass, rebuild the type tree. Verified against ground truth (Vector3=x/y/z, Color=r/g/b/a, Transform=157 methods).

IL2CPP struct layout (Unity 6 6000.0.60f1, ground-truth verified)

| Field | June-13 (Unity 2022) | Unity 6 |
| --- | --- | --- |
| MethodInfo.name | 0x10 | 0x10 |
| MethodInfo → klass | 0x18 | 0x20 |
| klass.name | 0xA8 | 0x98 |
| klass.namespace | 0x18 | 0x18 |
| klass.parent | varies | 0xA0 (runtime consensus) |
| klass.fields | | 0xA8 |
| FieldInfo stride | 0x30 | 0x20 |
| FieldInfo.name | 0x10 | 0x08 |
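As an illustration of reverse MethodInfo enumeration with the Unity 6 offsets above, here is a minimal sketch over a toy memory region. It is not tools/extract_reverse_unity6.py: the `Memory` class, the 8-byte alignment, and the "printable name" plausibility check are assumptions made for the example, and the real extractor's heuristics are not described in this README.

```python
import struct
from collections import defaultdict

# Unity 6 offsets taken from the table above.
METHOD_NAME, METHOD_KLASS = 0x10, 0x20
KLASS_NAMESPACE, KLASS_NAME = 0x18, 0x98


class Memory:
    """Toy stand-in for a dumped heap: one contiguous region at `base`."""

    def __init__(self, base, data):
        self.base, self.data = base, bytes(data)

    def read_ptr(self, addr):
        """Read a little-endian 64-bit pointer, or None if out of range."""
        if addr is None:
            return None
        off = addr - self.base
        if off < 0 or off + 8 > len(self.data):
            return None
        return struct.unpack_from("<Q", self.data, off)[0]

    def read_cstr(self, addr, limit=128):
        """Read a printable NUL-terminated UTF-8 string, or None."""
        if addr is None:
            return None
        off = addr - self.base
        if off < 0 or off >= len(self.data):
            return None
        end = self.data.find(b"\0", off, off + limit)
        if end <= off:  # unterminated (-1) or empty string
            return None
        try:
            text = self.data[off:end].decode("utf-8")
        except UnicodeDecodeError:
            return None
        return text if text.isprintable() else None


def enumerate_methods(mem, start, end):
    """Group plausible MethodInfo records by (namespace, class name)."""
    classes = defaultdict(list)
    for addr in range(start, end, 8):  # assumption: records are 8-byte aligned
        name = mem.read_cstr(mem.read_ptr(addr + METHOD_NAME))
        klass = mem.read_ptr(addr + METHOD_KLASS)
        if name is None or klass is None:
            continue
        klass_name = mem.read_cstr(mem.read_ptr(klass + KLASS_NAME))
        if klass_name is None:
            continue  # klass pointer does not lead to a named class
        namespace = mem.read_cstr(mem.read_ptr(klass + KLASS_NAMESPACE)) or ""
        classes[(namespace, klass_name)].append(name)
    return dict(classes)
```

The design point is that nothing depends on export symbols or on global-metadata.dat: every class is discovered only because some method record points at it, which is why the approach survives stripped exports and encrypted metadata.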

The extractor (tools/extract_reverse_unity6.py) is self-checking: ASLR heap-band auto-detect, Transform offset self-verify, parent-offset runtime consensus. If a future build shifts the layout, it errors loudly instead of emitting garbage.
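One way to picture "parent-offset runtime consensus" is a vote: for each candidate offset, count how many known class structs hold a pointer to another known class there, then accept the winner only if enough classes agree. The sketch below is a guess at the idea, not the extractor's actual logic; the candidate range, the `min_ratio` threshold, and the function name are all assumptions.

```python
from collections import Counter


def consensus_parent_offset(klasses, read_ptr,
                            candidates=range(0x40, 0x100, 8), min_ratio=0.5):
    """Pick the offset at which most klass structs point to another known klass.

    klasses   -- iterable of klass struct addresses already discovered
    read_ptr  -- callable(address) -> pointer value, or None if unreadable
    Raises RuntimeError instead of returning a weakly supported offset.
    """
    known = set(klasses)
    votes = Counter()
    for klass in known:
        for off in candidates:
            target = read_ptr(klass + off)
            if target in known and target != klass:
                votes[off] += 1
    if not votes:
        raise RuntimeError("no candidate offset points at a known klass")
    best, count = votes.most_common(1)[0]
    if count / len(known) < min_ratio:
        raise RuntimeError(
            f"weak consensus for parent offset {best:#x}: {count}/{len(known)}"
        )
    return best
```

Raising on weak agreement mirrors the "errors loudly instead of emitting garbage" behaviour described above: a shifted layout produces an exception rather than a silently wrong type tree.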

Quick Start

# Full deobfuscation pipeline (5 stages)
python tools/run_full_pipeline.py

# Skip heavy binary analysis
python tools/run_full_pipeline.py --skip-binary

# Quick vocabulary merge + source tree rebuild
python tools/quick_update.py

# Runtime field extraction (requires VRChat offline)
start "" "VRChat.exe" --no-vr
python tools/extract_field_types_v2.py

Pipeline Architecture

precise_dump.json (IL2CPP struct extraction from memory dump)
    │
    ▼
run_full_pipeline.py (orchestrator)
    ├── Stage 0: Merge all name sources → unified_vocabulary.json (44,309 names)
    │     Sources: community deob maps + mod mining + SDK + IDA xrefs + cross-version + metadata
    │
    ├── Stage 1: deobfuscate.py (11-phase rename engine)
    │     lifted → compiler artifacts → community → semantic → property → Unity →
    │     inheritance → cross-ref → shared-method → binary-string → fallback
    │     Result: 8,434 classes + 108,480 method renames applied
    │
    ├── Stage 2: Cross-reference (Photon, SDK, structural, community)
    │     high-confidence overrides on weak/fallback names
    │
    ├── Stage 3: Generate outputs
    │     deobfuscated_dump.json/cs (RVA), name_mapping.json, src/ tree (1,538 files),
    │     coverage_stats.json (canonical numbers)
    │
    └── Stage 4: Generate IDA rename script (226,911 function renames)

Directory Structure

├── tools/              188 scripts (170 Python + 18 JavaScript)
│   ├── Core Pipeline       run_full_pipeline.py, deobfuscate.py, quick_update.py
│   ├── Extraction          extract_precise_dump.py, reverse_struct_layout.py
│   ├── Cross-version       lift_*.py (body-hash, vtable, typedef-token lifts)
│   ├── LLM naming          codex_worker.py, build_audit_batches.py, apply_audit_results.py
│   ├── Runtime/Frida       bridge.py/js, vrc_frida_lib.js, extract_field_types_v2.py
│   ├── Auth/Tracing        trace_auth_flow.js, hook_eos_anticheat.js
│   └── Patching            patch_ga_binary.py, deploy_to_steam.py
│
├── output/             Final products
│   ├── src/                1,538 deobfuscated C# source files (RVA-annotated)
│   │   ├── VRC/                VRChat game code (397 files)
│   │   ├── ThirdParty/         Libraries: Photon, BestHTTP, etc (956 files)
│   │   └── Global/             Global namespace (182 files)
│   ├── coverage_stats.json    Canonical coverage numbers (regenerated per run)
│   ├── *.json              Mappings, vocabulary, analysis results
│   └── *.md                Coverage report, protocol analysis, EAC analysis
│
├── data/               Intermediate analysis data
├── ida/                IDA Pro database + scripts (excluded from git)
├── docs/               GitHub Pages dashboard
├── dumps/              Memory dumps (excluded from git, 7.4GB)
├── external/           36+ cloned repos (excluded from git, 4.9GB)
├── metadata/           Patched global-metadata.dat (excluded from git)
└── archive/            80 historical scripts from 5 dev phases (excluded)

Obfuscation: Beebyte

Beebyte Obfuscator renames identifiers to ÌÍÎÏ strings (U+00CC-00CF) and shuffles the Il2CppClass/FieldInfo/MethodInfo field layout every release (see the Unity 6 offset table above vs the prior Unity 2022 build). Key invariants:

| Property | Value |
| --- | --- |
| Obfuscated identifier regex | ^[Ì-Ï]{3,}$ |
| IL2CPP exports | stripped; static dumpers do not work |
| Struct layout | re-discovered per build via tools/extract_reverse_unity6.py (self-checking) |
| global-metadata.dat | encrypted (see Metadata Decryption below) |
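The identifier regex from the table can be applied directly in Python. A small sketch (the helper name `is_obfuscated` is illustrative, not a function from this repository):

```python
import re

# Three or more characters from U+00CC..U+00CF (Ì Í Î Ï), nothing else.
OBFUSCATED = re.compile(r"[\u00CC-\u00CF]{3,}")


def is_obfuscated(identifier):
    """True if the whole identifier consists of Beebyte-style ÌÍÎÏ characters."""
    return OBFUSCATED.fullmatch(identifier) is not None
```

`fullmatch` is used instead of `^...$` because `$` in Python also matches just before a trailing newline, which would accept identifiers read from a file with their line ending still attached.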

Network Layer

VRChat uses Photon Realtime with FlatBuffer serialization:

Application     VRCPlayer / NetworkManager / UdonBehaviour
Serialization   FlatBufferSerializerCodec (8-bit + 32-bit)
Event Layer     VRCPhotonEvent / IFlatBufferNetworkSerializer
Photon          PhotonPeer → EnetPeer (UDP) / TPeer (TCP) / WebSocket
Encryption      PhotonEncryptorPlugin (native DLL)

Key findings:

  • 15 custom event types documented (Voice, Serialization, Moderation, etc.)
  • 4-token auth chain: Steam → VRChat API → Photon → EAC
  • Server-side EAC validation gates room joins via AuthCookie in Photon plugin
  • FlatBuffers used for both 8-bit (frequent) and 32-bit (full precision) serialization

EAC Analysis

EAC (EOS Anti-Cheat) runs in Client-Server mode with continuous opaque message exchange:

  • Bypass mode: EAC not initialized → no integrity messages → server rejects room joins
  • Normal mode: EAC kernel driver blocks Frida/injection
  • Recommended: Hybrid workflow — offline+Frida for analysis, MelonLoader+EAC for online

See EAC Auth Analysis and Photon Protocol Analysis for details.

Key Constraints

  • EAC blocks online analysis — always use offline VRChat (VRChat.exe --no-vr)
  • ASLR — GameAssembly base changes every launch, hardcoded addresses need updating
  • Never blindly call unknown IL2CPP exports — crashes Frida/VRChat
  • Bridge trampoline (bridge.js) writes shellcode in GA .data section for anti-tamper
  • All Python scripts call sys.stdout.reconfigure(encoding='utf-8') so CJK text prints correctly on Windows consoles

Using the Output

IDA / Ghidra Rename Script

The pipeline generates output/ida_apply_names.py with 226K+ function renames.

# In IDA: File -> Script File -> output/ida_apply_names.py
# The script auto-detects IDA's imagebase via idaapi.get_imagebase()
# No manual base address configuration needed

For Ghidra or other tools, use output/name_mapping.json:

{
  "methods": { "OriginalObfClass::OrigObfMethod": "SemanticName", ... },
  "classes": { "ÌÍÎÏÍÌÎ...": "VRCPlayer", ... }
}
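For tools without a ready-made script, the mapping can be consumed with a few lines of Python. This sketch assumes only the two top-level keys and the `Class::Method` key format shown above; the helper names are illustrative, and unknown identifiers are returned unchanged.

```python
import json


def load_mapping(path="output/name_mapping.json"):
    """Load the mapping file; it contains non-ASCII keys, so force UTF-8."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def resolve(mapping, klass, method=None):
    """Return the assigned name for a class, or for a method of that class.

    Falls back to the original identifier when no rename is recorded.
    """
    if method is None:
        return mapping.get("classes", {}).get(klass, klass)
    return mapping.get("methods", {}).get(f"{klass}::{method}", method)
```

Typical use would be `mapping = load_mapping()` followed by `resolve(mapping, obf_class)` or `resolve(mapping, obf_class, obf_method)` inside whatever rename loop the target tool provides.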

Deobfuscated Dump Format (dump.cs)

output/deobfuscated_dump.cs uses RVA (Relative Virtual Address) for method offsets, similar to Il2CppDumper output:

public class VRCPlayer : VRCPlayerApi
{
    public Transform _avatar; // 0x48
    void Awake(); // RVA: 0x1A2B3C0
    void OnPhotonSerializeView(); // RVA: 0x1A2B520
}

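To use RVAs in IDA/Ghidra: imagebase + RVA = actual address. For GameAssembly.dll, IDA normally uses imagebase 0x180000000, the default preferred base for 64-bit DLLs. The runtime GA base differs on every launch because of ASLR, so subtract the live base from a runtime address before comparing it with the dump.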
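The address arithmetic is simple enough to show in full. A minimal sketch, with illustrative helper names and the Awake RVA from the example dump above:

```python
IDA_DEFAULT_IMAGEBASE = 0x180000000  # usual static base for this 64-bit DLL


def rva_to_address(rva, imagebase=IDA_DEFAULT_IMAGEBASE):
    """Convert an RVA from the dump into an absolute address."""
    return imagebase + rva


def address_to_rva(address, imagebase):
    """Convert an absolute address (static or runtime) back into an RVA."""
    if address < imagebase:
        raise ValueError("address lies below the image base")
    return address - imagebase


# Static view: where IDA shows VRCPlayer.Awake from the example dump.
print(hex(rva_to_address(0x1A2B3C0)))  # 0x181a2b3c0
```

For a live process, pass the module base reported at runtime as `imagebase`; the same RVA then maps to that launch's address.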

For richer output with field types, use the source tree (output/src/) which includes resolved types and offsets from field_types.json when available.

Metadata Decryption

VRChat encrypts global-metadata.dat with Beebyte's custom XOR scheme. Use tools/decrypt_metadata.py:

python tools/decrypt_metadata.py <path_to_global-metadata.dat> <output_path>

Algorithm (reverse-engineered from sub_180A7E880 in GameAssembly.dll):

  1. Header (first 0x148 bytes): XOR with key[i] = (i - 0x34) & 0xFF
  2. Sections: 7 sections XOR-decoded with position-dependent keys derived from header size fields
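Step 1 can be sketched directly from the stated key formula. The code below covers the header only: the section keys in step 2 are not specified in this README, so the rest of the file is passed through untouched. The function names are illustrative, and checking for the magic 0xFAB11BAF at offset 0 of the decoded header is an assumption based on the standard IL2CPP metadata layout.

```python
import struct

HEADER_SIZE = 0x148
METADATA_MAGIC = 0xFAB11BAF


def xor_header(blob):
    """XOR the first 0x148 bytes with key[i] = (i - 0x34) & 0xFF.

    XOR is its own inverse, so the same call encodes and decodes.
    Bytes after the header are returned unchanged (sections not handled).
    """
    header = bytes(b ^ ((i - 0x34) & 0xFF)
                   for i, b in enumerate(blob[:HEADER_SIZE]))
    return header + blob[HEADER_SIZE:]


def has_magic(blob):
    """Assumed sanity check: decoded header starts with the metadata magic."""
    return len(blob) >= 4 and struct.unpack_from("<I", blob, 0)[0] == METADATA_MAGIC
```

A failed `has_magic` check after `xor_header` is a cheap signal that the constants changed in a newer build, before any section decoding is attempted.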

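The decrypted metadata lets tools/lift_typedef_tokens.py read class and method names from TypeDefinition tokens. This only helps for identifiers that Beebyte left intact: obfuscated names are stored as ÌÍÎÏ strings in the metadata itself, so decryption recovers no original names for them.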

Note: The encryption constants may change with new VRChat builds. If decryption produces invalid output, re-analyze the decrypt function in GameAssembly.dll (search for the metadata magic 0xFAB11BAF handler).

Documentation

| Document | Description |
| --- | --- |
| Workflow Guide | Complete pipeline guide for new contributors |
| Dashboard | Interactive visual overview (GitHub Pages) |
| Coverage Report | Current pipeline coverage metrics |
| Network Analysis | Photon network layer mapping |
| Photon Protocol | Protocol reverse engineering |
| EAC Auth Analysis | EOS anti-cheat authentication |

License

Private research project. Not for redistribution.

About

VRChat IL2CPP deobfuscation pipeline — Unity 6 (6000.0.60f1) baseline. Reverse MethodInfo enumeration, 64K classes / 570K methods / 188K fields, reproducible naming.
