Curious guy who likes maths, physics, motorbikes, and programming. I build AI systems that ship in production — multi-agent orchestration, hybrid retrieval, and RLVR fine-tuning. CS at Queens College, CUNY (May 2026). Founded WORT.AI and SOUL.md.
Recursive BFS · parallel researcher–reviewer loops · cited HTML reports · hybrid RAG memory
A research engine built from first principles. WORT decomposes a goal into sub-queries, runs parallel researcher–reviewer subgraphs, closes information gaps iteratively, and ships cited reports people can trust.
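The decomposition loop described above can be sketched as a plain breadth-first traversal. This is a minimal illustration, not WORT's actual code: `bfs_research`, `expand`, `needs_more`, and `max_depth` are hypothetical names, and the two callbacks stand in for the researcher (which proposes sub-queries) and the reviewer (which decides whether a branch still has an information gap). The real system runs branches in parallel; this sketch is sequential for clarity.

```python
from collections import deque
from typing import Callable


def bfs_research(
    goal: str,
    expand: Callable[[str], list[str]],   # hypothetical researcher: query -> sub-queries
    needs_more: Callable[[str], bool],    # hypothetical reviewer: is there still a gap?
    max_depth: int = 2,
) -> list[str]:
    """Visit sub-queries breadth-first and return them in visit order."""
    visited: list[str] = []
    seen = {goal}
    queue = deque([(goal, 0)])
    while queue:
        query, depth = queue.popleft()
        visited.append(query)
        # Stop descending when the depth budget is spent or the reviewer
        # judges this branch to be sufficiently covered.
        if depth >= max_depth or not needs_more(query):
            continue
        for sub in expand(query):
            if sub not in seen:  # avoid researching the same sub-query twice
                seen.add(sub)
                queue.append((sub, depth + 1))
    return visited


# Toy usage: a fixed dictionary stands in for LLM-generated sub-queries.
toy_tree = {"goal": ["a", "b"], "a": ["a1"], "b": ["b1", "a1"]}
order = bfs_research("goal", lambda q: toy_tree.get(q, []), lambda q: q != "b")
```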
| Layer | Implementation |
|---|---|
| Orchestration | LangGraph state machine with human-in-the-loop checkpoints and dynamic parallel subgraphs |
| Research | BFS tree per agent; reviewer redirects weak paths in real time |
| Memory | Qdrant dense + sparse BM25, RRF fusion, cross-encoder reranking |
| Training | RLVR on Qwen for query generation (71% → 94%) and report synthesis |
| Xtreme | VM-backed code execution, handling of messy web pages, richer artifact export |
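For readers unfamiliar with the RRF step in the Memory row, here is a minimal sketch of reciprocal rank fusion over two ranked lists. It is illustrative only and does not call Qdrant: the function name `rrf_fuse`, the document IDs, and the constant `k = 60` (a commonly used default) are assumptions rather than WORT's actual configuration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents ranked high
            # in several lists accumulate the largest totals.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a dense retriever and a sparse BM25 retriever.
dense_hits = ["d1", "d2", "d3"]
sparse_hits = ["d3", "d1", "d4"]
fused = rrf_fuse([dense_hits, sparse_hits])
```

In a pipeline like the one in the table, the fused list would then be passed to a cross-encoder for reranking.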
Behavioral steering · multi-model TRAIT eval · open source · drop-in agent personalities
Open-source collection of portable SOUL.md files that give LLM agents a distinct, stable personality and steer how they reason, respond, and push back.
| Layer | Detail |
|---|---|
| Format | Markdown persona spec: traits, tone, boundaries, reasoning style |
| Research | Weak-to-strong jailbreaking; small models can steer larger aligned models past 99% misalignment |
| Evaluation | Tested across DeepSeek, Gemini, Claude; adversarial personas held up to ~78% |
| Impact | 200+ stars and 20+ forks within two weeks of launch |
Architected the workforce platform for 5,000+ caregivers — scheduling, payroll, and performance tracking at scale.
- Built Agent ZOI with tool use and self-RAG memory, cutting ticket resolution from days to minutes and covering the workload of 7 support staff
- Cut load on scheduling APIs by 80% via Redis, query restructuring, and Postgres indexing under 5,000+ concurrent sessions
- Shipped gamified performance and rewards pipelines driving measurable caregiver engagement
Led a 4-person team on LLMs for tabular data.
- 90–93% classification and 81–86% regression — ~40 points above traditional ML baselines
- Fine-tuning pipelines for GPT-3.5/4 with token-efficient table representations
- Turned 25+ papers into a reproducible experimental pipeline
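One way a token-efficient table representation might look is sketched below. The actual format used in this work is not described here, so everything in the block is an assumption: `serialize_rows` and `format_cell` are hypothetical helpers that state the column names once and trim numeric precision, instead of repeating `column: value` pairs for every row.

```python
def format_cell(value: object) -> str:
    """Render a cell compactly; floats are rounded to 2 decimals and trimmed."""
    if isinstance(value, float):
        return f"{value:.2f}".rstrip("0").rstrip(".")
    return str(value)


def serialize_rows(columns: list[str], rows: list[list[object]]) -> str:
    """Serialize a table as one header line plus one pipe-delimited line per row."""
    header = "|".join(columns)
    body = ["|".join(format_cell(v) for v in row) for row in rows]
    return "\n".join([header, *body])


# Hypothetical two-row table.
prompt_table = serialize_rows(["age", "income"], [[34, 52000.0], [51, 1234.5678]])
```

The header-once layout keeps per-row overhead to the delimiters, which matters when many rows must fit in one context window.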
Backend for an education platform serving 10,000+ students — REST APIs, 45% latency reduction, CI/CD (4h → 30min), Docker/K8s on AWS.

