Senior Technical Program Manager | AI Platform & Infrastructure | Distributed Systems Execution
13+ years leading cross-team execution across platform engineering, AI/ML infrastructure, and enterprise systems. I drive complex, multi-team technical programs from ambiguity to shipped, measurable outcomes. Deep technical fluency across distributed systems, AI inference pipelines, cloud infrastructure, and compliance automation.
I build production AI systems on hardware I own: fine-tuned compliance LLMs, GPU-accelerated inference infrastructure, and agentic evaluation frameworks, deployed on an NVIDIA DGX Spark.
Senior Technical Program Manager | AI Builder | NVIDIA Inception Member
I build AI systems on hardware I own. 13 years of program management, 13 LLMs fine-tuned across 8 architectures with eval datasets on HuggingFace, and a DGX Spark running 24/7 on my desk.
GPU-native knowledge intelligence platform built on 6 NVIDIA technologies.
Your experts. Your GPU. Your data never leaves.
Ingest domain expertise at scale, build GPU-accelerated relationship graphs with RAPIDS cuGraph, and serve answers through any OpenAI-compatible tool. Built on NIM, TensorRT-LLM, Triton, NeMo Guardrails, cuGraph, and CUDA.
Status: active development and reference architecture. Built and tested (1,006 passing tests), but not yet a turnkey end-to-end deployment. See the Status section of the repo.
| Project | What It Does | Stack |
|---|---|---|
| NeuralForge | GPU-native knowledge intelligence with temporal knowledge graphs | NIM, TensorRT-LLM, Triton, NeMo Guardrails, cuGraph |
| Speech-Systems | Hub for ASR, TTS, and orchestration speech-AI projects (6-version Aurora Echo progression, ASR pipeline, TTS pipeline) | Parakeet, pyannote, MOSS-TTS, faster-whisper, FastAPI, DGX Spark |
| CMMC Compliance AI | 13 fine-tuned LLMs across 8 architectures for cybersecurity compliance (CMMC, NIST, HIPAA) | QLoRA, GGUF, Ollama, DGX Spark |
| Governed LLM Gateway | Policy-as-code gateway with tamper-evident audit trails | FastAPI, SHA-256 hash chains, 103 tests |
| Agentic Evaluation Sandbox | Doer/Judge/Adversary/Observer framework for agent testing | Multi-agent orchestration |
| Self-Healing Agentic Workflows | Circuit breakers, fallback chains, auto-reroute for autonomous agents | Failure detection, recovery |
| garak Contributions | Adversarial probes for NVIDIA's LLM vulnerability scanner | Prompt injection, Unicode obfuscation |
- NVIDIA DGX Spark (GB10, 128GB unified memory) running daily
- Gemma 4 26B A4B for inference at 43 tok/s
- 486K+ knowledge chunks from 80+ AI/ML experts
- 10G office network connecting DGX Spark, NAS, and workstations
Production AI systems: model training pipelines, inference serving, evaluation harnesses, and observability.
| Project | What It Does | Stack |
|---|---|---|
| cmmc-compliance-ai-model | 13 fine-tuned LLMs across 8 architectures (7B-72B) for regulated industries. Flagship: Gemma 4 31B (eval loss 0.4517). QLoRA/DoRA, GGUF, air-gapped Ollama. Eval datasets on HuggingFace. | PyTorch, Unsloth, CUDA, Ollama |
| cmmc-training-data | 18,747 curated compliance examples across 11 regulatory frameworks. Rebuilt from 67K raw examples (73% noise removed). | NIST, CMMC, HIPAA, FedRAMP |
| dgx-spark-kv-cache-benchmark | KV-cache quantization inference benchmarks on NVIDIA DGX Spark GB10 (q4/q8/f16 at long context). Published to r/LocalLLaMA, HN, NVIDIA Forums. | llama.cpp, CUDA 13.0, aarch64 |
| governed-llm-gateway | Policy-as-code LLM gateway: tamper-evident audit trails, rate limiting, cost telemetry. 103 tests. | Python, FastAPI |
| el-barto-serve | OpenAI-compatible inference server. Auto-patches Flash Attention for Blackwell GPUs. | Python, PyTorch |
| memoriant-ops-bot | Multi-provider AI agent orchestration via Telegram/Matrix. Manages Claude Code, Codex CLI, Gemini CLI. | Python, WebSocket |
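To illustrate the tamper-evident audit trail idea listed for governed-llm-gateway (SHA-256 hash chains), here is a minimal sketch. The function names, the record layout, and the all-zero genesis hash are assumptions made for illustration; they are not taken from the repo.

```python
import hashlib
import json

# Hypothetical placeholder hash for the first entry in the chain.
GENESIS_HASH = "0" * 64


def _digest(prev_hash, event):
    """Hash the previous entry's hash together with the event payload."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain, event):
    """Append an audit event, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)}
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Return True only if every entry still matches its recorded hash and link."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous one, editing any past entry invalidates that entry and every later link, which is what makes the log tamper-evident rather than tamper-proof.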
Competition work is under my dentity007 handle (which displays as Nathan Maine).
Training the best language model in 16MB on 8xH100s. Only entrant to implement all 7 of OpenAI's explicitly requested research directions. 13 PRs submitted, 8 complete training scripts (11,810 lines of novel research code), 25+ GPU experiments across RTX 5090 and H200 SXM pods.
Record Submissions (3-seed verified):
| PR | Architecture | BPB |
|---|---|---|
| #968 | Order-20 Dirichlet Posterior + Per-Order OBCL + Phrase Cache | 0.1154 |
| #948 | Two-Level Dirichlet Posterior + Phrase Cache | 0.1156 |
| #1127 | 11L XSA-all + EMA + LoRA TTT + Partial RoPE + dim480 | 1.1311 |
Neural Track (progressive improvement):
| PR | Architecture | BPB | Seeds |
|---|---|---|---|
| #406 | 11L XSA4 + EMA + Self-Distillation TTT | 1.1287 | 3 |
| #385 | 11L Int6 QAT + SmearGate + SWA(0.4) + WD=0.04 | 1.1488 | 3 |
| #273 | 10L Int6 QAT + SmearGate + SWA | 1.1575 | 1 |
Research Submissions (all 7 OpenAI-requested architectures):
| PR | Architecture | BPB |
|---|---|---|
| #1192 | Fused Triton Megakernels (RMSNorm + LeakyReLU) | 1.356 |
| #1191 | H-Net Dynamic Chunking (learned tokenization) | 1.359 |
| #1193 | Universal Transformer + Adaptive Density | 1.439 |
| #1195 | Learning Adapters on Random Linear Maps | 2.202 |
| #1196 | LLM-JEPA (Joint Embedding Prediction) | 2.202 |
| #1197 | Mamba-Inspired SSM Hybrid (3:1 SSM:Attention) | 3.317 |
| #1194 | Text Diffusion (MDLM, masked discrete diffusion) | 3.380 |
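BPB in the tables above presumably stands for bits per byte, where lower is better. Assuming that reading, the sketch below shows the usual conversion from a summed cross-entropy loss in nats; the function name and arguments are illustrative and are not taken from the competition code.

```python
import math


def bits_per_byte(total_loss_nats, total_bytes):
    """Convert a summed cross-entropy loss (in nats) to bits per byte of raw text."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by bytes normalizes by text size.
    return total_loss_nats / (math.log(2) * total_bytes)
```

Normalizing by bytes rather than tokens keeps scores comparable across models that use different tokenizers.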
Novel techniques developed beyond OpenAI's requests: Adaptive Density Training (sparse-to-dense progressive unmasking), Echo Training (self-distillation from EMA checkpoints), Gradient Quilting (per-iteration adaptive LR with auto-freezing).
Infrastructure built: 486K+ chunk expert knowledge base from 80+ AI/ML experts. Competitive intelligence pipeline analyzing 1,084 competitor PRs. Multi-pod experiment orchestration. Full Hessian GPTQ validation on Hopper (H200 SXM).
Deterministic, auditable agent components: evaluation, recovery, orchestration, and compliance enforcement.
| Project | What It Does | Link |
|---|---|---|
| Evaluation Sandbox | Doer/Judge/Adversary/Observer holdout scenario evaluation | Repo |
| Blind Scenario Testing | Black-box behavioral testing of live API systems, 151 tests | Repo |
| Self-Healing Workflows | Retry logic, fallback chains, circuit breakers for agent tasks | Repo |
| Temporal Executive Agent | Dependency-ordered planning and execution with state tracking | Repo |
| MCP Data Agent | MCP server exposing CRM/ticket/database tools to LLMs | Repo |
| Fairness Governor | Weighted round-robin allocation with skew-ratio detection | Repo |
Full suite: agentic-ai-portfolio
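As a rough sketch of the circuit-breaker and fallback-chain pattern named in the Self-Healing Workflows row, the code below skips a provider after repeated failures and falls through to the next one. The class, the thresholds, and the `call_with_fallbacks` helper are hypothetical and do not mirror the repo's API.

```python
import time


class CircuitBreaker:
    """Opens after max_failures consecutive failures; allows a trial call after reset_after seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


def call_with_fallbacks(providers, request):
    """Try each (breaker, callable) pair in order, skipping providers whose circuit is open."""
    for breaker, fn in providers:
        if not breaker.allow():
            continue
        try:
            result = fn(request)
        except Exception:
            breaker.record(False)
            continue
        breaker.record(True)
        return result
    raise RuntimeError("all providers failed or have open circuits")
```

Injecting the clock keeps the breaker deterministic under test, which matches the section's emphasis on auditable agent components.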
Tools for scaling governance across distributed engineering teams in regulated environments (CMMC 2.0, NIST 800-171, HIPAA, FedRAMP, DFARS).
| Project | What It Does | Link |
|---|---|---|
| garak Compliance Probes | LLM vulnerability probes for NVIDIA garak: fabricated regulatory citations (PR #1658) and homoglyph obfuscation (PR #1660), plus architecture Discussion #1659. Split out of the monolithic PR #1619 in response to maintainer feedback on architecture. | Repo |
| Governance Graph Compiler | Compiles policy Markdown into DAGs for deterministic audit evaluation | Repo |
| Compliance Validation Agent | Validates workflows against compliance rules, generates audit trails | Repo |
| Patent Platform | Full patent pipeline: search, analyze, draft, review, file. 706+ tests. | Repo |
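To make the Governance Graph Compiler idea concrete, the sketch below parses a toy policy format into a dependency graph and derives an evaluation order. The bullet syntax `- ID (requires: A, B)` and both function names are invented for illustration; the real tool's input format is not described here.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical rule syntax: "- AC.2 (requires: AC.1)"
RULE = re.compile(r"^-\s*(?P<id>[\w.-]+)(?:\s*\(requires:\s*(?P<deps>[^)]*)\))?\s*$")


def compile_policy(markdown):
    """Parse rule bullets into a mapping of rule id -> set of prerequisite rule ids."""
    graph = {}
    for line in markdown.splitlines():
        match = RULE.match(line.strip())
        if not match:
            continue  # headings and prose are ignored
        deps = match.group("deps") or ""
        graph[match.group("id")] = {d.strip() for d in deps.split(",") if d.strip()}
    return graph


def evaluation_order(graph):
    """Return rule ids with prerequisites first; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())
```

For example, `evaluation_order(compile_policy("- AC.1\n- AC.2 (requires: AC.1)"))` places `AC.1` before `AC.2`. Rejecting cycles up front is what lets an audit run proceed in a single well-defined pass.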
| Component | Details |
|---|---|
| GPU Infrastructure | NVIDIA DGX Spark (GB10, 128GB) for inference/training. 10G backbone, NFS-mounted NAS (3.6TB models). |
| Distributed Training | 8xH100 SXM on RunPod. torchrun DDP, torch.compile, FA3, GPTQ, zstd/Brotli compression. |
| CI/CD & Automation | GitHub Actions, launchd scheduling, automated replay archival, cron-based scraping pipelines. |
| Observability | GPU-accelerated knowledge-platform dashboard (FastAPI + Qdrant + SSE). GPU benchmarking scripts. Pod performance validation. |
| Containerization | Docker Compose for multi-service deployments. TensorRT-LLM containers for NVFP4 quantization. |
Teaching ML by building from scratch. Free, fill-in-the-blanks format.
| Tutorial | What You Build | Link |
|---|---|---|
| smallest-ai-tutorial | 4 neural networks from scratch in pure Python (MLP → LSTM → Transformer → BitNet) teaching phonics. 273 tests. | Repo |
| smallest-ai-built-from-the-ground-up | Full project: Phase 1 complete with all 4 architectures, C export for ESP32, ARM QEMU verification. | Repo |
14 published plugins for AI-powered development workflows: patent drafting, architecture review, load testing, documentation drift detection, governance compilation, test coverage analysis, and more.
| Domain | Proof Points |
|---|---|
| Platform Scale | $20M+ portfolios, 700K-user identity systems, Salesforce multi-cloud (Sales, Service, Data, and Marketing Cloud) |
| Cross-Team Execution | Consecutive 5/5 CSAT across multiple client organizations, cycle times cut 67% (6 weeks to 2 weeks) |
| Security & Identity | SSO for 200 applications (Okta/SAML/OIDC) across federated business divisions |
| Data Platforms | 89M records, 28+ source systems, 99% identity unification, 95.48% match rates |
| Compliance | SOC 2/SOX/CMMC/HIPAA/FedRAMP governance structures across independent engineering teams |
| Regulated Environments | Air-gapped AI deployment, CUI-handling systems, DFARS compliance |
MIT Applied Data Science Certificate | Salesforce: Data Cloud Consultant, Administrator, AI Associate | Scrum: CSM | NVIDIA Inception Member
📧 nmaine@gmail.com | LinkedIn | GitHub | HuggingFace
