Budget-aware multi-agent video production with AI orchestration. Manages competitive pilots, real video generation with Luma AI, image generation with DALL-E, text-to-speech generation with ElevenLabs, vision-based QA analysis, and self-improving provider learnings.
- Learning system - as you use it, it gets smarter
- Provider onboarding - an agent reads a provider's documentation and generates the implementation and tests needed to use whatever that provider offers (a new image/audio generator, or a data source like PDFs or stats), and yes, this uses the learning system
- Knowledge base - content sourced from things like scientific journals is atomized, graphed, and turned into usable assets for your videos
- Multi-tenant memory - secure, share, and manage your memories by adjusting their scope
- Scale your agents - deploy graph, swarm, or hybrid topologies to run work in parallel, or feed the outputs of one part of the workflow into other parts
- Claude (obviously)
- Strands SDK (for agent workflows and scaling)
- Click and Rich (for the CLI)
- FFmpeg (for video rendering)
Optional
- lumaai (you can use other providers)
- pymupdf (if you want PDF source material)
- jinja2 (if you want to use the dashboard)
- Amazon Bedrock AgentCore (if you want to host long-term memory and run agents on their platform; agents run locally by default)
I wanted to make a demo project that 1) shows off what you can do pretty quickly with Claude; 2) demonstrates how to design and implement a working multi-agent workflow; 3) uses learning/memory; 4) uses rewards; and 5) is fun.
If you're curious about the design aspect, there are a bunch of spec docs, and you can look at their timestamps to get a rough idea of how the features were layered in. Let's just say I/we (me & the Claudes) did a lot in two days; I started this on January 9th.
Read more about the project in my developer notes
System Requirements:
- Python 3.11+
- FFmpeg (required for video/audio processing)
Install FFmpeg:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# Windows
choco install ffmpeg
# Or download from https://ffmpeg.org/download.html
Verify installation:
ffmpeg -version
# Clone and install
git clone https://github.com/aaronmarkham/claude-studio-producer.git
cd claude-studio-producer
pip install -e ".[server]"
# Set up API keys securely with OS keychain
claude-studio secrets set ANTHROPIC
claude-studio secrets set LUMA
# Or use environment variables
cp .env.example .env
# Add ANTHROPIC_API_KEY and LUMA_API_KEY
# Run a production (mock mode - no API costs)
claude-studio produce "A serene mountain lake at sunset" --budget 5
# Run with real video generation
claude-studio produce "A serene mountain lake at sunset" --budget 5 --live --provider luma
flowchart TB
subgraph Input["Input"]
Request["Production Request<br/>concept + budget + seed assets"]
end
subgraph Memory["Memory System"]
LTM["Long-Term Memory<br/>Multi-tenant namespace hierarchy<br/>Platform → Org → User → Session"]
end
subgraph Planning["Planning Stage"]
Producer["ProducerAgent<br/>Creates pilot strategies"]
ScriptWriter["ScriptWriterAgent<br/>Generates scenes"]
end
subgraph Generation["Parallel Generation"]
Video["VideoGenerator<br/>Luma / Runway / DALL-E"]
Audio["AudioGenerator<br/>ElevenLabs / OpenAI / Google"]
end
subgraph Evaluation["Real Evaluation Pipeline"]
QA["QAVerifier<br/>Claude Vision<br/>Frame extraction + analysis"]
Critic["CriticAgent<br/>Scores + decisions +<br/>provider analysis"]
end
subgraph Output["Output Stage"]
Editor["EditorAgent<br/>Edit candidates"]
Renderer["FFmpegRenderer<br/>Final video + text overlays"]
end
Request --> Producer
Producer --> ScriptWriter
ScriptWriter --> Video
ScriptWriter --> Audio
Video --> QA
Audio --> QA
QA --> Critic
Critic --> Editor
Editor --> Renderer
LTM -.->|"provider guidelines<br/>avoid patterns"| Producer
LTM -.->|"prompt tips<br/>what works"| ScriptWriter
Critic -.->|"new learnings<br/>what worked/failed"| LTM
The memory system uses a hierarchical namespace structure for learnings:
PROVIDER LEARNING LIFECYCLE
============================
1. ONBOARDING (one-time per provider)
┌─────────────────┐
│ API Docs │──► Onboarding ──► tips, gotchas, limitations
│ Stub File │ Agent │
└─────────────────┘ │
▼
2. STORAGE (namespace hierarchy) ┌─────────┐
│ USER │ ◄── initial home
┌──────────────────────────────────┴─────────┴──────────────────┐
│ │
│ SESSION (0.5) ──► USER (0.65) ──► ORG (0.8) ──► PLATFORM (1.0)
│ experimental validated team-wide curated
│ ▲ ▲
│ │ │
│ promote if promote if
│ works well cross-team value
└───────────────────────────────────────────────────────────────┘
3. PRODUCTION (ongoing)
┌─────────────────┐ ┌─────────────┐ ┌─────────────┐
│ ScriptWriter │◄─────│ merged │◄─────│ all tiers │
│ (uses tips) │ │ learnings │ │ by priority│
└────────┬────────┘ └─────────────┘ └─────────────┘
│
▼
┌─────────────────┐
│ Video/Audio │──► actual generation
│ Generation │
└────────┬────────┘
│
▼
┌─────────────────┐ ┌─────────────┐
│ Critic Agent │──────│ SESSION │──► new learnings
│ (evaluates) │ │ memory │ (what worked/failed)
└─────────────────┘ └─────────────┘
Key Features:
- Priority-based retrieval: Platform learnings override org, org overrides user (a merge sketch follows this list)
- Automatic promotion: Learnings can be promoted up the hierarchy based on validation count
- CLI management: `claude-studio memory` commands for viewing, adding, and managing learnings
- Categories: avoid, prefer, tip, pattern - for different types of provider knowledge
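The real retrieval logic lives in `core/memory/`, but a minimal sketch of the priority-based merge across tiers (assuming a hypothetical `Learning` record and the tier priorities shown in the lifecycle diagram) could look like this:

```python
from dataclasses import dataclass

# Hypothetical shape of a stored learning; the project's actual models live in core/models/memory.py.
@dataclass
class Learning:
    provider: str   # e.g. "luma"
    category: str   # avoid | prefer | tip | pattern
    text: str
    tier: str       # session | user | org | platform

# Tier priorities taken from the lifecycle diagram above.
TIER_PRIORITY = {"session": 0.5, "user": 0.65, "org": 0.8, "platform": 1.0}

def merge_learnings(learnings: list[Learning], provider: str) -> list[Learning]:
    """Merge learnings for one provider, letting higher tiers override lower ones."""
    best: dict[tuple[str, str], Learning] = {}
    for l in learnings:
        if l.provider != provider:
            continue
        key = (l.category, l.text)
        # Keep only the highest-priority copy of each learning.
        if key not in best or TIER_PRIORITY[l.tier] > TIER_PRIORITY[best[key].tier]:
            best[key] = l
    return sorted(best.values(), key=lambda l: -TIER_PRIORITY[l.tier])
```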
# Basic production
python -m cli.produce "Product demo for mobile app" --budget 10
# With seed image (image-to-video)
python -m cli.produce "Animate this logo" --budget 5 --seed logo.png
# Live mode with real generation
python -m cli.produce "Tech startup intro" --budget 15 --live --provider lumaThe CLI shows:
- Real-time agent progress
- QA scores per scene (Visual, Style, Technical, Narrative)
- Issues found and suggestions
- Provider learnings extracted
After a production run, create custom edits by combining specific scenes:
# List available scenes from a run
python -m cli.combine 20260109_080534 --list
# Combine scenes 1 and 3 (skipping scene 2)
python -m cli.combine 20260109_080534 --scenes 1,3
# Combine with custom output name
python -m cli.combine 20260109_080534 --scenes 1,3,5 -o highlight_reel.mp4
This is useful when:
- Some scenes have different actors/styles you want to exclude
- Creating alternate cuts or highlight reels
- Manual override of the automated EDL
Create videos from research papers, documents, and notes with automatic knowledge extraction:
# Create a knowledge project
claude-studio kb create "AI Research" -d "Papers on neural networks"
# Add a PDF paper (uses Claude for intelligent atom extraction)
claude-studio kb add "AI Research" --paper paper.pdf
# Add with mock mode (faster, no LLM costs)
claude-studio kb add "AI Research" --paper paper.pdf --mock
# View project summary
claude-studio kb show "AI Research" --graph
# Produce video from knowledge base
claude-studio kb produce "AI Research" \
-p "Explain the key findings and methodology" \
--style podcast \
--duration 120 \
--mockFeatures:
- Content-aware classification: Pre-LLM classifier detects document type (scientific paper, news, etc.) and structural zones to guide extraction
- Document ingestion: Extracts atoms (paragraphs, figures, tables, equations) with PyMuPDF (see the sketch after this list)
- LLM classification: Claude categorizes atoms by type and extracts topics/entities, with zone-aware filtering to prevent metadata pollution
- Knowledge graph: Builds cross-document connections via shared entities
- Rich concept generation: Assembles KB content into detailed prompts for ScriptWriter
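Classification is handled by Claude, but the raw atom extraction is plain PyMuPDF. A rough, standalone sketch (the function name and atom dict shape here are illustrative, not the project's actual API):

```python
import fitz  # PyMuPDF

def extract_atoms(pdf_path: str) -> list[dict]:
    """Pull text blocks and embedded images out of a PDF as candidate atoms."""
    atoms = []
    doc = fitz.open(pdf_path)
    for page_num, page in enumerate(doc, start=1):
        # Text blocks: (x0, y0, x1, y1, text, block_no, block_type)
        for block in page.get_text("blocks"):
            text = block[4].strip()
            if text:
                atoms.append({"type": "paragraph", "page": page_num, "text": text})
        # Embedded images become figure atoms (their bytes can be exported later).
        for img in page.get_images(full=True):
            atoms.append({"type": "figure", "page": page_num, "xref": img[0]})
    doc.close()
    return atoms

# Each atom would then go to Claude for classification (heading, equation, table,
# caption, ...) plus topic/entity extraction before joining the knowledge graph.
```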
flowchart LR
subgraph Sources["Source Documents"]
PDF1["PDF Paper 1"]
PDF2["PDF Paper 2"]
Note["Text Notes"]
end
subgraph Ingestion["Document Ingestion"]
PyMuPDF["PyMuPDF<br/>Text + Figure Extraction"]
Claude["Claude LLM<br/>Atom Classification<br/>Topic/Entity Extraction"]
end
subgraph KB["Knowledge Project"]
DG1["DocumentGraph 1<br/>atoms, hierarchy"]
DG2["DocumentGraph 2<br/>atoms, hierarchy"]
KG["KnowledgeGraph<br/>cross-links, themes,<br/>entity index"]
end
subgraph Production["Video Production"]
Concept["Rich Concept<br/>abstracts + quotes +<br/>figures + entities"]
Pipeline["Production Pipeline<br/>ScriptWriter → Video →<br/>Audio → QA → Edit"]
end
PDF1 --> PyMuPDF
PDF2 --> PyMuPDF
Note --> KG
PyMuPDF --> Claude
Claude --> DG1
Claude --> DG2
DG1 --> KG
DG2 --> KG
KG --> Concept
Concept --> Pipeline
Control how verbose and conversational your scripts are:
# Brief visual storyboard (default) - ~20-30 words per scene
claude-studio produce -c "Product demo" --style visual_storyboard
# Rich podcast narrative (NotebookLM-style) - ~100 words per scene
claude-studio produce -c "Explain quantum computing" --style podcast
# Educational lecture format - ~80-120 words per scene
claude-studio produce -c "Tutorial on React hooks" --style educational
# Documentary with gravitas - ~60-100 words per scene
claude-studio produce -c "History of the internet" --style documentary| Style | Words/Scene | Best For |
|---|---|---|
visual_storyboard |
~20-30 | Product demos, ads, visual-first content |
podcast |
~85-100 | Explainers, research summaries, educational deep-dives |
educational |
~80-120 | Tutorials, lectures, learning content |
documentary |
~60-100 | Narratives, historical content, storytelling |
Train the system to generate high-quality podcast scripts from research papers using an ML-style iterative approach:
# Run a training trial on a knowledge base project
claude-studio training run uav-positioning \
--reference-audio ref_podcast.mp3 \
--reference-transcript ref_transcript.txt
# View training results
claude-studio training show trial_000_20260205
# List all training trials
claude-studio training list
The training pipeline:
- Transcribes reference podcasts using Whisper (a minimal sketch follows this list)
- Classifies segments (INTRO, BACKGROUND, METHODOLOGY, KEY_FINDING, etc.)
- Extracts style profiles (vocabulary, pacing, conversation dynamics)
- Synthesizes learnings for improved script generation
- Runs iterative trials to optimize quality
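As one illustration of the first step, transcription with the open-source `whisper` package might look like this (the actual pipeline may use a different Whisper variant or settings):

```python
import whisper

# Load a small model; larger models trade speed for accuracy.
model = whisper.load_model("base")

# Transcribe the reference podcast; the result includes full text plus timed segments.
result = model.transcribe("ref_podcast.mp3")

for segment in result["segments"]:
    # Each segment carries start/end timestamps and text, which the trainer can then
    # classify (INTRO, BACKGROUND, KEY_FINDING, ...) and profile for style.
    print(f"[{segment['start']:7.2f}-{segment['end']:7.2f}] {segment['text'].strip()}")
```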
Create explainer videos from training outputs with budget-aware visual generation:
# Show budget tier costs for a training trial
claude-studio produce-video -t trial_000_20260205 --show-tiers
# Produce with low budget (hero images only)
claude-studio produce-video -t trial_000_20260205 --budget low --mock
# Produce with KB figures and live generation
claude-studio produce-video -t trial_000_20260205 --budget medium --kb uav-research --live
# Incremental production (first 5 scenes)
claude-studio produce-video -t trial_000_20260205 --budget medium --limit 5 --live
Budget Tiers:
| Tier | DALL-E Images | Luma Animations | Estimated Cost |
|---|---|---|---|
| `micro` | 0 (text only) | 0 | $0 |
| `low` | ~15 hero images | 0 | $1-2 |
| `medium` | ~40 consolidated | 0 | $3-5 |
| `high` | ~80 images | 5 selective | $8-12 |
| `full` | All scenes | All candidates | $15+ |
The system uses scene importance scoring to allocate limited image budgets to the most impactful moments (KEY_FINDING, METHODOLOGY, FIGURE_DISCUSSION).
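The idea is simple: rank scenes by how much an image would help, then spend the image budget from the top. A hedged sketch of that allocation (the weights, segment labels, and function name are illustrative, not the project's actual values):

```python
# Illustrative importance weights per segment label; real values may differ.
SEGMENT_WEIGHTS = {
    "KEY_FINDING": 1.0,
    "METHODOLOGY": 0.9,
    "FIGURE_DISCUSSION": 0.85,
    "BACKGROUND": 0.5,
    "INTRO": 0.3,
}

def allocate_images(scenes: list[dict], image_budget: int) -> list[dict]:
    """Flag the highest-importance scenes for image generation until the budget runs out."""
    ranked = sorted(scenes, key=lambda s: SEGMENT_WEIGHTS.get(s["segment"], 0.4), reverse=True)
    for scene in scenes:
        scene["generate_image"] = False
    for scene in ranked[:image_budget]:
        scene["generate_image"] = True
    return scenes

# Example: a "low" tier run might call allocate_images(scenes, image_budget=15).
```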
Claude Studio Producer supports two production workflows that determine which asset drives the timeline:
In audio-led mode, the audio narration drives the timeline. Videos are generated or adjusted to match the audio duration. Perfect for:
- Podcast-style narratives
- Educational content with precise voiceover
- Documentary-style productions
# Audio-led production (auto-detected from style)
claude-studio produce "Explain quantum computing" --style podcast --budget 5 --live
# Or explicitly set mode
claude-studio produce "Tutorial on Python decorators" --mode audio-led --budget 5 --liveIn video-led mode, the video drives the timeline. Audio is generated to match video duration. Perfect for:
- Visual-first storytelling
- Cinematic sequences
- Image-to-video pipelines
# Video-led production (default for visual styles)
claude-studio produce "Cinematic coffee commercial" --style visual_storyboard --budget 5 --liveThe production pipeline automatically:
- Generates video for each scene
- Generates audio narration for each scene
- Mixes video + audio using FFmpeg with appropriate fit modes
- Concatenates all scenes into final output (`artifacts/<run_id>/final_output.mp4`)
No manual mixing required! The system handles synchronization based on your chosen production mode.
Fit Modes (for duration mismatches):
- `stretch`: Speed-adjust video to match audio (default for audio-led)
- `truncate`: Trim longer asset to match shorter
- `loop`: Loop shorter asset to match longer
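Under the hood these fit modes map onto standard FFmpeg operations. A minimal sketch of the `stretch` case, speed-adjusting the video to match a narration length and then muxing the two (the real renderer in `core/renderer.py` is more involved):

```python
import subprocess

def stretch_video_to_audio(video_path: str, audio_path: str,
                           video_dur: float, audio_dur: float, out_path: str) -> None:
    """Speed-adjust the video so its duration matches the audio, then mux them."""
    # setpts factor > 1.0 slows the video down, < 1.0 speeds it up.
    factor = audio_dur / video_dur
    subprocess.run([
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", audio_path,
        "-filter:v", f"setpts={factor}*PTS",
        "-map", "0:v", "-map", "1:a",
        "-shortest",
        out_path,
    ], check=True)
```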
# Start the server
python -m server.main
# Open http://localhost:8000
View all runs, preview generated videos, and inspect QA scores.
When using --live mode, the QA system:
- Extracts frames from generated videos using ffmpeg
- Sends frames to Claude Vision for analysis
- Scores on 4 dimensions (0-100 each):
- Visual Accuracy: Do visuals match the scene description?
- Style Consistency: Does it match the production tier?
- Technical Quality: Any artifacts, blur, or issues?
- Narrative Fit: Does it work in the overall story?
- Records issues and improvement suggestions
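A rough sketch of that flow, frame extraction with the ffmpeg CLI plus a vision call via the `anthropic` SDK (the prompt, scoring format, and model ID here are simplified stand-ins for the real QAVerifier):

```python
import base64
import subprocess
import anthropic

def extract_frames(video_path: str, out_dir: str, fps: float = 0.5) -> None:
    """Sample frames from the generated clip (one frame every 2 seconds here)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vf", f"fps={fps}", f"{out_dir}/frame_%03d.jpg"],
        check=True,
    )

def score_frame(frame_path: str, scene_description: str) -> str:
    """Ask Claude to rate one frame against the scene description."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    with open(frame_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative; use any vision-capable Claude model
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/jpeg", "data": image_b64}},
                {"type": "text",
                 "text": ("Score this frame 0-100 on visual accuracy, style consistency, "
                          f"technical quality, and narrative fit for the scene: {scene_description}")},
            ],
        }],
    )
    return response.content[0].text
```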
The system learns from every run:
Run 1: "magical transformation effect" -> Score: 45
-> Learning: "Luma struggles with VFX transformations"
-> Added to avoid_list
Run 2: System avoids VFX, uses "slow camera pan" -> Score: 88
-> Learning: "Detailed physical descriptions work well"
-> Added to prompt_guidelines
Run 3+: Better prompts, higher scores
Learnings are stored in the multi-tenant memory system (artifacts/memory/) and used to improve future runs. Use claude-studio memory list luma to see current learnings.
The CLI shows real-time progress through each stage of production:
Stage 1: Planning - Producer creates pilot strategy, ScriptWriter generates scenes
Stage 2: Generation - Video generation with Luma AI
Stage 3: QA & Evaluation - Claude Vision analyzes frames, Critic extracts learnings
Stage 4: Output - Editor creates EDL, Renderer produces final video
Example outputs from Luma AI image-to-video generation:
| Pencil Animation | Keyboard Animation |
|---|---|
| ![]() | ![]() |
The system accumulates learnings from each run to improve future prompts:
| Agent | Status | Description |
|---|---|---|
| ProducerAgent | Implemented | Analyzes requests, creates pilot strategies using provider knowledge |
| ScriptWriterAgent | Implemented | Breaks concepts into scenes, applies learned prompt guidelines |
| VideoGeneratorAgent | Implemented | Generates video with Luma AI (real) or mock providers |
| QAVerifierAgent | Implemented | Vision-based quality analysis with Claude |
| CriticAgent | Implemented | Evaluates results, extracts provider learnings |
| ProviderOnboardingAgent | Implemented | Analyzes API docs, generates provider implementations, validates with tests |
| EditorAgent | Implemented | Creates EDL candidates for final assembly |
| AudioGeneratorAgent | Implemented | TTS voiceover generation with ElevenLabs/OpenAI |
| DocumentIngestorAgent | Implemented | PDF ingestion, atom extraction, LLM classification |
| AssetAnalyzerAgent | Stub | Seed asset analysis with Claude Vision |
| Provider | Status | Notes |
|---|---|---|
| Luma AI | Implemented | Image-to-video, text-to-video, scene chaining |
| Runway ML | Stub | Interface + cost model ready |
| Pika Labs | Stub | Interface + cost model ready |
| Kling AI | Stub | Interface + cost model ready |
| Provider | Status | Notes |
|---|---|---|
| ElevenLabs | Implemented | High-quality TTS, 29 languages, voice cloning |
| OpenAI TTS | Implemented | 6 voices, fast generation |
| Google TTS | Implemented | Neural2/WaveNet/Studio voices, 75+ languages |
| Inworld | Stub | Interface ready |
| Provider | Status | Notes |
|---|---|---|
| DALL-E | Implemented | DALL-E 2 & 3, seed images, thumbnails |
| Midjourney | - | Not planned (no API) |
| Stability | Stub | Interface ready |
claude-studio-producer/
├── agents/ # Agent implementations
│ ├── producer.py # Pilot strategy creation
│ ├── script_writer.py # Scene generation with provider guidelines
│ ├── video_generator.py # Video generation orchestration
│ ├── qa_verifier.py # Real vision-based QA
│ ├── critic.py # Evaluation + provider learning extraction
│ └── editor.py # EDL generation
│
├── cli/
│ ├── produce.py # Main CLI with --live mode and --style
│ ├── produce_video.py # Transcript-led video production with budget tiers
│ ├── training.py # Podcast training pipeline CLI
│ ├── kb.py # Knowledge base management CLI
│ └── luma.py # Luma testing CLI
│
├── core/
│ ├── claude_client.py # Claude SDK wrapper with vision support
│ ├── budget.py # Cost models and tracking
│ ├── renderer.py # FFmpeg video rendering
│ ├── memory/ # Memory system
│ │ ├── manager.py # MemoryManager (STM + LTM)
│ │ └── bootstrap.py # Provider knowledge seeding
│ ├── models/
│ │ ├── memory.py # ProviderKnowledge, ProviderLearning, etc.
│ │ ├── knowledge.py # KnowledgeProject, KnowledgeGraph, KnowledgeSource
│ │ └── document.py # DocumentGraph, DocumentAtom, AtomType
│ └── providers/
│ └── video/
│ ├── luma.py # Real Luma AI integration
│ └── ... # Other provider stubs
│
├── server/
│ ├── main.py # FastAPI server
│ ├── routes/
│ │ ├── runs.py # Run list and preview API
│ │ └── memory.py # Memory/LTM API
│ └── templates/ # Dashboard HTML templates
│
├── docs/
│ └── specs/ # Detailed specifications
│ ├── MULTI_TENANT_MEMORY_ARCHITECTURE.md # Memory system design
│ └── ... # Other specs
│
└── artifacts/ # Run outputs
├── memory.json # LTM with provider learnings
└── runs/ # Per-run data and videos
# .env file
ANTHROPIC_API_KEY=sk-ant-... # Required
LUMA_API_KEY=luma-... # For live video generation
RUNWAY_API_KEY=...            # For Runway provider (optional)
# Install dev dependencies
pip install -e ".[dev,server]"
# Run tests
pytest
# Start server with auto-reload
uvicorn server.main:app --reload
- Multi-agent orchestration with Strands
- Real video generation (Luma AI)
- Vision-based QA with Claude
- Provider learning system (LTM)
- Web dashboard
- CLI with live progress
- FFmpeg rendering
- Multi-tenant memory system with namespace hierarchy
- Memory CLI (`claude-studio memory` commands)
- Learning promotion system (session → user → org → platform)
- Provider onboarding agent with auto-test and session resume
- ElevenLabs TTS integration (voice selection, streaming, voice settings)
- OpenAI TTS integration
- Knowledge base system (`kb` CLI for multi-source document management)
- Document ingestion with figure extraction (PyMuPDF + Claude)
- Configurable narrative styles (podcast, educational, documentary)
- DALL-E image generation provider
- Google Cloud TTS provider (Neural2, WaveNet, Studio voices)
- Runway ML video provider (image-to-video)
- Multi-provider pipeline (e.g., DALL-E image → Runway video)
- Audio-video synchronization and mixing (Audio-led and video-led modes with automatic pipeline)
- Video/audio mixing and rendering with volume control
- Podcast training pipeline (ML-style iterative improvement with reference podcasts)
- Transcript-led video production (`produce-video` CLI with budget tiers)
- Scene importance scoring for budget-aware image allocation
- KB figure integration (use extracted PDF figures in videos)
- Claude Code skills integration (progressive disclosure with `/produce`, `/train`)
- Additional video providers (Pika, Kling)
- Additional audio providers (Inworld)
- Multi-pilot competitive generation
- AWS AgentCore memory backend (production)
- S3 storage integration
MIT-0 (MIT No Attribution) - see LICENSE






