
lossless-group/investment-memo-orchestrator


Investment Memo Orchestrator

Multi-agent orchestration system for generating high-quality investment memos using LangGraph and specialized AI agents.

Status: Production-Ready with Section-by-Section Processing ✅

Supported by Hypernova Capital, Avalanche VC, and Emerge Capital




Similar Services

Many people may not want to manage an open-source library or work from the command line. These are services we have found that provide similar investment memo generation.

Dedicated Private Markets AI Platforms

General AI Automation Platforms with Blueprints or Templates

Recent Updates

2026-03-24: Interactive CLI Application (memopop) (v0.5.2)

A guided terminal application that removes the need to know 15+ CLI commands, with a single entry point that auto-discovers firms, deals, and versions:

# Activate your venv, then:
memopop
╭─────────────────────────────────────────────────────────╮
│                                                         │
│   Investment Memo Orchestrator                          │
│   v0.5.2 · 33 agents · Powered by Claude + Perplexity  │
│                                                         │
╰─────────────────────────────────────────────────────────╯

? What would you like to do?
  ❯ 📝  Generate a new investment memo
    📄  Generate a one-pager summary
    📤  Export an existing memo (HTML / PDF / Word)
    🔧  Improve a specific section
    🔄  Integrate content from versions
    📊  Run a specific agent

Capabilities:

  • Generate: Firm → deal → version strategy (fresh/resume) → confirmation → pipeline run → post-run actions
  • Export: Select format (HTML+PDF, Word, one-pager, all) and mode (light/dark/both)
  • Improve: Select a section, improve with Perplexity Sonar Pro
  • Integrate: Review sources, competitive landscape, table proposals across versions
  • Agent runner: Run any individual agent on existing output

Built with Rich + questionary. No flags, no file paths required.

2026-03-23: LLM Fact-Check Pipeline, Source Catalogs, One-Pager (v0.5.0–v0.5.2)

  • Fact-check pipeline: Three-step extract → verify (Perplexity) → correct (Claude) chain with full traceability
  • Source catalogs: Per-section complete source lists documenting every source the pipeline encountered
  • One-pager generator: Claude-designed single-page visual summary (agent + CLI)
  • Citation spacing agent: Mechanical regex fixer for markdown citation formatting
  • TOC fixes: Resolved Table of Contents being destroyed during scorecard integration
  • Disambiguation: Mechanical post-synthesis override for company identity
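The citation spacing agent is described only as a mechanical regex fixer. A minimal sketch of that kind of pass, assuming the target style documented under Citation System (a space before each marker, as in text. [^1] [^2]), is shown here; the function name and patterns are illustrative, not the agent's actual code:

```python
import re

# Hypothetical sketch of a mechanical citation-spacing fixer.
CITE = r"\[\^\d+\]"

def fix_citation_spacing(markdown: str) -> str:
    # Insert a space between any non-space character and a following marker:
    # "text.[^1]" -> "text. [^1]" and "[^1][^2]" -> "[^1] [^2]".
    # The (?!:) guard skips footnote definitions such as "[^1]: Source".
    markdown = re.sub(rf"(?<=\S)({CITE})(?!:)", r" \1", markdown)
    # Collapse runs of spaces/tabs before a marker down to a single space.
    markdown = re.sub(rf"[ \t]{{2,}}(?={CITE})", " ", markdown)
    return markdown
```

Definition lines such as [^1]: ... begin at the start of a line, so the lookbehind never matches them and they are left untouched.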

See v0.5.2 release notes for full details.

2025-12-03: Firm-Scoped IO System (v0.3.0)

Multi-tenant architecture enabling private firm-specific configurations, deal data, and branded exports. Each firm can maintain isolated data while sharing the core codebase.

New Directory Structure:

io/
└── {firm}/                           # e.g., "hypernova", "emerge"
    ├── configs/
    │   └── brand-{firm}-config.yaml  # Firm-specific brand styling
    ├── templates/
    │   ├── outlines/                 # Firm-specific content outlines
    │   └── scorecards/               # Firm-specific evaluation scorecards
    └── deals/
        └── {deal}/                   # e.g., "Blinka", "CoachCube"
            ├── {deal}.json           # Deal configuration
            ├── inputs/               # Pitch decks, datarooms
            ├── outputs/              # Versioned memo artifacts
            └── exports/              # Branded HTML/PDF exports
                ├── dark/
                └── light/

Key Features:

  • Privacy: Firm data stays in io/ (gitignored or private git submodule)
  • Multi-Tenant: Multiple firms use same codebase with isolated configs
  • Backward Compatible: Legacy output/, data/, templates/ paths still work
  • Auto-Detection: System auto-detects firm from io/{firm}/deals/{deal}/ paths
  • Resume: New resume_from_interruption.py CLI to continue interrupted generation
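Auto-detection depends on the io/{firm}/deals/{deal}/ layout shown above. A hedged sketch of how a path could be mapped back to a firm and deal, assuming POSIX-style separators (detect_firm_and_deal is a hypothetical helper, not the repository's own function):

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

def detect_firm_and_deal(path: str) -> Optional[Tuple[str, str]]:
    """Return (firm, deal) if the path lies under io/{firm}/deals/{deal}/, else None."""
    parts = PurePosixPath(path).parts
    # Need four consecutive components: "io", firm, "deals", deal.
    for i in range(len(parts) - 3):
        if parts[i] == "io" and parts[i + 2] == "deals":
            return parts[i + 1], parts[i + 3]
    return None
```

Legacy paths such as output/Blinka-v0.0.1/ return None, which is one way a caller could decide to fall back to the legacy layout.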

Usage:

# Generate memo with firm context
python -m src.main "Blinka" --firm hypernova

# Export with firm-scoped paths
python cli/export_branded.py --firm hypernova --deal Blinka --mode dark --pdf

# Resume interrupted generation
python cli/resume_from_interruption.py --firm emerge --deal CoachCube

See io/README.md for complete firm-scoped IO documentation.


2025-11-26: Dataroom Analyzer Agent System

New multi-agent dataroom analyzer for processing investment datarooms containing diverse document types (pitch decks, financials, battlecards, etc.). The system scans, classifies, and extracts structured data from dataroom documents.

Phase 1 - Document Scanning & Classification

  • Three-stage classification: directory-based → filename pattern → LLM fallback
  • Supports 12+ document types (pitch_deck, cap_table, competitive_analysis, etc.)
  • Confidence scoring with classification reasoning
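The three-stage order matters because each stage is cheaper and more certain than the next. A sketch of that cascade, assuming hypothetical directory hints, filename patterns, and confidence values (the real analyzer's type names and thresholds may differ), with the LLM call injected as a callable so the example runs offline:

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Classification:
    doc_type: str
    confidence: float
    reasoning: str

# Hypothetical rule tables; the real analyzer's types and patterns may differ.
DIRECTORY_HINTS = {
    "financials": "financial_model",
    "cap-table": "cap_table",
    "competitive": "competitive_analysis",
}
FILENAME_PATTERNS = [
    (re.compile(r"deck|pitch", re.I), "pitch_deck"),
    (re.compile(r"cap[_\- ]?table|carta", re.I), "cap_table"),
    (re.compile(r"battle ?card", re.I), "competitive_analysis"),
]

def classify_document(path: str, llm_fallback: Callable[[str], Classification]) -> Classification:
    parts = path.lower().split("/")
    directories, filename = parts[:-1], parts[-1]
    # Stage 1: a known directory name is the strongest signal.
    for directory in directories:
        if directory in DIRECTORY_HINTS:
            return Classification(DIRECTORY_HINTS[directory], 0.9, f"in directory '{directory}'")
    # Stage 2: keywords in the filename.
    for pattern, doc_type in FILENAME_PATTERNS:
        if pattern.search(filename):
            return Classification(doc_type, 0.7, f"filename matches {pattern.pattern!r}")
    # Stage 3: only ambiguous files reach the slower, costlier LLM call.
    return llm_fallback(path)
```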

Phase 2 - Specialized Extractors (5/5 Complete) ✅

  • Competitive Extractor - Synthesizes battlecards into unified competitive landscape
  • Cap Table Extractor - Ownership structure, shareholders, option pools, SAFEs, convertible notes
  • Financial Extractor - P&L, projections, key metrics from CSV/Excel
  • Traction Extractor - Customer counts, ARR/MRR, retention, pipeline, partnerships
  • Team Extractor - Founders, leadership, headcount, advisors, board

Phase 2.1 - Cap Table Extractor Enhancement (2025-03-09)

  • Multi-sheet Carta export parsing (CS Certificate Ledger + Convertible Ledger)
  • Pre-round vs post-round cap table classification with reasoning
  • SAFE and convertible note extraction (investor, amount, cap, discount, dates)
  • Estimated post-conversion ownership table sorted by ownership %
  • Markdown table generation for memo section insertion
  • Expanded data model (amount_invested, vesting, price_per_share on shareholders)
  • Option pool extraction from equity ledger
  • Discount-only SAFE conversion estimation (needs round price)
  • Post-conversion estimates accounting for new round dilution
  • Date formatting cleanup from Carta timestamps
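A discount-only SAFE has no valuation cap, so under the common convention it converts at the new round's price per share reduced by the discount; that is why the estimate needs a round price. A sketch under that assumption, ignoring pro rata rights, MFN clauses, and post-money SAFE mechanics (function names are hypothetical):

```python
def discount_safe_shares(investment: float, round_price: float, discount: float) -> float:
    """Shares issued when a discount-only SAFE converts in a priced round."""
    if round_price <= 0:
        raise ValueError("a discount-only SAFE needs the new round's price per share")
    if not 0 <= discount < 1:
        raise ValueError("discount must be a fraction in [0, 1)")
    conversion_price = round_price * (1 - discount)
    return investment / conversion_price

def post_conversion_ownership(holder_shares: float, pre_round_fully_diluted: float,
                              safe_shares: float, new_round_shares: float) -> float:
    """Fraction owned after SAFEs convert and new-round shares are issued."""
    total = pre_round_fully_diluted + safe_shares + new_round_shares
    return holder_shares / total
```

For example, a $500,000 SAFE with a 20% discount in a round priced at $2.00 per share converts at $1.60, or 312,500 shares. The ownership helper divides by a fully diluted total that includes the new round's shares, which is how the estimate accounts for dilution.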

Phase 3 - Data Synthesis

  • Conflict detection - ARR mismatches, headcount discrepancies, ownership totals
  • Data gap identification - Critical (burn rate, runway) and medium priority gaps
  • Cross-reference engine - Unified metrics with confidence scores
  • Synthesis report - Human-readable conflict/gap analysis
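Conflict detection comes down to comparing the same metric as reported by different documents, plus sanity checks such as ownership summing to 100%. A minimal sketch, assuming a 5% relative tolerance for metric mismatches and a 0.5-point tolerance on ownership totals (both thresholds are illustrative, not the project's settings):

```python
from typing import Dict, List

def detect_metric_conflicts(values_by_source: Dict[str, float], tolerance: float = 0.05) -> List[str]:
    """Flag source pairs whose values for one metric differ by more than `tolerance` (relative)."""
    conflicts: List[str] = []
    items = sorted(values_by_source.items())
    for i, (source_a, a) in enumerate(items):
        for source_b, b in items[i + 1:]:
            baseline = max(abs(a), abs(b))
            if baseline and abs(a - b) / baseline > tolerance:
                conflicts.append(f"{source_a}={a:,.0f} vs {source_b}={b:,.0f}")
    return conflicts

def ownership_gap(percentages: List[float], tolerance: float = 0.5) -> float:
    """Points missing from (positive) or exceeding (negative) 100%; 0.0 if within tolerance."""
    gap = 100.0 - sum(percentages)
    return 0.0 if abs(gap) <= tolerance else gap
```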

Output Structure:

output/Company-v0.0.1/
├── 0-dataroom-inventory.json/md    # Document inventory
├── 1-competitive-analysis.json/md  # Competitive landscape
├── 2-cap-table.json/md             # Ownership structure
├── 3-financial-analysis.json/md    # Financial projections
├── 4-traction-analysis.json/md     # Customers & traction metrics
├── 5-team-analysis.json/md         # Team & leadership profiles
└── 6-synthesis-report.json/md      # Cross-reference, conflicts, gaps

See changelog/2025-11-26_01.md through changelog/2025-11-26_07.md for complete details.

2025-11-22: Premium Data Sources Integration

Perplexity @ Syntax Integration: Research queries now automatically target premium data sources using Perplexity's @source syntax. All 20 outline sections (direct + fund) have section-specific source preferences that ensure high-quality research from authoritative sources like @crunchbase, @pitchbook, @statista, and @cbinsights. This prevents low-quality filler content from generic benchmark sites. See changelog/2025-11-22_02.md for complete details.

Key Benefits:

  • ✅ 80-90% authoritative sources (up from 50-70%)
  • ✅ Section-specific source targeting (Market Context uses @statista, Team uses @linkedin, etc.)
  • ✅ Automatic source aggregation (8 premium sources from 5 key sections)
  • ✅ Zero additional cost (uses existing Perplexity API)
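Mechanically, section-specific targeting amounts to attaching each section's preferred @sources to its query and aggregating the preferences across sections. A sketch of that string assembly, assuming the source tokens are prepended to the query text; how Perplexity interprets them is as described above and is not modeled here, and the mapping is hypothetical apart from the Market Context and Team examples:

```python
from typing import Dict, List

# Hypothetical preferences: only Market Context -> @statista and
# Team -> @linkedin come from this README; the rest are illustrative.
SECTION_SOURCES: Dict[str, List[str]] = {
    "Market Context": ["@statista", "@cbinsights"],
    "Team": ["@linkedin", "@crunchbase"],
    "Funding & Terms": ["@crunchbase", "@pitchbook"],
}

def build_section_query(company: str, section: str) -> str:
    """Prefix a section-specific research query with its preferred @sources."""
    sources = " ".join(SECTION_SOURCES.get(section, []))
    return f"{sources} {company} {section.lower()}".strip()

def aggregate_sources(sections: List[str], limit: int = 8) -> List[str]:
    """Union of preferred sources across sections, de-duplicated in first-seen order."""
    ordered: List[str] = []
    for section in sections:
        for source in SECTION_SOURCES.get(section, []):
            if source not in ordered:
                ordered.append(source)
    return ordered[:limit]
```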

2025-11-20: Section-by-Section Processing

Major architecture refactor: The system now processes sections individually throughout the entire pipeline, eliminating API timeout issues and ensuring consistent citation formatting. All enrichment agents work on section files rather than assembled content. See changelog/2025-11-20_01.md for complete details.

Outstanding Issues

  • Recurring need to reactivate the venv and reinstall dependencies, even after they have already been installed.

Overview

This system uses a supervisor pattern with specialized AI agents to generate investment memos that match Hypernova Capital's analytical standards. Instead of a single AI prompt, it coordinates multiple expert agents that research, write, enrich, cite, validate, and iterate on memos using section-by-section processing to avoid timeouts and maintain quality.

Key Features

Core Command

# Legacy mode (outputs to output/)
python -m src.main "Company Name" --type direct --mode consider

# Firm-scoped mode (outputs to io/{firm}/deals/{deal}/outputs/)
python -m src.main "Company Name" --firm hypernova --type direct --mode consider

Multi-Agent Pipeline

The system coordinates 18 specialized agents: deck analysis, research, writing, enrichment (trademarks, socials, links, citations), citation cleanup (remove invalid sources, assembly), validation, fact-checking, and scorecard evaluation. All agents process sections individually to avoid API timeouts.

See Pipeline Agents Reference and CLI Tools Reference for complete listings.

Web Search Integration with Premium Source Targeting

  • Premium data sources: Research queries enhanced with Perplexity @source syntax (@crunchbase, @pitchbook, @statista, @cbinsights)
  • Section-specific sources: Each memo section targets appropriate authoritative sources (e.g., Market Context uses @statista, Team uses @linkedin)
  • Automatic aggregation: 8 premium sources aggregated from 5 key sections for comprehensive coverage
  • Quality control: Prevents low-quality filler from benchmark sites and SEO spam
  • Multi-query strategy: Company overview, funding, team, news with targeted source selection
  • Multiple providers: Tavily API (preferred), Perplexity Sonar Pro API, or DuckDuckGo (free fallback)
  • Automatic fallback: If Tavily unavailable, falls back to Perplexity → DuckDuckGo
  • Free option: Set RESEARCH_PROVIDER=duckduckgo for free web search (no API key required)

Artifact Trail System

  • Complete transparency: Every workflow step saves artifacts to output directory
  • Research artifacts: 1-research.json (structured data) and 1-research.md (human-readable summary)
  • Section drafts: Individual section files in 2-sections/ (all 10 sections as separate .md files)
  • Validation reports: 3-validation.json (scores/feedback) and 3-validation.md (human-readable report)
  • Final output: 4-final-draft.md with inline citations and citation list
  • State snapshot: state.json for full workflow debugging
  • Benefits: Inspect intermediate outputs, identify improvement areas, preserve citations through pipeline

Citation System (Perplexity Sonar Pro with Premium Sources)

  • Inline citations: Industry-standard [^1], [^2] format throughout memo with space separation
  • Placement: After punctuation with space: text. [^1] or multiple: text. [^1] [^2]
  • Source enrichment: Perplexity Sonar Pro adds citations to each section independently
  • Global renumbering: Citations renumbered sequentially across all sections ([^1][^2][^3]...)
  • Consolidated format: ONE citation block at the end (not duplicated per section)
  • Premium sources: Automatically targets authoritative sources via @ syntax:
    • @crunchbase: Funding data, investors, team backgrounds, firmographics
    • @pitchbook: Valuations, market analysis, deal data, fund performance
    • @statista: Market statistics, TAM/SAM sizing, industry forecasts
    • @cbinsights: Market trends, competitive intelligence, startup tracking
    • @bloomberg, @reuters, @forbes: Financial journalism and news
    • @sec: Regulatory filings, IPO data, fund disclosures
    • @linkedin: Professional backgrounds and team profiles
  • Quality control: Prevents citations from low-quality blogs, benchmark sites, SEO spam
  • Citation format: [^1]: YYYY, MMM DD. [Source Title](URL). Published: YYYY-MM-DD | Updated: N/A
  • Markdown links: URLs wrapped in clickable markdown links for easy reference
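Global renumbering is what prevents two sections that both use [^1] from colliding in the final draft. A simplified sketch, assuming each section file carries its own [^n]: definitions; it renumbers markers in order of first appearance and emits one consolidated block, but unlike a production pass it does not merge duplicate URLs across sections or keep definitions that are never referenced:

```python
import re
from typing import Dict, List

DEFINITION = re.compile(r"^\[\^(\d+)\]:[ \t]*(.*)$", re.MULTILINE)
REFERENCE = re.compile(r"\[\^(\d+)\](?!:)")

def assemble_with_global_citations(sections: List[str]) -> str:
    bodies: List[str] = []
    consolidated: List[str] = []
    next_number = 1
    for text in sections:
        # Collect this section's own definitions, then strip them from the body.
        local_defs: Dict[str, str] = {m.group(1): m.group(2) for m in DEFINITION.finditer(text)}
        body = DEFINITION.sub("", text).rstrip()
        mapping: Dict[str, int] = {}

        def renumber(match: re.Match) -> str:
            nonlocal next_number
            old = match.group(1)
            if old not in mapping:  # first appearance in this section
                mapping[old] = next_number
                next_number += 1
            return f"[^{mapping[old]}]"

        bodies.append(REFERENCE.sub(renumber, body))
        for old, new in mapping.items():
            consolidated.append(f"[^{new}]: {local_defs.get(old, '(source missing)')}")
    # One citation block at the end of the assembled document.
    return "\n\n".join(bodies) + "\n\n" + "\n".join(consolidated) + "\n"
```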

Multi-Brand Export System

  • Customizable branding: Configure company name, tagline, colors, and fonts via YAML files
  • Firm-scoped configs: Brand configs in io/{firm}/configs/ for private firm branding
  • Shared configs: Brand configs in templates/brand-configs/ for cross-firm use
  • Multiple brands: Support for multiple VC firm clients in a single installation
  • Quick setup: Copy templates/brand-configs/brand-config.example.yaml and customize with your firm's colors and fonts
  • Export formats: HTML (light/dark modes) and PDF with full branding
  • Easy switching: python cli/export_branded.py --firm hypernova --deal Blinka --brand collide applies Collide branding to a Hypernova deal
  • System fonts: Works with or without custom font files (supports local .woff2, .ttf files)
  • Documentation: Complete guide in docs/CUSTOM-BRANDING.md

Creating Your Own Brand

Any VC firm can create their own branded exports in 3 simple steps:

  1. Copy the example config:

    cp templates/brand-configs/brand-config.example.yaml templates/brand-configs/brand-yourfirm-config.yaml
  2. Edit with your firm's details:

    company:
      name: "Your VC Firm Name"
      tagline: "Your firm's tagline"
    
    colors:
      primary: "#1a3a52"        # Your brand's primary color (hex code)
      secondary: "#1dd3d3"      # Accent color
      # ... (see example file for all options)
    
    fonts:
      family: "Inter"           # Use system fonts like Inter, Georgia, Arial
      custom_fonts_dir: null    # Or path to custom WOFF2 font files
  3. Export with your brand:

    python export-branded.py memo.md --brand yourfirm

That's it! Your memos will now export with your firm's branding. See docs/CUSTOM-BRANDING.md for detailed customization options including custom fonts, color modes, and troubleshooting.
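How the exporter consumes the brand file is not shown here. As one hedged illustration, the colors and fonts sections of a parsed config (for example the dict returned by PyYAML's yaml.safe_load) could be turned into CSS custom properties for the HTML template; brand_to_css_vars and the --color-* variable names are assumptions:

```python
import re
from typing import Any, Dict

HEX_COLOR = re.compile(r"^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$")

def brand_to_css_vars(config: Dict[str, Any]) -> str:
    """Render brand colors and font family as a :root block of CSS custom properties."""
    lines = [":root {"]
    for name, value in config.get("colors", {}).items():
        if not HEX_COLOR.match(str(value)):
            raise ValueError(f"colors.{name} is not a hex color: {value!r}")
        lines.append(f"  --color-{name}: {value};")
    family = config.get("fonts", {}).get("family")
    if family:
        # system-ui and sans-serif act as fallbacks if the named font is unavailable.
        lines.append(f'  --font-family: "{family}", system-ui, sans-serif;')
    lines.append("}")
    return "\n".join(lines)
```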

Version Control System

  • Semantic versioning for memo iterations (v0.0.x → v0.x.0 → vx.0.0)
  • Automatic patch increments for each generation
  • Complete version history with timestamps and scores
  • JSON-tracked state for each version
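Automatic patch increments imply scanning existing artifact directories for the highest version. A sketch, assuming directory names of the form {Company}-vX.Y.Z and v0.0.1 as the first version; comparing integer tuples rather than strings keeps v0.0.10 ahead of v0.0.9:

```python
import re
from typing import Iterable

VERSION_DIR = re.compile(r"^(?P<name>.+)-v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$")

def next_patch_version(company: str, existing_dirs: Iterable[str]) -> str:
    """Return the next patch version for a company, given existing artifact directory names."""
    versions = []
    for directory in existing_dirs:
        match = VERSION_DIR.match(directory)
        if match and match["name"] == company:
            versions.append((int(match["major"]), int(match["minor"]), int(match["patch"])))
    if not versions:
        return "v0.0.1"  # assumed starting version
    major, minor, patch = max(versions)  # numeric tuple comparison
    return f"v{major}.{minor}.{patch + 1}"
```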

Quality Standards

  • Follows Hypernova template (10 sections)
  • Style guide enforcement (analytical tone, specific metrics, balanced perspective)
  • Source citation requirements
  • Validation score threshold (8/10) for auto-finalization
  • Detailed improvement suggestions when score < 8
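The 8/10 threshold is effectively a routing decision after validation. A minimal sketch of such a router, with a hypothetical state key and node names:

```python
from typing import Literal, TypedDict

class ValidationState(TypedDict):
    validation_score: float  # 0-10 score written by the validator agent

FINALIZE_THRESHOLD = 8.0

def route_after_validation(state: ValidationState) -> Literal["finalize", "human_review"]:
    """Auto-finalize at or above the threshold; otherwise escalate to a human."""
    if state["validation_score"] >= FINALIZE_THRESHOLD:
        return "finalize"
    return "human_review"
```

In LangGraph, a function like this would typically be passed to StateGraph.add_conditional_edges with the validator node as the source, so the returned string selects the next node.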

Dual-Template System

Hypernova is a Fund-of-Funds, deploying 40% of capital as LP commitments to solo GPs and emerging managers, and 60% as direct investments into technology startups. The system supports both investment types:

Direct Investment Template (memo-template-direct.md):

  • 10 sections optimized for startup analysis
  • Sections: Executive Summary, Business Overview, Market Context, Team, Technology & Product, Traction & Milestones, Funding & Terms, Risks & Mitigations, Investment Thesis, Recommendation

Fund Commitment Template (memo-template-fund.md):

  • 10 sections optimized for LP diligence
  • Sections: Executive Summary, GP Background & Track Record, Fund Strategy & Thesis, Portfolio Construction, Value Add & Differentiation, Track Record Analysis, Fee Structure & Economics, LP Base & References, Risks & Mitigations, Recommendation

Memo Modes:

  • Justify mode: Retrospective analysis for existing investments - recommendation is always "COMMIT" with rationale explaining the investment decision
  • Consider mode: Prospective analysis for potential investments - recommendation is "PASS/CONSIDER/COMMIT" based on objective analysis

Scorecard Template System

Scorecards codify your firm's proprietary evaluation criteria into structured YAML templates. This ensures AI-generated analysis reflects your actual investment thinking rather than generic LLM output.

Why Scorecards Matter:

  • Generic AI output lacks firm-specific perspective
  • Experienced investors have mental models they apply consistently
  • Scorecards make implicit evaluation criteria explicit and repeatable
  • Teams can align on what matters before AI generates content

Create Your Own Scorecard:

# templates/scorecards/your-firm/your-scorecard.yaml
metadata:
  scorecard_id: "your-firm-evaluation-v1"
  name: "Your Evaluation Framework"
  applicable_types: ["direct", "fund"]  # or just one

scoring:
  scale:
    min: 1
    max: 5  # or 10, or any range

dimension_groups:
  - group_id: "team_quality"
    name: "Team Assessment"
    dimensions: [founder_market_fit, technical_depth, execution_speed]

dimensions:
  founder_market_fit:
    name: "Founder-Market Fit"
    short_description: "How well founders understand the problem space"
    evaluation_guidance:
      questions:
        - "Have founders experienced this problem firsthand?"
        - "Do they have unfair insight into the market?"
      red_flags:
        - "No domain experience"
        - "Thesis based on market reports, not lived experience"
    scoring_rubric:
      5: "Deep personal experience with problem; unique insight"
      3: "Relevant adjacent experience"
      1: "No connection to problem space"

Using Scorecards:

# Generate scorecard for a memo
python cli/generate_scorecard.py "CompanyName"

# Output: scorecard.md in artifact directory with scored dimensions

Scorecard Structure:

  • Dimensions: Individual criteria you evaluate (any number)
  • Groups: Logical groupings of related dimensions (any number)
  • Scoring rubrics: What each score level means for your firm
  • Evaluation guidance: Questions to ask, evidence to seek, red flags to watch

See templates/scorecards/lp-commits_emerging-managers/hypernova-scorecard.yaml for a complete example with 12 dimensions across 3 groups.
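The YAML above defines dimensions and groups but not how group results are combined. One simple option, assumed here for illustration, is an unweighted mean per group with each score checked against the scale bounds; the repository's scorecard agent may report scores differently:

```python
from statistics import mean
from typing import Dict, List

def score_dimension_groups(dimension_groups: List[dict], scores: Dict[str, int],
                           scale_min: int = 1, scale_max: int = 5) -> Dict[str, float]:
    """Average each group's dimension scores; raises KeyError if a dimension is unscored."""
    results: Dict[str, float] = {}
    for group in dimension_groups:
        group_scores = []
        for dimension in group["dimensions"]:
            score = scores[dimension]
            if not scale_min <= score <= scale_max:
                raise ValueError(f"{dimension}={score} is outside {scale_min}-{scale_max}")
            group_scores.append(score)
        results[group["group_id"]] = round(mean(group_scores), 2)
    return results
```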

Tech Stack

  • Orchestration: LangGraph (Python) for multi-agent coordination
  • LLM: Anthropic Claude Sonnet 4.5 for analysis and writing
  • Web Search: Tavily API (preferred), Perplexity Sonar Pro, or DuckDuckGo (free fallback)
  • Web Scraping: httpx + BeautifulSoup for website parsing
  • CLI: Rich for beautiful terminal output with progress indicators
  • State Management: TypedDict schemas with LangGraph state graphs
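A hedged sketch of what a TypedDict state schema and one node could look like; the field names and writer_node are hypothetical, and the section list is truncated to three entries of the direct template:

```python
from typing import Dict, TypedDict

class MemoState(TypedDict, total=False):
    # Hypothetical shared state passed between agents in the graph.
    company_name: str
    investment_type: str        # "direct" or "fund"
    memo_mode: str              # "consider" or "justify"
    research: Dict[str, str]    # section title -> research notes
    sections: Dict[str, str]    # section file name -> markdown
    validation_score: float

DIRECT_SECTIONS = ["Executive Summary", "Business Overview", "Market Context"]  # truncated

def writer_node(state: MemoState) -> MemoState:
    """Illustrative node that drafts placeholder sections from research notes."""
    sections: Dict[str, str] = {}
    for index, title in enumerate(DIRECT_SECTIONS, start=1):
        notes = state.get("research", {}).get(title, "No research available.")
        file_name = f"{index:02d}-{title.lower().replace(' ', '-')}.md"
        sections[file_name] = f"## {title}\n\n{notes}\n"
    return {"sections": sections}  # partial update: only the changed key
```

Returning only the changed key is the usual LangGraph pattern: the graph merges the node's partial update into the shared state before the next agent runs.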

Quick Start

Installation

System Dependencies (Recommended)

While Python dependencies install automatically, certain export features require system-level tools. Install these for the best experience:

Pandoc (for Word/HTML exports):

# macOS
brew install pandoc

# Ubuntu/Debian
sudo apt install pandoc

# Windows
choco install pandoc

Note: If Pandoc is not installed, pypandoc will attempt to download it automatically, but installing it with your system package manager (as above) is faster and more reliable.

WeasyPrint Dependencies (for PDF exports):

# macOS
brew install cairo pango gdk-pixbuf libffi

# Ubuntu/Debian
sudo apt install libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev

Poppler (for PDF-to-image conversion in deck analysis):

# macOS
brew install poppler

# Ubuntu/Debian
sudo apt install poppler-utils

# Windows
# Download from: https://github.com/oschwartz10612/poppler-windows/releases

Note: Required for Claude Vision API to analyze pitch deck pages as images. Without poppler, deck analysis falls back to text-only extraction.

Python Dependencies

# Install dependencies with uv (requires Python 3.11+)
uv pip install -e . --python /path/to/python3.11

# Or with pip
pip install -e .

Configuration

# Copy example env file
cp .env.example .env

# Edit .env and add your API keys
# Required:
ANTHROPIC_API_KEY=your-claude-key

# Optional (for web search - highly recommended):
TAVILY_API_KEY=your-tavily-key        # Preferred - has domain filtering
PERPLEXITY_API_KEY=your-perplexity-key # Also used for citation enrichment

# Research provider selection (default: tavily)
RESEARCH_PROVIDER=tavily     # Options: tavily, perplexity, duckduckgo

Research Provider Options:

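| Provider   | API Key Required         | Best For                           |
|------------|--------------------------|------------------------------------|
| tavily     | Yes (TAVILY_API_KEY)     | General research, domain filtering |
| perplexity | Yes (PERPLEXITY_API_KEY) | Deep research with citations       |
| duckduckgo | No (free!)               | Free fallback, no API key needed   |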

Automatic Fallback: If your configured provider's API key is missing, the system automatically tries: Tavily → Perplexity → DuckDuckGo.
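The fallback rule can be expressed as a small resolver. A sketch, assuming the .env values have already been loaded into the environment and trying the remaining providers in the documented Tavily → Perplexity → DuckDuckGo order; resolve_research_provider is a hypothetical name:

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable each provider needs; None means no key is required.
PROVIDER_KEYS: Dict[str, Optional[str]] = {
    "tavily": "TAVILY_API_KEY",
    "perplexity": "PERPLEXITY_API_KEY",
    "duckduckgo": None,
}
FALLBACK_ORDER = ["tavily", "perplexity", "duckduckgo"]

def resolve_research_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured provider if usable, else the first usable fallback."""
    preferred = env.get("RESEARCH_PROVIDER", "tavily").strip().lower()
    candidates = [preferred] + [p for p in FALLBACK_ORDER if p != preferred]
    for provider in candidates:
        if provider not in PROVIDER_KEYS:
            continue  # ignore unrecognized RESEARCH_PROVIDER values
        key_name = PROVIDER_KEYS[provider]
        if key_name is None or env.get(key_name):
            return provider
    return "duckduckgo"
```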

Get API keys:

Usage

Interactive Mode (Recommended)

The easiest way to use the system — a guided terminal app that discovers your firms, deals, and versions automatically:

memopop

No flags, no file paths, no command knowledge required. Follow the prompts.

Command Line

For scripting or advanced use, the system supports direct CLI invocation with two investment types and two memo modes:

Investment Types:

  • direct: Direct startup investment (default)
  • fund: LP commitment to a venture fund

Memo Modes:

  • consider: Prospective analysis for potential investment (default)
  • justify: Retrospective justification for existing investment

# Basic usage (defaults to: direct + consider)
python -m src.main "Company Name"

# Direct investment examples
python -m src.main "Aalo Atomics" --type direct --mode justify
python -m src.main "Thinking Machines" --type direct --mode consider

# Fund commitment examples
python -m src.main "Pear VC" --type fund --mode justify
python -m src.main "Accel Growth Fund V" --type fund --mode consider

# Interactive mode
python -m src.main

# Force a specific semantic version (instead of auto-incrementing)
python -m src.main "Company Name" --version v0.1.0

# Fresh start: ignore prior artifacts, generate from scratch
python -m src.main "Company Name" --fresh

# Both: fresh start at a specific version with firm
python -m src.main "Company Name" --firm humain --fresh --version v0.1.0

CLI Arguments:

  • --type [direct|fund]: Investment type (default: direct)
  • --mode [justify|consider]: Memo mode (default: consider)
  • --version <version>: Force a specific semantic version (e.g., v0.1.0) instead of auto-incrementing. With --resume, specifies which version to resume.
  • --fresh: Start from a clean slate — ignores prior artifacts and research, generates everything from scratch using only the dataroom, deck, and web research as inputs.
  • --firm <name>: Firm name for firm-scoped IO (e.g., hypernova). Uses io/{firm}/deals/{deal}/ structure.
  • --deal <name>: Deal name (alternative to positional argument)
  • --resume: Resume from last checkpoint (skips completed agents)

Company Data Files (Optional)

You can create a JSON file in data/{CompanyName}.json to provide additional context and configuration:

{
  "type": "direct",
  "mode": "consider",
  "description": "Brief company description for research context",
  "url": "https://company.com",
  "stage": "Series B",
  "deck": "data/CompanyName-deck.pdf",
  "trademark_light": "https://company.com/logo-light.svg",
  "trademark_dark": "https://company.com/logo-dark.svg",
  "notes": "Research focus: team backgrounds, competitive positioning, unit economics",
  "disambiguation": [
    "https://wrong-company.com/",
    "https://similar-name-different-entity.com/"
  ]
}

Field Reference:

| Field           | Type                       | Description                                                                  |
|-----------------|----------------------------|------------------------------------------------------------------------------|
| type            | "direct" or "fund"         | Investment type (overrides CLI --type)                                       |
| mode            | "consider" or "justify"    | Memo mode (overrides CLI --mode)                                             |
| description     | string                     | Brief description to guide research                                          |
| url             | string                     | Company website URL                                                          |
| stage           | string                     | Investment stage (Seed, Series A, etc.)                                      |
| deck            | string                     | Path to pitch deck PDF or PowerPoint (.pdf, .pptx), relative to project root |
| trademark_light | string                     | URL or path to light mode company logo                                       |
| trademark_dark  | string                     | URL or path to dark mode company logo                                        |
| notes           | string                     | Specific research focus areas or instructions                                |
| disambiguation  | array                      | URLs of wrong companies with similar names to exclude from research          |
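Because type and mode in the JSON file take precedence over --type and --mode, loading a deal config is essentially a dictionary merge with validation. A sketch, assuming the defaults listed under CLI Arguments; the helper names are hypothetical:

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

VALID_VALUES = {"type": {"direct", "fund"}, "mode": {"consider", "justify"}}

def merge_deal_config(file_data: Dict[str, Any], cli_type: Optional[str] = None,
                      cli_mode: Optional[str] = None) -> Dict[str, Any]:
    config: Dict[str, Any] = {
        "type": cli_type or "direct",
        "mode": cli_mode or "consider",
        "disambiguation": [],
    }
    # Per the field reference, "type" and "mode" in the JSON file override the CLI flags.
    config.update(file_data)
    for key, allowed in VALID_VALUES.items():
        if config[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {config[key]!r}")
    return config

def load_deal_config(path: Path, cli_type: Optional[str] = None,
                     cli_mode: Optional[str] = None) -> Dict[str, Any]:
    # The data file is optional: a missing file means CLI flags and defaults apply.
    file_data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return merge_deal_config(file_data, cli_type, cli_mode)
```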

Trademark Insertion:

  • If trademark paths are provided, the company logo will be automatically inserted in the memo content after the header metadata
  • Light mode exports use trademark_light, dark mode exports use trademark_dark
  • Trademarks can be URLs (e.g., from company website) or local file paths (e.g., templates/trademarks/company-logo.svg)

Entity Disambiguation:

  • Companies with common names often have multiple entities in search results (e.g., "Mercury" could be the banking startup or an insurance company)
  • The disambiguation array lists URLs of wrong entities that should be excluded from research
  • Research agents will discard data from these domains, preventing entity confusion
  • Example: A company called "Reson8" at reson8.xyz might be confused with reson8.group, reson8media.com, or reson8sms.com - add those to the disambiguation array to exclude them
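Excluding disambiguation URLs is a host-level filter on search results. A sketch, assuming results are dicts with a url key (a hypothetical shape) and that subdomains of an excluded host should also be dropped:

```python
from typing import Iterable, List
from urllib.parse import urlparse

def _host(url: str) -> str:
    # urlparse only fills in hostname when a scheme or "//" prefix is present.
    if "//" not in url:
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def filter_disambiguated(results: Iterable[dict], excluded_urls: Iterable[str]) -> List[dict]:
    blocked = {_host(url) for url in excluded_urls} - {""}
    kept = []
    for result in results:
        host = _host(result["url"])
        if any(host == b or host.endswith("." + b) for b in blocked):
            continue  # wrong entity: discard before synthesis
        kept.append(result)
    return kept
```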

Example: See data/sample-company.json and data/TheoryForge.json for complete examples.

Output

Each generation creates a versioned artifact directory:

Legacy mode (output/):

output/{Company-Name}-v0.0.x/
├── 1-research.json          # Structured research data
├── 1-research.md            # Human-readable research summary
├── 2-sections/              # Individual section drafts
│   ├── 01-executive-summary.md
│   ├── 02-business-overview.md
│   └── ... (all 10 sections)
├── 3-validation.json        # Validation scores and feedback
├── 3-validation.md          # Human-readable validation report
├── 4-final-draft.md         # Complete memo with citations
└── state.json               # Full workflow state for debugging

Firm-scoped mode (io/{firm}/):

io/{firm}/deals/{deal}/
├── {deal}.json              # Deal configuration
├── inputs/                  # Source materials (decks, datarooms)
├── outputs/                 # Generated memo artifacts
│   └── {Deal}-v0.0.x/       # Same structure as legacy
├── exports/                 # Branded HTML/PDF exports
│   ├── dark/
│   └── light/
└── assets/                  # Deal-specific assets (logos)

Plus versions.json tracking version history across all iterations.

Improving Existing Memos

After generating a memo, you can improve individual sections without regenerating the entire memo. This is useful when:

  • One section is weak or missing details
  • You want to add more citations to a specific section
  • You need to strengthen analysis in a particular area
  • Research data has been updated since generation

Section Improvement with Perplexity Sonar Pro

The improve-section.py script uses Perplexity Sonar Pro for real-time research and automatic citation addition.

Features:

  • Real-time web research for up-to-date information
  • Automatic citation addition (Obsidian-style [^1], [^2])
  • Quality source selection (TechCrunch, Crunchbase, industry reports)
  • Automatic final draft reassembly after improvement
  • One-step process (no separate citation enrichment needed)

Usage:

# Activate venv first
source .venv/bin/activate

# Improve a specific section
python improve-section.py "Avalanche" "Team"

# Specify version
python improve-section.py "Avalanche" "Market Context" --version v0.0.1

# Use direct path to artifact directory
python improve-section.py output/Avalanche-v0.0.1 "Technology & Product"

Section Names: Use the section names as they appear in 2-sections/ for the memo you're improving (e.g., "Team", "Market Context"). Section names are defined by your outline/template configuration.

Output:

✓ Loading artifacts from: output/Avalanche-v0.0.1/
✓ Loaded research data
✓ Loaded 10 existing sections

Improving section: Team
  Using Perplexity Sonar Pro for real-time research...

✓ Section improved with 11 new citations added
✓ Saved to: output/Avalanche-v0.0.1/2-sections/04-team.md

Reassembling final draft...
✓ Final draft reassembled: output/Avalanche-v0.0.1/4-final-draft.md

Citations added: 11

Next steps:
  1. Review improved section in: output/Avalanche-v0.0.1/2-sections/
  2. View complete memo: output/Avalanche-v0.0.1/4-final-draft.md
  3. Export to HTML: python export-branded.py output/Avalanche-v0.0.1/4-final-draft.md --brand hypernova

Requirements:

  • PERPLEXITY_API_KEY must be set in .env file
  • Existing artifact directory from a previous memo generation

Benefits:

  • Faster: ~60 seconds vs. 10+ minutes for full regeneration
  • Cheaper: ~$1.00 per section vs. ~$10.00 for full memo
  • Targeted: Improve only what needs improvement
  • Preserves: Other sections remain unchanged

Reassembling Final Draft

If the final draft gets corrupted or you need to manually reassemble after editing section files:

# Reassemble with citation renumbering and TOC generation
python -m cli.assemble_draft "Sava"
python -m cli.assemble_draft "Sava" --version v0.0.2

This ensures:

  • Citations renumbered globally (no collisions)
  • All citation references consolidated at document end
  • Table of Contents is present and accurate

All section improvement tools automatically call this after their changes.
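The TOC step (agent 10 in the pipeline under Architecture) generates anchor links from section headings. A sketch, assuming GitHub-style heading anchors (lowercase, punctuation removed, spaces replaced by hyphens); other renderers such as Pandoc may slugify differently, and duplicate headings are not disambiguated here:

```python
import re
from typing import List

HEADING = re.compile(r"^(#{2,3})[ \t]+(.+?)[ \t]*$", re.MULTILINE)

def slugify(title: str) -> str:
    slug = title.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # drop punctuation such as "&"
    return slug.replace(" ", "-")

def build_toc(markdown: str) -> str:
    lines: List[str] = ["## Table of Contents", ""]
    for hashes, title in HEADING.findall(markdown):
        if title.lower() == "table of contents":
            continue  # don't list an existing TOC in itself
        indent = "  " * (len(hashes) - 2)  # H2 at top level, H3 nested
        lines.append(f"{indent}- [{title}](#{slugify(title)})")
    return "\n".join(lines)
```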

Export

The system supports multiple export formats with branding and citation preservation.

| Tool                | Format             | Command                                                                  |
|---------------------|--------------------|--------------------------------------------------------------------------|
| md2docx.py          | Word (.docx)       | python md2docx.py output/Company-v0.0.1/4-final-draft.md                 |
| export-branded.py   | HTML (light)       | python export-branded.py output/Company-v0.0.1/4-final-draft.md          |
| export-branded.py   | HTML (dark)        | python export-branded.py output/Company-v0.0.1/4-final-draft.md --mode dark |
| export-branded.py   | PDF                | python export-branded.py output/Company-v0.0.1/4-final-draft.md --pdf    |
| export-all-modes.sh | Batch (all memos)  | ./export-all-modes.sh                                                    |

All exports preserve inline citations, footnotes, and markdown formatting.

For detailed export options, custom branding, and troubleshooting, see:

  • exports/EXPORT-GUIDE.md - Comprehensive export documentation
  • docs/CUSTOM-BRANDING.md - Multi-brand configuration guide

Architecture

Workflow

The system uses 18 specialized agents orchestrated via LangGraph. The complete pipeline:

┌──────────────────────────────────────────────────────────────────────────────┐
│                           MEMO GENERATION PIPELINE                           │
│                        (18 agents via LangGraph)                             │
└──────────────────────────────────────────────────────────────────────────────┘

┌──────────────┐
│  Supervisor  │ ← Coordinates workflow via LangGraph
└──────┬───────┘
       │
   ┌───┴────────────────┐
   │ 1. Deck Analyst    │ ← Extract info from pitch deck PDF/PPTX (if available)
   └───┬────────────────┘   Saves: 0-deck-analysis.json, 0-deck-analysis.md
       │
   ┌───┴────────────────┐
   │ 2. Research        │ ← Web search (Tavily: 4 queries) + synthesis
   └───┬────────────────┘   Saves: 1-research.json, 1-research.md
       │
   ┌───┴────────────────┐
   │ 3. Section Research│ ← Section-specific Perplexity research with citations
   └───┬────────────────┘   Saves: 1-research/*.md (per-section research)
       │
   ┌───┴────────────────┐
   │ 4. Writer          │ ← Draft memo (10 sections), polish section research
   └───┬────────────────┘   Saves: 2-sections/*.md (10 files)
       │
   ╔═══╧════════════════════════════════════════════════════════╗
   ║              ENRICHMENT PIPELINE (4 agents)                ║
   ╠════════════════════════════════════════════════════════════╣
   ║  5. Trademark  → 6. Socials → 7. Links → 8. Visualizations ║
   ╚═══╤════════════════════════════════════════════════════════╝
       │              Updates 2-sections/*.md with logos, LinkedIn, hyperlinks
       │
   ┌───┴────────────────┐
   │ 9. Citation Enrich │ ← Add inline citations (Perplexity Sonar Pro)
   └───┬────────────────┘   Adds [^1], [^2], etc. per section
       │
   ┌───┴────────────────┐
   │ 10. TOC Generator  │ ← Generate Table of Contents with anchor links
   └───┬────────────────┘   Insert TOC after header, before first section
       │
   ┌───┴────────────────┐
   │ 11. Revise Summary │ ← Rewrite Executive Summary + Closing Assessment
   └───┬────────────────┘   Extract metrics from body, ensure accurate bookends
       │
   ╔═══╧════════════════════════════════════════════════════════╗
   ║            CITATION CLEANUP PIPELINE (2 agents)            ║
   ╠════════════════════════════════════════════════════════════╣
   ║  12. Remove Invalid Sources → 13. Citation Assembly        ║
   ║      - Validate URLs (HEAD)     - Renumber [^1][^2][^3]... ║
   ║      - Remove 404s/410s         - Consolidate to ONE block ║
   ║      - Detect hallucinations    - Remove per-section defs  ║
   ╚═══╤════════════════════════════════════════════════════════╝
       │              Cleans 1-research/ and 2-sections/, assembles 6-final-draft.md
       │
   ┌───┴────────────────┐
   │ 14. Citation Valid │ ← Validate date accuracy, detect duplicates
   └───┬────────────────┘   Check remaining URLs, ensure proper formatting
       │
   ┌───┴────────────────┐
   │ 15. Fact Checker   │ ← Verify claims against research sources
   └───┬────────────────┘   Identify unsourced metrics, hallucinated data
       │
   ┌───┴────────────────┐
   │ 16. Validator      │ ← Score 0-10, identify issues
   └───┬────────────────┘   Saves: 3-validation.json, 3-validation.md
       │
   ┌───┴────────────────┐
   │ 17. Scorecard      │ ← Evaluate against firm's scorecard template
   └───┬────────────────┘   Saves: 5-scorecard/*.md
       │
   ┌───┴────────────────┐
   │ 18. Integrate Score│ ← Integrate scorecard into section 8, reassemble
   └───┬────────────────┘
       │
   ┌───┴───────────────┐
   │   Score >= 8?     │
   └───┬───────────┬───┘
       │           │
   ┌───┴────┐  ┌──┴──────────┐
   │Finalize│  │Human Review │
   └────────┘  └─────────────┘
   Both save: 6-{Company}-v0.0.x.md, state.json
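The Citation Assembly step in the diagram renumbers citations, consolidates them into one block, and removes the per-section definitions. A minimal sketch of that step is below. It assumes each section file uses standard Markdown footnotes (`[^label]` inline, `[^label]: ...` definitions on their own lines), and it treats two definitions as the same source only when their text is identical. The real citation_assembly.py may match on URL instead. The function name `assemble_citations` is hypothetical.

```python
import re
from typing import Dict, List

# Footnote definition on its own line, e.g. "[^3]: Source title https://example.com"
DEF_RE = re.compile(r"^\[\^([^\]]+)\]:[ \t]*(.+)$", re.MULTILINE)
# Inline footnote reference, e.g. "[^3]" (not followed by a colon)
REF_RE = re.compile(r"\[\^([^\]]+)\](?!:)")


def assemble_citations(sections: List[str]) -> str:
    """Hypothetical sketch: renumber per-section footnotes into one global sequence."""
    global_ids: Dict[str, int] = {}  # definition text -> global footnote number
    bodies: List[str] = []

    for text in sections:
        # Collect this section's local definitions, then strip them from the body.
        local_defs = {label: body.strip() for label, body in DEF_RE.findall(text)}
        body = DEF_RE.sub("", text).rstrip()

        def renumber(match: re.Match) -> str:
            definition = local_defs.get(match.group(1))
            if definition is None:
                return match.group(0)  # orphan reference: leave it for the validator
            if definition not in global_ids:
                # Numbers are assigned in order of first appearance across the memo.
                global_ids[definition] = len(global_ids) + 1
            return f"[^{global_ids[definition]}]"

        bodies.append(REF_RE.sub(renumber, body))

    # One consolidated definitions block at the end of the draft.
    block = "\n".join(f"[^{n}]: {d}" for d, n in global_ids.items())
    return "\n\n".join(bodies) + (f"\n\n{block}" if block else "")
```

The sketch assigns numbers in order of first appearance rather than definition order. This keeps the rendered footnote list in reading order, and a source cited in several sections keeps a single number.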

Mermaid Workflow Diagram

flowchart TD
    subgraph Input
        A[Start] --> B[Deck Analyst]
    end

    subgraph Research Phase
        B --> C[Research Agent]
        C --> D[Section Research]
    end

    subgraph Writing Phase
        D --> E[Writer Agent]
    end

    subgraph Enrichment Pipeline
        E --> F[Trademark Enrichment]
        F --> G[Socials Enrichment]
        G --> H[Link Enrichment]
        H --> I[Visualization Enrichment]
    end

    subgraph Citation Pipeline
        I --> J[Citation Enrichment]
        J --> K[TOC Generator]
        K --> L[Revise Summaries]
    end

    subgraph Citation Cleanup
        L --> M[Remove Invalid Sources]
        M --> N[Citation Assembly]
    end

    subgraph Validation Phase
        N --> O[Citation Validator]
        O --> P[Fact Checker]
        P --> Q[Validator]
    end

    subgraph Scorecard Phase
        Q --> R[Scorecard Evaluator]
        R --> S[Integrate Scorecard]
    end

    subgraph Output
        S --> T{Score >= 8?}
        T -->|Yes| U[Finalize]
        T -->|No| V[Human Review]
        U --> W[Final Draft]
        V --> W
    end
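Both workflow diagrams end with the same branch. A memo that scores 8 or higher is finalized, and anything lower goes to human review. The sketch below writes that router as a plain Python function. It assumes the score lives in the `overall_score` key of the state described under State Management, and it treats a missing score conservatively. The function name is hypothetical. The node names `integrate_scorecard`, `finalize`, and `human_review` come from the Agent Reference Table, but workflow.py may wire them differently.

```python
from typing import Any, Literal, Mapping

SCORE_THRESHOLD = 8.0  # "Score >= 8?" in the workflow diagrams


def route_after_scorecard(state: Mapping[str, Any]) -> Literal["finalize", "human_review"]:
    """Hypothetical router: pick the terminal node from the validator's 0-10 score."""
    score = state.get("overall_score")
    if score is None:
        return "human_review"  # no score recorded: never auto-finalize
    return "finalize" if float(score) >= SCORE_THRESHOLD else "human_review"


# With LangGraph, a function like this is typically registered as a conditional edge,
# e.g. (not executed here; assumes a StateGraph instance named `graph`):
#   graph.add_conditional_edges(
#       "integrate_scorecard",
#       route_after_scorecard,
#       {"finalize": "finalize", "human_review": "human_review"},
#   )
```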

State Management

from typing import Dict, List, Literal, TypedDict

# Simplified view; the full schemas live in src/state.py
class MemoState(TypedDict):
    company_name: str
    investment_type: Literal["direct", "fund"]  # Type of investment
    memo_mode: Literal["justify", "consider"]   # Memo purpose
    research: "ResearchData"       # Web search results + synthesis (schema defined separately)
    draft_sections: Dict           # Drafted memo sections
    validation_results: Dict       # Scores and feedback
    overall_score: float           # 0-10 quality score
    revision_count: int            # Iteration tracking
    final_memo: str                # Finalized content
    messages: List[str]            # Agent outputs
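In LangGraph, each node receives the current state and returns only the keys it wants to change. For keys declared without a reducer, the returned value overwrites the old one. The sketch below shows that contract with a hypothetical `bump_revision` node. A hand-written `apply_update` mimics the overwrite behavior outside LangGraph. This README does not show the real agents' return values, so the node and the subset state type are illustrations only.

```python
from typing import Any, Dict, List, TypedDict


class PartialMemoState(TypedDict, total=False):
    """Hypothetical subset of MemoState, enough for this example."""
    company_name: str
    revision_count: int
    messages: List[str]


def bump_revision(state: PartialMemoState) -> Dict[str, Any]:
    """Hypothetical node: returns only the keys it changes and never mutates `state`."""
    count = state.get("revision_count", 0) + 1
    return {
        "revision_count": count,
        # No reducer is declared for `messages`, so "append" by returning a new list.
        "messages": state.get("messages", []) + [f"revision {count} started"],
    }


def apply_update(state: PartialMemoState, update: Dict[str, Any]) -> PartialMemoState:
    """Mimic LangGraph's default per-key overwrite for keys without a reducer."""
    merged: Dict[str, Any] = dict(state)
    merged.update(update)
    return merged  # type: ignore[return-value]
```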

Project Structure

investment-memo-orchestrator/
├── src/
│   ├── agents/
│   │   ├── deck_analyst.py           # Pitch deck analysis (PDF + PowerPoint)
│   │   ├── researcher.py             # Basic research (no web search)
│   │   ├── research_enhanced.py      # Web search + synthesis
│   │   ├── perplexity_section_researcher.py  # Section-specific research
│   │   ├── writer.py                 # Memo drafting
│   │   ├── trademark_enrichment.py   # Company logo insertion
│   │   ├── socials_enrichment.py     # LinkedIn link addition
│   │   ├── link_enrichment.py        # Organization hyperlinks
│   │   ├── visualization_enrichment.py # Charts/graphs (disabled)
│   │   ├── citation_enrichment.py    # Citation addition (Perplexity)
│   │   ├── toc_generator.py          # Table of Contents generation
│   │   ├── revise_summary_sections.py # Rewrite Executive Summary + Closing
│   │   ├── remove_invalid_sources.py # URL validation, remove 404s/hallucinations
│   │   ├── citation_assembly.py      # Global citation renumbering & consolidation
│   │   ├── citation_validator.py     # Citation accuracy validation
│   │   ├── fact_checker.py           # Fact verification agent
│   │   ├── validator.py              # Quality validation
│   │   ├── scorecard_agent.py        # 12-dimension emerging manager scorecard
│   │   ├── scorecard_evaluator.py    # Scorecard evaluation agent
│   │   ├── portfolio_listing_agent.py # Portfolio company extraction
│   │   └── dataroom/                 # Dataroom Analyzer Agent System
│   │       ├── __init__.py           # Package exports
│   │       ├── analyzer.py           # Main orchestrator
│   │       ├── dataroom_state.py     # TypedDict schemas
│   │       ├── document_scanner.py   # Directory scanning
│   │       ├── document_classifier.py # 3-stage classification
│   │       └── extractors/           # Specialized extractors
│   ├── state.py                      # TypedDict schemas
│   ├── workflow.py                   # LangGraph orchestration
│   ├── artifacts.py                  # Artifact trail system
│   ├── versioning.py                 # Version tracking system
│   ├── cli/                          # Module CLI commands
│   │   ├── __init__.py
│   │   └── revise_summaries.py       # Revise Executive Summary + Closing
│   ├── paths.py                      # Firm-scoped path resolution
│   ├── branding.py                   # Brand config loading
│   ├── scorecard_loader.py           # Scorecard loading
│   └── main.py                       # CLI entry point
├── cli/
│   ├── export_branded.py             # HTML/PDF export with branding
│   ├── resume_from_interruption.py   # Resume interrupted generation
│   ├── improve_section.py            # Section improvement
│   ├── score_memo.py                 # Scorecard generation
│   ├── recompile_memo.py             # Memo recompilation
│   ├── refocus_section.py            # Section refocusing
│   └── html-to-pdf.sh                # HTML to PDF conversion
├── io/                               # Firm-scoped IO (gitignored)
│   ├── README.md                     # Firm-scoped IO documentation
│   └── {firm}/                       # e.g., hypernova, emerge
│       ├── configs/                  # Brand configs
│       ├── templates/                # Outlines, scorecards
│       └── deals/                    # Deal data and outputs
├── templates/
│   ├── outlines/                     # YAML content outlines
│   ├── scorecards/                   # Evaluation scorecards
│   ├── brand-configs/                # Shared brand configurations
│   ├── memo-template-direct.md       # Direct investment template
│   ├── memo-template-fund.md         # Fund commitment template
│   └── style-guide.md                # Writing standards
├── docs/
│   ├── CUSTOM-BRANDING.md            # Brand configuration guide
│   ├── COMMANDS_CHEAT_SHEET.md       # CLI reference
│   └── WEB_SEARCH_SETUP.md           # Search provider guide
├── changelog/
│   └── releases/                     # Release notes
├── output/                           # Legacy output directory
├── data/                             # Legacy company data
└── tests/                            # Unit tests
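The io/{firm}/deals/ layout above keeps each firm's data separate. The sketch below resolves a firm-scoped deal path and picks the next v0.0.x output name. It assumes deal names are simple slugs and that final drafts follow the 6-{Company}-v0.0.x.md pattern from the workflow diagram. Both function names are hypothetical and are not the API of src/paths.py.

```python
import re
from pathlib import Path
from typing import Iterable

# Assumption: firm and deal names are simple slugs (letters, digits, ".", "_", "-").
_SAFE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")
# Final drafts are named like "6-Blinka-v0.0.3.md".
_DRAFT_VERSION = re.compile(r"-v0\.0\.(\d+)\.md$")


def deal_dir(repo_root: Path, firm: str, deal: str) -> Path:
    """Hypothetical helper: resolve io/{firm}/deals/{deal}, refusing path traversal."""
    for name in (firm, deal):
        if not _SAFE_NAME.match(name) or ".." in name:
            raise ValueError(f"unsafe firm or deal name: {name!r}")
    return repo_root / "io" / firm / "deals" / deal


def next_version(filenames: Iterable[str]) -> str:
    """Hypothetical helper: return the next v0.0.x tag given existing draft filenames."""
    patches = [int(m.group(1)) for name in filenames if (m := _DRAFT_VERSION.search(name))]
    return f"v0.0.{max(patches, default=0) + 1}"
```

Rejecting anything that is not a plain slug is what keeps one firm's tool invocation from reading or writing another firm's directory.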

CLI Tools Reference

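Standalone tools for post-generation improvements and exports. Most tools accept --firm and --deal flags for firm-scoped IO. A few tools (for example cli/rewrite_key_info.py and cli/evaluate_memo.py) still take a positional company name.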

| Tool | Purpose | Usage |
|---|---|---|
| cli/resume_from_interruption.py | Resume interrupted generation | python cli/resume_from_interruption.py --firm hypernova --deal Blinka |
| cli/sanitize_commentary.py | Extract LLM process commentary to internal notes | python cli/sanitize_commentary.py --firm hypernova --deal Blinka |
| cli/improve_section.py | Improve a section with Perplexity research | python cli/improve_section.py --firm hypernova --deal Blinka "Team" |
| cli/improve_team_section.py | Deep team research (LinkedIn + web) | python cli/improve_team_section.py --firm hypernova --deal Blinka |
| cli/assemble_draft.py | Rebuild final draft from sections | python cli/assemble_draft.py --firm hypernova --deal Blinka |
| src/cli/revise_summaries.py | Rewrite Executive Summary + Closing Assessment | python -m src.cli.revise_summaries "Company" --firm dark-matter |
| cli/rewrite_key_info.py | Apply YAML corrections across sections | python cli/rewrite_key_info.py "Company" corrections.yaml |
| cli/generate_scorecard.py | Generate scorecard from template | python cli/generate_scorecard.py "Company" |
| cli/score_memo.py | Score memo with scorecard | python cli/score_memo.py --firm hypernova --deal Blinka |
| cli/evaluate_memo.py | Re-run validation on existing memo | python cli/evaluate_memo.py "Company" |
| cli/refocus_section.py | Refocus section with new guidance | python cli/refocus_section.py --firm hypernova --deal Blinka "Section" |
| cli/recompile_memo.py | Recompile memo from sections | python cli/recompile_memo.py --firm hypernova --deal Blinka |
| cli/generate_tables.py | Generate markdown tables from state data | python -m cli.generate_tables "Company" --firm humain |
| cli/export_branded.py | Export to branded HTML/PDF | python cli/export_branded.py --firm hypernova --deal Blinka --pdf |
| cli/html-to-pdf.sh | Convert HTML to PDF | bash cli/html-to-pdf.sh path/to/memo.html |
| cli/md2docx.py | Export to Word (.docx) | python cli/md2docx.py memo.md |
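Most of the tools above share the same --firm/--deal convention. The section-level tools also take an optional positional section name. The argparse sketch below models that shared interface. The `build_parser` helper is hypothetical. Individual tools add their own flags (for example --pdf or --preview), which are not modeled here.

```python
import argparse


def build_parser(prog: str) -> argparse.ArgumentParser:
    """Hypothetical shared parser for firm-scoped CLI tools."""
    parser = argparse.ArgumentParser(prog=prog, description="Firm-scoped memo tool")
    parser.add_argument("--firm", required=True, help="Firm directory under io/, e.g. hypernova")
    parser.add_argument("--deal", required=True, help="Deal directory under io/{firm}/deals/")
    # Section-level tools (improve/refocus) take an optional section title.
    parser.add_argument("section", nargs="?", default=None, help='Section title, e.g. "Team"')
    return parser
```

Usage: `build_parser("improve_section").parse_args(["--firm", "hypernova", "--deal", "Blinka", "Team"])` yields a namespace with `firm`, `deal`, and `section` set. Tools that operate on a whole memo simply never read `section`.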

Pipeline Agents Reference

The main workflow (python -m src.main) orchestrates 27 agent nodes through a LangGraph state machine. Finalize and Human Review are mutually exclusive end points, so 26 of them run in any single pass. Agents execute sequentially, with two anti-hallucination validation gates and a conditional routing decision at the end.

Workflow Diagram

                         ┌─────────────────────────────────────────────────┐
                         │           PHASE 1: DATA GATHERING               │
                         │                                                 │
                         │  ┌───────────┐  ┌──────────────┐  ┌──────────┐ │
                         │  │ 1.Dataroom│─▶│2.Deck Analyst│─▶│3.Research│ │
                         │  └───────────┘  └──────────────┘  └────┬─────┘ │
                         │                                        │       │
                         │  ┌──────────────────┐  ┌───────────────┴─────┐ │
                         │  │5.Competitive     │◀─│4. Section Research  │ │
                         │  │  Researcher      │  │  (per-section w/    │ │
                         │  └────────┬─────────┘  │   Perplexity)       │ │
                         │           │            └─────────────────────┘ │
                         │  ┌────────▼─────────┐                         │
                         │  │6.Competitive     │                         │
                         │  │  Evaluator       │                         │
                         │  └────────┬─────────┘                         │
                         └───────────┼─────────────────────────────────────┘
                                     │
                         ┌───────────▼─────────────────────────────────────┐
                         │        PHASE 2: CITATION ENRICHMENT             │
                         │                                                 │
                         │  ┌───────────────────┐  ┌─────────────────────┐ │
                         │  │7. Citation Enrich │─▶│ 8. GATE 1: Cleanup  │ │
                         │  │  (on 1-research/) │  │  Research Citations │ │
                         │  └───────────────────┘  └──────────┬──────────┘ │
                         └────────────────────────────────────┼────────────┘
                                                              │
                         ┌────────────────────────────────────▼────────────┐
                         │          PHASE 3: WRITING & ENRICHMENT          │
                         │                                                 │
                         │  ┌────────┐  ┌────────────────┐  ┌───────────┐ │
                         │  │9.Writer│─▶│10.Inject Deck  │─▶│11.Enrich  │ │
                         │  │(10 sec)│  │   Images       │  │ Trademark │ │
                         │  └────────┘  └────────────────┘  └─────┬─────┘ │
                         │                                        │       │
                         │  ┌────────────┐  ┌──────────────┐  ┌───▼─────┐ │
                         │  │14.Enrich   │◀─│13.Link       │◀─│12.Socials│ │
                         │  │  Tables    │  │  Enrichment  │  │ Enrich  │ │
                         │  └──────┬─────┘  └──────────────┘  └─────────┘ │
                         │         │                                       │
                         │  ┌──────▼───────┐  ┌──────────────┐  ┌───────┐ │
                         │  │15.Generate   │─▶│16.Enrich     │─▶│17.TOC │ │
                         │  │  Diagrams    │  │  Visuals     │  │       │ │
                         │  └──────────────┘  └──────────────┘  └───┬───┘ │
                         │                                          │     │
                         │  ┌──────────────────┐                    │     │
                         │  │18.Revise Summary │◀───────────────────┘     │
                         │  │   Sections       │                          │
                         │  └────────┬─────────┘                          │
                         └───────────┼────────────────────────────────────┘
                                     │
                         ┌───────────▼─────────────────────────────────────┐
                         │         PHASE 4: ASSEMBLY & VALIDATION          │
                         │                                                 │
                         │  ┌─────────────────┐  ┌───────────────────────┐ │
                         │  │19. GATE 2:      │─▶│20. Citation Assembly  │ │
                         │  │ Cleanup Sections│  │ (consolidate+renumber)│ │
                         │  └─────────────────┘  └───────────┬───────────┘ │
                         │                                   │             │
                         │  ┌──────────────────┐  ┌──────────▼──────────┐  │
                         │  │22. Fact Checker  │◀─│21. Citation         │  │
                         │  │                  │  │    Validator        │  │
                         │  └────────┬─────────┘  └─────────────────────┘  │
                         │           │                                     │
                         │  ┌────────▼─────────┐  ┌─────────────────────┐  │
                         │  │23. Validator     │─▶│24. Scorecard        │  │
                         │  │  (score 0-10)    │  │    Evaluator        │  │
                         │  └──────────────────┘  └──────────┬──────────┘  │
                         │                                   │             │
                         │  ┌────────────────────────────────▼──────────┐  │
                         │  │25. Integrate Scorecard                    │  │
                         │  │  (insert into section 8, reassemble draft)│  │
                         │  └────────────────────┬──────────────────────┘  │
                         └───────────────────────┼─────────────────────────┘
                                                 │
                                        ┌────────▼────────┐
                                        │  Score >= 8 ?   │
                                        └───┬─────────┬───┘
                                    yes     │         │     no
                                  ┌─────────▼──┐  ┌───▼──────────┐
                                  │26.Finalize │  │27.Human      │
                                  │            │  │   Review     │
                                  └─────────┬──┘  └───┬──────────┘
                                            │         │
                                            ▼         ▼
                                          ┌─────────────┐
                                          │     END     │
                                          └─────────────┘

Agent Reference Table

| # | Agent | Node | File | Purpose |
|---|---|---|---|---|
| 1 | Dataroom Analyzer | dataroom | dataroom/ | Scan and extract data from dataroom documents (skips if none) |
| 2 | Deck Analyst | deck_analyst | deck_analyst.py | Extract info from pitch decks via Claude Vision (skips if none) |
| 3 | Research | research | research_enhanced.py | Web search via Tavily/Perplexity, synthesize findings |
| 4 | Section Research | section_research | perplexity_section_researcher.py | Per-section deep research with Perplexity citations |
| 5 | Competitive Researcher | competitive_researcher | competitive_landscape_researcher.py | Discover competitor candidates via multi-query Perplexity search |
| 6 | Competitive Evaluator | competitive_evaluator | competitive_landscape_evaluator.py | Classify competitors (direct/indirect/adjacent), run gap analysis |
| 7 | Citation Enrichment | cite | citation_enrichment.py | Add inline citations to research files via Perplexity Sonar Pro |
| 8 | GATE 1: Cleanup Research | cleanup_research | workflow.py + remove_invalid_sources.py | Validate URLs in 1-research/, remove 404s and hallucinations before writer |
| 9 | Writer | draft | writer.py | Draft 10 sections one at a time from outline/template |
| 10 | Inject Deck Images | inject_deck_images | inject_deck_images.py | Place deck screenshots into section files (max 2 per image) |
| 11 | Trademark Enrichment | enrich_trademark | trademark_enrichment.py | Insert company logo/trademark into header |
| 12 | Socials Enrichment | enrich_socials | socials_enrichment.py | Add LinkedIn profile links to team members |
| 13 | Link Enrichment | enrich_links | link_enrichment.py | Add hyperlinks to organizations, investors, partners |
| 14 | Table Generator | generate_tables | table_generator.py | Generate markdown tables from structured data (funding, team, market, traction) |
| 15 | Diagram Generator | generate_diagrams | diagram_generator.py | Create visual diagrams (TAM/SAM/SOM concentric circles, etc.) |
| 16 | Visualization Enrichment | enrich_visualizations | visualization_enrichment.py | Search for and embed relevant visualizations (temporarily disabled) |
| 17 | TOC Generator | toc | toc_generator.py | Generate Table of Contents with working anchor links |
| 18 | Revise Summaries | revise_summaries | revise_summary_sections.py | Rewrite Executive Summary and Closing Assessment with accurate metrics from full draft |
| 19 | GATE 2: Cleanup Sections | cleanup_sections | remove_invalid_sources.py | Validate URLs in 2-sections/, remove invalid citations before assembly |
| 20 | Citation Assembly | assemble_citations | citation_assembly.py | Consolidate all citations, renumber globally, assemble final draft |
| 21 | Citation Validator | validate_citations | citation_validator.py | Validate citation accuracy, check dates, detect duplicates |
| 22 | Fact Checker | fact_check | fact_checker.py | Verify claims against research sources, flag unsourced metrics |
| 23 | Validator | validate | validator.py | Score memo quality 0-10, provide specific feedback |
| 24 | Scorecard Evaluator | scorecard | scorecard_evaluator.py | Evaluate against firm's 12Ps scorecard template |
| 25 | Integrate Scorecard | integrate_scorecard | workflow.py | Insert scorecard into section 8, reassemble final draft |
| 26 | Finalize | finalize | workflow.py | Verify final draft, save state snapshot (score >= 8) |
| 27 | Human Review | human_review | workflow.py | Prepare memo for human review with issues and suggestions (score < 8) |
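The TOC Generator (node toc) has to produce anchors that resolve when the memo is rendered. The sketch below assumes GitHub-flavored heading anchors: lowercase, punctuation dropped, spaces turned into hyphens, and -1/-2 suffixes for repeated headings. It also assumes only ## and ### headings belong in the TOC. The function names are hypothetical. Other renderers, such as the branded HTML export, may slug headings differently. The line scan also does not skip headings inside fenced code blocks.

```python
import re
from typing import Dict, List


def github_slug(heading: str, seen: Dict[str, int]) -> str:
    """Approximate GitHub's heading-anchor rules, de-duplicating repeated headings."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # keep letters, digits, _, -, and spaces
    slug = slug.replace(" ", "-")
    if slug in seen:
        seen[slug] += 1
        return f"{slug}-{seen[slug]}"
    seen[slug] = 0
    return slug


def build_toc(markdown: str) -> str:
    """Hypothetical TOC builder: one bullet per ##/### heading, nested by level."""
    seen: Dict[str, int] = {}
    entries: List[str] = []
    for line in markdown.splitlines():
        match = re.match(r"^(#{2,3}) +(.+?)\s*$", line)
        if not match:
            continue  # skips body text, "#" titles, and #### or deeper
        indent = "  " * (len(match.group(1)) - 2)
        title = match.group(2)
        entries.append(f"{indent}- [{title}](#{github_slug(title, seen)})")
    return "\n".join(entries)
```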

Anti-Hallucination Gates

The pipeline includes two validation gates that prevent hallucinated citations from propagating:

  • GATE 1 (after citation enrichment, before writer): Validates all URLs in 1-research/ files. The writer never sees unverified citations.
  • GATE 2 (after revise summaries, before assembly): Validates all URLs in 2-sections/ files. Catches any issues introduced during enrichment or revision.
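Both gates rely on the same basic check: send an HTTP HEAD request to each cited URL and drop the citation if the source is gone. The standard-library sketch below covers only the 404/410 part. It assumes Markdown footnote definitions with the URL on the definition line. It does not attempt the hallucination detection that the gates also perform. Network failures and other statuses are treated as unknown and the citation is kept. This includes 405, which some servers return when they reject HEAD. The sketch therefore errs toward keeping a citation rather than deleting a valid one. The function names are hypothetical.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Optional, Set

DEAD_STATUSES = {404, 410}
URL_RE = re.compile(r"https?://[^\s)\]>]+")
DEF_LINE_RE = re.compile(r"^\[\^([^\]]+)\]:.*$", re.MULTILINE)


def head_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for a HEAD request, or None if the request failed."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as exceptions in urllib
    except (OSError, ValueError):
        return None  # DNS failure, timeout, malformed URL, ...


def remove_dead_citations(
    markdown: str, status_of: Callable[[str], Optional[int]] = head_status
) -> str:
    """Drop footnote definitions whose URL is 404/410, plus their inline references."""
    dead: Set[str] = set()

    def check(match: re.Match) -> str:
        url = URL_RE.search(match.group(0))
        if url and status_of(url.group(0)) in DEAD_STATUSES:
            dead.add(match.group(1))
            return ""
        return match.group(0)

    cleaned = DEF_LINE_RE.sub(check, markdown)
    for label in dead:
        cleaned = cleaned.replace(f"[^{label}]", "")
    return cleaned
```

Passing `status_of` as a parameter lets tests (or a cached checker) replace the live network call.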

Standalone Agents Reference

Agents with their own CLI entry points for specialized tasks.

| Agent | File | Purpose |
|---|---|---|
| Internal Comments Sanitizer | internal_comments_sanitizer.py | Extract LLM process commentary to internal notes |
| Scorecard Agent | scorecard_agent.py | Generate scorecards from YAML templates |
| Portfolio Listing | portfolio_listing_agent.py | Extract portfolio companies from fund memos |
| Key Info Rewrite | key_info_rewrite.py | Propagate fact corrections across sections |
| Dataroom Analyzer | dataroom/analyzer.py | Scan and extract data from dataroom documents |

For detailed documentation, see:

  • docs/COMMANDS_CHEAT_SHEET.md - Complete CLI reference with all options and examples
  • docs/CASUAL_USER_GUIDE.md - Comprehensive usage guide
  • docs/SETUP.md - Installation and configuration
  • docs/TROUBLESHOOTING.md - Common issues and solutions

Versioning & Releases

This project uses git-based semantic versioning with setuptools-scm:

  • Version is automatically derived from git tags (no manual file updates needed)
  • Tags follow semantic versioning: v0.1.0, v0.2.0, v1.0.0, etc.
  • Between releases, version includes commit count: 0.1.1.dev3 (3 commits after v0.1.0)
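The dev-version rule in the last bullet can be made concrete. setuptools-scm's default `guess-next-dev` scheme bumps the last component of the most recent tag and appends `.devN`, where N is the number of commits since that tag. The sketch below computes only that public part for a clean working tree, assuming plain vX.Y.Z tags. By default setuptools-scm also appends a local suffix such as +g<hash>, which this sketch omits. The function name is hypothetical.

```python
def guess_next_dev(tag: str, distance: int) -> str:
    """Approximate setuptools-scm's default scheme for a plain vX.Y.Z tag (public part only)."""
    major, minor, patch = (int(part) for part in tag.lstrip("v").split("."))
    if distance == 0:
        return f"{major}.{minor}.{patch}"  # exactly on a clean tag: the tag itself
    # Commits after the tag: guess the next patch release and mark it as a dev build.
    return f"{major}.{minor}.{patch + 1}.dev{distance}"
```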

Creating a New Release

# 1. Ensure all changes are committed
git status

# 2. Create and push a new tag (follows semantic versioning)
git tag v0.2.0 -m "Release v0.2.0: Brief description of changes"
git push origin v0.2.0

# 3. Version automatically updates to 0.2.0
python -c "from src import __version__; print(__version__)"
# Output: 0.2.0

# 4. Create GitHub release from tag (optional)
gh release create v0.2.0 --generate-notes

Semantic Versioning Guide

  • Patch (v0.1.1): Bug fixes, minor improvements
  • Minor (v0.2.0): New features, backward-compatible changes
  • Major (v1.0.0): Breaking changes, major milestones

Checking Current Version

# From Python
python -c "from src import __version__; print(__version__)"

# From command line
git describe --tags

Status

Completed ✅

  • Create a "trail" of the collected information as structured output or markdown files
  • Assure that citations are retained in the final output with proper attribution
  • Terminal progress indicators and status messages to track workflow
  • Find a way to include direct markdown links to the team's LinkedIn profiles
  • Find a way to "add" links to important organizations, such as government bodies, co-investors, previous investors, etc.
  • Find a way to include any public charts, graphs, diagrams, or visualizations from the company's website or other sources
  • Allow arguments for customizing the memo template for a "Direct Investment" or an "LP Commitment", so the generated template changes accordingly.
  • Allow arguments for specifying whether the investment has already been decided (or even wired) or is still being considered.

Improvements that need more testing

  • Ability for users to run a command to improve or enhance a certain section rather than running the whole memo generation orchestration.
  • Ability for users to run a command that adds or corrects crucial information that influences the entire content of the memo.
    Example: the Avalanche memo stated that Avalanche was raising a $50M fund, when it was actually raising a $10M fund. Fund size comes up in many places throughout the memo, so this single correction affects the entire document.

Remaining Enhancements

  • Elegant use of Trademarks of both authoring investment firm and target company (v0.3.0)
    • VC firm logo in HTML header via brand config
    • Company trademark in memo body via deal config
    • Dark/light mode support for both
  • Specialized research strategies per investment type (e.g., GP track record analysis for funds)
  • Specialized section outline per fund or direct investment type.
  • Agent that can screenshot the deck if provided and include relevant screenshots in relevant sections in the memo.

Current Capabilities ✅

  • Multi-agent orchestration (Research → Write → Cite → Validate)
  • Web search integration (Tavily for research, Perplexity Sonar Pro for citations)
  • Citation-Enrichment Agent with inline [^1], [^2] format
  • Artifact trail system (research, sections, validation, final draft)
  • Complete workflow transparency with 16+ files per generation
  • Hypernova template following (10 sections)
  • Style guide enforcement
  • Rigorous validation with specific feedback
  • Semantic versioning (v0.0.x)
  • Version history tracking
  • Rich CLI with progress indicators
  • State export (JSON)

Up Next

Internal Comments Containerization ✅

LLMs tend to include meta-commentary in generated content ("Let me search for...", "Note: Unable to find...", "If you have the actual content, please share..."). Even with aggressive prompt engineering, this process commentary still leaks into the final output, where it is inappropriate for external-facing documents.

Implemented: The cli/sanitize_commentary.py CLI and src/agents/internal_comments_sanitizer.py agent:

  1. Detects leaked commentary using 15+ regex patterns
  2. Extracts internal notes to a separate 2-sections-internal/ folder
  3. Consolidates process notes into 4-internal-notes.md
  4. Automatically reassembles clean final draft

Usage:

# Sanitize a memo
python cli/sanitize_commentary.py --firm hypernova --deal Blinka

# Preview what would be extracted without modifying
python cli/sanitize_commentary.py --firm hypernova --deal Blinka --preview

See context-vigilance/Containerizing-Internal-Comments-and-Recommendations-for-Consideration.md for complete specification.

Table Generator Agent

A specialized agent to identify and generate tables from data that would be better presented in tabular form:

  • Temporal series (funding rounds, milestones, metrics over time)
  • Entity comparisons (competitors, investors, team members)
  • Structured data from decks and datarooms

See context-vigilance/Table-Generator-Agent-Spec.md for complete specification.

Design Documents (context-v)

All specifications, blueprints, issue resolutions, and reference docs live in context-v/. Each file has YAML frontmatter with title, category, dates, and tags.

Specifications & Blueprints

| Created | Category | Title | File |
|---|---|---|---|
| 2025-11-16 | Reference | Example Memos | Document-Examples.md |
| 2025-11-16 | Prompts | Generate Investment Memo for Portfolio Company | Generate-Investment-Memo-for-Portfolio-Company.md |
| 2025-11-16 | Architecture | Multi-Agent Orchestration for Investment Memo Generation | Multi-Agent-Orchestration-for-Investment-Memo-Generation.md |
| 2025-11-17 | Reference | Citation Spacing Improvements | Citation-Reminders.md |
| 2025-11-17 | Reference | Dark Mode Export Guide | Export-Style-Templates.md |
| 2025-11-19 | Prompts | Improve Investment Memo Output | Improving-Memo-Output.md |
| 2025-11-21 | Architecture | Format Memo According to Outline Input | Format-Memo-According-to-Template-Input.md |
| 2025-11-22 | Blueprint | Preferred Sources Implementation Summary | Anti-Hallucination-Fact-Checker-Agent.md |
| 2025-11-26 | Architecture | Dataroom Analyzer Agent System Plan | Dataroom-Analyzer-Agent.md |
| 2025-11-27 | Architecture | Model Scorecard Agent and Template System | Model-Scorecard-Agent-and-Template-System.md |
| 2025-11-27 | Infrastructure | Git Submodules for Private Data and Exports | Git-Submodules-for-Private-Data-and-Exports.md |
| 2025-11-27 | Specification | Portfolio Listing Agent and Current Portfolio Section | Portfolio-Listing-Agent-and-Current-Portfolio-Section.md |
| 2025-11-28 | Blueprint | 12Ps Framework Integration Plan | 12Ps-Integration-Plan.md |
| 2025-11-28 | Specification | Containerizing Risk Assessments and Diligence Skepticism | Containerizing-Risk-Assessments-and-Diligence-Skepticism.md |
| 2025-11-28 | Specification | Exploring a Simple GUI for the Terminal Averse | Exploring a simple GUI for the terminal averse.md |
| 2025-11-28 | Specification | Introducing a GUI Native Desktop with Swift | Introducing-a-GUI-Native-Desktop-with-Swift.md |
| 2025-11-28 | Specification | Reorder and Edit Direct Outline | Reorder-and-Edit-Direct-Outline.md |
| 2025-12-02 | Blueprint | Firm and Deal-Based File Organization System | Firm-and-Deal-based-File-Organization-System.md |
| 2025-12-02 | Specification | Fix Firm-Scoped Output Paths | Fix-Firm-Scoped-Output-Paths.md |
| 2025-12-02 | Specification | MemoPop Landing Page Specification | MemoPop-Landing-Page-Sepcification.md |
| 2025-12-05 | Specification | Containerizing Internal Comments and Recommendations | Containerizing-Internal-Comments-and-Recommendations-for-Consideration.md |
| 2025-12-05 | Specification | Disambiguation Management Across Agents | Disambiguation-Management-across-Agents.md |
| 2025-12-05 | Specification | Post-Generation Quality Agents | Post-Generation-Quality-Agents.md |
| 2025-12-05 | Specification | Table Generator Agent Specification | Table-Generator-Agent-Spec.md |
| 2025-12-08 | Specification | Papermark Self-Hosted Dataroom Deployment | Papermark-Self-Hosted-Dataroom-Deployment.md |
| 2025-12-09 | Agent Docs | Deck Analyzer Agent | Deck-Analyzer-Agent.md |
| 2025-12-14 | Specification | Agent Should Reorder and Organize Citations on Assembly | An-Agent-should-Reorder-and-Organize-Citations-on-Assembly.md |
| 2025-12-14 | Specification | Faked Sources from Perplexity | Faked-Sources-from-Perplexity.md |
| 2025-12-15 | Specification | Anti-Hallucination: Source Validation and Removal | Anti-Hallucination-Source-Validation-and-Removal.md |
| 2025-12-26 | Specification | Vision for Production-Grade MemoPop Monorepo | Vision-for-Production-Grade-Memopop-Monorepo.md |
| 2026-01-27 | Specification | PDF Parser Agent Specification | PDF-Parser-Agent-Spec.md |
| 2026-03-09 | Architecture | Introducing a Diagram Generator Agent | Introducing-a-Diagram-Generator-Agent.md |
| 2026-03-10 | Specification | Introducing a Competitive Landscape Research and Evaluation System | Introducing-a-Competitive-Landscape-Research-and-Evaluation-System.md |
| 2026-03-10 | Specification | Introducing a Content Density Mode System | Introducing-a-Content-Density-Mode-System.md |
| 2026-03-10 | Specification | Introducing a Legal Doc Comparator | Introducing-a-Legal-Doc-Comparator.md |
| 2026-03-10 | Specification | Introducing a Table Generator Agent | Introducing-a-Table-Generator-Agent.md |
| 2026-03-10 | Specification | Introducing an Augment Research Writer Agent | Introducing-an-Augment-Research-Writer-Agent.md |
| 2026-03-10 | Specification | Introducing an Enrich Visualizations Agent | Introducing-an-Enrich-with-Visualizations-Search-Agent.md |
| 2026-03-13 | Specification | Introducing a KPI Extractor Agent | Introducing-a-KPI-Extractor-Agent.md |

Issue Resolutions

| Created | Title | File |
|---|---|---|
| 2025-11-21 | Getting Sonar Pro to Work in First Research Agent | issue-resolution/Getting-Sonar-Pro-to-work-in-first-Research-Agent.md |
| 2025-11-21 | Perplexity API Fix - Zero Balance Issue | issue-resolution/Perplexity-as-Default-Research-Agent.md |
| 2025-11-22 | Premium Data Sources Integration Plan | issue-resolution/Premium-Data-Sources-Integration-Plan.md |
| 2025-11-22 | Perplexity Premium API Calls - Complete Reference | issue-resolution/Perplexity-Premium-API-Calls-Reference.md |
| 2025-11-22 | Preventing Hallucinations in Memo Generation | issue-resolution/Preventing-Hallucinations-in-Memo-Generation.md |
| 2025-11-22 | Tier 1 Anti-Hallucination Test Results: Work Back AI | issue-resolution/Tier-1-Test-Results-WorkBack.md |
| 2025-11-23 | Resume Workflow After Interruption | issue-resolution/Resume-Workflow-After-Interruption.md |
| 2025-11-27 | Getting AI to Refocus When Web Research is Empty | issue-resolution/Getting-AI-to-Refocus-when-Web-Research-is-empty.md |
| 2025-12-05 | Deck Data Flow Gap Causes Hallucination | issue-resolution/Deck-Data-Flow-Gap-Causes-Hallucination.md |
| 2025-12-15 | Correct Citation Pipeline Accuracy in Multi-Agent Research | issue-resolution/Correct-Citation-Pipeline-Accuracy-in-Multi-Agent-Research.md |

Reminders

| Created | Title | File |
|---|---|---|
| 2025-12-26 | Frontmatter Standards for Context Files | reminders/Frontmatter-Standards-for-Context-Files.md |
| 2026-01-28 | Extended Markdown Citation System Syntax | reminders/Extended-Markdown-Citation-System-Syntax.md |

Roadmap

  • Resume from interruption (v0.3.0 - cli/resume_from_interruption.py)
  • Multi-tenant firm isolation (v0.3.0 - firm-scoped IO)
  • Internal comments containerization (v0.3.4 - cli/sanitize_commentary.py)
  • Table generator agent
  • LangGraph native checkpointing
  • Web UI (Streamlit/Gradio)
  • Human-in-the-loop checkpoints
  • Batch processing for portfolio analysis

Contributing

This is an internal Hypernova Capital research project. For questions or suggestions, contact the development team.

License

Proprietary - Hypernova Capital


Project Sponsored By

Hypernova Capital

Investing in frontier technology companies at the intersection of climate, energy, and AI.


Last updated: 2025-12-15
Version: v0.3.6 (Citation Cleanup Pipeline: Remove Invalid Sources + Citation Assembly)
Status: Production-ready with multi-tenant support