A command-line OCR tool using Mistral AI for document processing with intelligent filename generation.
uv tool install --editable . # Install globally
echo "MISTRAL_API_KEY=your_key" >> ~/.env # Set API key (appends, so an existing ~/.env is kept)
ocr document.pdf --rename # Process and rename
- 📄 Document OCR: Extract text from PDFs, images, and Office documents (PPTX, DOCX)
- 🤖 Intelligent Filename Generation: Automatically generates descriptive filenames from document content
- 💾 Smart Caching: Reuses OCR results to avoid redundant API calls
- 🎯 Flexible Page Selection: Process specific pages or page ranges
- 📝 Markdown Output: Save results with YAML frontmatter metadata
- 🖼️ Image Extraction: Automatically extracts and saves embedded images
- 🔧 System-Wide Installation: Install once, use anywhere
- Python 3.12 or higher
- uv package manager
# Clone the repository
git clone <repository-url>
cd OCR
# Install globally with uv
uv tool install --editable .
# Verify installation
ocr --version
# Clone and install in development mode
git clone <repository-url>
cd OCR
uv pip install -e .
The tool requires a Mistral AI API key. On first run, it automatically creates a configuration file at:
Windows: C:\Users\{your-username}\.env
Linux/Mac: ~/.env
Edit this file and add your API key:
# OCR Configuration File
# Get your API key from: https://console.mistral.ai/
MISTRAL_API_KEY=your-mistral-api-key-here
Note: You can also place a .env file in your current working directory, which will take precedence over the home directory configuration.
- Visit Mistral AI Console
- Sign up or log in
- Navigate to API Keys section
- Create a new API key
- Copy the key to your .env file
# Process a single file
ocr document.pdf
# Process multiple files
ocr invoice1.pdf invoice2.pdf receipt.pdf
# Process all files in a folder
ocr ./invoices/
# Process files matching a pattern (shell expansion)
ocr *.pdf
ocr docs/**/*.pdf
# Process specific pages
ocr document.pdf --pages "1-3"
# Process page ranges
ocr document.pdf --pages "1-5,10-15"
# Process from page 5 to end
ocr document.pdf --pages "5-"
# Include page numbers as headlines
ocr document.pdf --page-headlines
# Save to specific directory
ocr document.pdf --output /path/to/output
# By default:
# - Single files save in .ocr subdirectory next to source
# - Multiple files save to ocr_output/ directory
#
# Single page: document.pdf → .ocr/document.pg1.md
# Multi-page: document.pdf → .ocr/document.md
The tool can analyze document content and generate descriptive filenames following the pattern:
{ISO-date} - {Company} - {Summary}.{ext}
Examples:
2024-12-01 - Medivere - Befund Julia.pdf
2024-09-03 - WKO - Mahnung Privat.pdf
2020-06-15 - Erste Bank - Eingänge.xlsx
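As a rough illustration of the naming pattern, the following sketch composes a filename from already-extracted fields. The function build_filename and its parameters are hypothetical; how the tool itself assembles or sanitizes the name is not documented here.

```python
from datetime import date
from pathlib import Path


def build_filename(doc_date: date, company: str, summary: str, original: Path) -> str:
    """Compose '{ISO-date} - {Company} - {Summary}.{ext}', keeping the original extension."""
    return f"{doc_date.isoformat()} - {company.strip()} - {summary.strip()}{original.suffix}"


print(build_filename(date(2024, 9, 3), "WKO", "Mahnung Privat", Path("scan001.pdf")))
# prints: 2024-09-03 - WKO - Mahnung Privat.pdf
```

Leading with the ISO date means a plain alphabetical sort of a folder is also a chronological sort.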
# Generate and apply intelligent filename
ocr invoice.pdf --rename
# Preview suggested filename (dry-run)
ocr invoice.pdf --dry-run
# Ask for confirmation before renaming
ocr invoice.pdf --rename --confirm
# Force regenerate filename (ignore cache)
ocr invoice.pdf --rename --force
# Batch rename all files in a folder
ocr ./invoices/ --rename
# Batch rename with confirmation
ocr ./invoices/ --rename --confirm
- Smart Analysis: Processes the first page to extract date, company name, and document type
- Confidence Check: If confidence is low, processes all pages for better accuracy
- Caching: Stores generated filenames in metadata to avoid regeneration
- Safe Renaming: Renames both original file AND OCR markdown file to keep them in sync
- Collision Handling: Automatically adds a counter suffix (_2, _3, etc.) if the filename already exists
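The collision handling in the last point might look something like the sketch below. It assumes the counter is placed before the file extension and starts at 2; the function name unique_target is invented for the example, and the tool's real logic may differ.

```python
from pathlib import Path


def unique_target(target: Path) -> Path:
    """Return target unchanged if free, else add _2, _3, ... before the extension."""
    if not target.exists():
        return target
    counter = 2
    while True:
        candidate = target.with_name(f"{target.stem}_{counter}{target.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1
```

With this approach, a second invoice that resolves to the same generated name would be stored next to the first one instead of overwriting it.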
| Flag | Description |
|---|---|
| --rename | Enable intelligent filename generation and renaming |
| --dry-run | Show suggested filename without renaming |
| --confirm | Ask for confirmation before each rename |
| --force | Force regenerate filename even if cached |
OCR results are saved in a .ocr subdirectory as Markdown files with YAML frontmatter:
your-document.pdf
.ocr/
├── your-document.pg1.md # Single-page document
├── your-document.md # Multi-page document
└── images/ # Extracted images
    ├── your-document_1.jpg
    └── your-document_2.png
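The default location of the Markdown file for a single source file can be expressed as a small helper. This is only a sketch of the documented naming rule: ocr_markdown_path is a made-up name, and how the tool determines the page count is an assumption.

```python
from pathlib import Path


def ocr_markdown_path(source: Path, page_count: int) -> Path:
    """Return the default Markdown path for a single source file.

    Single-page documents get a .pg1.md suffix, multi-page ones a plain .md.
    """
    suffix = ".pg1.md" if page_count == 1 else ".md"
    return source.parent / ".ocr" / f"{source.stem}{suffix}"
```

For example, ocr_markdown_path(Path("docs/your-document.pdf"), 1) points into docs/.ocr/ and ends in your-document.pg1.md.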
---
source_file: /path/to/document.pdf
original_filename: scan001.pdf
processed_at: 2026-01-08T15:30:45.123456
content_length: 15432
include_page_headlines: false
images_saved: 5
filename_metadata:
  generated_filename: 2024-12-01 - Company - Invoice
  generation_timestamp: 2026-01-08T15:30:50
  generation_method: mistral-small-2506
  confidence: high
  extracted_date: 2024-12-01
  extracted_company: Company Name
  extracted_summary: Invoice
  pages_analyzed: 1
---
# Document content here...
Note: The original_filename field preserves the filename before any renaming, allowing you to track the original file name even after intelligent renaming.
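To read fields such as original_filename back out of a saved file, the frontmatter has to be separated from the body. The sketch below uses only the standard library and reads flat top-level keys; it assumes nested fields (such as those under filename_metadata) are indented as in standard YAML and simply skips them. The function name split_frontmatter is hypothetical, and a real YAML parser such as PyYAML would be the more robust choice.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a Markdown file into top-level frontmatter fields and body.

    Only flat 'key: value' lines are read; indented (nested) lines are skipped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text  # no closing delimiter: treat everything as body
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if line.startswith((" ", "\t")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body
```

After meta, body = split_frontmatter(path.read_text(encoding="utf-8")), the pre-rename name is available as meta.get("original_filename").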
- Documents: PDF, PPTX, DOCX
- Images: PNG, JPG, JPEG, AVIF
Process documents with OCR. Accepts files, folders, or shell glob patterns.
ocr [FILES_OR_FOLDERS...] [OPTIONS]
Arguments:
FILES_OR_FOLDERS - One or more files or folders to process
Options:
-o, --output PATH - Output directory (default: .ocr/ for single file, ocr_output/ for multiple)
--pages TEXT - Page pattern (e.g., '1-3', '5-', '4,5')
--page-headlines - Include page numbers as markdown headlines
--rename - Enable intelligent filename generation and renaming
--dry-run - Show suggested filenames without renaming
--confirm - Ask for confirmation before operations
--force - Force regenerate filenames even if cached
-v, --version - Show version and exit
--help - Show help message
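The --pages patterns listed above could be expanded into concrete page numbers along these lines. This is a minimal sketch under stated assumptions: pages are treated as 1-based, ranges are clamped to the document length, and the function name parse_pages is invented for the example rather than taken from the tool.

```python
def parse_pages(pattern: str, total_pages: int) -> list[int]:
    """Expand a pattern such as '1-3', '5-' or '4,5' into sorted 1-based page numbers."""
    pages: set[int] = set()
    for part in pattern.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, _, end_text = part.partition("-")
            start = int(start_text)
            end = int(end_text) if end_text else total_pages  # open end: '5-'
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, min(end, total_pages) + 1))
    return sorted(pages)
```

Collecting pages in a set means overlapping ranges such as '1-5,3-8' yield each page only once.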
Examples:
# Single file
ocr document.pdf
# Multiple files
ocr file1.pdf file2.pdf file3.pdf
# Entire folder
ocr ./invoices/
# Shell glob patterns
ocr *.pdf
ocr documents/**/*.pdf
# With options
ocr document.pdf --pages "1-5" --page-headlines
ocr ./invoices/ --rename --confirm
ocr *.pdf --dry-run
- First OCR with rename: ~$0.011 per file
  - OCR: ~$0.01
  - Filename generation: ~$0.001
- Cached filename reuse: $0 (no API calls)
- Force regenerate: ~$0.001 (chat only, reuses OCR)
- Smart Caching: Reuses existing OCR markdown to avoid redundant processing
- First-Page Analysis: Only OCRs first page initially for filename generation
- Incremental Processing: Only processes full document if initial confidence is low
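The cost figures above lend themselves to a quick back-of-the-envelope estimate for a batch. The sketch below uses the approximate per-file numbers from this README as constants; it assumes costs scale linearly per file, which is a simplification since actual charges depend on Mistral's current pricing and on document size. The function name estimate_cost is hypothetical.

```python
OCR_COST = 0.01        # approx. USD per file for OCR (figure from this README)
FILENAME_COST = 0.001  # approx. USD per filename generation (figure from this README)


def estimate_cost(new_files: int, forced: int = 0) -> float:
    """Estimate USD cost: new files pay OCR + filename, forced reruns pay filename only."""
    return round(new_files * (OCR_COST + FILENAME_COST) + forced * FILENAME_COST, 4)


print(estimate_cost(new_files=100))           # 100 first-time files with --rename
print(estimate_cost(new_files=0, forced=20))  # 20 cached files rerun with --force
```

Files whose OCR result and filename are both cached do not appear in the estimate at all, since they trigger no API calls.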
# Process a single document
ocr contract.pdf
# Output: .ocr/contract.pg1.md or .ocr/contract.md
# Generate smart filename
ocr scan001.pdf --rename
# Before: scan001.pdf
# After: 2024-12-01 - Acme Corp - Service Agreement.pdf
# Process and rename all PDFs in a folder
ocr ./invoices/ --rename
# Each file gets a descriptive name based on its content
# See what the filename would be without changing anything
ocr document.pdf --dry-run
# Output: Suggested filename: 2024-09-15 - Company - Report.pdf
# Process all PDFs in current directory
ocr *.pdf
# Process PDFs in multiple locations
ocr invoices/*.pdf receipts/*.pdf
If you see an error about a missing MISTRAL_API_KEY:
- Check that the .env file exists in your home directory
- Verify the API key is correctly set
- Make sure there are no quotes around the key value
If the ocr command is not recognized after installation:
Windows:
- Run uv tool update-shell
- Restart your terminal
- Check if %USERPROFILE%\.local\bin is in your PATH
Linux/Mac:
- Run uv tool update-shell
- Restart your terminal or run source ~/.bashrc (or ~/.zshrc)
If you encounter permission errors on Windows with OneDrive:
- Move the project to a local directory (e.g., C:\Users\{username}\Projects\OCR)
If filename generation returns "Document":
- Check that the document has readable text content
- Try using --force to regenerate
- Ensure the document contains a date, company name, or description
# Remove global installation
uv tool uninstall ocr
# Build wheel package
uv build --wheel
# Install from wheel
uv tool install dist/ocr-0.3.0-py3-none-any.whl
# Install development dependencies
uv sync --group dev
# Run tests
uv run pytest
# Run with coverage
uv run pytest --cov=ocr
# Lint
uv run ruff check src/ tests/
Note: The test suite contains unit tests only. Integration tests that call the Mistral API are not yet implemented. The test file ocr_test/max_mustermann_brief.png is available for manual testing; the expected rename is 2026-03-19 - Max Mustermann - Anfrage Datenschutz.png.
- Mistral AI Documentation
- Mistral OCR Guide
- Mistral OCR Tutorial
- Batch OCR Examples
- Mistral OCR Deep Dive
- 🎉 Simplified CLI: Removed subcommands; just use ocr file.pdf instead of ocr process-file --file file.pdf
- Unified command: Single command handles files, folders, and glob patterns automatically
- Batch rename support: --rename now works for multiple files and folders
- Batch confirmation: --confirm flag now prompts before processing multiple files
- Breaking change: Old commands (process-file, process-files, process-folder) removed
- New .ocr subdirectory structure: All OCR markdown files are now saved in a .ocr subdirectory for better organization
- Page-based naming: Single-page documents use the .pg1.md suffix, multi-page documents use the .md suffix
- Original filename tracking: Added original_filename field in metadata to preserve pre-rename filenames
- Dry-run fix: OCR markdown files are now created even in --dry-run mode
- Added automatic .env file setup in user home directory
- Improved configuration file handling with fallback support
- Enhanced user guidance for API key setup
- Added intelligent filename generation using Mistral chat API
- Implemented smart caching for OCR results
- Added CLI flags: --rename, --dry-run, --confirm, --force
- Switched to YAML frontmatter for metadata
- Added version tracking with --version flag
- Initial release with basic OCR functionality
- Support for PDFs, images, and Office documents
- Page selection and filtering
- Image extraction and materialization
MIT License
Leonard Tulipan (leo@leotulipan.at)