leotulipan/ocr

OCR Tool

CI Python 3.12+ License: MIT

A command-line OCR tool using Mistral AI for document processing with intelligent filename generation.

Quick Start

uv tool install --editable .     # Install globally
echo "MISTRAL_API_KEY=your_key" >> ~/.env  # Add API key (appends, keeps existing entries)
ocr document.pdf --rename        # Process and rename

Features

  • 📄 Document OCR: Extract text from PDFs, images, and Office documents (PPTX, DOCX)
  • 🤖 Intelligent Filename Generation: Automatically generates descriptive filenames from document content
  • 💾 Smart Caching: Reuses OCR results to avoid redundant API calls
  • 🎯 Flexible Page Selection: Process specific pages or page ranges
  • 📝 Markdown Output: Save results with YAML frontmatter metadata
  • 🖼️ Image Extraction: Automatically extracts and saves embedded images
  • 🔧 System-Wide Installation: Install once, use anywhere

Installation

Prerequisites

  • Python 3.12 or higher
  • uv package manager

Install Globally

# Clone the repository
git clone <repository-url>
cd ocr

# Install globally with uv
uv tool install --editable .

# Verify installation
ocr --version

Development Installation

# Clone and install in development mode
git clone <repository-url>
cd ocr
uv pip install -e .

Configuration

API Key Setup

The tool requires a Mistral AI API key. On first run, it will automatically create a configuration file at:

  • Windows: C:\Users\{your-username}\.env
  • Linux/Mac: ~/.env

Edit this file and add your API key:

# OCR Configuration File
# Get your API key from: https://console.mistral.ai/

MISTRAL_API_KEY=your-mistral-api-key-here

Note: You can also place a .env file in your current working directory, which will take precedence over the home directory configuration.

Get Your API Key

  1. Visit Mistral AI Console
  2. Sign up or log in
  3. Navigate to API Keys section
  4. Create a new API key
  5. Copy the key to your .env file

Usage

Basic OCR

# Process a single file
ocr document.pdf

# Process multiple files
ocr invoice1.pdf invoice2.pdf receipt.pdf

# Process all files in a folder
ocr ./invoices/

# Process files matching a pattern (shell expansion)
ocr *.pdf
ocr docs/**/*.pdf

Page Selection

# Process a single page range
ocr document.pdf --pages "1-3"

# Process several page ranges
ocr document.pdf --pages "1-5,10-15"

# Process from page 5 to end
ocr document.pdf --pages "5-"

# Include page numbers as headlines
ocr document.pdf --page-headlines
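
To make the page pattern syntax concrete, here is a small sketch of how patterns such as "1-3", "1-5,10-15" and "5-" could be expanded into page numbers. The function parse_pages is hypothetical and the clamping to the document length is an assumption; the tool's own parser may behave differently.

```python
def parse_pages(pattern: str, total_pages: int) -> list[int]:
    """Expand a pattern like "1-3", "4,5" or "5-" into sorted 1-based page numbers."""
    pages: set[int] = set()
    for part in pattern.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, _, end_text = part.partition("-")
            start = int(start_text)
            # An open range such as "5-" runs to the last page.
            end = int(end_text) if end_text else total_pages
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Assumption: pages beyond the end of the document are silently dropped.
        pages.update(range(start, min(end, total_pages) + 1))
    return sorted(pages)
```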

Output Options

# Save to specific directory
ocr document.pdf --output /path/to/output

# By default:
# - Single files save in .ocr subdirectory next to source
# - Multiple files save to ocr_output/ directory
#
# Single page: document.pdf → .ocr/document.pg1.md
# Multi-page: document.pdf → .ocr/document.md
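
The default naming rules above can be expressed as a short sketch. The function default_markdown_path is hypothetical, and two details are assumptions: that ocr_output/ is relative to the working directory, and that batch output uses the same .pg1.md / .md suffixes.

```python
from pathlib import Path


def default_markdown_path(source: Path, page_count: int, batch: bool = False) -> Path:
    """Sketch of the default output location for one source file."""
    # Assumption: in batch mode, ocr_output/ is relative to the working directory.
    out_dir = Path("ocr_output") if batch else source.parent / ".ocr"
    # Single-page documents get the ".pg1.md" suffix, multi-page documents ".md".
    suffix = ".pg1.md" if page_count == 1 else ".md"
    return out_dir / f"{source.stem}{suffix}"
```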

Intelligent Filename Generation

Overview

The tool can analyze document content and generate descriptive filenames following the pattern:

{ISO-date} - {Company} - {Summary}.{ext}

Examples:

  • 2024-12-01 - Medivere - Befund Julia.pdf
  • 2024-09-03 - WKO - Mahnung Privat.pdf
  • 2020-06-15 - Erste Bank - Eingänge.xlsx
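
As an illustration of the pattern, the following sketch assembles a filename from its three parts. The helper names and the character sanitization are assumptions added for the example; the source only specifies the {ISO-date} - {Company} - {Summary}.{ext} layout.

```python
import re
from datetime import date

# Assumption: characters that Windows forbids in filenames are replaced by a space.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*]')


def clean(part: str) -> str:
    """Replace forbidden characters and collapse runs of whitespace."""
    return " ".join(_FORBIDDEN.sub(" ", part).split())


def build_filename(doc_date: date, company: str, summary: str, ext: str) -> str:
    """Format '{ISO-date} - {Company} - {Summary}.{ext}'."""
    ext = ext if ext.startswith(".") else f".{ext}"
    return f"{doc_date.isoformat()} - {clean(company)} - {clean(summary)}{ext}"
```

For instance, build_filename(date(2024, 12, 1), "Medivere", "Befund Julia", "pdf") yields the first example in the list above.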

Usage

# Generate and apply intelligent filename
ocr invoice.pdf --rename

# Preview suggested filename (dry-run)
ocr invoice.pdf --dry-run

# Ask for confirmation before renaming
ocr invoice.pdf --rename --confirm

# Force regenerate filename (ignore cache)
ocr invoice.pdf --rename --force

# Batch rename all files in a folder
ocr ./invoices/ --rename

# Batch rename with confirmation
ocr ./invoices/ --rename --confirm

How It Works

  1. Smart Analysis: Processes the first page to extract date, company name, and document type
  2. Confidence Check: If confidence is low, processes all pages for better accuracy
  3. Caching: Stores generated filenames in metadata to avoid regeneration
  4. Safe Renaming: Renames both original file AND OCR markdown file to keep them in sync
  5. Collision Handling: Automatically adds counter suffix (_2, _3, etc.) if filename exists
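
Steps 4 and 5 can be sketched as below. This is an illustration under assumptions, not the tool's implementation: the function names are hypothetical, the counter is assumed to go before the file extension, and the markdown file is assumed to be named after the source file's stem.

```python
from pathlib import Path


def unique_target(directory: Path, stem: str, suffix: str) -> Path:
    """Return directory/stem+suffix, appending _2, _3, ... while the name is taken."""
    candidate = directory / f"{stem}{suffix}"
    counter = 2
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def rename_pair(source: Path, markdown: Path, new_stem: str) -> tuple[Path, Path]:
    """Rename the source file and its OCR markdown so both share the same new stem."""
    new_source = unique_target(source.parent, new_stem, source.suffix)
    # Keep the markdown's compound suffix (".pg1.md" or ".md") intact.
    md_suffix = markdown.name[len(source.stem):]
    new_markdown = markdown.with_name(f"{new_source.stem}{md_suffix}")
    source.rename(new_source)
    markdown.rename(new_markdown)
    return new_source, new_markdown
```

Deriving the markdown name from the final source name (including any counter) is what keeps the two files in sync when a collision occurs.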

Filename Generation Flags

Flag       Description
--rename   Enable intelligent filename generation and renaming
--dry-run  Show suggested filename without renaming
--confirm  Ask for confirmation before each rename
--force    Force regenerate filename even if cached

Output Format

By default, OCR results for a single file are saved in a .ocr subdirectory next to the source, as Markdown files with YAML frontmatter:

File Structure

your-document.pdf
.ocr/
├── your-document.pg1.md    # Single-page document
├── your-document.md        # Multi-page document
└── images/                 # Extracted images
    ├── your-document_1.jpg
    └── your-document_2.png

Markdown Format

---
source_file: /path/to/document.pdf
original_filename: scan001.pdf
processed_at: 2026-01-08T15:30:45.123456
content_length: 15432
include_page_headlines: false
images_saved: 5
filename_metadata:
  generated_filename: 2024-12-01 - Company - Invoice
  generation_timestamp: 2026-01-08T15:30:50
  generation_method: mistral-small-2506
  confidence: high
  extracted_date: 2024-12-01
  extracted_company: Company Name
  extracted_summary: Invoice
  pages_analyzed: 1
---

# Document content here...

Note: The original_filename field preserves the filename before any renaming, allowing you to track the original file name even after intelligent renaming.
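
If you want to read this metadata from a script, the sketch below splits the frontmatter from the body. It is deliberately minimal and only handles the layout shown above (plain key: value lines with one level of nesting); it is not a general YAML parser, all values are returned as strings, and the function name split_frontmatter is hypothetical. For anything more complex, a real YAML library is the safer choice.

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a markdown file into (metadata, body) for the simple layout shown above."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = lines.index("---", 1)  # position of the closing delimiter
    meta: dict = {}
    current = meta
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so timestamps keep their colons.
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if not line.startswith(" "):
            current = meta  # an unindented line is back at the top level
        if value == "":
            # A key without a value opens a nested mapping (one level only).
            meta[key] = {}
            current = meta[key]
        else:
            current[key] = value
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```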

Supported File Types

  • Documents: PDF, PPTX, DOCX
  • Images: PNG, JPG, JPEG, AVIF
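
When preparing a batch yourself, a filter like the following picks out files with the supported extensions. The function collect_inputs is hypothetical, and the non-recursive folder scan is an assumption; the README does not state whether the tool descends into subfolders.

```python
from pathlib import Path

# Extensions taken from the list of supported file types above.
SUPPORTED = {".pdf", ".pptx", ".docx", ".png", ".jpg", ".jpeg", ".avif"}


def collect_inputs(paths: list[Path]) -> list[Path]:
    """Expand folders to the supported files directly inside them; keep supported files."""
    found: list[Path] = []
    for path in paths:
        if path.is_dir():
            # Assumption: folders are scanned non-recursively.
            candidates = sorted(p for p in path.iterdir() if p.is_file())
        else:
            candidates = [path]
        # Compare extensions case-insensitively so "SCAN.PDF" is accepted too.
        found.extend(p for p in candidates if p.suffix.lower() in SUPPORTED)
    return found
```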

Command Reference

Main Command

Process documents with OCR. Accepts files, folders, or shell glob patterns.

ocr [FILES_OR_FOLDERS...] [OPTIONS]

Arguments:

  • FILES_OR_FOLDERS - One or more files or folders to process

Options:

  • -o, --output PATH - Output directory (default: .ocr/ for single file, ocr_output/ for multiple)
  • --pages TEXT - Page pattern (e.g., '1-3', '5-', '4,5')
  • --page-headlines - Include page numbers as markdown headlines
  • --rename - Enable intelligent filename generation and renaming
  • --dry-run - Show suggested filenames without renaming
  • --confirm - Ask for confirmation before renaming or batch processing
  • --force - Force regenerate filenames even if cached
  • -v, --version - Show version and exit
  • --help - Show help message

Examples:

# Single file
ocr document.pdf

# Multiple files
ocr file1.pdf file2.pdf file3.pdf

# Entire folder
ocr ./invoices/

# Shell glob patterns
ocr *.pdf
ocr documents/**/*.pdf

# With options
ocr document.pdf --pages "1-5" --page-headlines
ocr ./invoices/ --rename --confirm
ocr *.pdf --dry-run

Cost & Performance

API Costs (Approximate)

  • First OCR with rename: ~$0.011 per file
    • OCR: ~$0.01
    • Filename generation: ~$0.001
  • Cached filename reuse: $0 (no API calls)
  • Force regenerate: ~$0.001 (chat only, reuses OCR)
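
For budgeting a batch, the approximate figures above can be combined as in this sketch. The constants simply restate the estimates from the list; actual pricing depends on Mistral's current rates and on document size, and the function name estimate_cost is hypothetical.

```python
OCR_COST = 0.01        # approximate cost of one OCR call, in USD
FILENAME_COST = 0.001  # approximate cost of one filename-generation chat call, in USD


def estimate_cost(new_files: int, cached_files: int = 0, forced_files: int = 0) -> float:
    """Rough total in USD using the approximate per-file figures above."""
    first_run = new_files * (OCR_COST + FILENAME_COST)
    cached = cached_files * 0.0            # cached filenames need no API calls
    forced = forced_files * FILENAME_COST  # --force reuses OCR and repeats the chat call
    return round(first_run + cached + forced, 4)
```

With these figures, 100 new files, 20 cached files and 5 forced regenerations come to roughly 100 × $0.011 + 5 × $0.001 = $1.105.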

Performance Features

  • Smart Caching: Reuses existing OCR markdown to avoid redundant processing
  • First-Page Analysis: Only OCRs first page initially for filename generation
  • Incremental Processing: Only processes full document if initial confidence is low
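
The first-page and incremental strategy can be sketched as a two-stage function. Everything here is illustrative: Suggestion, suggest_filename and the three callables are hypothetical stand-ins for the tool's OCR and chat calls, and the "low" confidence label is assumed from the description above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Suggestion:
    filename: str
    confidence: str      # e.g. "high" or "low", as in the confidence metadata field
    full_document: bool  # True if the whole document had to be processed


def suggest_filename(
    ocr_first_page: Callable[[], str],
    ocr_all_pages: Callable[[], str],
    analyze: Callable[[str], tuple[str, str]],
) -> Suggestion:
    """Analyze page 1 first; OCR the full document only if confidence is low."""
    name, confidence = analyze(ocr_first_page())
    if confidence != "low":
        return Suggestion(name, confidence, full_document=False)
    # Low confidence: pay for the full OCR and analyze again.
    name, confidence = analyze(ocr_all_pages())
    return Suggestion(name, confidence, full_document=True)
```

Passing the OCR steps as callables keeps the expensive full-document call lazy, so it only runs when the first page is not enough.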

Examples

Example 1: Basic OCR Workflow

# Process a single document
ocr contract.pdf

# Output: .ocr/contract.pg1.md (single-page) or .ocr/contract.md (multi-page)

Example 2: Intelligent Renaming

# Generate smart filename
ocr scan001.pdf --rename

# Before: scan001.pdf
# After:  2024-12-01 - Acme Corp - Service Agreement.pdf

Example 3: Batch Processing with Renaming

# Process and rename all PDFs in a folder
ocr ./invoices/ --rename

# Each file gets a descriptive name based on its content

Example 4: Preview Before Renaming

# See what the filename would be without renaming the file
# (the OCR markdown is still written to .ocr/)
ocr document.pdf --dry-run

# Output: Suggested filename: 2024-09-15 - Company - Report.pdf

Example 5: Process Multiple Files with Glob Pattern

# Process all PDFs in current directory
ocr *.pdf

# Process PDFs in multiple locations
ocr invoices/*.pdf receipts/*.pdf

Troubleshooting

Missing API Key

If you see an error about missing MISTRAL_API_KEY:

  1. Check that a .env file exists in your home directory or in the current working directory
  2. Verify the API key is correctly set
  3. Make sure there are no quotes around the key value

Command Not Found

If ocr command is not recognized after installation:

Windows:

  1. Run uv tool update-shell
  2. Restart your terminal
  3. Check if %USERPROFILE%\.local\bin is in your PATH

Linux/Mac:

  1. Run uv tool update-shell
  2. Restart your terminal or run source ~/.bashrc (or ~/.zshrc)

Permission Errors

If you encounter permission errors on Windows with OneDrive:

  • Move the project to a local directory (e.g., C:\Users\{username}\Projects\OCR)

Filename Not Generated

If filename generation returns "Document":

  • Check that the document has readable text content
  • Try using --force to regenerate
  • Ensure the document contains a date, company name, or description

Uninstall

# Remove global installation
uv tool uninstall ocr

Development

Build from Source

# Build wheel package
uv build --wheel

# Install from wheel
uv tool install dist/ocr-0.3.0-py3-none-any.whl

Running Tests

# Install development dependencies
uv sync --group dev

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=ocr

# Lint
uv run ruff check src/ tests/

Note: The test suite contains unit tests only. Integration tests that call the Mistral API are not yet implemented. The test file ocr_test/max_mustermann_brief.png is available for manual testing -- expected rename: 2026-03-19 - Max Mustermann - Anfrage Datenschutz.png.

Version History

v0.3.0 (Current)

  • 🎉 Simplified CLI: Removed subcommands - just use ocr file.pdf instead of ocr process-file --file file.pdf
  • Unified command: Single command handles files, folders, and glob patterns automatically
  • Batch rename support: --rename now works for multiple files and folders
  • Batch confirmation: --confirm flag now prompts before processing multiple files
  • Breaking change: Old commands (process-file, process-files, process-folder) removed

v0.2.2

  • New .ocr subdirectory structure: All OCR markdown files now saved in .ocr subdirectory for better organization
  • Page-based naming: Single-page documents use .pg1.md suffix, multi-page use .md suffix
  • Original filename tracking: Added original_filename field in metadata to preserve pre-rename filenames
  • Dry-run fix: OCR markdown files now created even in --dry-run mode

v0.2.1

  • Added automatic .env file setup in user home directory
  • Improved configuration file handling with fallback support
  • Enhanced user guidance for API key setup

v0.2.0

  • Added intelligent filename generation using Mistral chat API
  • Implemented smart caching for OCR results
  • Added CLI flags: --rename, --dry-run, --confirm, --force
  • Switched to YAML frontmatter for metadata
  • Added version tracking with --version flag

v0.1.0

  • Initial release with basic OCR functionality
  • Support for PDFs, images, and Office documents
  • Page selection and filtering
  • Image extraction and materialization

License

MIT License

Author

Leonard Tulipan (leo@leotulipan.at)
