OCR Tool

A command-line OCR tool using Mistral AI for document processing with intelligent filename generation.

Quick Start

uv tool install --editable .                 # Install globally (run inside the cloned repository)
echo "MISTRAL_API_KEY=your_key" >> ~/.env    # Set API key (appends, so an existing ~/.env is kept)
ocr document.pdf --rename                    # Process and rename

Features

📄 Document OCR: Extract text from PDFs, images, and Office documents (PPTX, DOCX)
🤖 Intelligent Filename Generation: Automatically generates descriptive filenames from document content
💾 Smart Caching: Reuses OCR results to avoid redundant API calls
🎯 Flexible Page Selection: Process specific pages or page ranges
📝 Markdown Output: Save results with YAML frontmatter metadata
🖼️ Image Extraction: Automatically extracts and saves embedded images
🔧 System-Wide Installation: Install once, use anywhere

Installation

Prerequisites

Python 3.12 or higher
uv package manager

Install Globally

# Clone the repository
git clone <repository-url>
cd OCR

# Install globally with uv
uv tool install --editable .

# Verify installation
ocr --version

Development Installation

# Clone and install in development mode
git clone <repository-url>
cd OCR
uv pip install -e .

Configuration

API Key Setup

The tool requires a Mistral AI API key. On first run, it will automatically create a configuration file at:

Windows: C:\Users\{your-username}\.env
Linux/Mac: ~/.env

Edit this file and add your API key:

# OCR Configuration File
# Get your API key from: https://console.mistral.ai/

MISTRAL_API_KEY=your-mistral-api-key-here

Note: You can also place a .env file in your current working directory, which will take precedence over the home directory configuration.

Get Your API Key

Visit Mistral AI Console
Sign up or log in
Navigate to API Keys section
Create a new API key
Copy the key to your .env file

Usage

Basic OCR

# Process a single file
ocr document.pdf

# Process multiple files
ocr invoice1.pdf invoice2.pdf receipt.pdf

# Process all files in a folder
ocr ./invoices/

# Process files matching a pattern (shell expansion)
ocr *.pdf
# Recursive ** is expanded by the shell: zsh supports it, bash needs `shopt -s globstar`
ocr docs/**/*.pdf

Page Selection

# Process a single page range
ocr document.pdf --pages "1-3"

# Process several page ranges
ocr document.pdf --pages "1-5,10-15"

# Process from page 5 to end
ocr document.pdf --pages "5-"

# Include page numbers as headlines
ocr document.pdf --page-headlines
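
To make the pattern syntax concrete, here is a minimal sketch of how a `--pages` value could be expanded into page numbers. `parse_pages` is a hypothetical function, not the tool's parser; clamping to the document length and treating a missing start as page 1 are assumptions.

```python
def parse_pages(pattern: str, total_pages: int) -> list[int]:
    """Expand a pattern such as "1-5,10-15" or "5-" into 1-based page numbers."""
    pages: set[int] = set()
    for part in pattern.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text) if start_text else 1  # assumption: "-3" starts at page 1
            end = int(end_text) if end_text else total_pages  # "5-" runs to the last page
        else:
            start = end = int(part)
        # Assumption: pages outside the document are silently dropped.
        pages.update(range(max(start, 1), min(end, total_pages) + 1))
    return sorted(pages)
```

For a 7-page document, `parse_pages("5-", 7)` yields pages 5, 6, and 7.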

Output Options

# Save to specific directory
ocr document.pdf --output /path/to/output

# By default:
# - Single files save in .ocr subdirectory next to source
# - Multiple files save to ocr_output/ directory
#
# Single page: document.pdf → .ocr/document.pg1.md
# Multi-page: document.pdf → .ocr/document.md
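
The default naming rules above can be summarized in a short sketch. `default_output_path` is a hypothetical helper; resolving `ocr_output/` relative to the current directory is an assumption.

```python
from pathlib import Path


def default_output_path(source: Path, page_count: int, batch: bool = False) -> Path:
    """Return the default Markdown path for a source document (illustrative only)."""
    # Single files go to .ocr/ next to the source; batches go to ocr_output/.
    out_dir = Path("ocr_output") if batch else source.parent / ".ocr"
    # Single-page results carry a .pg1.md suffix, multi-page results plain .md.
    suffix = ".pg1.md" if page_count == 1 else ".md"
    return out_dir / f"{source.stem}{suffix}"
```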

Intelligent Filename Generation

Overview

The tool can analyze document content and generate descriptive filenames following the pattern:

{ISO-date} - {Company} - {Summary}.{ext}

Examples:

2024-12-01 - Medivere - Befund Julia.pdf
2024-09-03 - WKO - Mahnung Privat.pdf
2020-06-15 - Erste Bank - Eingänge.xlsx
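
As an illustration of the pattern, the three extracted parts could be joined like this. `build_filename` is a hypothetical helper, not part of the tool, and it does no sanitizing of characters that are invalid in filenames.

```python
from datetime import date


def build_filename(doc_date: date, company: str, summary: str, ext: str) -> str:
    """Join the parts as "{ISO-date} - {Company} - {Summary}.{ext}"."""
    return f"{doc_date.isoformat()} - {company} - {summary}.{ext.lstrip('.')}"
```

`build_filename(date(2024, 9, 3), "WKO", "Mahnung Privat", "pdf")` reproduces the second example above.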

Usage

# Generate and apply intelligent filename
ocr invoice.pdf --rename

# Preview suggested filename (dry-run)
ocr invoice.pdf --dry-run

# Ask for confirmation before renaming
ocr invoice.pdf --rename --confirm

# Force regenerate filename (ignore cache)
ocr invoice.pdf --rename --force

# Batch rename all files in a folder
ocr ./invoices/ --rename

# Batch rename with confirmation
ocr ./invoices/ --rename --confirm

How It Works

Smart Analysis: Processes the first page to extract date, company name, and document type
Confidence Check: If confidence is low, processes all pages for better accuracy
Caching: Stores generated filenames in metadata to avoid regeneration
Safe Renaming: Renames both original file AND OCR markdown file to keep them in sync
Collision Handling: Automatically adds counter suffix (_2, _3, etc.) if filename exists
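
The collision handling step can be sketched as below. `unique_path` is a hypothetical helper under the assumption that the counter starts at 2 and is appended before the extension; the tool's real implementation may differ.

```python
from pathlib import Path


def unique_path(target: Path) -> Path:
    """Return target, or target with a _2, _3, ... suffix if it already exists."""
    if not target.exists():
        return target
    counter = 2
    while True:
        candidate = target.with_name(f"{target.stem}_{counter}{target.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1
```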

Filename Generation Flags

Flag	Description
`--rename`	Enable intelligent filename generation and renaming
`--dry-run`	Show suggested filename without renaming
`--confirm`	Ask for confirmation before each rename
`--force`	Force regenerate filename even if cached

Output Format

By default, OCR results for a single file are saved in a .ocr subdirectory next to the source, as Markdown files with YAML frontmatter.

File Structure

your-document.pdf
.ocr/
├── your-document.pg1.md    # Single-page document
├── your-document.md        # Multi-page document
└── images/                 # Extracted images
    ├── your-document_1.jpg
    └── your-document_2.png

Markdown Format

---
source_file: /path/to/document.pdf
original_filename: scan001.pdf
processed_at: 2026-01-08T15:30:45.123456
content_length: 15432
include_page_headlines: false
images_saved: 5
filename_metadata:
  generated_filename: 2024-12-01 - Company - Invoice
  generation_timestamp: 2026-01-08T15:30:50
  generation_method: mistral-small-2506
  confidence: high
  extracted_date: 2024-12-01
  extracted_company: Company Name
  extracted_summary: Invoice
  pages_analyzed: 1
---

# Document content here...

Note: The original_filename field preserves the filename before any renaming, allowing you to track the original file name even after intelligent renaming.
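
For scripts that consume these files, the sketch below separates the frontmatter from the body using only the standard library. `split_frontmatter` is a hypothetical helper that reads top-level scalar keys only; nested blocks such as filename_metadata need a real YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a Markdown file into top-level frontmatter scalars and the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return {}, text  # no closing delimiter: treat everything as body
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        # Skip indented (nested) entries; they are not handled by this sketch.
        if line.startswith((" ", "\t")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```

With the example above, `meta["original_filename"]` would be `scan001.pdf`, which is how a script could map a renamed file back to its original name.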

Supported File Types

Documents: PDF, PPTX, DOCX
Images: PNG, JPG, JPEG, AVIF
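
When pre-filtering a folder in your own scripts, a check against this list might look like the following. `is_supported` is a hypothetical helper, and matching extensions case-insensitively is an assumption about the tool's behavior.

```python
from pathlib import Path

# Extensions taken from the list above.
SUPPORTED_EXTENSIONS = {".pdf", ".pptx", ".docx", ".png", ".jpg", ".jpeg", ".avif"}


def is_supported(path: Path) -> bool:
    """Return True if the file extension is in the supported list."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS
```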

Command Reference

Main Command

Process documents with OCR. Accepts files, folders, or shell glob patterns.

ocr [FILES_OR_FOLDERS...] [OPTIONS]

Arguments:

FILES_OR_FOLDERS - One or more files or folders to process

Options:

-o, --output PATH - Output directory (default: .ocr/ for single file, ocr_output/ for multiple)
--pages TEXT - Page pattern (e.g., '1-3', '5-', '4,5')
--page-headlines - Include page numbers as markdown headlines
--rename - Enable intelligent filename generation and renaming
--dry-run - Show suggested filenames without renaming
--confirm - Ask for confirmation before operations
--force - Force regenerate filenames even if cached
-v, --version - Show version and exit
--help - Show help message

Examples:

# Single file
ocr document.pdf

# Multiple files
ocr file1.pdf file2.pdf file3.pdf

# Entire folder
ocr ./invoices/

# Shell glob patterns
ocr *.pdf
ocr documents/**/*.pdf

# With options
ocr document.pdf --pages "1-5" --page-headlines
ocr ./invoices/ --rename --confirm
ocr *.pdf --dry-run

Cost & Performance

API Costs (Approximate)

First OCR with rename: ~$0.011 per file
- OCR: ~$0.01
- Filename generation: ~$0.001
Cached filename reuse: $0 (no API calls)
Force regenerate: ~$0.001 (chat only, reuses OCR)
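
A rough batch estimate follows directly from these figures. The sketch below uses the approximate per-file costs quoted above; `estimate_cost` is a hypothetical helper and actual billing depends on Mistral's current pricing.

```python
OCR_COST_USD = 0.01        # approximate OCR cost per file, from the list above
FILENAME_COST_USD = 0.001  # approximate filename generation cost per file


def estimate_cost(new_files: int, forced_regenerations: int = 0) -> float:
    """Estimate API cost in USD; cached filename reuse costs nothing."""
    total = (
        new_files * (OCR_COST_USD + FILENAME_COST_USD)
        + forced_regenerations * FILENAME_COST_USD
    )
    return round(total, 4)
```

For example, 100 new files with `--rename` come to roughly $1.10.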

Performance Features

Smart Caching: Reuses existing OCR markdown to avoid redundant processing
First-Page Analysis: Only OCRs first page initially for filename generation
Incremental Processing: Only processes full document if initial confidence is low
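
The first-page and incremental steps amount to a two-stage fallback, sketched below. All names are hypothetical: `analyze` stands in for the chat-based filename generation and `load_all_pages` for a full-document OCR; neither is the tool's real API.

```python
from typing import Callable

Analysis = tuple[str, str]  # (suggested filename, confidence: "high" or "low")


def suggest_filename(
    first_page: str,
    load_all_pages: Callable[[], str],
    analyze: Callable[[str], Analysis],
) -> tuple[str, int]:
    """Return (filename, number of analysis passes)."""
    name, confidence = analyze(first_page)
    if confidence != "low":
        return name, 1
    # Low confidence: only now pay for the full document.
    name, _ = analyze(load_all_pages())
    return name, 2
```

Passing `load_all_pages` as a callable keeps the expensive full OCR lazy, so it runs only when the first page is not enough.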

Examples

Example 1: Basic OCR Workflow

# Process a single document
ocr contract.pdf

# Output: .ocr/contract.pg1.md or .ocr/contract.md

Example 2: Intelligent Renaming

# Generate smart filename
ocr scan001.pdf --rename

# Before: scan001.pdf
# After:  2024-12-01 - Acme Corp - Service Agreement.pdf

Example 3: Batch Processing with Renaming

# Process and rename all PDFs in a folder
ocr ./invoices/ --rename

# Each file gets a descriptive name based on its content

Example 4: Preview Before Renaming

# See what the filename would be without changing anything
ocr document.pdf --dry-run

# Output: Suggested filename: 2024-09-15 - Company - Report.pdf

Example 5: Process Multiple Files with Glob Pattern

# Process all PDFs in current directory
ocr *.pdf

# Process PDFs in multiple locations
ocr invoices/*.pdf receipts/*.pdf

Troubleshooting

Missing API Key

If you see an error about missing MISTRAL_API_KEY:

Check that .env file exists in your home directory
Verify the API key is correctly set
Make sure there are no quotes around the key value
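
These checks can be automated with a small script. `check_api_key` is a hypothetical diagnostic, not part of the tool; it assumes plain `NAME=value` lines as shown in the Configuration section.

```python
def check_api_key(env_text: str) -> str:
    """Return "ok", "missing", or "quoted" for the MISTRAL_API_KEY entry."""
    for raw in env_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() != "MISTRAL_API_KEY":
            continue
        value = value.strip()
        if not value:
            return "missing"
        if value[0] in "\"'" or value[-1] in "\"'":
            return "quoted"
        return "ok"
    return "missing"
```

It could be run as `check_api_key((Path.home() / ".env").read_text())` after importing `Path` from `pathlib`.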

Command Not Found

If ocr command is not recognized after installation:

Windows:

Run uv tool update-shell
Restart your terminal
Check if %USERPROFILE%\.local\bin is in your PATH

Linux/Mac:

Run uv tool update-shell
Restart your terminal or run source ~/.bashrc (or ~/.zshrc)

Permission Errors

If you encounter permission errors on Windows with OneDrive:

Move the project to a local directory (e.g., C:\Users\{username}\Projects\OCR)

Filename Not Generated

If filename generation returns "Document":

Check that the document has readable text content
Try using --force to regenerate
Ensure the document contains a date, company name, or description

Uninstall

# Remove global installation
uv tool uninstall ocr

Development

Build from Source

# Build wheel package
uv build --wheel

# Install from wheel
uv tool install dist/ocr-0.3.0-py3-none-any.whl

Running Tests

# Install development dependencies
uv sync --group dev

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=ocr

# Lint
uv run ruff check src/ tests/

Note: The test suite contains unit tests only. Integration tests that call the Mistral API are not yet implemented. The test file ocr_test/max_mustermann_brief.png is available for manual testing; the expected rename is 2026-03-19 - Max Mustermann - Anfrage Datenschutz.png.

Resources

Version History

v0.3.0 (Current)

🎉 Simplified CLI: Removed subcommands - just use ocr file.pdf instead of ocr process-file --file file.pdf
Unified command: Single command handles files, folders, and glob patterns automatically
Batch rename support: --rename now works for multiple files and folders
Batch confirmation: --confirm flag now prompts before processing multiple files
Breaking change: Old commands (process-file, process-files, process-folder) removed

v0.2.2

New .ocr subdirectory structure: All OCR markdown files now saved in .ocr subdirectory for better organization
Page-based naming: Single-page documents use .pg1.md suffix, multi-page use .md suffix
Original filename tracking: Added original_filename field in metadata to preserve pre-rename filenames
Dry-run fix: OCR markdown files now created even in --dry-run mode

v0.2.1

Added automatic .env file setup in user home directory
Improved configuration file handling with fallback support
Enhanced user guidance for API key setup

v0.2.0

Added intelligent filename generation using Mistral chat API
Implemented smart caching for OCR results
Added CLI flags: --rename, --dry-run, --confirm, --force
Switched to YAML frontmatter for metadata
Added version tracking with --version flag

v0.1.0

Initial release with basic OCR functionality
Support for PDFs, images, and Office documents
Page selection and filtering
Image extraction and materialization

License

MIT License

Author

Leonard Tulipan (leo@leotulipan.at)
