CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Development Commands

Building and Testing

```sh
# Format code (MANDATORY before commits)
cargo fmt --all

# Run clippy linter with strict settings
cargo clippy --all-features -- -D warnings

# Run all Rust tests
cargo test --release

# Run comprehensive test script (includes Python tests)
./scripts/test.sh

# Build Python package with maturin
maturin develop --features python

# Run Python tests
pytest tests/ -v

# Run benchmarks
cargo bench

# Check for unused dependencies
cargo udeps --all-targets

# Publish dry run
cargo publish --dry-run
```

Single Test Execution

```sh
# Run a specific Rust test
cargo test test_name --release

# Run a specific Python test
pytest tests/test_file.py::test_name -v

# Run tests with output
cargo test -- --nocapture
```

Architecture Overview

Core Encryption Process

The self_encryption crate implements convergent encryption with obfuscation through a three-stage process:

  1. Content Chunking: Files are split into chunks (up to 1MB each)
  2. Per-Chunk Processing:
    • Compression (Brotli with configurable quality)
    • Encryption (AES-256-CBC)
    • XOR obfuscation
  3. Key Derivation: Each chunk's encryption keys are derived from a circular dependency pattern:
    • Chunks 0 and 1 have special handling due to circular dependencies
    • For chunk N (where N ≥ 2): uses hashes from chunks N, (N+1) % total, (N+2) % total
    • Creates interdependency where modifying any chunk affects multiple others
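The wrap-around index arithmetic for chunks near the end of the file can be sketched as follows (the helper name is illustrative, not the crate's API):

```rust
/// Hypothetical helper illustrating the modular index arithmetic described
/// above: for chunk `n` (n >= 2), keys are derived from the source hashes of
/// chunks n, (n + 1) % total, and (n + 2) % total.
fn key_source_indices(n: usize, total: usize) -> (usize, usize, usize) {
    assert!(total >= 3, "self-encryption requires at least 3 chunks");
    (n, (n + 1) % total, (n + 2) % total)
}

fn main() {
    // With 5 chunks, the last chunk wraps around to chunks 0 and 1,
    // which is what creates the circular dependency.
    assert_eq!(key_source_indices(4, 5), (4, 0, 1));
    assert_eq!(key_source_indices(2, 5), (2, 3, 4));
}
```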

Key Components

  • src/lib.rs: Main library interface, exports public API including encrypt, decrypt_full_set
  • src/encrypt.rs: Core encryption logic, handles chunk processing and key generation
  • src/decrypt.rs: Decryption logic, reverses the encryption process
  • src/data_map.rs: DataMap structure that stores chunk metadata (src/dst hashes, sizes, indices)
  • src/stream.rs: Streaming encryption/decryption for memory-efficient large file handling
  • src/chunk.rs: Chunk data structures (EncryptedChunk, ChunkInfo) and validation
  • src/aes.rs: AES encryption implementation using CBC mode
  • src/utils.rs: Utility functions for key derivation, hash extraction, chunk size calculation
  • src/python.rs: PyO3 bindings for Python interface
  • src/error.rs: Error types and handling

Storage Backend Design

The library uses a trait-based design for flexible storage backends:

  • Store functions: Fn(XorName, Bytes) -> Result<()>
  • Retrieve functions: Fn(XorName) -> Result<Bytes>
  • Supports memory, disk, or custom storage implementations
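A minimal in-memory backend matching that closure shape might look like this (`XorName` and `Bytes` are stand-in type aliases here, not the crate's real types from `xor_name` and `bytes`):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Placeholder types standing in for the crate's XorName and Bytes.
type XorName = [u8; 32];
type Bytes = Vec<u8>;

/// Build a store/retrieve closure pair backed by a shared in-memory map,
/// matching the Fn(XorName, Bytes) -> Result<()> / Fn(XorName) -> Result<Bytes>
/// shape described above.
fn memory_backend() -> (
    impl Fn(XorName, Bytes) -> Result<(), String>,
    impl Fn(XorName) -> Result<Bytes, String>,
) {
    let map = Arc::new(Mutex::new(HashMap::new()));
    let store_map = Arc::clone(&map);
    let store = move |name: XorName, data: Bytes| {
        store_map.lock().unwrap().insert(name, data);
        Ok(())
    };
    let retrieve = move |name: XorName| {
        map.lock()
            .unwrap()
            .get(&name)
            .cloned()
            .ok_or_else(|| "chunk not found".to_string())
    };
    (store, retrieve)
}

fn main() {
    let (store, retrieve) = memory_backend();
    let name = [0u8; 32];
    store(name, b"chunk bytes".to_vec()).unwrap();
    assert_eq!(retrieve(name).unwrap(), b"chunk bytes".to_vec());
}
```

A disk-backed or network-backed implementation plugs in the same way: only the closure bodies change, not the encryption code calling them.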

DataMap Hierarchy

For large files, DataMaps can be shrunk hierarchically:

  • Serialize large DataMap → Encrypt as data → Create new smaller DataMap
  • Process repeats until manageable size reached
  • child field tracks hierarchy level
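The shrink loop can be sketched conceptually like this; `DataMap`, `shrink_once`, and the 10x reduction factor below are illustrative stand-ins, not the crate's actual API or numbers:

```rust
// Conceptual model of hierarchical DataMap shrinking.
#[derive(Debug)]
struct DataMap {
    serialized_len: usize, // size of the serialized map, in bytes
    child: Option<u32>,    // hierarchy level, as in the real `child` field
}

/// Stand-in for "serialize the DataMap, self-encrypt those bytes, and get a
/// new, smaller DataMap back" — here we only model the size reduction.
fn shrink_once(map: &DataMap) -> DataMap {
    DataMap {
        serialized_len: map.serialized_len / 10, // assumed ~10x reduction
        child: Some(map.child.map_or(1, |c| c + 1)),
    }
}

/// Repeat until the serialized map fits under `limit` bytes.
fn shrink_to_fit(mut map: DataMap, limit: usize) -> DataMap {
    while map.serialized_len > limit {
        map = shrink_once(&map);
    }
    map
}

fn main() {
    let big = DataMap { serialized_len: 5_000_000, child: None };
    let small = shrink_to_fit(big, 100_000);
    assert!(small.serialized_len <= 100_000);
    assert_eq!(small.child, Some(2)); // two shrink rounds were needed
}
```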

Critical Constraints

  • Minimum file size: 3072 bytes (3 * MIN_CHUNK_SIZE) for self-encryption
  • Chunk size: Maximum 1MB per chunk
  • Key security: The returned secret key from encryption requires secure handling
  • Hash verification: All chunks are self-validating through SHA3-256 hashes
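The size constraints above imply some simple arithmetic, sketched here with assumed constant names (`MIN_CHUNK_SIZE = 1024` is inferred from 3 * MIN_CHUNK_SIZE = 3072; the chunk-count formula is a naive illustration, not the crate's exact algorithm):

```rust
const MIN_CHUNK_SIZE: usize = 1024; // inferred: 3 * 1024 = 3072-byte minimum
const MAX_CHUNK_SIZE: usize = 1024 * 1024; // 1 MB ceiling per chunk

/// Can this payload be self-encrypted at all? It needs at least 3 chunks.
fn can_self_encrypt(len: usize) -> bool {
    len >= 3 * MIN_CHUNK_SIZE
}

/// Naive chunk count under the 1 MB cap: at least 3 chunks, then as many
/// 1 MB chunks as the data requires.
fn chunk_count(len: usize) -> usize {
    let by_size = (len + MAX_CHUNK_SIZE - 1) / MAX_CHUNK_SIZE;
    by_size.max(3)
}

fn main() {
    assert!(!can_self_encrypt(3071)); // one byte short of the minimum
    assert!(can_self_encrypt(3072));
    assert_eq!(chunk_count(4096), 3); // small files still get 3 chunks
    assert_eq!(chunk_count(10 * 1024 * 1024), 10); // 10 MB → ten 1 MB chunks
}
```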

Python Bindings

The Python interface is built with PyO3 and maturin:

  • CLI tool: self-encryption command
  • Module: self_encryption Python package
  • Supports both in-memory and streaming operations

CI/CD Workflow

  • PR checks: Format, clippy, tests, coverage, unused deps
  • Warnings as errors: RUSTFLAGS="-D warnings" enforced in CI
  • Code coverage: Uses cargo-llvm-cov and reports to coveralls.io
  • 32-bit testing: Includes i686 target testing
  • Python package: Automated publishing via GitHub Actions

Performance Considerations

  • Parallel chunk processing via rayon in standard implementation
  • Streaming APIs for memory efficiency with large files
  • Benchmarks in benches/lib.rs for tracking performance
  • Optimized compression settings in Brotli
  • Chunk size optimization based on file size

StreamSelfEncryptor Implementation Notes

The streaming implementation differs from the standard implementation in several important ways:

Design Differences

  1. Memory Usage:

    • Standard: Loads entire file into memory, processes all chunks at once
    • Streaming: Processes one chunk at a time, O(1) memory usage
  2. API Pattern:

    • Standard: Functional approach with encrypt(bytes) -> (DataMap, Vec<EncryptedChunk>)
    • Streaming: Stateful object with next_encryption() returning chunks incrementally
  3. Chunk Processing:

    • Standard: Special handling for chunks 0 and 1 (deferred processing due to circular dependencies)
    • Streaming: Processes all chunks uniformly (potential issue)
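The stateful streaming pattern contrasted above can be mocked like this; `MockStreamEncryptor` and its fields are illustrative only (the "encryption" is a plain copy), showing the incremental `next_encryption()` driver loop rather than the crate's real implementation:

```rust
struct EncryptedChunk {
    index: usize,
    content: Vec<u8>,
}

// Mock of the stateful streaming object: chunks come back one at a time
// from next_encryption() instead of all at once from encrypt(bytes).
struct MockStreamEncryptor {
    chunks: Vec<Vec<u8>>, // pending plaintext chunks
    next: usize,
}

impl MockStreamEncryptor {
    /// Return the next chunk, or None when the input is exhausted.
    fn next_encryption(&mut self) -> Option<EncryptedChunk> {
        let content = self.chunks.get(self.next)?.clone(); // "encrypt" = copy here
        let chunk = EncryptedChunk { index: self.next, content };
        self.next += 1;
        Some(chunk)
    }
}

fn main() {
    let mut enc = MockStreamEncryptor {
        chunks: vec![b"aa".to_vec(), b"bb".to_vec(), b"cc".to_vec()],
        next: 0,
    };
    // Driver loop: each iteration holds only one chunk in memory, which is
    // where the streaming implementation's O(1) memory usage comes from.
    let mut count = 0;
    while let Some(chunk) = enc.next_encryption() {
        assert_eq!(chunk.index, count);
        count += 1;
    }
    assert_eq!(count, 3);
}
```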

Known Issues with StreamSelfEncryptor

  1. First Two Chunks: Does not implement the special handling for chunks 0 and 1 that the standard implementation uses. This could lead to incorrect encryption in edge cases.

  2. Error Handling: Less robust error handling than the standard implementation, particularly around chunk validation.

  3. File System Dependency: StreamSelfDecryptor uses temporary files extensively, which adds complexity and potential failure points.

When to Use Each Implementation

  • Standard Implementation: Use for files that fit comfortably in memory (< 1GB)
  • Streaming Implementation: Use for large files where memory usage is a concern
  • Note: Both implementations produce compatible output when working correctly

Potential Improvements Needed

  1. Unify Chunk Processing: Align StreamSelfEncryptor's chunk processing with standard implementation, especially for chunks 0 and 1
  2. Error Handling: Improve error handling in streaming implementation to match standard implementation's robustness
  3. Reduce File System Operations: Consider memory-mapping or buffering strategies for StreamSelfDecryptor
  4. Progress Callbacks: Add progress reporting capabilities to streaming implementation
  5. Test Coverage: Ensure streaming implementation has comprehensive tests for edge cases
  6. API Consistency: Consider refactoring to provide more consistent APIs between implementations