This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
```bash
# Format code (MANDATORY before commits)
cargo fmt --all

# Run clippy linter with strict settings
cargo clippy --all-features -- -D warnings

# Run all Rust tests
cargo test --release

# Run comprehensive test script (includes Python tests)
./scripts/test.sh

# Build Python package with maturin
maturin develop --features python

# Run Python tests
pytest tests/ -v

# Run benchmarks
cargo bench

# Check for unused dependencies
cargo udeps --all-targets

# Publish dry run
cargo publish --dry-run

# Run a specific Rust test
cargo test test_name --release

# Run a specific Python test
pytest tests/test_file.py::test_name -v

# Run tests with output
cargo test -- --nocapture
```

The self_encryption crate implements convergent encryption with obfuscation through a three-stage process:
- Content Chunking: Files are split into chunks (up to 1MB each)
- Per-Chunk Processing:
  - Compression (Brotli with configurable quality)
  - Encryption (AES-256-CBC)
  - XOR obfuscation
- Key Derivation: Each chunk's encryption keys are derived from a circular dependency pattern:
  - Chunks 0 and 1 have special handling due to circular dependencies
  - For chunk N (where N ≥ 2): uses hashes from chunks N, (N+1) % total, (N+2) % total
  - Creates interdependency where modifying any chunk affects multiple others
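The index pattern above can be sketched as a small helper. This is illustrative only; `key_source_indices` is not part of the crate's API:

```rust
// Illustrative sketch (not the crate's code): which chunks' source hashes
// feed the key material for chunk `n`, per the pattern described above.
fn key_source_indices(n: usize, total: usize) -> [usize; 3] {
    [n % total, (n + 1) % total, (n + 2) % total]
}

fn main() {
    // For chunk 2 of 3, the sources wrap around to chunks 0 and 1 --
    // this wrap-around is the circular dependency that forces the
    // special handling of the first two chunks.
    assert_eq!(key_source_indices(2, 3), [2, 0, 1]);
    assert_eq!(key_source_indices(0, 5), [0, 1, 2]);
    println!("ok");
}
```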
- src/lib.rs: Main library interface, exports public API including `encrypt`, `decrypt_full_set`
- src/encrypt.rs: Core encryption logic, handles chunk processing and key generation
- src/decrypt.rs: Decryption logic, reverses the encryption process
- src/data_map.rs: DataMap structure that stores chunk metadata (src/dst hashes, sizes, indices)
- src/stream.rs: Streaming encryption/decryption for memory-efficient large file handling
- src/chunk.rs: Chunk data structures (`EncryptedChunk`, `ChunkInfo`) and validation
- src/aes.rs: AES encryption implementation using CBC mode
- src/utils.rs: Utility functions for key derivation, hash extraction, chunk size calculation
- src/python.rs: PyO3 bindings for Python interface
- src/error.rs: Error types and handling
The library uses a trait-based design for flexible storage backends:
- Store functions: `Fn(XorName, Bytes) -> Result<()>`
- Retrieve functions: `Fn(XorName) -> Result<Bytes>`
- Supports memory, disk, or custom storage implementations
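A minimal in-memory backend matching these closure shapes might look like the sketch below. The `XorName` and `Bytes` aliases are simplified stand-ins for the crate's actual types, and `make_backend` is a hypothetical helper, not part of the crate:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Simplified stand-ins for the crate's `XorName` and `Bytes` types.
type XorName = [u8; 32];
type Bytes = Vec<u8>;
type StoreResult<T> = Result<T, String>;

// Build a shared in-memory backend and return closures matching the
// Fn(XorName, Bytes) -> Result<()> and Fn(XorName) -> Result<Bytes> shapes.
fn make_backend() -> (
    impl Fn(XorName, Bytes) -> StoreResult<()>,
    impl Fn(XorName) -> StoreResult<Bytes>,
) {
    let storage = Arc::new(Mutex::new(HashMap::<XorName, Bytes>::new()));
    let writer = Arc::clone(&storage);
    let store = move |name: XorName, data: Bytes| -> StoreResult<()> {
        writer.lock().unwrap().insert(name, data);
        Ok(())
    };
    let retrieve = move |name: XorName| -> StoreResult<Bytes> {
        storage
            .lock()
            .unwrap()
            .get(&name)
            .cloned()
            .ok_or_else(|| "missing chunk".to_string())
    };
    (store, retrieve)
}

fn main() {
    let (store, retrieve) = make_backend();
    let name = [7u8; 32];
    store(name, b"chunk bytes".to_vec()).unwrap();
    assert_eq!(retrieve(name).unwrap(), b"chunk bytes".to_vec());
    println!("round-trip ok");
}
```

The same two closure shapes can be backed by disk or network storage without changing the calling code.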
For large files, DataMaps can be shrunk hierarchically:
- Serialize large DataMap → Encrypt as data → Create new smaller DataMap
- Process repeats until manageable size reached
- `child` field tracks hierarchy level
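The shrinking loop can be sketched as follows. Everything here is a hypothetical stand-in: the real crate's shrink API and serialization differ, and `shrink_once` only models "serialize → encrypt as data → smaller map" as a size reduction:

```rust
// Illustrative model: a DataMap as metadata bytes plus a `child` level.
struct DataMap {
    child: usize,
    meta: Vec<u8>,
}

// Hypothetical size threshold for a "manageable" DataMap.
const LIMIT: usize = 16;

// Stand-in for one shrink step: serialize the large map, encrypt it as
// ordinary data, and keep only per-chunk metadata in a new, smaller map
// whose `child` level is one deeper.
fn shrink_once(map: DataMap) -> DataMap {
    let serialized = map.meta;
    let blocks = (serialized.len() + 7) / 8; // toy "chunk count"
    DataMap { child: map.child + 1, meta: vec![0u8; blocks] }
}

// Repeat until the map fits under the threshold.
fn shrink(mut map: DataMap) -> DataMap {
    while map.meta.len() > LIMIT {
        map = shrink_once(map);
    }
    map
}

fn main() {
    let big = DataMap { child: 0, meta: vec![0u8; 1024] };
    let small = shrink(big);
    assert!(small.meta.len() <= LIMIT);
    println!("hierarchy levels: {}", small.child);
}
```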
- Minimum file size: 3072 bytes (3 * MIN_CHUNK_SIZE) for self-encryption
- Chunk size: Maximum 1MB per chunk
- Key security: The returned secret key from encryption requires secure handling
- Hash verification: All chunks are self-validating through SHA3-256 hashes
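The size rules above can be sketched as below. The constants are inferred from the figures in this document (3072 = 3 × MIN_CHUNK_SIZE implies MIN_CHUNK_SIZE = 1024); the crate's real chunk-sizing logic is more involved:

```rust
// Constants inferred from the size rules stated in this document.
const MIN_CHUNK_SIZE: usize = 1024;
const MAX_CHUNK_SIZE: usize = 1024 * 1024; // 1 MB cap per chunk
const MIN_FILE_SIZE: usize = 3 * MIN_CHUNK_SIZE; // 3072 bytes

// Self-encryption needs at least three chunks' worth of data.
fn can_self_encrypt(file_size: usize) -> bool {
    file_size >= MIN_FILE_SIZE
}

// Simplified chunk count: at least 3 chunks, none larger than 1 MB.
fn num_chunks(file_size: usize) -> usize {
    ((file_size + MAX_CHUNK_SIZE - 1) / MAX_CHUNK_SIZE).max(3)
}

fn main() {
    assert!(!can_self_encrypt(3071));
    assert!(can_self_encrypt(3072));
    assert_eq!(num_chunks(5_000_000), 5); // ceil(5_000_000 / 1 MB)
    assert_eq!(num_chunks(4096), 3); // small files still need 3 chunks
    println!("ok");
}
```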
The Python interface is built with PyO3 and maturin:
- CLI tool: `self-encryption` command
- Module: `self_encryption` Python package
- Supports both in-memory and streaming operations
- PR checks: Format, clippy, tests, coverage, unused deps
- Warnings as errors: `RUSTFLAGS="-D warnings"` enforced in CI
- Code coverage: Uses cargo-llvm-cov and reports to coveralls.io
- 32-bit testing: Includes i686 target testing
- Python package: Automated publishing via GitHub Actions
- Parallel chunk processing via rayon in standard implementation
- Streaming APIs for memory efficiency with large files
- Benchmarks in `benches/lib.rs` for tracking performance
- Optimized compression settings in Brotli
- Chunk size optimization based on file size
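The per-chunk parallelism can be sketched with plain threads. The crate itself uses rayon for this; `process_chunk` below is a stand-in for the compress/encrypt/XOR pipeline, kept dependency-free for illustration:

```rust
use std::thread;

// Stand-in for the per-chunk compress + encrypt + XOR pipeline.
fn process_chunk(data: Vec<u8>) -> Vec<u8> {
    data.iter().map(|b| b ^ 0xAA).collect()
}

fn main() {
    let chunks: Vec<Vec<u8>> = vec![vec![1, 2], vec![3, 4], vec![5, 6]];

    // Chunks are independent once their key material is known, so each
    // can be processed on its own thread (rayon's par_iter in the crate).
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|c| thread::spawn(move || process_chunk(c)))
        .collect();
    let processed: Vec<Vec<u8>> = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .collect();

    assert_eq!(processed.len(), 3);
    println!("processed {} chunks", processed.len());
}
```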
The streaming implementation differs from the standard implementation in several important ways:
- Memory Usage:
  - Standard: Loads entire file into memory, processes all chunks at once
  - Streaming: Processes one chunk at a time, O(1) memory usage
- API Pattern:
  - Standard: Functional approach with `encrypt(bytes) -> (DataMap, Vec<EncryptedChunk>)`
  - Streaming: Stateful object with `next_encryption()` returning chunks incrementally
- Chunk Processing:
  - Standard: Special handling for chunks 0 and 1 (deferred processing due to circular dependencies)
  - Streaming: Processes all chunks uniformly (potential issue)
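The two API shapes can be contrasted with simplified stand-ins (toy chunking and types, not the crate's real implementation):

```rust
// Simplified stand-ins for the crate's types, for shape comparison only.
struct DataMap {
    chunk_count: usize,
}
struct EncryptedChunk(Vec<u8>);

// Standard shape: one functional call that returns everything at once.
fn encrypt(bytes: Vec<u8>) -> (DataMap, Vec<EncryptedChunk>) {
    let chunks: Vec<EncryptedChunk> = bytes
        .chunks(4) // toy chunk size; the crate uses up to 1 MB
        .map(|c| EncryptedChunk(c.to_vec()))
        .collect();
    (DataMap { chunk_count: chunks.len() }, chunks)
}

// Streaming shape: stateful object handing back one chunk per call.
struct StreamSelfEncryptor {
    pending: Vec<Vec<u8>>,
}
impl StreamSelfEncryptor {
    fn next_encryption(&mut self) -> Option<EncryptedChunk> {
        self.pending.pop().map(EncryptedChunk)
    }
}

fn main() {
    let (map, chunks) = encrypt(vec![0u8; 10]);
    assert_eq!(map.chunk_count, 3);
    assert_eq!(chunks.len(), 3);

    // The streaming caller drives a loop instead of one call.
    let mut se = StreamSelfEncryptor { pending: vec![vec![1], vec![2]] };
    let mut emitted = 0;
    while se.next_encryption().is_some() {
        emitted += 1;
    }
    assert_eq!(emitted, 2);
    println!("ok");
}
```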
- First Two Chunks: Does not implement the special handling for chunks 0 and 1 that the standard implementation uses. This could lead to incorrect encryption in edge cases.
- Error Handling: Less robust error handling than the standard implementation, particularly around chunk validation.
- File System Dependency: StreamSelfDecryptor uses temporary files extensively, which adds complexity and potential failure points.
- Standard Implementation: Use for files that fit comfortably in memory (< 1GB)
- Streaming Implementation: Use for large files where memory usage is a concern
- Note: Both implementations produce compatible output when working correctly
- Unify Chunk Processing: Align StreamSelfEncryptor's chunk processing with standard implementation, especially for chunks 0 and 1
- Error Handling: Improve error handling in streaming implementation to match standard implementation's robustness
- Reduce File System Operations: Consider memory-mapping or buffering strategies for StreamSelfDecryptor
- Progress Callbacks: Add progress reporting capabilities to streaming implementation
- Test Coverage: Ensure streaming implementation has comprehensive tests for edge cases
- API Consistency: Consider refactoring to provide more consistent APIs between implementations