A comprehensive image processing application powered by the U²-Net deep learning architecture and classical computer vision algorithms. This project provides an interactive Streamlit interface for various image transformation tasks.
- Features
- Architecture
- Installation
- Usage
- Project Structure
- Technical Details
- Extending the Application
- References
- **Picture to Sketch**
  - Converts photographs into artistic pencil sketches
  - Uses the U²-Net deep learning model
  - Preserves fine details and edges
  - Real-time processing
- **Edge Detection**
  - Applies the Canny edge detection algorithm
  - Detects sharp intensity changes
  - Clean, precise edge extraction
- **Cartoon Effect**
  - Transforms photos into cartoon-style images
  - Combines edge detection with bilateral filtering
  - Maintains color while simplifying details
- **Background Removal**
  - Segments foreground from background
  - Outputs a transparent PNG
  - Uses U²-Net for accurate segmentation
- **Custom Prompt**
  - Extensible placeholder for future AI models
  - Ready for integration with Vision Language Models
  - Supports custom processing instructions
```
Input (RGB Image)
        │
        ▼
┌─────────────────────┐
│   ENCODER STAGES    │
│                     │
│  Stage 1: RSU-7     │ ──┐
│  Stage 2: RSU-6     │   │
│  Stage 3: RSU-5     │   │ Skip
│  Stage 4: RSU-4     │   │ Connections
│  Stage 5: RSU-4F    │   │
│  Stage 6: RSU-4F    │ ──┘
│  (Bottleneck)       │
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│   DECODER STAGES    │
│                     │
│  Stage 5d: RSU-4F   │
│  Stage 4d: RSU-4    │
│  Stage 3d: RSU-5    │
│  Stage 2d: RSU-6    │
│  Stage 1d: RSU-7    │
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│   SIDE OUTPUTS &    │
│   FUSION LAYER      │
└─────────────────────┘
        │
        ▼
Output (Sketch/Mask)
```
Each RSU block is a mini U-Net structure with:
- Encoder path with max pooling
- Decoder path with upsampling
- Skip connections at each level
- Residual connection from input to output
**Key Innovations:**
- Two-level nested U-structure
- Multi-scale feature extraction
- Efficient parameter usage
- Deep supervision with side outputs
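To make the nested structure concrete, here is a sketch of a single RSU block in PyTorch. This is a simplified RSU-4-style block: the class and layer names, channel counts, and dilation value are illustrative assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBNReLU(nn.Module):
    """Basic conv + batch norm + ReLU unit used throughout RSU blocks."""
    def __init__(self, in_ch, out_ch, dilation=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.conv(x)))

class RSU4(nn.Module):
    """Simplified RSU-4: a mini U-Net with a residual connection."""
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.conv_in = ConvBNReLU(in_ch, out_ch)   # input conv (residual source)
        # Encoder path with max pooling
        self.enc1 = ConvBNReLU(out_ch, mid_ch)
        self.enc2 = ConvBNReLU(mid_ch, mid_ch)
        self.enc3 = ConvBNReLU(mid_ch, mid_ch)
        self.pool = nn.MaxPool2d(2, stride=2)
        # Bottleneck with a dilated conv (keeps resolution)
        self.mid = ConvBNReLU(mid_ch, mid_ch, dilation=2)
        # Decoder path with upsampling and skip connections
        self.dec3 = ConvBNReLU(mid_ch * 2, mid_ch)
        self.dec2 = ConvBNReLU(mid_ch * 2, mid_ch)
        self.dec1 = ConvBNReLU(mid_ch * 2, out_ch)

    def forward(self, x):
        xin = self.conv_in(x)
        e1 = self.enc1(xin)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        m = self.mid(e3)
        # Decoder: concatenate skip features, then upsample
        d3 = self.dec3(torch.cat([m, e3], dim=1))
        d3 = F.interpolate(d3, size=e2.shape[2:], mode='bilinear', align_corners=False)
        d2 = self.dec2(torch.cat([d3, e2], dim=1))
        d2 = F.interpolate(d2, size=e1.shape[2:], mode='bilinear', align_corners=False)
        d1 = self.dec1(torch.cat([d2, e1], dim=1))
        # Residual connection from input to output
        return d1 + xin
```

In the full U²-Net, eleven such blocks (RSU-7 down to RSU-4F) are themselves arranged in an outer U-shape, which is the "two-level nested U-structure" above.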
- Python 3.8 or higher
- pip package manager
- (Optional) CUDA-capable GPU for faster processing
```
git clone <repository-url>
cd ai-image-processing-studio
```

Or download the project files directly.
```
# Create virtual environment
python -m venv venv

# Activate on Windows
venv\Scripts\activate

# Activate on macOS/Linux
source venv/bin/activate
```

Then install the dependencies:

```
pip install -r requirements.txt
```

For the Picture to Sketch feature, you can download pre-trained U²-Net weights:
- Create a directory:

  ```
  mkdir -p pretrained_u2net
  ```

- Download weights from the U²-Net GitHub repository
- Place `u2net.pth` in the `pretrained_u2net/` folder

**Note:** The application will work without pre-trained weights, but the Picture to Sketch feature will produce random results until you load proper weights.
```
streamlit run app.py
```

The application will open in your default web browser at `http://localhost:8501`.
1. **Upload Image**
   - Click "Browse files" in the sidebar
   - Select an image (JPG, PNG, BMP)
   - Image info will be displayed
2. **Select Task**
   - Choose from the dropdown menu:
     - Picture to Sketch
     - Edge Detection
     - Cartoon Effect
     - Background Removal
     - Custom Prompt
3. **Configure Options**
   - Adjust output size (256-1024 pixels)
   - For a custom prompt, enter instructions
4. **Process**
   - Click "Process Image"
   - Wait for processing to complete
   - View results side by side with the original
5. **Download**
   - Click "Download Processed Image"
   - Save the result to your local machine
```
jupyter notebook U2NET_Image_Processing.ipynb
```

The notebook includes:
- Detailed architecture explanations
- Network diagrams
- Step-by-step implementation
- Visualization examples
- Educational content
```
ai-image-processing-studio/
├── app.py                         # Streamlit application
├── U2NET_Image_Processing.ipynb   # Educational notebook
├── requirements.txt               # Python dependencies
├── README.md                      # This file
├── pretrained_u2net/              # Pre-trained model weights
│   └── u2net.pth
├── images/                        # Sample images
│   ├── test_photos/               # Test dataset
│   └── my_photos/                 # Custom images
└── results/                       # Output directory
    └── my_sketches/               # Generated sketches
```
- Total Parameters: ~44 million
- Input Size: 512×512 RGB
- Output Size: 512×512 grayscale
- Architecture Depth: 6 stages
- Memory: ~2 GB GPU RAM / ~4 GB CPU RAM
1. **Preprocessing**
   - Resize to 512×512
   - Normalize using ImageNet statistics
   - Convert to tensor (C×H×W format)
2. **Inference**
   - Forward pass through U²-Net
   - Multi-scale feature extraction
   - Deep supervision with 6 side outputs
   - Fusion layer combines all predictions
3. **Post-processing**
   - Invert the prediction (1 - output) for the sketch effect
   - Normalize to [0, 1] range
   - Convert to PIL Image
   - Resize to the desired output size
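The pre- and post-processing steps above can be sketched with NumPy. This is a minimal illustration: `pred` stands in for the network's fused side output, and the 512×512 resize is omitted (in practice it is done with PIL or torchvision).

```python
import numpy as np

# ImageNet normalization statistics (RGB)
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image):
    """Normalize an HxWx3 uint8 image and convert to CxHxW float32."""
    x = image.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD   # per-channel normalization
    return x.transpose(2, 0, 1)              # HWC -> CHW

def postprocess(pred):
    """Turn a raw saliency prediction into a sketch-style map in [0, 1]."""
    sketch = 1.0 - pred                       # invert for the sketch effect
    lo, hi = sketch.min(), sketch.max()
    return (sketch - lo) / (hi - lo + 1e-8)   # normalize to [0, 1]
```

The inversion is what turns a bright salient-object mask into dark pencil strokes on a light background.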
- Grayscale conversion
- Gaussian smoothing (noise reduction)
- Gradient calculation (Sobel operators)
- Non-maximum suppression
- Hysteresis thresholding (100, 200)
- Adaptive thresholding for edges
- Bilateral filtering (9px kernel)
- Edge and color combination
- Maintains color while simplifying details
| Task | CPU Time | GPU Time | Memory |
|---|---|---|---|
| Picture to Sketch | 2-3s | 0.3-0.5s | 4GB |
| Edge Detection | 0.1s | N/A | <1GB |
| Cartoon Effect | 0.2s | N/A | <1GB |
| Background Removal | 2-3s | 0.3-0.5s | 4GB |
- **Define a Processing Function**

  ```python
  def my_custom_processing(image):
      """Custom processing logic."""
      # Process the image
      result = image.copy()
      # Apply transformations
      return result
  ```

- **Add to Task Options**

  ```python
  task_options = {
      "New Task": "Description of what it does",
      # ... existing tasks
  }
  ```

- **Add Processing Logic**

  ```python
  elif selected_task == "New Task":
      result_image = my_custom_processing(image)
  ```

The Custom Prompt feature is designed for easy integration with VLMs:
```python
# Example: CLIP, BLIP, or other VLM integration
def process_with_vlm(image, prompt):
    """Process an image based on a text prompt using a VLM."""
    # Load the VLM model
    model = load_vlm_model()
    # Process with the prompt
    result = model.process(image, prompt)
    return result
```

A new style filter can be added the same way (the `anime_gan` module here is illustrative):

```python
def apply_anime_style(image):
    """Convert image to anime style."""
    # Use an anime GAN or similar model
    from anime_gan import AnimeGAN
    model = AnimeGAN()
    result = model.transform(image)
    return result
```

- U²-Net: Qin, X., Zhang, Z., Huang, C., Dehghan, M., Zaiane, O., & Jagersand, M. (2020). U²-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognition, 106, 107404.
- Original U²-Net Implementation: GitHub
- PyTorch: pytorch.org
- Streamlit: streamlit.io
- DUTS: Salient object detection training dataset
- ImageNet: Pre-training normalization statistics
Contributions are welcome! Areas for improvement:
- **Model Enhancements**
  - Add more pre-trained models
  - Implement model ensembling
  - Fine-tuning capabilities
- **UI/UX**
  - Batch processing
  - Progress bars for long operations
  - Image comparison slider
- **Features**
  - Video processing
  - Real-time webcam processing
  - Style transfer models
  - Anime conversion
  - Super-resolution
- **Performance**
  - GPU acceleration
  - Model quantization
  - Batch inference optimization
This project uses the U²-Net architecture, which is released under the Apache License 2.0.
- Xuebin Qin et al. for the U²-Net architecture
- PyTorch team for the deep learning framework
- Streamlit team for the web app framework
- OpenCV contributors for computer vision algorithms
For questions, issues, or suggestions, please open an issue on the repository or contact the maintainers.
Happy Image Processing! 🎨✨