43 changes: 43 additions & 0 deletions .gitignore
@@ -0,0 +1,43 @@
# Python cache files
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
*.manifest
*.spec

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# virtualenv
venv/
env/
ENV/

# Temporary files
*.tmp
*.temp
.DS_Store
149 changes: 146 additions & 3 deletions README.md
@@ -1,4 +1,33 @@

## Quick Start: Understanding Image Cropping

If you're wondering **"Why do we need to crop images before SRCNN?"**, you're not alone!

### 🔍 **Quick Answer**
The `modcrop` function ensures image dimensions are divisible by the scale factor (usually 3), which is required for the SRCNN network architecture to work properly.

### 🧮 **Your Example Explained**
For an image with shape (176, 197, 3):
- Height: 176 - (176 % 3) = 176 - 2 = **174**
- Width: 197 - (197 % 3) = 197 - 2 = **195**
- Result: **(174, 195, 3)** ← Note: width should be 195, not 194
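
The calculation above can be checked directly in plain NumPy. This is an illustrative sketch of the same crop, not code from the repo (the repo's `modcrop` function appears further down in this README):

```python
import numpy as np

img = np.zeros((176, 197, 3))  # stand-in for the original image
scale = 3

h, w = img.shape[0], img.shape[1]
cropped = img[:h - h % scale, :w - w % scale]  # drop the remainder rows/columns
print(cropped.shape)  # (174, 195, 3)
```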

### 📊 **Interactive Demo**
Run our demonstration script to see exactly how modcrop works:
```bash
python modcrop_demo.py
```

This will show you:
- ✅ Mathematical calculations for different image sizes
- ✅ Why SRCNN architecture requires this cropping
- ✅ How minimal the pixel loss actually is (usually <2%)

### 📖 **Detailed Explanation**
For a complete technical explanation, see the [FAQ section](#frequently-asked-questions) below.

---

# Using The Super-Resolution Convolutional Neural Network for Image Restoration


@@ -245,11 +274,55 @@ Once we have tested our network, we can perform single-image super-resolution on
_# define necessary image processing functions_

## Why Image Cropping (modcrop) is Necessary

Before feeding images to the SRCNN model, we need to crop them using the `modcrop` function to ensure the image dimensions are divisible by the scale factor. This is crucial for several reasons:

### 1. **Network Architecture Constraints**
- SRCNN uses convolutional layers with specific kernel sizes (9×9, 3×3, 5×5)
- Some layers use 'valid' padding, which reduces output dimensions
- Input dimensions must be compatible with the network's expected output size

### 2. **Scale Factor Alignment**
- Super-resolution typically involves upscaling by integer factors (2×, 3×, 4×)
- Having dimensions divisible by the scale factor prevents edge artifacts
- Ensures consistent pixel mapping during the upsampling process

### 3. **Training Consistency**
- The SRCNN model was trained on images with dimensions divisible by the scale factor
- Using similar dimensional constraints during inference maintains consistency
- Helps preserve the learned feature representations

### 4. **Mathematical Example**
For an image with shape (176, 197, 3) and scale factor 3:
- Height: 176 - (176 % 3) = 176 - 2 = 174
- Width: 197 - (197 % 3) = 197 - 2 = 195
- Result: (174, 195, 3) - only 2 pixels cropped from each dimension

```py
# Deploying the SRCNN

def modcrop(img, scale):
    """
    Crop image to make dimensions divisible by the scale factor.

    This function ensures that the image height and width are divisible
    by the scale factor, which is essential for proper SRCNN processing.

    Args:
        img (numpy.ndarray): Input image with shape (H, W, C)
        scale (int): Scale factor (typically 3 for SRCNN)

    Returns:
        numpy.ndarray: Cropped image with dimensions divisible by scale

    Example:
        >>> img = np.zeros((176, 197, 3))  # Original shape
        >>> cropped = modcrop(img, 3)      # After cropping
        >>> print(cropped.shape)           # (174, 195, 3)
    """
    tmpsz = img.shape
    sz = tmpsz[0:2]              # Get height and width only
    sz = sz - np.mod(sz, scale)  # Remove remainder when divided by scale
    img = img[0:sz[0], 0:sz[1]]  # Crop to new dimensions
    return img
```

def shave(image, border):

@@ -376,3 +449,73 @@ References:
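
The alignment point can be made concrete with a small sketch (illustrative only, not part of the notebook): once the HR dimensions are divisible by the scale factor, the corresponding low-resolution size is an exact integer, so no fractional pixel mapping is needed.

```python
scale = 3
hr_shape = (174, 195)  # dimensions after modcrop

# Integer division is exact because both dimensions are divisible by scale
lr_shape = (hr_shape[0] // scale, hr_shape[1] // scale)
print(lr_shape)  # (58, 65)

# Upscaling the LR size by the scale factor recovers the HR size exactly
assert lr_shape[0] * scale == hr_shape[0]
assert lr_shape[1] * scale == hr_shape[1]
```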
[5] [http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html](http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html).

[6] [Learning a Deep Convolutional Network for Image Super-Resolution](https://arxiv.org/pdf/1501.00092)

---

## Frequently Asked Questions

### Q: Why do we need to crop images before inputting them to SRCNN?

**A:** The `modcrop` function is essential for several technical reasons:

#### 1. **SRCNN Architecture Requirements**
The SRCNN network has a specific architecture with three convolutional layers:
- **Patch Extraction**: 9×9 conv with 128 filters (valid padding)
- **Non-linear Mapping**: 3×3 conv with 64 filters (same padding)
- **Reconstruction**: 5×5 conv with 1 filter (valid padding)

The use of 'valid' padding in the first and last layers means the output dimensions are smaller than input dimensions. Specifically, the network reduces dimensions by:
- First layer: reduces by 8 pixels (4 on each side for 9×9 kernel)
- Last layer: reduces by 4 pixels (2 on each side for 5×5 kernel)
- **Total reduction: 12 pixels per dimension**
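
A quick way to check this size arithmetic (a sketch for illustration, not code from the repo): with 'valid' padding, each convolution shrinks a dimension by `kernel - 1`.

```python
def valid_conv_out(size, kernel):
    # 'valid' padding: no border pixels are padded, so the
    # output shrinks by (kernel - 1) along each dimension
    return size - (kernel - 1)

size = 100                        # input dimension
size = valid_conv_out(size, 9)    # layer 1 (9x9, valid): 100 -> 92
                                  # layer 2 (3x3, same): unchanged
size = valid_conv_out(size, 5)    # layer 3 (5x5, valid): 92 -> 88
print(size)  # 88, i.e. 12 pixels smaller than the input
```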

#### 2. **Scale Factor Consistency**
Super-resolution works by learning mappings at specific scale factors (2×, 3×, 4×). The model expects:
- Input and output dimensions to have a consistent relationship
- Dimensions divisible by the scale factor to avoid fractional pixel mapping
- Proper alignment for accurate reconstruction

#### 3. **Training Data Compatibility**
The SRCNN was trained on image patches where:
- All dimensions were divisible by the scale factor
- Consistent cropping was applied during training
- Using the same preprocessing ensures the model performs optimally

#### 4. **Practical Example**
```python
# Original image: (176, 197, 3)
# Scale factor: 3

# Without modcrop - problematic:
# 176 % 3 = 2 (not divisible)
# 197 % 3 = 2 (not divisible)

# With modcrop - proper alignment:
height = 176 - (176 % 3)  # 174, divisible by 3
width = 197 - (197 % 3)   # 195, divisible by 3
# Result: (174, 195, 3) - only 2 pixels lost per dimension
```

#### 5. **Alternative Approaches**
Instead of cropping, other super-resolution methods use:
- **Padding**: Add pixels to make dimensions divisible (but can introduce artifacts)
- **Reflection padding**: Mirror edge pixels (used in some modern networks)
- **Zero padding**: Add black pixels (simple but can create borders)

SRCNN uses cropping because it's simple, effective, and matches the training methodology.
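
For comparison, a padding-based alternative could look like the hypothetical `modpad` sketch below, which mirrors edge rows/columns with `np.pad` until the dimensions become divisible. This is not part of this repo; it only illustrates the trade-off described above.

```python
import numpy as np

def modpad(img, scale):
    # Pad (instead of crop) so height and width become divisible by
    # the scale factor, reflecting edge pixels to avoid hard borders.
    h, w = img.shape[:2]
    pad_h = (-h) % scale  # pixels needed to reach the next multiple
    pad_w = (-w) % scale
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")

img = np.zeros((176, 197, 3))
print(modpad(img, 3).shape)  # (177, 198, 3) -- grown, not shrunk
```

Note that padding changes the image statistics near the border, which is exactly why matching the training-time preprocessing (cropping) is the safer choice for SRCNN.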

### Q: Does cropping affect image quality significantly?

**A:** The impact is minimal:
- Typically removes only 1-3 pixels per dimension
- Represents <2% of total image area for most images
- The super-resolution improvement far outweighs the minor crop loss
- Edge pixels often contain less critical information than central regions

### Q: Can I modify the network to avoid cropping?

**A:** Possible but not recommended:
- Would require retraining the entire network
- Different padding strategies would change learned features
- The original SRCNN paper's results are based on this cropping approach
- Modern alternatives (ESRGAN, Real-ESRGAN) handle arbitrary dimensions better
138 changes: 138 additions & 0 deletions modcrop_demo.py
@@ -0,0 +1,138 @@
#!/usr/bin/env python3
"""
Demonstration of the modcrop function and why it's necessary for SRCNN.

This script helps understand:
1. How modcrop works mathematically
2. Why cropping is needed for SRCNN architecture
3. The impact of different scale factors
4. Visual comparison of before/after dimensions

Run this script to see modcrop in action!
"""

import numpy as np

def modcrop(img, scale):
    """
    Crop image to make dimensions divisible by scale factor.

    Args:
        img (numpy.ndarray): Input image with shape (H, W, C)
        scale (int): Scale factor (typically 3 for SRCNN)

    Returns:
        numpy.ndarray: Cropped image with dimensions divisible by scale
    """
    tmpsz = img.shape
    sz = tmpsz[0:2]              # Get height and width
    sz = sz - np.mod(sz, scale)  # Remove remainder when divided by scale
    img = img[0:sz[0], 0:sz[1]]  # Crop to new dimensions
    return img

def analyze_modcrop(height, width, scale):
    """Analyze what modcrop will do to given dimensions."""
    print(f"\n--- Analysis for {height}x{width} image with scale factor {scale} ---")

    # Calculate remainders (these are also the pixels cropped per dimension)
    h_remainder = height % scale
    w_remainder = width % scale

    # Calculate new dimensions
    new_height = height - h_remainder
    new_width = width - w_remainder

    # Calculate pixels lost
    total_pixels_before = height * width
    total_pixels_after = new_height * new_width
    pixels_lost = total_pixels_before - total_pixels_after
    percent_lost = (pixels_lost / total_pixels_before) * 100

    print(f"Original dimensions: {height} x {width} = {total_pixels_before:,} pixels")
    print(f"Scale factor: {scale}")
    print(f"Height: {height} % {scale} = {h_remainder} remainder → crop {h_remainder} pixels")
    print(f"Width: {width} % {scale} = {w_remainder} remainder → crop {w_remainder} pixels")
    print(f"New dimensions: {new_height} x {new_width} = {total_pixels_after:,} pixels")
    print(f"Pixels lost: {pixels_lost:,} ({percent_lost:.2f}% of image)")

    # Check if dimensions are now divisible
    print(f"Verification: {new_height} % {scale} = {new_height % scale}, "
          f"{new_width} % {scale} = {new_width % scale}")
    if new_height % scale == 0 and new_width % scale == 0:
        print("✓ Both dimensions now divisible by scale factor!")
    else:
        print("✗ Error in calculation!")

def demonstrate_srcnn_constraints():
    """Demonstrate why SRCNN needs specific dimension constraints."""
    print("\n" + "=" * 60)
    print("SRCNN ARCHITECTURE CONSTRAINTS DEMONSTRATION")
    print("=" * 60)

    print("\nSRCNN Network Architecture:")
    print("1. Patch Extraction: 9×9 conv, 128 filters, 'valid' padding → reduces by 8 pixels")
    print("2. Non-linear Mapping: 3×3 conv, 64 filters, 'same' padding → no size change")
    print("3. Reconstruction: 5×5 conv, 1 filter, 'valid' padding → reduces by 4 pixels")
    print("Total size reduction: 8 + 0 + 4 = 12 pixels per dimension")

    print("\nExample with a 100×100 input image:")
    input_size = 100
    after_layer1 = input_size - 8    # 9x9 valid padding
    after_layer2 = after_layer1      # 3x3 same padding
    after_layer3 = after_layer2 - 4  # 5x5 valid padding

    print(f"Input: {input_size}×{input_size}")
    print(f"After layer 1 (9×9 valid): {after_layer1}×{after_layer1}")
    print(f"After layer 2 (3×3 same): {after_layer2}×{after_layer2}")
    print(f"After layer 3 (5×5 valid): {after_layer3}×{after_layer3}")
    print(f"Final output: {after_layer3}×{after_layer3}")

    print("\nThis is why the network expects specific input/output size relationships!")

def main():
    """Main demonstration function."""
    print("MODCROP FUNCTION DEMONSTRATION")
    print("=" * 50)
    print("Understanding why we crop images before SRCNN processing")

    # Test cases: (height, width, scale factor)
    test_cases = [
        (176, 197, 3),    # User's original example
        (512, 512, 3),    # Square power-of-2 image
        (1920, 1080, 3),  # HD video frame
        (224, 224, 3),    # Common ML input size
        (100, 150, 2),    # Small test image with scale 2
    ]

    for height, width, scale in test_cases:
        analyze_modcrop(height, width, scale)

    # Demonstrate actual modcrop function
    print("\n" + "=" * 60)
    print("PRACTICAL MODCROP DEMONSTRATION")
    print("=" * 60)

    print("\nTesting modcrop function with numpy arrays:")

    test_image = np.random.randint(0, 255, (176, 197, 3), dtype=np.uint8)
    print(f"Original test image shape: {test_image.shape}")

    for scale in [2, 3, 4]:
        cropped = modcrop(test_image, scale)
        pixels_lost = test_image.size - cropped.size
        percent_lost = (pixels_lost / test_image.size) * 100
        print(f"Scale {scale}: {test_image.shape} → {cropped.shape} "
              f"(lost {pixels_lost:,} pixels, {percent_lost:.2f}%)")

    # Show SRCNN constraints
    demonstrate_srcnn_constraints()

    print("\n" + "=" * 60)
    print("CONCLUSION")
    print("=" * 60)
    print("✓ Modcrop ensures dimensions are divisible by scale factor")
    print("✓ Required for SRCNN architecture compatibility")
    print("✓ Minimal impact on image content (<2% pixels typically)")
    print("✓ Essential for proper super-resolution processing")
    print("\nThe small crop is worth it for the significant quality improvement!")

if __name__ == "__main__":
    main()