Adding Processors
Memoria uses a processor system to support different social media platforms. If you want to add support for a new platform, you can create a custom processor.
- Check if it's already supported - Run ./memoria.py --list-processors to see available processors
- Look for existing feature requests - Check the project's issue tracker
- Understand the export format - Download a sample export from the platform you want to support
Processors are automatically discovered from the processors/ directory. Each processor handles a specific platform or export format.
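Memoria's actual discovery code is not reproduced here. As a rough sketch of how directory-based discovery can work in Python, assuming each platform is an importable subpackage of a package named processors, something like the following would find them. The function name discover_processor_modules is hypothetical and not part of Memoria.

```python
import importlib
import pkgutil
from types import ModuleType


def discover_processor_modules(package_name: str = "processors") -> list[ModuleType]:
    """Import every subpackage of package_name and return the imported modules.

    Hypothetical helper for illustration; Memoria's own discovery may differ.
    """
    package = importlib.import_module(package_name)
    modules = []
    # iter_modules lists the direct children of the package directory.
    for info in pkgutil.iter_modules(package.__path__, prefix=f"{package_name}."):
        if info.ispkg:  # assumption: one directory (subpackage) per platform
            modules.append(importlib.import_module(info.name))
    return modules
```

Whatever the real mechanism looks like, the practical consequence is the same: a new processor directory only shows up if it can be imported without errors.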
- Python knowledge (implementing abstract classes)
- Understanding of the platform's export format (JSON, HTML, file structure)
- Familiarity with EXIF metadata concepts
- Test export data from the platform
- Create a new directory in processors/ (e.g., processors/my_platform/)
- Implement the ProcessorBase abstract class from processors/base.py
- Add detection logic to identify your platform's exports
- Implement the processing logic to extract metadata and copy files
- Test thoroughly with real export data
Every processor must provide:
- Detection method: How to identify if an export is from this platform
- Name: Human-readable platform name
- Priority: Execution order when multiple processors match
- Processing logic: Extract metadata, copy files, embed EXIF data
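The four requirements above can be sketched as a small class. The real abstract class lives in processors/base.py and its method names and signatures may differ, so this example defines its own stand-in base class, ProcessorBaseSketch. Every name here (detect, get_name, get_priority, process, the marker file, the priority value) is an assumption for illustration.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class ProcessorBaseSketch(ABC):
    """Hypothetical stand-in for ProcessorBase; the real interface may differ."""

    @abstractmethod
    def detect(self, input_dir: Path) -> bool:
        """Return True if input_dir looks like an export from this platform."""

    @abstractmethod
    def get_name(self) -> str:
        """Human-readable platform name."""

    @abstractmethod
    def get_priority(self) -> int:
        """Ordering hint used when several processors match."""

    @abstractmethod
    def process(self, input_dir: Path, output_dir: Path) -> bool:
        """Extract metadata and copy files; return False on failure."""


class MyPlatformProcessor(ProcessorBaseSketch):
    MARKER = "my_platform_export.json"  # hypothetical file unique to this export

    def detect(self, input_dir: Path) -> bool:
        return (input_dir / self.MARKER).is_file()

    def get_name(self) -> str:
        return "My Platform"

    def get_priority(self) -> int:
        return 50  # arbitrary value for illustration

    def process(self, input_dir: Path, output_dir: Path) -> bool:
        try:
            output_dir.mkdir(parents=True, exist_ok=True)
            for src in sorted(input_dir.glob("*.jpg")):
                shutil.copy2(src, output_dir / src.name)
                # Metadata extraction and EXIF embedding would go here.
            return True
        except OSError:
            return False
```

Check processors/base.py for the actual method names and for how priority values are ordered before copying this shape.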
Look at existing processors as references:
- processors/google_photos/ - Complex with preprocessing and deduplication
- processors/instagram_messages/ - HTML parsing and conversation handling
- processors/snapchat_memories/ - Overlay embedding and media matching
# Test detection
./memoria.py /path/to/test/export --verbose
# Verify it was discovered
./memoria.py --list-processors
# Process a test export
./memoria.py /path/to/test/export -o /output/test --verbose
For detailed implementation guidance:
- Read processors/base.py - Contains the abstract class and detailed comments
- Review CONTRIBUTING.md - Coding standards and project structure
- Study existing processors - See how they handle detection, metadata extraction, and EXIF embedding
- Check inline documentation - Processors have detailed docstrings
- Detection specificity: Avoid false positives by checking for unique file/structure patterns
- Error handling: Return False on failure; don't raise unhandled exceptions
- Thread safety: Processing may be multi-threaded
- EXIF standards: Use standard EXIF tags that work across photo management systems
- Documentation: Document the expected export structure in your processor
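To make the first two points concrete, here is a minimal sketch of a specific detection check and a wrapper that turns exceptions into a False result. The directory layout (account/media_index.json, media/), the media_items key, and both function names are hypothetical; substitute whatever is genuinely unique to your platform's export.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def looks_like_my_platform(export_dir: Path) -> bool:
    """Require several platform-specific signals, not just 'contains a JSON file'."""
    index_file = export_dir / "account" / "media_index.json"  # hypothetical layout
    if not index_file.is_file() or not (export_dir / "media").is_dir():
        return False
    try:
        data = json.loads(index_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False  # unreadable or malformed index: not a confident match
    return isinstance(data, dict) and "media_items" in data  # hypothetical key


def run_safely(process_fn, *args) -> bool:
    """Call process_fn and report any failure as False instead of propagating it."""
    try:
        return bool(process_fn(*args))
    except Exception:
        logger.exception("Processor failed")
        return False
```

Checking both the file structure and a key inside the file keeps a generic export that merely contains a media/ folder from matching by accident.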
WARNING: You must be extremely careful with filesystem timestamps to avoid data corruption.
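The Problem: When a platform doesn't provide metadata files (JSON, EXIF, etc.), filesystem modification times may be the only available timestamp source. However, many file operations reset the modification time to "now": a plain copy creates a new file with a fresh timestamp, and touching or rewriting a file (including embedding EXIF data) updates it. Metadata-preserving copies such as shutil.copy2 only attempt to keep the original time and should not be relied on. Once the time has been reset, the original timestamp cannot be recovered from the output file.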
What happens if you get this wrong:
# WRONG - Don't do this!
import shutil

def process(input_dir, output_dir):
    for file in input_dir.glob("*.jpg"):
        output_file = output_dir / file.name
        shutil.copy(file, output_file)  # Plain copy: new file, new mtime
        timestamp = output_file.stat().st_mtime  # ✗ This is NOW, not original!
        embed_exif(output_file, timestamp)  # ✗ Wrong date embedded!

Result: All photos dated to the processing date instead of capture date.
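You can observe the effect directly with a small experiment. This sketch backdates a temporary file, copies it with shutil.copy (which copies content and permission bits but not timestamps), and compares the two modification times. The function name mtime_after_plain_copy is made up for this example.

```python
import os
import shutil
import tempfile
from pathlib import Path


def mtime_after_plain_copy(original_mtime: float) -> tuple[float, float]:
    """Return (source mtime, copy mtime) for a file copied with shutil.copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "photo.jpg"
        src.write_bytes(b"not a real image")
        os.utime(src, (original_mtime, original_mtime))  # backdate the source
        dst = Path(tmp) / "copy.jpg"
        shutil.copy(src, dst)  # copies content and permissions, not timestamps
        return src.stat().st_mtime, dst.stat().st_mtime


src_mtime, dst_mtime = mtime_after_plain_copy(1_000_000_000.0)  # September 2001
print(f"source: {src_mtime:.0f}, copy: {dst_mtime:.0f}")
```

The source keeps its 2001 timestamp while the copy carries the time of the experiment. shutil.copy2 attempts to carry the timestamp over, but that is best-effort and any later write to the file (such as embedding EXIF) changes it again.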
Correct approach:
# CORRECT - Read timestamps BEFORE file operations
import shutil

def preprocess(input_dir):
    metadata = []
    for file in input_dir.glob("*.jpg"):
        # Read filesystem timestamp BEFORE copying
        timestamp = file.stat().st_mtime
        metadata.append({
            'path': file,
            'timestamp': timestamp
        })
    return metadata

def process(input_dir, output_dir):
    metadata = preprocess(input_dir)  # Get timestamps first
    for item in metadata:
        output_file = output_dir / item['path'].name
        shutil.copy2(item['path'], output_file)
        # Use stored timestamp, not current file time
        embed_exif(output_file, item['timestamp'])  # ✓ Correct!

Key Rules:
- Read ALL filesystem timestamps during preprocessing/metadata extraction
- Store timestamps in your metadata structures
- NEVER read filesystem times after copying or modifying files
- Use stored timestamps for EXIF embedding
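A stored st_mtime value is a POSIX timestamp (seconds since the epoch), while EXIF date fields such as DateTimeOriginal use the text form YYYY:MM:DD HH:MM:SS. A minimal conversion sketch follows; the helper name mtime_to_exif is hypothetical, and how Memoria's own embedding code expects timestamps is not shown here.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


def mtime_to_exif(timestamp: float, tz: Optional[tzinfo] = None) -> str:
    """Format a POSIX timestamp as an EXIF date-time string.

    With tz=None the timestamp is interpreted in the machine's local time zone.
    """
    return datetime.fromtimestamp(timestamp, tz=tz).strftime("%Y:%m:%d %H:%M:%S")


print(mtime_to_exif(0, tz=timezone.utc))  # 1970:01:01 00:00:00
```

Because the classic EXIF date fields carry no time zone, decide deliberately whether to convert in UTC or local time and apply the same choice to every file in an export.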
Platforms where this matters:
- Instagram Old Format (uses file mtimes as fallback)
- Any export lacking JSON/metadata files
- Exports with incomplete metadata
See also:
- Look at processors/instagram_old_public_media/preprocess.py for a real example
- See Design Decisions for detailed rationale
If you create a processor for a new platform:
- Test it thoroughly with multiple export samples
- Document the platform's export process (how users get the export)
- Include examples of the expected directory structure
- Submit a pull request with your processor and documentation
See CONTRIBUTING.md for contribution guidelines.
- Review existing processors for patterns and examples
- Check processor base class documentation in processors/base.py
- Open an issue to discuss your approach before implementing
- Design Decisions - Understand Memoria's architecture
- Usage - Command-line options for testing
- Getting Started - Development environment setup