A powerful Python utility to merge and deduplicate VCF (vCard) contact files with intelligent duplicate detection and property merging.
- Smart Duplicate Detection: Identifies duplicates based on normalized names, phone numbers, and email addresses
- Intelligent Property Merging: Combines contact information from multiple sources while preserving all data
- Robust File Handling: Supports various VCF formats and handles encoding issues gracefully
- Command-Line Interface: Easy-to-use CLI for batch processing multiple contact files
- Zero Dependencies: Uses only Python standard library - no external packages required
- Cross-Platform: Works on Windows, macOS, and Linux
- High Code Quality: Pylint score of 10/10 with comprehensive type hints and documentation
git clone https://github.com/fam007e/VCFmerger.git
cd VCFmerger
pip install -e .wget https://raw.githubusercontent.com/fam007e/VCFmerger/main/merge_script.py
python3 merge_script.py output.vcf input1.vcf input2.vcfAfter installation, you can use the vcf-merge command:
# Basic usage
vcf-merge merged_contacts.vcf contacts1.vcf contacts2.vcf contacts3.vcf
# Merge multiple backup files
vcf-merge all_contacts.vcf backup1.vcf backup2.vcf export.vcfpython3 merge_script.py output.vcf input1.vcf input2.vcf [additional_files...]from merge_script import VCFMerger
# Create merger instance
merger = VCFMerger()
# Read VCF files
with open('contacts1.vcf', 'r') as f1, open('contacts2.vcf', 'r') as f2:
vcf_contents = [f1.read(), f2.read()]
# Merge contacts
merged_vcf = merger.merge_vcfs(vcf_contents)
# Write result
with open('merged.vcf', 'w') as output:
output.write(merged_vcf)The merger uses a sophisticated key-based approach to identify duplicates:
- Name Normalization: Converts full names (FN) and structured names (N) to lowercase
- Phone Number Normalization: Strips formatting, keeping only digits and '+' prefix
- Email Normalization: Converts email addresses to lowercase
- Composite Key: Creates unique keys from normalized names, phone sets, and email sets
- Single-Value Properties: Later-processed files take priority (FN, N, ORG, TITLE, etc.)
- Multi-Value Properties: All values are preserved and combined (TEL, EMAIL, URL, ADR)
- Special Handling: PHOTO properties and quoted-printable encoding are handled correctly
- Names: FN (Full Name), N (Structured Name)
- Contact Info: TEL (Phone), EMAIL, URL
- Organization: ORG, TITLE
- Address: ADR (Address)
- Media: PHOTO (with multi-line support)
- Metadata: VERSION and custom properties
Input files:
contacts1.vcf:
BEGIN:VCARD
VERSION:3.0
FN:John Doe
TEL:+1234567890
EMAIL:john@example.com
END:VCARD
contacts2.vcf:
BEGIN:VCARD
VERSION:3.0
FN:John Doe
TEL:+1234567890
EMAIL:john.doe@work.com
ORG:Acme Corp
END:VCARD
Command:
vcf-merge merged.vcf contacts1.vcf contacts2.vcfResult: Single contact with both email addresses and organization information.
These contacts will be detected as duplicates:
TEL:+1 (555) 123-4567TEL:+15551234567TEL:555.123.4567
All normalize to +15551234567.
- Python 3.6 or later
- Git
# Clone the repository
git clone https://github.com/fam007e/VCFmerger.git
cd VCFmerger
# Install in development mode with dev dependencies
pip install -e .[dev]
# Run tests
pytest
# Check code quality
pylint merge_script.py
# Format code
black merge_script.py# Run all tests
pytest
# Run with coverage
pytest --cov=merge_script
# Run specific test
pytest test_*.pyThe project maintains high code quality standards:
# Pylint check (should score 10/10)
pylint merge_script.py
# Type checking
mypy merge_script.py
# Code formatting
black merge_script.py --checkVCFmerger/
├── merge_script.py # Main merger script
├── setup.py # Package installation script (legacy)
├── pyproject.toml # Modern Python project configuration
├── __init__.py # Package initialization
├── README.md # This file
├── LICENSE # MIT License
├── requirements.txt # Runtime dependencies (empty - no deps)
├── .gitignore # Git ignore rules
└── tests/ # Test files (if any)
├── test_*.py # Test modules
└── sample_data/ # Test data files
├── contacts1.vcf
└── contacts2.vcf
Contributions are welcome! We value your input and want to make contributing to this project as easy and transparent as possible.
Please read our Contributing Guidelines for details on our code of conduct, and the process for submitting pull requests to us.
- Code of Conduct: We are committed to providing a friendly, safe and welcoming environment for all. Please read and respect our Code of Conduct.
- Security: If you discover a security vulnerability, please see our Security Policy for reporting instructions.
Issue: "No valid input VCF files found"
- Solution: Check file paths and ensure VCF files contain valid vCard data
Issue: Encoding errors with special characters
- Solution: The script handles UTF-8 with error handling, but ensure your VCF files are properly encoded
Issue: Large files processing slowly
- Solution: The script processes files sequentially; consider splitting very large files
- Issues: Report bugs and request features on GitHub Issues
- Discussions: Join discussions on GitHub Discussions
- Email: Contact the maintainer at vcfmerger mail
- Smart duplicate detection based on names, phones, and emails
- Intelligent property merging with priority handling
- Command-line interface with multiple input support
- Python API for programmatic usage
- Comprehensive error handling and logging
- Cross-platform compatibility
- Zero external dependencies
- 10/10 pylint score with full type hints
This project is licensed under the MIT License - see the LICENSE file for details.
- Inspired by the need to merge contact backups from multiple sources
- Built with Python's robust standard library
- Thanks to the open-source community for feedback and contributions
Faisal Ahmed Moshiur
- GitHub: @fam007e
- Email: vcfmerger mail
⭐ Star this repository if you find it useful! ⭐