A production-ready FastAPI service that converts credit reports (PDF) into structured JSON using CreditGraph AI patterns, with automatic PII scrubbing for data privacy.
- ✅ PDF Parsing - Extract data from credit report PDFs
- ✅ Structured Output - Validated JSON with Pydantic models
- ✅ PII Scrubbing - Automatic masking of sensitive personal information
- ✅ Multi-currency Support - Handles DOP and USD accounts
- ✅ Structured Logging - JSON-formatted logs for easy parsing
- ✅ System Monitoring - Built-in CPU, memory, and disk usage tracking
- ✅ Automated Backups - Daily backups with 7-day retention
- ✅ Docker Ready - Production-ready containers with health checks
- ✅ Interactive API Docs - Auto-generated Swagger/ReDoc documentation
- ✅ Interactive CLI - Robust command-line interface with rich formatting
- Quick Start
- Installation
- Usage
- API Documentation
- Configuration
- Deployment
- Monitoring
- Development
- Project Structure
- Roadmap
- Contributing
- License
```bash
# Clone repository
git clone https://github.com/your-username/creditgraph-parser.git
cd creditgraph-parser

# Start with Docker Compose
docker compose up

# API available at http://localhost:8000
# Docs at http://localhost:8000/docs
```

```bash
# Clone repository
git clone https://github.com/your-username/creditgraph-parser.git
cd creditgraph-parser

# Install dependencies (Python 3.12+ required)
pip install -e .

# Start development server (API)
uv run python src/cli.py serve --reload

# Use the CLI to parse a file
uv run python src/cli.py parse --input path/to/report.pdf
```

- Python 3.12+ (or Docker)
- System Dependencies: `libmupdf-dev`, `swig` (for PDF processing)
```bash
sudo apt update
sudo apt install -y python3.12 python3.12-dev build-essential libmupdf-dev swig
```

```bash
# Create virtual environment
python3.12 -m venv myenv
source myenv/bin/activate  # On Windows: myenv\Scripts\activate

# Install package in editable mode
pip install -e .

# Or install with dev dependencies
pip install -e ".[dev]"
```

```bash
# Development server with auto-reload
uvicorn src.main:app --reload --host 0.0.0.0 --port 8000

# Production server
uvicorn src.main:app --host 0.0.0.0 --port 8000 --workers 4
```

```bash
# Build image
docker build -t creditgraph-parser:1.0.0 .

# Run container
docker run -d -p 8000:8000 creditgraph-parser:1.0.0

# Or use Docker Compose
docker compose up -d                              # Development
docker compose -f docker-compose.prod.yml up -d   # Production
```

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | API information and navigation |
| `/v1/health` | GET | Health check endpoint |
| `/v1/parse` | POST | Upload PDF and get structured JSON |
| `/docs` | GET | Interactive Swagger documentation |
| `/redoc` | GET | ReDoc API documentation |
```python
import requests

# Parse credit report
with open('credit_report.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/v1/parse',
        files={'file': f}
    )

# Get parsed data
data = response.json()

# Access structured data
print(f"Score: {data['score']['score']}")
print(f"Name: {data['personal_data']['first_names']}")
print(f"Accounts: {len(data['details_open_accounts'])}")
```

```bash
# Health check
curl http://localhost:8000/v1/health

# Upload and parse PDF
curl -X POST "http://localhost:8000/v1/parse" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@credit_report.pdf"
```
### CLI Usage
The CLI provides a powerful way to interact with the engine without the API.
```bash
# Parse a PDF and save result to JSON
uv run python src/cli.py parse --input report.pdf --output result.json
# Parse and show pretty JSON in terminal
uv run python src/cli.py parse --input report.pdf
# Start the API server via CLI
uv run python src/cli.py serve --port 8000
```

---
## API Documentation
### Interactive Documentation
Once running, visit:
- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
### Response Structure
```json
{
  "inquirer": {
    "subscriber": "BANCO EJEMPLO",
    "user": "Usuario123",
    "consultation_date": "2024-01-29",
    "consultation_time": "10:30 AM"
  },
  "personal_data": {
    "identification": "001-*******-7",
    "first_names": "Jo****",
    "last_names": "Do**",
    "birth_date": "1990-01-01",
    "age": 34
  },
  "score": {
    "score": 750,
    "factors": ["Payment history", "Credit utilization"]
  },
  "summary_open_accounts": [],
  "details_open_accounts": []
}
```
Note: PII is automatically scrubbed in responses.
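The masked identifiers shown above (e.g. `Jo****`, `001-*******-7`) follow a simple pattern: keep a short prefix and mask the remainder. A hypothetical scrubber along those lines, purely for illustration (this is not the project's actual `scrubber/service.py`):

```python
# Hypothetical PII masking helper -- sketches the masking pattern visible in
# the sample response; the project's real scrubber may behave differently.

def mask_value(value: str, visible: int = 2) -> str:
    """Keep the first `visible` characters and mask the rest with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)

def scrub_personal_data(data: dict) -> dict:
    """Return a copy of the personal_data section with name fields masked."""
    scrubbed = dict(data)
    for field in ("first_names", "last_names"):
        if field in scrubbed:
            scrubbed[field] = mask_value(scrubbed[field])
    return scrubbed

print(scrub_personal_data({"first_names": "John", "last_names": "Doe"}))
# → {'first_names': 'Jo**', 'last_names': 'Do*'}
```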
| Variable | Description | Default |
|---|---|---|
| `DEBUG` | Enable debug mode | `0` |
| `MAX_WORKERS` | Number of Uvicorn workers | `4` |
| `LOG_LEVEL` | Logging level (info/debug/error) | `info` |
| `MAX_LOG_SIZE_MB` | Max log file size before rotation | `100` |
| `BACKUP_RETENTION_DAYS` | Days to keep backups | `7` |
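For illustration, these variables could be read at startup roughly like this (a standard-library sketch; the project's actual configuration code may differ):

```python
import os

# Hypothetical settings loader -- mirrors the variables and defaults in the
# table above; not the project's real configuration module.
def load_settings() -> dict:
    return {
        "debug": os.getenv("DEBUG", "0") == "1",
        "max_workers": int(os.getenv("MAX_WORKERS", "4")),
        "log_level": os.getenv("LOG_LEVEL", "info"),
        "max_log_size_mb": int(os.getenv("MAX_LOG_SIZE_MB", "100")),
        "backup_retention_days": int(os.getenv("BACKUP_RETENTION_DAYS", "7")),
    }

settings = load_settings()
print(settings["max_workers"])  # 4 unless MAX_WORKERS is set in the environment
```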
Create a `.env` file in the project root:

```bash
DEBUG=0
MAX_WORKERS=4
LOG_LEVEL=info
```

See the comprehensive VPS Setup Guide for detailed instructions on:
- Server hardening and security
- Docker installation
- SSL/TLS setup with Certbot
- Nginx reverse proxy configuration
- Monitoring and maintenance
```bash
# On your VPS
git clone https://github.com/your-username/creditgraph-parser.git
cd creditgraph-parser

# Build and start
docker compose -f docker-compose.prod.yml up -d

# Check status
docker compose -f docker-compose.prod.yml ps
```

The application uses structured JSON logging:
```bash
# View API logs
tail -f logs/api.log

# View system metrics
tail -f logs/monitoring.log

# Parse JSON logs (requires jq)
tail -f logs/api.log | jq '.'
```

Automatically logged every 60 seconds:
- CPU usage percentage
- Memory utilization
- Disk usage
- Request/response timing
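JSON log lines like those in `logs/api.log` can be produced with a small `logging.Formatter` subclass. A minimal sketch (the project's actual `logging_config.py` may differ):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

# Attach the formatter to a handler, as a logging config module might do.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("request completed")  # emits a one-line JSON record
```

Because each record is one JSON object per line, `tail -f logs/api.log | jq '.'` can parse the stream directly.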
```bash
# Check application health
curl http://localhost:8000/v1/health

# Expected: {"status": "healthy"}
```

```bash
# Clone and install
git clone https://github.com/your-username/transunion-pdf-to-json.git
cd transunion-pdf-to-json
pip install -e ".[dev]"

# Run tests
pytest -v

# Run with code coverage
pytest --cov=src --cov-report=html

# Lint code
ruff check src/
black --check src/

# Format code
black src/
```

```bash
# Run all tests
pytest

# Run specific test file
pytest tests/test_core.py

# Run with verbose output
pytest -v --tb=short

# Run with coverage
pytest --cov=src
```

```bash
# Type checking with mypy
mypy src/

# Linting with ruff
ruff check src/

# Code formatting with black
black src/
```

```
creditgraph-parser/
├── src/
│   ├── api/                      # API routes and endpoints
│   │   └── routes.py
│   ├── models/                   # Pydantic data models
│   │   └── report.py
│   ├── parser/                   # PDF parsing engine
│   │   └── engine.py
│   ├── scrubber/                 # PII scrubbing service
│   │   └── service.py
│   ├── middleware/               # Request/response middleware
│   │   └── logging_middleware.py
│   ├── utils/                    # Utilities (logging, backup)
│   │   ├── logging_config.py
│   │   └── backup.py
│   ├── main.py                   # Application entry point
│   ├── cli.py                    # Interactive CLI entry point
│   └── maintenance.py            # Maintenance scheduler
├── tests/                        # Test suite
├── docs/                         # Documentation
│   ├── deployment/               # Deployment guides
│   └── implementation/           # Technical documentation
├── logs/                         # Application logs (gitignored)
├── backups/                      # Automated backups (gitignored)
├── Dockerfile                    # Production Docker image
├── docker-compose.yml            # Development compose
├── docker-compose.prod.yml       # Production compose
├── pyproject.toml                # Python package configuration
└── README.md                     # This file
```
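The automated backup feature keeps seven days of archives in `backups/`. A hedged sketch of how such retention pruning could work (illustrative only; the project's actual `utils/backup.py` may differ):

```python
import time
from pathlib import Path

RETENTION_DAYS = 7  # matches the BACKUP_RETENTION_DAYS default

def prune_backups(backup_dir: Path, retention_days: int = RETENTION_DAYS) -> list[Path]:
    """Delete backup files older than `retention_days` days; return what was removed."""
    cutoff = time.time() - retention_days * 86400
    removed = []
    for path in backup_dir.glob("*"):
        # Compare each file's modification time against the retention cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

A daily scheduler (such as the one in `maintenance.py`) would call a function like this once per run.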
We're continuously improving the CreditGraph parser with new features and enhancements!
### Phase 2 (v1.1.0) - CI/CD & Quality
- GitHub Actions CI/CD pipeline
- Automated testing and deployment
- 90%+ code coverage
- Enhanced test suite
### Phase 3 (v1.2.0) - Authentication & Security
- API key authentication
- Rate limiting
- OAuth2 support (optional)
- Enhanced security features
### Phase 4 (v2.0.0) - Data Persistence
- PostgreSQL database integration
- Store and manage parsed reports
- Full-text search
- Analytics dashboard
### Phase 5 (v2.1.0) - Monitoring
- Grafana dashboards
- Prometheus metrics
- APM integration
- Advanced alerting

### Future Ideas

- Machine learning for improved parsing
- Multi-bureau support (Equifax, Experian)
- Batch processing capabilities
- GraphQL API
See the full ROADMAP.md for detailed plans, timelines, and technical specifications.
Contributions are welcome! Please see CONTRIBUTING.md for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests for new functionality
- Ensure tests pass (`pytest`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Built with FastAPI
- PDF processing powered by PyMuPDF
- Data validation with Pydantic
- CLI powered by Typer and Rich
If you encounter issues:
- Check the documentation
- Review existing GitHub Issues
- Create a new issue with details
TransUnion® is a registered trademark of TransUnion LLC.
This project is an independent, open-source tool developed for educational and private purposes only. It is not affiliated with, endorsed by, or connected to TransUnion LLC or any of its subsidiaries. This tool is designed as a template for credit data extraction and does not constitute a bureau service.
The software is provided "as is", without warranty of any kind. The developers and contributors of this project assume no liability for any direct or indirect damages resulting from the use of this software. Users are responsible for ensuring their use of this tool complies with all applicable local laws, including Ley 172-13 in the Dominican Republic. Any automated extraction of data should be done with the express consent of the data owner and in accordance with the terms of service of the original data provider.
Made with ❤️ by Idequel Bernabel