# CreditGraph Parser API


Production-ready FastAPI service that converts Credit Reports (PDF format) into structured JSON data using CreditGraph AI patterns with automatic PII scrubbing for data privacy.


## ✨ Features

- ✅ **PDF Parsing** - Extract data from credit report PDFs
- ✅ **Structured Output** - Validated JSON with Pydantic models
- ✅ **PII Scrubbing** - Automatic masking of sensitive personal information
- ✅ **Multi-currency Support** - Handles DOP and USD accounts
- ✅ **Structured Logging** - JSON-formatted logs for easy parsing
- ✅ **System Monitoring** - Built-in CPU, memory, and disk usage tracking
- ✅ **Automated Backups** - Daily backups with 7-day retention
- ✅ **Docker Ready** - Production-ready containers with health checks
- ✅ **Interactive API Docs** - Auto-generated Swagger/ReDoc documentation
- ✅ **Interactive CLI** - Robust command-line interface with rich formatting



## 🚀 Quick Start

### Using Docker (Recommended)

```bash
# Clone repository
git clone https://github.com/your-username/creditgraph-parser.git
cd creditgraph-parser

# Start with Docker Compose
docker compose up

# API available at http://localhost:8000
# Docs at http://localhost:8000/docs
```

### Local Development

```bash
# Clone repository
git clone https://github.com/your-username/creditgraph-parser.git
cd creditgraph-parser

# Install dependencies (Python 3.12+ required)
pip install -e .

# Start development server (API)
uv run python src/cli.py serve --reload

# Use the CLI to parse a file
uv run python src/cli.py parse --input path/to/report.pdf
```

## 📦 Installation

### Prerequisites

- **Python 3.12+** (or Docker)
- **System Dependencies**: `libmupdf-dev`, `swig` (for PDF processing)

### Local Development

**1. System Dependencies (Ubuntu/Debian)**

```bash
sudo apt update
sudo apt install -y python3.12 python3.12-dev build-essential libmupdf-dev swig
```

**2. Python Environment**

```bash
# Create virtual environment
python3.12 -m venv myenv
source myenv/bin/activate  # On Windows: myenv\Scripts\activate

# Install package in editable mode
pip install -e .

# Or install with dev dependencies
pip install -e ".[dev]"
```

**3. Run the Application**

```bash
# Development server with auto-reload
uvicorn src.main:app --reload --host 0.0.0.0 --port 8000

# Production server
uvicorn src.main:app --host 0.0.0.0 --port 8000 --workers 4
```

### Docker

```bash
# Build image
docker build -t creditgraph-parser:1.0.0 .

# Run container
docker run -d -p 8000:8000 creditgraph-parser:1.0.0

# Or use Docker Compose
docker compose up -d                             # Development
docker compose -f docker-compose.prod.yml up -d  # Production
```

## 💻 Usage

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | API information and navigation |
| `/v1/health` | GET | Health check endpoint |
| `/v1/parse` | POST | Upload PDF and get structured JSON |
| `/docs` | GET | Interactive Swagger documentation |
| `/redoc` | GET | ReDoc API documentation |

### Python Examples

```python
import requests

# Parse credit report
with open('credit_report.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/v1/parse',
        files={'file': f}
    )
response.raise_for_status()  # surface HTTP errors early

# Get parsed data
data = response.json()

# Access structured data
print(f"Score: {data['score']['score']}")
print(f"Name: {data['personal_data']['first_names']}")
print(f"Accounts: {len(data['details_open_accounts'])}")
```

### cURL Examples

```bash
# Health check
curl http://localhost:8000/v1/health

# Upload and parse PDF
curl -X POST "http://localhost:8000/v1/parse" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@credit_report.pdf"
```

### CLI Usage

The CLI provides a powerful way to interact with the engine without the API.

```bash
# Parse a PDF and save result to JSON
uv run python src/cli.py parse --input report.pdf --output result.json

# Parse and show pretty JSON in terminal
uv run python src/cli.py parse --input report.pdf

# Start the API server via CLI
uv run python src/cli.py serve --port 8000
```

---

## 📚 API Documentation

### Interactive Documentation

Once running, visit:

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc

### Response Structure

```json
{
  "inquirer": {
    "subscriber": "BANCO EJEMPLO",
    "user": "Usuario123",
    "consultation_date": "2024-01-29",
    "consultation_time": "10:30 AM"
  },
  "personal_data": {
    "identification": "001-*******-7",
    "first_names": "Jo****",
    "last_names": "Do**",
    "birth_date": "1990-01-01",
    "age": 34
  },
  "score": {
    "score": 750,
    "factors": ["Payment history", "Credit utilization"]
  },
  "summary_open_accounts": [],
  "details_open_accounts": []
}
```

**Note**: PII is automatically scrubbed in responses.
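The actual scrubbing logic lives in `src/scrubber/service.py`. Purely as an illustration of the masking style shown above (the helper names and rules here are assumptions, not the service's real code):

```python
import re

def mask_name(name: str, keep: int = 2) -> str:
    """Keep the first `keep` characters and mask the rest (e.g. 'John' -> 'Jo**')."""
    return name[:keep] + "*" * max(len(name) - keep, 0)

def mask_identification(doc_id: str) -> str:
    """Mask the middle segment of a hyphenated ID (e.g. '001-1234567-7' -> '001-*******-7')."""
    return re.sub(r"(?<=-)\d+(?=-)", lambda m: "*" * len(m.group()), doc_id)

print(mask_name("John"))                      # Jo**
print(mask_identification("001-1234567-7"))   # 001-*******-7
```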


โš™๏ธ Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `DEBUG` | Enable debug mode | `0` |
| `MAX_WORKERS` | Number of Uvicorn workers | `4` |
| `LOG_LEVEL` | Logging level (`info`/`debug`/`error`) | `info` |
| `MAX_LOG_SIZE_MB` | Max log file size before rotation | `100` |
| `BACKUP_RETENTION_DAYS` | Days to keep backups | `7` |
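A typical way to consume these settings at startup is to read them once with the documented defaults; a minimal sketch (not the service's actual settings module):

```python
import os

def load_config() -> dict:
    """Read service settings from the environment, falling back to the documented defaults."""
    return {
        "debug": os.environ.get("DEBUG", "0") == "1",
        "max_workers": int(os.environ.get("MAX_WORKERS", "4")),
        "log_level": os.environ.get("LOG_LEVEL", "info"),
        "max_log_size_mb": int(os.environ.get("MAX_LOG_SIZE_MB", "100")),
        "backup_retention_days": int(os.environ.get("BACKUP_RETENTION_DAYS", "7")),
    }

config = load_config()
print(config["log_level"])
```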

### Docker Compose Configuration

Create a .env file in the project root:

```bash
DEBUG=0
MAX_WORKERS=4
LOG_LEVEL=info
```

๐ŸŒ Deployment

### VPS Deployment

See the comprehensive VPS Setup Guide for detailed instructions on:

- Server hardening and security
- Docker installation
- SSL/TLS setup with Certbot
- Nginx reverse proxy configuration
- Monitoring and maintenance

### Quick Deployment

```bash
# On your VPS
git clone https://github.com/your-username/creditgraph-parser.git
cd creditgraph-parser

# Build and start
docker compose -f docker-compose.prod.yml up -d

# Check status
docker compose -f docker-compose.prod.yml ps
```

## 📊 Monitoring

### Logs

The application uses structured JSON logging:

```bash
# View API logs
tail -f logs/api.log

# View system metrics
tail -f logs/monitoring.log

# Parse JSON logs (requires jq)
tail -f logs/api.log | jq '.'
```
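Since every log line is a self-contained JSON object, the same filtering can be done in Python; an illustrative sketch (the `level` and `message` field names are assumptions about the log schema):

```python
import json

def filter_logs(lines, level="ERROR"):
    """Yield parsed log records matching the given level, skipping malformed lines."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if record.get("level") == level:
            yield record

sample = [
    '{"level": "INFO", "message": "parse ok"}',
    '{"level": "ERROR", "message": "bad PDF"}',
    "not json",
]
for rec in filter_logs(sample):
    print(rec["message"])  # bad PDF
```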

### System Metrics

Automatically logged every 60 seconds:

- CPU usage percentage
- Memory utilization
- Disk usage
- Request/response timing
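The collector behind these metrics isn't shown here; purely as an illustration, a coarse snapshot can be gathered with the standard library alone (Unix-only, and not the service's actual implementation):

```python
import json
import os
import shutil
import time

def metrics_snapshot(path="/"):
    """Collect a coarse snapshot of load average and disk usage using only the stdlib."""
    load1, load5, load15 = os.getloadavg()  # Unix-only
    disk = shutil.disk_usage(path)
    return {
        "timestamp": time.time(),
        "load_avg_1m": load1,
        "disk_used_pct": round(disk.used / disk.total * 100, 1),
    }

# Emit one JSON-formatted metrics line, like the monitoring log
print(json.dumps(metrics_snapshot()))
```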

### Health Check

```bash
# Check application health
curl http://localhost:8000/v1/health

# Expected: {"status": "healthy"}
```

๐Ÿ› ๏ธ Development

### Setup Development Environment

```bash
# Clone and install
git clone https://github.com/your-username/creditgraph-parser.git
cd creditgraph-parser
pip install -e ".[dev]"

# Run tests
pytest -v

# Run with code coverage
pytest --cov=src --cov-report=html

# Lint code
ruff check src/
black --check src/

# Format code
black src/
```

### Running Tests

```bash
# Run all tests
pytest

# Run specific test file
pytest tests/test_core.py

# Run with verbose output
pytest -v --tb=short

# Run with coverage
pytest --cov=src
```

### Code Quality

```bash
# Type checking with mypy
mypy src/

# Linting with ruff
ruff check src/

# Code formatting with black
black src/
```

๐Ÿ“ Project Structure

```
creditgraph-parser/
├── src/
│   ├── api/                    # API routes and endpoints
│   │   └── routes.py
│   ├── models/                 # Pydantic data models
│   │   └── report.py
│   ├── parser/                 # PDF parsing engine
│   │   └── engine.py
│   ├── scrubber/               # PII scrubbing service
│   │   └── service.py
│   ├── middleware/             # Request/response middleware
│   │   └── logging_middleware.py
│   ├── utils/                  # Utilities (logging, backup)
│   │   ├── logging_config.py
│   │   └── backup.py
│   ├── main.py                 # Application entry point
│   ├── cli.py                  # Interactive CLI entry point
│   └── maintenance.py          # Maintenance scheduler
├── tests/                      # Test suite
├── docs/                       # Documentation
│   ├── deployment/             # Deployment guides
│   └── implementation/         # Technical documentation
├── logs/                       # Application logs (gitignored)
├── backups/                    # Automated backups (gitignored)
├── Dockerfile                  # Production Docker image
├── docker-compose.yml          # Development compose
├── docker-compose.prod.yml     # Production compose
├── pyproject.toml              # Python package configuration
└── README.md                   # This file
```

๐Ÿ—บ๏ธ Roadmap

We're continuously improving the CreditGraph parser with new features and enhancements!

### Upcoming Features

#### Phase 2 (v1.1.0) - CI/CD & Quality 🟡

- GitHub Actions CI/CD pipeline
- Automated testing and deployment
- 90%+ code coverage
- Enhanced test suite

#### Phase 3 (v1.2.0) - Authentication & Security 🔴

- API key authentication
- Rate limiting
- OAuth2 support (optional)
- Enhanced security features

#### Phase 4 (v2.0.0) - Data Persistence 🔴

- PostgreSQL database integration
- Store and manage parsed reports
- Full-text search
- Analytics dashboard

#### Phase 5 (v2.1.0) - Monitoring 🔴

- Grafana dashboards
- Prometheus metrics
- APM integration
- Advanced alerting

### Future Considerations

- Machine learning for improved parsing
- Multi-bureau support (Equifax, Experian)
- Batch processing capabilities
- GraphQL API

See the full ROADMAP.md for detailed plans, timelines, and technical specifications.


๐Ÿค Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details.

### Quick Contribution Guide

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Add tests for new functionality
5. Ensure tests pass (`pytest`)
6. Commit your changes (`git commit -m 'feat: add amazing feature'`)
7. Push to the branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request

## 📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


๐Ÿ™ Acknowledgments


## 📞 Support

If you encounter issues:

1. Check the documentation
2. Review existing GitHub Issues
3. Create a new issue with details

โš ๏ธ Legal Disclaimer

TransUnion® is a registered trademark of TransUnion LLC.

This project is an independent, open-source tool developed for educational and private purposes only. It is not affiliated with, endorsed by, or connected to TransUnion LLC or any of its subsidiaries. This tool is designed as a template for credit data extraction and does not constitute a bureau service.

The software is provided "as is", without warranty of any kind. The developers and contributors of this project assume no liability for any direct or indirect damages resulting from the use of this software. Users are responsible for ensuring their use of this tool complies with all applicable local laws, including Ley 172-13 in the Dominican Republic. Any automated extraction of data should be done with the express consent of the data owner and in accordance with the terms of service of the original data provider.


Made with โค๏ธ by Idequel Bernabel
