# CreditGraph Parser API


Production-ready FastAPI service that converts Credit Reports (PDF format) into structured JSON data using CreditGraph AI patterns with automatic PII scrubbing for data privacy.


## ✨ Features

- ✅ **PDF Parsing** - Extract data from credit report PDFs
- ✅ **Structured Output** - Validated JSON with Pydantic models
- ✅ **PII Scrubbing** - Automatic masking of sensitive personal information
- ✅ **Multi-currency Support** - Handles DOP and USD accounts
- ✅ **Structured Logging** - JSON-formatted logs for easy parsing
- ✅ **System Monitoring** - Built-in CPU, memory, and disk usage tracking
- ✅ **Automated Backups** - Daily backups with 7-day retention
- ✅ **Docker Ready** - Production-ready containers with health checks
- ✅ **Interactive API Docs** - Auto-generated Swagger/ReDoc documentation
- ✅ **Interactive CLI** - Robust command-line interface with rich formatting



## 🚀 Quick Start

### Using Docker (Recommended)

```bash
# Clone repository
git clone https://github.com/your-username/creditgraph-parser.git
cd creditgraph-parser

# Start with Docker Compose
docker compose up

# API available at http://localhost:8000
# Docs at http://localhost:8000/docs
```

### Local Development

```bash
# Clone repository
git clone https://github.com/your-username/creditgraph-parser.git
cd creditgraph-parser

# Install dependencies (Python 3.12+ required)
pip install -e .

# Start development server (API)
uv run python src/cli.py serve --reload

# Use the CLI to parse a file
uv run python src/cli.py parse --input path/to/report.pdf
```

## 📦 Installation

### Prerequisites

- **Python 3.12+** (or Docker)
- **System Dependencies**: `libmupdf-dev`, `swig` (for PDF processing)

### Local Development

**1. System Dependencies (Ubuntu/Debian)**

```bash
sudo apt update
sudo apt install -y python3.12 python3.12-dev build-essential libmupdf-dev swig
```

**2. Python Environment**

```bash
# Create virtual environment
python3.12 -m venv myenv
source myenv/bin/activate  # On Windows: myenv\Scripts\activate

# Install package in editable mode
pip install -e .

# Or install with dev dependencies
pip install -e ".[dev]"
```

**3. Run the Application**

```bash
# Development server with auto-reload
uvicorn src.main:app --reload --host 0.0.0.0 --port 8000

# Production server
uvicorn src.main:app --host 0.0.0.0 --port 8000 --workers 4
```

### Docker

```bash
# Build image
docker build -t creditgraph-parser:1.0.0 .

# Run container
docker run -d -p 8000:8000 creditgraph-parser:1.0.0

# Or use Docker Compose
docker compose up -d                             # Development
docker compose -f docker-compose.prod.yml up -d  # Production
```

## 💻 Usage

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | API information and navigation |
| `/v1/health` | GET | Health check endpoint |
| `/v1/parse` | POST | Upload PDF and get structured JSON |
| `/docs` | GET | Interactive Swagger documentation |
| `/redoc` | GET | ReDoc API documentation |

### Python Examples

```python
import requests

# Parse credit report
with open('credit_report.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/v1/parse',
        files={'file': f}
    )
response.raise_for_status()  # surface HTTP errors early

# Get parsed data
data = response.json()

# Access structured data
print(f"Score: {data['score']['score']}")
print(f"Name: {data['personal_data']['first_names']}")
print(f"Accounts: {len(data['details_open_accounts'])}")
```

### cURL Examples

```bash
# Health check
curl http://localhost:8000/v1/health

# Upload and parse PDF
curl -X POST "http://localhost:8000/v1/parse" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@credit_report.pdf"
```

### CLI Usage

The CLI provides a powerful way to interact with the engine without the API.

```bash
# Parse a PDF and save result to JSON
uv run python src/cli.py parse --input report.pdf --output result.json

# Parse and show pretty JSON in terminal
uv run python src/cli.py parse --input report.pdf

# Start the API server via CLI
uv run python src/cli.py serve --port 8000
```

---

## 📚 API Documentation

### Interactive Documentation

Once running, visit:

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc

### Response Structure

```json
{
  "inquirer": {
    "subscriber": "BANCO EJEMPLO",
    "user": "Usuario123",
    "consultation_date": "2024-01-29",
    "consultation_time": "10:30 AM"
  },
  "personal_data": {
    "identification": "001-*******-7",
    "first_names": "Jo****",
    "last_names": "Do**",
    "birth_date": "1990-01-01",
    "age": 34
  },
  "score": {
    "score": 750,
    "factors": ["Payment history", "Credit utilization"]
  },
  "summary_open_accounts": [],
  "details_open_accounts": []
}
```

**Note**: PII is automatically scrubbed in responses.
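The actual scrubbing logic lives in `src/scrubber/service.py`. Purely as an illustration of the masking style shown above (the helper names and rules here are assumptions, not the service's real code):

```python
import re

def mask_name(name: str, keep: int = 2) -> str:
    """Keep the first `keep` characters and mask the rest (e.g. 'John' -> 'Jo**')."""
    return name[:keep] + "*" * max(len(name) - keep, 0)

def mask_identification(doc_id: str) -> str:
    """Mask the middle segment of a hyphenated ID (e.g. '001-1234567-7' -> '001-*******-7')."""
    return re.sub(r"(?<=-)\d+(?=-)", lambda m: "*" * len(m.group()), doc_id)

print(mask_name("John"))                      # Jo**
print(mask_identification("001-1234567-7"))   # 001-*******-7
```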


โš™๏ธ Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `DEBUG` | Enable debug mode | `0` |
| `MAX_WORKERS` | Number of Uvicorn workers | `4` |
| `LOG_LEVEL` | Logging level (`info`/`debug`/`error`) | `info` |
| `MAX_LOG_SIZE_MB` | Max log file size before rotation | `100` |
| `BACKUP_RETENTION_DAYS` | Days to keep backups | `7` |
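A typical way to consume these settings at startup is to read them once with the documented defaults; a minimal sketch (not the service's actual settings module):

```python
import os

def load_config() -> dict:
    """Read service settings from the environment, falling back to the documented defaults."""
    return {
        "debug": os.environ.get("DEBUG", "0") == "1",
        "max_workers": int(os.environ.get("MAX_WORKERS", "4")),
        "log_level": os.environ.get("LOG_LEVEL", "info"),
        "max_log_size_mb": int(os.environ.get("MAX_LOG_SIZE_MB", "100")),
        "backup_retention_days": int(os.environ.get("BACKUP_RETENTION_DAYS", "7")),
    }

config = load_config()
print(config["log_level"])
```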

### Docker Compose Configuration

Create a .env file in the project root:

```bash
DEBUG=0
MAX_WORKERS=4
LOG_LEVEL=info
```

๐ŸŒ Deployment

### VPS Deployment

See the comprehensive VPS Setup Guide for detailed instructions on:

- Server hardening and security
- Docker installation
- SSL/TLS setup with Certbot
- Nginx reverse proxy configuration
- Monitoring and maintenance

### Quick Deployment

```bash
# On your VPS
git clone https://github.com/your-username/creditgraph-parser.git
cd creditgraph-parser

# Build and start
docker compose -f docker-compose.prod.yml up -d

# Check status
docker compose -f docker-compose.prod.yml ps
```

## 📊 Monitoring

### Logs

The application uses structured JSON logging:

```bash
# View API logs
tail -f logs/api.log

# View system metrics
tail -f logs/monitoring.log

# Parse JSON logs (requires jq)
tail -f logs/api.log | jq '.'
```
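Since every log line is a self-contained JSON object, the same filtering can be done in Python; an illustrative sketch (the `level` and `message` field names are assumptions about the log schema):

```python
import json

def filter_logs(lines, level="ERROR"):
    """Yield parsed log records matching the given level, skipping malformed lines."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if record.get("level") == level:
            yield record

sample = [
    '{"level": "INFO", "message": "parse ok"}',
    '{"level": "ERROR", "message": "bad PDF"}',
    "not json",
]
for rec in filter_logs(sample):
    print(rec["message"])  # bad PDF
```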

### System Metrics

Automatically logged every 60 seconds:

- CPU usage percentage
- Memory utilization
- Disk usage
- Request/response timing
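The collector behind these metrics isn't shown here; purely as an illustration, a coarse snapshot can be gathered with the standard library alone (Unix-only, and not the service's actual implementation):

```python
import json
import os
import shutil
import time

def metrics_snapshot(path="/"):
    """Collect a coarse snapshot of load average and disk usage using only the stdlib."""
    load1, load5, load15 = os.getloadavg()  # Unix-only
    disk = shutil.disk_usage(path)
    return {
        "timestamp": time.time(),
        "load_avg_1m": load1,
        "disk_used_pct": round(disk.used / disk.total * 100, 1),
    }

# Emit one JSON-formatted metrics line, like the monitoring log
print(json.dumps(metrics_snapshot()))
```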

### Health Check

```bash
# Check application health
curl http://localhost:8000/v1/health

# Expected: {"status": "healthy"}
```

๐Ÿ› ๏ธ Development

### Setup Development Environment

```bash
# Clone and install
git clone https://github.com/your-username/creditgraph-parser.git
cd creditgraph-parser
pip install -e ".[dev]"

# Run tests
pytest -v

# Run with code coverage
pytest --cov=src --cov-report=html

# Lint code
ruff check src/
black --check src/

# Format code
black src/
```

### Running Tests

```bash
# Run all tests
pytest

# Run specific test file
pytest tests/test_core.py

# Run with verbose output
pytest -v --tb=short

# Run with coverage
pytest --cov=src
```

### Code Quality

```bash
# Type checking with mypy
mypy src/

# Linting with ruff
ruff check src/

# Code formatting with black
black src/
```

๐Ÿ“ Project Structure

```
creditgraph-parser/
├── src/
│   ├── api/                    # API routes and endpoints
│   │   └── routes.py
│   ├── models/                 # Pydantic data models
│   │   └── report.py
│   ├── parser/                 # PDF parsing engine
│   │   └── engine.py
│   ├── scrubber/               # PII scrubbing service
│   │   └── service.py
│   ├── middleware/             # Request/response middleware
│   │   └── logging_middleware.py
│   ├── utils/                  # Utilities (logging, backup)
│   │   ├── logging_config.py
│   │   └── backup.py
│   ├── main.py                 # Application entry point
│   ├── cli.py                  # Interactive CLI entry point
│   └── maintenance.py          # Maintenance scheduler
├── tests/                      # Test suite
├── docs/                       # Documentation
│   ├── deployment/             # Deployment guides
│   └── implementation/         # Technical documentation
├── logs/                       # Application logs (gitignored)
├── backups/                    # Automated backups (gitignored)
├── Dockerfile                  # Production Docker image
├── docker-compose.yml          # Development compose
├── docker-compose.prod.yml     # Production compose
├── pyproject.toml              # Python package configuration
└── README.md                   # This file
```

๐Ÿ—บ๏ธ Roadmap

We're continuously improving the CreditGraph parser with new features and enhancements!

### Upcoming Features

#### Phase 2 (v1.1.0) - CI/CD & Quality 🟡

- GitHub Actions CI/CD pipeline
- Automated testing and deployment
- 90%+ code coverage
- Enhanced test suite

#### Phase 3 (v1.2.0) - Authentication & Security 🔴

- API key authentication
- Rate limiting
- OAuth2 support (optional)
- Enhanced security features

#### Phase 4 (v2.0.0) - Data Persistence 🔴

- PostgreSQL database integration
- Store and manage parsed reports
- Full-text search
- Analytics dashboard

#### Phase 5 (v2.1.0) - Monitoring 🔴

- Grafana dashboards
- Prometheus metrics
- APM integration
- Advanced alerting

### Future Considerations

- Machine learning for improved parsing
- Multi-bureau support (Equifax, Experian)
- Batch processing capabilities
- GraphQL API

See the full ROADMAP.md for detailed plans, timelines, and technical specifications.


๐Ÿค Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details.

### Quick Contribution Guide

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Add tests for new functionality
5. Ensure tests pass (`pytest`)
6. Commit your changes (`git commit -m 'feat: add amazing feature'`)
7. Push to the branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request

## 📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


๐Ÿ™ Acknowledgments


## 📞 Support

If you encounter issues:

1. Check the documentation
2. Review existing GitHub Issues
3. Create a new issue with details

โš ๏ธ Legal Disclaimer

TransUnion® is a registered trademark of TransUnion LLC.

This project is an independent, open-source tool developed for educational and private purposes only. It is not affiliated with, endorsed by, or connected to TransUnion LLC or any of its subsidiaries. This tool is designed as a template for credit data extraction and does not constitute a bureau service.

The software is provided "as is", without warranty of any kind. The developers and contributors of this project assume no liability for any direct or indirect damages resulting from the use of this software. Users are responsible for ensuring their use of this tool complies with all applicable local laws, including Ley 172-13 in the Dominican Republic. Any automated extraction of data should be done with the express consent of the data owner and in accordance with the terms of service of the original data provider.


Made with โค๏ธ by Idequel Bernabel
