0x01-itsmurphy/fastapi_resume_parser

FastAPI Resume Parser

A resume parsing API built with FastAPI. It extracts structured information (contact details, social links, skills, education, location, and languages) from PDF resumes using NLP.

🌟 Features

  • Advanced PDF Processing - Supports multiple PDF parsing methods (PyPDF, pdfminer.six)
  • NLP-Powered Extraction - Uses spaCy for intelligent text analysis
  • Comprehensive Data Extraction:
    • 📧 Personal Information (name, email, phone)
    • 🔗 Social Media Links (LinkedIn, GitHub)
    • 💼 Skills and Technologies
    • 🎓 Education Details
    • 🗺️ Location and Address Information
    • 🌍 Languages
  • Modern FastAPI - Built on FastAPI 0.128.0, with automatic OpenAPI documentation
  • Type-Safe - Comprehensive type hints throughout
  • Production-Ready - Proper error handling, logging, and validation
  • Maintainable Architecture - Clear API, service, extractor, schema, and domain layers
  • Docker Support - Containerized for easy deployment
  • AWS Lambda Ready - Configured for serverless deployment
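
As a rough illustration of what a focused extractor for the email and social-link fields could look like, here is a minimal regex-based sketch. It is not the project's actual implementation, which is not shown in this README; the patterns and the `extract_contacts` name are assumptions made for illustration only.

```python
import re
from typing import Dict

# Hypothetical patterns; the project's real extractors may work differently.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LINKEDIN_RE = re.compile(r"linkedin\.com/in/[A-Za-z0-9_-]+", re.IGNORECASE)
GITHUB_RE = re.compile(r"github\.com/([A-Za-z0-9-]+)", re.IGNORECASE)


def extract_contacts(text: str) -> Dict[str, object]:
    """Pull emails, a LinkedIn profile path, and a GitHub username from raw text."""
    linkedin = LINKEDIN_RE.search(text)
    github = GITHUB_RE.search(text)
    return {
        "email": EMAIL_RE.findall(text),  # every address found, in order
        "linkedin": linkedin.group(0) if linkedin else None,  # e.g. linkedin.com/in/<id>
        "github": github.group(1) if github else None,  # username only
    }
```

Regexes like these only cover well-formed addresses and profile URLs, which is one reason a parser might pair them with NLP-based analysis for fields such as names and locations.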

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • pip

Installation

  1. Clone the repository

    git clone https://github.com/YOUR_USERNAME/fastapi_resume_parser.git
    cd fastapi_resume_parser
  2. Create virtual environment

    python3 -m venv venv  # Any Python 3.10+ interpreter works
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Download required NLTK data

    python -c "import nltk; nltk.download('punkt_tab'); nltk.download('averaged_perceptron_tagger_eng'); nltk.download('maxent_ne_chunker_tab'); nltk.download('stopwords'); nltk.download('words')"
  5. Run the server

    uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
  6. Test the API

    Visit http://localhost:8000/docs for interactive API documentation

📖 API Documentation

Endpoints

Method  Endpoint            Description
GET     /                   Root endpoint - API status
GET     /health             Health check endpoint
POST    /v1/resumes/parse   Parse resume and extract information
POST    /parse              Backward-compatible parse endpoint

Example Usage

Using cURL:

curl -X POST "http://localhost:8000/v1/resumes/parse" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@resume.pdf"

Using Python:

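import requests

url = "http://localhost:8000/v1/resumes/parse"

# Open the PDF in binary mode; the context manager closes it after the upload.
with open("resume.pdf", "rb") as pdf:
    response = requests.post(url, files={"file": pdf}, timeout=60)

response.raise_for_status()  # Fail loudly on 4xx/5xx responses
print(response.json())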

Response Format

{
  "status": "success",
  "filename": "resume.pdf",
  "personal_info": {
    "name": "John Doe",
    "email": ["john.doe@example.com"],
    "phone_number": "+1234567890"
  },
  "social_links": {
    "linkedin": "linkedin.com/in/johndoe",
    "github": "johndoe"
  },
  "skills": ["Python", "FastAPI", "Machine Learning"],
  "education_details": {
    "courses": ["B.Tech"],
    "specializations": ["Computer Science"],
    "college": ["University of Technology"]
  },
  "languages": ["English"],
  "processing_info": {
    "text_length": 1250,
    "tokens_processed": 320,
    "entities_found": 15
  }
}
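
On the client side, the nested response can be flattened into a small typed object. The sketch below assumes only the field names shown in the sample response above; `ParsedResume` and `summarize` are hypothetical helper names, not part of the API, and missing keys are treated as empty rather than as errors.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ParsedResume:
    """Hypothetical client-side view of the parse response."""
    name: Optional[str] = None
    emails: List[str] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)
    colleges: List[str] = field(default_factory=list)


def summarize(payload: Dict[str, Any]) -> ParsedResume:
    """Flatten the nested JSON payload, tolerating absent or null sections."""
    personal = payload.get("personal_info") or {}
    education = payload.get("education_details") or {}
    return ParsedResume(
        name=personal.get("name"),
        emails=list(personal.get("email") or []),
        skills=list(payload.get("skills") or []),
        colleges=list(education.get("college") or []),
    )
```

With the `requests` example above, this would be called as `summarize(response.json())`.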

🏗️ Project Structure

fastapi_resume_parser/
├── app/
│   ├── core/
│   │   ├── config.py          # Configuration management
│   │   ├── errors.py          # Application-specific exceptions
│   │   └── logging.py         # Logging setup
│   ├── api/
│   │   ├── dependencies.py    # FastAPI dependency factories
│   │   └── routes/            # HTTP route modules
│   ├── domain/
│   │   └── models.py          # Internal parser result models
│   ├── extractors/            # Focused resume field extractors
│   ├── resources/             # Local parsing vocabularies
│   ├── schemas/
│   │   └── responses.py       # Public API response models
│   ├── services/              # Parser orchestration and infrastructure services
│   └── main.py                # FastAPI application
├── tests/
│   └── test_api.py            # API tests
├── requirements.txt           # Production dependencies
├── requirements-dev.txt       # Development dependencies
├── pyproject.toml             # Tool configurations
├── Dockerfile                 # Docker configuration
└── README.md                  # This file

🧪 Testing

Run the test suite:

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest

# Run tests with coverage
pytest --cov=app --cov-report=html

🛠️ Development

Code Quality

This project uses several tools to maintain code quality:

# Format code
black app/ tests/

# Sort imports
isort app/ tests/

# Lint code
flake8 app/ tests/

# Type checking
mypy app/

Pre-commit Hooks

Install pre-commit hooks to automatically check code quality:

pre-commit install

🐳 Docker

Build and Run

# Build the image
docker build -t fastapi-resume-parser .

# Run the container
docker run -p 8000:8000 fastapi-resume-parser

🚀 Deployment

Environment Variables

Create a .env file based on .env.example:

DEBUG=false
LOG_LEVEL=info
MAX_FILE_SIZE=10485760
CORS_ORIGINS=*
ALLOW_CREDENTIALS=false
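
To show how these variables might be read at startup, here is a minimal standard-library sketch. It is not the contents of `app/core/config.py`, which this README does not show; the `Settings` class, the `load_settings` function, the accepted boolean spellings, and the comma-separated format for `CORS_ORIGINS` are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container mirroring the variables above."""
    debug: bool
    log_level: str
    max_file_size: int  # bytes
    cors_origins: List[str]
    allow_credentials: bool


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as true; anything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, falling back to the sample values."""
    origins = env.get("CORS_ORIGINS", "*")
    return Settings(
        debug=_as_bool(env.get("DEBUG", "false")),
        log_level=env.get("LOG_LEVEL", "info"),
        max_file_size=int(env.get("MAX_FILE_SIZE", "10485760")),
        # Assumption: multiple origins are separated by commas.
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
        allow_credentials=_as_bool(env.get("ALLOW_CREDENTIALS", "false")),
    )
```

The sample `MAX_FILE_SIZE` of 10485760 bytes is 10 MiB (10 × 1024 × 1024).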

Production Deployment

The app is ready to deploy as a standard ASGI service with Uvicorn, as a Docker container, or behind an API gateway using the included Mangum handler.

🀝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📧 Contact

For questions or support, please open an issue on GitHub.


Made with ❤️ using FastAPI and Python

About

Production-ready FastAPI resume parser with NLP-powered extraction. Features modular architecture, comprehensive testing, CI/CD pipeline, and Docker deployment.
