
PabloBotinGP/SLD-LLM-JSON


SolarAPP LLM JSON Extraction API

Extracts structured equipment information (inverters, modules, racking systems) from solar installation documentation using OpenAI's GPT-4 vision API with LangGraph orchestration.
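This README does not reproduce the output schema. The following is a rough illustration only, showing what a strict check on the extracted JSON could look like. The top-level keys `inverters`, `modules`, and `racking_systems` are assumptions derived from the equipment categories named above. They are not the project's actual schema.

```python
import json

# Hypothetical top-level keys based on the equipment categories listed above;
# the project's real schema is not shown in this README and may differ.
REQUIRED_KEYS = {"inverters", "modules", "racking_systems"}


def validate_extraction(raw: str) -> dict:
    """Parse model output and enforce a minimal, strict top-level shape.

    Raises ValueError if the text is not valid JSON, is not an object,
    is missing a category, has unexpected top-level keys, or if a
    category is not a list.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    keys = set(data)
    missing, extra = REQUIRED_KEYS - keys, keys - REQUIRED_KEYS
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key in REQUIRED_KEYS:
        if not isinstance(data[key], list):
            raise ValueError(f"{key!r} must be a list")
    return data
```

Rejecting unexpected keys, not just checking for required ones, is what makes the check "strict". A model that invents an extra category fails loudly instead of passing silently.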

Environment Setup

Prerequisites

  • Conda (Miniconda or Anaconda)
  • Python 3.12 or newer
  • A personal OpenAI API key (see below)

Option 1: Using environment.yml (Recommended)

This method creates an exact replica of the development environment:

conda env create -f environment.yml
conda activate openai-env

Option 2: Using requirements.txt

If you already have a Python 3.12+ environment:

conda activate openai-env  # or your env name
pip install -r requirements.txt

Option 3: Manual Setup

conda create -n openai-env python=3.12
conda activate openai-env
pip install -r requirements.txt

🔑 OpenAI API Key Setup

Each user must provide their own personal API key. Never commit keys to GitHub.

macOS / Linux (bash / zsh)

Add to ~/.zshrc or ~/.bashrc:

export OPENAI_API_KEY=sk-your-key-here

Then reload:

source ~/.zshrc  # or ~/.bashrc

Windows (PowerShell)

setx OPENAI_API_KEY "sk-your-key-here"

Then restart your terminal.

Verify Setup

echo $OPENAI_API_KEY      # macOS/Linux
echo $Env:OPENAI_API_KEY  # Windows
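The `echo` commands above print the full key to the terminal. To avoid exposing it (for example, while screen-sharing), you can run a small Python check instead. This check is not part of the repository, and the script name you save it under is up to you. It confirms that Python processes can see the variable while showing only the key's prefix and length:

```python
import os


def check_api_key(env=None) -> str:
    """Report whether OPENAI_API_KEY is set without printing the secret."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "OPENAI_API_KEY is not set"
    # Show only a 3-character prefix and the length, never the full key.
    return f"OPENAI_API_KEY is set ({key[:3]}..., {len(key)} chars)"


if __name__ == "__main__":
    print(check_api_key())
```

If this reports "not set" right after `setx` on Windows, open a new terminal first. `setx` only affects sessions started afterwards.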

📁 Project Structure

API/
├── readme.md                          # This file
├── requirements.txt                   # Python dependencies (pinned versions)
├── environment.yml                    # Conda environment definition
├── .gitignore                         # Git ignore rules
│
├── output files/                      # Extraction results
│   └── extracted_fields-*.json        # Timestamped results
│
├── prompts/                           # OpenAI prompt templates
│   └── prompt1.py                     # Prompt helper script
│
├── scripts/                           # Main execution scripts
│
└── src/                               # Shared libraries

Running the Scripts

Main Extraction Script

cd scripts/
python run_extraction.py

Optional flags:

  • --prompt-id <ID> - Custom prompt ID
  • --file-id <ID> - Custom file ID for extraction
  • --dry-run - Test without API calls
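The sketch below shows how the three flags above map to Python attributes with the standard `argparse` module. It is illustrative only, not the contents of `run_extraction.py`. The defaults (`None` for both IDs) and the dry-run behavior shown are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the documented flags (not run_extraction.py itself)."""
    parser = argparse.ArgumentParser(description="Extract equipment fields from an uploaded PDF.")
    parser.add_argument("--prompt-id", metavar="ID", help="custom prompt ID")
    parser.add_argument("--file-id", metavar="ID", help="custom file ID for extraction")
    parser.add_argument("--dry-run", action="store_true", help="test without API calls")
    return parser


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)  # argv=None means sys.argv[1:]
    if args.dry_run:
        # Report what would be used and stop before any network request.
        print(f"[dry run] prompt_id={args.prompt_id} file_id={args.file_id}")
        return
    raise NotImplementedError("the real API call is outside the scope of this sketch")
```

`argparse` turns `--prompt-id` into `args.prompt_id` and `--dry-run` into a boolean that defaults to `False`. As a result, `main(["--dry-run"])` prints the resolved IDs without contacting the API.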

Upload PDF

python upload_pdf.py <pdf_path>
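`upload_pdf.py` itself is not shown in this README. The sketch below shows what an upload step can look like with the official `openai` Python SDK. Two details are assumptions: the `purpose="user_data"` value, and whether the returned file ID is what `--file-id` expects. The optional `client` parameter is a hypothetical addition that makes the function testable without network access.

```python
from pathlib import Path


def upload_pdf(pdf_path: str, client=None) -> str:
    """Upload a PDF to OpenAI and return the resulting file ID.

    If no client is passed, openai.OpenAI() is created, which reads
    OPENAI_API_KEY from the environment.
    """
    path = Path(pdf_path)
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got {path.name!r}")
    if not path.is_file():
        raise FileNotFoundError(path)
    if client is None:
        # Imported lazily so the checks above work even without the SDK installed.
        from openai import OpenAI

        client = OpenAI()
    with path.open("rb") as fh:
        # Assumed purpose for files later passed to the model as input.
        uploaded = client.files.create(file=fh, purpose="user_data")
    return uploaded.id
```

Validating the extension and path locally first means a typo in `<pdf_path>` fails immediately instead of costing an API round trip.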

Fuzzy Matching (PHP)

php fuzzymatch_Jaro-Winkler.php      # Jaro-Winkler algorithm
php fuzzymatch_combined.php           # Multi-algorithm composite

Fuzzy Matching

Architecture

All fuzzy matching algorithms are centralized in src/fuzzymatch.php:

Matcher Classes:

  • JaroWinklerMatcher - Simple Jaro-Winkler matching
  • CompositeMatcher - Weighted multi-algorithm scoring

Display Utilities:

  • FuzzyMatchFunctions::displaySimpleResults() - Format Jaro-Winkler results
  • FuzzyMatchFunctions::displayCompositeResults() - Format composite results
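This README does not spell out the matching math. For reference, here is the textbook Jaro-Winkler similarity in Python, with prefix scale 0.1 and the prefix capped at 4 characters. It is an independent sketch, not a port of `JaroWinklerMatcher`. In particular, the case-folding and the `find_matches` helper's reading of the `5, 0.6` arguments (result limit and minimum score) are assumptions.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]: shared characters, penalized for transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only "match" if equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both match lists in order; each out-of-order pair is half a transposition.
    out_of_order, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings that share a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def find_matches(query, candidates, limit=5, threshold=0.6):
    """Hypothetical helper: up to `limit` (candidate, score) pairs >= threshold, best first."""
    q = query.casefold()
    hits = [(c, jaro_winkler(q, c.casefold())) for c in candidates]
    hits = [(c, s) for c, s in hits if s >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:limit]
```

For example, `MARTHA` vs. `MARHTA` scores about 0.961. The Jaro score of about 0.944 is boosted by the shared three-letter prefix `MAR`.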

Usage Example

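<?php
// Load the matcher classes (path assumed; adjust to where your script lives)
require_once __DIR__ . '/../src/fuzzymatch.php';

// Hypothetical candidate list; in practice this comes from your manufacturer data
$companies = ['EcoFlow', 'EcoFlow Inc.', 'Enphase Energy', 'SolarEdge'];

// Simple Jaro-Winkler matching
// (arguments assumed to be: query, candidates, max results, minimum score)
$matcher = new JaroWinklerMatcher();
$results = $matcher->findMatches('EcoFlow', $companies, 5, 0.6);
FuzzyMatchFunctions::displaySimpleResults('EcoFlow', $results);

// Composite matching (same argument order assumed)
$composite = new CompositeMatcher();
$results = $composite->fuzzyMatch('EcoFlow', $companies, 5, 0.3);
FuzzyMatchFunctions::displayCompositeResults('EcoFlow', $results);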
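For readers working on the Python side, here is a rough analogue of weighted multi-algorithm scoring. The three component measures and their 0.5/0.3/0.2 weights are hypothetical choices. `CompositeMatcher` in `src/fuzzymatch.php` may combine different algorithms with different weights. The default threshold of 0.3 mirrors the value in the PHP example, whose meaning is assumed.

```python
from difflib import SequenceMatcher


def edit_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Overlap of whitespace-separated words, e.g. 'ecoflow inc' vs 'ecoflow' -> 0.5."""
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def prefix_hit(a: str, b: str) -> float:
    """1.0 if either string starts with the other (useful for brand names), else 0.0."""
    return 1.0 if a and b and (a.startswith(b) or b.startswith(a)) else 0.0


# Hypothetical components and weights; they sum to 1, so scores stay in [0, 1].
COMPONENTS = [(edit_ratio, 0.5), (token_jaccard, 0.3), (prefix_hit, 0.2)]


def composite_score(a: str, b: str) -> float:
    a, b = a.casefold().strip(), b.casefold().strip()
    return sum(weight * fn(a, b) for fn, weight in COMPONENTS)


def fuzzy_match(query, candidates, limit=5, threshold=0.3):
    """Up to `limit` (candidate, score) pairs scoring >= threshold, best first."""
    scored = [(c, composite_score(query, c)) for c in candidates]
    scored = [(c, s) for c, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

Weighting lets a cheap prefix signal lift candidates like `EcoFlow Inc`, whose character-level ratio against `EcoFlow` is only about 0.78.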

📝 Notes

  • All extraction results are saved to output files/ with timestamps
  • API calls are logged for debugging
  • Never hardcode API keys in scripts
  • Each user manages their own API key locally
  • Keep environment.yml and requirements.txt in sync
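The first note says each run's results are saved to `output files/` with a timestamp, matching the `extracted_fields-*.json` pattern in the project tree. Here is a minimal sketch of that behavior. It assumes a `YYYYMMDD-HHMMSS` timestamp; the project's real naming format is not documented here.

```python
import json
from datetime import datetime
from pathlib import Path


def save_results(data: dict, out_dir: str = "output files") -> Path:
    """Write extraction results to a new timestamped JSON file and return its path."""
    folder = Path(out_dir)  # relative to the current working directory
    folder.mkdir(parents=True, exist_ok=True)
    # Assumed timestamp format; two saves in the same second would share a name.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = folder / f"extracted_fields-{stamp}.json"
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Because the folder name contains a space, quote it in shell commands, e.g. `ls "output files"`.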

Last Updated: November 21, 2025

About

Python-based tool to analyze electrical permit documents (PDFs, including single-line diagrams) with OpenAI’s Responses API. Extracts structured info into clean JSON following a strict schema.
