Extracts structured equipment information (inverters, modules, racking systems) from solar installation documentation using OpenAI's GPT-4 vision API with LangGraph orchestration.
- macOS/Linux with Conda installed
- Python 3.12
- OpenAI API Key (https://platform.openai.com/api-keys)
This method creates an exact replica of the development environment:
conda env create -f environment.yml
conda activate openai-env
If you already have a Python 3.12+ environment:
conda activate openai-env # or your env name
pip install -r requirements.txt
To create a new environment from scratch:
conda create -n openai-env python=3.12
conda activate openai-env
pip install -r requirements.txt
Each user must provide their own personal API key. Never commit keys to GitHub.
Add to ~/.zshrc or ~/.bashrc:
export OPENAI_API_KEY=sk-your-key-here
Then reload:
source ~/.zshrc # or ~/.bashrc
On Windows:
setx OPENAI_API_KEY "sk-your-key-here"
Then restart your terminal.
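Once the variable is exported, a script can read it from the environment instead of embedding it. The following is a minimal sketch, not code from this repository: `load_api_key` is a hypothetical helper name, and only the `OPENAI_API_KEY` variable name comes from the steps above.

```python
import os


def load_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in your shell profile first."
        )
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error from a later API call.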
To verify that the key is set:
echo $OPENAI_API_KEY # macOS/Linux
echo $Env:OPENAI_API_KEY # Windows (PowerShell)
API/
├── readme.md # This file
├── requirements.txt # Python dependencies (pinned versions)
├── environment.yml # Conda environment definition
├── .gitignore # Git ignore rules
│
├── output files/ # Extraction results
│ └── extracted_fields-*.json # Timestamped results
│
├── prompts/ # OpenAI prompt templates
│ └── prompt1.py # Prompt helper script
│
├── scripts/ # Main execution scripts
│
└── src/ # Shared libraries
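The tree above shows results landing in `output files/` as `extracted_fields-*.json`. As a rough illustration of how such a timestamped file could be written, here is a small sketch. The function name `save_results` and the `%Y%m%d-%H%M%S` timestamp format are assumptions; the actual scripts may name files differently.

```python
import json
from datetime import datetime
from pathlib import Path


def save_results(fields, out_dir="output files", now=None):
    """Write extracted fields to a timestamped JSON file and return its path."""
    now = now or datetime.now()
    # Assumed timestamp format; the README only shows the extracted_fields-*.json pattern.
    stamp = now.strftime("%Y%m%d-%H%M%S")
    out_path = Path(out_dir) / f"extracted_fields-{stamp}.json"
    # Create the output directory on first use.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(fields, indent=2), encoding="utf-8")
    return out_path
```

Putting the timestamp in the filename keeps each run's results separate, so earlier extractions are not overwritten.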
cd scripts/
python run_extraction.py
Optional flags:
- --prompt-id <ID> - Custom prompt ID
- --file-id <ID> - Custom file ID for extraction
- --dry-run - Test without API calls
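For reference, the three flags above could be wired up with `argparse` roughly as follows. This is a sketch under assumptions: `build_parser` is a hypothetical name, and the defaults (`None` for the IDs) are guesses rather than the behavior of `run_extraction.py`.

```python
import argparse


def build_parser():
    """Build a parser for the three documented flags (hypothetical wiring)."""
    parser = argparse.ArgumentParser(prog="run_extraction.py")
    parser.add_argument("--prompt-id", default=None, help="Custom prompt ID")
    parser.add_argument("--file-id", default=None, help="Custom file ID for extraction")
    # store_true makes --dry-run a boolean switch that defaults to False.
    parser.add_argument("--dry-run", action="store_true", help="Test without API calls")
    return parser
```

Calling `build_parser().parse_args(["--dry-run"])` yields a namespace with `dry_run` set to `True` and both IDs left as `None`.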
python upload_pdf.py <pdf_path>
php fuzzymatch_Jaro-Winkler.php # Jaro-Winkler algorithm
php fuzzymatch_combined.php # Multi-algorithm composite
All fuzzy matching algorithms are centralized in src/fuzzymatch.php:
Matcher Classes:
- JaroWinklerMatcher - Simple Jaro-Winkler matching
- CompositeMatcher - Weighted multi-algorithm scoring
Display Utilities:
- FuzzyMatchFunctions::displaySimpleResults() - Format Jaro-Winkler results
- FuzzyMatchFunctions::displayCompositeResults() - Format composite results
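For readers unfamiliar with the metric behind JaroWinklerMatcher, here is a textbook Jaro-Winkler similarity sketched in Python. It is not the code in src/fuzzymatch.php: the function names are illustrative, and the prefix scale of 0.1 with a four-character prefix cap are the conventional defaults, assumed here rather than taken from the repository.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both match sequences in order; mismatched pairs are half-transpositions.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)
```

The prefix boost is what makes the metric a reasonable fit for company and product names, where the leading characters are usually typed correctly.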
<?php
// Simple Jaro-Winkler matching
$matcher = new JaroWinklerMatcher();
$results = $matcher->findMatches('EcoFlow', $companies, 5, 0.6);
FuzzyMatchFunctions::displaySimpleResults('EcoFlow', $results);
// Composite matching
$composite = new CompositeMatcher();
$results = $composite->fuzzyMatch('EcoFlow', $companies, 5, 0.3);
FuzzyMatchFunctions::displayCompositeResults('EcoFlow', $results);
?>
- All extraction results are saved to output files/ with timestamps
- API calls are logged for debugging
- Never hardcode API keys in scripts
- Each user manages their own API key locally
- Keep environment.yml and requirements.txt in sync
Last Updated: November 21, 2025