📅 Period: Jan 2024 – Apr 2024 | Author: Bharghava Ram Vemuri
--- 🎯 Problem Statement
ESG (Environmental, Social, Governance) analysis requires reading hundreds of annual reports, sustainability filings, and news articles to assess corporate sustainability. Asset managers spend weeks per company doing this manually. Greenwashing is rampant — companies make vague sustainability claims that are difficult to verify. This platform ingests 100+ ESG documents, builds a FAISS semantic index, and uses GPT-4 + Claude with LangChain to answer complex ESG queries, score companies, detect greenwashing, and generate portfolio-level sustainability reports.
ESG Documents (PDF/text)
│
LangChain Document Loaders
│
Chunking + Embedding (text-embedding-ada-002)
│
FAISS Vector Index
│
┌────┴────────────────────────────────────┐
│ LangChain RetrievalQA Chain │
│ GPT-4 (primary) · Claude (validation) │
└────┬────────────────────────────────────┘
│
┌────┴──────────────────────────────────────┐
│ ESG Intelligence Layer │
│ Scoring · Greenwash Detection · Reports │
└───────────────────────────────────────────┘
green-alpha/
├── main.py
├── app/
│ ├── services/
│ │ ├── rag_service.py # LangChain RAG pipeline
│ │ ├── esg_service.py # ESG scoring + analysis
│ │ ├── greenwash_service.py # Greenwashing detection
│ │ └── report_service.py # Portfolio sustainability reports
│ └── api/routes/
│ ├── query.py
│ ├── score.py
│ └── reports.py
├── data/ # ESG document storage
├── tests/
├── Dockerfile
├── .env.example
└── requirements.txt
git clone https://github.com/bharghavaram/green-alpha.git
cd green-alpha
pip install -r requirements.txt
cp .env.example .env # Add OPENAI_API_KEY, ANTHROPIC_API_KEY
uvicorn main:app --reload| Component | Approach |
|---|---|
| Document Loading | LangChain PyPDFLoader + TextLoader |
| Chunking | RecursiveCharacterTextSplitter (chunk=1000, overlap=200) |
| Embeddings | text-embedding-ada-002 → FAISS L2 index |
| QA Chain | LangChain RetrievalQA with GPT-4 + source citations |
| ESG Scoring | 3-pillar scoring: E (0–100) + S (0–100) + G (0–100) |
| Greenwash Detection | Claim extraction → fact-check against verified metrics |
| Cross-validation | Claude independently validates GPT-4 ESG scores |
| Method | Endpoint | Description |
|---|---|---|
| POST | /ingest |
Ingest ESG documents |
| POST | /query |
RAG-powered ESG query |
| POST | /score |
ESG score for a company |
| POST | /greenwash |
Greenwashing risk assessment |
| POST | /report/portfolio |
Portfolio-level sustainability report |
Request:
curl -X POST "http://localhost:8000/score" \
-H "Content-Type: application/json" \
-d '{"company":"Microsoft","year":2024}'Response:
{
"company": "Microsoft",
"esg_scores": {
"environmental": 78,
"social": 82,
"governance": 91,
"composite": 83.7
},
"strengths": ["Carbon negative by 2030 commitment", "100% renewable energy by 2025"],
"risks": ["Significant data centre water consumption", "Supply chain emissions Scope 3 gaps"],
"greenwash_risk": "LOW",
"confidence": 0.84,
"sources_used": 7
}| Metric | Value |
|---|---|
| Documents indexed | 100+ ESG reports |
| Query response time | <2.5 seconds |
| Answer relevance (RAGAS) | 0.87 |
| ESG score correlation (vs MSCI) | 0.79 Pearson r |
| Greenwash detection precision | 81% |
pytest tests/ -vRoadmap: Bloomberg ESG data integration · Real-time news monitoring · Portfolio optimisation engine · Regulatory filing auto-parser (TCFD, SFDR)
MIT License — see LICENSE. Contributions welcome — see CONTRIBUTING.md.