| title | NamastAI | |
|---|---|---|
| emoji | 💻 | |
| colorFrom | blue | |
| colorTo | red | |
| sdk | docker | |
| pinned | false | |
| short_description | invoice-env | |
| tags | | |

OpenEnv-compliant environment and full-stack app for invoice automation.
Follow this section if you are cloning the repo for the first time.
- Python 3.10+
- Node.js 18+ and npm
- Git
- Docker Desktop (for container checks)
git clone https://github.com/IshwinderKaur8/invoice-env.git
cd invoice-env

Windows PowerShell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r backend/requirements.txt

macOS/Linux:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r backend/requirements.txt

Copy example files and edit values as needed:
cp .env.example .env
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env

Windows PowerShell equivalent:
Copy-Item .env.example .env
Copy-Item backend/.env.example backend/.env
Copy-Item frontend/.env.example frontend/.env

Minimum required for hackathon inference:
- API_BASE_URL
- MODEL_NAME
- HF_TOKEN

python -m pytest -q
python scripts/run_baseline.py

Backend (terminal 1):
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

Frontend (terminal 2):
cd frontend
npm install
npm run dev

python inference.py

Install validator CLI once:
pip install openenv-core

Run checks:
openenv validate
bash validate-submission.sh <your-space-url>

Core goal: train/evaluate agents to process invoices and receipts by solving three tasks in sequence:
- Field extraction (easy)
- Expense categorization (medium)
- Anomaly detection (hard)

Field extraction (easy):
- Observation: raw invoice fields and text context
- Action: extract vendor_name, invoice_date
- Reward: exact match 0.99; partial credit for fuzzy similarity >= 0.8; minimum 0.01
- Grader: deterministic comparison against ground truth
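
The field-extraction reward tiers can be sketched roughly as follows. This is a minimal illustration, not the repository's grader: the similarity measure (`difflib.SequenceMatcher`), the case-insensitive comparison, and the rule that partial credit equals the similarity ratio are all assumptions, and `field_reward` and `extraction_reward` are hypothetical names.

```python
from difflib import SequenceMatcher


def field_reward(predicted: str, truth: str, threshold: float = 0.8) -> float:
    """Score one extracted field against its ground-truth value."""
    # Assumption: comparison ignores case and surrounding whitespace.
    pred, gold = predicted.strip().lower(), truth.strip().lower()
    if pred == gold:
        return 0.99  # exact match
    similarity = SequenceMatcher(None, pred, gold).ratio()
    if similarity >= threshold:
        # Assumption: partial credit tracks similarity, kept below the exact-match score.
        return round(min(0.98, similarity), 2)
    return 0.01  # minimum reward


def extraction_reward(extracted: dict, truth: dict) -> float:
    """Average the per-field rewards over the ground-truth fields."""
    scores = [field_reward(extracted.get(name, ""), value) for name, value in truth.items()]
    return sum(scores) / len(scores) if scores else 0.01
```

Because the function depends only on its inputs, the same prediction always receives the same score, which is the property the deterministic grader relies on.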

Expense categorization (medium):
- Observation: vendor, description, line-item metadata
- Action: assign category from Travel, Office Supplies, Utilities, Misc
- Reward: exact match 0.99; 0.5 if correct label appears in top-2 prediction; minimum 0.01
- Grader: deterministic category check against labeled data
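
A rough sketch of the categorization reward tiers is shown below. It assumes the agent supplies a ranked list of categories with its best guess first; how the real environment receives a top-2 prediction is not stated here, and `category_reward` is a hypothetical name.

```python
from typing import Sequence

CATEGORIES = ("Travel", "Office Supplies", "Utilities", "Misc")


def category_reward(ranked_predictions: Sequence[str], truth: str) -> float:
    """Score a ranked list of predicted categories, best guess first."""
    if truth not in CATEGORIES:
        raise ValueError(f"unknown category: {truth!r}")
    if ranked_predictions and ranked_predictions[0] == truth:
        return 0.99  # top prediction is correct
    if truth in ranked_predictions[:2]:
        return 0.5  # correct label is the second guess
    return 0.01  # minimum reward
```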

Anomaly detection (hard):
- Observation: invoice batch context (amount patterns, references, vendor/date behavior)
- Action: set anomaly flag for duplicate/high-risk invoices
- Reward: continuous score from precision/recall F1 behavior, clamped to (0,1) as 0.01..0.99
- Grader: deterministic scoring function derived from confusion counts
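
The anomaly score can be illustrated with a plain F1 computation over confusion counts, clamped to 0.01..0.99. This is a sketch under assumptions: the real grader may weight precision and recall differently, and its handling of a batch with no anomalies and no flags is not stated (here it falls to the minimum). `anomaly_reward` is a hypothetical name.

```python
from typing import Sequence


def anomaly_reward(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 over anomaly flags for one batch, clamped to 0.01..0.99."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    # Confusion counts: true positives, false positives, false negatives.
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    denominator = 2 * tp + fp + fn
    # Assumption: an empty confusion matrix scores as 0 before clamping.
    f1 = (2 * tp / denominator) if denominator else 0.0
    return min(0.99, max(0.01, f1))
```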
Implemented with Pydantic typed models:
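from typing import Any, Dict, Optional

from pydantic import BaseModel

class InvoiceObservation(BaseModel):
    # What the agent sees for one invoice.
    vendor_name: str
    invoice_date: str
    amount: float
    description: str
    metadata: Dict[str, Any]

class InvoiceAction(BaseModel):
    # The agent's answer; category and anomaly_flag are optional.
    extracted_fields: Dict[str, str]
    category: Optional[str] = None
    anomaly_flag: Optional[bool] = None

class InvoiceReward(BaseModel):
    # Grader output: a scalar score plus a details mapping.
    score: float
    details: Dict[str, Any]

One episode equals processing one invoice batch. Default behavior uses deterministic synthetic invoices with anomalies included.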
The environment follows OpenEnv-style methods:
- reset()
- step(action)
- state()
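
The call pattern behind these three methods can be sketched with a toy stand-in. `ToyInvoiceEnv` and `run_episode` are hypothetical and are not the classes in `env/`; the `(observation, reward, done, info)` return shape of `step` and the contents of `state()` are assumptions made only for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Observation = Dict[str, Any]


class ToyInvoiceEnv:
    """Hypothetical stand-in that only mirrors the reset/step/state call pattern."""

    def __init__(self, invoices: List[Observation]) -> None:
        self._invoices = invoices
        self._cursor = 0
        self._total = 0.0

    def reset(self) -> Observation:
        # Start a new episode and return the first invoice.
        self._cursor = 0
        self._total = 0.0
        return self._invoices[0]

    def step(self, action: Dict[str, Any]) -> Tuple[Optional[Observation], float, bool, dict]:
        # Toy grading: only the category is checked.
        truth = self._invoices[self._cursor]
        reward = 0.99 if action.get("category") == truth["category"] else 0.01
        self._cursor += 1
        self._total += reward
        done = self._cursor >= len(self._invoices)
        next_obs = None if done else self._invoices[self._cursor]
        return next_obs, reward, done, {}

    def state(self) -> Dict[str, Any]:
        return {"step": self._cursor, "total_reward": self._total}


def run_episode(env: ToyInvoiceEnv, policy: Callable[[Observation], Dict[str, Any]]) -> List[float]:
    """Drive one episode: reset once, then step until the batch is done."""
    obs = env.reset()
    rewards: List[float] = []
    done = False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        rewards.append(reward)
    return rewards
```

One episode here walks the whole batch, matching the "one episode equals one invoice batch" convention described above.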
Implementation entrypoint and schema wiring are defined in openenv.yaml.
.
├── env/ # Core OpenEnv runtime (models, graders, tasks, dataset)
├── scripts/ # Baseline agent runner (OpenAI + heuristic fallback)
├── tests/ # Model, grader, and environment tests
├── backend/ # FastAPI service + Mongo integration
├── frontend/ # React/Vite/Tailwind dashboard
├── openenv.yaml # OpenEnv metadata and schema mapping
├── .env.example # Root inference env template
├── validate-submission.sh # Root wrapper for organizer validator script
├── Dockerfile # Containerized baseline execution
└── requirements.txt # Core environment dependencies
Two baseline entrypoints are provided:
- inference.py (root): hackathon submission script (mandatory filename)
- scripts/run_baseline.py: developer helper script with local heuristic mode
scripts/run_baseline.py supports:
- BASELINE_MODE=auto (default): OpenAI if API key is present, otherwise heuristic
- BASELINE_MODE=openai: strict OpenAI mode
- BASELINE_MODE=heuristic: fully offline deterministic mode
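
The mode selection described above could look roughly like this. It is a sketch, not the logic in `scripts/run_baseline.py`: the function name is hypothetical, `OPENAI_API_KEY` is assumed to be the variable that counts as "API key is present", and "strict" is assumed to mean that a missing key is an error.

```python
import os
from typing import Mapping, Optional


def resolve_baseline_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'openai' or 'heuristic' based on BASELINE_MODE and key presence."""
    env = os.environ if env is None else env
    mode = env.get("BASELINE_MODE", "auto").strip().lower()
    has_key = bool(env.get("OPENAI_API_KEY"))  # assumption: name of the key variable
    if mode == "auto":
        return "openai" if has_key else "heuristic"
    if mode == "openai":
        if not has_key:
            # Assumption: strict mode refuses to fall back to the heuristic.
            raise RuntimeError("BASELINE_MODE=openai requires an API key")
        return "openai"
    if mode == "heuristic":
        return "heuristic"
    raise ValueError(f"unsupported BASELINE_MODE: {mode!r}")
```

Passing a plain dict instead of `os.environ` keeps the function easy to test without touching the real environment.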
python scripts/run_baseline.py

Recent heuristic run output:
- Steps: 12
- Total score: 8.400
- Average score: 0.700
Set these variables before running inference.py:
- API_BASE_URL: API endpoint for the LLM provider
- MODEL_NAME: model identifier for inference
- HF_TOKEN: API key/token used by OpenAI client
- LOCAL_IMAGE_NAME: local image reference if organizer uses image-backed env constructor
Reference file:
.env.example
Optional reproducibility variables:
- BATCH_SIZE (default 24)
- SEED (default 42)
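
One way to read these variables with their documented defaults is sketched below. `InferenceConfig` and `load_config` are hypothetical names, and failing fast on a missing required variable is an assumption rather than confirmed behavior of `inference.py`. The optional `LOCAL_IMAGE_NAME` is left out for brevity.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class InferenceConfig:
    api_base_url: str
    model_name: str
    hf_token: str
    batch_size: int = 24  # documented default for BATCH_SIZE
    seed: int = 42  # documented default for SEED


def load_config(env: Optional[Mapping[str, str]] = None) -> InferenceConfig:
    """Build the config from environment variables, failing fast on missing ones."""
    env = os.environ if env is None else env
    required = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return InferenceConfig(
        api_base_url=env["API_BASE_URL"],
        model_name=env["MODEL_NAME"],
        hf_token=env["HF_TOKEN"],
        batch_size=int(env.get("BATCH_SIZE", "24")),
        seed=int(env.get("SEED", "42")),
    )
```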
The submission inference script is:
- inference.py (at repository root)
It uses the OpenAI Python client for all LLM calls and reads credentials/config from the environment variables above.
inference.py emits strict tagged logs:
- [START] once at run start
- [STEP] once per environment step
- [END] once at run completion
Example:
[START] task=invoice-processing env=invoice-openenv model=Qwen/Qwen2.5-72B-Instruct
[STEP] step=1 action={"extracted_fields":{"vendor_name":"Amazon","invoice_date":"2026-01-12"},"category":"Office Supplies","anomaly_flag":false} reward=0.70 done=false error=null
[END] success=true steps=24 rewards=0.70,0.70,0.70
Required line schema (field order preserved):
- [START] task=<task_name> env=<benchmark> model=<model_name>
- [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
- [END] success=<true|false> steps=<n> rewards=<r1,r2,...,rn>
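
The line schema can be produced with a few small formatters. This is an illustrative sketch with hypothetical function names, not the code in `inference.py`; it assumes the action is serialized as compact JSON (as in the example line) and that `steps` equals the number of rewards.

```python
import json
from typing import Any, Dict, Optional, Sequence


def format_start(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def format_step(step: int, action: Dict[str, Any], reward: float,
                done: bool, error: Optional[str] = None) -> str:
    # Compact JSON (no spaces after separators) keeps the action on one token.
    action_str = json.dumps(action, separators=(",", ":"))
    done_str = "true" if done else "false"
    error_str = "null" if error is None else error
    return (f"[STEP] step={step} action={action_str} "
            f"reward={reward:.2f} done={done_str} error={error_str}")


def format_end(success: bool, rewards: Sequence[float]) -> str:
    success_str = "true" if success else "false"
    joined = ",".join(f"{r:.2f}" for r in rewards)
    # Assumption: the step count is the number of rewards collected.
    return f"[END] success={success_str} steps={len(rewards)} rewards={joined}"
```

Keeping each formatter a pure function makes it straightforward to assert the field order before submitting.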
Run command:
python inference.py

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
python -m pytest -q
python scripts/run_baseline.py

cd backend
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Optional backend environment variables:
- OPENAI_API_KEY
- OPENAI_MODEL (default gpt-4o-mini)
- MONGO_URI
- MONGO_DB_NAME
- FRONTEND_ORIGIN
Reference file:
backend/.env.example
cd frontend
npm install
npm run dev

Optional frontend environment variable:
- VITE_API_BASE_URL (default http://localhost:8000/api)
Reference file:
frontend/.env.example
- POST /api/reset
- POST /api/step
- GET /api/state
- POST /api/run-agent
- GET /api/results
These endpoints allow both interactive stepping and full-episode automated agent runs.
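
A small standard-library client for these endpoints might look like the sketch below. The base URL matches the documented frontend default; the request and response bodies are assumptions (the `{"batch_size": 8}` payload comes from the health-check example, and JSON responses are assumed), and `build_request` and `call` are hypothetical helpers.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request, urlopen

API_BASE = "http://localhost:8000/api"


def build_request(method: str, path: str,
                  payload: Optional[Dict[str, Any]] = None) -> Request:
    """Create a JSON request for one of the API endpoints without sending it."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(f"{API_BASE}{path}", data=data, headers=headers, method=method)


def call(method: str, path: str, payload: Optional[Dict[str, Any]] = None) -> Any:
    """Send the request and decode the response, assuming a JSON body."""
    with urlopen(build_request(method, path, payload), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, interactive stepping would start with something like `call("POST", "/reset", {"batch_size": 8})` followed by `call("GET", "/state")`.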
The root Dockerfile is configured for containerized API execution and HF Spaces compatibility.
docker build -t invoice-openenv .
docker run --rm -p 7860:7860 invoice-openenv

Health check examples after startup:
curl http://localhost:7860/
curl -X POST http://localhost:7860/reset -H "Content-Type: application/json" -d '{}'
curl -X POST http://localhost:7860/api/reset -H "Content-Type: application/json" -d '{"batch_size": 8}'

If you need the exact organizer command path from repo root:
bash validate-submission.sh <your-space-url>

For Spaces Docker deployment:
- Use this repository as the Space source.
- Select Docker SDK.
- Add Space secrets for API_BASE_URL, MODEL_NAME, and HF_TOKEN.
- Add the openenv tag in Space metadata.
- Verify the Space returns 200 on /reset (organizer validator check).
- Backend: backend/render.yaml (Render)
- Frontend: frontend/vercel.json (Vercel)
- Deterministic synthetic dataset with duplicates and high-amount anomalies
- Deterministic graders for all three tasks
- Typed observation/action/reward models
- OpenEnv metadata in openenv.yaml
- Baseline script with OpenAI + offline fallback
- Docker support included
- Test suite passing (14 passed)
- openenv.yaml includes metadata, task definitions, and typed schema references
- env/models.py defines typed Observation/Action/Reward Pydantic models
- env/environment.py implements step(), reset(), and state()
- 3 tasks implemented with deterministic graders and scores in 0.0..1.0
- inference.py exists at root and uses OpenAI client + required env variables
- Structured logs include [START], [STEP], [END]
- Docker image builds and serves API container
- /reset responds successfully from deployed Space
The repository includes an organizer-compatible pre-validation helper:
scripts/validate-submission.sh