| title | RationGuardEnv |
|---|---|
| emoji | 🛡️ |
| colorFrom | blue |
| colorTo | green |
| sdk | docker |
| sdk_version | 0.0.1 |
| app_file | app.py |
| pinned | false |
RationGuardEnv is a deterministic OpenEnv-compatible simulation for fraud detection in India’s Public Distribution System (PDS), a system where leakage has been estimated at ~₹40,000 crore in some analyses.
The environment is designed for multi-step AI-agent reasoning: the agent must request evidence, update suspicion, and then make a final fraud decision.
PDS fraud is a real governance problem with multiple failure points:
- Beneficiary fraud: inflated claims versus entitlement
- Dealer fraud: ledger-level mismatch between recorded and available stock
- Supply chain fraud: divergence between transported and distributed stock
RationGuardEnv converts this into a clean, deterministic benchmark suitable for reliable agent evaluation.
The environment implements the required interface:
- `reset(level) -> observation`
- `step(action) -> (observation, reward, done, info)`
- `state() -> current_state`
Implementation file: env/ration_env.py
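A minimal sketch of an agent loop written against this three-method contract. Only the method signatures come from this README. The `EnvLike` protocol, `run_episode`, the `ToyEnv` stand-in, its step limit, and its reward values are hypothetical illustrations, not code from `env/ration_env.py`.

```python
from typing import Any, Callable, Dict, List, Protocol, Tuple

Observation = Dict[str, Any]


class EnvLike(Protocol):
    """The reset/step/state contract (observation shape assumed to be a dict)."""

    def reset(self, level: str) -> Observation: ...
    def step(self, action: str) -> Tuple[Observation, float, bool, Dict[str, Any]]: ...
    def state(self) -> Observation: ...


def run_episode(env: EnvLike, level: str,
                policy: Callable[[Observation], str]) -> List[float]:
    """Reset once, then step until the environment reports done."""
    obs = env.reset(level)
    rewards: List[float] = []
    done = False
    while not done:
        action = policy(obs)
        obs, reward, done, _info = env.step(action)
        rewards.append(reward)
    return rewards


class ToyEnv:
    """Hypothetical stand-in used only to exercise run_episode; NOT RationGuardEnv."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self._step = 0
        self._level = ""

    def reset(self, level: str) -> Observation:
        self._step = 0
        self._level = level
        return self.state()

    def step(self, action: str) -> Tuple[Observation, float, bool, Dict[str, Any]]:
        self._step += 1
        # Ends on a final decision or on max-step timeout, as described above.
        done = action.startswith(("FLAG_", "CLEAR_")) or self._step >= self.max_steps
        reward = 1.0 if action.startswith("FLAG_") else 0.1  # invented values
        return self.state(), reward, done, {}

    def state(self) -> Observation:
        return {"task_level": self._level, "step_number": self._step,
                "max_steps": self.max_steps}


def investigate_then_flag(obs: Observation) -> str:
    """Hypothetical policy: one investigation, then a final decision."""
    return "REQUEST_BENEFICIARY_AUDIT" if obs["step_number"] == 0 else "FLAG_BENEFICIARY_FRAUD"


print(run_episode(ToyEnv(), "easy", investigate_then_flag))  # [0.1, 1.0]
```

Because the loop only relies on the three documented methods, the same `run_episode` should work against any environment that honours the contract.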
Each step returns fields including:
- `task_level`, `task_id`
- `step_number`, `max_steps`
- `claimed_quantity`, `expected_quota`
- `dealer_stock`, `transported_stock`
- `suspicion_score` (0.0 to 1.0)
- `revealed_checks` (which investigations have been run)
- `revealed_indicators` (numeric evidence values)
- `allowed_actions`
- `last_action`
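A typed view of one observation can make the field list easier to work with. This is a sketch under stated assumptions: the field names come from the list above, but the `Observation` dataclass, the field types, the range check, and the `over_claim` helper are hypothetical and may not match `env/models.py`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    """Hypothetical typed observation; field names taken from this README."""
    task_level: str
    task_id: str
    step_number: int
    max_steps: int
    claimed_quantity: float
    expected_quota: float
    dealer_stock: float
    transported_stock: float
    suspicion_score: float
    revealed_checks: List[str] = field(default_factory=list)
    revealed_indicators: Dict[str, float] = field(default_factory=dict)
    allowed_actions: List[str] = field(default_factory=list)
    last_action: Optional[str] = None

    def __post_init__(self) -> None:
        # suspicion_score is documented as lying in [0.0, 1.0].
        if not 0.0 <= self.suspicion_score <= 1.0:
            raise ValueError(f"suspicion_score out of range: {self.suspicion_score}")

    @property
    def over_claim(self) -> float:
        """Amount by which the claim exceeds entitlement (0.0 if it does not)."""
        return max(0.0, self.claimed_quantity - self.expected_quota)


# Invented values for illustration only.
example = Observation(task_level="easy", task_id="easy_1", step_number=0, max_steps=5,
                      claimed_quantity=50.0, expected_quota=35.0,
                      dealer_stock=400.0, transported_stock=400.0, suspicion_score=0.2)
print(example.over_claim)  # 15.0
```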
Investigation actions:
- `REQUEST_BENEFICIARY_AUDIT`
- `REQUEST_DEALER_LEDGER`
- `REQUEST_TRANSPORT_LOG`
Final decision actions:
- `FLAG_BENEFICIARY_FRAUD`
- `FLAG_DEALER_FRAUD`
- `FLAG_SUPPLY_CHAIN_FRAUD`
- `CLEAR_CLAIM`
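The seven action names split cleanly into investigation and terminal groups. A minimal sketch, assuming actions are passed around as plain strings; the `Action` enum, the two sets, and the `parse_action` helper for normalising raw model output are hypothetical and not taken from the repository.

```python
from enum import Enum


class Action(str, Enum):
    """Action names from this README; the enum itself is a hypothetical wrapper."""
    REQUEST_BENEFICIARY_AUDIT = "REQUEST_BENEFICIARY_AUDIT"
    REQUEST_DEALER_LEDGER = "REQUEST_DEALER_LEDGER"
    REQUEST_TRANSPORT_LOG = "REQUEST_TRANSPORT_LOG"
    FLAG_BENEFICIARY_FRAUD = "FLAG_BENEFICIARY_FRAUD"
    FLAG_DEALER_FRAUD = "FLAG_DEALER_FRAUD"
    FLAG_SUPPLY_CHAIN_FRAUD = "FLAG_SUPPLY_CHAIN_FRAUD"
    CLEAR_CLAIM = "CLEAR_CLAIM"


INVESTIGATIONS = frozenset({
    Action.REQUEST_BENEFICIARY_AUDIT,
    Action.REQUEST_DEALER_LEDGER,
    Action.REQUEST_TRANSPORT_LOG,
})
# Everything that is not an investigation ends the episode.
FINAL_DECISIONS = frozenset(a for a in Action if a not in INVESTIGATIONS)


def is_terminal(action: Action) -> bool:
    return action in FINAL_DECISIONS


def parse_action(text: str) -> Action:
    """Normalise raw text (e.g. LLM output); raises ValueError for unknown names."""
    return Action(text.strip().upper())
```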
- Suspicion begins at a deterministic baseline.
- Investigation actions reveal new indicators and increase suspicion.
- Episode ends on final decision or max-step timeout.
- No randomness is used anywhere.
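The dynamics above can be expressed as a pure function of the current score and the checks already revealed. This sketch assumes a fixed per-check increment, no gain for repeated or unknown actions, and clamping at 1.0; the baseline `0.2` and the increment values are invented, not the constants used in `env/ration_env.py`.

```python
from typing import Dict, FrozenSet, Tuple

BASELINE_SUSPICION = 0.2  # hypothetical deterministic baseline
SUSPICION_GAIN: Dict[str, float] = {  # hypothetical per-investigation increments
    "REQUEST_BENEFICIARY_AUDIT": 0.3,
    "REQUEST_DEALER_LEDGER": 0.25,
    "REQUEST_TRANSPORT_LOG": 0.25,
}


def update_suspicion(score: float, revealed: FrozenSet[str],
                     action: str) -> Tuple[float, FrozenSet[str]]:
    """Return the new score and revealed set; repeats and non-investigations add nothing."""
    if action not in SUSPICION_GAIN or action in revealed:
        return score, revealed
    # Round to keep float arithmetic stable, then clamp to the documented [0, 1] range.
    new_score = min(1.0, round(score + SUSPICION_GAIN[action], 4))
    return new_score, revealed | {action}


def is_done(action: str, step_number: int, max_steps: int) -> bool:
    """Episode ends on a final decision or on max-step timeout."""
    return action.startswith(("FLAG_", "CLEAR_")) or step_number >= max_steps
```

Since no randomness is involved, the same action sequence always produces the same sequence of scores.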
Reward shaping is deterministic and interpretable:
- Partial rewards for useful investigations
- Penalties for repeated/irrelevant/invalid actions
- Final-decision reward for correct fraud classification
- Evidence bonus based on required investigation coverage
- Early bonus for correct early final decision
Every step returns a reward_breakdown in info:
- `base`
- `evidence_bonus`
- `penalty`
- `early_bonus`
- `final_reward`
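One plausible way the breakdown fields could combine is an additive total clamped into [0, 1], with the evidence bonus proportional to coverage of the required investigations. This is a sketch under those assumptions only: the additive formula, the `weight=0.2` default, and the example numbers are hypothetical and are not read from the environment's source.

```python
from typing import AbstractSet, Dict


def evidence_bonus(revealed: AbstractSet[str], required: AbstractSet[str],
                   weight: float = 0.2) -> float:
    """Scale `weight` by the fraction of required investigations actually run."""
    if not required:
        return weight  # nothing required: treat coverage as complete
    coverage = len(revealed & required) / len(required)
    return round(weight * coverage, 4)


def reward_breakdown(base: float, evidence: float, penalty: float,
                     early: float) -> Dict[str, float]:
    """Assumed additive combination, clamped into [0, 1]."""
    raw = base + evidence + early - penalty
    final = round(min(1.0, max(0.0, raw)), 4)
    return {"base": base, "evidence_bonus": evidence, "penalty": penalty,
            "early_bonus": early, "final_reward": final}


required = {"REQUEST_DEALER_LEDGER", "REQUEST_TRANSPORT_LOG"}
bonus = evidence_bonus({"REQUEST_DEALER_LEDGER"}, required)  # half coverage -> 0.1
print(reward_breakdown(base=0.7, evidence=bonus, penalty=0.0, early=0.1)["final_reward"])  # 0.9
```

Keeping every component in the returned dict, rather than only the total, is what makes the reward interpretable: a grader or human can see which term drove the result.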
All tasks are deterministic and stored in env/tasks.py.
- Easy (beneficiary fraud): obvious over-claim mismatch; minimal evidence needed.
- Medium (dealer fraud): subtle dealer-level mismatch; requires combining ledger and transport signals.
- Hard (supply chain fraud): multi-variable anomaly with a hidden pattern; requires broader evidence synthesis.
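A fixed task bank can be made immutable so that no episode can alter it. A minimal sketch, assuming one task per level: the `Task` dataclass, the task IDs, every numeric value, and the `required_checks` sets are hypothetical placeholders chosen to mirror the difficulty descriptions above, not the contents of `env/tasks.py`.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import FrozenSet


@dataclass(frozen=True)
class Task:
    """Hypothetical immutable task record."""
    task_id: str
    level: str
    label: str  # the correct final action
    claimed_quantity: float
    expected_quota: float
    dealer_stock: float
    transported_stock: float
    required_checks: FrozenSet[str]


# Read-only mapping: the bank cannot be mutated at runtime. All numbers are invented.
TASKS = MappingProxyType({
    "easy": Task("easy_beneficiary", "easy", "FLAG_BENEFICIARY_FRAUD",
                 60.0, 35.0, 500.0, 500.0,
                 frozenset({"REQUEST_BENEFICIARY_AUDIT"})),
    "medium": Task("medium_dealer", "medium", "FLAG_DEALER_FRAUD",
                   35.0, 35.0, 420.0, 500.0,
                   frozenset({"REQUEST_DEALER_LEDGER", "REQUEST_TRANSPORT_LOG"})),
    "hard": Task("hard_supply_chain", "hard", "FLAG_SUPPLY_CHAIN_FRAUD",
                 35.0, 35.0, 380.0, 460.0,
                 frozenset({"REQUEST_BENEFICIARY_AUDIT", "REQUEST_DEALER_LEDGER",
                            "REQUEST_TRANSPORT_LOG"})),
})


def get_task(level: str) -> Task:
    """Look up a task by difficulty level, failing loudly on unknown levels."""
    try:
        return TASKS[level]
    except KeyError:
        raise ValueError(f"unknown level {level!r}; expected one of {sorted(TASKS)}") from None
```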
Grading is fully rule-based in env/grader.py:
- Deterministic action-to-score mapping
- Partial correctness scoring
- No LLM-based grading ambiguity
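A rule-based grader with partial correctness can be a small lookup of cases. This sketch assumes three outcomes: exact match scores 1.0, flagging fraud with the wrong category earns partial credit, and everything else (a timeout, clearing a fraudulent claim, or flagging a legitimate one) scores 0.0. The `grade` function and the `0.3` partial-credit value are hypothetical, not the rules in `env/grader.py`.

```python
from typing import Optional

FRAUD_FLAGS = frozenset({
    "FLAG_BENEFICIARY_FRAUD",
    "FLAG_DEALER_FRAUD",
    "FLAG_SUPPLY_CHAIN_FRAUD",
})


def grade(final_action: Optional[str], label: str, partial_credit: float = 0.3) -> float:
    """Map (final action, ground-truth label) to a score in [0, 1] with fixed rules."""
    if final_action is None:
        return 0.0  # episode timed out without a decision
    if final_action == label:
        return 1.0  # exact classification
    if final_action in FRAUD_FLAGS and label in FRAUD_FLAGS:
        return partial_credit  # fraud detected, but wrong category
    return 0.0  # missed fraud, or flagged a legitimate claim
```

Because the score depends only on two strings, identical decisions always receive identical grades, which is the property an LLM-based judge cannot guarantee.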
Project layout:

- `env/`
  - `__init__.py`
  - `models.py`
  - `tasks.py`
  - `grader.py`
  - `ration_env.py`
- `inference.py`
- `openenv.yaml`
- `Dockerfile`
- `requirements.txt`
- `README.md`
Install dependencies and run the baseline:

- `python -m pip install -r requirements.txt`
- `python inference.py`

The inference script:
- uses the `OpenAI` client
- prints strict `[START]`, `[STEP]`, `[END]` log lines
- follows a deterministic policy for a reproducible baseline
Required or expected environment variables for submission runners:
- `API_BASE_URL`: OpenAI-compatible endpoint (default: HF router)
- `MODEL_NAME`: model ID used for completion calls
- `HF_TOKEN`: token used as the API key for routed calls
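A small loader can validate these variables before any API call is made. A sketch under assumptions: the default URL `https://router.huggingface.co/v1` is assumed to be the HF router endpoint and should be checked against `inference.py`; treating `MODEL_NAME` and `HF_TOKEN` as mandatory is also an assumption; `RunnerConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_API_BASE_URL = "https://router.huggingface.co/v1"  # assumed HF router URL


@dataclass(frozen=True)
class RunnerConfig:
    api_base_url: str
    model_name: str
    hf_token: str


def load_config(environ: Optional[Mapping[str, str]] = None) -> RunnerConfig:
    """Read runner settings, falling back to the default endpoint if unset."""
    env = os.environ if environ is None else environ
    missing = [key for key in ("MODEL_NAME", "HF_TOKEN") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return RunnerConfig(
        api_base_url=env.get("API_BASE_URL") or DEFAULT_API_BASE_URL,
        model_name=env["MODEL_NAME"],
        hf_token=env["HF_TOKEN"],  # used as the API key for routed calls
    )
```

The resulting values map directly onto the OpenAI Python client's constructor, e.g. `OpenAI(base_url=cfg.api_base_url, api_key=cfg.hf_token)`. Accepting an explicit mapping keeps the loader testable without touching the real process environment.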
The script uses the OpenAI Python client and emits strict single-line stdout records in this exact order for each task:
- `[START] task=<task> env=<benchmark> model=<model>` (once, at the start)
- `[STEP] step=<n> action=<action> reward=<0.00> done=<true|false> error=<msg|null>` (one line per environment step)
- `[END] success=<true|false> steps=<n> score=<0.00-1.00> rewards=<r1,r2,...>` (once, at the end)
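Formatting helpers make it harder to drift from this log format. A minimal sketch: the field order and the `true`/`false`/`null` spellings follow the templates above, while formatting each entry of `rewards` with two decimals, deriving `steps` from the number of rewards, and collapsing whitespace in error messages are assumptions. The helper names are hypothetical.

```python
from typing import Optional, Sequence


def _flag(value: bool) -> str:
    return "true" if value else "false"


def start_line(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def step_line(step: int, action: str, reward: float, done: bool,
              error: Optional[str] = None) -> str:
    # Collapse newlines/tabs so each record stays on a single line.
    err = "null" if error is None else " ".join(error.split())
    return (f"[STEP] step={step} action={action} reward={reward:.2f} "
            f"done={_flag(done)} error={err}")


def end_line(success: bool, rewards: Sequence[float], score: float) -> str:
    score = min(1.0, max(0.0, score))  # score is documented as 0.00-1.00
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return (f"[END] success={_flag(success)} steps={len(rewards)} "
            f"score={score:.2f} rewards={joined}")


print(step_line(1, "REQUEST_BENEFICIARY_AUDIT", 0.1, False))
# [STEP] step=1 action=REQUEST_BENEFICIARY_AUDIT reward=0.10 done=false error=null
```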
Build and run with Docker:

- `docker build -t rationguard-env .`
- `docker run --rm rationguard-env`

Notes:
- Uses `python:3.11-slim` (multi-arch compatible, including ARM64 / Apple Silicon)
- Minimal dependencies for fast build and startup
- No use of the `random` module
- Fixed task bank
- Fixed rule-based transitions and grader
- Same action sequence always gives same trajectory/reward
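The last guarantee is directly testable: replay one action sequence against fresh environments and compare the recorded trajectories. A sketch under assumptions: `record_trajectory` and `is_reproducible` are hypothetical helpers, and `CounterEnv` / `LeakyEnv` are invented toy environments (the second deliberately leaks class-level state) used only to show that the check detects drift; neither is part of this repository.

```python
import copy
from typing import Any, Callable, Dict, List, Sequence, Tuple

Step = Tuple[Any, float, bool]


def record_trajectory(make_env: Callable[[], Any], level: str,
                      actions: Sequence[str]) -> List[Step]:
    """Replay `actions` on a fresh env, stopping early once done is reported."""
    env = make_env()
    env.reset(level)
    trajectory: List[Step] = []
    for action in actions:
        obs, reward, done, _info = env.step(action)
        trajectory.append((copy.deepcopy(obs), reward, done))
        if done:
            break
    return trajectory


def is_reproducible(make_env: Callable[[], Any], level: str,
                    actions: Sequence[str], runs: int = 3) -> bool:
    first = record_trajectory(make_env, level, actions)
    return all(record_trajectory(make_env, level, actions) == first
               for _ in range(runs - 1))


class CounterEnv:
    """Hypothetical deterministic toy env."""

    def __init__(self) -> None:
        self.n = 0

    def reset(self, level: str) -> Dict[str, Any]:
        self.n = 0
        return {"step_number": 0}

    def step(self, action: str):
        self.n += 1
        obs = {"step_number": self.n, "last_action": action}
        return obs, 0.1 * self.n, action.startswith("FLAG_"), {}


class LeakyEnv(CounterEnv):
    """Hypothetical broken env: class-level state survives across instances."""
    calls = 0

    def reset(self, level: str) -> Dict[str, Any]:
        LeakyEnv.calls += 1
        return super().reset(level)

    def step(self, action: str):
        obs, reward, done, info = super().step(action)
        return obs, reward + LeakyEnv.calls, done, info
```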
- ✅ OpenEnv interface methods present (`reset`, `step`, `state`)
- ✅ 3 tasks with increasing difficulty
- ✅ Rewards normalized to [0, 1]
- ✅ Deterministic and reproducible
- ✅ Structured observations for LLM evaluator
- ✅ Dockerized baseline runner