RationGuardEnv

title	RationGuardEnv
emoji	🛡️
colorFrom	blue
colorTo	green
sdk	docker
sdk_version	0.0.1
app_file	app.py
pinned	false

RationGuardEnv is a deterministic OpenEnv-compatible simulation for fraud detection in India’s Public Distribution System (PDS), a system where leakage has been estimated at ~₹40,000 crore in some analyses.
The environment is designed for multi-step AI-agent reasoning: the agent must request evidence, update suspicion, and then make a final fraud decision.

Why this environment matters

PDS fraud is a real governance problem with multiple failure points:

Beneficiary fraud: inflated claims versus entitlement
Dealer fraud: ledger-level mismatch between recorded and available stock
Supply chain fraud: divergence between transported and distributed stock

RationGuardEnv converts this into a clean, deterministic benchmark suitable for reliable agent evaluation.

OpenEnv compliance

The environment implements the required interface:

reset(level) -> observation
step(action) -> (observation, reward, done, info)
state() -> current_state

Implementation file: env/ration_env.py
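
As a rough illustration of how a caller might drive this interface, the sketch below declares the three methods as a structural type and runs a fixed action list through it. The names OpenEnvLike and run_episode are hypothetical, and the assumption that level is a string and that observations are dicts is not confirmed by env/ration_env.py.

```python
from typing import Any, Protocol


class OpenEnvLike(Protocol):
    """Structural type for the three methods listed above (hypothetical name)."""

    def reset(self, level: str) -> dict[str, Any]: ...

    def step(self, action: str) -> tuple[dict[str, Any], float, bool, dict[str, Any]]: ...

    def state(self) -> dict[str, Any]: ...


def run_episode(env: OpenEnvLike, level: str, actions: list[str]) -> list[float]:
    """Feed a fixed action list to the env and collect per-step rewards."""
    env.reset(level)
    rewards: list[float] = []
    for action in actions:
        _obs, reward, done, _info = env.step(action)
        rewards.append(reward)
        if done:
            # Stop as soon as the environment reports the episode is over.
            break
    return rewards
```

Any object exposing these three methods can be passed to run_episode, which keeps the driver independent of the concrete environment class.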

Environment design

Observation space (structured JSON-like dict)

Each step returns fields including:

task_level, task_id
step_number, max_steps
claimed_quantity, expected_quota
dealer_stock, transported_stock
suspicion_score (0.0 to 1.0)
revealed_checks (which investigations have been run)
revealed_indicators (numeric evidence values)
allowed_actions
last_action
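
To make the shape of an observation concrete, here is a minimal sketch of the fields above as a typed dict. The field names come from the list; the types, the Observation class name, and every value in the example are assumptions for illustration, not data taken from env/tasks.py.

```python
from typing import Optional, TypedDict


class Observation(TypedDict):
    """Assumed typing for the observation fields listed above."""

    task_level: str
    task_id: str
    step_number: int
    max_steps: int
    claimed_quantity: float
    expected_quota: float
    dealer_stock: float
    transported_stock: float
    suspicion_score: float  # documented range: 0.0 to 1.0
    revealed_checks: list[str]
    revealed_indicators: dict[str, float]
    allowed_actions: list[str]
    last_action: Optional[str]


# Made-up values, shown only to illustrate the structure.
example: Observation = {
    "task_level": "easy",
    "task_id": "example-task",
    "step_number": 1,
    "max_steps": 5,
    "claimed_quantity": 50.0,
    "expected_quota": 35.0,
    "dealer_stock": 400.0,
    "transported_stock": 400.0,
    "suspicion_score": 0.3,
    "revealed_checks": ["REQUEST_BENEFICIARY_AUDIT"],
    "revealed_indicators": {"claim_to_quota_ratio": 50.0 / 35.0},
    "allowed_actions": ["REQUEST_DEALER_LEDGER", "FLAG_BENEFICIARY_FRAUD", "CLEAR_CLAIM"],
    "last_action": "REQUEST_BENEFICIARY_AUDIT",
}
```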

Action space

Investigation actions:

REQUEST_BENEFICIARY_AUDIT
REQUEST_DEALER_LEDGER
REQUEST_TRANSPORT_LOG

Final decision actions:

FLAG_BENEFICIARY_FRAUD
FLAG_DEALER_FRAUD
FLAG_SUPPLY_CHAIN_FRAUD
CLEAR_CLAIM
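
One way to represent this action space in code is a string-valued enum split into the two groups above. The Action class, the two set names, and the is_final helper are hypothetical; env/models.py may model actions differently.

```python
from enum import Enum


class Action(str, Enum):
    """The seven actions listed above, as string-valued enum members."""

    REQUEST_BENEFICIARY_AUDIT = "REQUEST_BENEFICIARY_AUDIT"
    REQUEST_DEALER_LEDGER = "REQUEST_DEALER_LEDGER"
    REQUEST_TRANSPORT_LOG = "REQUEST_TRANSPORT_LOG"
    FLAG_BENEFICIARY_FRAUD = "FLAG_BENEFICIARY_FRAUD"
    FLAG_DEALER_FRAUD = "FLAG_DEALER_FRAUD"
    FLAG_SUPPLY_CHAIN_FRAUD = "FLAG_SUPPLY_CHAIN_FRAUD"
    CLEAR_CLAIM = "CLEAR_CLAIM"


INVESTIGATION_ACTIONS = frozenset({
    Action.REQUEST_BENEFICIARY_AUDIT,
    Action.REQUEST_DEALER_LEDGER,
    Action.REQUEST_TRANSPORT_LOG,
})

# Everything that is not an investigation is a final decision.
FINAL_ACTIONS = frozenset(Action) - INVESTIGATION_ACTIONS


def is_final(action: Action) -> bool:
    """Return True when the action ends the episode with a decision."""
    return action in FINAL_ACTIONS
```

Because the enum subclasses str, a raw string produced by a model can be validated with Action("CLEAR_CLAIM"), which raises ValueError for unknown names.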

Multi-step transitions

Suspicion begins at a deterministic baseline.
Investigation actions reveal new indicators and increase suspicion.
Episode ends on final decision or max-step timeout.
No randomness is used anywhere.
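
The transition rules above can be sketched as a small pure-Python state update. This is an illustration under stated assumptions, not the logic in env/ration_env.py: the baseline of 0.1, the increment of 0.2, the max_steps default of 5, and the choice to ignore repeated investigations when updating suspicion are all hypothetical.

```python
from dataclasses import dataclass, field

INVESTIGATIONS = {
    "REQUEST_BENEFICIARY_AUDIT",
    "REQUEST_DEALER_LEDGER",
    "REQUEST_TRANSPORT_LOG",
}
FINAL_DECISIONS = {
    "FLAG_BENEFICIARY_FRAUD",
    "FLAG_DEALER_FRAUD",
    "FLAG_SUPPLY_CHAIN_FRAUD",
    "CLEAR_CLAIM",
}


@dataclass
class EpisodeState:
    suspicion: float = 0.1  # hypothetical deterministic baseline
    step_number: int = 0
    max_steps: int = 5  # hypothetical step budget
    revealed_checks: list[str] = field(default_factory=list)
    done: bool = False


def transition(state: EpisodeState, action: str) -> EpisodeState:
    """Apply one action in place; no randomness is involved."""
    if state.done:
        raise ValueError("episode already finished")
    state.step_number += 1
    if action in INVESTIGATIONS and action not in state.revealed_checks:
        # A new investigation reveals evidence and raises suspicion (assumed +0.2).
        state.revealed_checks.append(action)
        state.suspicion = min(1.0, round(state.suspicion + 0.2, 2))
    # The episode ends on a final decision or when the step budget is used up.
    state.done = action in FINAL_DECISIONS or state.step_number >= state.max_steps
    return state
```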

Reward logic (all rewards in [0, 1])

Reward shaping is deterministic and interpretable:

Partial rewards for useful investigations
Penalties for repeated/irrelevant/invalid actions
Final-decision reward for correct fraud classification
Evidence bonus based on required investigation coverage
Early bonus for correct early final decision

Every step returns a reward_breakdown in info:

base
evidence_bonus
penalty
early_bonus
final_reward
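
A minimal sketch of how such a breakdown could be assembled is shown below. The combination rule (bonuses added to base, penalty subtracted, result clamped to [0, 1]) is an assumption made for illustration; the actual formula lives in the environment and grader code, and combine_reward is a hypothetical name.

```python
def combine_reward(
    base: float,
    evidence_bonus: float,
    penalty: float,
    early_bonus: float,
) -> dict[str, float]:
    """Build a reward_breakdown dict with final_reward clamped to [0, 1]."""
    # Assumed rule: add the bonuses, subtract the penalty, then clamp.
    raw = base + evidence_bonus + early_bonus - penalty
    final_reward = max(0.0, min(1.0, raw))
    return {
        "base": base,
        "evidence_bonus": evidence_bonus,
        "penalty": penalty,
        "early_bonus": early_bonus,
        "final_reward": final_reward,
    }
```

Clamping at the end keeps every step reward inside the documented [0, 1] range even when a penalty exceeds the base reward or the bonuses overshoot.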

Task levels (easy → medium → hard)

All tasks are deterministic and stored in env/tasks.py.

Easy (beneficiary fraud)
Obvious over-claim mismatch; minimal evidence needed.
Medium (dealer fraud)
Subtle dealer-level mismatch; requires combining ledger + transport signals.
Hard (supply chain fraud)
Multi-variable anomaly with hidden pattern; requires broader evidence synthesis.

Grading quality

Grading is fully rule-based in env/grader.py:

Deterministic action-to-score mapping
Partial correctness scoring
No LLM-based grading ambiguity

Project structure

env/
  __init__.py
  models.py
  tasks.py
  grader.py
  ration_env.py
inference.py
openenv.yaml
Dockerfile
requirements.txt
README.md

Run locally

python -m pip install -r requirements.txt
python inference.py

Inference script requirements satisfied:

uses OpenAI client
prints strict [START], [STEP], [END] log lines
deterministic policy for reproducible baseline

Required/expected env vars for submission runners:

API_BASE_URL — OpenAI-compatible endpoint (default: HF router)
MODEL_NAME — model id used for completion calls
HF_TOKEN — token used as API key for routed calls

For each task, the script emits strict single-line stdout records in this exact order:

[START] task=<task> env=<benchmark> model=<model>
[STEP] step=<n> action=<action> reward=<0.00> done=<true|false> error=<msg|null>
[END] success=<true|false> steps=<n> score=<0.00-1.00> rewards=<r1,r2,...>
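
The three record types above could be produced by small formatting helpers like the ones below. The function names are hypothetical and this is not the code in inference.py; rendering each entry of the rewards list with two decimals is an assumption based on the reward=<0.00> pattern.

```python
from typing import Optional


def format_start(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def format_step(
    step: int,
    action: str,
    reward: float,
    done: bool,
    error: Optional[str] = None,
) -> str:
    # Booleans are lowercased and a missing error is printed as the literal "null".
    error_text = error if error is not None else "null"
    return (
        f"[STEP] step={step} action={action} reward={reward:.2f} "
        f"done={str(done).lower()} error={error_text}"
    )


def format_end(success: bool, steps: int, score: float, rewards: list[float]) -> str:
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return (
        f"[END] success={str(success).lower()} steps={steps} "
        f"score={score:.2f} rewards={joined}"
    )
```

Keeping each record on one line with fixed key order makes the output easy for a submission runner to parse with a simple regular expression.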

Run with Docker (Mac M1 + HF Spaces friendly)

docker build -t rationguard-env .
docker run --rm rationguard-env

Notes:

Uses python:3.11-slim (multi-arch compatible, including ARM64 / Apple Silicon)
Minimal dependencies for fast build/startup

Determinism guarantee

No random module usage
Fixed task bank
Fixed rule-based transitions and grader
Same action sequence always gives same trajectory/reward
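
The last point can be checked mechanically by replaying one action sequence several times and comparing the trajectories. The sketch below assumes only the reset/step interface described earlier; rollout, is_deterministic, and the make_env factory argument are hypothetical names, and observations are assumed to support equality comparison.

```python
from typing import Any, Callable


def rollout(
    make_env: Callable[[], Any],
    level: str,
    actions: list[str],
) -> list[tuple[Any, float, bool]]:
    """Run one episode on a fresh env and record (observation, reward, done)."""
    env = make_env()
    env.reset(level)
    trajectory = []
    for action in actions:
        obs, reward, done, _info = env.step(action)
        trajectory.append((obs, reward, done))
        if done:
            break
    return trajectory


def is_deterministic(
    make_env: Callable[[], Any],
    level: str,
    actions: list[str],
    runs: int = 3,
) -> bool:
    """Return True when repeated rollouts produce identical trajectories."""
    first = rollout(make_env, level, actions)
    return all(rollout(make_env, level, actions) == first for _ in range(runs - 1))
```

Creating a fresh environment per rollout means the check also catches state that leaks between instances, not only randomness inside a single episode.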

Submission readiness checklist

✅ OpenEnv interface methods present (reset, step, state)
✅ 3 tasks with increasing difficulty
✅ Rewards normalized to [0, 1]
✅ Deterministic and reproducible
✅ Structured observations for LLM evaluator
✅ Dockerized baseline runner
