[WIP] Update README.md to reflect actual grader weights and system logic by Copilot · Pull Request #1 · Praptii21/disaster-triage-env

Copilot · 2026-04-13T19:34:03Z

Explore codebase: reviewed app/graders.py, app/models.py, inference.py, and README.md
Update Grader Formula weights from (0.35, 0.40, 0.25) → (0.50, 0.30, 0.20)
Update Grader Components table percentages
Convert Action Space bullet points to structured table
Add Grader System Breakdown subsection (scoring logic, MAE^0.7, score bounding, random baseline)
Expand Baseline Agent section (three-phase execution, priority rules, target step budgets)
Fix step budget footnote typo: "30 / 40 / 70" → "30 / 50 / 70"
Fix step budget note in Design Principles section

Original prompt

Update the README.md to reflect the actual grader weights and system logic based on the code in app/graders.py and inference.py.

Current Issues:

README states grader weights as (0.35 × Prioritization) + (0.40 × Efficiency) + (0.25 × Utilization)
Actual code in app/graders.py uses: W_PRIORITIZATION=0.50, W_EFFICIENCY=0.30, W_UTILIZATION=0.20
README describes simplified 3-axis scoring, but actual implementation is more nuanced
Action Space section should be converted to a detailed table format
Step-level rewards and grading mechanics need alignment with actual inference.py behavior
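The weight change described above can be sketched as a plain weighted sum. This is a minimal illustration, not the code in app/graders.py: the constant names come from the issue text, while `combine_scores` and the sample component scores are hypothetical, and each component is assumed to lie in [0, 1].

```python
# Weights quoted in the issue text for app/graders.py.
W_PRIORITIZATION = 0.50
W_EFFICIENCY = 0.30
W_UTILIZATION = 0.20


def combine_scores(prioritization: float, efficiency: float, utilization: float) -> float:
    """Hypothetical helper: weighted sum of three component scores in [0, 1]."""
    return (
        W_PRIORITIZATION * prioritization
        + W_EFFICIENCY * efficiency
        + W_UTILIZATION * utilization
    )


# Same made-up component scores under the README's old weights and the new ones.
old_score = 0.35 * 0.8 + 0.40 * 0.5 + 0.25 * 0.6
new_score = combine_scores(0.8, 0.5, 0.6)
print(f"old={old_score:.2f} new={new_score:.2f}")
```

Because prioritization carries more weight in the code than in the README, an agent that prioritizes well but allocates less efficiently scores higher than the README formula would suggest.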

Changes Required:

Update Grader Formula Section (line 181-187):
- Change weights from (0.35, 0.40, 0.25) to actual (0.50, 0.30, 0.20)
- Update "Grader Components" table to match actual weights
Add Comprehensive Action Space Table (after line 132):
- Convert bullet points to structured table
- Include: Action Type | Parameters | Description | Trade-off | Conditions | Step Reward
- Add difficulty-specific strategy subsection
Add Grader System Breakdown (new section after Reward System):
- Prioritization scoring logic (2^severity, 40% threshold)
- Efficiency scoring formula (MAE^0.7)
- Utilization scoring formula
- Score bounding (0.0001 to 0.9999)
- Random baseline (<0.15)
Add Baseline Agent Behavior Section (expand current section):
- Three-phase execution: Info Gathering → Allocation → Finalization
- Priority rules: medicine > water > food
- Target step budgets: Easy 8, Medium 10, Hard 12
Fix Tasks & Difficulty Levels (line 118-126):
- Update step budgets from (7/10/13) to actual configured values (30/50/70)
- Or clarify that (7/10/13) are "demo" budgets with footnote
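A few of the mechanics listed above can be sketched in isolation. All function names below are hypothetical, and the prompt does not say how the grader combines these pieces (for example, how the 40% threshold is applied or how MAE^0.7 becomes an efficiency score), so this only illustrates the individual quantities it names.

```python
SCORE_MIN = 0.0001
SCORE_MAX = 0.9999

# Baseline priority rule from the prompt: medicine > water > food.
RESOURCE_PRIORITY = {"medicine": 0, "water": 1, "food": 2}


def bound_score(raw: float) -> float:
    """Clamp a raw score into the stated range [0.0001, 0.9999]."""
    return min(SCORE_MAX, max(SCORE_MIN, raw))


def severity_weight(severity: int) -> int:
    """Exponential severity weight, 2^severity."""
    return 2 ** severity


def shaped_error(mae: float) -> float:
    """MAE raised to the power 0.7; how the grader uses it is not stated."""
    return mae ** 0.7


def order_by_priority(resources: list[str]) -> list[str]:
    """Sort resource names so higher-priority resources come first."""
    return sorted(resources, key=lambda name: RESOURCE_PRIORITY[name])


print(bound_score(1.2), severity_weight(3), order_by_priority(["food", "medicine", "water"]))
```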

Keep all original content, structure, and ensure backward compatibility with existing documentation.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: can u convert action space to a table

Assistant: I'll convert the Action Space section to a table format for you.

Assistant: Here's the Action Space converted into a table format:

| Action | Parameters | Description | Trade-off | Impact |
| --- | --- | --- | --- | --- |
| `request_info` | `zone_id` | Reveals true severity and demand for a zone | Costs a step but reduces uncertainty | +0.02 reward |
| `allocate_resource` | `zone_id`, `resource_type`, `amount` | Allocates a specific quantity of a resource to a zone | Irreversible and bounded by global supply | +0.05 reward |
| `finalize` | None | Terminates the episode and triggers final evaluation | Premature finalization reduces achievable score | -0.005 per step penalty |

Would you like me to update the README.md with this table format and commit the changes to your repository?
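The three actions in the table can be sketched as validated payloads. The action names and parameter names come from the table; the `build_action` helper and the dictionary shape (an `"action"` key plus parameters) are assumptions for illustration and may not match the environment's actual request schema.

```python
from typing import Any, Dict

# Parameters each action takes, as listed in the table.
REQUIRED_PARAMS = {
    "request_info": ("zone_id",),
    "allocate_resource": ("zone_id", "resource_type", "amount"),
    "finalize": (),
}


def build_action(action: str, **params: Any) -> Dict[str, Any]:
    """Hypothetical helper: check parameter names, then return a payload dict."""
    if action not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action: {action}")
    expected = set(REQUIRED_PARAMS[action])
    if set(params) != expected:
        raise ValueError(f"{action} expects parameters {sorted(expected)}")
    return {"action": action, **params}


print(build_action("request_info", zone_id=2))
print(build_action("allocate_resource", zone_id=2, resource_type="medicine", amount=5))
print(build_action("finalize"))
```

Validating parameter names up front keeps a malformed action from costing the agent a step.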

User: here's the inference """
inference.py

Hackathon inference script for DisasterTriageEnv.
Connects an LLM agent (via OpenAI-compatible API) to the running
DisasterTriage FastAPI server and runs one full episode.
Logging output (stdout only):
[START] task=<task_name> env=disaster-triage-env model=<model_name>
[STEP] step= action=<action_str> reward=<0.00> done=<true|false> error=
[END] success=<true|false> steps= rewards=<r1,r2...rn>
Usage:
python inference.py [--task easy|medium|hard] [--base-url http://127.0.0.1:8000]
Environment variables (loaded from .env):
API_BASE_URL — LLM API base URL (default: https://api.openai.com/v1)
MODEL_NAME — model identifier (default: gpt-4.1-mini)
HF_TOKEN — API key / HF token (REQUIRED — raises error if missing)
"""

from __future__ import annotations

import argparse
import json
import os
import sys
import time
from typing import Any, Dict, List, Optional

import numpy as np
import requests
from dotenv import load_dotenv
from openai import OpenAI

# ---------------------------------------------------------------------------
# Load .env
# ---------------------------------------------------------------------------

load_dotenv()

# ---------------------------------------------------------------------------
# Environment variables
# ---------------------------------------------------------------------------

# LLM Proxy (must be provided by the environment, e.g. LiteLLM)
LLM_API_BASE: str = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
MODEL_NAME: str = os.getenv("MODEL_NAME", "gpt-4.1-mini")
HF_TOKEN: str | None = os.getenv("HF_TOKEN")

if not HF_TOKEN:
    print("[WARN] HF_TOKEN not found. Simulation will fail until a secret is provided.")
    # We define a dummy client to avoid crash during import
    client = None
else:
    # ---------------------------------------------------------------------------
    # OpenAI client (OpenAI-compatible, pointed at API_BASE_URL)
    # ---------------------------------------------------------------------------
    client = OpenAI(
        base_url=LLM_API...
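The stdout log format named in the script's docstring can be sketched with two small formatters. The function names are hypothetical, and because the docstring placeholders for `step=`, `steps=`, and `error=` are blank in the pasted text, the values used here (an integer step count, `null` for no error, rewards with two decimals) are assumptions rather than the script's confirmed behavior.

```python
from typing import List, Optional


def format_step(step: int, action: str, reward: float, done: bool,
                error: Optional[str] = None) -> str:
    """Hypothetical formatter for one [STEP] line."""
    error_text = error if error is not None else "null"  # assumed placeholder
    return (
        f"[STEP] step={step} action={action} reward={reward:.2f} "
        f"done={str(done).lower()} error={error_text}"
    )


def format_end(success: bool, steps: int, rewards: List[float]) -> str:
    """Hypothetical formatter for the closing [END] line."""
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return f"[END] success={str(success).lower()} steps={steps} rewards={joined}"


print(format_step(1, "request_info", 0.02, False))
print(format_end(True, 2, [0.02, 0.05]))
```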

This pull request was created from Copilot chat.

Initial plan

ffab8b7

Copilot AI assigned Copilot and Praptii21 Apr 13, 2026

Copilot started work on behalf of Praptii21 April 13, 2026 19:34 View session

Praptii21 closed this Apr 13, 2026

Copilot stopped work on behalf of Praptii21 due to an error April 13, 2026 19:37
The session was cancelled by the user.

Copilot AI requested a review from Praptii21 April 13, 2026 19:37

Copilot wants to merge 1 commit into main from copilot/update-readme-grader-weights

Copilot AI commented Apr 13, 2026 • edited by Praptii21

2 participants
