[WIP] Update README.md to reflect actual grader weights and system logic#1

Closed
Copilot wants to merge 1 commit into main from copilot/update-readme-grader-weights

Copilot AI commented Apr 13, 2026

  • Explore codebase: reviewed app/graders.py, app/models.py, inference.py, and README.md
  • Update Grader Formula weights from (0.35, 0.40, 0.25) → (0.50, 0.30, 0.20)
  • Update Grader Components table percentages
  • Convert Action Space bullet points to structured table
  • Add Grader System Breakdown subsection (scoring logic, MAE^0.7, score bounding, random baseline)
  • Expand Baseline Agent section (three-phase execution, priority rules, target step budgets)
  • Fix step budget footnote typo: "30 / 40 / 70" → "30 / 50 / 70"
  • Fix step budget note in Design Principles section
Original prompt

Update the README.md to reflect the actual grader weights and system logic based on the code in app/graders.py and inference.py.

Current Issues:

  1. README states grader weights as (0.35 × Prioritization) + (0.40 × Efficiency) + (0.25 × Utilization)
  2. Actual code in app/graders.py uses: W_PRIORITIZATION=0.50, W_EFFICIENCY=0.30, W_UTILIZATION=0.20
  3. README describes simplified 3-axis scoring, but actual implementation is more nuanced
  4. Action Space section should be converted to a detailed table format
  5. Step-level rewards and grading mechanics need alignment with actual inference.py behavior
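
To make the mismatch in issues 1 and 2 concrete, here is a minimal sketch comparing the two weightings. The constant names and values are the ones quoted above from app/graders.py; the helper functions `combine_scores` and `readme_score`, and the sample sub-scores, are hypothetical and only illustrate a plain weighted sum, which may differ from the real grader.

```python
# Weights as quoted above from app/graders.py.
W_PRIORITIZATION = 0.50
W_EFFICIENCY = 0.30
W_UTILIZATION = 0.20

# Weights the README currently documents (the stale values).
README_WEIGHTS = (0.35, 0.40, 0.25)


def combine_scores(prioritization: float, efficiency: float, utilization: float) -> float:
    """Hypothetical helper: weighted sum of three sub-scores, each in [0, 1]."""
    return (
        W_PRIORITIZATION * prioritization
        + W_EFFICIENCY * efficiency
        + W_UTILIZATION * utilization
    )


def readme_score(prioritization: float, efficiency: float, utilization: float) -> float:
    """Same weighted sum, but using the weights the README states."""
    w_p, w_e, w_u = README_WEIGHTS
    return w_p * prioritization + w_e * efficiency + w_u * utilization


# Made-up sub-scores: strong prioritization, middling efficiency and utilization.
sample = (0.9, 0.5, 0.5)
print(f"code weights:   {combine_scores(*sample):.4f}")
print(f"README weights: {readme_score(*sample):.4f}")
```

With these sample inputs the two weightings give different totals, which is why the README formula needs to follow the code.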

Changes Required:

  1. Update Grader Formula Section (lines 181-187):

    • Change weights from (0.35, 0.40, 0.25) to actual (0.50, 0.30, 0.20)
    • Update "Grader Components" table to match actual weights
  2. Add Comprehensive Action Space Table (after line 132):

    • Convert bullet points to structured table
    • Include: Action Type | Parameters | Description | Trade-off | Conditions | Step Reward
    • Add difficulty-specific strategy subsection
  3. Add Grader System Breakdown (new section after Reward System):

    • Prioritization scoring logic (2^severity, 40% threshold)
    • Efficiency scoring formula (MAE^0.7)
    • Utilization scoring formula
    • Score bounding (0.0001 to 0.9999)
    • Random baseline (<0.15)
  4. Add Baseline Agent Behavior Section (expand current section):

    • Three-phase execution: Info Gathering → Allocation → Finalization
    • Priority rules: medicine > water > food
    • Target step budgets: Easy 8, Medium 10, Hard 12
  5. Fix Tasks & Difficulty Levels (lines 118-126):

    • Update step budgets from (7/10/13) to actual configured values (30/50/70)
    • Or clarify that (7/10/13) are "demo" budgets with footnote
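
The score bounding and the MAE^0.7 term from item 3 can be sketched as follows. Only the bounds (0.0001 and 0.9999) and the exponent (0.7) come from the prompt. The function names, the `1 - MAE^0.7` shape, and the way the error is normalized are assumptions made for illustration; app/graders.py may compute efficiency differently.

```python
from typing import Sequence

SCORE_MIN = 0.0001  # lower bound named in the prompt
SCORE_MAX = 0.9999  # upper bound named in the prompt


def bound_score(raw: float) -> float:
    """Clamp a raw score into [SCORE_MIN, SCORE_MAX]."""
    return max(SCORE_MIN, min(SCORE_MAX, raw))


def efficiency_score(allocated: Sequence[float], demand: Sequence[float]) -> float:
    """ASSUMED form: 1 - MAE**0.7, with MAE normalized by mean demand and capped at 1."""
    if len(allocated) != len(demand) or len(demand) == 0:
        raise ValueError("allocated and demand must be non-empty and of equal length")
    total_demand = sum(demand)
    if total_demand <= 0:
        raise ValueError("total demand must be positive")
    # Sum of absolute errors over total demand equals mean error over mean demand.
    mae = sum(abs(a - d) for a, d in zip(allocated, demand)) / total_demand
    mae = min(1.0, mae)
    return 1.0 - mae ** 0.7


# A perfect allocation hits the upper bound; allocating nothing hits the lower bound.
print(bound_score(efficiency_score([10, 20], [10, 20])))
print(bound_score(efficiency_score([0, 0], [10, 20])))
```

Under this assumed form, a normalized MAE strictly between 0 and 1 satisfies MAE^0.7 > MAE, so the exponent penalizes small errors more than a linear term would.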

Keep all original content, structure, and ensure backward compatibility with existing documentation.
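
The baseline agent's priority rule and target step budgets from item 4 could be encoded roughly like this. The resource order and the budgets (Easy 8, Medium 10, Hard 12) are taken from the prompt; the names `RESOURCE_PRIORITY`, `TARGET_STEPS`, and `order_by_priority` are hypothetical and do not come from inference.py.

```python
from typing import Dict, List, Tuple

# Priority rule from the prompt: medicine > water > food.
RESOURCE_PRIORITY: Tuple[str, ...] = ("medicine", "water", "food")

# Target step budgets from the prompt, keyed by task difficulty.
TARGET_STEPS: Dict[str, int] = {"easy": 8, "medium": 10, "hard": 12}


def order_by_priority(resources: List[str]) -> List[str]:
    """Hypothetical helper: sort resource types so higher-priority ones come first."""
    unknown = [r for r in resources if r not in RESOURCE_PRIORITY]
    if unknown:
        raise ValueError(f"unknown resource types: {unknown}")
    return sorted(resources, key=RESOURCE_PRIORITY.index)


print(order_by_priority(["food", "medicine", "water"]))
print(TARGET_STEPS["hard"])
```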

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: can u conver action space to a table

Assistant: I'll convert the Action Space section to a table format for you.

Assistant: Here's the Action Space converted into a table format:

| Action | Parameters | Description | Trade-off | Impact |
| --- | --- | --- | --- | --- |
| request_info | zone_id | Reveals true severity and demand for a zone | Costs a step but reduces uncertainty | +0.02 reward |
| allocate_resource | zone_id, resource_type, amount | Allocates a specific quantity of a resource to a zone | Irreversible and bounded by global supply | +0.05 reward |
| finalize | None | Terminates the episode and triggers final evaluation | Premature finalization reduces achievable score | -0.005 per step penalty |
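
As an illustration of how the Parameters column might be enforced, here is a small validation sketch. The action names and parameter lists are copied from the table; the dictionary payload shape (an `"action"` key plus one key per parameter) and the `validate_action` helper are assumptions, not the environment's actual request schema.

```python
from typing import Any, Dict, Tuple

# Action names and their required parameters, copied from the table above.
ACTION_PARAMS: Dict[str, Tuple[str, ...]] = {
    "request_info": ("zone_id",),
    "allocate_resource": ("zone_id", "resource_type", "amount"),
    "finalize": (),
}


def validate_action(payload: Dict[str, Any]) -> None:
    """Hypothetical check: raise ValueError unless the payload matches the table."""
    name = payload.get("action")
    if name not in ACTION_PARAMS:
        raise ValueError(f"unknown action: {name!r}")
    expected = set(ACTION_PARAMS[name])
    given = set(payload) - {"action"}
    if given != expected:
        raise ValueError(
            f"{name} expects parameters {sorted(expected)}, got {sorted(given)}"
        )


# ASSUMED payload shape; the real API may name or nest these fields differently.
validate_action({"action": "request_info", "zone_id": 2})
validate_action({"action": "finalize"})
```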

Would you like me to update the README.md with this table format and commit the changes to your repository?

User: heres the inference """
inference.py

Hackathon inference script for DisasterTriageEnv.
Connects an LLM agent (via OpenAI-compatible API) to the running
DisasterTriage FastAPI server and runs one full episode.
Logging output (stdout only):
[START] task=<task_name> env=disaster-triage-env model=<model_name>
[STEP] step= action=<action_str> reward=<0.00> done=<true|false> error=
[END] success=<true|false> steps= rewards=<r1,r2...rn>
Usage:
python inference.py [--task easy|medium|hard] [--base-url http://127.0.0.1:8000]
Environment variables (loaded from .env):
API_BASE_URL — LLM API base URL (default: https://api.openai.com/v1)
MODEL_NAME — model identifier (default: gpt-4.1-mini)
HF_TOKEN — API key / HF token (REQUIRED — raises error if missing)
"""

from __future__ import annotations

import argparse
import json
import os
import sys
import time
from typing import Any, Dict, List, Optional

import numpy as np
import requests
from dotenv import load_dotenv
from openai import OpenAI

# ---------------------------------------------------------------------------
# Load .env
# ---------------------------------------------------------------------------

load_dotenv()

# ---------------------------------------------------------------------------
# Environment variables
# ---------------------------------------------------------------------------

# LLM Proxy (must be provided by the environment, e.g. LiteLLM)
LLM_API_BASE: str = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
MODEL_NAME: str = os.getenv("MODEL_NAME", "gpt-4.1-mini")
HF_TOKEN: str | None = os.getenv("HF_TOKEN")

if not HF_TOKEN:
    print("[WARN] HF_TOKEN not found. Simulation will fail until a secret is provided.")
    # We define a dummy client to avoid crash during import
    client = None
else:
    # ---------------------------------------------------------------------------
    # OpenAI client (OpenAI-compatible, pointed at API_BASE_URL)
    # ---------------------------------------------------------------------------
    client = OpenAI(
        base_url=LLM_API...
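
The stdout log lines described in the script's docstring could be produced by formatters along these lines. The `[STEP]` and `[END]` prefixes, the two-decimal reward, and the lowercase `true`/`false` values follow the docstring. The helper names, the integer step counter, and printing `null` when there is no error are assumptions, since the pasted docstring does not show those placeholders and the script itself is truncated above.

```python
from typing import List, Optional


def _bool_str(value: bool) -> str:
    """Render a Python bool as the lowercase true/false used in the docstring."""
    return "true" if value else "false"


def format_step(step: int, action: str, reward: float, done: bool,
                error: Optional[str] = None) -> str:
    """Hypothetical formatter for one [STEP] line."""
    error_str = error if error is not None else "null"  # ASSUMPTION: "null" means no error
    return (
        f"[STEP] step={step} action={action} reward={reward:.2f} "
        f"done={_bool_str(done)} error={error_str}"
    )


def format_end(success: bool, steps: int, rewards: List[float]) -> str:
    """Hypothetical formatter for the final [END] line."""
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    return f"[END] success={_bool_str(success)} steps={steps} rewards={rewards_str}"


print(format_step(1, "request_info", 0.02, False))
print(format_end(True, 2, [0.02, 0.05]))
```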

This pull request was created from Copilot chat.
