---
title: HelixDesk OpenEnv
emoji: 📧
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
tags:
---
HelixDesk OpenEnv is a complete Gymnasium-style reinforcement learning environment in which an AI agent named HelixDesk learns to manage customer email queues by interacting with a realistic simulation of a company's complaint-management system. It is fully compatible with standard RL libraries, including Stable-Baselines3, RLlib, and CleanRL.
```mermaid
graph TD
A[Incoming Emails] --> B[HelixDesk AI Agent]
B -->|Action: Classify| C[Query / Complaint / Flag]
B -->|Action: Priority| D[Critical - Normal]
B -->|Action: Assign| E[Employee 1-5]
B -->|Action: Secondary| F[KB Auto-reply / GM Alert]
C --> G((Environment State))
D --> G
E --> G
F --> G
G -->|Reward| B
style B fill:#4f46e5,stroke:#312e81,stroke-width:2px,color:#fff
style G fill:#0891b2,stroke:#164e63,stroke-width:2px,color:#fff
```
- **State:** A 42-dimensional observation vector encoding the current email's features (sentiment, category, customer tier, keyword flags), the support queue state (priority counts, overdue tickets), team workload (5 employees' loads and resolve times), SLA pressure, complaint volume trends, simulated time, and episode progress.
- **Action:** A 4-part decision for each incoming email: classification (query / complaint / flag for review), priority assignment (critical / high / medium / normal), employee assignment (5 employees or none), and a secondary action (auto-reply from KB / alert GM / none).
- **Reward:** A composite signal from 12 distinct components: correct classification (+0.5), timely resolution (+1.0), high CSAT (+0.8), trend prevention (+0.6), workload balance (+0.4), KB updates (+0.3), and penalties for missed deadlines (−1.0), bad auto-replies (−0.8), unnecessary escalations (−0.6), misclassification (−0.5), reopened complaints (−0.4), and missed keyword flags (−0.3). Total reward is clipped to [−1.0, +1.0] per step.
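The aggregation above can be sketched as a weighted sum over triggered signals followed by clipping. This is an illustrative reconstruction from the listed weights, not the actual code in `rewards.py`:

```python
# Component weights copied from the reward description above.
REWARD_WEIGHTS = {
    "resolve_on_time": 1.0,
    "csat_high": 0.8,
    "trend_prevented": 0.6,
    "correct_classification": 0.5,
    "balanced_assignment": 0.4,
    "kb_updated": 0.3,
    "missed_deadline": -1.0,
    "bad_autoreply": -0.8,
    "unnecessary_escalation": -0.6,
    "misclassification": -0.5,
    "complaint_reopened": -0.4,
    "keyword_flag_missed": -0.3,
}

def composite_reward(fired: set[str]) -> float:
    """Sum the weights of all signals triggered this step, clip to [-1, 1]."""
    total = sum(REWARD_WEIGHTS[name] for name in fired)
    return max(-1.0, min(1.0, total))
```

Note that because the per-step total is clipped, earning several positive signals at once (e.g. `resolve_on_time` plus `csat_high`) saturates at +1.0.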
```bash
cd helixdesk-openenv
pip install -r requirements.txt

# Rule-based baseline
python train.py --agent rule --episodes 100

# Random baseline
python train.py --agent random --episodes 100

# PPO via Stable-Baselines3
pip install stable-baselines3
python train.py --agent sb3 --episodes 500

# Evaluate
python evaluate.py --agent rule --episodes 100
```

`HelixDeskEnv` passes `gymnasium.utils.env_checker.check_env()` with 0 errors:
```python
from gymnasium.utils.env_checker import check_env
from helixdesk import HelixDeskEnv

env = HelixDeskEnv()
check_env(env)  # passes with no errors
```

Compatible with any Gymnasium-based training library:
- **Stable-Baselines3:** `PPO("MlpPolicy", HelixDeskEnv())`
- **CleanRL:** use the env like any standard Gymnasium env
- **RLlib:** register with `gymnasium.register()`
| Group | Dims | Description |
|---|---|---|
| Current Email | 0–9 | Sentiment, keyword flag, customer tier (3-hot), category (5-slot overflow encoding) |
| Queue State | 10–14 | Normalized counts: critical, high, medium, normal, pending review |
| Team State | 15–24 | 5 employees × (load_norm, avg_resolve_norm) |
| SLA State | 25–28 | Overdue norm, near-deadline norm, SLA pressure, critical overdue flag |
| Trend State | 29–36 | 8 categories × growth rate fraction [−1, 1] |
| Time State | 37–38 | Hour of day / 24, day of week / 7 |
| Episode Progress | 39–41 | Steps remaining norm, episode reward norm, agent confidence |
All values are normalized to [−1.0, 1.0]. Observation space: `Box(low=-1, high=1, shape=(42,), dtype=float32)`.
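The group boundaries in the table can be captured as slices over the flat vector. A sketch for inspecting observations, assuming the layout above; the real definitions live in `spaces.py`:

```python
# Illustrative index map for the 42-dim observation vector described above.
OBS_SLICES = {
    "current_email":    slice(0, 10),   # sentiment, keyword flag, tier, category
    "queue_state":      slice(10, 15),  # critical/high/medium/normal/pending counts
    "team_state":       slice(15, 25),  # 5 employees x (load, avg resolve time)
    "sla_state":        slice(25, 29),  # overdue, near-deadline, pressure, flag
    "trend_state":      slice(29, 37),  # 8 category growth rates
    "time_state":       slice(37, 39),  # hour/24, weekday/7
    "episode_progress": slice(39, 42),  # steps left, reward, confidence
}

OBS_SIZE = max(s.stop for s in OBS_SLICES.values())  # 42

def split_observation(obs):
    """Split a flat 42-dim observation into named groups for debugging."""
    assert len(obs) == OBS_SIZE
    return {name: obs[s] for name, s in OBS_SLICES.items()}
```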
| Dim | Choices | Description |
|---|---|---|
| 0: Classification | 0=query, 1=complaint, 2=flag_for_review | How to classify the current email |
| 1: Priority | 0=critical, 1=high, 2=medium, 3=normal | Priority level (complaints only) |
| 2: Assignment | 0–4=employee_0..4, 5=no_assignment | Who handles it (complaints only) |
| 3: Secondary | 0=auto_reply_from_kb, 1=alert_gm, 2=none | Additional action |
**Rule:** If classification = `flag_for_review`, dims 1/2/3 are forced to (`normal`, `no_assignment`, `none`).
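The table corresponds to a `MultiDiscrete([3, 4, 6, 3])` action space, and the flag-for-review rule can be sketched as a post-processing step. The constants below mirror the table; the actual enforcement happens inside the environment:

```python
# Index constants taken from the action table above.
FLAG_FOR_REVIEW = 2   # dim 0
PRIORITY_NORMAL = 3   # dim 1
NO_ASSIGNMENT = 5     # dim 2
SECONDARY_NONE = 2    # dim 3

def apply_flag_rule(action):
    """If classification == flag_for_review, force the remaining dims
    to (normal, no_assignment, none), as the rule above states."""
    classification, priority, assignment, secondary = action
    if classification == FLAG_FOR_REVIEW:
        return (classification, PRIORITY_NORMAL, NO_ASSIGNMENT, SECONDARY_NONE)
    return (classification, priority, assignment, secondary)
```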
| Signal | Value | Condition |
|---|---|---|
| `resolve_on_time` | +1.0 | Employee resolves ticket within SLA |
| `csat_high` | +0.8 | CSAT score ≥ 4 on resolved ticket |
| `trend_prevented` | +0.6 | GM alerted during category surge |
| `correct_classification` | +0.5 | Classification matches ground truth |
| `balanced_assignment` | +0.4 | Workload std decreased |
| `kb_updated` | +0.3 | Knowledge base learned new entry |
| `missed_deadline` | −1.0 | Ticket missed SLA deadline |
| `bad_autoreply` | −0.8 | CSAT score ≤ 2 |
| `unnecessary_escalation` | −0.6 | Flagged for review despite low complexity |
| `misclassification` | −0.5 | Classification doesn't match ground truth |
| `complaint_reopened` | −0.4 | Complaint reopened after resolution |
| `keyword_flag_missed` | −0.3 | Keyword-flagged email not treated as complaint/critical |
- In `config.yaml`, set `env.n_employees: 6`
- The observation space grows by 2 dims (new employee load + resolve time)
- Update `spaces.py` accordingly (increase `OBS_SIZE` and add employee dims)
- Update action space dim 2 to `7` (6 employees + no_assignment)
- Add the category name to `email_gen.categories` in `config.yaml`
- Add 5 query + 5 complaint templates in `email_gen.py`
- Add 3 KB entries in `knowledge_base.py`
- Trend state dims grow by 1
All parameters in `config.yaml` propagate through without code changes:

- Adjust `episode_emails` for longer/shorter episodes
- Modify reward weights to shape different agent behaviours
- Change `sla.*_hours` to tighten or relax deadlines
- Adjust `employee_sim.base_resolve_rate` for harder/easier simulation
```
helixdesk-openenv/
├── helixdesk/
│   ├── __init__.py              # exports HelixDeskEnv
│   ├── env.py                   # main environment class
│   ├── models.py                # Pydantic typed wrappers (HelixObservation, HelixAction, HelixReward)
│   ├── spaces.py                # observation & action space definitions
│   ├── rewards.py               # reward function
│   ├── simulator/               # simulation components
│   │   ├── clock.py             # simulated time
│   │   ├── email_gen.py         # synthetic email generation
│   │   ├── employee_sim.py      # employee behaviour model
│   │   ├── knowledge_base.py    # KB lookup & auto-learn
│   │   └── trend_watchdog.py    # volume surge detection
│   ├── agents/                  # baseline agents
│   │   ├── base_agent.py        # abstract agent interface
│   │   ├── random_agent.py      # random baseline
│   │   └── rule_agent.py        # deterministic rule-based agent
│   └── monitor/                 # logging & visualization
│       ├── episode_logger.py    # CSV per-step logger
│       └── terminal_dashboard.py # Rich live dashboard
├── tasks/                       # graded task definitions
│   ├── easy_classify.py         # keyword-flag classification (easy)
│   ├── medium_sla.py            # SLA compliance rate (medium)
│   ├── hard_trend.py            # trend detection + CSAT (hard)
│   └── expert_full.py           # full expert evaluation (expert)
├── tests/                       # pytest test suite
├── train.py                     # training entry point
├── evaluate.py                  # evaluation with rich table output
├── baseline.py                  # GPT-4o + rule + random baseline runner
├── inference.py                 # mandatory hackathon inference script
├── config.yaml                  # all configurable parameters
├── openenv.yaml                 # OpenEnv manifest
├── Dockerfile                   # container image
├── requirements.txt             # Python dependencies
└── README.md                    # this file
```
HelixDesk OpenEnv ships with 4 graded tasks of increasing difficulty. Each task's `grade(env, agent)` function returns a score in `[0.0, 1.0]`.
| Task | Difficulty | Scoring Criteria |
|---|---|---|
| `easy` | 🟢 Easy | Run 20 emails. Score = fraction of keyword-flagged emails correctly classified as complaint with critical priority. |
| `medium` | 🟡 Medium | Run 1 full episode (100 emails). Score = fraction of tickets resolved within SLA deadline. |
| `hard` | 🔴 Hard | Run 1 full episode. Score = avg of (trend alerts caught / surge events, CSAT / 4.5, overdue control). |
| `expert` | ⚫ Expert | Geometric mean of keyword score × classification accuracy × review abuse rate. One weakness tanks the whole score. |
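The expert task's geometric-mean aggregation can be sketched as follows. The subscore names are illustrative (the third is phrased here as "review discipline" so that higher is better); the point is that any subscore near zero collapses the final grade, unlike an arithmetic mean:

```python
import math

def expert_score(keyword: float, classification: float,
                 review_discipline: float) -> float:
    """Geometric mean of three subscores in [0, 1].

    A single weak subscore drags the whole grade toward zero,
    which is why 'one weakness tanks the whole score'.
    """
    subscores = [keyword, classification, review_discipline]
    assert all(0.0 <= s <= 1.0 for s in subscores)
    return math.prod(subscores) ** (1 / len(subscores))
```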
```bash
# Run all tasks against rule + random baselines
python baseline.py
```

We evaluated the baseline agents across 3 fixed seeds (42, 100, 2026) to ensure reproducibility. The results show that while the deterministic rule agent performs well on simple classification, it struggles on the adversarial routing tasks (Hard / Expert) due to intentionally injected conflicting signals (ambiguous texts) and delayed-consequence penalties in the environment.
| Task | Random Agent (n=3) | Rule-based Agent (n=3) | Metric Type |
|---|---|---|---|
| easy | 0.040 ± 0.02 | 1.000 ± 0.00 | Strict priority assignment |
| medium | 0.354 ± 0.04 | 0.865 ± 0.03 | SLA Compliance % |
| hard | 0.455 ± 0.06 | 0.490 ± 0.20 | Trend isolation & Ambiguity resolution |
| expert | 0.210 ± 0.05 | 0.550 ± 0.15 | Geometric mean of workload balance + SLAs |
Run `python baseline.py` to reproduce these results locally.
```bash
# Build
docker build -t helixdesk-openenv .

# Start the web dashboard and API server
docker run --rm -p 7860:7860 helixdesk-openenv

# Run evaluation instead
docker run --rm -p 7860:7860 helixdesk-openenv python evaluate.py --agent rule --episodes 100
```

Run our pre-configured test suite to verify full compliance with the Meta PyTorch OpenEnv harness requirements.
```
$ python -m pytest tests/test_validation.py -v
collected 4 items

tests/test_validation.py::test_endpoints PASSED                  [ 25%]
tests/test_validation.py::test_manifest_validation PASSED        [ 50%]
tests/test_validation.py::test_inference_script_format PASSED    [ 75%]
tests/test_validation.py::test_grader_consistency PASSED         [100%]

======================== 4 passed in 22.78s ========================
```

- **Inference Script Format:** The stdout logs rigorously follow the `[START]`, `[STEP]`, and `[END]` syntax required by the OpenEnv validation harness.
- **Grader Consistency:** Graders execute deterministically based on seed injection, returning strict, reproducible scores in `[0, 1]`.
- **API Endpoints:** The FastAPI application behind the Docker entry point (`app:app`) properly handles POST `/reset`, POST `/step`, and POST `/grader`.
```bash
pytest tests/ -v
```

Live demo: https://huggingface.co/spaces/nottherajyk/helixdesk-openenv
The Space runs the rule-based and random agents interactively in your browser. No install required.
MIT