Citation-Agent: RL Citation Benchmark

title: Citation Benchmark
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false

Citation-Agent is an OpenEnv-compliant environment for training and evaluating a reinforcement learning citation agent on academic literature review and citation selection.

Given a claim, the agent navigates a local research database: it searches across papers, reads abstracts, traverses citation graphs, and finally selects the paper that supports the claim.

The Challenge

Many agent benchmarks focus on web scraping or bash terminal navigation. This environment instead tests an agent's ability to reason over scholarly citation graphs. The agent must:

- Differentiate between arXiv IDs and Semantic Scholar Corpus IDs.
- Follow citation intents, i.e. the reason one paper cites another.
- Use SQL-backed search queries to find the core papers.
- Avoid getting stuck in infinite search loops.
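
Telling the two ID families apart is mostly a matter of format: new-style arXiv IDs look like `2106.09685` (optionally `v2`), old-style ones like `hep-th/9901001`, and Semantic Scholar Corpus IDs are plain integers. The sketch below is a hypothetical helper (`classify_paper_id` is not part of this repository) that normalizes a raw string into one of those kinds; the accepted prefixes `arXiv:` and `CorpusId:` are assumptions.

```python
import re

# Hypothetical helper; the environment's real ID handling is not documented in this README.
# New-style arXiv IDs: YYMM.NNNN or YYMM.NNNNN, optionally followed by a version such as v2.
ARXIV_NEW = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)
# Old-style arXiv IDs: archive[.subject]/YYMMNNN, e.g. hep-th/9901001 or math.AG/0101001.
ARXIV_OLD = re.compile(r"^(?:arxiv:)?([a-z\-]+(?:\.[a-z]{2})?/\d{7})(v\d+)?$", re.IGNORECASE)
# Semantic Scholar Corpus IDs are plain integers.
CORPUS_ID = re.compile(r"^(?:corpusid:)?(\d+)$", re.IGNORECASE)


def classify_paper_id(raw: str) -> tuple[str, str]:
    """Return (kind, normalized_id), where kind is 'arxiv', 'corpus_id' or 'unknown'."""
    text = raw.strip()
    # arXiv patterns are tried first because they are more specific than "all digits".
    for kind, pattern in (("arxiv", ARXIV_NEW), ("arxiv", ARXIV_OLD), ("corpus_id", CORPUS_ID)):
        match = pattern.match(text)
        if match:
            return kind, match.group(1)
    return "unknown", text
```

For example, `classify_paper_id("arXiv:2106.09685v2")` yields `("arxiv", "2106.09685")`, while `classify_paper_id("235458009")` is treated as a Corpus ID.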

RL Architecture

- environment.py: RL environment with actions, transitions, and dense reward logic.
- tasks.py: 50 tasks with deterministic grader checks.
- rl_agent.py: Q-learning policy over the citation actions.
- evaluation.py: Hybrid evaluation combining programmatic checks with optional LLM scoring.
- hf_entry.py: Gradio app entrypoint for Hugging Face Spaces deployment.
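
For readers unfamiliar with tabular Q-learning, here is a minimal sketch of the kind of policy rl_agent.py describes. It is not the repository's implementation: the class name `TabularQAgent`, the string state keys, and the hyperparameter defaults are all assumptions; only the four action names come from the action space listed further down.

```python
import random
from collections import defaultdict

# Action names from the README's action space; the state encoding is a hypothetical simplification.
ACTIONS = ["search", "read_abstract", "get_citations", "submit"]


class TabularQAgent:
    """Minimal tabular Q-learning sketch (hypothetical, not rl_agent.py)."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.95, epsilon: float = 0.1, seed: int = 0):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # Q[state][action], initialised to 0.0 for unseen states.
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

    def act(self, state: str) -> str:
        # Epsilon-greedy: explore with probability epsilon, otherwise pick the highest-valued action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        values = self.q[state]
        return max(values, key=values.get)

    def update(self, state: str, action: str, reward: float, next_state: str, done: bool) -> None:
        # Target r + gamma * max_a' Q(s', a'); terminal steps (e.g. submit) do not bootstrap.
        target = reward if done else reward + self.gamma * max(self.q[next_state].values())
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

With `alpha=0.5`, a rewarded terminal `submit` moves its Q-value halfway toward the reward, and earlier `search` steps inherit a discounted share of that value on later updates.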

How to Run

Option 1: Running Locally (Recommended)

Install dependencies (requires Python 3.10+):
```
pip install -r requirements.txt
```

Set Environment Variables: We use the Hugging Face Router with the google/gemma-4-31B-it model.

PowerShell:

$env:HF_TOKEN="your_huggingface_api_key"
$env:MODEL_NAME="google/gemma-4-31B-it"
$env:API_BASE_URL="https://router.huggingface.co/v1"

Bash/Zsh:

export HF_TOKEN="your_huggingface_api_key"
export MODEL_NAME="google/gemma-4-31B-it"
export API_BASE_URL="https://router.huggingface.co/v1"
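
A small config loader makes a missing token fail early instead of at the first LLM call. This is a hypothetical sketch (`LLMConfig` and `load_llm_config` are not from the repository); it reads the three variables above and falls back to the README's model and router URL when only `HF_TOKEN` is set.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults mirror the values shown in the README; the helper itself is hypothetical.
DEFAULT_MODEL = "google/gemma-4-31B-it"
DEFAULT_BASE_URL = "https://router.huggingface.co/v1"


@dataclass(frozen=True)
class LLMConfig:
    token: str
    model_name: str
    api_base_url: str


def load_llm_config(env: Optional[Mapping[str, str]] = None) -> LLMConfig:
    """Read HF_TOKEN, MODEL_NAME and API_BASE_URL, raising if the token is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running inference.py")
    return LLMConfig(
        token=token,
        model_name=env.get("MODEL_NAME", DEFAULT_MODEL),
        # Strip a trailing slash so URLs can be joined consistently.
        api_base_url=env.get("API_BASE_URL", DEFAULT_BASE_URL).rstrip("/"),
    )
```

If the LLM calls go through the `openai` Python package (an assumption; the README does not say), the resulting config maps directly onto `OpenAI(base_url=cfg.api_base_url, api_key=cfg.token)`.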

Train + Evaluate the RL Agent (CLI):
```
python inference.py --episodes-per-task 120
```
Add --include-llm-score to enable LLM-based trajectory scoring.

Useful controls:
- Resume from saved checkpoint:
```
python inference.py --load-checkpoint --checkpoint-path rl_policy.json
```
- Evaluate only (skip train):
```
python inference.py --load-checkpoint --skip-train
```
- Force a fresh retrain:
```
python inference.py --force-retrain
```
- CI fast mode (reduced train/eval set, no LLM scoring):
```
python inference.py --ci-fast
```
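
The `--load-checkpoint` and `--checkpoint-path` flags imply that the learned Q-table is persisted as JSON (rl_policy.json, and full_policy.json for the Space). Assuming a simple layout of state key → action → value (the real schema may differ), saving and resuming could look like this; `save_policy` and `load_policy` are hypothetical names.

```python
import json
from pathlib import Path

# Hypothetical on-disk layout: {"state_key": {"action": q_value, ...}, ...}.


def save_policy(q_table: dict, path: str) -> None:
    """Write the Q-table as JSON so a later run can resume from it."""
    Path(path).write_text(json.dumps(q_table, indent=2, sort_keys=True), encoding="utf-8")


def load_policy(path: str) -> dict:
    """Load a saved Q-table, or return an empty table if no checkpoint exists yet."""
    p = Path(path)
    if not p.exists():
        return {}
    data = json.loads(p.read_text(encoding="utf-8"))
    # JSON may return integers for whole numbers; coerce every value back to float.
    return {state: {action: float(v) for action, v in actions.items()} for state, actions in data.items()}
```

JSON object keys must be strings, so whatever state representation the agent uses needs a stable string encoding before it can be checkpointed this way.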
Run the Hugging Face app locally:
```
python hf_entry.py
```
Then open http://localhost:7860.

The Space uses full_policy.json by default, so it should return a result immediately instead of training on the first click.

Option 2: Running via Docker (Hugging Face Spaces validation)

To check that the container works with the automated Hugging Face Spaces pipeline:

docker build -t citation-agent .
docker run -p 7860:7860 --env HF_TOKEN="your_token" --env MODEL_NAME="google/gemma-4-31B-it" --env API_BASE_URL="https://router.huggingface.co/v1" citation-agent

Option 3: Deploy Directly to Hugging Face Space

Choose a Space ID (for example, your-username/citation-rl-agent).
Set your token:
- PowerShell:
```
$env:HF_TOKEN="your_hf_token"
```

Deploy:

python deploy_hf_space.py --space-id your-username/citation-rl-agent

Optionally, create a private Space:

python deploy_hf_space.py --space-id your-username/citation-rl-agent --private

After upload, open https://huggingface.co/spaces/your-username/citation-rl-agent.
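
The contents of deploy_hf_space.py are not shown here, but the two documented flags map naturally onto the `huggingface_hub` client. The following is a sketch under that assumption: `parse_args` and `deploy` are hypothetical names, and the Space is created with the Docker SDK to match `sdk: docker` in the metadata at the top of this README.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    # Mirrors the two flags shown in the README; other options of deploy_hf_space.py, if any, are not covered.
    parser = argparse.ArgumentParser(description="Upload this repository to a Hugging Face Space.")
    parser.add_argument("--space-id", required=True, help="e.g. your-username/citation-rl-agent")
    parser.add_argument("--private", action="store_true", help="create the Space as private")
    return parser.parse_args(argv)


def deploy(space_id: str, private: bool, folder: str = ".") -> str:
    # Imported lazily so parse_args can be used without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()  # uses HF_TOKEN from the environment if it is set
    # Create a Docker Space, reusing it if it already exists.
    api.create_repo(repo_id=space_id, repo_type="space", space_sdk="docker", private=private, exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=space_id, repo_type="space")
    return f"https://huggingface.co/spaces/{space_id}"
```

Called as `deploy(args.space_id, args.private)`, this returns the same Space URL given in the step above.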

OpenEnv Compliance Checklist

Real-world task: Simulates literature review and academic source verification.
OpenEnv Spec Compliance:
- Contains a valid openenv.yaml.
- environment.py uses Pydantic models (Observation, Action, Reward) and implements the required reset(), state(), and step(action) methods.
- Observation Space: JSON containing current_claim, search_results, last_abstract, citations_data, message, and step_count.
- Action Space: action_type (search, read_abstract, get_citations, submit), query, and paper_id.
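
environment.py defines these spaces as Pydantic models. As a dependency-free sketch, the same fields can be written as stdlib dataclasses. The field names come from the lists above; the validation rules (search needs a query, every other action needs a paper_id) and the list types for the result fields are assumptions, not taken from environment.py.

```python
from dataclasses import dataclass, field
from typing import Optional

# Stdlib stand-in for the Pydantic Action/Observation models; validation rules are assumed.
ACTION_TYPES = {"search", "read_abstract", "get_citations", "submit"}


@dataclass
class Action:
    action_type: str
    query: Optional[str] = None
    paper_id: Optional[str] = None

    def validate(self) -> Optional[str]:
        """Return an error message for a malformed action, or None if it looks well-formed."""
        if self.action_type not in ACTION_TYPES:
            return f"unknown action_type {self.action_type!r}"
        if self.action_type == "search" and not self.query:
            return "search requires a non-empty query"
        if self.action_type != "search" and not self.paper_id:
            return f"{self.action_type} requires a paper_id"
        return None


@dataclass
class Observation:
    current_claim: str
    search_results: list = field(default_factory=list)
    last_abstract: str = ""
    citations_data: list = field(default_factory=list)
    message: str = ""
    step_count: int = 0
```

With Pydantic, the same classes would subclass `BaseModel`, and the checks would typically move into field validators.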
50 Distinct Tasks with Graders:
- tasks.py loads 50 distinct Easy, Medium, and Hard tasks from the SQLite database.
- Includes a deterministic Grader class that checks the submitted paper_id against the ground truth.
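
A deterministic grader can be as simple as an exact match after light normalization. The sketch below is hypothetical (`ExactMatchGrader` and `normalize_paper_id` are illustrative names); in particular, ignoring an `arXiv:` prefix and a trailing version suffix is an assumption about how strict the real Grader is.

```python
def normalize_paper_id(paper_id: str) -> str:
    """Lower-case, strip whitespace, drop an 'arxiv:' prefix and a trailing 'vN' version (assumed rules)."""
    pid = paper_id.strip().lower()
    if pid.startswith("arxiv:"):
        pid = pid[len("arxiv:"):]
    # Only strip "v<digits>" when it directly follows a digit, e.g. 2106.09685v2 -> 2106.09685.
    head, sep, tail = pid.rpartition("v")
    if sep and head and tail.isdigit() and head[-1].isdigit():
        pid = head
    return pid


class ExactMatchGrader:
    """Hypothetical deterministic grader: 1.0 for the ground-truth paper, 0.0 otherwise."""

    def __init__(self, ground_truth: str):
        self.target = normalize_paper_id(ground_truth)

    def grade(self, submitted: str) -> float:
        return 1.0 if normalize_paper_id(submitted) == self.target else 0.0
```

Because the score depends only on the submitted string and the stored answer, repeated evaluations of the same trajectory always produce the same result.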
Database Architecture:
- To stay under the 1GB limit for free HF Spaces, environment.py uses huggingface-hub to download citation_db.sqlite from the ruby56/Citation-Database dataset at startup rather than bundling it in the Space.
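
The download step and a SQL-backed search might look roughly like this. `hf_hub_download` is the real `huggingface_hub` function; assuming that citation_db.sqlite sits at the root of the dataset repo, and the `papers(paper_id, title, abstract)` schema used by the hypothetical `search_papers`, are both guesses that may not match the actual database.

```python
import sqlite3


def fetch_database() -> str:
    """Download citation_db.sqlite from the dataset named in the README and return its local path."""
    from huggingface_hub import hf_hub_download  # lazy import: only needed for the real database

    return hf_hub_download(repo_id="ruby56/Citation-Database", filename="citation_db.sqlite", repo_type="dataset")


def search_papers(conn: sqlite3.Connection, query: str, limit: int = 5) -> list:
    # Hypothetical schema: papers(paper_id TEXT, title TEXT, abstract TEXT).
    pattern = f"%{query}%"
    rows = conn.execute(
        # SQLite's LIKE is case-insensitive for ASCII text by default.
        "SELECT paper_id, title FROM papers WHERE title LIKE ? OR abstract LIKE ? ORDER BY title LIMIT ?",
        (pattern, pattern, limit),
    )
    return rows.fetchall()
```

In practice the environment would open `sqlite3.connect(fetch_database())` once at startup and reuse the connection for every search action.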
Continuous Reward Function:
- Uses dense shaping: the agent receives a -0.05 penalty per step to encourage efficiency and a -0.1 to -0.2 penalty for formatting errors. On submit, it receives the final grader score (at most +1.0).
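
The reward scheme above can be expressed as a single per-step function. This is a hypothetical reconstruction: the README gives the -0.1 to -0.2 range for formatting errors but not how it is split, so mapping a missing field to -0.1 and an unknown action to -0.2 (and not adding the step penalty on top) are assumptions.

```python
STEP_PENALTY = -0.05
# Assumed split of the README's -0.1 to -0.2 formatting-error range.
MISSING_FIELD_PENALTY = -0.1
UNKNOWN_ACTION_PENALTY = -0.2
VALID_ACTIONS = {"search", "read_abstract", "get_citations", "submit"}


def shaped_reward(action_type: str, has_required_field: bool, grader_score: float = 0.0) -> float:
    """Dense reward for one step (hypothetical reconstruction of the scheme described above)."""
    if action_type not in VALID_ACTIONS:
        return UNKNOWN_ACTION_PENALTY
    if not has_required_field:
        return MISSING_FIELD_PENALTY
    if action_type == "submit":
        # Terminal step: the grader score, clipped to [0, 1].
        return min(max(grader_score, 0.0), 1.0)
    return STEP_PENALTY
```

Under these assumptions, three searches followed by a correct submission return 3 × (-0.05) + 1.0 = 0.85, so shorter trajectories earn more.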
Hybrid Evaluation:
- Uses deterministic programmatic checks and optional LLM scoring to produce the final benchmark score.
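
One simple way to combine the two signals is a weighted average that falls back to the programmatic score when LLM scoring is off (as it is without `--include-llm-score`). The blend and the 0.2 default weight are placeholders; the README does not say how evaluation.py actually combines them.

```python
from typing import Optional


def hybrid_score(programmatic: float, llm: Optional[float] = None, llm_weight: float = 0.2) -> float:
    """Blend a deterministic score with an optional LLM score, both assumed to lie in [0, 1]."""
    if not 0.0 <= llm_weight <= 1.0:
        raise ValueError("llm_weight must be in [0, 1]")
    if llm is None:
        # LLM scoring disabled: use the programmatic check alone.
        return programmatic
    return (1.0 - llm_weight) * programmatic + llm_weight * llm
```

Keeping the LLM weight small means the deterministic grader still dominates, so the benchmark stays reproducible even when the LLM judge is noisy.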
Production Dockerfile:
- Standard Dockerfile that installs dependencies and runs hf_entry.py directly for Hugging Face Spaces.
