Project Page | Paper | Demo
AutoGEO is a framework for Automatic Generative Engine Optimization (GEO) that helps web content gain higher visibility in LLM-generated answers. Our paper has been accepted by ICLR 2026.
📄 Paper: "What Generative Search Engines Like and How to Optimize Web Content Cooperatively"
👥 Authors: Yujiang Wu*, Shanshan Zhong*, Yubin Kim, Chenyan Xiong (*Equal contribution)
AutoGEO automatically extracts content preference rules from generative engines and rewrites documents to maximize visibility while preserving accuracy.
How GEO models work:
- Input: Target document
- Output: Rewritten document with higher visibility in generative engine (GE) responses
- Goal: Maximize visibility without harming GE utility
Three core components:
- Rule Extraction — Automatically mines content preferences from GEs.
- AutoGEOAPI — Prompt-based GEO model using extracted rules
- AutoGEOMini — Cost-effective GEO model trained with reinforcement learning
Evaluation metrics: GEO score (visibility) and GEU score (utility)
- 🔥 [2026-01-28]: Cheers! Our paper has been accepted by ICLR 2026!
- 🔥 [2026-01-17]: We have released our AutoGEOMini Demo. Feel free to try it out!
- 🔥 [2026-01-17]: We have released our checkpoints (E-commerce, GEO-Bench, Researchy-GEO).
- 🔥 [2025-12-08]: We have released our code and datasets (E-commerce, GEO-Bench, Researchy-GEO).
- 🔥 [2025-10-11]: Our paper is now available on arXiv. Check it out!
For using AutoGEOAPI and rule extraction:
# Clone the repository
git clone --recursive https://github.com/cxcscmu/AutoGEO
cd AutoGEO
# Run installation script
bash install.sh
# Activate environment
conda activate autogeo
# Configure API keys (required)
nano keys.env  # Add your API keys
Optional: For training AutoGEOMini models:
# First complete Option 1, then:
conda activate autogeo
bash install_mini.sh
- CUDA-compatible GPU * 2 (A100 40GB+ recommended)
- ~4h for SFT and ~48h for GRPO on Researchy-GEO
Rewrite a document using AutoGEOAPI:
from autogeo.rewriters import rewrite_document
rewritten_text = rewrite_document(
    document="AutoGEO automatically extracts content preference rules from generative engines and rewrites documents to maximize visibility while preserving accuracy.",
    dataset="Researchy-GEO",  # Options: E-commerce, GEO-Bench, Researchy-GEO
    engine_llm="gemini",  # Options: gemini, gpt, claude
)
print(rewritten_text)
Extract content preference rules from a generative engine (example: Gemini on E-commerce):
python -m autogeo.extract_rules \
    --dataset E-commerce \
    --engine_llm gemini-2.5-flash-lite
Rules are saved to: data/E-commerce/rule_sets/gemini-2.5-flash-lite/.
Tips:
- Reduce concurrency if hitting API rate limits: --max_workers 4
- Test on a small subset: --num_examples 10
Use extracted or custom rules for rewriting:
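from autogeo.rewriters import rewrite_document
# Dataset and engine whose extracted rule set should be applied
dataset = "E-commerce"
engine_llm = "gemini-2.5-flash-lite"
rewritten_text = rewrite_document(
    document="Your document text here",
    rule_path=f"data/{dataset}/rule_sets/{engine_llm}/merged_rules.json",
)
Custom rules format: JSON file with root key "filtered_rules"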
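A custom rule file could be produced along the following lines. This is a minimal sketch: only the root key "filtered_rules" comes from the format described above. The assumption that its value is a list of plain-text rule strings, the example rules, and the file name custom_rules.json are illustrative, so compare the result against a rule set generated by autogeo.extract_rules before relying on it.

```python
import json
import tempfile
from pathlib import Path


def write_custom_rules(rules, path):
    """Write rules under the "filtered_rules" root key and return the path."""
    # Assumption: the value is a list of rule strings (not confirmed by the README).
    cleaned = [r.strip() for r in rules if r.strip()]
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"filtered_rules": cleaned}, indent=2), encoding="utf-8")
    return path


# Hypothetical example rules, for illustration only.
example_rules = [
    "State the main answer in the first paragraph.",
    "Support claims with specific, verifiable details.",
    "   ",  # blank entries are dropped
]

rule_file = write_custom_rules(example_rules, Path(tempfile.gettempdir()) / "custom_rules.json")
loaded = json.loads(rule_file.read_text(encoding="utf-8"))
print(loaded["filtered_rules"])
```

The resulting path can then be passed as rule_path to rewrite_document.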
AutoGEO provides a unified evaluation framework for all models.
Model types:
- vanilla: Original documents (baseline)
- autogeo_api: Rewritten documents generated by the prompt-based GEO model
- autogeo_mini: Rewritten documents generated by the cost-effective GEO model
Evaluate baseline:
python -m autogeo.evaluate \
    --model vanilla \
    --dataset E-commerce \
    --engine_llm gemini-2.5-flash-lite
Evaluate AutoGEOAPI:
python -m autogeo.evaluate \
    --model autogeo_api \
    --dataset E-commerce \
    --engine_llm gemini-2.5-flash-lite
Tips:
- Include GEU score: --need_geu_score
- Test subset: --num_examples 10
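When comparing several model types on the same dataset and engine, it may help to generate the evaluation commands programmatically. The sketch below only assembles argument lists from the flags shown above; the helper name build_eval_command is hypothetical, and it assumes --need_geu_score is a boolean switch that takes no value.

```python
import shlex


def build_eval_command(model, dataset, engine_llm, need_geu_score=False, num_examples=None):
    """Return the argument list for one `python -m autogeo.evaluate` run."""
    cmd = [
        "python", "-m", "autogeo.evaluate",
        "--model", model,
        "--dataset", dataset,
        "--engine_llm", engine_llm,
    ]
    if need_geu_score:
        cmd.append("--need_geu_score")  # assumed to be a flag without a value
    if num_examples is not None:
        cmd += ["--num_examples", str(num_examples)]
    return cmd


# Baseline and prompt-based model on a 10-example subset, with GEU scoring.
commands = [
    build_eval_command(model, "E-commerce", "gemini-2.5-flash-lite",
                       need_geu_score=True, num_examples=10)
    for model in ("vanilla", "autogeo_api")
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) from inside the activated autogeo environment.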
Train a cost-effective GEO model using reinforcement learning.
Step 1: Cold Start (Supervised Fine-Tuning)
bash run_cold_start.sh E-commerce
This uses the training data (data/E-commerce/RL/finetune.json) and starts LLaMA-Factory training. The checkpoint is saved to outputs/E-commerce/cold_start.
Step 2: GRPO Training
bash run_grpo.sh E-commerce
This trains the model using Group Relative Policy Optimization (GRPO). The checkpoint is saved to outputs/E-commerce/grpo.
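For readers new to Group Relative Policy Optimization: several outputs are sampled for the same input, and each output's reward is normalized against the other rewards in its group, so no separate value model is needed. The sketch below illustrates only that normalization step. The reward values are made up, the use of the population standard deviation is an assumption, and the function is not taken from run_grpo.sh or open-r1.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps avoids division by zero when all rewards in the group are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rewrites sampled from the same document.
rewards = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Rewrites that score above their group's mean receive a positive advantage and are reinforced; those below the mean are discouraged.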
If you encounter GRPO-related dependency errors, they are usually caused by version conflicts between LLaMA-Factory and open-r1. To resolve them, reinstall open-r1:
cd open-r1
GIT_LFS_SKIP_SMUDGE=1 pip install -e ".[dev]"
Step 3: Evaluation
python -m autogeo.evaluate \
    --model autogeo_mini \
    --model_path outputs/E-commerce/grpo \
    --dataset E-commerce \
    --engine_llm gemini-2.5-flash-lite
Datasets:
- Researchy-GEO — Academic dataset
- E-commerce — Commercial dataset
- GEO-Bench — Benchmark from GEO
Generative Engines:
- Gemini (e.g., gemini-2.5-flash-lite)
- GPT (e.g., gpt-4o-mini)
- Claude (e.g., claude-3-5-sonnet-20241022)
Metrics:
- GEO Score — Visibility (position, token count, citation frequency)
- GEU Score — Utility (citation quality, keypoint coverage, response quality)
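To make the visibility notion more concrete, here is a toy score that combines two of the signals listed above, position and token count: each response sentence contributes its token count, discounted by how late it appears, to the sources it cites. The exponential discount, the whitespace tokenization, the equal split among co-cited sources, and the example response are all assumptions for illustration; this is not the GEO score implemented in autogeo.evaluate.

```python
import math


def toy_visibility(sentences, source_id):
    """Share of position-discounted tokens attributed to `source_id`.

    `sentences` is a list of (text, cited_source_ids) pairs in response order.
    """
    n = len(sentences)
    total = 0.0
    credited = 0.0
    for pos, (text, cited) in enumerate(sentences):
        # Earlier sentences count more; longer sentences count more.
        weight = len(text.split()) * math.exp(-pos / n)
        total += weight
        if source_id in cited:
            credited += weight / len(cited)  # split credit among co-cited sources
    return credited / total if total else 0.0


# Hypothetical generative-engine response with per-sentence citations.
response = [
    ("Wireless earbuds vary widely in battery life.", [1]),
    ("Noise cancellation adds cost but improves call quality.", [1, 2]),
    ("Budget models often skip water resistance.", [3]),
]
scores = {sid: toy_visibility(response, sid) for sid in (1, 2, 3)}
print(scores)
```

In this example, source 1 scores highest because it is cited first and in two sentences, which is the kind of advantage a GEO rewrite aims to earn for its target document.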
We thank the authors of GEO, AutoRule, LLaMA-Factory, open-r1, and DeepResearchGym for their inspiring work. We also thank Qwen3 and DeepSeek-R1 for their excellent models.
If you find AutoGEO useful, please cite:
@article{wu2025generative,
title={What Generative Search Engines Like and How to Optimize Web Content Cooperatively},
author={Wu, Yujiang and Zhong, Shanshan and Kim, Yubin and Xiong, Chenyan},
journal={arXiv preprint arXiv:2510.11438},
year={2025}
}