[ICLR'26] AutoGEO: a framework that automatically learns generative engine preferences and rewrites web content for more traction.

AutoGEO

Project Page | Paper | Demo

AutoGEO is a framework for Automatic Generative Engine Optimization (GEO) that helps web content gain higher visibility in LLM-generated answers. Our paper has been accepted by ICLR 2026.

📄 Paper: "What Generative Search Engines Like and How to Optimize Web Content Cooperatively"
👥 Authors: Yujiang Wu*, Shanshan Zhong*, Yubin Kim, Chenyan Xiong (*Equal contribution)

🔍 Overview

AutoGEO automatically extracts content preference rules from generative engines and rewrites documents to maximize visibility while preserving accuracy.

How GEO models work:

  • Input: Target document
  • Output: Rewritten document with higher visibility in generative engine (GE) responses
  • Goal: Maximize visibility without harming GE utility

Three core components:

  1. Rule Extraction — Automatically mines content preferences from GEs.
  2. AutoGEOAPI — Prompt-based GEO model using extracted rules
  3. AutoGEOMini — Cost-effective GEO model trained with reinforcement learning

Evaluation metrics: GEO score (visibility) and GEU score (utility)

🚀 Installation

For using AutoGEOAPI and rule extraction:

# Clone the repository
git clone --recursive https://github.com/cxcscmu/AutoGEO
cd AutoGEO

# Run installation script
bash install.sh

# Activate environment
conda activate autogeo

# Configure API keys (required)
nano keys.env  # Add your API keys
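
Before running anything that calls a generative engine, it can help to confirm that keys.env defines the keys you expect. The sketch below assumes keys.env uses plain KEY=VALUE lines. The helper names and the key name in the usage comment are hypothetical placeholders, so check keys.env in the repository for the real variable names.

```python
from pathlib import Path


def read_env_file(path):
    """Parse KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path, required):
    """Return the required names that are absent or empty in the env file."""
    values = read_env_file(path)
    return [name for name in required if not values.get(name)]


# Usage (the key name is a placeholder, not taken from the repository):
#   print(missing_keys("keys.env", ["GEMINI_API_KEY"]))
```

An empty list means every name you asked about has a non-empty value.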

Optional: For training AutoGEOMini models:

# First complete the base installation above, then:
conda activate autogeo
bash install_mini.sh

⚠️ Note: AutoGEOMini requires:

  • 2 × CUDA-compatible GPUs (A100 40GB+ recommended)
  • ~4h for SFT and ~48h for GRPO on Researchy-GEO

⚡ Quick Start

Rewrite a document using AutoGEOAPI:

from autogeo.rewriters import rewrite_document

rewritten_text = rewrite_document(
    document="AutoGEO automatically extracts content preference rules from generative engines and rewrites documents to maximize visibility while preserving accuracy.",
    dataset="Researchy-GEO",   # Options: E-commerce, GEO-Bench, Researchy-GEO
    engine_llm="gemini"        # Options: gemini, gpt, claude
)

print(rewritten_text)
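
To rewrite several pages in one pass, the call above can be wrapped in a small loop. This is a minimal sketch, not part of the AutoGEO package: `rewrite_many` is a hypothetical helper, and it takes the rewriting function as an argument so it can be tried without API access. It assumes `rewrite_document` accepts the keyword arguments shown in the Quick Start.

```python
def rewrite_many(documents, rewrite_fn, dataset="Researchy-GEO", engine_llm="gemini"):
    """Rewrite each document, keeping the original text when a call fails."""
    results = {}
    for doc_id, text in documents.items():
        try:
            results[doc_id] = rewrite_fn(
                document=text, dataset=dataset, engine_llm=engine_llm
            )
        except Exception as exc:  # e.g. an API error; fall back to the original
            print(f"rewrite failed for {doc_id}: {exc}")
            results[doc_id] = text
    return results


# Usage inside the autogeo environment (needs configured API keys):
#   from autogeo.rewriters import rewrite_document
#   rewritten = rewrite_many({"about": "Your document text here"}, rewrite_document)
```

Falling back to the original text keeps one failed request from discarding the rest of the batch.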

🧩 Rule Extraction

Extract content preference rules from a generative engine (example: Gemini on E-commerce):

python -m autogeo.extract_rules \
    --dataset E-commerce \
    --engine_llm gemini-2.5-flash-lite

Rules are saved to: data/E-commerce/rule_sets/gemini-2.5-flash-lite/

Tips:

  • Reduce concurrency if hitting API rate limits: --max_workers 4
  • Test on a small subset: --num_examples 10
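
After extraction finishes, you may want to inspect the rules before using them. The sketch below assumes the extracted rule set is written as merged_rules.json inside the output directory and that it uses the same "filtered_rules" root key that this README describes for custom rules. Both are assumptions about the output, and `rules_path` and `load_rules` are hypothetical helpers.

```python
import json
from pathlib import Path


def rules_path(dataset, engine_llm, root="data"):
    """Build the assumed location of the merged rule file."""
    return Path(root) / dataset / "rule_sets" / engine_llm / "merged_rules.json"


def load_rules(path):
    """Return whatever is stored under the "filtered_rules" root key."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["filtered_rules"]


# Usage:
#   path = rules_path("E-commerce", "gemini-2.5-flash-lite")
#   for rule in load_rules(path):
#       print(rule)
```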

Use extracted or custom rules for rewriting:

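from autogeo.rewriters import rewrite_document

# Example values; point these at the rule set you extracted
dataset = "E-commerce"
engine_llm = "gemini-2.5-flash-lite"

rewritten_text = rewrite_document(
    document="Your document text here",
    rule_path=f"data/{dataset}/rule_sets/{engine_llm}/merged_rules.json"
)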

Custom rules format: JSON file with root key "filtered_rules"
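
A custom rule file can be written with a few lines of Python. The README only specifies the root key "filtered_rules"; this sketch assumes the value under it is a list of rule strings, so compare against an extracted rule file before relying on that shape. `write_custom_rules` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def write_custom_rules(rules, path):
    """Save rules under the "filtered_rules" root key and return the path."""
    cleaned = [r.strip() for r in rules if r.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty rule is required")
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # Assumed schema: {"filtered_rules": ["rule one", "rule two", ...]}
    out.write_text(json.dumps({"filtered_rules": cleaned}, indent=2), encoding="utf-8")
    return out


# Usage:
#   path = write_custom_rules(["State key facts early."], "my_rules/custom_rules.json")
#   rewrite_document(document="Your document text here", rule_path=str(path))
```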

🧩 AutoGEOAPI

AutoGEO provides a unified evaluation framework for all models.

Model types:

  • vanilla — Original documents (baseline)
  • autogeo_api — Rewritten documents generated by prompt-based GEO model
  • autogeo_mini — Rewritten documents generated by cost-effective GEO model

Evaluate baseline:

python -m autogeo.evaluate \
    --model vanilla \
    --dataset E-commerce \
    --engine_llm gemini-2.5-flash-lite

Evaluate AutoGEOAPI:

python -m autogeo.evaluate \
    --model autogeo_api \
    --dataset E-commerce \
    --engine_llm gemini-2.5-flash-lite

Tips:

  • Include GEU score: --need_geu_score
  • Test subset: --num_examples 10
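
To compare the baseline and AutoGEOAPI under identical settings, the two commands above can be generated from one place. This is a minimal sketch: the helper names are hypothetical, and it assumes `--need_geu_score` is a boolean switch that takes no value.

```python
import sys


def evaluate_command(model, dataset, engine_llm, need_geu_score=False, num_examples=None):
    """Build the argv list for `python -m autogeo.evaluate`."""
    cmd = [
        sys.executable, "-m", "autogeo.evaluate",
        "--model", model,
        "--dataset", dataset,
        "--engine_llm", engine_llm,
    ]
    if need_geu_score:
        cmd.append("--need_geu_score")  # assumed to be a flag without a value
    if num_examples is not None:
        cmd += ["--num_examples", str(num_examples)]
    return cmd


def comparison_commands(dataset, engine_llm, **options):
    """One command per model type, sharing dataset, engine, and options."""
    return [
        evaluate_command(model, dataset, engine_llm, **options)
        for model in ("vanilla", "autogeo_api")
    ]


# Usage:
#   import subprocess
#   for cmd in comparison_commands("E-commerce", "gemini-2.5-flash-lite",
#                                  need_geu_score=True, num_examples=10):
#       subprocess.run(cmd, check=True)
```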

🧩 AutoGEOMini

Train a cost-effective GEO model using reinforcement learning.

Step 1: Cold Start (Supervised Fine-Tuning)

bash run_cold_start.sh E-commerce

Uses the training data (data/E-commerce/RL/finetune.json) and starts LLaMA-Factory training. The checkpoint is saved to outputs/E-commerce/cold_start.

Step 2: GRPO Training

bash run_grpo.sh E-commerce

Trains the model using Group Relative Policy Optimization. Checkpoint saved to outputs/E-commerce/grpo.

If you encounter GRPO-related dependency errors, they are usually caused by version conflicts between LLaMA-Factory and open-r1. To resolve them, reinstall open-r1:

cd open-r1
GIT_LFS_SKIP_SMUDGE=1 pip install -e ".[dev]"

Step 3: Evaluation

python -m autogeo.evaluate \
    --model autogeo_mini \
    --model_path outputs/E-commerce/grpo \
    --dataset E-commerce \
    --engine_llm gemini-2.5-flash-lite

📚 Supported Datasets & Engines & Metrics

Datasets:

  • Researchy-GEO — Academic dataset
  • E-commerce — Commercial dataset
  • GEO-Bench — Benchmark from GEO

Generative Engines:

  • Gemini (e.g., gemini-2.5-flash-lite)
  • GPT (e.g., gpt-4o-mini)
  • Claude (e.g., claude-3-5-sonnet-20241022)

Metrics:

  • GEO Score — Visibility (position, token count, citation frequency)
  • GEU Score — Utility (citation quality, keypoint coverage, response quality)
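
To make the visibility idea concrete, here is a rough illustration of a word-count share, one simple reading of the "token count" signal. It is not the repository's scoring code, and it ignores position and citation frequency. The input format (a list of sentence and cited-source pairs) and the function name are assumptions made for this sketch.

```python
from collections import defaultdict


def word_share(sentences):
    """sentences: list of (text, [cited source ids]).

    Returns source id -> fraction of cited words credited to that source.
    """
    credit = defaultdict(float)
    for text, sources in sentences:
        unique = set(sources)
        if not unique:
            continue  # uncited sentences give no credit to any source
        words = len(text.split())
        for source in unique:
            # Split a sentence's words equally among the sources it cites.
            credit[source] += words / len(unique)
    total = sum(credit.values())
    return {s: c / total for s, c in credit.items()} if total else {}


# Usage:
#   response = [("AutoGEO rewrites documents.", [1]), ("It also extracts rules.", [1, 2])]
#   print(word_share(response))
```

Under this reading, a rewritten document gains visibility when more of the engine's answer is attributed to it.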

🙏 Acknowledgements

We thank the authors of GEO, AutoRule, LLaMA-Factory, open-r1, and DeepResearchGym for their inspiring work. We also thank the Qwen3 and DeepSeek-R1 teams for their excellent models.

📖 Citation

If you find AutoGEO useful, please cite:

@article{wu2025generative,
  title={What Generative Search Engines Like and How to Optimize Web Content Cooperatively},
  author={Wu, Yujiang and Zhong, Shanshan and Kim, Yubin and Xiong, Chenyan},
  journal={arXiv preprint arXiv:2510.11438},
  year={2025}
}
