Advancing Universal Scientific Research Intelligence via Evolving Polymathic Synthesis
- [2026-03-14] UniScientist-30B-A3B is now available for download on HuggingFace and ModelScope.
- [2026-03-11] We release the full inference trajectories of UniScientist on the FrontierScience-Research benchmark. See the `trajectory/` folder for details.
UniScientist advances universal scientific research intelligence through a unified paradigm. By casting LLMs as cross-disciplinary generators and human experts as high-precision verifiers, it produces research-grade data spanning 50+ scientific disciplines with structured, rubric-based supervision. A 30B-parameter model trained on this data achieves highly competitive performance across five research benchmarks. Read the blog post first for an overview.
UniScientist formalizes open-ended scientific research as Active Evidence Integration and Model Abduction, and proposes the Evolving Polymathic Synthesis paradigm for synthesizing high-quality research problems and evaluation rubrics at scale.
The approach comprises three key components:
- Evolving Polymathic Synthesis — A human-LLM collaborative data paradigm that generates research-grade scientific problems across 50+ disciplines, each accompanied by co-evolved rubrics refined through completeness, consistency, and distinguishability checks.
- Agentic Research Loop — The model conducts scientific research by iteratively acquiring evidence, deriving formally justified results, and updating hypotheses via abductive inference, using tools including `web_search`, `google_scholar`, `page_fetching`, and `code_interpreter`.
- Report Aggregation — Given multiple candidate research reports, the model learns to synthesize a consolidated report that integrates the best elements, enabling research quality to self-evolve over time.
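The Agentic Research Loop can be pictured as a dispatch cycle over the four named tools. The sketch below is illustrative only: the stub functions are hypothetical stand-ins (the real implementations live under `tools/`), and the JSON tool-call shape is an assumption, not the project's actual protocol.

```python
import json

# Hypothetical stand-ins for the four tools named above; the real
# implementations live in tools/tool_search.py, tool_scholar.py,
# tool_visit.py, and tool_code.py.
TOOLS = {
    "web_search": lambda query: f"search results for {query!r}",
    "google_scholar": lambda query: f"scholar results for {query!r}",
    "page_fetching": lambda url: f"summary of {url}",
    "code_interpreter": lambda code: repr(eval(code)),  # illustration only; never eval untrusted code
}

def run_step(tool_call_json: str) -> str:
    """Dispatch one model-emitted tool call,
    e.g. '{"name": "web_search", "args": {"query": "..."}}'."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["args"])
```

In the full loop, the model would read each tool result as new evidence, derive intermediate results, and update its working hypothesis before emitting the next call.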
UniScientist-30B-A3B achieves top-tier performance across all five benchmarks, surpassing or matching much larger proprietary models:
- FrontierScience-Research: 28.3 (surpassing Claude Opus 4.5 at 17.5 and GPT-5.2 at 25.2), reaching 33.3 with test-time scaling (Aggr@8)
- FrontierScience-Olympiad: 66.0 without tools, 71.0 with tools + Aggr@8 (matching Claude Opus 4.5)
- DeepResearch Bench: 46.0 (vs. Perplexity Deep Research 42.3, OpenAI Deep Research 47.0)
- DeepResearch Bench II: 48.0 (surpassing OpenAI Deep Research 45.4, Gemini-3-Pro Deep Research 44.6)
- ResearchRubrics: 59.9 (comparable to OpenAI Deep Research 59.7, Gemini Deep Research 61.5)
```
UniScientist/
├── local_deploy.sh                  # Step 1: Deploy local LLM via vLLM
├── inference_local_qwen.sh          # Step 2: Run agentic inference (repeat for multiple rollouts)
├── inference_local_qwen.py          # Agentic inference engine
├── inference_local_aggregate.py     # Step 3: Aggregate multiple rollouts into a final report
├── tools/
│   ├── tool_search.py               # Google web search (via Serper API)
│   ├── tool_scholar.py              # Google Scholar search (via Serper API)
│   ├── tool_visit.py                # Webpage reader (via Jina Reader API) with LLM summarization
│   └── tool_code.py                 # Python code interpreter
├── trajectory/
│   ├── uniscientist_research_traj.jsonl             # Single-rollout trajectories (with tools)
│   ├── uniscientist_research_no_tool_traj.jsonl     # Single-rollout trajectories (without tools)
│   └── uniscientist_research_aggregate8_traj.jsonl  # Aggregated trajectories (Aggr@8)
├── data/                            # Place your input data here (see Data Format below)
├── requirements.txt
└── .gitignore
```
```bash
pip install -r requirements.txt
```
Edit `local_deploy.sh` to set `MODEL_PATH` to your model weights, then:
```bash
bash local_deploy.sh
```
This starts a vLLM OpenAI-compatible server on port 8000. Wait until the server is ready before proceeding.
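One way to wait for the server programmatically is to poll vLLM's standard `/health` endpoint. This is a minimal sketch, assuming the default port 8000 from `local_deploy.sh`; `wait_for_server` is a hypothetical helper, not part of the repo.

```python
import time
import urllib.request
import urllib.error

def wait_for_server(base_url: str, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll the vLLM /health endpoint until it answers, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry after a short pause
        time.sleep(interval)
    return False
```

Usage: `wait_for_server("http://localhost:8000")` returns `True` once the server is ready, `False` if the timeout elapses first.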
Edit inference_local_qwen.sh to fill in your API keys and configuration, then run it multiple times to collect diverse rollouts:
```bash
# Run N times to collect N rollouts
bash inference_local_qwen.sh
bash inference_local_qwen.sh
bash inference_local_qwen.sh
```
Each run produces (or appends to) a `.jsonl` output file named `<STORED_MODEL_NAME>_<BENCHMARK>.jsonl`.
Merge multiple rollout results into a single comprehensive report:
```bash
python inference_local_aggregate.py \
    --model-name "UniScientist-30B-A3B" \
    --data-paths rollout_1.jsonl rollout_2.jsonl rollout_3.jsonl \
    --benchmark research \
    --rollout-num 1 \
    --llm-max-concurrency 32
```
The following API keys are required for the tool suite:
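Conceptually, the aggregation step consumes several rollout files and clusters records that answer the same question before synthesizing a consolidated report. The sketch below illustrates only that grouping stage; the per-record schema (a `problem` field per line) is an assumption extrapolated from the Data Format section, and `group_rollouts` is a hypothetical helper, not the repo's actual code.

```python
import json
from collections import defaultdict
from pathlib import Path

def group_rollouts(paths):
    """Group rollout records from several .jsonl files by their problem text.

    Assumes each line carries at least a "problem" field (see Data Format);
    all other fields are passed through untouched.
    """
    clusters = defaultdict(list)
    for path in paths:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            if line.strip():
                record = json.loads(line)
                clusters[record["problem"]].append(record)
    return clusters
```

Each resulting cluster is what a single aggregation pass (`--rollout-num`) would then condense into one report.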
| Key | Service | Description |
|---|---|---|
| `SERPER_KEY_ID` | Serper | Google web search & Google Scholar |
| `JINA_API_KEYS` | Jina Reader | Webpage content reading |
| `OPENROUTER_API_KEY` | OpenRouter | LLM-based webpage summarization |
Place your input data in the `data/` directory as `.jsonl` files:
```json
{"problem": "Your research question here", "answer": "Ground truth answer / rubrics (optional)"}
```
| Argument | Required | Default | Description |
|---|---|---|---|
| `--model-name` | Yes | - | Model identifier for naming the output file |
| `--data-paths` | Yes | - | One or more rollout `.jsonl` files to aggregate |
| `--benchmark` | No | `research` | Benchmark name for naming the output file |
| `--rollout-num` | No | `1` | Number of aggregation passes per question cluster |
| `--local-base-url` | No | `http://localhost:8000/v1` | vLLM server endpoint |
| `--output-path` | No | auto-generated | Custom output file path |
| `--llm-max-concurrency` | No | `32` | Max concurrent LLM requests |
If you find UniScientist useful in your research, please cite:
```bibtex
@misc{unipat2026uniscientist,
  title  = {UniScientist: Advancing Universal Scientific Research Intelligence},
  author = {Baixuan Li and Jialong Wu and Yida Zhao and Wendong Xu and Xuanzhong Chen and Huifeng Yin and Liang Chen and Wentao Zhang and Kuan Li},
  year   = {2026},
  url    = {https://unipat.ai/blog/UniScientist}
}
```
We are continuously expanding the Universal Scientific Research Dataset to cover additional disciplines and research paradigms. We welcome collaborations with research teams interested in advancing scientific research intelligence. Reach out at contact@unipat.ai.