A local AI-powered research paper management tool with multi-agent processing, interactive reading, and Notion sync.
- Batch Import — paste arXiv URLs, a Router Agent distributes them to sub-agents that run in parallel
- Auto-processing pipeline — LaTeX download → section parsing → note generation → auto-tagging → RAG indexing
- Reading view — side-by-side PDF + Agent Q&A panel with RAG-backed answers citing paper sections
- Smart notes — notes follow your custom `note_skill.md` template; update them from Q&A with one click
- Note editor — edit Markdown notes directly in the app, download as `.md` anytime
- Notion sync — push any note to a Notion database page with rich formatting, math, and metadata
- arXiv source-first parsing — expands multi-file LaTeX projects before note generation and Q&A
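
As an illustration of what "expanding a multi-file LaTeX project" can involve, here is a minimal sketch that inlines `\input` and `\include` targets recursively. The function name `expand_inputs` is hypothetical and this is not necessarily how PaperLab's downloader works; it also ignores commented-out `\input` lines and other include mechanisms such as `\subfile`.

```python
import re
from pathlib import Path
from typing import Optional

# Matches \input{file} and \include{file}; LaTeX allows omitting the .tex extension.
_INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def expand_inputs(tex_path: Path, root: Optional[Path] = None,
                  _seen: frozenset = frozenset()) -> str:
    """Return the text of tex_path with \\input/\\include targets inlined.

    Paths are resolved relative to `root` (the main file's directory by
    default), which mirrors how LaTeX resolves them at compile time.
    """
    tex_path = tex_path.resolve()
    root = (root or tex_path.parent).resolve()
    if tex_path in _seen:  # guard against circular includes
        return ""
    seen = _seen | {tex_path}
    text = tex_path.read_text(encoding="utf-8", errors="replace")

    def _inline(match: re.Match) -> str:
        target = root / match.group(1).strip()
        if target.suffix != ".tex":
            target = target.with_name(target.name + ".tex")
        if not target.exists():
            return match.group(0)  # leave unresolved references untouched
        return expand_inputs(target, root, seen)

    return _INPUT_RE.sub(_inline, text)
```

Calling `expand_inputs(Path("main.tex"))` on an extracted arXiv source directory would yield a single string that can then be split into sections.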
git clone <this-repo>
cd paperlab
chmod +x setup.sh start.sh
./setup.sh # installs Python venv + Node deps
# edit backend/.env and set OPENROUTER_API_KEY
./start.sh # launches API on :8000, frontend on :5173

# Required
OPENROUTER_API_KEY=sk-or-...
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
OPENROUTER_MODEL=openai/gpt-5.1-chat
OPENROUTER_AGENT_MODEL=
# Optional per-task overrides
OPENROUTER_NOTE_MODEL=
OPENROUTER_QA_MODEL=
OPENROUTER_TAG_MODEL=
# Optional — for Notion sync
NOTION_API_KEY=secret_...
NOTION_DATABASE_ID=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Edit `backend/skills/note_skill.md`. This is your personal note template.
- Use `{{title}}`, `{{authors}}`, `{{date}}`, `{{institution}}`, `{{arxiv_id}}`, `{{tags}}` for auto-fill
- Mark sections with `[AGENT: ...]` and the agent will fill them in on import
- Changes take effect on next import or reprocess (use the ↺ button in the table)
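
The two template conventions above can be sketched as follows. This is only an illustration under assumed names (`fill_metadata` and `agent_instructions` are hypothetical, not PaperLab functions): metadata placeholders are substituted directly, while `[AGENT: ...]` markers are collected so that a model can be asked to write each one.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{\{(\w+)\}\}")
AGENT_RE = re.compile(r"\[AGENT:\s*(.*?)\]", re.DOTALL)


def fill_metadata(template: str, meta: dict) -> str:
    """Replace {{key}} placeholders; unknown keys are left untouched."""
    def _sub(match: re.Match) -> str:
        value = meta.get(match.group(1))
        if value is None:
            return match.group(0)
        if isinstance(value, (list, tuple)):
            return ", ".join(map(str, value))
        return str(value)

    return PLACEHOLDER_RE.sub(_sub, template)


def agent_instructions(template: str) -> list:
    """Return the instruction text of every [AGENT: ...] marker, in order."""
    return [text.strip() for text in AGENT_RE.findall(template)]
```

For example, `fill_metadata("# {{title}}", {"title": "PPO"})` returns `"# PPO"`, and a template line `[AGENT: Summarize the method]` contributes `"Summarize the method"` to the instruction list.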
Run the CLI from the repo root with the backend virtualenv:
backend/.venv/bin/python backend/cli.py ingest https://arxiv.org/abs/1707.06347
backend/.venv/bin/python backend/cli.py read 1707.06347 --mode quick
backend/.venv/bin/python backend/cli.py ask 1707.06347 "What is the core contribution?"
backend/.venv/bin/python backend/cli.py note refresh 1707.06347 --focus method
backend/.venv/bin/python backend/cli.py note show 1707.06347

The CLI agent uses tool calling to inspect local paper assets (outline, sections, source_files, notes, and chat history) instead of stuffing the whole paper into one prompt.
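
To make the tool-calling idea concrete, here is a minimal sketch of a dispatcher for OpenAI-style tool calls, the format OpenRouter's chat completions API follows. The tool names (`get_outline`, `get_section`) and the in-memory `PAPER` dictionary are assumptions for illustration; the real CLI's tools and storage may look different.

```python
import json

# Hypothetical in-memory stand-in for one paper's local assets.
PAPER = {
    "outline": ["1 Introduction", "2 Method", "3 Experiments"],
    "sections": {"2 Method": "Placeholder text for the method section."},
}


def get_outline() -> list:
    """Return section titles so the model can decide what to read next."""
    return PAPER["outline"]


def get_section(title: str) -> str:
    """Return one section's text instead of the whole paper."""
    return PAPER["sections"].get(title, f"No section titled {title!r}")


TOOLS = {"get_outline": get_outline, "get_section": get_section}


def dispatch(tool_call: dict) -> str:
    """Run one tool call and return a JSON string to send back to the model."""
    fn = tool_call["function"]
    handler = TOOLS.get(fn["name"])
    if handler is None:
        return json.dumps({"error": f"unknown tool {fn['name']}"})
    args = json.loads(fn.get("arguments") or "{}")  # arguments arrive as a JSON string
    return json.dumps({"result": handler(**args)})
```

The model first asks for the outline, then requests only the sections it needs, which keeps each prompt small.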
- Go to https://www.notion.so/my-integrations
- Click + New integration, give it a name (e.g. "PaperLab")
- Select your workspace, enable Read content, Update content, Insert content
- Copy the Internal Integration Token → set as `NOTION_API_KEY` in `.env`
Create a new database in Notion with these properties:
| Property | Type | Notes |
|---|---|---|
| Name | Title | Paper title |
| Tags | Multi-select | e.g. LLM Algorithms, RL |
| arXiv ID | Rich text | e.g. 1707.06347 |
| Institution | Rich text | e.g. OpenAI |
| Type | Select | method / analysis / survey / benchmark |
| Importance | Number | 1–5 stars |
| arXiv Link | URL | auto-filled |
| Synced At | Date | auto-filled |
In your Notion database: Share → find your integration → Invite
From the database URL:
https://notion.so/yourworkspace/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX?v=...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is your Database ID
Set as NOTION_DATABASE_ID in .env.
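
If you prefer not to copy the ID by hand, a small helper can pull it out of the URL. This is a sketch assuming the ID is the 32-character hexadecimal string at the end of the last path segment, as in the example URL above; `extract_database_id` is a hypothetical name, not part of PaperLab.

```python
import re
from urllib.parse import urlparse


def extract_database_id(url: str) -> str:
    """Return the 32-hex-character database ID from a Notion database URL."""
    path = urlparse(url).path  # drops the ?v=... query string
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    # Tolerate dashed UUIDs and "Title-<id>" style segments by removing hyphens.
    match = re.search(r"([0-9a-fA-F]{32})$", last_segment.replace("-", ""))
    if not match:
        raise ValueError(f"No 32-character database ID found in {url!r}")
    return match.group(1).lower()
```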
In PaperLab, open any paper's reading view and click Sync to Notion. The note (Markdown → Notion blocks) plus all metadata will be pushed. Subsequent syncs update the existing page.
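
The Markdown → Notion blocks step can be pictured with the sketch below. It covers only headings, bullets, and paragraphs (no math or inline formatting), and the function names are hypothetical rather than taken from `notion_sync.py`. The block dictionaries follow the Notion API's block object shape, and the batching reflects Notion's documented limit of 100 child blocks per append request.

```python
def rich_text(content: str) -> list:
    """Wrap plain text in Notion's rich_text array format."""
    return [{"type": "text", "text": {"content": content}}]


def markdown_to_blocks(markdown: str) -> list:
    """Convert a small Markdown subset into Notion block objects."""
    prefixes = [("### ", "heading_3"), ("## ", "heading_2"), ("# ", "heading_1"),
                ("- ", "bulleted_list_item"), ("* ", "bulleted_list_item")]
    blocks = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines only separate blocks
        kind, text = "paragraph", stripped
        for prefix, block_type in prefixes:
            if stripped.startswith(prefix):
                kind, text = block_type, stripped[len(prefix):]
                break
        blocks.append({"object": "block", "type": kind,
                       kind: {"rich_text": rich_text(text)}})
    return blocks


def batches(blocks: list, size: int = 100):
    """Yield chunks small enough for one append-children request each."""
    for start in range(0, len(blocks), size):
        yield blocks[start:start + size]
```

A note with 250 blocks would therefore need three requests, which is one reason large notes take several seconds to sync.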
paperlab/
├── backend/
│   ├── main.py                 # FastAPI app (REST + WebSocket)
│   ├── config.py               # Settings via pydantic-settings
│   ├── agents/
│   │   ├── router.py           # Main Router Agent (batch orchestration)
│   │   ├── downloader.py       # arXiv LaTeX download + clean + parse
│   │   ├── note_gen.py         # Note Generation Agent (uses note_skill.md)
│   │   ├── rag.py              # ChromaDB vector store for Q&A
│   │   ├── qa_agent.py         # Reading Q&A Agent (RAG + streaming)
│   │   └── notion_sync.py      # Notion export (Markdown → Blocks)
│   ├── db/
│   │   └── database.py         # SQLAlchemy async + models (Paper, Note, Chat)
│   ├── skills/
│   │   └── note_skill.md       # Your personal note template ← edit this!
│   └── requirements.txt
├── frontend/
│   └── src/
│       ├── App.tsx             # Root + WebSocket listener
│       ├── store/              # Zustand global state
│       ├── api/client.ts       # API calls
│       ├── views/
│       │   ├── TableView.tsx   # Paper list with filters
│       │   └── ReadingView.tsx # Split: PDF + Chat/Notes
│       └── components/
│           ├── Sidebar.tsx
│           ├── TopBar.tsx
│           ├── ImportModal.tsx
│           └── SettingsModal.tsx
├── notes/                      # Markdown note files (also usable in Obsidian)
├── data/
│   ├── latex/                  # Expanded LaTeX + preserved source assets per paper
│   ├── chroma/                 # ChromaDB vector indices
│   └── paperlab.db             # SQLite database
├── setup.sh
└── start.sh
URL Input
│
▼
Router Agent ← validates + deduplicates arXiv IDs
│
├─[concurrent, max 3]─►
│
▼
Download Agent ← fetches tar.gz from arxiv.org/src/{id}
│ expands multi-file .tex project, preserves informative source assets
▼
Meta Agent ← OpenRouter model extracts: title, authors, abstract
│ auto-assigns: tags, importance, type, institution
▼
Note Agent ← reads note_skill.md template
│ fills [AGENT:...] sections using LaTeX content
│ saves to notes/{arxiv_id}.md + DB
▼
RAG Agent ← splits LaTeX into sections
embeds with all-MiniLM-L6-v2 (local)
stores in per-paper ChromaDB collection
ready for Q&A
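
The Router Agent's role in the pipeline above (validate, deduplicate, then run at most three papers at once) can be sketched with `asyncio`. This is an assumption-laden illustration, not the contents of `router.py`: `process_paper` is a placeholder for the download → meta → note → RAG chain, and the regex only recognizes new-style arXiv IDs.

```python
import asyncio
import re

# New-style arXiv identifiers such as 1707.06347 or 2301.12345v2.
ARXIV_ID_RE = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def extract_ids(urls: list) -> list:
    """Validate inputs and drop duplicates while preserving order."""
    seen, ids = set(), []
    for url in urls:
        match = ARXIV_ID_RE.search(url)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids


async def process_paper(arxiv_id: str) -> str:
    """Placeholder for the per-paper pipeline (hypothetical)."""
    await asyncio.sleep(0)
    return f"done:{arxiv_id}"


async def run_batch(urls: list, max_concurrent: int = 3) -> list:
    """Process all valid papers, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(arxiv_id: str) -> str:
        async with semaphore:
            return await process_paper(arxiv_id)

    return await asyncio.gather(*(worker(i) for i in extract_ids(urls)))
```

A semaphore keeps the code simple while bounding load on arXiv and the model API; a worker pool reading from a queue would be an equally valid design.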
User Question
│
▼
PaperRAG.retrieve() ← embed question, top-5 cosine similar chunks
│
▼
QA Agent ← OpenRouter model reads: abstract + retrieved sections + chat history
│ answers with §section citations
▼
User reads answer
│
▼ (optional)
Update Note Agent ← rewrites the 个人思考 (Personal Thoughts) section based on entire Q&A session
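
The retrieval step in the flow above amounts to ranking stored chunks by cosine similarity to the question embedding and keeping the top five. In PaperLab this is delegated to ChromaDB; the plain-Python sketch below only illustrates the idea, with hypothetical names and toy two-dimensional vectors standing in for real embeddings.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list, chunks: list, k: int = 5) -> list:
    """Return titles of the k chunks most similar to the query.

    chunks is a list of (section_title, embedding) pairs.
    """
    scored = [(cosine(query_vec, emb), title) for title, emb in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]
```

The returned section titles are what allow the QA Agent to cite sections in its answer.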
The notes/ directory contains plain Markdown files named {arxiv_id}.md.
Point Obsidian at this folder as a vault — all notes are immediately usable,
editable, and linkable from Obsidian, and changes sync back to PaperLab on next open.
- arXiv papers without source files will fail with a clear error message; use the ↺ reprocess button after the source is available
- PDF viewing uses the arXiv PDF iframe — no annotation/highlight persistence yet
- The ChromaDB embedder (`all-MiniLM-L6-v2`) downloads ~90MB on first use
- Notion API rate limits: large notes (200+ blocks) may take 5–10s to sync
MIT