📄 PaperLab

A local AI-powered research paper management tool with multi-agent processing, interactive reading, and Notion sync.

Features

Batch import: paste arXiv URLs and a Router Agent distributes them to sub-agents that run in parallel
Auto-processing pipeline: LaTeX download → section parsing → note generation → auto-tagging → RAG indexing
Reading view: side-by-side PDF and Agent Q&A panel, with RAG-backed answers that cite paper sections
Smart notes: notes follow your custom note_skill.md template; update them from Q&A with one click
Note editor: edit Markdown notes directly in the app and download them as .md at any time
Notion sync: push any note to a Notion database page with rich formatting, math, and metadata
arXiv source-first parsing: expands multi-file LaTeX projects before note generation and Q&A

Quick Start

git clone <this-repo>
cd paperlab
chmod +x setup.sh start.sh
./setup.sh          # installs Python venv + Node deps
# edit backend/.env and set OPENROUTER_API_KEY
./start.sh          # launches API on :8000, frontend on :5173

Open http://localhost:5173

Configuration

`backend/.env`

# Required
OPENROUTER_API_KEY=sk-or-...
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
OPENROUTER_MODEL=openai/gpt-5.1-chat
OPENROUTER_AGENT_MODEL=

# Optional per-task overrides
OPENROUTER_NOTE_MODEL=
OPENROUTER_QA_MODEL=
OPENROUTER_TAG_MODEL=

# Optional — for Notion sync
NOTION_API_KEY=secret_...
NOTION_DATABASE_ID=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
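
The per-task variables are left empty by default, which suggests they fall back to OPENROUTER_MODEL when unset. A minimal sketch of that resolution order follows. The fallback behaviour, the `resolve_model` name, and the task keys are assumptions for illustration and are not taken from config.py; OPENROUTER_AGENT_MODEL is left out because its role is not described here.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from a task name to its override variable.
TASK_OVERRIDES = {
    "note": "OPENROUTER_NOTE_MODEL",
    "qa": "OPENROUTER_QA_MODEL",
    "tag": "OPENROUTER_TAG_MODEL",
}


def resolve_model(task: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model for a task, falling back to OPENROUTER_MODEL.

    Assumption: an empty or missing override means "use the default".
    """
    env = os.environ if env is None else env
    override = env.get(TASK_OVERRIDES.get(task, ""), "").strip()
    if override:
        return override
    default = env.get("OPENROUTER_MODEL", "").strip()
    if not default:
        raise ValueError("OPENROUTER_MODEL is not set")
    return default
```

With only OPENROUTER_QA_MODEL set, `resolve_model("qa")` would return the override while `resolve_model("note")` would return the default model.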

Customise your Note Skill

Edit backend/skills/note_skill.md. This is your personal note template.

Use {{title}}, {{authors}}, {{date}}, {{institution}}, {{arxiv_id}}, {{tags}} for auto-fill
Mark sections with [AGENT: ...] and the agent will fill them in on import
Changes take effect on next import or reprocess (use the ↺ button in the table)
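
To make the template syntax concrete, here is a rough sketch of how the `{{...}}` placeholders and `[AGENT: ...]` markers could be handled. The function names and regular expressions are illustrative assumptions, not the code in note_gen.py.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")
AGENT_MARK = re.compile(r"\[AGENT:\s*(.*?)\]", re.DOTALL)


def fill_placeholders(template: str, meta: dict) -> str:
    """Replace {{name}} with meta[name]; unknown names are left untouched."""
    def substitute(match):
        key = match.group(1)
        return str(meta[key]) if key in meta else match.group(0)
    return PLACEHOLDER.sub(substitute, template)


def agent_instructions(template: str) -> list:
    """Return the instruction text of each [AGENT: ...] marker, in order."""
    return [text.strip() for text in AGENT_MARK.findall(template)]


template = (
    "# {{title}}\n"
    "Authors: {{authors}}\n\n"
    "## Method\n"
    "[AGENT: Summarise the method in 3 bullets]\n"
)
filled = fill_placeholders(template, {"title": "Example Paper", "authors": "A. Author"})
```

In this sketch the metadata is filled first, and the remaining `[AGENT: ...]` instructions are what a note-generation step would hand to the model.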

CLI Reading Agent

Run the CLI from the repo root with the backend virtualenv:

backend/.venv/bin/python backend/cli.py ingest https://arxiv.org/abs/1707.06347
backend/.venv/bin/python backend/cli.py read 1707.06347 --mode quick
backend/.venv/bin/python backend/cli.py ask 1707.06347 "What is the core contribution?"
backend/.venv/bin/python backend/cli.py note refresh 1707.06347 --focus method
backend/.venv/bin/python backend/cli.py note show 1707.06347

The CLI agent uses tool calling to inspect local paper assets (outline, sections, source_files, notes, and chat history) instead of stuffing the whole paper into one prompt.
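
The tool-calling idea can be sketched as a small registry that maps tool names to functions over a paper's local assets. Everything below (the tool names, the `PAPER` structure, and the JSON call format) is hypothetical and only illustrates the pattern; the real tools in backend/cli.py may look different.

```python
import json

# Hypothetical in-memory stand-in for one paper's local assets.
PAPER = {
    "outline": ["Introduction", "Method", "Experiments"],
    "sections": {"Method": "Text of the method section."},
}


def get_outline(paper):
    """Return the list of section titles."""
    return paper["outline"]


def get_section(paper, name):
    """Return one section's text, or a short message if it does not exist."""
    return paper["sections"].get(name, f"no section named {name!r}")


TOOLS = {"get_outline": get_outline, "get_section": get_section}


def dispatch(paper, tool_call_json: str) -> dict:
    """Run one tool call of the form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return {"error": f"unknown tool {call['name']}"}
    return {"result": tool(paper, **call.get("arguments", {}))}
```

The model only ever sees the small results it asks for, such as the outline or a single section, which is what keeps the prompt short.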

Notion Integration

1. Create a Notion Integration

Go to https://www.notion.so/my-integrations
Click + New integration, give it a name (e.g. "PaperLab")
Select your workspace, enable Read content, Update content, Insert content
Copy the Internal Integration Token → set as NOTION_API_KEY in backend/.env

2. Set up your Notion Database

Create a new database in Notion with these properties:

Property	Type	Notes
Name	Title	Paper title
Tags	Multi-select	e.g. LLM algorithms, RL
arXiv ID	Rich text	e.g. 1707.06347
Institution	Rich text	e.g. OpenAI
Type	Select	method / analysis / survey / benchmark
Importance	Number	1–5 stars
arXiv Link	URL	auto-filled
Synced At	Date	auto-filled
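
For reference, the properties above correspond to a Notion page `properties` payload roughly like the one built below. The property value shapes follow the public Notion API; the `build_properties` helper and the keys of the `paper` dict are assumptions for illustration, not the code in notion_sync.py.

```python
from datetime import datetime, timezone


def build_properties(paper: dict) -> dict:
    """Build a Notion `properties` payload matching the table above."""
    def text(value):
        # Notion rich text is a list of text objects.
        return [{"type": "text", "text": {"content": value}}]

    return {
        "Name": {"title": text(paper["title"])},
        "Tags": {"multi_select": [{"name": tag} for tag in paper["tags"]]},
        "arXiv ID": {"rich_text": text(paper["arxiv_id"])},
        "Institution": {"rich_text": text(paper["institution"])},
        "Type": {"select": {"name": paper["type"]}},
        "Importance": {"number": paper["importance"]},
        "arXiv Link": {"url": f"https://arxiv.org/abs/{paper['arxiv_id']}"},
        "Synced At": {"date": {"start": datetime.now(timezone.utc).isoformat()}},
    }
```

Property names must match the database columns exactly, including capitalisation and spaces.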

3. Share database with integration

In your Notion database: Share → find your integration → Invite

4. Get the Database ID

From the database URL:

https://notion.so/yourworkspace/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX?v=...
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                This is your Database ID

Set as NOTION_DATABASE_ID in backend/.env.
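
If you prefer not to copy the ID by hand, a small helper can pull the 32-character hex string out of the URL. This is a sketch under the assumption that the ID is the trailing 32 hex characters of the last path segment; `database_id_from_url` is a hypothetical name and not part of PaperLab.

```python
import re
from urllib.parse import urlparse


def database_id_from_url(url: str) -> str:
    """Extract the 32-character hex database ID from a Notion database URL."""
    # The query string (?v=...) identifies the view, so only the path is used.
    last_segment = urlparse(url).path.rstrip("/").split("/")[-1]
    match = re.search(r"[0-9a-fA-F]{32}$", last_segment.replace("-", ""))
    if match is None:
        raise ValueError(f"no database ID found in {url!r}")
    return match.group(0).lower()
```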

5. Sync

In PaperLab, open any paper's reading view and click Sync to Notion. The note (Markdown → Notion blocks) plus all metadata will be pushed. Subsequent syncs update the existing page.
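
The Markdown → Notion blocks step can be pictured with a deliberately small converter. It handles only headings, bullet items, and paragraphs; inline formatting and math are ignored. The block shapes follow the public Notion API, while `markdown_to_blocks` is a simplified, hypothetical stand-in for what notion_sync.py does.

```python
def _rich_text(content: str) -> list:
    return [{"type": "text", "text": {"content": content}}]


def _block(kind: str, content: str) -> dict:
    # Notion block objects carry their payload under a key named after the type.
    return {"object": "block", "type": kind, kind: {"rich_text": _rich_text(content)}}


def markdown_to_blocks(markdown: str) -> list:
    """Convert a small Markdown subset to a list of Notion block objects."""
    blocks = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate blocks
        if line.startswith("### "):
            blocks.append(_block("heading_3", line[4:]))
        elif line.startswith("## "):
            blocks.append(_block("heading_2", line[3:]))
        elif line.startswith("# "):
            blocks.append(_block("heading_1", line[2:]))
        elif line.startswith(("- ", "* ")):
            blocks.append(_block("bulleted_list_item", line[2:]))
        else:
            blocks.append(_block("paragraph", line))
    return blocks
```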

Architecture

paperlab/
├── backend/
│   ├── main.py              # FastAPI app (REST + WebSocket)
│   ├── config.py            # Settings via pydantic-settings
│   ├── agents/
│   │   ├── router.py        # Main Router Agent (batch orchestration)
│   │   ├── downloader.py    # arXiv LaTeX download + clean + parse
│   │   ├── note_gen.py      # Note Generation Agent (uses note_skill.md)
│   │   ├── rag.py           # ChromaDB vector store for Q&A
│   │   ├── qa_agent.py      # Reading Q&A Agent (RAG + streaming)
│   │   └── notion_sync.py   # Notion export (Markdown → Blocks)
│   ├── db/
│   │   └── database.py      # SQLAlchemy async + models (Paper, Note, Chat)
│   ├── skills/
│   │   └── note_skill.md    # Your personal note template ← edit this!
│   └── requirements.txt
├── frontend/
│   └── src/
│       ├── App.tsx           # Root + WebSocket listener
│       ├── store/            # Zustand global state
│       ├── api/client.ts     # API calls
│       ├── views/
│       │   ├── TableView.tsx # Paper list with filters
│       │   └── ReadingView.tsx # Split: PDF + Chat/Notes
│       └── components/
│           ├── Sidebar.tsx
│           ├── TopBar.tsx
│           ├── ImportModal.tsx
│           └── SettingsModal.tsx
├── notes/                   # Markdown note files (also usable in Obsidian)
├── data/
│   ├── latex/               # Expanded LaTeX + preserved source assets per paper
│   ├── chroma/              # ChromaDB vector indices
│   └── paperlab.db          # SQLite database
├── setup.sh
└── start.sh

Agent Pipeline (per paper)

URL Input
   │
   ▼
Router Agent          ← validates + deduplicates arXiv IDs
   │
   ├─[concurrent, max 3]─►
   │
   ▼
Download Agent        ← fetches tar.gz from arxiv.org/src/{id}
   │                    expands multi-file .tex project, preserves informative source assets
   ▼
Meta Agent            ← OpenRouter model extracts: title, authors, abstract
   │                    auto-assigns: tags, importance, type, institution
   ▼
Note Agent            ← reads note_skill.md template
   │                    fills [AGENT:...] sections using LaTeX content
   │                    saves to notes/{arxiv_id}.md + DB
   ▼
RAG Agent             ← splits LaTeX into sections
                        embeds with all-MiniLM-L6-v2 (local)
                        stores in per-paper ChromaDB collection
                        ready for Q&A
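
The router step (validate, deduplicate, then process at most three papers at once) can be sketched with an asyncio semaphore. This is an illustration under assumptions: `process_paper` is a placeholder for the download → meta → note → RAG chain, the regular expression covers only new-style arXiv IDs such as 1707.06347, and none of the names come from router.py.

```python
import asyncio
import re

# New-style arXiv IDs only (YYMM.NNNNN); old-style IDs are not handled here.
ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def extract_ids(urls):
    """Return unique arXiv IDs in first-seen order, ignoring version suffixes."""
    seen, ids = set(), []
    for url in urls:
        match = ARXIV_ID.search(url)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids


async def process_paper(arxiv_id: str) -> str:
    """Placeholder for the per-paper pipeline."""
    await asyncio.sleep(0)
    return f"{arxiv_id}: processed"


async def run_batch(urls, max_concurrent: int = 3):
    """Process all unique papers, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(arxiv_id):
        async with semaphore:
            return await process_paper(arxiv_id)

    return await asyncio.gather(*(guarded(i) for i in extract_ids(urls)))
```

`asyncio.gather` returns results in input order, so the output lines up with the deduplicated ID list even though papers finish at different times.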

Q&A Flow

User Question
   │
   ▼
PaperRAG.retrieve()   ← embed question, top-5 cosine similar chunks
   │
   ▼
QA Agent              ← OpenRouter model reads: abstract + retrieved sections + chat history
   │                    answers with §section citations
   ▼
User reads answer
   │
   ▼ (optional)
Update Note Agent     ← rewrites the 个人思考 (Personal Thoughts) section based on the entire Q&A session
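
The retrieval step is a top-k search by cosine similarity. The sketch below shows that idea with plain Python lists standing in for embeddings; in PaperLab the vectors come from all-MiniLM-L6-v2 and the search is done by ChromaDB, so the `retrieve` function here is a toy stand-in and not `PaperRAG.retrieve()` itself.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunks, k: int = 5):
    """Return the titles of the k chunks most similar to the query.

    `chunks` is a list of (section_title, embedding) pairs.
    """
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, chunk[1]), reverse=True)
    return [title for title, _ in ranked[:k]]


# Toy 2-dimensional "embeddings" for illustration only.
chunks = [("Introduction", [1.0, 0.0]), ("Method", [0.0, 1.0]), ("Results", [1.0, 1.0])]
top = retrieve([0.0, 1.0], chunks, k=2)
```

The section titles returned here are what would let the QA Agent cite §sections in its answer.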

Notes as Obsidian vault

The notes/ directory contains plain Markdown files named {arxiv_id}.md. Point Obsidian at this folder as a vault — all notes are immediately usable, editable, and linkable from Obsidian, and changes sync back to PaperLab on next open.

Limitations

arXiv papers without source files will fail with a clear error message; use the ↺ reprocess button after the source is available
PDF viewing uses the arXiv PDF iframe — no annotation/highlight persistence yet
The ChromaDB embedder (all-MiniLM-L6-v2) downloads ~90MB on first use
Notion API rate limits: large notes (200+ blocks) may take 5–10s to sync
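
One reason large notes take longer: the Notion API accepts a limited number of child blocks per append request (100, per the public documentation at the time of writing), so a long note needs several calls. A sketch of the batching, with `chunked` as a hypothetical helper name:

```python
def chunked(blocks: list, size: int = 100):
    """Yield consecutive slices of at most `size` blocks."""
    for start in range(0, len(blocks), size):
        yield blocks[start:start + size]
```

A 250-block note would then be sent as three requests of 100, 100, and 50 blocks.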

License

MIT
