chengliu01/paperlab

📄 PaperLab

A local AI-powered research paper management tool with multi-agent processing, interactive reading, and Notion sync.


Features

  • Batch Import — paste arXiv URLs, a Router Agent distributes them to sub-agents that run in parallel
  • Auto-processing pipeline — LaTeX download → section parsing → note generation → auto-tagging → RAG indexing
  • Reading view — side-by-side PDF + Agent Q&A panel with RAG-backed answers citing paper sections
  • Smart notes — notes follow your custom note_skill.md template; update them from Q&A with one click
  • Note editor — edit Markdown notes directly in the app, download as .md anytime
  • Notion sync — push any note to a Notion database page with rich formatting, math, and metadata
  • arXiv source-first parsing — expands multi-file LaTeX projects before note generation and Q&A
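As a rough illustration of the multi-file expansion in the last item, the sketch below inlines `\input`/`\include` targets recursively. It is not PaperLab's actual downloader code: `expand_inputs` and the in-memory `files` mapping are hypothetical names, and commented-out `\input` lines are not handled.

```python
import re

# Matches \input{file} and \include{file}; the .tex extension is optional in LaTeX.
_INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def expand_inputs(name: str, files: dict[str, str], _seen: frozenset = frozenset()) -> str:
    """Recursively inline the \\input/\\include targets found in files[name]."""
    if name in _seen:  # guard against circular includes
        return ""
    seen = _seen | {name}

    def inline(match: re.Match) -> str:
        target = match.group(1).strip()
        if target not in files and f"{target}.tex" in files:
            target = f"{target}.tex"
        if target not in files:
            return match.group(0)  # leave unresolved references untouched
        return expand_inputs(target, files, seen)

    return _INPUT_RE.sub(inline, files[name])
```

Here `files` maps relative paths inside the extracted arXiv archive to their text, so the same function can be tried on a toy dictionary before touching real sources.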

Quick Start

git clone <this-repo>
cd paperlab
chmod +x setup.sh start.sh
./setup.sh          # installs Python venv + Node deps
# edit backend/.env and set OPENROUTER_API_KEY
./start.sh          # launches API on :8000, frontend on :5173

Open http://localhost:5173


Configuration

backend/.env

# Required
OPENROUTER_API_KEY=sk-or-...
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
OPENROUTER_MODEL=openai/gpt-5.1-chat
OPENROUTER_AGENT_MODEL=

# Optional per-task overrides
OPENROUTER_NOTE_MODEL=
OPENROUTER_QA_MODEL=
OPENROUTER_TAG_MODEL=

# Optional — for Notion sync
NOTION_API_KEY=secret_...
NOTION_DATABASE_ID=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
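One plausible reading of the per-task overrides is that an empty value falls back to OPENROUTER_MODEL. The sketch below shows that fallback with the standard library only. The fallback rule itself, the `resolve_model` helper, and the task keys are assumptions for illustration; the app loads its settings through pydantic-settings in backend/config.py.

```python
import os
from collections.abc import Mapping

DEFAULT_MODEL_VAR = "OPENROUTER_MODEL"
# Hypothetical task names mapped to the variables listed above.
TASK_VARS = {
    "agent": "OPENROUTER_AGENT_MODEL",
    "note": "OPENROUTER_NOTE_MODEL",
    "qa": "OPENROUTER_QA_MODEL",
    "tag": "OPENROUTER_TAG_MODEL",
}


def resolve_model(task: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the task-specific model if set and non-empty, else the default model."""
    override = env.get(TASK_VARS[task], "").strip()
    if override:
        return override
    default = env.get(DEFAULT_MODEL_VAR, "").strip()
    if not default:
        raise ValueError(f"{DEFAULT_MODEL_VAR} is not set")
    return default
```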

Customise your Note Skill

Edit backend/skills/note_skill.md. This is your personal note template.

  • Use {{title}}, {{authors}}, {{date}}, {{institution}}, {{arxiv_id}}, {{tags}} for auto-fill
  • Mark sections with [AGENT: ...] and the agent will fill them in on import
  • Changes take effect on next import or reprocess (use the ↺ button in the table)
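To make the two kinds of markers concrete, here is a minimal sketch of how a template could be processed: `{{...}}` placeholders are substituted from metadata, and `[AGENT: ...]` markers are collected as instructions for the model. The function names are hypothetical, and the real note generator may parse the template differently.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{\{(\w+)\}\}")
AGENT_RE = re.compile(r"\[AGENT:\s*(.*?)\]", re.DOTALL)


def fill_metadata(template: str, meta: dict[str, str]) -> str:
    """Replace {{key}} placeholders; unknown keys are left as-is."""
    return PLACEHOLDER_RE.sub(lambda m: meta.get(m.group(1), m.group(0)), template)


def agent_instructions(template: str) -> list[str]:
    """Return the instruction text inside each [AGENT: ...] marker, in order."""
    return [text.strip() for text in AGENT_RE.findall(template)]
```

A template line such as `# {{title}}` would become `# PPO` for `meta = {"title": "PPO"}`, while `[AGENT: Summarise the method in 3 bullets]` yields one instruction string for the agent to answer.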

CLI Reading Agent

Run the CLI from the repo root with the backend virtualenv:

backend/.venv/bin/python backend/cli.py ingest https://arxiv.org/abs/1707.06347
backend/.venv/bin/python backend/cli.py read 1707.06347 --mode quick
backend/.venv/bin/python backend/cli.py ask 1707.06347 "What is the core contribution?"
backend/.venv/bin/python backend/cli.py note refresh 1707.06347 --focus method
backend/.venv/bin/python backend/cli.py note show 1707.06347

The CLI agent uses tool calling to inspect local paper assets (outline, sections, source_files, notes, and chat history) instead of stuffing the whole paper into one prompt.
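The tool-calling idea can be sketched as a small registry that maps tool names to functions over one paper's local assets. Everything below is illustrative: the `PAPER` dictionary, the tool names `get_outline` and `read_section`, and `dispatch` are hypothetical and do not mirror the actual CLI's tool set.

```python
import json
from typing import Callable

# Hypothetical in-memory stand-in for one paper's local assets.
PAPER = {
    "outline": ["Introduction", "Method", "Experiments"],
    "sections": {"Method": "The objective clips the probability ratio."},
}

TOOLS: dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so the model can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_outline() -> str:
    return json.dumps(PAPER["outline"])


@tool
def read_section(title: str) -> str:
    return PAPER["sections"].get(title, f"No section named {title!r}")


def dispatch(name: str, arguments: str) -> str:
    """Run one tool call; `arguments` is a JSON object string, as models emit it."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**json.loads(arguments or "{}"))
```

With this shape the model only ever sees the outline plus the sections it explicitly asks for, which is the point of avoiding one oversized prompt.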


Notion Integration

1. Create a Notion Integration

  1. Go to https://www.notion.so/my-integrations
  2. Click + New integration, give it a name (e.g. "PaperLab")
  3. Select your workspace, enable Read content, Update content, Insert content
  4. Copy the Internal Integration Token → set it as NOTION_API_KEY in backend/.env

2. Set up your Notion Database

Create a new database in Notion with these properties:

| Property | Type | Notes |
|---|---|---|
| Name | Title | Paper title |
| Tags | Multi-select | e.g. LLM algorithms, RL |
| arXiv ID | Rich text | e.g. 1707.06347 |
| Institution | Rich text | e.g. OpenAI |
| Type | Select | method / analysis / survey / benchmark |
| Importance | Number | 1–5 stars |
| arXiv Link | URL | auto-filled |
| Synced At | Date | auto-filled |
3. Share database with integration

In your Notion database: Share → find your integration → Invite

4. Get the Database ID

From the database URL:

https://notion.so/yourworkspace/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX?v=...
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                This is your Database ID

Set it as NOTION_DATABASE_ID in backend/.env.
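If you prefer not to copy the ID by hand, a small helper along these lines can pull it out of the URL. It assumes the ID is the trailing 32 hexadecimal characters of the last path segment, as in the example above; `notion_database_id` is a hypothetical name and not part of PaperLab.

```python
import re
from urllib.parse import urlparse

_ID_RE = re.compile(r"([0-9a-fA-F]{32})$")


def notion_database_id(url: str) -> str:
    """Return the 32-character hex ID from the last path segment of a database URL."""
    last_segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    # Dashes are dropped so that both plain and UUID-style IDs are accepted.
    match = _ID_RE.search(last_segment.replace("-", ""))
    if match is None:
        raise ValueError(f"No 32-character database ID found in {url!r}")
    return match.group(1).lower()
```

The `?v=...` query string is ignored because only the URL path is inspected.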

5. Sync

In PaperLab, open any paper's reading view and click Sync to Notion. The note (Markdown → Notion blocks) plus all metadata will be pushed. Subsequent syncs update the existing page.
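The Markdown → Notion blocks step can be pictured with the simplified converter below. It covers only headings, bullets, and paragraphs, whereas the real exporter also handles rich formatting and math. The block dictionaries follow the public Notion block object shape, and the batch size reflects Notion's documented limit of 100 child blocks per append request. `markdown_to_blocks` and `batches` are hypothetical names.

```python
def _rich_text(content: str) -> list[dict]:
    return [{"type": "text", "text": {"content": content}}]


def markdown_to_blocks(markdown: str) -> list[dict]:
    """Convert a small Markdown subset into Notion block objects."""
    blocks = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("### "):
            kind, text = "heading_3", stripped[4:]
        elif stripped.startswith("## "):
            kind, text = "heading_2", stripped[3:]
        elif stripped.startswith("# "):
            kind, text = "heading_1", stripped[2:]
        elif stripped.startswith(("- ", "* ")):
            kind, text = "bulleted_list_item", stripped[2:]
        else:
            kind, text = "paragraph", stripped
        blocks.append({"object": "block", "type": kind, kind: {"rich_text": _rich_text(text)}})
    return blocks


def batches(blocks: list, size: int = 100) -> list[list]:
    """Split blocks into request-sized chunks."""
    return [blocks[i:i + size] for i in range(0, len(blocks), size)]
```

A note of 250 blocks would therefore need three append requests, which is one reason large notes take several seconds to sync.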


Architecture

paperlab/
├── backend/
│   ├── main.py              # FastAPI app (REST + WebSocket)
│   ├── config.py            # Settings via pydantic-settings
│   ├── agents/
│   │   ├── router.py        # Main Router Agent (batch orchestration)
│   │   ├── downloader.py    # arXiv LaTeX download + clean + parse
│   │   ├── note_gen.py      # Note Generation Agent (uses note_skill.md)
│   │   ├── rag.py           # ChromaDB vector store for Q&A
│   │   ├── qa_agent.py      # Reading Q&A Agent (RAG + streaming)
│   │   └── notion_sync.py   # Notion export (Markdown → Blocks)
│   ├── db/
│   │   └── database.py      # SQLAlchemy async + models (Paper, Note, Chat)
│   ├── skills/
│   │   └── note_skill.md    # Your personal note template ← edit this!
│   └── requirements.txt
├── frontend/
│   └── src/
│       ├── App.tsx           # Root + WebSocket listener
│       ├── store/            # Zustand global state
│       ├── api/client.ts     # API calls
│       ├── views/
│       │   ├── TableView.tsx # Paper list with filters
│       │   └── ReadingView.tsx # Split: PDF + Chat/Notes
│       └── components/
│           ├── Sidebar.tsx
│           ├── TopBar.tsx
│           ├── ImportModal.tsx
│           └── SettingsModal.tsx
├── notes/                   # Markdown note files (also usable in Obsidian)
├── data/
│   ├── latex/               # Expanded LaTeX + preserved source assets per paper
│   ├── chroma/              # ChromaDB vector indices
│   └── paperlab.db          # SQLite database
├── setup.sh
└── start.sh

Agent Pipeline (per paper)

URL Input
   │
   ▼
Router Agent          ← validates + deduplicates arXiv IDs
   │
   ├─[concurrent, max 3]─►
   │
   ▼
Download Agent        ← fetches tar.gz from arxiv.org/src/{id}
   │                    expands multi-file .tex project, preserves informative source assets
   ▼
Meta Agent            ← OpenRouter model extracts: title, authors, abstract
   │                    auto-assigns: tags, importance, type, institution
   ▼
Note Agent            ← reads note_skill.md template
   │                    fills [AGENT:...] sections using LaTeX content
   │                    saves to notes/{arxiv_id}.md + DB
   ▼
RAG Agent             ← splits LaTeX into sections
                        embeds with all-MiniLM-L6-v2 (local)
                        stores in per-paper ChromaDB collection
                        ready for Q&A
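The Router step at the top of this diagram (validate, deduplicate, then run at most three papers at once) can be sketched with `asyncio.Semaphore`. This is an illustration under assumptions, not the code in agents/router.py: `process_paper` is a placeholder for the rest of the pipeline, the regex only recognises new-style arXiv IDs, and version suffixes such as `v2` are assumed to be ignored when deduplicating.

```python
import asyncio
import re

# New-style arXiv IDs such as 1707.06347 or 2301.00001v2 (old-style IDs are not handled).
ARXIV_ID_RE = re.compile(r"(\d{4}\.\d{4,5})(?:v\d+)?")


def extract_ids(urls: list[str]) -> list[str]:
    """Drop invalid inputs and return unique arXiv IDs in first-seen order."""
    ids: dict[str, None] = {}
    for url in urls:
        match = ARXIV_ID_RE.search(url)
        if match:
            ids.setdefault(match.group(1))
    return list(ids)


async def process_paper(arxiv_id: str) -> str:
    """Placeholder for download -> metadata -> note -> RAG indexing."""
    await asyncio.sleep(0)
    return f"{arxiv_id}: done"


async def run_batch(urls: list[str], max_concurrent: int = 3) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(arxiv_id: str) -> str:
        async with semaphore:  # at most max_concurrent papers in flight
            return await process_paper(arxiv_id)

    return await asyncio.gather(*(guarded(i) for i in extract_ids(urls)))
```

Calling `asyncio.run(run_batch(urls))` returns one result per unique paper, in the order the IDs first appeared.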

Q&A Flow

User Question
   │
   ▼
PaperRAG.retrieve()   ← embed question, top-5 cosine similar chunks
   │
   ▼
QA Agent              ← OpenRouter model reads: abstract + retrieved sections + chat history
   │                    answers with §section citations
   ▼
User reads answer
   │
   ▼ (optional)
Update Note Agent     ← rewrites the 个人思考 ("Personal Thoughts") section based on the entire Q&A session
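The retrieval step in this flow ranks section chunks by cosine similarity to the question embedding and keeps the top 5. In the app, embedding and search are handled by ChromaDB with all-MiniLM-L6-v2; the sketch below only shows the ranking idea on toy vectors, and `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 5):
    """chunks holds (section_title, embedding) pairs; return the k best (title, score) pairs."""
    scored = [(title, cosine(query_vec, vec)) for title, vec in chunks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

The section titles attached to the winning chunks are what make the §section citations in the answer possible.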

Notes as Obsidian vault

The notes/ directory contains plain Markdown files named {arxiv_id}.md. Point Obsidian at this folder as a vault — all notes are immediately usable, editable, and linkable from Obsidian, and changes sync back to PaperLab on next open.


Limitations

  • arXiv papers without source files will fail with a clear error message; use the ↺ reprocess button after the source is available
  • PDF viewing uses the arXiv PDF iframe — no annotation/highlight persistence yet
  • The ChromaDB embedder (all-MiniLM-L6-v2) downloads ~90MB on first use
  • Notion API rate limits: large notes (200+ blocks) may take 5–10s to sync

License

MIT

About

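A code CLI agent project for assisted paper reading. It provides a limited set of tools 🔧, generates notes through multi-turn conversation, and can push them to Notion via the Notion API.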
