GitHub - polyvia-ai/polyvia: Multimodal Document Agents over 100K+ files — enterprise agents for large-scale retrieval, research and automation over multimodal docs.

Polyvia: Multimodal Document Agents over 100K+ files

We build enterprise agents for large-scale retrieval, research and automation over multimodal docs.

Docs · Quickstart · Python SDK · TypeScript SDK · Polyvia Platform · Homepage

We’re releasing Polyvia 1 as two products:

Polyvia API: Multimodal Document Retrieval API (for developers of AI agents) - available now.
Polyvia Platform: Research and Automation Agent over 100K+ multimodal docs (for knowledge workers in enterprises) - coming soon.

We index your unstructured, visual and multimodal docs (PDFs, charts, slides, complex tables, infographics, scans, handwriting, invoices, and more) into a multimodal knowledge ontology, with agents running on top for retrieval, research and automation. Every answer is grounded in a cited source page and returned in sub-200ms.

Why Polyvia

1. Fast over 100K+ multimodal docs. Agentic, file-by-file search (Claude Code, Claude Cowork, Codex) works only up to ~100 multimodal files — past that it's too slow, and at scale you still need retrieval. Polyvia does sub-200ms search over 100K+ files, every answer grounded in a cited source page.

2. End-to-end — no need for extractors or PDF parsers. When you build large-scale multimodal RAG over a company's files, the only infra available today is visual extractors / PDF parsers (Reducto, LlamaIndex). There's no end-to-end infra for large-scale multimodal document retrieval — until Polyvia: VLM Visual Extractor → Multimodal Knowledge Ontology (mapping all your company's data and processes) → Self-Improving Retrieval Agent.

3. All unstructured, visual and multimodal data inputs in one API. Available now: PDFs, charts, infographics, complex multi-page tables, slides, pictures, handwriting, scans, invoices, audio. Coming soon: video, healthcare scans / EHR, chemical & molecular data, CAD & technical drawings, heatmaps.

What people build with it

Multimodal RAG inside your own agent — retrieval-as-a-tool over large doc sets.
Data-room / due-diligence search — query 100+ visual-heavy PDFs jointly (PE, IB, M&A).
Counterparty & credit monitoring — EBITDA, opex, revenue across hundreds of borrower reports.
Image-based claim processing — describe claim photos in the context of a policy.
Cross-engagement slide search — find answers buried in thousands of slides.

Install

pip install polyvia        # Python 3.9+
npm install polyvia        # Node 18+

Quickstart

Grab a key in Polyvia Platform → Settings → API. Ingest a batch into a group, then ask one question across the whole corpus — answers cite the exact page in each document.

Python SDK

from polyvia import Polyvia

client = Polyvia(api_key="poly_<key>")  # or set POLYVIA_API_KEY

# Ingest a batch into a group, then ask one question across all of it.
items = client.ingest.batch(
    ["q1.pdf", "q2.pdf", "q3.pdf", "q4.pdf"],
    group="FY24 Earnings",
)
for item in items:
    client.ingest.wait(item.task_id)

print(client.query("How did revenue trend across the four quarters?",
                   group="FY24 Earnings").answer)
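The loop above waits on one task at a time. For larger batches, the waits can be overlapped with a thread pool. This is a minimal sketch, not part of the SDK: `wait_for_all` is a hypothetical helper, and it assumes `client.ingest.wait(task_id)` blocks until that task finishes (as in the quickstart) and that the client can be called from several threads, which the docs here do not confirm.

```python
from concurrent.futures import ThreadPoolExecutor


def wait_for_all(client, items, max_workers=4):
    """Block until every ingestion task in `items` has finished.

    Hypothetical helper. Assumes `client.ingest.wait(task_id)` blocks until
    the task is done and that the client is safe to share across threads.
    """
    task_ids = [item.task_id for item in items]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in input order and re-raises the first
        # exception when its result is consumed.
        return list(pool.map(client.ingest.wait, task_ids))
```

With the quickstart objects, `wait_for_all(client, items)` would replace the `for item in items:` loop. `AsyncPolyvia` (see `async_client.py`) is likely the better fit if your code is already async.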

JavaScript/TypeScript SDK

import { Polyvia } from "polyvia";

const client = new Polyvia({ apiKey: "poly_<key>" });

const items = await client.ingest.batch(
  ["q1.pdf", "q2.pdf", "q3.pdf", "q4.pdf"],
  { group: "FY24 Earnings" },
);
await Promise.all(items.map((i) => client.ingest.wait(i.task_id)));

const answer = await client.query(
  "How did revenue trend across the four quarters?",
  { group: "FY24 Earnings" },
);
console.log(answer.answer);

Scope a query four ways: a single document_id (fastest), a group by name, several groups via group_ids, or the whole workspace (no scope).

More examples

Runnable scripts live in examples/. A few highlights:

| Example | What it shows |
| --- | --- |
| `query_scopes.py` | All four query scopes: workspace, document, group, many groups |
| `groups_and_documents.py` | Create/find/list groups; list, get, move and delete documents |
| `batch_group.py` | Ingest a batch into a group, then query across it |
| `async_client.py` | `AsyncPolyvia`: the same surface, awaitable |
| `agent_tool.py` | Expose Polyvia retrieval as a tool to your own agent |
| `curl.sh` | The same loop over raw HTTP, no SDK |

Querying across scopes, for example:

# whole workspace · a group (by name) · one document (fastest) · many groups (by id)
client.query("What risks recur across all reports?")
client.query("How did revenue trend?", group="FY24 Earnings")
client.query("Executive summary?", document_id="doc_<id>")
client.query("Compare the deals.", group_ids=["g_<id>", "g_<id>"])
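To give a sense of the retrieval-as-a-tool pattern that `agent_tool.py` covers, here is a rough sketch of wrapping `client.query` for an agent. It is not the contents of `agent_tool.py`. The function name `make_retrieval_tool`, the tool name `polyvia_search`, and the dict layout are all hypothetical and framework-neutral. Only `client.query(question, group=...)` and `.answer` come from the examples above.

```python
from typing import Any, Dict, Optional


def make_retrieval_tool(client: Any, default_group: Optional[str] = None) -> Dict[str, Any]:
    """Wrap `client.query` as a tool description plus a callable.

    The name/description/parameters/run layout is a hypothetical shape;
    adapt it to whatever your agent runtime expects.
    """

    def run(question: str, group: Optional[str] = None) -> str:
        scope = group or default_group
        # Pass `group` only when a scope is set; no scope queries the whole workspace.
        kwargs = {"group": scope} if scope else {}
        return client.query(question, **kwargs).answer

    return {
        "name": "polyvia_search",  # hypothetical tool name
        "description": "Answer a question from the indexed documents.",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {"type": "string"},
                "group": {"type": "string"},
            },
            "required": ["question"],
        },
        "run": run,
    }
```

An agent loop would register the dict's `name`, `description` and `parameters` with its model, then call `tool["run"](**arguments)` when the model picks the tool. Returning only `.answer` keeps the tool output small. If your agent needs the cited pages, return those too, using whatever fields the SDK exposes for them.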

→ Full Developer Docs

Use it from an agent

MCP — connect Claude Code (or any MCP client) to the hosted Polyvia MCP server in one line, so your agent can retrieve over your documents as a tool:

claude mcp add --transport http polyvia https://app.polyvia.ai/mcp \
  --header "Authorization: Bearer poly_<your-key>"

Agent Skills — install Polyvia skills into Claude Code, Cursor, and other agent clients:

npx skills add polyvia-ai/skills

→ MCP docs · Agent Skills

Roadmap

| Release | Product | For | Status |
| --- | --- | --- | --- |
| Polyvia-1.1 | Polyvia API: Multimodal Document Retrieval API | Developers of AI agents | Available now |
| Polyvia-1.2 | Polyvia Platform: Research & Automation Agent over 100K+ multimodal docs | Knowledge workers in enterprises | Coming soon |
| Later | Polyvia Agents: build your own agent for automating processes on large volumes of multimodal docs | Builders & teams | Planned |
| Later | More modalities: video, healthcare scans / EHR, chemical & molecular data, CAD & technical drawings, heatmaps | Builders & teams | Planned |

Release log

We update this as we ship — latest first. Full notes at docs.polyvia.ai/versions.

Polyvia-1.1 — Polyvia API · available now

REST API v1 — ingest, documents, groups, query, usage, rate-limits; async ingestion with task polling and grounded citations.
Python SDK — pip install polyvia; typed sync and async clients, batch ingestion, idempotent groups, structured errors.
TypeScript SDK — npm install polyvia; fully typed, ESM/CJS, Node 18+.
MCP server — claude mcp add --transport http polyvia https://app.polyvia.ai/mcp --header "Authorization: Bearer poly_<your-key>".
Agent Skills — npx skills add polyvia-ai/skills for Claude Code, Cursor, and other agent clients.
Visual Document Modalities — Visual Document Intelligence + Audio: charts, graphs & plots, infographics, complex multi-page tables, slides & decks, reports & filings, scanned & photographed pages, invoices & forms, handwriting & annotations, diagrams & flowcharts, photos & images, and audio (calls, meetings, recordings).

Up next

Polyvia-1.2 — Polyvia Platform — Research & Automation Agent over 100K+ multimodal docs, for knowledge workers in enterprises.
More modalities (coming soon) — healthcare scans / EHR, chemical & molecular data, CAD & technical drawings, video, heatmaps.
Polyvia Agents — build your own agent for automating processes on large volumes of multimodal documents.

SDKs & reference

| Interface | Install | Docs |
| --- | --- | --- |
| Python | `pip install polyvia` | docs.polyvia.ai/products/python-sdk |
| TypeScript | `npm install polyvia` | docs.polyvia.ai/products/js-sdk |
| REST API | n/a | docs.polyvia.ai/api-reference |
| MCP | hosted · `app.polyvia.ai/mcp` | docs.polyvia.ai/products/mcp |
| Agent Skills | `npx skills add polyvia-ai/skills` | docs.polyvia.ai/products/skills |

Supported inputs: PDFs · Word/PowerPoint/Excel (DOCX/PPTX/XLSX) · Markdown · text · images · audio. Charts, infographics, complex multi-page tables, slides, scans and handwriting are first-class.

Resources

Runnable snippets (Python, TypeScript, raw HTTP, MCP, agent-tool) live in examples/ — see the examples guide. See also CHANGELOG · CONTRIBUTING · SECURITY.

New to Polyvia? See what it does at polyvia.ai, or start free at app.polyvia.ai.

📚 Docs · 🖥️ Platform · ✉️ mateusz@polyvia.ai · senyao@polyvia.ai
