Argus

Multi-tenant document ingestion platform. Connects to cloud storage providers, watches for changes, and builds a searchable knowledge base with versioning, content extraction, AI summaries, and vector embeddings.

What It Does

Watch — Connects to cloud storage providers via flux and monitors files the tenant has registered for observation
Extract — Pulls document content using format-specific extractors; delegates OCR to a Tesseract sidecar over gRPC for scanned/handwritten documents
Version — Tracks every revision of every document, maintaining a complete history
Enrich — Generates AI summaries via zyn and vector embeddings via vex for each document version
Index — Stores extracted content, summaries, and embeddings in OpenSearch, providing full-text and semantic search across all ingested documents for a given tenant

Storage Providers

Each provider implements a common interface and is developed independently:

Google Drive
OneDrive / SharePoint
Dropbox
Amazon S3
Google Cloud Storage
Azure Blob Storage

Supported Document Types

Category	Formats
Documents	PDF, DOCX, DOC, ODT, RTF, TXT, Markdown
Spreadsheets	XLSX, XLS, CSV, ODS
Presentations	PPTX, PPT, ODP
Images (OCR)	PNG, JPEG, TIFF, BMP, WebP
Scanned Documents	PDF (image-only), multi-page TIFF

Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Provider   │────▶│   Ingestion  │────▶│  Enrichment  │
│   Watchers   │     │   Pipeline   │     │   Pipeline   │
│  (flux)      │     │              │     │  (zyn + vex) │
└──────────────┘     └──────┬───────┘     └──────┬───────┘
                            │                    │
                     ┌──────▼───────┐     ┌──────▼───────┐
                     │   Tesseract  │     │  OpenSearch   │
                     │   Sidecar    │     │   Cluster     │
                     │   (gRPC)     │     │              │
                     └──────────────┘     └──────────────┘

Provider Watchers — flux capacitors monitor registered files/folders across cloud storage providers
Ingestion Pipeline — Extracts content, normalises formats, manages document versions; delegates OCR to a Tesseract sidecar via gRPC
Enrichment Pipeline — Generates AI summaries (zyn) and vector embeddings (vex) per document version
OpenSearch — Full-text and semantic search index per tenant

The system is designed for horizontal scalability from day one — pipelines are queue-driven and stateless, allowing independent scaling of ingestion, OCR, and enrichment workloads.

Tech Stack

Component	Technology
Language	Go
Framework	sum
Configuration	flux
LLM Orchestration	zyn
Embeddings	vex
OCR	Tesseract (gRPC sidecar)
Search & Storage	OpenSearch
Database	PostgreSQL
Object Storage	MinIO (dev) / S3-compatible (prod)
Observability	OpenTelemetry

Development

make dev        # Start local infrastructure
make run        # Run the application
make test       # Run tests
make check      # Run tests + lint

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github/workflows		.github/workflows
admin		admin
api		api
cmd		cmd
config		config
events		events
internal		internal
migrations		migrations
models		models
proto		proto
stores		stores
testing		testing
tools		tools
.air.toml		.air.toml
.env.example		.env.example
.gitignore		.gitignore
.golangci.yml		.golangci.yml
.goreleaser.yml		.goreleaser.yml
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Argus

What It Does

Storage Providers

Supported Document Types

Architecture

Tech Stack

Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Argus

What It Does

Storage Providers

Supported Document Types

Architecture

Tech Stack

Development

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages