GitHub - Arseni1919/ragy: Temporal memory for your AI agents, financial research, and knowledge workflows.

Most RAG systems are stateless. Every query re-fetches, re-embeds, re-ranks.

ragy is different: index any topic across a full time window once, then use semantic similarity to retrieve the right days — not all days. Temporal memory for your AI agents, financial research, and knowledge workflows.

Why ragy?

Standard RAG retrieves documents. ragy retrieves moments in time.

You define a query (e.g. "Fed interest rate signals") and a time window (e.g. 365 days). ragy fetches and embeds every day's content once, stores it in a local vector database, and lets you instantly answer: "which days in the past year most resembled this?" — ranked by cosine similarity, plotted on a timeline.
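The ranking step can be illustrated with a minimal sketch. It assumes each day has already been embedded into a vector; the toy vectors and the `rank_days` helper are hypothetical and are not taken from ragy's code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as "not similar"
    return dot / (norm_a * norm_b)

def rank_days(query_vec, day_vectors, top_k=10):
    """Return the top_k (date, score) pairs, highest similarity first."""
    scored = [(day, cosine_similarity(query_vec, vec))
              for day, vec in day_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional "embeddings" keyed by date (illustrative values only).
day_vectors = {
    "2024-01-15": [1.0, 0.0, 0.0],
    "2024-01-16": [0.0, 1.0, 0.0],
    "2024-01-17": [0.7, 0.7, 0.0],
}
query_vec = [1.0, 0.1, 0.0]

for day, score in rank_days(query_vec, day_vectors, top_k=2):
    print(day, round(score, 3))
```

Real embeddings have hundreds of dimensions, and a vector database avoids the full scan, but the scoring idea is the same.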

Key difference from other RAG projects:

Feature	Standard RAG	ragy
Data model	Documents	Days in time
Re-fetch per query?	Yes	No — indexed once
Time-aware?	No	Yes — date is a first-class dimension
Visualization	None	Similarity timeline (`xray`)
Scheduling	Manual	Built-in APScheduler
Agent integration	Varies	Native MCP server

What You Can Build

🌐 Enterprise data collection — Use Bright Data for large-scale web scraping, structured data extraction, and professional-grade data collection. Index competitive intelligence, market research, or any web data with advanced scraping capabilities.

📈 Financial research — Index a year of market news with yfinance, query "Fed pivot signals", get back the 10 most semantically similar trading days with similarity scores and related tickers on a timeline. Track multiple stocks with scheduled daily updates — no API costs.

🔍 Competitive intelligence — Schedule daily indexing of topics with Bright Data or Tavily. Ask "what weeks had the most activity around X?" Retrieve content ranked by relevance, not recency.

🤖 AI agent long-term memory — Give your Claude / LangGraph / n8n agent a persistent temporal knowledge base via MCP. No search API call on every turn — just semantic retrieval from your local index.

💹 Stock monitoring dashboard — Create collections for each stock you track (AAPL, NVDA, TSLA) using yfinance source. Query across time to find similar market conditions, earnings patterns, or news sentiment shifts.

See It in Action

The `xray` command — your temporal similarity radar

ragy> xray
Collection: Stock_JNJ
Query: high price expected
Top K: 10

Returns a ranked list of dates with similarity scores, plotted on a timeline. This is the fastest way to understand when your topic was most relevant.

Financial Data Search

ragy> search_yfin
Search query: tesla stock
Max results: 3

Returns financial news articles with related ticker symbols (e.g., TSLA), publisher information, and direct links to Yahoo Finance. Perfect for quick market research or monitoring specific stocks.

Quick Start

One-command install

curl -fsSL https://raw.githubusercontent.com/Arseni1919/ragy/main/install.sh | bash

Manual setup

git clone https://github.com/Arseni1919/ragy.git
cd ragy
uv sync

Create a .env file. API keys are optional, depending on the data source:

# For Bright Data (advanced scraping)
BRIGHT_DATA_API_KEY="your-key-here"
BRIGHT_DATA_ZONE="your-zone"

# For Tavily (general web search)
TAVILY_API_KEY="your-key-here"   # Get free at tavily.com

# Note: yfinance (financial data) needs no API key

Start the stack:

# Terminal 1 — API
uv run uvicorn ragy_api.main:app --reload

# Terminal 2 — CLI
uv run ragy

First run note: embedding models (~80MB) download automatically in the background. First query may take 10–30 seconds; subsequent runs are instant.

Docker Quick Start

The fastest way to run ragy:

git clone https://github.com/Arseni1919/ragy.git
cd ragy
cp .env.example .env
# Add your Tavily API key to .env
docker-compose up -d

Access at http://localhost:8000/docs

See README-DOCKER.md for complete Docker documentation.

Core Workflow

1. Index a topic over time

ragy> create_index
Query: artificial intelligence news
Collection name: ai_2024
Number of days: 365

This fetches, embeds, and stores one entry per day for the past 365 days. Run once. Query forever.
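To make "one entry per day" concrete, here is a small sketch of how a 365-day window could be enumerated. The `window_dates` helper is hypothetical, and whether ragy counts the current day as part of the window is an assumption.

```python
from datetime import date, timedelta

def window_dates(num_days, end=None):
    """Return num_days consecutive dates ending at `end` (inclusive), oldest first."""
    end = end or date.today()
    return [end - timedelta(days=offset) for offset in range(num_days - 1, -1, -1)]

# Fixed end date so the example is reproducible.
days = window_dates(365, end=date(2024, 12, 31))
print(days[0], days[-1], len(days))
```

Each date in the list would then get one fetch and one embedding, which is why the indexing cost is paid only once.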

2. Retrieve semantically relevant days

ragy> extract
Collection: ai_2024
Query: transformer architecture improvements
Top K: 5

Returns the 5 days most semantically similar to your query — not keyword matches, not most recent, but most relevant.

3. Visualize similarity over time

ragy> xray
Collection: ai_2024
Query: open source model releases
Top K: 10

Plots similarity scores as a timeline. Instantly see if your topic had a spike, a gradual trend, or scattered activity.
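The idea of a similarity timeline can be sketched in plain text. This is only an illustration of the concept, not how `xray` renders its plot; the `text_timeline` helper and the scores are made up.

```python
def text_timeline(scores, width=30):
    """Render (date, score) pairs as horizontal bars, in chronological order."""
    lines = []
    for day, score in sorted(scores):
        clipped = max(0.0, min(1.0, score))  # keep the bar length in [0, width]
        bar = "#" * round(clipped * width)
        lines.append(f"{day} {bar} {score:.2f}")
    return "\n".join(lines)

# Illustrative scores; in practice these would come from a similarity query.
scores = [("2024-03-02", 0.81), ("2024-01-15", 0.42), ("2024-02-10", 0.65)]
print(text_timeline(scores, width=20))
```

Sorting by date rather than by score is what turns a ranked list into a timeline: spikes and gradual trends become visible at a glance.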

4. Schedule automatic updates

ragy> create_job
Query: tech news
Collection name: daily_tech
Data source: bright_data   # or 'tavily' or 'yfinance'
Interval type: day
Interval amount: 1

Example with yfinance for financial monitoring:

ragy> create_job
Query: nvidia stock news
Collection name: nvidia_tracker
Data source: yfinance       # financial news + quotes with tickers
Interval type: day
Interval amount: 1

ragy updates your collection every day at the scheduled hour. Your index stays current without manual work. Choose bright_data for advanced scraping, tavily for general web search, or yfinance for financial data (stocks, news, quotes) — no API key needed for yfinance.
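As a rough illustration of "every day at the scheduled hour", the sketch below computes the next run time for a daily job. It uses only the standard library and does not call APScheduler; the `next_daily_run` helper is hypothetical, and it assumes the hour is interpreted in UTC, as the `SCHEDULER_HOUR` default suggests.

```python
from datetime import datetime, timedelta, timezone

def next_daily_run(now, hour=2):
    """Return the next datetime at `hour`:00 strictly after `now` (same timezone)."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has already passed
    return candidate

now = datetime(2024, 5, 1, 14, 30, tzinfo=timezone.utc)
print(next_daily_run(now, hour=2))
```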

Data Sources

ragy supports multiple data sources for indexing and search. Choose based on your use case:

Bright Data (Advanced Web Scraping)

Best for: Structured data extraction, large-scale web scraping, alternative search
Requires: API key and zone credentials
Content: Web pages, structured data, search results
Use when: You need more control over scraping, need to bypass access restrictions, or want professional-grade data collection

Tavily (General Web Search)

Best for: News, articles, general web content, research
Requires: API key (free tier available at tavily.com)
Content: Web pages, news articles, blog posts
Use when: Tracking general topics, news, or web content

yfinance (Financial Data)

Best for: Stock market news, financial data, company information
Requires: No API key needed (uses Yahoo Finance)
Content: Financial news articles with related tickers, stock quotes, company data
Use when: Monitoring stocks, tracking financial news, building investment research tools

Example comparison:

# Advanced scraping → use Bright Data
ragy> create_job
Query: competitor analysis
Data source: bright_data  # Advanced web scraping capabilities

# General tech news → use Tavily
ragy> create_job
Query: artificial intelligence breakthroughs
Data source: tavily

# Stock-specific news → use yfinance
ragy> create_job
Query: nvidia earnings
Data source: yfinance    # Returns news + NVDA ticker info

CLI Reference

21 commands across 5 categories:

Index Management

Command	Description
`create_index`	Build temporal vector index for a query + time window
`delete_index`	Remove a collection
`upload_csv`	Import from CSV (`date`, `content` columns required)

Query & Extract

Command	Description
`extract`	Retrieve top-K days by semantic similarity
`search`	Live web search via Tavily
`search_yfin`	Search financial data via yfinance (stocks, news, quotes)

Inspect

Command	Description
`list`	All collections
`status`	Document count for a collection
`sample`	Inspect a single document by index
`head_index` / `tail_index`	Preview first / last 5 documents
`stats`	Full database overview
`xray`	Similarity timeline plot

Scheduling

Command	Description
`jobs`	List scheduled jobs
`create_job`	Set up recurring index updates
`delete_job`	Remove a job

System

Command	Description
`health`	API health check
`info`	Embedding model details
`change_emb`	Swap embedding model
`help` / `exit` / `shutdown`	Utility commands

REST API

19 endpoints. Full Swagger UI at http://localhost:8000/docs.

Category	Key Endpoints
Index	`POST /api/v1/index/create` (SSE streaming)
Extract	`POST /api/v1/extract/data` · `POST /api/v1/extract/all`
Database	`GET /api/v1/database/stats` · `GET /api/v1/database/collection/{name}/distribution`
Search	`POST /api/v1/search/web` (Tavily) · `POST /api/v1/search/yfinance` (financial data)
Scheduler	`POST /api/v1/system/scheduler/jobs/create`
Upload	`POST /api/v1/upload/csv`
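Any HTTP client can call these endpoints. The sketch below requests `GET /api/v1/database/stats` using only the Python standard library. It assumes the API is running on the default `localhost:8000` and returns JSON; the response fields are not documented here, so the code simply returns the decoded body. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default API_PORT from the Configuration section

def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")

def get_database_stats(base_url=BASE_URL, timeout=10):
    """Call GET /api/v1/database/stats and return the decoded JSON body."""
    url = endpoint_url("/api/v1/database/stats", base_url)
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage (requires the API to be running):
# stats = get_database_stats()
# print(stats)
```

The POST endpoints need a request body; check the Swagger UI for the exact schema before calling them.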

MCP Integration

ragy ships a native MCP server, letting Claude Desktop (and any MCP-compatible agent) query your temporal knowledge base directly.

Available tools

Tool	Description
`list_collections`	See all indexed topics
`extract_all`	Retrieve relevant days by semantic similarity
`search_web`	Live web search via Tavily
`get_database_stats`	Database overview
`health_check`	API status

Setup

1. Start the API:

uv run uvicorn ragy_api.main:app --host 0.0.0.0 --port 8000

2. Add to claude_desktop_config.json:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "ragy": {
      "command": "uv",
      "args": ["run", "ragy-mcp"],
      "cwd": "/absolute/path/to/ragy"
    }
  }
}
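The `cwd` value must be an absolute path. As a convenience, the entry above could be generated with a short script like this one; it is a sketch that only prints the JSON and does not modify claude_desktop_config.json. The `ragy_mcp_entry` helper is hypothetical.

```python
import json
from pathlib import Path

def ragy_mcp_entry(repo_dir):
    """Build the "mcpServers" entry shown above with an absolute `cwd`."""
    return {
        "mcpServers": {
            "ragy": {
                "command": "uv",
                "args": ["run", "ragy-mcp"],
                "cwd": str(Path(repo_dir).resolve()),
            }
        }
    }

# Run from inside the cloned ragy directory so "." resolves to the repo path.
print(json.dumps(ragy_mcp_entry("."), indent=2))
```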

3. Ask Claude:

"Which days in my ai_2024 collection are most similar to 'GPT-4 level breakthroughs'?"
"Show me database stats"
"Search the web for recent LLM benchmark results"

Architecture

graph TD
    D[Search Engines<br/>Bright Data / Tavily / yfinance] -->|fetch + index| B
    C[Embeddings<br/>HuggingFace / Ollama] -->|encode| B
    A[zvec<br/>local vector store] <-->|store / retrieve| B[FastAPI Backend]
    B --> E[CLI Client]
    B --> F[MCP Server]
    B --> G[HTTP / REST]
    H[APScheduler] -->|daily update| B

    style B fill:#0066FF,color:#fff
    style E fill:#00DDFF,color:#000
    style F fill:#00DDFF,color:#000

Data flows:

Index: Bright Data/Tavily/yfinance → FastAPI → Embeddings → zvec (runs once)
Query: CLI / MCP / HTTP → FastAPI → Embeddings → zvec → ranked results
Schedule: APScheduler → FastAPI → Bright Data/Tavily/yfinance → zvec (runs daily)

All processing is local. Only search API calls (Bright Data, Tavily, yfinance) go to the network.

Configuration

# Required (only for Tavily search)
TAVILY_API_KEY="..."           # Get free key at tavily.com
                               # Note: yfinance search works without any API key

# Optional — sensible defaults shown
HF_EMB_MODEL="google/embeddinggemma-300m"
DB_PATH="./ragy_db"
DB_PROVIDER="zvec"             # Vector database: "zvec" (default) or "chromadb"
RAGY_MAX_CONCURRENT=10
API_HOST="0.0.0.0"
API_PORT=8000
SCHEDULER_ENABLED=true
SCHEDULER_HOUR=2               # Daily update hour (UTC)
SCHEDULER_TIMEZONE="UTC"
JOBS_DB_PATH="./ragy_jobs.db"

# Optional — for Bright Data integration
BRIGHT_DATA_API_KEY="..."
BRIGHT_DATA_ZONE="..."

Data Source Selection:

Use bright_data source for advanced scraping: requires BRIGHT_DATA_API_KEY and BRIGHT_DATA_ZONE in .env
Use tavily source for general web search: requires TAVILY_API_KEY in .env
Use yfinance source for financial data: no API key needed, works out of the box
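The rules above can be summarized in a few lines of code. This is a sketch of the selection logic only, under the assumption that a source is usable whenever its keys are set; the `available_sources` helper is hypothetical and not part of ragy.

```python
import os

def available_sources(env=None):
    """List the data sources whose credentials are present in `env`."""
    env = os.environ if env is None else env
    sources = ["yfinance"]  # needs no API key
    if env.get("TAVILY_API_KEY"):
        sources.append("tavily")
    if env.get("BRIGHT_DATA_API_KEY") and env.get("BRIGHT_DATA_ZONE"):
        sources.append("bright_data")  # needs both the key and the zone
    return sources

print(available_sources({"TAVILY_API_KEY": "your-key-here"}))
```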

Vector Database:

zvec (default): High-performance C++ vector database from Alibaba, optimized for speed and memory
chromadb: Python-based alternative, can be switched via DB_PROVIDER="chromadb" in .env
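Switching backends through one setting is typically done with a small factory. The sketch below shows that pattern with placeholder classes; `ZvecStore`, `ChromaStore`, and `make_store` are hypothetical names and do not reflect the actual contents of `conn_db/`.

```python
import os

class ZvecStore:
    """Placeholder for a zvec-backed store (hypothetical class)."""
    name = "zvec"

class ChromaStore:
    """Placeholder for a chromadb-backed store (hypothetical class)."""
    name = "chromadb"

_PROVIDERS = {"zvec": ZvecStore, "chromadb": ChromaStore}

def make_store(provider=None):
    """Pick a backend from the argument or DB_PROVIDER, defaulting to zvec."""
    provider = (provider or os.environ.get("DB_PROVIDER", "zvec")).lower()
    try:
        return _PROVIDERS[provider]()
    except KeyError:
        raise ValueError(f"Unknown DB_PROVIDER: {provider!r}") from None

print(make_store("chromadb").name)
```

Keeping the choice behind one function means the rest of the code never needs to know which database is in use.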

To use Ollama embeddings instead of HuggingFace:

ragy> change_emb
# Select: ollama
# Model: nomic-embed-text

CSV Upload Format

Bring your own data — any time-series corpus works:

date,content,title,url
2024-01-15,Full text of article or document...,Optional title,https://...
2024-01-16,...

date and content are required. Any additional columns are stored as metadata and returned in query results.
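Before uploading, a file can be checked locally. The following sketch validates the two required columns and separates the rest as metadata. It assumes dates use the `YYYY-MM-DD` form shown in the example; the `read_rows` helper is hypothetical and is not ragy's own upload code.

```python
import csv
import io
from datetime import date

REQUIRED = {"date", "content"}

def read_rows(csv_text):
    """Parse CSV text, check required columns, and split out metadata."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    rows = []
    for row in reader:
        date.fromisoformat(row["date"])  # raises ValueError on a malformed date
        metadata = {k: v for k, v in row.items() if k not in REQUIRED}
        rows.append({"date": row["date"], "content": row["content"],
                     "metadata": metadata})
    return rows

sample = "date,content,title\n2024-01-15,Full text of the article,Optional title\n"
print(read_rows(sample))
```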

Project Structure

ragy/
├── ragy_api/          # FastAPI backend (19 endpoints)
│   ├── services/      # Business logic
│   └── routers/       # Route handlers
├── ragy_cli/          # Terminal interface (21 commands)
├── ragy_mcp/          # MCP server (5 tools)
├── conn_db/           # Database factory (zvec/chromadb)
├── conn_zvec/         # zvec connector (default)
├── conn_emb_hugging_face/
├── conn_emb_ollama/
├── conn_tavily/       # Tavily search API
├── conn_bright_data/  # Bright Data scraping API
├── conn_yfinance/     # yfinance (Yahoo Finance) connector
├── sample_data/       # Sample datasets to try immediately
└── pyproject.toml

Contributing

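# Fork https://github.com/Arseni1919/ragy on GitHub, then clone your fork
git clone https://github.com/<your-username>/ragy.git
cd ragy
git checkout -b feature/your-feature
uv sync
# make changes
git add -A
git commit -m "feat: your feature"
git push origin feature/your-feature
# open PR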

See CLAUDE.md for code conventions, testing guidelines, and development workflows.

Acknowledgments

zvec — high-performance C++ vector database
Bright Data — web scraping and data collection
Tavily — web search API
yfinance — Yahoo Finance data access
Sentence Transformers — embedding models
FastAPI — web framework
Rich — terminal formatting

MIT License · Issues · Discussions

Made with ❤️ by Arseniy
