API service that accepts a public GitHub repository URL and returns:
- a human-readable summary
- main technologies used
- brief project structure description
It downloads the repo as a ZIP, selects the most relevant files (README, docs, configs, directory tree, plus selected code), fits them into the LLM context window, and calls an LLM to generate the summary.
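Once the API is running (see the run instructions below), a client call might look like the sketch below. The endpoint, request body, and response fields follow the contract described in this README; the host/port and the error-handling branch are assumptions based on the default local setup:

```python
import requests

# Ask the running API (default: localhost:8000) to summarize a public repo.
resp = requests.post(
    "http://localhost:8000/summarize",
    json={"github_url": "https://github.com/psf/requests"},
    timeout=300,  # summarizing a large repo can take a while
)
data = resp.json()

if data.get("status") == "error":
    # Error responses carry a human-readable message.
    print("Failed:", data["message"])
else:
    # Success responses contain summary, technologies, structure, evidence, confidence.
    print("Summary:", data["summary"])
    print("Technologies:", data["technologies"])
    print("Structure:", data["structure"])
```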
Mermaid source (GitHub Web / VS Code)
```mermaid
flowchart LR
  U["User / Browser"] -->|HTTP :8501| S["Streamlit UI"]
  S -->|POST /summarize| A["FastAPI API :8000"]
  A -->|Download ZIP| G["GitHub Repo"]
  A -->|Embeddings| E["Embedding API"]
  A -->|Completion| L["LLM Provider"]
  subgraph DockerCompose
    S
    A
  end
```
Mermaid source (GitHub Web / VS Code)
```mermaid
sequenceDiagram
  autonumber
  participant U as User
  participant S as Streamlit UI
  participant A as FastAPI API
  participant G as GitHub
  participant R as RAG (chunk + retrieve)
  participant E as Embeddings (OpenAI)
  participant L as LLM (OpenAI/Nebius)
  U->>S: Enter repo URL + click Summarize
  S->>A: POST /summarize { github_url }
  A->>G: Download repo (ZIP)
  A->>A: Filter/score files (docs/config/entrypoints)
  A->>R: Chunk selected files
  alt LLM_PROVIDER = openai
    R->>E: Embed chunks + queries
    E-->>R: Vectors
    R-->>A: Top-K relevant chunks
  else LLM_PROVIDER = nebius
    R-->>A: Keyword-based Top-K chunks
  end
  A->>L: Prompt (tree + facts + Top-K chunks)
  L-->>A: JSON summary
  A-->>S: { summary, technologies, structure, evidence, confidence }
```
Requirements:
- Python 3.10+

Supports OpenAI by default and Nebius Token Factory optionally.
Env vars:
- `OPENAI_API_KEY` (required)
- `OPENAI_MODEL` (optional, default: `gpt-4o-mini`)
- `OPENAI_EMBEDDING_MODEL` (optional, default: `text-embedding-3-small`)
- `OPENAI_BASE_URL` (optional, default: `https://api.openai.com/v1/`)
- `LLM_PROVIDER` (optional, default: `openai`)

Note: `OPENAI_EMBEDDING_MODEL` controls the embeddings model used for RAG retrieval when `LLM_PROVIDER=openai` (default: `text-embedding-3-small`).
Env vars:
- `NEBIUS_API_KEY` (required)
- `NEBIUS_MODEL` (optional, default: `meta-llama/Meta-Llama-3.1-8B-Instruct-fast`)
- `NEBIUS_BASE_URL` (optional, default: `https://api.tokenfactory.nebius.com/v1/`)
- `LLM_PROVIDER=nebius`
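A minimal sketch of how the provider selection and defaults above could be read from the environment; the `LLMConfig` dataclass and `load_llm_config` helper are illustrative, not the actual code under `app/`:

```python
import os
from dataclasses import dataclass


@dataclass
class LLMConfig:
    provider: str
    api_key: str
    model: str
    base_url: str


def load_llm_config() -> LLMConfig:
    """Illustrative loader mirroring the env vars documented above."""
    provider = os.getenv("LLM_PROVIDER", "openai")
    if provider == "nebius":
        return LLMConfig(
            provider="nebius",
            api_key=os.environ["NEBIUS_API_KEY"],  # required
            model=os.getenv("NEBIUS_MODEL", "meta-llama/Meta-Llama-3.1-8B-Instruct-fast"),
            base_url=os.getenv("NEBIUS_BASE_URL", "https://api.tokenfactory.nebius.com/v1/"),
        )
    return LLMConfig(
        provider="openai",
        api_key=os.environ["OPENAI_API_KEY"],  # required
        model=os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
        base_url=os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1/"),
    )
```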
Local setup:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r app/requirements.txt
```

Run with OpenAI:

```bash
export LLM_PROVIDER=openai
export OPENAI_API_KEY="YOUR_OPENAI_KEY"
# Optional:
export OPENAI_MODEL="gpt-4o-mini"
export OPENAI_EMBEDDING_MODEL="text-embedding-3-small"
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

Run with Nebius:

```bash
export LLM_PROVIDER=nebius
export NEBIUS_API_KEY="YOUR_NEBIUS_KEY"
# Optional:
export NEBIUS_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct-fast"
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

Example request:

```bash
curl -X POST http://localhost:8000/summarize \
  -H "Content-Type: application/json" \
  -d '{"github_url": "https://github.com/psf/requests"}'
```

Run with Docker:

```bash
# OpenAI:
export OPENAI_API_KEY="YOUR_OPENAI_KEY"
docker compose up --build

# Or build the API image directly:
docker build -f app/Dockerfile -t repo-summarizer-api:local .
```

Note: docker-compose includes an API healthcheck (GET /health), and Streamlit waits for the API to become healthy.
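The healthcheck endpoint itself can be trivial. A minimal sketch, assuming a plain FastAPI route, of what the GET /health handler probed by docker-compose might look like (the actual handler lives in `app/main.py` and may differ):

```python
from fastapi import FastAPI

app = FastAPI()


@app.get("/health")
def health() -> dict:
    # docker-compose polls this route; the Streamlit service starts once the API reports healthy.
    return {"status": "ok"}
```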
On error, the API returns:

```json
{ "status": "error", "message": "..." }
```

The LLM context includes:
- Directory tree (depth-limited; ignores node_modules, dist, venv, binaries, etc.; see the filtering sketch after this list)
- README + key docs
- Dependency/config files
- Deterministic extraction: dependencies + entrypoints + detected endpoints
- RAG-selected code chunks: chunk selected important files and retrieve top relevant chunks for: what it does / how to run / endpoints / structure / deps
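A sketch of the kind of include/exclude scoring this selection step performs; the directory names, extensions, and weights below are illustrative assumptions, not the service's actual lists:

```python
from pathlib import Path

# Illustrative exclusions and priorities; the real service's heuristics may differ.
EXCLUDED_DIRS = {"node_modules", "dist", "build", ".venv", "venv", ".git", "__pycache__"}
BINARY_SUFFIXES = {".png", ".jpg", ".gif", ".pdf", ".zip", ".so", ".exe", ".bin"}
HIGH_SIGNAL_NAMES = {"readme.md", "pyproject.toml", "requirements.txt", "package.json",
                     "dockerfile", "docker-compose.yml", "main.py", "app.py"}


def score_file(path: Path) -> int:
    """Higher score = more likely to be included in the LLM context."""
    if any(part in EXCLUDED_DIRS for part in path.parts):
        return -1  # skip entirely
    if path.suffix.lower() in BINARY_SUFFIXES:
        return -1
    if path.name.lower() in HIGH_SIGNAL_NAMES:
        return 3  # docs, dependency/config files, entrypoints
    if path.suffix in {".md", ".rst", ".toml", ".cfg", ".ini", ".yml", ".yaml"}:
        return 2
    if path.suffix in {".py", ".js", ".ts", ".go", ".rs", ".java"}:
        return 1  # regular source files: include only if the budget allows
    return 0


def select_files(repo_root: Path, limit: int = 40) -> list[Path]:
    """Rank all files by score and keep the top ones that are not excluded."""
    scored = [(score_file(p), p) for p in repo_root.rglob("*") if p.is_file()]
    kept = [p for s, p in sorted(scored, key=lambda sp: sp[0], reverse=True) if s >= 0]
    return kept[:limit]
```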
To fit large repositories into the LLM context while keeping high signal, the service uses a lightweight RAG step:
- select important files (README/docs/configs + entrypoints/routes)
- chunk file contents with overlap
- retrieve top‑K relevant chunks for fixed questions (what it does / how to run / endpoints / structure / deps)
- OpenAI provider uses semantic retrieval via embeddings; Nebius falls back to keyword retrieval
The selected snippets (with evidence file names) are combined with a depth‑limited directory tree and deterministic facts before calling the LLM.
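A compact sketch of the chunk-with-overlap and top-K retrieval step described above, shown with the keyword-overlap fallback used for Nebius (the OpenAI path scores chunks by embedding similarity instead); the chunk sizes and scoring function are illustrative:

```python
def chunk_text(text: str, size: int = 1200, overlap: int = 200) -> list[str]:
    """Split file contents into overlapping character windows."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():
            chunks.append(piece)
    return chunks


def keyword_top_k(chunks: list[str], query: str, k: int = 5) -> list[str]:
    """Keyword-overlap retrieval (Nebius fallback); OpenAI uses embedding similarity instead."""
    query_terms = set(query.lower().split())

    def overlap_score(chunk: str) -> int:
        return len(set(chunk.lower().split()) & query_terms)

    return sorted(chunks, key=overlap_score, reverse=True)[:k]


# The service retrieves evidence for a fixed set of questions, e.g.:
QUESTIONS = ["what it does", "how to run", "endpoints", "structure", "deps"]
```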
```bash
pip install -r app/requirements.txt
pytest -q
```

This repo uses only local/mocked tests (no real calls to GitHub/OpenAI/Nebius).
Q: Which model did you choose and why? A: I chose gpt-4o-mini for OpenAI and meta-llama/Meta-Llama-3.1-8B-Instruct-fast for Nebius to balance quality, speed, and cost.
Q: What is your approach to handling repository contents? A: Exclude binaries/build artifacts/generated data; include the directory tree + docs + dependency/config files; extract endpoints/entrypoints deterministically; then use RAG-selected top-K chunks (chunk important files and retrieve the most relevant snippets) to fit the LLM context window while keeping high signal.
Prebuilt container images are published to GitHub Container Registry (GHCR) for both API and UI.
You can pull them with:
```bash
docker pull ghcr.io/khab40/nebius-test-api:latest
docker pull ghcr.io/khab40/nebius-test-ui:latest
```

If you want docker-compose to use the published images instead of building locally, set:

```bash
export REPO_SUMMARIZER_API_IMAGE=ghcr.io/khab40/nebius-test-api:latest
export REPO_SUMMARIZER_UI_IMAGE=ghcr.io/khab40/nebius-test-ui:latest
```

Then run:

```bash
docker compose up
```