43 changes: 43 additions & 0 deletions AGENTS.md
@@ -0,0 +1,43 @@
# AGENTS.md

## Cursor Cloud specific instructions

### Overview

Reparsed is a single-service Python 3.12 FastAPI application that parses files and URLs into clean, structured, LLM-ready text. It requires PostgreSQL 16 for persistence and optionally connects to an external Ollama server for LLM classification.

### Running the dev server

1. **Start PostgreSQL** (if not already running):
```
sudo pg_ctlcluster 16 main start
```

2. **Start FastAPI with uvicorn** from the `api/` directory:
```
DATABASE_URL="postgresql+asyncpg://reparsed:reparsed@localhost:5432/reparsed" \
SESSION_SECRET="dev-secret-for-local-testing-only-32b" \
OLLAMA_BASE_URL="http://localhost:11434" \
PLAYWRIGHT_ENABLED=false \
uvicorn app.main:app --host 0.0.0.0 --port 17177 --reload
```
Comment on lines +12 to +23

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add Markdown fence languages and blank-line spacing around fenced blocks.

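The fenced code blocks opening on lines 12 and 17 violate markdownlint rules MD031 (blanks-around-fences) and MD040 (fenced-code-language). Add a blank line before and after each block and an explicit language identifier such as `bash`.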

📄 Proposed doc fix
 1 # AGENTS.md
 2 
 3 ## Cursor Cloud specific instructions
 4 
 5 ### Overview
 6 
 7 Reparsed is a single-service Python 3.12 FastAPI application that parses files and URLs into clean, structured, LLM-ready text. It requires PostgreSQL 16 for persistence and optionally connects to an external Ollama server for LLM classification.
 8 
 9 ### Running the dev server
10 
11 1. **Start PostgreSQL** (if not already running):
-12    ```
+12
+    ```bash
13    sudo pg_ctlcluster 16 main start
14    ```
15 
16 2. **Start FastAPI with uvicorn** from the `api/` directory:
-17    ```
+17
+    ```bash
18    DATABASE_URL="postgresql+asyncpg://reparsed:reparsed@localhost:5432/reparsed" \
19    SESSION_SECRET="dev-secret-for-local-testing-only-32b" \
20    OLLAMA_BASE_URL="http://localhost:11434" \
21    PLAYWRIGHT_ENABLED=false \
22    uvicorn app.main:app --host 0.0.0.0 --port 17177 --reload
23    ```
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 12-12: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)

[warning] 12-12: Fenced code blocks should have a language specified (MD040, fenced-code-language)

[warning] 17-17: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)

[warning] 17-17: Fenced code blocks should have a language specified (MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@AGENTS.md` around lines 12 - 23, the two fenced code blocks (the one
containing `sudo pg_ctlcluster 16 main start` and the multi-line
uvicorn/`DATABASE_URL` block) lack surrounding blank lines and language
identifiers. Give each block a blank line before and after it, and add a
language tag (e.g., `bash`) to its opening fence.


The app starts on `http://localhost:17177`. Hot reload is enabled with `--reload`.
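
Because uvicorn needs a moment to boot, a script that starts the server may want to wait before sending requests. The sketch below polls `GET /healthz` (listed under the quick-validation endpoints) until it answers `ok`. It is not part of the repository: `fetch_text`, `wait_until_ready`, and the retry defaults are hypothetical names and values chosen for illustration.

```python
import time
import urllib.request
from typing import Callable

# Dev address taken from the uvicorn command above.
BASE_URL = "http://localhost:17177"


def fetch_text(url: str) -> str:
    """Return the body of a GET request as text."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read().decode("utf-8")


def wait_until_ready(
    fetch: Callable[[str], str] = fetch_text,
    attempts: int = 30,   # assumed default, tune as needed
    delay: float = 0.5,   # seconds between attempts, also an assumption
) -> bool:
    """Poll /healthz until it returns `ok`; give up after `attempts` tries."""
    for _ in range(attempts):
        try:
            if fetch(f"{BASE_URL}/healthz").strip() == "ok":
                return True
        except OSError:
            # Connection refused and timeouts surface as OSError subclasses
            # (urllib.error.URLError included): the server is not up yet.
            pass
        time.sleep(delay)
    return False
```

Calling `wait_until_ready()` after launching uvicorn returns `True` once the server responds and `False` if it never does. The `fetch` parameter exists only so the loop can be exercised without a running server.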

### Key caveats

- **No Ollama required for dev**: The app degrades gracefully without an Ollama server — `/v1/parse` returns `content_type: "generic"` with the deterministic extraction. Set `OLLAMA_BASE_URL` to any value; it only matters if you want Stage 2 LLM classification.
- **Playwright**: Set `PLAYWRIGHT_ENABLED=false` to skip headless Chromium startup if you don't need JS-rendering fallback. Playwright browsers are installed at `~/.cache/ms-playwright/`.
- **Database auto-creates tables**: On startup, `init_db()` runs `Base.metadata.create_all` so no manual migrations are needed. The DB user/password/database are all `reparsed` by default.
- **Session cookies**: `SESSION_COOKIE_SECURE` defaults to `false`, which is correct for local HTTP dev. Setting it to `true` over HTTP silently breaks login.
- **Static files and templates**: Served from `api/static/` and `api/templates/` respectively, resolved relative to `api/app/main.py`.
- **No automated tests**: The repository currently has no test suite. Validation is done via manual API calls and browser testing.
- **No linter config**: No `.flake8`, `pyproject.toml` linter config, or pre-commit hooks are present in the repository.
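
The default credentials mentioned in the caveats can be read directly off the `DATABASE_URL` used in the uvicorn command. As a rough illustration, the sketch below splits that URL with the standard library. `describe_database_url` is a hypothetical helper, not something the repository provides, and the app itself may parse the URL differently (for example through SQLAlchemy).

```python
from urllib.parse import urlsplit

# Value copied from the uvicorn command in "Running the dev server".
DATABASE_URL = "postgresql+asyncpg://reparsed:reparsed@localhost:5432/reparsed"


def describe_database_url(url: str) -> dict:
    """Split a SQLAlchemy-style database URL into its components."""
    parts = urlsplit(url)
    # The scheme has the form "<dialect>+<driver>".
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    for key, value in describe_database_url(DATABASE_URL).items():
        print(f"{key}: {value}")
```

A check like this can help when login or startup fails because a copied `DATABASE_URL` points at the wrong host, port, or database name.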

### Endpoints for quick validation

- `GET /healthz` — returns `ok` (plain text)
- `GET /v1/health` — returns JSON with model readiness status
- `GET /v1/content-types` — lists all detected content types
- `GET /api-docs` — interactive Swagger UI
- `POST /v1/parse` — core parsing endpoint (requires API key via `Authorization: Bearer rp_live_...`)
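
The unauthenticated `GET` endpoints above can be checked in one pass. The following is a minimal sketch under stated assumptions: that each endpoint answers with HTTP 200 when healthy, and that `/v1/health` returns parseable JSON. `http_get`, `smoke_check`, and `bearer_headers` are hypothetical helper names. The request body for `POST /v1/parse` is not described here, so the sketch only shows how the `Authorization` header would be built.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:17177"  # dev address from the uvicorn command


def http_get(path: str) -> tuple:
    """GET BASE_URL + path and return (status_code, body_text)."""
    try:
        with urllib.request.urlopen(f"{BASE_URL}{path}", timeout=5) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise; report the status with an empty body.
        return err.code, ""


def is_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


def smoke_check(fetch=http_get) -> dict:
    """Map each GET endpoint to True if it looks healthy."""
    results = {}
    status, body = fetch("/healthz")
    results["/healthz"] = status == 200 and body.strip() == "ok"
    status, body = fetch("/v1/health")
    results["/v1/health"] = status == 200 and is_json(body)
    status, _ = fetch("/v1/content-types")
    results["/v1/content-types"] = status == 200  # body format not assumed
    return results


def bearer_headers(api_key: str) -> dict:
    """Build the Authorization header that POST /v1/parse expects."""
    return {"Authorization": f"Bearer {api_key}"}
```

With the dev server running, `smoke_check()` returns a dictionary such as `{"/healthz": True, ...}`. A connection error is raised if the server is not reachable.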