diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..37ae7b0
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,43 @@
+# AGENTS.md
+
+## Cursor Cloud specific instructions
+
+### Overview
+
+Reparsed is a single-service Python 3.12 FastAPI application that parses files and URLs into clean, structured, LLM-ready text. It requires PostgreSQL 16 for persistence and optionally connects to an external Ollama server for LLM classification.
+
+### Running the dev server
+
+1. **Start PostgreSQL** (if not already running):
+   ```
+   sudo pg_ctlcluster 16 main start
+   ```
+
+2. **Start FastAPI with uvicorn** from the `api/` directory:
+   ```
+   DATABASE_URL="postgresql+asyncpg://reparsed:reparsed@localhost:5432/reparsed" \
+   SESSION_SECRET="dev-secret-for-local-testing-only-32b" \
+   OLLAMA_BASE_URL="http://localhost:11434" \
+   PLAYWRIGHT_ENABLED=false \
+   uvicorn app.main:app --host 0.0.0.0 --port 17177 --reload
+   ```
+
+   The app starts on `http://localhost:17177`. Hot reload is enabled with `--reload`.
+
+### Key caveats
+
+- **No Ollama required for dev**: The app degrades gracefully without an Ollama server: `/v1/parse` returns `content_type: "generic"` with the deterministic extraction. Set `OLLAMA_BASE_URL` to any value; it only matters if you want Stage 2 LLM classification.
+- **Playwright**: Set `PLAYWRIGHT_ENABLED=false` to skip headless Chromium startup if you don't need JS-rendering fallback. Playwright browsers are installed at `~/.cache/ms-playwright/`.
+- **Database auto-creates tables**: On startup, `init_db()` runs `Base.metadata.create_all` so no manual migrations are needed. The DB user/password/database are all `reparsed` by default.
+- **Session cookies**: `SESSION_COOKIE_SECURE` defaults to `false`, which is correct for local HTTP dev. Setting it to `true` over HTTP silently breaks login.
+- **Static files and templates**: Served from `api/static/` and `api/templates/` respectively, resolved relative to `api/app/main.py`.
+- **No automated tests**: The repository currently has no test suite. Validation is done via manual API calls and browser testing.
+- **No linter config**: No `.flake8`, `pyproject.toml` linter config, or pre-commit hooks are present in the repository.
+
+### Endpoints for quick validation
+
+- `GET /healthz`: returns `ok` (plain text)
+- `GET /v1/health`: returns JSON with model readiness status
+- `GET /v1/content-types`: lists all detected content types
+- `GET /api-docs`: interactive Swagger UI
+- `POST /v1/parse`: core parsing endpoint (requires API key via `Authorization: Bearer rp_live_...`)
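
Since the repository has no test suite, the manual validation described above could be scripted. The following is a minimal sketch, not part of the diff: it probes the four unauthenticated `GET` endpoints using only the standard library. The names `GET_PATHS`, `fetch_status`, and `smoke_check` are hypothetical, and treating HTTP 200 as success for every path is an assumption. `POST /v1/parse` is left out because the diff does not describe its request body or a usable API key.

```python
import urllib.error
import urllib.request

# Unauthenticated GET endpoints listed in AGENTS.md.
GET_PATHS = ["/healthz", "/v1/health", "/v1/content-types", "/api-docs"]


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


def smoke_check(base_url="http://localhost:17177", fetch=fetch_status):
    """Map each GET path to the status code reported by `fetch`.

    `fetch` is injectable so the function can be exercised without a
    running server.
    """
    base = base_url.rstrip("/")
    return {path: fetch(base + path) for path in GET_PATHS}
```

With the dev server running on port 17177, `smoke_check()` returns a dict keyed by path. A value of `None` means the server could not be reached, and any value other than 200 is worth inspecting by hand.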