From a244dcaa79a18d5fbf05811a4ab39f92a9ab8734 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Yves=20Desgagn=C3=A9?=
Date: Sat, 9 May 2026 15:10:41 -0400
Subject: [PATCH] Optimize review-user-guide, review-architecture, analyze-db skills

---
 skills/analyze-db/SKILL.md | 1550 +++++---------------------
 skills/review-architecture/SKILL.md | 1504 ++++--------------------
 skills/review-user-guide/SKILL.md | 1524 ++++--------------------
 3 files changed, 658 insertions(+), 3920 deletions(-)

diff --git a/skills/analyze-db/SKILL.md b/skills/analyze-db/SKILL.md
index dd30612..ae9673d 100644
--- a/skills/analyze-db/SKILL.md
+++ b/skills/analyze-db/SKILL.md
@@ -4,1352 +4,302 @@ description: Analyze, document, map, or scan the database schema. Use when the u
 allowed-tools: Bash(php:*), Bash(python:*), Bash(ruby:*), Bash(npm:*), Bash(npx:*), Bash(mysql:*), Bash(psql:*), Bash(sqlite3:*), Bash(mongosh:*), Bash(redis-cli:*), Bash(bq:*), Bash(curl:*), Bash(awk:*), Bash(basename:*), Bash(cat:*), Bash(cut:*), Bash(date:*), Bash(diff:*), Bash(dirname:*), Bash(echo:*), Bash(find:*), Bash(grep:*), Bash(head:*), Bash(jq:*), Bash(ls:*), Bash(mkdir:*), Bash(sed:*), Bash(sort:*), Bash(tail:*), Bash(tee:*), Bash(tr:*), Bash(uniq:*), Bash(wc:*), Bash(which:*), Bash(xargs:*), Read, Write, Glob, Grep, mcp__postgres__query, mcp__postgres__list_tables, mcp__postgres__describe_table, mcp__postgres__list_schemas, mcp__mysql__mysql_query, mcp__mongodb__find, mcp__mongodb__aggregate, mcp__mongodb__count, mcp__mongodb__list-databases, mcp__mongodb__list-collections, mcp__mongodb__collection-schema, mcp__redis__*, mcp__bigquery__*
 ---
-## Purpose
+# Analyze Database Schema
-Analyze this project and generate a `docs/DB.md` file with **complete database schema documentation** for running queries.
+Analyze the project and generate `docs/DB.md` with **complete database schema documentation** ready for use by the `query-db` skill.
-## MCP Tools with Fallbacks
-
-This skill uses database MCP tools when available and falls back to CLI commands if they are unavailable or return errors.
-
-| Database | MCP Tools | CLI Fallback |
-| --- | --- | --- |
-| PostgreSQL | `mcp__postgres__list_tables`, `describe_table`, `list_schemas`, `query` | `psql` |
-| MySQL | `mcp__mysql__mysql_query` | `mysql` |
-| MongoDB | `mcp__mongodb__list-databases`, `list-collections`, `collection-schema`, `find` | `mongosh` |
-| Redis | `mcp__redis__info`, `dbsize`, `scan_keys`, `type`, `get`, `hgetall`, `json_get` | `redis-cli` |
-| SQLite | No MCP — CLI only | `sqlite3` |
-| BigQuery | `mcp__bigquery__query`, `list_tables`, `get_table_schema` | `bq` |
-| Elasticsearch | No MCP — CLI only | `curl` |
-
-**Prefer MCP tools** when available — they handle connection management and provide structured output. For schema analysis, MCP tools like `mcp__postgres__list_tables` and `mcp__postgres__describe_table` are especially useful. If MCP tools return errors, fall back to the CLI.
-
-**IMPORTANT: Document ALL tables/collections/indices.** Do not filter or skip any tables. Developers need full schema documentation, not just "important" tables.
-
-## Environment Variables
-
-This skill assumes database connection environment variables are already set. The following variables are used:
-
-### MySQL
-
-- `MYSQL_HOST` - Database host
-- `MYSQL_PORT` - Database port
-- `MYSQL_USER` - Database user
-- `MYSQL_PASS` - Database password
-- `MYSQL_DB` - Database name
-
-### PostgreSQL
-
-- `PGHOST` - Database host
-- `PGPORT` - Database port
-- `PGUSER` - Database user
-- `PGPASSWORD` - Database password
-- `PGDATABASE` - Database name
-
-### MongoDB
-
-- `MONGODB_URI` - Full connection URI (e.g., `mongodb://localhost:27017/dbname`)
-
-### Elasticsearch
-
-- `ES_URL` - Elasticsearch URL (e.g., `http://localhost:9200`)
-- `ES_API_KEY` - Optional API key for authentication
-
-### Redis
-
-- `REDIS_URL` - Redis connection URL (e.g., `redis://localhost:6379`)
-
-### BigQuery
-
-- `BQ_PROJECT` - GCP project ID
-- `BQ_DATASETS` - Comma-separated list of BigQuery datasets (e.g., `archive_2023,archive_2024,archive_2025`)
-
-## CLI Command Reference
-
-Use these exact command formats:
-
-### MySQL
-
-```bash
-MYSQL_PWD="$MYSQL_PASS" mysql -h "$MYSQL_HOST" -P "$MYSQL_PORT" -u "$MYSQL_USER" "$MYSQL_DB" -e "SQL_QUERY"
-```
-
-### PostgreSQL
-
-```bash
-psql -c "SQL_QUERY"
-```
-
-### MongoDB
-
-```bash
-mongosh "$MONGODB_URI" --eval "JS_CODE"
-```
-
-### Elasticsearch
-
-```bash
-curl -s "$ES_URL/index/_endpoint" -H "Content-Type: application/json" -d 'JSON_BODY'
-```
-
-### Redis
-
-```bash
-redis-cli -u "$REDIS_URL" COMMAND
-```
-
-### BigQuery
-
-```bash
-bq query --use_legacy_sql=false --format=prettyjson --project_id="$BQ_PROJECT" "STANDARD_SQL_QUERY"
-```
-
-```bash
-bq ls --project_id="$BQ_PROJECT" "$DATASET"
-```
-
-```bash
-bq show --schema --project_id="$BQ_PROJECT" "$DATASET.table_name"
-```
-
-## Steps
-
-### 0. Check for existing docs/DB.md
-
-Before starting, check if `docs/DB.md` already exists.
-
-**If the file exists:**
-
-- Read the existing file to understand what was previously documented
-- **You MUST still execute ALL steps 1-9.** Do not assume the existing file is accurate. Code and schemas change.
-- For each step, compare what the existing file claims vs what the fresh analysis finds -- Merge new findings with existing data: - - Preserve manually added notes or corrections - - Update row counts, enum distributions, and date ranges from fresh queries - - Add any new tables/fields found in code - - Remove tables/fields no longer present in code or database - - Flag any discrepancies between existing documentation and current state -- Update the "Last verified" timestamp - -**If the file does not exist:** - -- Proceed with fresh analysis (steps 1-9) - -### 1. Detect language and framework - -Check for these indicators: - ---- - -#### PHP - -**Symfony (Doctrine ORM):** - -- `composer.json` with `doctrine/orm` or `doctrine/doctrine-bundle` -- `src/Entity/` directory -- `config/packages/doctrine.yaml` -- `migrations/` directory - -**Laravel (Eloquent):** - -- `composer.json` with `laravel/framework` -- `app/Models/` directory -- `database/migrations/` -- `config/database.php` - -**Doctrine ODM (MongoDB):** - -- `composer.json` with `doctrine/mongodb-odm` -- `src/Document/` directory - ---- - -#### Python - -**Django:** - -- `manage.py` in root -- `settings.py` with `DATABASES` config -- `models.py` files in app directories -- `*/migrations/` directories - -**Flask/SQLAlchemy:** - -- `requirements.txt` or `pyproject.toml` with `sqlalchemy` or `flask-sqlalchemy` -- `models.py` or `models/` directory -- `alembic/` or `migrations/` for Alembic - -**FastAPI:** - -- `requirements.txt` with `fastapi` and `sqlalchemy` -- Similar structure to Flask - -**Django + MongoDB (Djongo/MongoEngine):** - -- `settings.py` with `djongo` or `mongoengine` - -**PyMongo/Motor:** - -- `requirements.txt` with `pymongo` or `motor` - ---- - -#### Ruby - -**Ruby on Rails (ActiveRecord):** - -- `Gemfile` with `rails` -- `app/models/` directory -- `db/migrate/` directory -- `db/schema.rb` or `db/structure.sql` -- `config/database.yml` - -**Mongoid (MongoDB):** - -- `Gemfile` with `mongoid` -- 
`config/mongoid.yml` - ---- - -#### Go - -**GORM:** - -- `go.mod` with `gorm.io/gorm` -- Struct definitions with `gorm:` tags -- `models/` or `internal/models/` directory - -**sqlx/database-sql:** - -- `go.mod` with `github.com/jmoiron/sqlx` -- SQL files or embedded queries - -**MongoDB (mongo-driver):** - -- `go.mod` with `go.mongodb.org/mongo-driver` - -**ent:** - -- `go.mod` with `entgo.io/ent` -- `ent/schema/` directory - ---- - -#### Node.js / TypeScript - -**TypeORM:** - -- `package.json` with `typeorm` -- `src/entity/` or `entities/` directory -- `ormconfig.json` or `data-source.ts` - -**Prisma:** - -- `prisma/schema.prisma` file -- `package.json` with `@prisma/client` - -**Sequelize:** - -- `package.json` with `sequelize` -- `models/` directory -- `migrations/` directory - -**Mongoose (MongoDB):** - -- `package.json` with `mongoose` -- Schema definitions with `new Schema()` - -**Drizzle:** - -- `package.json` with `drizzle-orm` -- `drizzle/` directory or schema files - -**Knex.js:** - -- `package.json` with `knex` -- `knexfile.js` or `knexfile.ts` -- `migrations/` directory - ---- - -#### Java / Kotlin - -**Spring Boot + JPA/Hibernate:** - -- `pom.xml` or `build.gradle` with `spring-boot-starter-data-jpa` -- `@Entity` annotated classes -- `application.properties` or `application.yml` with `spring.datasource` -- `src/main/java/**/entity/` or `**/model/` directories - -**Spring Data MongoDB:** - -- `pom.xml` with `spring-boot-starter-data-mongodb` -- `@Document` annotated classes - ---- - -#### .NET / C\# - -**Entity Framework Core:** - -- `*.csproj` with `Microsoft.EntityFrameworkCore` -- `DbContext` classes -- `Migrations/` directory -- `appsettings.json` with connection strings - -**MongoDB.Driver:** - -- `*.csproj` with `MongoDB.Driver` - ---- - -#### Rust - -**Diesel:** - -- `Cargo.toml` with `diesel` -- `diesel.toml` config -- `migrations/` directory -- `schema.rs` - -**SeaORM:** - -- `Cargo.toml` with `sea-orm` -- `entity/` directory - -**SQLx:** - -- 
`Cargo.toml` with `sqlx` -- `.sqlx/` directory or `migrations/` - ---- - -### 2. Detect database type(s) - -Based on framework detection, identify which databases are used: - -**SQL Databases (MySQL/PostgreSQL/SQLite):** - -- Check connection strings in config files -- Look for database driver dependencies -- Check environment files (`.env`, `.env.example`) - -**MongoDB:** - -- ODM dependencies (Doctrine ODM, Mongoose, MongoEngine, Mongoid, etc.) -- MongoDB connection strings -- Document/collection definitions - -**Elasticsearch:** - -- Elasticsearch client dependencies -- Index mapping configurations -- `fos_elastica.yaml`, `elasticsearch.yml`, or similar - -**Redis:** - -- Redis client dependencies -- Cache/session configuration -- Key pattern definitions in code - -**BigQuery:** - -- `BQ_PROJECT` and `BQ_DATASETS` environment variables set -- BigQuery client dependencies (e.g., `google-cloud-bigquery` in Python, `@google-cloud/bigquery` in Node.js) -- BigQuery connection configuration in code - -### 3. 
Extract schema information - -#### For SQL ORMs - -| Framework | Entity Location | Migration Location | Schema Command | -| --------- | --------------- | ------------------ | -------------- | -| Symfony/Doctrine | `src/Entity/` | `migrations/` | `php bin/console doctrine:mapping:info` | -| Laravel/Eloquent | `app/Models/` | `database/migrations/` | `php artisan model:show` | -| Django | `*/models.py` | `*/migrations/` | `python manage.py inspectdb` | -| Rails/ActiveRecord | `app/models/` | `db/migrate/` | Read `db/schema.rb` | -| TypeORM | `src/entity/` | `migrations/` | Check entity decorators | -| Prisma | `prisma/schema.prisma` | Prisma migrations | Read schema.prisma directly | -| Spring JPA | `**/entity/` | Flyway/Liquibase | Check `@Entity` classes | -| EF Core | `Models/` or `Entities/` | `Migrations/` | Check DbContext | -| GORM | `models/` | Migration files | Check struct tags | -| Diesel | `src/models.rs` | `migrations/` | Read `schema.rs` | - -Look for: - -- Column definitions and types -- Primary keys and indexes -- Foreign key relationships -- Unique constraints - -**IMPORTANT:** ORM entities/models may not cover all tables. Join tables, framework-generated tables (sessions, migrations, jobs, cache), and raw SQL tables may not have model classes. You MUST also enumerate all tables directly from the schema file (e.g., `db/schema.rb`, `schema.prisma`) or the live database in Step 7, then cross-reference to ensure no table is missing from the documentation. 
-
-#### For MongoDB
-
-| Framework | Document Location | Schema Definition |
-| --------- | ----------------- | ----------------- |
-| Doctrine ODM | `src/Document/` | `@ODM\` annotations |
-| Mongoose | `models/` | `new Schema({...})` |
-| MongoEngine | `models.py` | `Document` class fields |
-| Mongoid | `app/models/` | `field :name, type:` |
-| Spring Data MongoDB | `**/document/` | `@Document` annotation |
-
-Look for:
-
-- Field definitions and types
-- References and embedded documents
-- Indexes
-
-#### For Elasticsearch
-
-- Index mapping definitions
-- Field types and analyzers
-- Nested object structures
-
-#### For Redis
-
-- Key naming patterns in code
-- Data structure usage (String, Hash, Set, ZSet, List, HyperLogLog)
-- TTL patterns
-
-### 4. Extract business logic context
-
-Find across all frameworks:
-
-- Constants and enums (status codes, types)
-- Repository/DAO methods (common query patterns)
-- Validation rules
-- Comments and docstrings explaining field meanings
-- Soft delete patterns (`deleted_at`, `is_deleted`)
-- Multi-tenancy patterns (`tenant_id`, `organization_id`)
-- **BI dashboards, report generators, and analytics endpoints** — capture common business questions (e.g., "how many buyers this month?", "revenue by country?") and record which tables, joins, and filters are used. These become the "Common Business Questions" section in the output.
-
-### 5. Generate docs/DB.md (Initial Draft)
-
-Create the directory if needed:
-
-```bash
-mkdir -p docs
-```
-
-Write an initial `docs/DB.md` with the appropriate template based on detected database type(s).
-
-### 6. Check database connectivity
-
-Before connecting to the database, verify the required environment variables are set and the CLI tool is available.
-
-**How to check:** Run a simple connectivity test using the CLI tool. If it fails, output the appropriate setup instructions below and ask the user to configure it.
- ---- - -#### MySQL CLI Test - -```bash -MYSQL_PWD="$MYSQL_PASS" mysql -h "$MYSQL_HOST" -P "$MYSQL_PORT" -u "$MYSQL_USER" "$MYSQL_DB" -e "SELECT 1" -``` - -**Required environment variables:** - -- `MYSQL_HOST` - Database host -- `MYSQL_PORT` - Database port -- `MYSQL_USER` - Database user -- `MYSQL_PASS` - Database password -- `MYSQL_DB` - Database name - ---- - -#### PostgreSQL CLI Test - -```bash -psql -c "SELECT 1" -``` - -**Required environment variables:** - -- `PGHOST` - Database host -- `PGPORT` - Database port -- `PGUSER` - Database user -- `PGPASSWORD` - Database password -- `PGDATABASE` - Database name - ---- - -#### MongoDB CLI Test - -```bash -mongosh "$MONGODB_URI" --eval "db.runCommand({ping: 1})" -``` - -**Required environment variables:** - -- `MONGODB_URI` - Full connection URI - ---- - -#### Elasticsearch CLI Test - -```bash -curl -s "$ES_URL/_cluster/health" -# Or with API key: -curl -s -H "Authorization: ApiKey $ES_API_KEY" "${ES_URL}/_cluster/health" -``` - -**Required environment variables:** - -- `ES_URL` - Elasticsearch URL -- `ES_API_KEY` - Optional API key - ---- - -#### Redis CLI Test - -```bash -redis-cli -u "$REDIS_URL" PING -``` - -**Required environment variables:** - -- `REDIS_URL` - Redis connection URL - ---- - -#### BigQuery CLI Test - -```bash -bq query --use_legacy_sql=false --project_id="$BQ_PROJECT" "SELECT 1" -``` - -**Required environment variables:** - -- `BQ_PROJECT` - GCP project ID -- `BQ_DATASETS` - Comma-separated list of datasets - -**If this fails:** Tell the user to run `gcloud auth application-default login` and `gcloud auth application-default set-quota-project $BQ_PROJECT`. - ---- - -**After outputting instructions:** Ask the user to confirm when they have set the environment variables. Wait for their confirmation before proceeding to step 7. - -**If the user declines or cannot provide database credentials:** Skip steps 7 and 8. Proceed directly to step 9 using only the code-based analysis from steps 3-4. 
The verification status in docs/DB.md MUST reflect this (see verification timestamp formats below). - ---- - -### 7. Connect to database and verify schema - -Connect via CLI to gather live data and verify the schema analysis. - -**CRITICAL: Enumerate ALL tables/collections/indices first.** Before doing anything else in this step, list every table (or collection/index) in the database. Compare this list against what you documented from code in Steps 3-4. Any table present in the database but missing from your documentation MUST be added. Do NOT skip join tables, migration tracking tables, session tables, queue tables, or any other table — every single table must appear in the final documentation. - -**Performance safeguards for large tables:** - -- Always use **estimated counts** from system tables, never `COUNT(*)` on large tables -- Use **LIMIT** on all queries -- For enum sampling, query a **small sample** or use indexed columns only -- Consider running against a **read replica** if available -- If a table has >10M rows, note it as "large table" and be extra cautious - -#### For MySQL - -**List ALL tables and row counts (uses estimates, instant). Every table returned here MUST appear in docs/DB.md:** - -```bash -MYSQL_PWD="$MYSQL_PASS" mysql -h "$MYSQL_HOST" -P "$MYSQL_PORT" -u "$MYSQL_USER" "$MYSQL_DB" -e " -SELECT table_name, table_rows -FROM information_schema.tables -WHERE table_schema = DATABASE() -ORDER BY table_rows DESC;" -``` - -**Check indexes:** - -```bash -MYSQL_PWD="$MYSQL_PASS" mysql -h "$MYSQL_HOST" -P "$MYSQL_PORT" -u "$MYSQL_USER" "$MYSQL_DB" -e "SHOW INDEX FROM table_name;" -``` - -**Get date ranges for time-series tables:** - -```bash -MYSQL_PWD="$MYSQL_PASS" mysql -h "$MYSQL_HOST" -P "$MYSQL_PORT" -u "$MYSQL_USER" "$MYSQL_DB" -e " -SELECT MIN(created_at) as earliest, MAX(created_at) as latest FROM orders;" -``` - -#### For PostgreSQL - -**List ALL tables and row counts (uses estimates, instant). 
Every table returned here MUST appear in docs/DB.md:** - -```bash -psql -c "SELECT schemaname, relname, n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC;" -``` - -**Check indexes:** - -```bash -psql -c "SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'table_name';" -``` - -**Get date ranges:** - -```bash -psql -c "SELECT MIN(created_at) as earliest, MAX(created_at) as latest FROM orders;" -``` - -#### For MongoDB - -**List ALL collections and document counts. Every collection returned here MUST appear in docs/DB.md:** - -```bash -mongosh "$MONGODB_URI" --eval "db.getCollectionNames().forEach(c => print(c + ': ' + db[c].estimatedDocumentCount()))" -``` - -**List indexes:** - -```bash -mongosh "$MONGODB_URI" --eval "db.collection.getIndexes()" -``` - -**Get date ranges:** - -```bash -mongosh "$MONGODB_URI" --eval "db.orders.aggregate([ - { \$group: { _id: null, earliest: { \$min: '\$createdAt' }, latest: { \$max: '\$createdAt' } } } -])" -``` - -**Sample document structure:** - -```bash -mongosh "$MONGODB_URI" --eval "db.collection.findOne()" -``` - -#### For Elasticsearch - -**List ALL indices and document counts. Every index returned here MUST appear in docs/DB.md:** - -```bash -curl -s "$ES_URL/_cat/indices?v&h=index,docs.count,store.size" -``` - -**Get mapping:** - -```bash -curl -s "$ES_URL/index_name/_mapping" | jq -``` - -#### For Redis - -**Get database size:** - -```bash -redis-cli -u "$REDIS_URL" DBSIZE -``` +**Document EVERY table/collection/index without exception** — including join tables, migration trackers, session tables, queue tables, cache tables, framework-internal tables. Developers need full schema docs, not just "important" ones. -**Sample key patterns:** - -```bash -redis-cli -u "$REDIS_URL" SCAN 0 MATCH "user:*" COUNT 10 -``` - -**Check TTLs:** - -```bash -redis-cli -u "$REDIS_URL" TTL key_name -``` - -#### For BigQuery - -**List ALL datasets (from BQ_DATASETS env var). 
Every dataset listed MUST be documented:** - -```bash -for ds in $(echo "$BQ_DATASETS" | tr ',' ' '); do - echo "=== Dataset: $ds ===" - bq ls --project_id="$BQ_PROJECT" "$ds" -done -``` - -**Get table schema:** - -```bash -bq show --schema --format=prettyjson --project_id="$BQ_PROJECT" "$DATASET.table_name" -``` - -**Get table info (row count, size, partitioning):** - -```bash -bq show --project_id="$BQ_PROJECT" "$DATASET.table_name" -``` - -**Get date ranges:** - -```bash -bq query --use_legacy_sql=false --project_id="$BQ_PROJECT" " -SELECT MIN(created_at) as earliest, MAX(created_at) as latest -FROM \`$BQ_PROJECT.$DATASET.orders\`" -``` - -### 8. Sample enum/status field values - -For each enum or status field identified, query the actual values and their distribution. - -**Use safe sampling for large tables (>1M rows):** - -#### MySQL - -**For small tables (<1M rows) - full count is OK:** - -```bash -MYSQL_PWD="$MYSQL_PASS" mysql -h "$MYSQL_HOST" -P "$MYSQL_PORT" -u "$MYSQL_USER" "$MYSQL_DB" -e " -SELECT status, COUNT(*) as count FROM orders GROUP BY status ORDER BY count DESC;" -``` - -**For large tables (>1M rows) - use sampling:** - -```bash -MYSQL_PWD="$MYSQL_PASS" mysql -h "$MYSQL_HOST" -P "$MYSQL_PORT" -u "$MYSQL_USER" "$MYSQL_DB" -e " -SELECT status, COUNT(*) as count FROM orders -WHERE created_at >= NOW() - INTERVAL 30 DAY -GROUP BY status ORDER BY count DESC;" -``` - -**For very large tables - just get distinct values:** - -```bash -MYSQL_PWD="$MYSQL_PASS" mysql -h "$MYSQL_HOST" -P "$MYSQL_PORT" -u "$MYSQL_USER" "$MYSQL_DB" -e " -SELECT DISTINCT status FROM orders LIMIT 20;" -``` - -#### PostgreSQL - -**For small tables:** +## MCP Tools with Fallbacks -```bash -psql -c "SELECT status, COUNT(*) as count FROM orders GROUP BY status ORDER BY count DESC;" -``` +Prefer MCP tools when available — they handle connection management. Fall back to CLI on errors. 
-**For large tables - use sampling:** +| Database | MCP Tools | CLI Fallback | +| --- | --- | --- | +| PostgreSQL | `mcp__postgres__list_tables`, `describe_table`, `list_schemas`, `query` | `psql` | +| MySQL | `mcp__mysql__mysql_query` | `mysql` | +| MongoDB | `mcp__mongodb__list-databases`, `list-collections`, `collection-schema`, `find` | `mongosh` | +| Redis | `mcp__redis__*` | `redis-cli` | +| SQLite | (no MCP) | `sqlite3` | +| BigQuery | `mcp__bigquery__*` | `bq` | +| Elasticsearch | (no MCP) | `curl` | + +## Connection Environment Variables + +| Database | Variables | +| -------- | --------- | +| MySQL | `MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_USER`, `MYSQL_PASS`, `MYSQL_DB` | +| PostgreSQL | `PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE` | +| MongoDB | `MONGODB_URI` | +| Elasticsearch | `ES_URL`, `ES_API_KEY` (optional) | +| Redis | `REDIS_URL` | +| BigQuery | `BQ_PROJECT`, `BQ_DATASETS` (comma-separated list, e.g. `archive_2023,archive_2024,archive_2025`) | -```bash -psql -c "SELECT status, COUNT(*) as count FROM orders -WHERE created_at >= NOW() - INTERVAL '30 days' -GROUP BY status ORDER BY count DESC;" -``` +## CLI Command Reference -#### MongoDB +| Database | Connect / Query | List schema | +| -------- | --------------- | ----------- | +| MySQL | `MYSQL_PWD="$MYSQL_PASS" mysql -h "$MYSQL_HOST" -P "$MYSQL_PORT" -u "$MYSQL_USER" "$MYSQL_DB" -e ""` | `SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema = DATABASE() ORDER BY table_rows DESC;` (estimates, instant) | +| PostgreSQL | `psql -c ""` | `SELECT schemaname, relname, n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC;` | +| SQLite | `sqlite3 ""` | `SELECT name FROM sqlite_master WHERE type='table';` | +| MongoDB | `mongosh "$MONGODB_URI" --eval ""` | `db.getCollectionNames().forEach(c => print(c + ': ' + db[c].estimatedDocumentCount()))` | +| Elasticsearch | `curl -s "$ES_URL/"` (add `-H "Authorization: ApiKey $ES_API_KEY"` if set) | `curl -s 
"$ES_URL/_cat/indices?v&h=index,docs.count,store.size"` | +| Redis | `redis-cli -u "$REDIS_URL" ` | `DBSIZE`, `SCAN 0 MATCH COUNT 100` | +| BigQuery | `bq query --use_legacy_sql=false --format=prettyjson --project_id="$BQ_PROJECT" ""` | `bq ls --project_id="$BQ_PROJECT" "$DATASET"`; `bq show --schema --format=prettyjson --project_id="$BQ_PROJECT" "$DATASET."` | -**For small collections:** +## Steps -```bash -mongosh "$MONGODB_URI" --eval "db.orders.aggregate([ - { \$group: { _id: '\$status', count: { \$sum: 1 } } }, - { \$sort: { count: -1 } } -])" -``` +### Step 0 — Check for existing `docs/DB.md` + +If the file exists, read it but **still execute every step**. Code and schemas drift. After fresh analysis, merge findings: + +- Preserve manual notes/corrections. +- Update row counts, enum distributions, date ranges from fresh queries. +- Add new tables/fields found in code; remove tables/fields no longer present. +- Flag discrepancies; update the "Last verified" timestamp. + +### Step 1 — Detect language and framework + +Match these signals (run each only as needed): + +| Language | Framework | Detection signals | +| -------- | --------- | ----------------- | +| PHP | Symfony / Doctrine ORM | `composer.json` has `doctrine/orm` or `doctrine/doctrine-bundle`; `src/Entity/`; `config/packages/doctrine.yaml`; `migrations/` | +| PHP | Laravel / Eloquent | `composer.json` has `laravel/framework`; `app/Models/`; `database/migrations/`; `config/database.php` | +| PHP | Doctrine ODM (MongoDB) | `composer.json` has `doctrine/mongodb-odm`; `src/Document/` | +| Python | Django | `manage.py`; `settings.py` with `DATABASES`; `models.py` in apps; `*/migrations/` | +| Python | Flask / FastAPI + SQLAlchemy | `requirements.txt`/`pyproject.toml` has `sqlalchemy` or `flask-sqlalchemy`; `models.py` or `models/`; `alembic/` | +| Python | Django + MongoDB | `settings.py` has `djongo` or `mongoengine` | +| Python | PyMongo / Motor | `requirements.txt` has `pymongo` or `motor` | +| Ruby | 
Rails / ActiveRecord | `Gemfile` has `rails`; `app/models/`; `db/migrate/`; `db/schema.rb` or `db/structure.sql`; `config/database.yml` | +| Ruby | Mongoid | `Gemfile` has `mongoid`; `config/mongoid.yml` | +| Go | GORM | `go.mod` has `gorm.io/gorm`; structs with `gorm:` tags; `models/` or `internal/models/` | +| Go | sqlx | `go.mod` has `github.com/jmoiron/sqlx` | +| Go | mongo-driver | `go.mod` has `go.mongodb.org/mongo-driver` | +| Go | ent | `go.mod` has `entgo.io/ent`; `ent/schema/` | +| Node / TS | TypeORM | `package.json` has `typeorm`; `src/entity/` or `entities/`; `ormconfig.json` or `data-source.ts` | +| Node / TS | Prisma | `prisma/schema.prisma`; `package.json` has `@prisma/client` | +| Node / TS | Sequelize | `package.json` has `sequelize`; `models/`; `migrations/` | +| Node / TS | Mongoose | `package.json` has `mongoose`; `new Schema(...)` patterns | +| Node / TS | Drizzle | `package.json` has `drizzle-orm`; `drizzle/` | +| Node / TS | Knex | `package.json` has `knex`; `knexfile.js`/`knexfile.ts`; `migrations/` | +| Java / Kotlin | Spring Boot + JPA/Hibernate | `pom.xml`/`build.gradle` has `spring-boot-starter-data-jpa`; `@Entity` classes; `application.properties`/`application.yml` with `spring.datasource`; `**/entity/` or `**/model/` | +| Java / Kotlin | Spring Data MongoDB | `spring-boot-starter-data-mongodb`; `@Document` classes | +| .NET / C# | EF Core | `*.csproj` has `Microsoft.EntityFrameworkCore`; `DbContext` classes; `Migrations/`; `appsettings.json` with connection strings | +| .NET / C# | MongoDB.Driver | `*.csproj` has `MongoDB.Driver` | +| Rust | Diesel | `Cargo.toml` has `diesel`; `diesel.toml`; `migrations/`; `schema.rs` | +| Rust | SeaORM | `Cargo.toml` has `sea-orm`; `entity/` | +| Rust | SQLx | `Cargo.toml` has `sqlx`; `.sqlx/` or `migrations/` | + +### Step 2 — Detect database type(s) + +Identify each DB used by inspecting: + +- **SQL (MySQL/PostgreSQL/SQLite)** — connection strings in config/`.env`/`.env.example`; SQL driver 
dependencies. +- **MongoDB** — ODM dependencies (Mongoose, Doctrine ODM, MongoEngine, Mongoid), connection strings, document/collection definitions. +- **Elasticsearch** — Elasticsearch client deps; index mappings; `fos_elastica.yaml`/`elasticsearch.yml`. +- **Redis** — Redis client deps; cache/session config; key-pattern definitions. +- **BigQuery** — `BQ_PROJECT`/`BQ_DATASETS` set; `google-cloud-bigquery` (Python) or `@google-cloud/bigquery` (Node) deps. + +### Step 3 — Extract schema from code + +| Framework | Entity location | Migration location | Schema source | +| --------- | --------------- | ------------------ | ------------- | +| Symfony / Doctrine | `src/Entity/` | `migrations/` | `php bin/console doctrine:mapping:info` | +| Laravel / Eloquent | `app/Models/` | `database/migrations/` | `php artisan model:show` | +| Django | `*/models.py` | `*/migrations/` | `python manage.py inspectdb` | +| Rails / ActiveRecord | `app/models/` | `db/migrate/` | `db/schema.rb` | +| TypeORM | `src/entity/` | `migrations/` | entity decorators | +| Prisma | `prisma/schema.prisma` | (Prisma migrations) | `schema.prisma` | +| Spring JPA | `**/entity/` | Flyway / Liquibase | `@Entity` classes | +| EF Core | `Models/` or `Entities/` | `Migrations/` | `DbContext` | +| GORM | `models/` | migration files | struct tags | +| Diesel | `src/models.rs` | `migrations/` | `schema.rs` | -**For large collections - use sampling:** +For SQL: extract column types, primary keys, indexes, foreign keys, unique constraints. -```bash -mongosh "$MONGODB_URI" --eval "db.orders.aggregate([ - { \$sample: { size: 10000 } }, - { \$group: { _id: '\$status', count: { \$sum: 1 } } }, - { \$sort: { count: -1 } } -])" -``` +For **MongoDB ODMs** — Doctrine ODM (`@ODM\` annotations in `src/Document/`); Mongoose (`new Schema({...})` in `models/`); MongoEngine (`Document` subclass in `models.py`); Mongoid (`field :name, type:` in `app/models/`); Spring Data MongoDB (`@Document` in `**/document/`). 
Extract field types, references, embedded documents, indexes. -#### Elasticsearch +For **Elasticsearch**: index mappings, field types/analyzers, nested object structures. -Elasticsearch aggregations are generally safe - they use approximate counts: +For **Redis**: key naming patterns in code, data structures used (String/Hash/Set/ZSet/List/HyperLogLog), TTL patterns. -```bash -curl -s "$ES_URL/orders/_search" -H "Content-Type: application/json" -d '{ - "size": 0, - "aggs": { - "status_values": { - "terms": { "field": "status.keyword", "size": 20 } - } - } -}' -``` +**IMPORTANT — code is not exhaustive.** ORM entities don't cover join tables, framework tables (sessions, migrations, jobs, cache), or raw-SQL tables. Always reconcile against the live database in Step 7. -#### BigQuery +### Step 4 — Extract business-logic context -**For enum sampling (always use LIMIT or APPROX functions to control cost):** +Look for: -```bash -bq query --use_legacy_sql=false --project_id="$BQ_PROJECT" " -SELECT status, COUNT(*) as count -FROM \`$BQ_PROJECT.$DATASET.orders\` -GROUP BY status -ORDER BY count DESC -LIMIT 20;" -``` +- Constants and enums (status codes, types). +- Repository/DAO methods (common query patterns). +- Validation rules. +- Comments/docstrings explaining field meanings. +- Soft-delete patterns (`deleted_at`, `is_deleted`). +- Multi-tenancy patterns (`tenant_id`, `organization_id`). +- **BI dashboards, report generators, analytics endpoints** — capture common business questions and the tables/joins/filters used. These become the "Common Business Questions" section. -**For very large tables — use APPROX_COUNT_DISTINCT or sample:** +### Step 5 — Generate initial `docs/DB.md` draft ```bash -bq query --use_legacy_sql=false --project_id="$BQ_PROJECT" " -SELECT status, APPROX_COUNT_DISTINCT(id) as approx_count -FROM \`$BQ_PROJECT.$DATASET.orders\` -GROUP BY status -ORDER BY approx_count DESC;" -``` - -### 9. 
Update docs/DB.md with verified data - -Update the `docs/DB.md` file with the live data gathered. - -**Completeness check:** Before writing, verify that every table/collection/index returned by the database in Step 7 has a row in the "All Tables" (or "All Collections" / "All Indices") section. If any are missing, add them now. There must be a 1:1 correspondence between database objects and documented rows. - -**Large Table Warnings:** For tables with >1M rows, add a row to the "Large Table Warnings" section. For tables with >10M rows, mark them as "VERY LARGE — always filter by date/indexed column" and list the specific indexed columns to filter on. - -**Common Business Questions:** Scan the codebase for BI dashboards, report generators, analytics endpoints, and recurring query patterns. Document these as common business questions with the correct tables, joins, and filters. This helps future query-db users avoid common mistakes. - -**Add row/document counts** to table/collection listings: - -| Table | Purpose | Rows | Key Fields | -| ------ | --------------- | ----- | ------------------ | -| orders | Customer orders | ~1.2M | status, created_at | - -**Replace enum guesses with actual values and counts:** - -| Table.Field | Value | Meaning | Count | -| ------------- | ----- | --------- | ------- | -| orders.status | 1 | Completed | 850,000 | -| orders.status | 0 | Pending | 120,000 | - -**Document actual indexes:** - -| Table | Index | Columns | Notes | -| ------ | ------------------ | ---------- | -------------------------- | -| orders | idx_orders_created | created_at | Use for date range queries | - -**Add date ranges:** - -| Table.Field | Range | -| ----------------- | --------------------- | -| orders.created_at | 2019-01-15 to present | - -**Add verification timestamp** at the top of the file using the appropriate format: - -If database connection was available (steps 7-8 completed): - -```markdown -# Database Schema Documentation - -> **Last verified**: 
YYYY-MM-DD — verified against live database -``` - -If NO database connection was available (steps 7-8 skipped): - -```markdown -# Database Schema Documentation - -> **Last verified**: YYYY-MM-DD — derived from code analysis only (not verified against live database) -``` - ---- - -## Template for SQL Databases (MySQL/PostgreSQL) - -```markdown -# Database Schema Documentation - -> **Last verified**: YYYY-MM-DD — verified against live database / derived from code analysis only (not verified against live database) - -## Database Type - -MySQL / PostgreSQL (select one) - -## CLI Command - - -- MySQL: `MYSQL_PWD="$MYSQL_PASS" mysql -h "$MYSQL_HOST" -P "$MYSQL_PORT" -u "$MYSQL_USER" "$MYSQL_DB"` -- PostgreSQL: `psql` - -## Framework - -[Detected framework, e.g., "Symfony/Doctrine", "Django", "Rails/ActiveRecord"] - -## Database Overview - -Brief description of what data this system holds. - -## All Tables - - - -| Table | Purpose | Key Fields for Filtering/Grouping | -|-------|---------|-----------------------------------| -| (one row per table — list ALL of them) | | | - -## Field Mappings & Enums - -| Table.Field | Value | Meaning | -|-------------|-------|---------| -| order.status | 0 | Pending | -| ... | ... | ... | - -## Relationships - -- `order.user_id → user.id` -- `order_item.order_id → order.id` - -## Date/Time Fields - -| Table.Field | Purpose | Notes | -|-------------|---------|-------| -| order.created_at | Order creation | Use for daily/monthly reports | - -## Money/Numeric Fields - -| Table.Field | Unit | Notes | -|-------------|------|-------| -| order.total | cents | Divide by 100 for display | - -## Soft Deletes - -Tables using soft delete pattern: - -- `users.deleted_at` -- `orders.deleted_at` - -**Important**: Add `WHERE deleted_at IS NULL` to exclude soft-deleted records. 
- -## Multi-Tenancy - -If applicable, note tenant isolation: - -- Filter by `organization_id` or `tenant_id` - -## Framework / Infrastructure Tables - -Tables managed by the framework (not domain models). Still included for completeness: - -- Migration tracking: `...` -- Sessions: `...` -- Job queues: `...` -- Cache: `...` - -## Large Table Warnings - - - -| Table | Rows | Required Safeguards | -|-------|------|---------------------| -| (list tables with >1M rows — add specific safeguards for each) | | | - -## Query Anti-Patterns - -Common mistakes that cause slow or incorrect queries: - -| # | Anti-Pattern | Why It's Bad | Do Instead | -|---|-------------|--------------|------------| -| 1 | `SELECT * FROM large_table` without WHERE | Full table scan on millions of rows | Always filter by indexed column or date range | -| 2 | `COUNT(*)` on large tables without date filter | Scans entire table; can take minutes | Add `WHERE created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)` | -| 3 | Unfiltered JOIN between two large tables | Creates cartesian-like explosion | Add date range filters on both sides of the JOIN | -| 4 | `GROUP BY` on non-indexed columns of large tables | Full scan + temp table sort | Use indexed columns or filter to reduce dataset first | -| 5 | Using application tables instead of BI/analytics tables | Soft deletes cause undercounting; slower queries | Check if a denormalized analytics table exists | - -## Common Business Questions - - - -| # | Question | Tables Involved | Key Filters | -|---|----------|----------------|-------------| -| (document common questions found in analytics code, dashboards, or report generators) | | | | - -## Common Query Patterns - -### Daily Order Summary - -~~~sql -SELECT DATE(created_at) as day, COUNT(*) as orders, SUM(total)/100 as revenue -FROM orders -WHERE created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY) - AND deleted_at IS NULL -GROUP BY DATE(created_at); -~~~ -``` - ---- - -## Template for MongoDB - -```markdown -# 
Database Schema Documentation - -> **Last verified**: YYYY-MM-DD — verified against live database / derived from code analysis only (not verified against live database) - -## Database Type - -MongoDB - -## CLI Command - - -`mongosh "$MONGODB_URI"` - -## Framework - -[Detected framework, e.g., "Mongoose", "Doctrine ODM", "MongoEngine"] - -## Database Overview - -Brief description of what data this system holds. - -## All Collections - - - -| Collection | Purpose | Key Fields for Filtering/Grouping | -|------------|---------|-----------------------------------| -| (one row per collection — list ALL of them) | | | - -## Field Mappings & Enums - -| Collection.Field | Value | Meaning | -|------------------|-------|---------| -| orders.status | "pending" | Awaiting processing | -| ... | ... | ... | - -## References (Relationships) - -- `orders.customerId → customers._id` -- `orderItems.orderId → orders._id` - -## Embedded Documents - -| Collection | Embedded Field | Structure | -|------------|----------------|-----------| -| orders | items | Array of {productId, quantity, price} | - -## Date Fields - -| Collection.Field | Purpose | -|------------------|---------| -| orders.createdAt | Order creation timestamp | - -## Indexes - -List important indexes for query optimization. 
- -## Query Anti-Patterns - -| # | Anti-Pattern | Why It's Bad | Do Instead | -|---|-------------|--------------|------------| -| 1 | `db.collection.find({})` without limit | Returns all documents; can exhaust memory | Always add `.limit()` or use `$match` in aggregation | -| 2 | `$lookup` between two large collections | Effectively an unindexed nested loop join | Filter both collections first with `$match`, ensure foreign field is indexed | -| 3 | Large `allowDiskUse` aggregations without `$match` | Scans entire collection to disk | Add `$match` as first pipeline stage | - -## Common Aggregation Patterns - -### Daily Revenue - -~~~javascript -db.orders.aggregate([ - { $match: { createdAt: { $gte: ISODate("2024-01-01") } } }, - { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } }, - total: { $sum: "$total" }, count: { $sum: 1 } } }, - { $sort: { _id: -1 } } -]) -~~~ -``` - ---- - -## Template for Elasticsearch - -```markdown -# Database Schema Documentation - -> **Last verified**: YYYY-MM-DD — verified against live database / derived from code analysis only (not verified against live database) - -## Database Type - -Elasticsearch - -## CLI Command - - -`curl -s "$ES_URL"` - -## Framework - -[Detected framework, e.g., "FOSElastica", "elasticsearch-py", "elastic4s"] - -## Index Overview - -Brief description of what data is indexed. - -## All Indices - -| Index | Purpose | Key Fields | -|-------|---------|------------| -| products | Product catalog | name, category, price, stock | -| ... | ... | ... 
| - -## Field Mappings - -| Index.Field | Type | Notes | -|-------------|------|-------| -| products.price | scaled_float | Factor 100 (cents) | -| products.name | text + keyword | Use .keyword for aggregations | - -## Date Fields - -| Index.Field | Format | -|-------------|--------| -| orders.timestamp | epoch_millis | - -## Nested Objects - -| Index | Nested Field | Structure | -|-------|--------------|-----------| -| orders | items | Array of order line items | - -## Query Anti-Patterns - -| # | Anti-Pattern | Why It's Bad | Do Instead | -|---|-------------|--------------|------------| -| 1 | Large `size` value (>10000) | Heap pressure, slow response | Use `scroll` or `search_after` for pagination | -| 2 | Deep pagination with `from` + `size` | ES limits `from + size` to 10000 by default | Use `search_after` for deep pagination | -| 3 | `match_all` without `size: 0` on large indices | Returns all documents | Use `size: 0` for aggregation-only queries | - -## Common Query Patterns - -### Category Aggregation - -~~~json -{ - "size": 0, - "aggs": { - "by_category": { - "terms": { "field": "category.keyword" }, - "aggs": { - "avg_price": { "avg": { "field": "price" } } - } - } - } -} -~~~ -``` - ---- - -## Template for Redis - -```markdown -# Database Schema Documentation - -> **Last verified**: YYYY-MM-DD — verified against live database / derived from code analysis only (not verified against live database) - -## Database Type - -Redis - -## CLI Command - - -`redis-cli -u "$REDIS_URL"` - -## Framework - -[Detected framework, e.g., "ioredis", "redis-py", "Predis"] - -## Data Overview - -Brief description of what data is stored. 
- -## Key Patterns - -| Pattern | Type | Purpose | -|---------|------|---------| -| `user:{id}` | Hash | User profile data | -| `user:{id}:sessions` | Set | Active session IDs | -| `orders:daily:{date}` | Sorted Set | Orders by timestamp | -| `cache:product:{id}` | String (JSON) | Product cache | -| `stats:pageviews` | HyperLogLog | Unique visitor count | - -## Data Structures - -### user:{id} (Hash) - -| Field | Description | -|-------|-------------| -| email | User email | -| name | Display name | -| created_at | Unix timestamp | - -### orders:daily:{date} (Sorted Set) - -- Score: Unix timestamp -- Member: Order ID - -## TTL Patterns - -| Pattern | TTL | Notes | -|---------|-----|-------| -| `cache:*` | 3600 | 1 hour cache | -| `session:*` | 86400 | 24 hour sessions | - -## Query Anti-Patterns - -| # | Anti-Pattern | Why It's Bad | Do Instead | -|---|-------------|--------------|------------| -| 1 | `KEYS *` in production | Blocks Redis (single-threaded) for seconds on large databases | Use `SCAN 0 MATCH pattern COUNT 100` for iteration | -| 2 | `FLUSHDB` / `FLUSHALL` without confirmation | Deletes all data instantly | Use targeted `DEL` or `UNLINK` for specific keys | - -## Common Query Patterns - -### Get user with recent orders - -~~~redis -HGETALL user:123 -ZREVRANGE orders:user:123 0 9 WITHSCORES -~~~ - -### Daily active users - -~~~redis -PFCOUNT stats:dau:2024-01-15 -~~~ +mkdir -p docs ``` ---- +Write the initial draft using the per-DB template (see "Document Templates" below). 
-## Template for BigQuery +### Step 6 — Verify connectivity -```markdown -# Database Schema Documentation +Test connectivity using the simplest CLI ping per DB: -> **Last verified**: YYYY-MM-DD — verified against live BigQuery / derived from code analysis only (not verified against live database) +| Database | Test command | +| -------- | ------------ | +| MySQL | `MYSQL_PWD="$MYSQL_PASS" mysql -h "$MYSQL_HOST" -P "$MYSQL_PORT" -u "$MYSQL_USER" "$MYSQL_DB" -e "SELECT 1"` | +| PostgreSQL | `psql -c "SELECT 1"` | +| MongoDB | `mongosh "$MONGODB_URI" --eval "db.runCommand({ping: 1})"` | +| Elasticsearch | `curl -s "$ES_URL/_cluster/health"` | +| Redis | `redis-cli -u "$REDIS_URL" PING` | +| BigQuery | `bq query --use_legacy_sql=false --project_id="$BQ_PROJECT" "SELECT 1"` (if it fails, ask the user to run `gcloud auth application-default login` and `gcloud auth application-default set-quota-project $BQ_PROJECT`) | -## Database Type +If a test fails, output the missing env var(s) and ask the user to set them. Wait for confirmation. -BigQuery +If the user declines or can't provide credentials, **skip Steps 7-8** and proceed to Step 9 using code-based analysis only. The `Last verified` line in `DB.md` MUST reflect this (see Step 9 timestamp formats). -## CLI Command +### Step 7 — Connect and verify the schema - -`bq query --use_legacy_sql=false --format=prettyjson --project_id="$BQ_PROJECT"` +**CRITICAL — enumerate ALL objects first.** List every table / collection / index in the live database before anything else. Compare against what you documented from code in Steps 3-4. Add anything missing. 
-## Datasets +**Performance safeguards for large tables:** -| Dataset | Period | Description | -|---------|--------|-------------| -| archive_2023 | 2023-01-01 to 2023-12-31 | Year 2023 archived data | -| archive_2024 | 2024-01-01 to 2024-12-31 | Year 2024 archived data | -| archive_2025 | 2025-01-01 to 2025-12-31 | Year 2025 archived data | +- Use estimated counts from system tables (`information_schema.tables.table_rows`, `pg_stat_user_tables.n_live_tup`, `estimatedDocumentCount()`); never `COUNT(*)` on large tables. +- Always `LIMIT` ad-hoc sampling queries. +- For enum sampling, query a small sample or use indexed columns only. +- Prefer a read replica when available. +- Tables >10M rows = "VERY LARGE — always filter by date/indexed column". -## All Tables (per dataset) +Use the schema commands from the "CLI Command Reference" table above, then for each table/collection capture: - +- **Indexes** — MySQL: `SHOW INDEX FROM
<table>`. PostgreSQL: `SELECT indexname, indexdef FROM pg_indexes WHERE tablename = '<table>
';`. MongoDB: `db.<collection>.getIndexes()`. Elasticsearch: `curl -s "$ES_URL/<index>/_mapping" | jq`. +- **Date ranges** — `SELECT MIN(created_at), MAX(created_at) FROM <table>
;` (or MongoDB `$min`/`$max` aggregation). +- **Sample document** — MongoDB `db.<collection>.findOne()`; Redis `HGETALL`/`TTL`. +- **BigQuery** — iterate datasets: `for ds in $(echo "$BQ_DATASETS" | tr ',' ' '); do echo "=== $ds ==="; bq ls --project_id="$BQ_PROJECT" "$ds"; done`. For each table: `bq show --schema --format=prettyjson --project_id="$BQ_PROJECT" "$DATASET.<table>
"` and `bq show --project_id="$BQ_PROJECT" "$DATASET.<table>
"` (row count, partitioning). -| Table | Purpose | Key Fields for Filtering/Grouping | -|-------|---------|-----------------------------------| -| (one row per table — list ALL of them) | | | +### Step 8 — Sample enum / status field values -## Field Mappings & Enums +Use safe sampling depending on table size: -| Dataset.Table.Field | Value | Meaning | -|---------------------|-------|---------| -| *.orders.status | 0 | Pending | -| ... | ... | ... | +| DB | Small table (<1M) | Large table (>1M) | Very large | +| -- | ----------------- | ----------------- | ---------- | +| MySQL/PostgreSQL | `SELECT status, COUNT(*) AS count FROM TABLE GROUP BY status ORDER BY count DESC;` | Add `WHERE created_at >= NOW() - INTERVAL 30 DAY` (PG: `INTERVAL '30 days'`) | `SELECT DISTINCT status FROM TABLE LIMIT 20;` | +| MongoDB | `db.COLL.aggregate([{$group: {_id: "$status", count: {$sum: 1}}}, {$sort: {count: -1}}])` | Prepend `{$sample: {size: 10000}}` to the pipeline | (sampled) | +| Elasticsearch | `terms` aggregation with `size: 0` (always safe — uses approximate counts) | same | same | +| BigQuery | `SELECT status, COUNT(*) AS count FROM \`$BQ_PROJECT.$DATASET.TABLE\` GROUP BY status ORDER BY count DESC LIMIT 20;` | Use `APPROX_COUNT_DISTINCT(ID)` and always include partition filter | `--dry_run` first to estimate cost | -## Relationships +### Step 9 — Update `docs/DB.md` with verified data -- `orders.user_id → users.id` -- `order_items.order_id → orders.id` +**Completeness check before writing:** every table/collection/index returned by Step 7 has a row in the "All Tables / Collections / Indices" section. There must be a 1:1 correspondence — no skipping framework or join tables. -## Date/Time Fields +**Add Large Table Warnings.** For tables >1M rows: list with safeguards. For >10M rows: mark "VERY LARGE — always filter by date/indexed column" and list specific indexed columns. 
-| Table.Field | Purpose | Notes | -|-------------|---------|-------| -| orders.created_at | Order creation | TIMESTAMP type — use TIMESTAMP functions | +**Common Business Questions** — from Step 4's BI/dashboard scan, document recurring analytics questions with the correct tables/joins/filters. Helps `query-db` users avoid common mistakes. -## Money/Numeric Fields +**Add row/document counts** to listings, **replace enum guesses with actual values + counts**, **document actual indexes**, **add date ranges**. -| Table.Field | Unit | Notes | -|-------------|------|-------| -| orders.total | cents | Divide by 100 for display | +**"Last verified" line at top of `docs/DB.md`:** -## Partitioning & Clustering +- Live DB verified: `> **Last verified**: YYYY-MM-DD — verified against live database` +- Code-only (Steps 7-8 skipped): `> **Last verified**: YYYY-MM-DD — derived from code analysis only (not verified against live database)` -| Dataset.Table | Partition Column | Clustering Columns | Notes | -|---------------|-----------------|-------------------|-------| -| *.orders | created_at | status, user_id | Always filter on created_at to reduce bytes scanned | +## Document Templates -## Cross-Dataset Query Pattern +`docs/DB.md` always starts with H1 `# Database Schema Documentation` and the "Last verified" line. The body sections depend on the DB type. Below are the required sections per DB. Fill them with discovered content; do not paste placeholder rows. -When querying across years, use UNION ALL: +### SQL (MySQL / PostgreSQL / SQLite) -~~~sql -WITH all_orders AS ( - SELECT * FROM \`project.archive_2024.orders\` - UNION ALL - SELECT * FROM \`project.archive_2025.orders\` -) -SELECT DATE(created_at) as day, COUNT(*) as total -FROM all_orders -WHERE created_at >= '2024-06-01' -GROUP BY day -ORDER BY day DESC; -~~~ +Required sections, in order: -## Query Anti-Patterns +1. **Database Type** — MySQL / PostgreSQL / SQLite. +2. **CLI Command** — used by `query-db` skill (e.g. 
`MYSQL_PWD="$MYSQL_PASS" mysql -h "$MYSQL_HOST" -P "$MYSQL_PORT" -u "$MYSQL_USER" "$MYSQL_DB"` or `psql`). +3. **Framework** — detected framework name. +4. **Database Overview** — one paragraph on what data this system holds. +5. **All Tables** — single table listing **every** table: `Table | Purpose | Key Fields for Filtering/Grouping | Rows`. +6. **Field Mappings & Enums** — `Table.Field | Value | Meaning | Count`. +7. **Relationships** — `child.fk → parent.pk` arrows. +8. **Date/Time Fields** — `Table.Field | Purpose | Notes` (TZ, granularity). +9. **Money/Numeric Fields** — `Table.Field | Unit | Notes` (e.g. cents, divide by 100). +10. **Soft Deletes** — list tables using `deleted_at`/`is_deleted`; remind to add `WHERE deleted_at IS NULL`. +11. **Multi-Tenancy** — note tenant isolation columns if applicable (`organization_id`, `tenant_id`). +12. **Framework / Infrastructure Tables** — migration tracking, sessions, queues, cache (still listed in All Tables; this section explains them). +13. **Large Table Warnings** — `Table | Rows | Required Safeguards`. +14. **Query Anti-Patterns** — `# | Anti-Pattern | Why It's Bad | Do Instead`. Standard rows: `SELECT *` without WHERE on large tables; unbounded `COUNT(*)`; unfiltered JOIN between large tables; `GROUP BY` on non-indexed columns; ignoring denormalized analytics tables. +15. **Common Business Questions** — `# | Question | Tables Involved | Key Filters` (from Step 4 BI scan). +16. **Common Query Patterns** — fenced SQL examples (e.g. Daily Order Summary with date filter + `deleted_at IS NULL`). 
-| # | Anti-Pattern | Why It's Bad | Do Instead | -|---|-------------|--------------|------------| -| 1 | Missing partition filter | Full table scan — expensive (billed by bytes scanned) | Always filter on partition column | -| 2 | `SELECT *` on wide tables | Scans all columns — BigQuery is columnar | Select only needed columns | -| 3 | `UNION ALL` across all datasets without date filter | Scans every year's data | Only include datasets relevant to the date range | -| 4 | Using `LIMIT` to reduce cost | LIMIT does NOT reduce bytes scanned | Use `WHERE` filters on partitioned/clustered columns | -| 5 | Not using `--dry_run` for large queries | No cost visibility before execution | Run `--dry_run` first to estimate bytes scanned | +### MongoDB -## Cost Estimation +Required sections: + +1. **Database Type** — MongoDB. +2. **CLI Command** — `mongosh "$MONGODB_URI"`. +3. **Framework** — Mongoose / Doctrine ODM / MongoEngine / Mongoid / Spring Data MongoDB. +4. **Database Overview**. +5. **All Collections** — `Collection | Purpose | Key Fields for Filtering/Grouping | Document Count`. +6. **Field Mappings & Enums** — `Collection.Field | Value | Meaning`. +7. **References (Relationships)** — `coll.fkField → otherColl._id`. +8. **Embedded Documents** — `Collection | Embedded Field | Structure`. +9. **Date Fields** — `Collection.Field | Purpose`. +10. **Indexes** — important indexes for query optimization. +11. **Query Anti-Patterns** — standard rows: unbounded `find({})`; `$lookup` between large collections without `$match` first; large `allowDiskUse` aggregations without `$match`. +12. **Common Aggregation Patterns** — fenced JS examples (e.g. Daily Revenue with `$match` first). -Before running queries on large tables, use `--dry_run`: +### Elasticsearch -~~~bash -bq query --use_legacy_sql=false --dry_run --project_id="$BQ_PROJECT" "QUERY" -~~~ +Required sections: -BigQuery pricing: ~$5/TB scanned. Use `--maximum_bytes_billed=1000000000` (1 GB) to cap cost. +1. 
**Database Type** — Elasticsearch. +2. **CLI Command** — `curl -s "$ES_URL"`. +3. **Framework** — FOSElastica / elasticsearch-py / elastic4s. +4. **Index Overview**. +5. **All Indices** — `Index | Purpose | Key Fields | Doc Count`. +6. **Field Mappings** — `Index.Field | Type | Notes` (e.g. `scaled_float` factor 100, `text + keyword`). +7. **Date Fields** — `Index.Field | Format` (epoch_millis, ISO). +8. **Nested Objects** — `Index | Nested Field | Structure`. +9. **Query Anti-Patterns** — standard rows: `size > 10000`; deep `from + size` pagination (>10000 limit); `match_all` without `size: 0` on large indices. +10. **Common Query Patterns** — fenced JSON examples (aggregations always with `size: 0`). -## Common Query Patterns +### Redis -### Daily Summary (Single Year) +Required sections: -~~~sql -SELECT DATE(created_at) as day, COUNT(*) as orders, SUM(total)/100 as revenue -FROM \`project.archive_2025.orders\` -WHERE created_at >= '2025-01-01' -GROUP BY day -ORDER BY day DESC; -~~~ +1. **Database Type** — Redis. +2. **CLI Command** — `redis-cli -u "$REDIS_URL"`. +3. **Framework** — ioredis / redis-py / Predis. +4. **Data Overview**. +5. **Key Patterns** — `Pattern | Type | Purpose` (e.g. `user:{id}` Hash, `cache:product:{id}` String/JSON, `stats:pageviews` HyperLogLog). +6. **Data Structures** — per-pattern detail (Hash fields, Sorted Set scores/members). +7. **TTL Patterns** — `Pattern | TTL | Notes`. +8. **Query Anti-Patterns** — standard rows: `KEYS *` in production (use `SCAN`); `FLUSHDB`/`FLUSHALL` without confirmation. +9. **Common Query Patterns** — `HGETALL`/`ZREVRANGE`/`PFCOUNT` examples in fenced blocks. 
-### Cross-Year Comparison +### BigQuery -~~~sql -WITH all_orders AS ( - SELECT * FROM \`project.archive_2024.orders\` - UNION ALL - SELECT * FROM \`project.archive_2025.orders\` -) -SELECT - EXTRACT(YEAR FROM created_at) as year, - EXTRACT(MONTH FROM created_at) as month, - COUNT(*) as orders, - SUM(total)/100 as revenue -FROM all_orders -GROUP BY year, month -ORDER BY year, month; -~~~ -``` +Required sections: ---- +1. **Database Type** — BigQuery. +2. **CLI Command** — `bq query --use_legacy_sql=false --format=prettyjson --project_id="$BQ_PROJECT"`. +3. **Datasets** — `Dataset | Period | Description`. +4. **All Tables (per dataset)** — `Table | Purpose | Key Fields | Rows`. Note any datasets with differing schemas. +5. **Field Mappings & Enums** — `Dataset.Table.Field | Value | Meaning` (use `*.table.field` if uniform across datasets). +6. **Relationships** — FK arrows. +7. **Date/Time Fields** — TIMESTAMP type notes. +8. **Money/Numeric Fields** — units. +9. **Partitioning & Clustering** — `Dataset.Table | Partition Column | Clustering Columns | Notes` — always filter on partition to reduce bytes scanned. +10. **Cross-Dataset Query Pattern** — fenced SQL with `UNION ALL` across yearly archives. +11. **Query Anti-Patterns** — missing partition filter; `SELECT *` on wide tables; `UNION ALL` across all datasets without date filter; `LIMIT` to reduce cost (it doesn't); skipping `--dry_run` for large queries. +12. **Cost Estimation** — note `--dry_run` workflow and `--maximum_bytes_billed=1000000000` cap. BigQuery pricing ~$5/TB scanned. +13. **Common Query Patterns** — fenced SQL: Daily Summary (single year) and Cross-Year Comparison. -## Template for Multi-Database Projects +### Multi-database projects -If the project uses multiple databases, create sections for each: +If multiple DBs are used, the file has one H1 + a "Databases Used" list, then one H2 section per database following the appropriate template above. 
Example: ```markdown # Database Schema Documentation @@ -1360,33 +310,21 @@ If the project uses multiple databases, create sections for each: 2. Redis (caching, sessions) 3. Elasticsearch (search) ---- - ## PostgreSQL - -[Include full SQL template sections here] - ---- +[full SQL template sections] ## Redis - -[Include full Redis template sections here] - ---- +[full Redis template sections] ## Elasticsearch - -[Include full Elasticsearch template sections here] +[full Elasticsearch template sections] ``` ---- - ## Rules -- Keep descriptions concise and focused on querying needs -- Include actual values from the codebase, not placeholders -- Note any gotchas (soft deletes, tenant isolation, TTLs, etc.) -- If multiple databases are used, include sections for each -- Document the CLI command to use for queries -- Identify the framework used for future reference -- **Document ALL tables/collections/indices without exception.** Every database object must have a row in the documentation. Do not skip join tables, migration tables, session tables, queue tables, cache tables, or any other table — they all go in the "All Tables" section. Group framework/infrastructure tables in their own section if desired, but they must still be listed. +- Keep descriptions concise and focused on querying needs. +- Use actual values from the codebase, not placeholders. +- Note gotchas (soft deletes, tenant isolation, TTLs, partitioning). +- Document the CLI command in every file (used by `query-db`). +- Identify the framework for future reference. +- **Document every table/collection/index without exception.** Join tables, migration trackers, session tables, queue tables, cache tables — all of them. Group framework/infrastructure tables in their own section if you like, but list them. 
diff --git a/skills/review-architecture/SKILL.md b/skills/review-architecture/SKILL.md index 523f579..27b3075 100644 --- a/skills/review-architecture/SKILL.md +++ b/skills/review-architecture/SKILL.md @@ -6,100 +6,28 @@ allowed-tools: Bash(gh:*), Bash(git:*), Bash(awk:*), Bash(basename:*), Bash(cat: # Review Architecture Documentation -Review the `docs/architecture.md` file in a repository and create or update it to match organizational standards. This skill deeply analyzes the codebase to ensure architecture documentation is accurate, complete, and reflects the actual implementation. Works for all repository types and languages. +Review or create `docs/architecture.md` to match organizational standards. Works for all repository types and languages. -## CRITICAL: Mandatory Analysis Tracking +## Phase Tracking -**You MUST maintain an analysis checklist throughout execution.** At each step, record what was found. This ensures consistent, reproducible results. +Use `TaskCreate` to track each phase below. Mark `in_progress` on entry, `completed` when results are recorded. Do NOT include the task list in the final output. 
-**Before starting, create this tracking structure and update it as you progress:** +**Required phases:** -```text -=== ANALYSIS CHECKPOINT LOG === -[ ] Step 1: Repository Information - - organization: (pending) - - repository: (pending) - - has_architecture_doc: (pending) - - has_docs_dir: (pending) - - doc_last_modified: (pending) - - code_last_modified: (pending) - -[ ] Step 2: Exemption Check - - existing_exemption: (pending) - - exempt_type_detected: (pending) - -[ ] Step 3: Project Type Detection - - project_type: (pending) - - ml_frameworks: (pending) - - has_model_files: (pending) - -[ ] Step 4/5: Deep Codebase Analysis (complete ALL applicable sub-checks) - For Standard Projects: - [ ] 4.1 Architecture Diagram - diagrams_found: (pending), referenced_in_doc: (pending) - [ ] 4.2 Software Units - modules_in_code: (pending), modules_in_doc: (pending), missing_from_doc: (pending) - [ ] 4.3 SOUP Validation - soup_json_exists: (pending), packages_in_lockfile: (pending), packages_in_soup: (pending), missing: (pending), stale: (pending) - [ ] 4.4 Critical Algorithms - algorithms_found: (pending), documented: (pending), undocumented: (pending) - [ ] 4.5 Risk Controls - auth_patterns: (pending), validation_patterns: (pending), error_handling: (pending), logging: (pending) - - For ML/DL Projects: - [ ] 5.1 Datasets - datasets_found: (pending), documented: (pending) - [ ] 5.2 Data Preprocessing - preprocessing_found: (pending), documented: (pending) - [ ] 5.3 Data Splits - splits_found: (pending), documented: (pending) - [ ] 5.4 Model Architecture - models_found: (pending), documented: (pending) - [ ] 5.5 Model Training - training_config_found: (pending), documented: (pending) - [ ] 5.6 Model Evaluation - metrics_found: (pending), documented: (pending) - [ ] 5.7 Model Deployment - deployment_found: (pending), documented: (pending) - -[ ] Step 6: Document Structure Validation - - h1_title_correct: (pending) - - required_sections_present: (pending) - - 
section_order_correct: (pending) - - toc_links_valid: (pending) - -[ ] Step 7: Report Generated - - all_checks_completed: (pending) - - issues_found: (pending) -=== END CHECKPOINT LOG === -``` - -**COMPLETION REQUIREMENT:** Before generating the final report, you MUST verify that ALL applicable checkpoints show actual values (not "pending"). If any checkpoint is still "pending", go back and complete that analysis step. - -**EVIDENCE REQUIREMENT:** For every check, you MUST record: - -1. **What was found in docs** - the exact text/claim from architecture.md -2. **What was found in code** - the actual code evidence (file paths, function names, imports) -3. **Comparison result** - MATCH, MISMATCH, or MISSING with specific details - -A bare "PASS" without evidence is not acceptable. If you cannot provide evidence, the check is incomplete. - -**DO NOT SKIP STEPS.** Even if an earlier check seems to suggest no issues, you MUST complete ALL steps. Issues are often only revealed when cross-referencing multiple sources. - -## Step 0: Read the Full Architecture Document - -**Before any code analysis**, read the entire `docs/architecture.md` (if it exists) and extract every factual claim that needs verification: - -```bash -cat docs/architecture.md 2>/dev/null -``` - -Create a **claims inventory** listing every verifiable claim in the document: - -- Module names and their stated purposes -- File paths referenced -- Dependencies listed -- Algorithms described -- Security measures claimed -- Diagram components shown +1. Repository info gathered +2. Exemption check (skip-or-proceed decision) +3. Project type detected (standard vs ML/DL) +4. Read existing doc + extract claims inventory +5. Deep codebase analysis (relevant sub-phases per project type) +6. Existing doc structure validated +7. Report generated +8. Doc written/updated (if approved) -This claims inventory becomes your verification checklist for Steps 4-5. Every claim must be checked against actual code. 
+**Evidence rule:** Every check must record (a) what the doc claims, (b) what the code shows (file:function), (c) MATCH / MISMATCH / MISSING. Bare "PASS" without code evidence is invalid. ## Architecture Document Types -There are two types of architecture documents based on project type: - -### Standard Projects - -Required H2 sections: +### Standard Projects — required H2 sections ```text ## Table of Contents @@ -110,9 +38,7 @@ Required H2 sections: ## Risk controls ``` -### ML/DL Projects - -For machine learning and deep learning projects, required H2 sections: +### ML/DL Projects — required H2 sections ```text ## Table of Contents @@ -129,177 +55,45 @@ For machine learning and deep learning projects, required H2 sections: ## MCP Tools with Fallbacks -This skill uses MCP tools when available and falls back gracefully if they are unavailable or return errors. - -### GitHub Access +Prefer MCP tools when available; fall back to CLI on errors. -**Prefer MCP tools** (`mcp__github__*`) when available. If MCP tools are not available (tool not found errors), **fall back to the `gh` CLI**. - -| Operation | MCP Tool | CLI Fallback | +| Operation | Preferred | Fallback | | --- | --- | --- | -| Get repo metadata | `mcp__github__get_file_contents` (path: `/`) for top-level structure; for richer metadata use the CLI fallback | `gh repo view --json owner,name,visibility,description` | | Get file contents | `mcp__github__get_file_contents` | `cat ` | -| Get repo owner/name | Parse from `git remote get-url origin` | `gh repo view --json owner,name` | - -### Library Documentation (Context7) +| Repo metadata | `gh repo view --json owner,name,visibility,description` | n/a | +| Library docs | `mcp__context7__*` | `WebSearch` → `mcp__fetch__fetch` | -Use `mcp__context7__resolve-library-id` then `mcp__context7__query-docs` to look up current documentation for libraries and frameworks found in the project. 
If Context7 is unavailable or returns errors (quota exceeded, timeouts), **fall back to `WebSearch`** and then `mcp__fetch__fetch` to retrieve documentation from official sources. Do not let Context7 failures block the review. - -## Step 1: Gather Repository Information - -Run these commands to collect repository metadata: +## Phase 1: Repository Info ```bash -# Get organization and repository name (fallback if MCP tools unavailable) gh repo view --json owner,name,visibility,description +ls -la docs/architecture.md docs/ 2>/dev/null +git log -1 --format="%ci" -- docs/architecture.md 2>/dev/null +git log -1 --format="%ci" -- src lib app pkg internal cmd 2>/dev/null | head -1 ``` -```bash -# Check if docs/architecture.md exists -ls -la docs/architecture.md 2>/dev/null || echo "No docs/architecture.md found" -``` - -```bash -# Check if docs directory exists -ls -la docs/ 2>/dev/null || echo "No docs directory found" -``` - -```bash -# Get last modified date of architecture.md vs source code -git log -1 --format="%ci" -- docs/architecture.md 2>/dev/null || echo "N/A" -git log -1 --format="%ci" -- src lib app pkg internal cmd 2>/dev/null | head -5 -``` - -Store these values: - -- `organization`: The owner/organization name -- `repository`: The repository name -- `has_architecture_doc`: true/false -- `has_docs_dir`: true/false -- `doc_last_modified`: Date of last architecture.md change -- `code_last_modified`: Date of most recent source code change - -## Step 2: Check if Architecture Documentation is Required - -Some repository types do not require architecture documentation. Detect these and create an exemption file instead of nonsensical documentation. 
- -### Check for Existing Exemption - -```bash -# Check if already marked as not required -head -5 docs/architecture.md 2>/dev/null | grep -q "Architecture documentation is not required" && echo "EXEMPT" || - echo "NOT_EXEMPT" -``` - -If the file already contains the exemption marker, **stop here** - no further action needed. - -### Detect Exempt Repository Types - -**Homebrew Taps:** - -```bash -# Check for Homebrew tap pattern -gh repo view --json name --jq '.name' | grep -qE "^homebrew-" && echo "HOMEBREW_TAP" -ls -la Formula/ Casks/ 2>/dev/null -``` - -**Claude Code Plugins:** - -```bash -# Check for Claude Code plugin -ls -la .claude-plugin/plugin.json skills/ commands/ 2>/dev/null -``` - -**Configuration/Dotfiles Repositories:** - -```bash -# Check if repo is mostly config files -find . -maxdepth 2 -type f \( -name "*.yml" -o -name "*.yaml" -o -name "*.json" -o -name "*.toml" -o -name ".*" \) 2>/dev/null | - wc -l -find . -maxdepth 2 -type f \( -name "*.py" -o -name "*.js" -o -name "*.ts" -o -name "*.go" -o -name "*.rs" -o -name "*.rb" -o -name "*.java" \) 2>/dev/null | - wc -l -``` - -**Documentation-Only Repositories:** - -```bash -# Check if repo is only documentation -find . -maxdepth 3 -type f -name "*.md" 2>/dev/null | wc -l -find . -maxdepth 3 -type f \( -name "*.py" -o -name "*.js" -o -name "*.ts" -o -name "*.go" -o -name "*.rs" -o -name "*.rb" \) 2>/dev/null | - wc -l -``` - -**GitHub Profile Repositories:** - -```bash -# Check if repo name matches owner (profile README repo) -OWNER=$(gh repo view --json owner --jq '.owner.login') -NAME=$(gh repo view --json name --jq '.name') -[ "$OWNER" = "$NAME" ] && echo "PROFILE_REPO" -``` - -**GitHub Actions:** - -```bash -# Check for GitHub Action -ls -la action.yml action.yaml 2>/dev/null -cat action.yml action.yaml 2>/dev/null | grep -q "runs:" && echo "GITHUB_ACTION" -``` - -**Terraform Modules:** - -```bash -# Check for Terraform module (no main application) -ls -la *.tf modules/ 2>/dev/null -find . 
-name "*.tf" -not -path "*/.terraform/*" 2>/dev/null | head -5 -``` - -**Ansible Roles/Playbooks:** - -```bash -# Check for Ansible -ls -la playbooks/ roles/ tasks/ handlers/ ansible.cfg 2>/dev/null -``` - -**Kubernetes/Helm Charts:** - -```bash -# Check for Helm chart or K8s manifests only -ls -la Chart.yaml values.yaml templates/ 2>/dev/null -find . -name "*.yaml" -path "*/templates/*" 2>/dev/null | head -5 -``` - -**Meta/Organization Repositories:** - -```bash -# Check for org-wide config repos -gh repo view --json name --jq '.name' | grep -qiE "^\.github$|^meta$|^org-|^team-|^-config$|-settings$" && echo "META_REPO" -``` - -### Exempt Repository Types +Record: organization, repository, has_architecture_doc, has_docs_dir, doc_last_modified, code_last_modified. Flag as STALE if code changed significantly after the last doc update. -| Type | Detection | Reason | -|--------------------|------------------------------------------------|---------------------------------------------| -| Homebrew Tap | `homebrew-*` name, `Formula/` or `Casks/` dirs | Package distribution, no application logic | -| Claude Code Plugin | `.claude-plugin/`, `skills/`, `commands/` dirs | Plugin config/prompts, no application logic | -| Dotfiles/Config | >80% config files, no source code | Configuration only | -| Documentation | Only `.md` files, no source code | No software architecture | -| GitHub Profile | Repo name matches owner | Profile README only | -| GitHub Action | `action.yml` with `runs:` | Simple action wrapper | -| Terraform Module | Only `.tf` files, no application | Infrastructure as code, not software | -| Ansible Role | `playbooks/`, `roles/`, `tasks/` | Automation scripts, not software | -| Helm Chart | `Chart.yaml`, `templates/` | K8s deployment config | -| Meta Repository | `.github`, `meta`, `org-*`, `*-config` | Org settings, no application | +## Phase 2: Exemption Check -### Create Exemption File +If `docs/architecture.md` already starts with "Architecture 
documentation is not required", **stop here**. -If the repository matches an exempt type, create the exemption file: +Otherwise check whether the repo qualifies for an exemption: -```bash -mkdir -p docs -``` +| Exempt Type | Detection signal(s) | Reason | +| ----------- | ------------------- | ------ | +| Homebrew Tap | Repo name `homebrew-*`; `Formula/` or `Casks/` dirs | Package distribution, no application logic | +| Claude Code Plugin | `.claude-plugin/plugin.json`; `skills/` and/or `commands/` dirs | Plugin config/prompts, no application logic | +| Dotfiles / Config | >80% config files (yaml/json/toml/dotfiles), no source code | Configuration only | +| Documentation-only | Only `.md` files, no source files | No software architecture | +| GitHub Profile | Repo name equals owner name | Profile README only | +| GitHub Action | `action.yml` / `action.yaml` with `runs:` | Simple action wrapper | +| Terraform Module | Only `.tf` files (no `.terraform/`), no app | Infrastructure as code | +| Ansible Role | `playbooks/`, `roles/`, `tasks/`, `ansible.cfg` | Automation, not software | +| Helm Chart | `Chart.yaml`, `templates/` | K8s deployment config | +| Meta Repository | Name matches `.github`, `meta`, `org-*`, `*-config`, `*-settings` | Org settings, no application | -**Exemption Template:** +If exempt, write `docs/architecture.md` with this content (substitute `{type}` and the message/link from the table below) and STOP: ```markdown # Architecture Design @@ -312,13 +106,11 @@ This repository is a **{type}** which does not contain application software requ ### Repository Type: {type} -{Description of why this type doesn't need architecture docs} +{Reason from table below} ## Documentation -For more information about this repository type, see: - -{Link to relevant documentation} +For more information about this repository type, see {link from table below}. 
## When This Might Change @@ -332,14 +124,12 @@ Architecture documentation would be required if this repository evolves to inclu If the repository scope changes, remove this file and run the architecture review again. ``` -**Exemption Messages and Documentation Links by Type:** - -| Type | Message | Documentation | -| ---- | ------- | ------------- | +| Type | Reason text | Link | +| ---- | ----------- | ---- | | Homebrew Tap | Homebrew taps contain package formulae for distribution, not application source code. | [Homebrew Taps](https://docs.brew.sh/Taps) | | Claude Code Plugin | Claude Code plugins contain skill definitions and prompts, not application architecture. | [Claude Code Extensions](https://docs.anthropic.com/en/docs/claude-code/extensions) | -| Dotfiles/Config | This repository contains configuration files only, with no application logic to document. | N/A | -| Documentation | This repository contains documentation only, with no software architecture. | N/A | +| Dotfiles/Config | This repository contains configuration files only, with no application logic to document. | n/a | +| Documentation | This repository contains documentation only, with no software architecture. | n/a | | GitHub Profile | This is a GitHub profile README repository, not a software project. | [GitHub Profile README](https://docs.github.com/en/account-and-profile/setting-up-and-managing-your-github-profile/customizing-your-profile/managing-your-profile-readme) | | GitHub Action | GitHub Actions are simple workflow wrappers, not applications requiring architecture docs. | [Creating Actions](https://docs.github.com/en/actions/sharing-automations/creating-actions) | | Terraform Module | Terraform modules define infrastructure, not software architecture. 
| [Terraform Modules](https://developer.hashicorp.com/terraform/language/modules) | @@ -347,1113 +137,255 @@ If the repository scope changes, remove this file and run the architecture revie | Helm Chart | Helm charts define Kubernetes deployments, not software architecture. | [Helm Charts](https://helm.sh/docs/topics/charts/) | | Meta Repository | Meta repositories contain organization settings, not software projects. | [GitHub Organizations](https://docs.github.com/en/organizations) | -**After creating the exemption file, STOP** - do not proceed with architecture documentation steps. - -## Step 3: Detect Project Type (Standard vs ML/DL) - -Determine if this is a Machine Learning / Deep Learning project. - -### Detection Method 1: Repository Name Patterns - -```bash -gh repo view --json name --jq '.name' -``` - -ML/DL indicators in repository name: - -- Contains `-ml`, `-dl`, `-ai` -- Ends with `-model`, `-models` -- Contains `machine-learning`, `deep-learning` - -### Detection Method 2: ML/DL Framework Dependencies - -**Python projects:** - -```bash -# Check requirements.txt -cat requirements.txt 2>/dev/null | grep -iE "tensorflow|pytorch|torch|keras|scikit-learn|sklearn|xgboost|lightgbm|transformers|huggingface|jax|mlflow|wandb|optuna|numpy|pandas|scipy" -``` - -```bash -# Check pyproject.toml -cat pyproject.toml 2>/dev/null | grep -iE "tensorflow|pytorch|torch|keras|scikit-learn|sklearn|xgboost|lightgbm|transformers|huggingface|jax|mlflow|wandb|optuna" -``` - -```bash -# Check poetry.lock or requirements for ML framework presence -cat poetry.lock requirements.txt 2>/dev/null | grep -iE "^(tensorflow|torch|keras|scikit-learn)==" | head -10 -``` - -**Node.js projects:** - -```bash -cat package.json 2>/dev/null | jq -r '.dependencies, .devDependencies | keys[]' 2>/dev/null | grep -iE "tensorflow|brain|ml5|synaptic" -``` - -### Detection Method 3: ML/DL Directory Structure - -```bash -# Check for ML-specific directories -ls -la models/ model/ training/ train/ data/ 
datasets/ notebooks/ checkpoints/ weights/ experiments/ 2>/dev/null -``` - -```bash -# Check for Jupyter notebooks -find . -maxdepth 3 -name "*.ipynb" 2>/dev/null | wc -l -``` - -```bash -# Check for model files -find . -maxdepth 3 \( -name "*.h5" -o -name "*.pkl" -o -name "*.pt" -o -name "*.pth" -o -name "*.onnx" -o -name "*.pb" -o -name "*.safetensors" \) 2>/dev/null | - head -5 -``` - -### Detection Method 4: Code Pattern Analysis - -```bash -# Search for ML patterns in Python files -grep -rl "model\.fit\|model\.train\|DataLoader\|tf\.keras\|torch\.nn\|sklearn\." --include="*.py" . 2>/dev/null | wc -l -``` - -### Classification Rules - -**Classify as ML/DL project if ANY of these are true:** +## Phase 3: Detect Project Type (Standard vs ML/DL) -- Repository name matches ML/DL patterns -- ML frameworks found in dependencies (tensorflow >= any, pytorch/torch, keras, scikit-learn) -- Has `models/`, `training/`, `datasets/` directories with content -- Contains 3+ Jupyter notebooks -- Has model checkpoint files (.h5, .pt, .pth, .onnx, .pkl) -- 5+ files contain ML code patterns +Classify as **ML/DL** if ANY of: -**Otherwise, classify as Standard project.** +- Repo name contains `-ml`, `-dl`, `-ai`, `-model`, `machine-learning`, or `deep-learning`. +- Dependencies include any of: `tensorflow`, `pytorch`/`torch`, `keras`, `scikit-learn`/`sklearn`, `xgboost`, `lightgbm`, `transformers`, `huggingface`, `jax`, `mlflow`, `wandb`, `optuna`. Check `requirements.txt`, `pyproject.toml`, `poetry.lock`, `package.json`. +- ML directories with content: `models/`, `training/`, `datasets/`, `notebooks/`, `checkpoints/`, `weights/`, `experiments/`. +- 3+ Jupyter notebooks anywhere. +- Model checkpoint files: `*.h5`, `*.pkl`, `*.pt`, `*.pth`, `*.onnx`, `*.pb`, `*.safetensors`. +- 5+ files match patterns: `model\.fit`, `model\.train`, `DataLoader`, `tf\.keras`, `torch\.nn`, `sklearn\.`. -Store: +Otherwise classify as **standard**. 
Record: `project_type`, `ml_frameworks`, `has_model_files`. -- `project_type`: "ml_dl" or "standard" -- `ml_frameworks`: List of detected ML frameworks -- `has_model_files`: true/false - -## Step 4: Deep Codebase Analysis - Standard Projects - -### 4.1 Architecture Diagram Verification - -```bash -# Find existing diagram files -find . -maxdepth 3 \( -name "*.png" -o -name "*.svg" -o -name "*.drawio" -o -name "*.mmd" -o -name "*.mermaid" -o -name "*.puml" \) 2>/dev/null | - grep -iE "arch|diagram|overview|system|structure" -``` - -```bash -# Check if diagrams are referenced in architecture.md -grep -iE "\!\[.*\]\(.*\.(png|svg|drawio)\)" docs/architecture.md 2>/dev/null -grep -iE "```mermaid" docs/architecture.md 2>/dev/null -``` - -**Verification:** - -- If diagram exists, check modification date vs code changes -- Flag if diagram is older than significant code changes -- List components shown in diagram vs actual modules - -### 4.2 Software Units Deep Analysis - -**Discover actual module structure:** - -```bash -# Python packages -find . -name "__init__.py" -not -path "*/venv/*" -not -path "*/.venv/*" -not -path "*/node_modules/*" 2>/dev/null | - sed 's|/[^/]*$||' | sort -u -``` - -```bash -# Node.js/TypeScript modules -cat package.json 2>/dev/null | jq -r '.main, .exports | if type == "object" then keys[] else . end' 2>/dev/null -ls -la src/ lib/ 2>/dev/null -``` - -```bash -# Go packages -find . -name "*.go" -not -path "*/vendor/*" 2>/dev/null | xargs -I {} dirname {} | sort -u -``` - -```bash -# Rust crates -find . 
-name "Cargo.toml" 2>/dev/null | xargs -I {} dirname {} -``` - -**For each discovered module, extract:** - -```bash -# Python: Get module docstring and main classes/functions -head -30 {module}/__init__.py 2>/dev/null -grep -E "^class |^def |^async def " {module}/*.py 2>/dev/null | head -20 -``` +## Phase 4: Read Doc + Build Claims Inventory ```bash -# Node.js: Get exports -grep -E "^export |^module\.exports" {module}/index.{js,ts} {module}.{js,ts} 2>/dev/null | head -20 -``` - -```bash -# Go: Get package doc and exported functions -head -20 {module}/*.go 2>/dev/null | grep -E "^package |^// |^func [A-Z]" +cat docs/architecture.md 2>/dev/null ``` -**Cross-reference with documentation (MANDATORY - do not skip):** +Build a verifiable claims list from the doc — for every entry, you'll cross-check against code in Phase 5: -- Read the "Software units" section from docs/architecture.md -- Create a two-column comparison: modules listed in docs vs modules found in code -- For EACH module in docs: verify it exists in code and its description matches actual functionality -- For EACH module in code: verify it is documented -- Record the specific mismatches found (not just counts) - -### 4.3 Software of Unknown Provenance (SOUP) Validation - -**IMPORTANT:** The source of truth for SOUP data is `soup.json` (not `soup.md`). The `soup.md` file is auto-generated from `soup.json` and must never be edited directly. All validation and changes must target `soup.json`. +- Module names + stated purposes +- File paths referenced +- Listed dependencies +- Algorithms described +- Security measures claimed +- Diagram components -**Verify soup.json exists:** +## Phase 5: Deep Codebase Analysis -```bash -ls -la docs/soup.json soup.json 2>/dev/null || echo "No soup.json found" -``` +Run only the sub-phases matching the project type. Record exact counts and file:function evidence. 
-**Extract dependency list from lock files for comparison:** +### 5.A Standard Projects -```bash -# Python -cat poetry.lock 2>/dev/null | grep -E "^name = " | sed 's/name = "//;s/"//' | head -50 -cat requirements.txt 2>/dev/null | grep -v "^#" | cut -d'=' -f1 | cut -d'>' -f1 | cut -d'<' -f1 | head -50 -``` +**5.A.1 Architecture diagram.** Find existing diagrams: `*.png`/`*.svg`/`*.drawio`/`*.mmd`/`*.mermaid`/`*.puml` matching `arch|diagram|overview|system|structure`; check if referenced in `architecture.md` (`![...](...)` or ` ```mermaid`). Compare modification dates — flag if diagram is older than significant code changes. List components shown vs actual modules. -```bash -# Node.js -cat package.json 2>/dev/null | jq -r '.dependencies, .devDependencies | keys[]' 2>/dev/null | head -50 -``` +**5.A.2 Software units.** Discover module structure: -```bash -# Ruby -cat Gemfile.lock 2>/dev/null | grep -E "^ [a-z]" | awk '{print $1}' | head -50 -``` +- Python: `find . -name "__init__.py" -not -path "*/venv/*" -not -path "*/.venv/*"` → take parent dirs. +- Node/TypeScript: `package.json` `main`/`exports`; `src/`, `lib/`. +- Go: parent dirs of `*.go` (excluding `vendor/`). +- Rust: parent dirs of `Cargo.toml`. -```bash -# Go -cat go.mod 2>/dev/null | grep -E "^\t" | awk '{print $1}' | head -50 -``` +For each module, extract docstring (head of `__init__.py`), exported symbols (`^class`, `^def`, `export`, `func [A-Z]`). -**Validate soup.json content against actual code usage:** +**Cross-reference (mandatory):** two-column table of modules-in-doc vs modules-in-code; record specific mismatches with names, not just counts. -**Step 1: Read soup.json and extract all package entries:** +**5.A.3 SOUP validation.** `soup.json` is the source of truth; `soup.md` is auto-generated and must never be edited. 
```bash -# Read soup.json to see all documented packages with their Risk Level, Requirements, and Verification Reasoning +ls -la docs/soup.json soup.json 2>/dev/null cat docs/soup.json soup.json 2>/dev/null ``` -**Step 2: For EACH package in soup.json, validate the three fields:** - -**You MUST validate EVERY package, not a sample.** For each package, record: - -- Package name -- Stated Requirements vs actual code usage found -- Stated Risk Level vs expected risk level for this type of package -- Stated Verification Reasoning vs whether it explains the specific choice - -For each package entry, run these commands to verify accuracy: - -```bash -# Find how the package is actually used in the codebase -grep -rn "require.*{package}\|import.*{package}\|from {package}\|use {package}" --include="*.py" --include="*.js" --include="*.ts" --include="*.rb" --include="*.go" --include="*.rs" . 2>/dev/null | grep -v node_modules | grep -v vendor | head -20 -``` - -Then validate: - -1. **Requirements field:** Does the stated purpose match the actual usage found above? - - BAD: AWS SDK with Requirements saying "image processing" - - GOOD: AWS SDK with Requirements saying "Cloud infrastructure API access" - -2. **Risk Level:** Is it appropriate for what the package does? - - | Package Type | Expected Risk Level | - | ---------------------------------- | ------------------- | - | Auth, crypto, security | High | - | Network, HTTP, API clients | High | - | Database, data storage | High | - | File system access | Medium | - | Logging, monitoring | Medium | - | UI, formatting, colors | Low | - | Dev tools, linters, test utilities | Low | - -3. **Verification Reasoning:** Does it explain why THIS package was chosen? 
- - BAD: Generic "popular library" - - GOOD: "Official AWS SDK maintained by Amazon" or "Only library supporting X protocol" - -**Step 3: Check completeness and staleness:** - -- All packages in lock files must be in soup.json -- Packages removed from lock files must be removed from soup.json - -**Cross-reference with architecture.md:** - -- Verify architecture.md references soup.md (the auto-generated file) and does not duplicate its content -- Flag any version numbers or dependency tables in architecture.md for removal - -### 4.4 Critical Algorithms Deep Analysis - -**Discover algorithm implementations:** - -```bash -# Search for algorithm-related files -find . -name "*algorithm*" -o -name "*crypto*" -o -name "*hash*" -o -name "*sort*" -o -name "*search*" -o -name "*calculate*" -o -name "*compute*" -o -name "*process*" -o -name "*engine*" 2>/dev/null | - grep -v node_modules | grep -v venv -``` - -```bash -# Search for cryptographic operations -grep -rn "crypto\|encrypt\|decrypt\|hash\|hmac\|sha\|md5\|aes\|rsa" --include="*.py" --include="*.js" --include="*.ts" --include="*.go" --include="*.rs" . 2>/dev/null | - grep -v node_modules | grep -v venv | head -20 -``` - -```bash -# Search for complex mathematical operations -grep -rn "matrix\|vector\|gradient\|derivative\|integral\|fourier\|transform" --include="*.py" --include="*.js" --include="*.ts" --include="*.go" --include="*.rs" . 2>/dev/null | - grep -v node_modules | grep -v venv | head -20 -``` - -```bash -# Search for custom data structures -grep -rn "class.*Tree\|class.*Graph\|class.*Queue\|class.*Stack\|class.*Heap" --include="*.py" --include="*.js" --include="*.ts" --include="*.go" --include="*.rs" . 
2>/dev/null | - head -20 -``` - -**For each discovered algorithm, extract details:** - -- Read the file containing the algorithm -- Extract function/class signature -- Extract docstring/comments explaining the algorithm -- Note time/space complexity if documented +Extract dependency lists from lock files: `poetry.lock`, `requirements.txt`, `package.json`, `Gemfile.lock`, `go.mod`, `Cargo.lock`. -**Cross-reference with documentation:** +For **every** package in `soup.json` (not a sample), validate three fields: -- Compare documented algorithms vs discovered implementations -- Flag undocumented critical algorithms -- Verify file paths in docs match actual locations -- Check if complexity claims are accurate +1. **Requirements** — does the stated purpose match how the package is actually used? Use `grep -rn "require.*{pkg}\|import.*{pkg}\|from {pkg}\|use {pkg}" --include="*.py" --include="*.js" --include="*.ts" --include="*.rb" --include="*.go" --include="*.rs" .` to find actual usage, then compare: -### 4.5 Risk Controls Deep Analysis + - BAD: AWS SDK with Requirements "image processing" + - GOOD: AWS SDK with Requirements "Cloud infrastructure API access" -**Discover security measures:** - -```bash -# Authentication/Authorization patterns -grep -rn "auth\|login\|session\|token\|jwt\|oauth\|permission\|role\|acl" --include="*.py" --include="*.js" --include="*.ts" --include="*.go" . 2>/dev/null | - grep -v node_modules | grep -v test | head -20 -``` +2. **Risk Level** — appropriate for what the package does: -```bash -# Input validation patterns -grep -rn "validate\|sanitize\|escape\|filter\|whitelist\|blacklist" --include="*.py" --include="*.js" --include="*.ts" --include="*.go" . 
2>/dev/null | - grep -v node_modules | head -20 -``` + | Package type | Expected Risk Level | + | ------------ | ------------------- | + | Auth, crypto, security | High | + | Network, HTTP, API clients | High | + | Database, data storage | High | + | File system access | Medium | + | Logging, monitoring | Medium | + | UI, formatting, colors | Low | + | Dev tools, linters, test utilities | Low | -```bash -# Error handling patterns -grep -rn "try:\|catch\|except\|error\|throw\|panic\|recover" --include="*.py" --include="*.js" --include="*.ts" --include="*.go" . 2>/dev/null | - grep -v node_modules | grep -v test | wc -l -``` +3. **Verification Reasoning** — explains why THIS specific package was chosen: + - BAD: "popular library" + - GOOD: "Official AWS SDK maintained by Amazon", "Only library supporting protocol X" -```bash -# Logging patterns -grep -rn "log\.\|logger\.\|logging\.\|console\.log\|fmt\.Print" --include="*.py" --include="*.js" --include="*.ts" --include="*.go" . 2>/dev/null | - grep -v node_modules | grep -v test | head -20 -``` +**Completeness/staleness:** every package in lock files must be in `soup.json`; packages removed from lock files must be removed from `soup.json`. -**Check for security configurations:** +**Architecture.md duplication check:** flag any version numbers or dependency tables in `architecture.md` for removal — it must reference `soup.md`, not duplicate it. -```bash -# Environment variables -grep -rn "process\.env\|os\.environ\|os\.Getenv\|env::" --include="*.py" --include="*.js" --include="*.ts" --include="*.go" --include="*.rs" . 2>/dev/null | - grep -v node_modules | head -20 -``` +**5.A.4 Critical algorithms.** Find candidates: ```bash -# Security headers/middleware -grep -rn "helmet\|cors\|csrf\|xss\|rate.limit\|security" --include="*.py" --include="*.js" --include="*.ts" --include="*.go" . 2>/dev/null | - grep -v node_modules | head -10 +find . 
\( -name "*algorithm*" -o -name "*crypto*" -o -name "*hash*" -o -name "*engine*" -o -name "*compute*" \) -not -path "*/node_modules/*" -not -path "*/venv/*" +grep -rn "encrypt\|decrypt\|hash\|hmac\|sha\|aes\|rsa" --include="*.py" --include="*.js" --include="*.ts" --include="*.go" --include="*.rs" . | grep -v node_modules | grep -v venv ``` -## Step 5: Deep Codebase Analysis - ML/DL Projects - -### 5.1 Datasets Deep Analysis +Also check for custom data structures (`class .*Tree|Graph|Queue|Stack|Heap`) and complex math (`matrix`, `gradient`, `fourier`, etc.). -**Discover dataset definitions:** +For each: signature, docstring, complexity if documented. Flag undocumented critical algorithms; verify file paths in doc match actual locations. -```bash -# Find dataset classes/loaders -grep -rn "class.*Dataset\|DataLoader\|tf\.data\|torch\.utils\.data" --include="*.py" . 2>/dev/null | grep -v venv | - head -20 -``` +**5.A.5 Risk controls.** Discover security measures: -```bash -# Find data directories and files -find data datasets raw processed -type f 2>/dev/null | head -30 -ls -la data/ datasets/ 2>/dev/null -``` +- **Auth/authz:** `auth`, `login`, `session`, `token`, `jwt`, `oauth`, `permission`, `role`, `acl`. +- **Input validation:** `validate`, `sanitize`, `escape`, `filter`, `whitelist`. +- **Error handling:** `try:`, `catch`, `except`, `throw`, `panic`/`recover`. +- **Logging:** `logger.`, `logging.`, `console.log`, `fmt.Print`. +- **Security middleware/headers:** `helmet`, `cors`, `csrf`, `xss`, `rate.limit`. +- **Env vars / secrets:** `process.env`, `os.environ`, `os.Getenv`, `env::`. 
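The keyword categories above can be turned into rough per-category counts before reading any code in depth. The following is a minimal sketch, not part of the skill: `count_control` is a hypothetical helper name, and it assumes a `grep` that supports `--include` and `--exclude-dir` (GNU and BSD grep both do). A non-zero count only marks candidates; each hit still needs file:function evidence per the evidence rule.

```bash
#!/usr/bin/env bash
# Sketch only: count_control is a hypothetical helper, not part of the skill.

# Print the number of source lines under ROOT that match PATTERN (extended regex).
# Dependency directories are skipped with --exclude-dir rather than by piping
# the output through `grep -v node_modules`.
count_control() {
  local pattern="$1" root="${2:-.}"
  grep -rE "$pattern" \
    --include='*.py' --include='*.js' --include='*.ts' --include='*.go' \
    --exclude-dir=node_modules --exclude-dir=venv --exclude-dir=.venv \
    "$root" 2>/dev/null | wc -l | tr -d ' '
}

# Example calls using keywords from the list above:
# count_control 'jwt|oauth|permission' .
# count_control 'validate|sanitize|escape' src
```

Counting lines rather than files keeps the number comparable across categories, but broad keywords such as `filter` or `role` will over-match, so treat the result as a starting point for manual review.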
-```bash -# Extract dataset statistics -wc -l data/*.csv datasets/*.csv 2>/dev/null -find data datasets -name "*.json" -exec wc -l {} \; 2>/dev/null | head -10 -``` +### 5.B ML/DL Projects -**For each dataset, extract:** +**5.B.1 Datasets.** Find dataset classes/loaders (`class .*Dataset`, `DataLoader`, `tf.data`, `torch.utils.data`); list `data/`, `datasets/`, `raw/`, `processed/` with file counts and sizes. For each: data format, features, labels, validation rules. Verify documented sources/sizes; flag undocumented. -- Read dataset class implementation -- Extract data loading logic -- Note data format, features, labels -- Extract any data validation rules +**5.B.2 Data preprocessing.** Find `def preprocess`, `def transform`, `def normalize`, `def augment`, `class .*Transform`, `Pipeline`, `Compose`. For each: input/output specs, parameters, augmentation techniques. Verify documented order matches code; check parameter defaults. -**Cross-reference with documentation:** +**5.B.3 Data splits.** Find `train_test_split`, `StratifiedKFold`, `KFold`, `random_split`. Extract ratios from `test_size`, `val_size`, `train_size` and config files (`config.yaml`/`yml`/`json`). Check for random seed. -- Compare documented datasets vs actual data files -- Verify dataset sizes/statistics -- Check data source URLs are still valid -- Flag undocumented datasets +**5.B.4 Model architecture.** Find `class .*Model`, `class .*Net`, `nn.Module`, `tf.keras.Model`. Extract layers (`nn.Linear`, `nn.Conv`, `Dense`, `Conv2D`, `LSTM`, `Transformer`, `Attention`); read forward pass; record input/output shapes. Verify documented layers match code. -### 5.2 Data Preprocessing Deep Analysis +**5.B.5 Model training.** Find training scripts (`train*.py`, `*training*.py`, `main.py`). Extract hyperparameters: `learning_rate`/`lr`, `batch_size`, `epochs`, `optimizer` (Adam/SGD/etc.), `loss`. Check argparse defaults and config files (`training_config.*`, `hyperparameters.*`). Verify docs match. 
-**Discover preprocessing code:** +**5.B.6 Model evaluation.** Find `eval*.py`, `*evaluate*.py`. Extract metrics: `accuracy`, `precision`, `recall`, `f1`, `auc`, `roc`, `mse`, `mae`. Look for `sklearn.metrics`, `torchmetrics`, `tf.keras.metrics`. Read saved results: `results.json`, `metrics.json`, `*evaluation*.json`. Verify documented benchmarks. -```bash -# Find preprocessing functions/classes -grep -rn "def preprocess\|def transform\|def normalize\|def augment\|def clean\|class.*Transform\|class.*Preprocess" --include="*.py" . 2>/dev/null | - grep -v venv | head -20 -``` +**5.B.7 Model deployment.** Find `deploy/`, `deployment/`, `serving/`, `inference/`, `Dockerfile`, `docker-compose*`, K8s manifests. Find inference code (`def predict`, `def inference`, `@app.route`, `@api`, FastAPI/Flask). Extract hardware requirements (`cuda`, `gpu`, `device`, `memory`). Verify deployment matches docs. -```bash -# Find preprocessing pipelines -grep -rn "Pipeline\|Compose\|Sequential.*transform" --include="*.py" . 2>/dev/null | grep -v venv | head -10 -``` +## Phase 6: Validate Existing Doc Structure -**For each preprocessing step, extract:** +If `docs/architecture.md` exists: -- Read the preprocessing function/class -- Extract input/output specifications -- Note any parameters or configurations -- Check for data augmentation techniques +1. **H1 must be exactly `# Architecture Design`.** +2. **Required H2 sections present in correct order** (per project type — see "Architecture Document Types" above). Additional H2 sections after the required ones are allowed. +3. **TOC links resolve** to actual headings. +4. **No version numbers or dep tables in architecture.md** — those belong in `soup.json`/`soup.md` only. 
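The first two structure checks above can be scripted. This is a minimal sketch under stated assumptions: `check_h1` and `check_h2_order` are hypothetical helper names, the required headings are passed in by the caller rather than hard-coded, and heading-like lines inside fenced code blocks are not filtered out, so a doc that embeds `## ...` lines in a fence could produce a false FAIL.

```bash
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical, not part of the skill.

# PASS if the first H1 line in DOC is exactly "# Architecture Design".
check_h1() {
  local doc="$1" h1
  h1=$(grep -m1 -E '^# ' "$doc" || true)
  if [ "$h1" = "# Architecture Design" ]; then
    echo "PASS"
  else
    echo "FAIL: found \"$h1\""
  fi
}

# PASS if the first N H2 lines in DOC equal the N required headings, in order.
# Extra H2 sections after the required ones are ignored, as the rule allows.
check_h2_order() {
  local doc="$1"
  shift
  local expected actual
  expected=$(printf '%s\n' "$@")
  actual=$(grep -E '^## ' "$doc" | head -n "$#")
  if [ "$actual" = "$expected" ]; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}

# Example call; pass the full required list for the detected project type:
# check_h2_order docs/architecture.md "## Table of Contents" "## Risk controls"
```

Passing the headings as arguments keeps one helper usable for both the standard and the ML/DL section lists.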
-**Cross-reference with documentation:** +## Phase 7: Generate Report -- Compare documented preprocessing steps vs actual code -- Verify transformation order matches implementation -- Check if parameters in docs match code defaults +Pre-report verification: every applicable phase task is complete; cross-reference evidence is recorded; counts are exact. -### 5.3 Data Splits Deep Analysis +```text +## Architecture Documentation Review Report -**Discover split implementation:** +### Repository Info +- Organization: {org} +- Repository: {repo} +- Project Type: {standard / ml_dl} +- Document Status: {exists / missing / exempt} +- Last Doc Update: {date} +- Last Code Update: {date} +- Documentation Freshness: {CURRENT / STALE} -```bash -# Find train/test split code -grep -rn "train_test_split\|split\|StratifiedKFold\|KFold\|random_split" --include="*.py" . 2>/dev/null | grep -v venv | - head -15 -``` +### Structure Checks +- H1 title `# Architecture Design`: {PASS / FAIL — found "{actual}"} +- Required H2 sections present: {PASS / FAIL — list missing} +- Section order correct: {PASS / FAIL} +- TOC links valid: {PASS / FAIL} + +### Content Accuracy (per section) +For each required section: +- Status: {PASS / NEEDS UPDATE / MISSING} +- Issues: {specific problems} +- Discovered in code: {evidence} +- Documented: {claim from doc} + +### SOUP +- soup.json exists: {yes / no} +- architecture.md references soup.md (not duplicates): {yes / no} +- Total deps in lock files: {n} +- Documented in soup.json: {n} +- Missing from soup.json: {list} +- In soup.json but not in code: {list} +- Inaccurate Requirements: {list} +- Misclassified Risk Levels: {list} +- Weak Verification Reasoning: {list} -```bash -# Extract split ratios from code -grep -rn "test_size\|val_size\|train_size\|split.*=" --include="*.py" . 
2>/dev/null | grep -v venv | head -15 -``` +### Summary +- Sections accurate: {n}/{total} +- Sections need update: {n} +- Sections missing: {n} +- Critical issues: {high-priority list} -```bash -# Check for split configuration files -cat config.yaml config.yml config.json 2>/dev/null | grep -iE "split|train|val|test" +### Proposed Changes +{Specific edits with before/after for each section} ``` -**Cross-reference with documentation:** +Ask before modifying: "I found the following issues with `docs/architecture.md`. Want me to fix them?" -- Compare documented split ratios vs actual code -- Verify split methodology description -- Check if cross-validation strategy matches +## Phase 8: Write or Update the Doc -### 5.4 Model Architecture Deep Analysis +After approval, write `docs/architecture.md` (`mkdir -p docs` first if needed). **Do not paste the templates verbatim — fill them with content discovered in Phase 5.** -**Discover model definitions:** +### Common skeleton (both project types) -```bash -# Find model classes -grep -rn "class.*Model\|class.*Net\|class.*Network\|nn\.Module\|tf\.keras\.Model" --include="*.py" . 2>/dev/null | - grep -v venv | head -20 -``` +```markdown +# Architecture Design -```bash -# Find model configuration -cat model_config.json model_config.yaml config/model.* 2>/dev/null -``` +## Table of Contents -**For each model, extract architecture details:** +- [{Each required section as a link}](#...) -```bash -# Read model class definition (first 100 lines) -# For each model file found above, read it to extract: -# - Layer definitions -# - Forward pass logic -# - Input/output shapes +{Required sections per project type — see structure above} ``` -```bash -# Extract layer specifications from code -grep -rn "nn\.Linear\|nn\.Conv\|Dense\|Conv2D\|LSTM\|Transformer\|Attention" --include="*.py" . 
2>/dev/null | - grep -v venv | head -30 -``` +### Standard project — section content guidance -```bash -# Check for model summary/print -grep -rn "model\.summary\|print.*model\|torchsummary" --include="*.py" . 2>/dev/null | grep -v venv | head -5 -``` +- **Architecture diagram** — embed image (`![Architecture Diagram](./images/architecture.png)`) or fenced ` ```mermaid` block. Add a System Overview paragraph and a Component Interactions paragraph based on discovered modules and their imports. +- **Software units** — for each discovered module: Purpose (from docstring), Location (`path/to/module`), Key Components (classes/functions with docstring summaries), Internal Dependencies (other modules), External Dependencies (third-party packages). +- **Software of Unknown Provenance** — link to `soup.md` (auto-generated). Do NOT duplicate version numbers or dep tables. Include the SOUP fields explainer: + - **Risk Level** (per IEC 62304): Low (cannot lead to harm), Medium (reversible harm), High (irreversible harm). + - **Requirements**: "Why do you need this library?" — examples: "HTTP client for REST API", "CLI argument parsing", "Dependency" (transitive only). + - **Verification Reasoning**: "Why this library among alternatives?" — examples: "Industry standard with active maintenance", "Official SDK provided by vendor", "Dependency" (transitive only). + - Validation: Accuracy (Requirements match actual usage), Completeness (all lock-file packages present), Staleness (removed packages absent), Risk Level (appropriate for function). +- **Critical algorithms** — for each: Purpose, Location (`file` in `ClassName`/`function_name`), Implementation (brief description), Complexity (if documented), Security Considerations (if applicable). +- **Risk controls** — Security Measures (auth/authz, input validation, encryption), Error Handling (patterns from code), Logging & Monitoring, Failure Modes table (Failure Mode | Impact | Mitigation). 
-**Cross-reference with documentation:** +### ML/DL project — section content guidance -- Compare documented architecture vs actual model code -- Verify layer specifications match implementation -- Check input/output shapes are accurate -- Flag architecture changes not reflected in docs +- **Datasets** — Data Sources table (Dataset | Source | Size | Format), Description (features, labels), Statistics from actual file analysis. +- **Data Preprocessing** — Pipeline (numbered steps with `file:function`), Transformations table (Transformation | Purpose | Implementation), Augmentation if applicable. +- **Data Splits** — Split table (Split | Ratio | Size | Method), Implementation location, random seed if found. +- **Model Architecture** — Model Type, Framework, Layer Specifications table (Layer | Type | Parameters | Output Shape), Configuration with the actual model class signature in a fenced code block, Input/Output specs. +- **Model Training** — Training Configuration table (Parameter | Value | Source) covering Optimizer, Learning Rate, Batch Size, Epochs, Loss Function, LR Scheduler. Training Script location, Procedure summary, Checkpointing approach. +- **Model Evaluation** — Metrics table (Metric | Implementation | Latest Value), Evaluation Script location, Benchmark Results table (Dataset | Metric | Value | Date). +- **Software of Unknown Provenance** — same as Standard, plus call out ML frameworks and data libraries. +- **Risk controls** — Model Risks table (Model drift, Data leakage, Overfitting — each with Likelihood | Impact | Mitigation), Data Risks, Operational Risks. +- **Model Deployment** — Deployment Architecture, Inference Implementation (`file`, entry point), Hardware Requirements table (GPU | Memory | Storage with Source column), Serving Configuration, Monitoring. -### 5.5 Model Training Deep Analysis +## Phase 9: Run Linters -**Discover training configuration:** +After writing/updating, run `/co-dev:run-linters` and fix any errors. 
-```bash -# Find training scripts -find . -name "train*.py" -o -name "*training*.py" -o -name "main.py" 2>/dev/null | grep -v venv -``` +## Validation Checklist -```bash -# Extract hyperparameters from code -grep -rn "learning_rate\|lr\|batch_size\|epochs\|optimizer\|Adam\|SGD\|loss" --include="*.py" . 2>/dev/null | - grep -v venv | head -30 -``` - -```bash -# Check for config files -cat config.yaml config.yml config.json training_config.* hyperparameters.* 2>/dev/null | head -50 -``` - -```bash -# Find argument parsers for hyperparameters -grep -rn "add_argument.*lr\|add_argument.*batch\|add_argument.*epoch" --include="*.py" . 2>/dev/null | head -15 -``` - -**Extract actual training parameters:** - -- Default values in code -- Values in config files -- Command-line argument defaults - -**Cross-reference with documentation:** - -- Compare documented hyperparameters vs actual code -- Check if optimizer, loss function, lr match -- Verify batch size, epochs are accurate -- Flag any training procedure changes - -### 5.6 Model Evaluation Deep Analysis - -**Discover evaluation code:** - -```bash -# Find evaluation scripts/functions -find . -name "eval*.py" -o -name "*evaluate*.py" -o -name "test*.py" 2>/dev/null | grep -v venv | grep -v __pycache__ -``` - -```bash -# Extract metrics used -grep -rn "accuracy\|precision\|recall\|f1\|auc\|roc\|confusion\|mse\|mae\|loss" --include="*.py" . 2>/dev/null | - grep -v venv | head -30 -``` - -```bash -# Find metric computation -grep -rn "sklearn\.metrics\|torchmetrics\|tf\.keras\.metrics" --include="*.py" . 2>/dev/null | grep -v venv | head -15 -``` - -```bash -# Check for saved evaluation results -find . 
-name "*results*.json" -o -name "*metrics*.json" -o -name "*eval*.json" 2>/dev/null | head -5 -cat results.json metrics.json evaluation_results.json 2>/dev/null | head -30 -``` - -**Cross-reference with documentation:** - -- Compare documented metrics vs actual evaluation code -- Check if benchmark results are up-to-date -- Verify evaluation methodology matches implementation - -### 5.7 Model Deployment Deep Analysis - -**Discover deployment configuration:** - -```bash -# Find deployment files -ls -la deploy/ deployment/ serving/ inference/ 2>/dev/null -find . -name "Dockerfile*" -o -name "docker-compose*" -o -name "*deploy*" -o -name "*serve*" 2>/dev/null | - grep -v node_modules | head -15 -``` - -```bash -# Find inference code -grep -rn "def predict\|def inference\|@app\.route\|@api\|FastAPI\|Flask" --include="*.py" . 2>/dev/null | grep -v venv | - head -15 -``` - -```bash -# Check for model serving configs -cat serve.yaml serving.yaml deployment.yaml kubernetes/*.yaml 2>/dev/null | head -50 -``` - -```bash -# Find hardware requirements -grep -rn "cuda\|gpu\|device\|cpu\|memory" --include="*.py" --include="*.yaml" --include="*.yml" . 2>/dev/null | - grep -v venv | head -15 -``` - -**Cross-reference with documentation:** - -- Compare documented deployment vs actual configuration -- Verify inference requirements match code -- Check if serving infrastructure is accurate - -## Step 6: Validate Existing Architecture Document Structure - -If `docs/architecture.md` exists, validate its structure. 
- -### Check H1 Title - -```bash -head -5 docs/architecture.md -grep "^# " docs/architecture.md | head -1 -``` - -**Expected:** `# Architecture Design` (exactly this) - -### Check H2 Sections - -```bash -grep "^## " docs/architecture.md -``` - -**For Standard projects, must start with (in order):** - -```text -## Table of Contents -## Architecture diagram -## Software units -## Software of Unknown Provenance -## Critical algorithms -## Risk controls -``` - -**For ML/DL projects, must start with (in order):** - -```text -## Table of Contents -## Datasets -## Data Preprocessing -## Data Splits -## Model Architecture -## Model Training -## Model Evaluation -## Software of Unknown Provenance -## Risk controls -## Model Deployment -``` - -Additional H2 sections may appear after the required ones. - -### Check Table of Contents Links - -```bash -# Extract TOC links -grep -E "^\s*-\s*\[.*\]\(#" docs/architecture.md -``` - -Verify each link resolves to an actual heading in the document. - -## Step 7: Generate Comprehensive Accuracy Report - -**MANDATORY PRE-REPORT VERIFICATION:** - -Before generating the report, you MUST: - -1. Review your checkpoint log from the start of analysis -2. Verify ALL applicable checkpoints have actual values (not "pending") -3. If ANY checkpoint is still pending, STOP and complete that step first -4. 
Cross-reference findings: issues found in code analysis MUST appear in the report - -**If you skipped any step, the review is incomplete and results will be inconsistent.** - -After deep analysis, provide a detailed report: - -### Report Format - -```text -## Architecture Documentation Review Report - -### Analysis Checkpoint Log - -{Include your completed checkpoint log here - ALL values must be filled in, none should say "pending"} - -### Repository Info -- **Organization:** {org} -- **Repository:** {repo} -- **Project Type:** {standard/ml_dl} -- **Document Status:** {exists/missing} -- **Last Doc Update:** {date} -- **Last Code Update:** {date} -- **Documentation Freshness:** {CURRENT/STALE - code changed since last doc update} - -### Structure Checks -- [ ] H1 title "# Architecture Design": {PASS/FAIL - found: "{actual}"} -- [ ] Required H2 sections present: {PASS/FAIL} -- [ ] Section order correct: {PASS/FAIL} -- [ ] Table of Contents links valid: {PASS/FAIL} - -### Content Accuracy Checks - -#### {For Standard: "Architecture Diagram" / For ML: "Datasets"} -- **Status:** {PASS/FAIL/NEEDS UPDATE/MISSING} -- **Issues:** - - {Specific issue 1} - - {Specific issue 2} -- **Discovered in code:** {what was actually found} -- **Documented:** {what's currently in docs} - -{Repeat for each section} - -#### Software of Unknown Provenance -- **Status:** {PASS/FAIL/NEEDS UPDATE} -- **soup.json exists:** {yes/no} -- **architecture.md references soup.md:** {yes/no - flag if duplicating content} -- **Total dependencies in lock files:** {n} -- **Documented in soup.json:** {n} -- **Missing from soup.json:** {list} -- **In soup.json but not in code:** {list} -- **Inaccurate Requirements fields:** {list packages where stated purpose doesn't match actual code usage} -- **Misclassified Risk Levels:** {list packages with inappropriate risk level for their function} -- **Weak Verification Reasoning:** {list packages with generic reasoning like "popular library"} - -### Summary -- 
**Sections accurate:** {n}/{total} -- **Sections need update:** {n} -- **Sections missing:** {n} -- **Critical issues:** {list of high-priority fixes} - -### Proposed Changes -{Show exact changes needed with before/after for each section} -``` - -**Ask the user before making changes:** - -> "I found the following issues with docs/architecture.md. Would you like me to fix them?" - -## Step 8: Create or Update Architecture Document - -### If Creating New Document - -First create the docs directory if needed: - -```bash -mkdir -p docs -``` - -### Standard Project Template - -```markdown -# Architecture Design - -## Table of Contents - -- [Architecture diagram](#architecture-diagram) -- [Software units](#software-units) -- [Software of Unknown Provenance](#software-of-unknown-provenance) -- [Critical algorithms](#critical-algorithms) -- [Risk controls](#risk-controls) - -## Architecture diagram - -{Include or reference architecture diagram - create if missing} - -![Architecture Diagram](./images/architecture.png) - -### System Overview - -{High-level description based on discovered modules and their interactions} - -### Component Interactions - -{Description of how components interact - based on imports/dependencies analysis} - -## Software units - -{For each discovered module:} - -### {Module Name} - -**Purpose:** {Extracted from docstring or inferred from code} - -**Location:** `{actual/path/to/module}` - -**Key Components:** - -- `{ClassName}`: {description from docstring} -- `{function_name}`: {description from docstring} - -**Internal Dependencies:** - -- {Other modules this depends on} - -**External Dependencies:** - -- {Third-party packages used} - -## Software of Unknown Provenance - -See [soup.md](soup.md) for the complete list of third-party dependencies. 
- -**Verification:** Cross-reference soup.md entries against actual code usage to ensure accuracy: - -### Risk Level - -Classify the potential harm if the library has a vulnerability (per IEC 62304): - -| Level | Definition | -|-------|------------| -| Low | Cannot lead to harm | -| Medium | Can lead to reversible harm | -| High | Can lead to irreversible harm | - -### Requirements - -Answer: "Why do you need this library in your project?" - -Examples: -- "HTTP client for REST API communication" -- "CLI argument parsing and validation" -- "YAML/JSON configuration file parsing" -- "Dependency" (for transitive dependencies only) - -### Verification Reasoning - -Answer: "Why did you select this library among alternatives?" - -Examples: -- "Industry standard with active maintenance and security updates" -- "Official SDK provided by the service vendor" -- "Recommended by framework documentation" -- "Dependency" (for transitive dependencies only) - -### Validation Checks - -1. **Accuracy:** Verify each package's Requirements field matches its actual usage in the codebase (e.g., an AWS SDK should not say "image processing") -2. **Completeness:** All packages in lock files must be in soup.json -3. **Staleness:** Packages removed from lock files must be removed from soup.json -4. **Risk Level:** Verify risk classifications are appropriate (e.g., crypto/auth libraries should be High) - -**Note:** `soup.md` is auto-generated from `soup.json`. All edits must be made to `soup.json`. 
- -## Critical algorithms - -{For each discovered algorithm:} - -### {Algorithm/Function Name} - -**Purpose:** {From docstring or inferred} - -**Location:** `{actual/path/to/file}` in `{ClassName}` or `{function_name}` - -**Implementation:** -{Brief description of how it works} - -**Complexity:** {If documented or inferrable} - -**Security Considerations:** {If applicable} - -## Risk controls - -### Security Measures - -{Based on discovered security patterns:} - -- **Authentication:** {Discovered auth mechanisms} -- **Authorization:** {Discovered authz patterns} -- **Input Validation:** {Discovered validation} -- **Encryption:** {Discovered crypto usage} - -### Error Handling - -{Based on discovered error handling patterns} - -### Logging & Monitoring - -{Based on discovered logging patterns} - -### Failure Modes - -| Failure Mode | Impact | Mitigation | -|--------------|--------|------------| -| {Inferred from error handling} | {Impact} | {Mitigation} | -``` - -### ML/DL Project Template - -```markdown -# Architecture Design - -## Table of Contents - -- [Datasets](#datasets) -- [Data Preprocessing](#data-preprocessing) -- [Data Splits](#data-splits) -- [Model Architecture](#model-architecture) -- [Model Training](#model-training) -- [Model Evaluation](#model-evaluation) -- [Software of Unknown Provenance](#software-of-unknown-provenance) -- [Risk controls](#risk-controls) -- [Model Deployment](#model-deployment) - -## Datasets - -### Data Sources - -| Dataset | Source | Size | Format | -|---------|--------|------|--------| - -{For each discovered dataset:} | {name} | {source if found} | {actual size} | {format} | - -### Data Description - -{Based on discovered dataset classes and data files} - -**Features:** -{Extracted from data loading code} - -**Labels:** -{Extracted from data loading code} - -### Data Statistics - -{Based on actual data file analysis} - -## Data Preprocessing - -### Preprocessing Pipeline - -{Based on discovered preprocessing code:} - -1. 
**{Step from code}**: {Description} - - Implementation: `{file}:{function}` - - Parameters: {extracted parameters} - -### Data Transformations - -| Transformation | Purpose | Implementation | -|----------------|---------|----------------| - -{For each discovered transform:} | {transform_name} | {from docstring} | `{file}` in `{class/function}` | - -### Data Augmentation - -{Based on discovered augmentation code} - -## Data Splits - -### Split Configuration - -| Split | Ratio | Size | Method | -|-------|-------|------|--------| -| Training | {from code}% | {n} samples | {method} | -| Validation | {from code}% | {n} samples | {method} | -| Test | {from code}% | {n} samples | {method} | - -### Split Implementation - -**Location:** `{file}` in `{function_name}` - -**Method:** {random/stratified/temporal/custom} - -**Random Seed:** {if found} - -## Model Architecture - -### Architecture Overview - -{Based on discovered model class} - -**Model Type:** {CNN/RNN/Transformer/etc.} - -**Framework:** {PyTorch/TensorFlow/etc.} - -### Architecture Diagram - -{Generate or reference based on model structure} - -### Layer Specifications - -| Layer | Type | Parameters | Output Shape | -|-------|------|------------|--------------| - -{For each layer discovered in model:} | {layer_name} | {layer_type} | {params} | {shape if inferrable} | - -### Model Configuration - -**Location:** `{model_file}` in `{ClassName}` - -~~~python -{Actual model class signature and key layers} -~~~ - -### Input/Output Specifications - -- **Input:** {shape, dtype from code} -- **Output:** {shape, dtype from code} - -## Model Training - -### Training Configuration - -| Parameter | Value | Source | -|-----------|-------|--------| -| Optimizer | {actual optimizer} | `{file}` in `{function/class}` | -| Learning Rate | {actual lr} | `{file}` in `{function/class}` | -| Batch Size | {actual batch_size} | `{file}` in `{function/class}` | -| Epochs | {actual epochs} | `{file}` in `{function/class}` | -| Loss 
Function | {actual loss} | `{file}` in `{function/class}` | -| LR Scheduler | {if found} | `{file}` in `{function/class}` | - -### Training Script - -**Location:** `{training_script}` - -### Training Procedure - -{Based on actual training loop analysis} - -### Checkpointing - -{Based on discovered checkpoint saving code} - -## Model Evaluation - -### Evaluation Metrics - -| Metric | Implementation | Latest Value | -|--------|----------------|--------------| -| {metric_name} | `{file}` in `{function/class}` | {from results file if exists} | - -### Evaluation Script - -**Location:** `{eval_script}` - -### Benchmark Results - -{From discovered results files} - -| Dataset | Metric | Value | Date | -|---------|--------|-------|------| -| {dataset} | {metric} | {value} | {date} | - -## Software of Unknown Provenance - -See [soup.md](soup.md) for the complete list of third-party dependencies including ML frameworks and data processing libraries. - -**Verification:** Cross-reference soup.json entries against actual code usage. See the Standard Project Template above for Risk Level, Requirements, and Verification Reasoning guidelines. Note that `soup.md` is auto-generated from `soup.json`; all edits must target `soup.json`. 
- -## Risk controls - -### Model Risks - -| Risk | Likelihood | Impact | Mitigation | -|------|------------|--------|------------| -| Model drift | {assess} | {assess} | {from code} | -| Data leakage | {assess} | {assess} | {from code} | -| Overfitting | {assess} | {assess} | {from code} | - -### Data Risks - -{Based on data handling code analysis} - -### Operational Risks - -{Based on deployment code analysis} - -## Model Deployment - -### Deployment Architecture - -{Based on discovered deployment configs} - -### Inference Implementation - -**Location:** `{inference_file}` - -**Entry Point:** `{function/endpoint}` - -### Hardware Requirements - -| Requirement | Specification | Source | -|-------------|---------------|--------| -| GPU | {from code} | `{file}` | -| Memory | {from code/config} | `{file}` | -| Storage | {estimated} | - | - -### Serving Configuration - -{From discovered serving configs} - -### Monitoring - -{Based on discovered monitoring/logging code} -``` - -## Validation Checklist - -Before completing, verify: - -- [ ] H1 title is exactly `# Architecture Design` +- [ ] H1 is exactly `# Architecture Design` - [ ] All required H2 sections present in correct order -- [ ] Table of Contents links all work -- [ ] All documented modules exist in codebase -- [ ] All codebase modules are documented -- [ ] soup.json exists and is referenced (not duplicated) in architecture.md -- [ ] soup.json Requirements fields match actual code usage -- [ ] soup.json Risk Levels are appropriate for each package's function -- [ ] File paths in docs point to actual files -- [ ] For ML/DL: Hyperparameters match actual code -- [ ] For ML/DL: Model architecture matches implementation -- [ ] For ML/DL: Metrics match evaluation code +- [ ] TOC links resolve +- [ ] All documented modules exist in code; all code modules are documented +- [ ] `soup.json` exists; `architecture.md` references `soup.md` without duplicating +- [ ] `soup.json` Requirements match actual usage +- [ ] 
`soup.json` Risk Levels appropriate for each package's function +- [ ] File paths in doc point to actual files +- [ ] (ML/DL) hyperparameters / model architecture / metrics match implementation - [ ] Risk controls reflect actual security measures -## Step 9: Run Linters - -After making changes to docs/architecture.md, run the linters skill to ensure the file passes all markdown linting rules: - -```text -/co-dev:run-linters -``` - -Fix any linting errors before considering the task complete. - ## Important Rules -1. **Never fabricate information** - Only document what actually exists in the code -2. **Use stable code references** - Reference classes, methods, and functions instead of line numbers (line numbers change too quickly) -3. **Never document versions or duplicate SOUP data** - Do not include version numbers or dependency tables in architecture.md. Reference soup.md instead (which is auto-generated from soup.json). Lock files are the source of truth for versions. All SOUP edits must be made to soup.json. If versions or dependency tables are found in architecture.md, flag them for removal. -4. **Verify all paths** - Every file path must exist -5. **Never remove existing content** - Only add missing sections or fix inaccuracies -6. **Preserve custom sections** - Additional H2/H3 sections after required ones should be kept -7. **Ask before modifying** - Always show proposed changes and get user approval -8. **Flag stale documentation** - Warn if code changed significantly since last doc update -9. **Document security dependencies** - SOUP handling crypto/auth needs extra attention -10. **Keep metrics current** - If evaluation results exist, include latest values -11. **Run linters after changes** - Always run `/co-dev:run-linters` after modifying docs/architecture.md -12. **Complete ALL steps** - Never skip analysis steps. Each step may reveal issues not visible in other steps -13. 
**Output checkpoint log** - Include the completed checkpoint log in your final report to prove all steps were executed -14. **Never validate against world knowledge alone** - Do NOT use your training data to fact-check version numbers, release dates, library existence, or external claims. If uncertain about something, use web search to verify before flagging. Only validate things that can be cross-referenced against actual files in the repository or verified online. +1. **Never fabricate.** Only document what's in the code. +2. **Use stable references** — class/method/function names, not line numbers. +3. **Never duplicate SOUP data in architecture.md** — reference `soup.md`. Lock files are the version source of truth. All edits go to `soup.json`. +4. **Verify all paths** exist. +5. **Never remove existing valid content** — only update or add. +6. **Preserve custom sections** after the required ones. +7. **Ask before modifying.** Show proposed changes; get approval. +8. **Flag stale doc** if code changed significantly since last doc update. +9. **Document security deps with extra care** (crypto, auth). +10. **Keep metrics current** if results files exist. +11. **Run linters after changes.** +12. **Complete every phase** — skipping reveals nothing; cross-referencing reveals everything. +13. **Never validate against world knowledge alone.** Don't fact-check version numbers or external claims from training data — use web search or repo files. diff --git a/skills/review-user-guide/SKILL.md b/skills/review-user-guide/SKILL.md index 87c2673..f65244d 100644 --- a/skills/review-user-guide/SKILL.md +++ b/skills/review-user-guide/SKILL.md @@ -6,1316 +6,211 @@ allowed-tools: Bash(gh:*), Bash(git:*), Bash(awk:*), Bash(basename:*), Bash(cat: # Review User Guide Documentation -Review the `docs/user-guide.md` file in a repository and create or update it with comprehensive user documentation. 
This skill analyzes the codebase to document the product from an end-user perspective, covering UI, features, workflows, and usage instructions. Works for all project types and languages. +Review or create `docs/user-guide.md` from an end-user perspective. Works for all project types and languages. -## CRITICAL: Mandatory Analysis Tracking +## Phase Tracking -**You MUST maintain an analysis checklist throughout execution.** At each step, record what was found. This ensures consistent, reproducible results. +Use `TaskCreate` to track progress: one task per phase below. Mark `in_progress` when starting, `completed` when results are recorded. Do NOT include the task list in the final output — it's internal tracking. -**Before starting, create this tracking structure and update it as you progress:** +**Required phases:** -```text -=== ANALYSIS CHECKPOINT LOG === -[ ] Step 1: Repository Information - - organization: (pending) - - repository: (pending) - - description: (pending) - - has_user_guide: (pending) - -[ ] Step 2: Product Type Detection - - product_type: (pending) - web_app/cli_tool/api/library/mobile_app/desktop_app/hybrid - - framework: (pending) - - detection_evidence: (pending) - -[ ] Step 3-7: Deep Analysis (complete applicable sections) - For Web Applications: - [ ] 3.1 Routes/Pages - routes_found: (pending), count: (pending) - [ ] 3.2 Models/Data - models_found: (pending), count: (pending) - [ ] 3.3 Forms - forms_found: (pending), count: (pending) - [ ] 3.4 Help Text/UI Guidance - help_texts_found: (pending), count: (pending) - [ ] 3.5 Conditional Field Visibility - conditionals_found: (pending), triggers_mapped: (pending) - [ ] 3.6 Form Section Structure - sections_found: (pending), count: (pending) - [ ] 3.7 UI Components - components_found: (pending), count: (pending) - [ ] 3.8 Navigation - nav_elements: (pending) - - For CLI Tools: - [ ] 4.1 Commands - commands_found: (pending), count: (pending) - [ ] 4.2 Help Output - help_extracted: (pending) - [ ] 
4.3 Options/Flags - options_found: (pending), count: (pending) - - For APIs: - [ ] 5.1 Endpoints - endpoints_found: (pending), count: (pending) - [ ] 5.2 Schemas - schemas_found: (pending) - [ ] 5.3 Authentication - auth_method: (pending) - [ ] 5.4 OpenAPI - openapi_exists: (pending) - - For Libraries: - [ ] 6.1 Public API - exports_found: (pending), count: (pending) - [ ] 6.2 Docstrings - documented: (pending) - [ ] 6.3 Examples - examples_found: (pending) - - For Mobile Apps: - [ ] 7.1 Screens - screens_found: (pending), count: (pending) - [ ] 7.2 Navigation - nav_structure: (pending) - -[ ] Step 8: Document Structure Validation (if user-guide.md exists) - - h1_title_correct: (pending) - - required_sections_present: (pending) - - toc_links_valid: (pending) - -[ ] Step 9: Report Generated - - all_checks_completed: (pending) - - features_in_code: (pending) - - features_documented: (pending) - - undocumented_features: (pending) - - issues_found: (pending) -=== END CHECKPOINT LOG === -``` - -**COMPLETION REQUIREMENT:** Before generating the final report, you MUST verify that ALL applicable checkpoints show actual values (not "pending"). If any checkpoint is still "pending", go back and complete that analysis step. - -**EVIDENCE REQUIREMENT:** For every check, you MUST record: - -1. **What the user guide claims** - the exact feature, command, or workflow described -2. **What the code shows** - the actual evidence (routes, controllers, CLI definitions, API endpoints) -3. **Comparison result** - MATCH, MISMATCH, or MISSING with specific details - -A bare "PASS" without evidence is not acceptable. If you cannot provide evidence, the check is incomplete. - -**DO NOT SKIP STEPS.** Even if an earlier check seems to suggest no issues, you MUST complete ALL steps. Issues are often only revealed when cross-referencing multiple sources. 
- -## User Guide Document Structure - -The user guide must have the following H1 title and H2 sections: - -```text -# User Guide - -## Table of Contents -## Getting Started -## Features -## {Project-Type-Specific Sections} -## Configuration -## Troubleshooting -## FAQ -``` - -### Project-Type-Specific Sections - -**Web Applications (Rails, Django, Express, Laravel, etc.):** - -```text -## Navigation -## Pages -## Step-by-Step Guides -## Workflows -``` - -**CLI Tools:** - -```text -## Commands -## Options & Flags -## Examples -``` - -**REST APIs:** - -```text -## Authentication -## Endpoints -## Request/Response Examples -## Error Handling -``` - -**Libraries/SDKs:** - -```text -## Installation -## Quick Start -## API Reference -## Examples -``` - -**Mobile Applications:** - -```text -## Screens -## Navigation -## Features -## Offline Mode -``` - -**Desktop Applications:** - -```text -## Windows & Views -## Menus & Toolbars -## Keyboard Shortcuts -## Features -``` - -## MCP Tools with Fallbacks - -This skill uses MCP tools when available and falls back gracefully if they are unavailable or return errors. - -### GitHub Access - -**Prefer MCP tools** (`mcp__github__*`) when available. If MCP tools are not available (tool not found errors), **fall back to the `gh` CLI**. - -| Operation | MCP Tool | CLI Fallback | -| --- | --- | --- | -| Get repo metadata | `mcp__github__get_file_contents` (path: `/`) for top-level structure; for richer metadata use the CLI fallback | `gh repo view --json owner,name,description` | -| Get file contents | `mcp__github__get_file_contents` | `cat ` | -| Get repo owner/name | Parse from `git remote get-url origin` | `gh repo view --json owner,name` | - -### Library Documentation (Context7) - -Use `mcp__context7__resolve-library-id` then `mcp__context7__query-docs` to look up current documentation for libraries and frameworks found in the project. 
If Context7 is unavailable or returns errors (quota exceeded, timeouts), **fall back to `WebSearch`** and then `mcp__fetch__fetch` to retrieve documentation from official sources. Do not let Context7 failures block the review. - -## Step 1: Gather Repository Information - -```bash -# Get organization and repository name (fallback if MCP tools unavailable) -gh repo view --json owner,name,description -``` - -```bash -# Check if docs/user-guide.md exists -ls -la docs/user-guide.md 2>/dev/null || echo "No docs/user-guide.md found" -``` - -```bash -# Check if docs directory exists -ls -la docs/ 2>/dev/null || echo "No docs directory found" -``` - -Store these values: - -- `organization`: The owner/organization name -- `repository`: The repository name -- `description`: Repository description -- `has_user_guide`: true/false - -## Step 2: Detect Product Type - -Analyze the repository to determine what type of product this is. - -### Web Application Detection - -**Ruby on Rails:** - -```bash -# Check for Rails -ls -la Gemfile config/routes.rb app/controllers/ app/views/ 2>/dev/null -cat Gemfile 2>/dev/null | grep -E "rails|gem 'rails'" -``` - -**Django:** - -```bash -# Check for Django -ls -la manage.py */urls.py */views.py */templates/ 2>/dev/null -cat requirements.txt pyproject.toml 2>/dev/null | grep -iE "django" -``` - -**Express/Node.js Web:** - -```bash -# Check for Express/web frameworks -cat package.json 2>/dev/null | jq -r '.dependencies | keys[]' | grep -iE "express|koa|fastify|hapi|nest" -ls -la views/ templates/ public/ src/pages/ src/routes/ 2>/dev/null -``` - -**Laravel:** - -```bash -# Check for Laravel -ls -la artisan routes/web.php app/Http/Controllers/ resources/views/ 2>/dev/null -``` - -**Next.js/React:** - -```bash -# Check for Next.js/React apps -ls -la pages/ app/ src/pages/ src/app/ components/ 2>/dev/null -cat package.json 2>/dev/null | jq -r '.dependencies | keys[]' | grep -iE "next|react|vue|angular|svelte" -``` - -### CLI Tool Detection - 
-```bash -# Check for CLI patterns -cat package.json 2>/dev/null | jq -r '.bin // empty' -grep -rl "argparse\|click\|typer\|commander\|yargs\|clap\|cobra" --include="*.py" --include="*.js" --include="*.ts" --include="*.rs" --include="*.go" . 2>/dev/null | - head -5 -ls -la cmd/ cli/ bin/ 2>/dev/null -``` - -### API Detection - -```bash -# Check for API patterns -grep -rl "@app.route\|@router\|@Controller\|@RestController\|@GetMapping\|@PostMapping\|router.get\|router.post" --include="*.py" --include="*.js" --include="*.ts" --include="*.java" --include="*.rb" . 2>/dev/null | - head -10 -ls -la api/ routes/ endpoints/ 2>/dev/null -cat package.json 2>/dev/null | jq -r '.dependencies | keys[]' | grep -iE "fastapi|flask|express|gin|echo|fiber" -``` - -### Library Detection - -```bash -# Check if it's a library (no app entry point, exports functions/classes) -cat package.json 2>/dev/null | jq -r '.main, .exports, .types' | grep -v null -cat setup.py pyproject.toml 2>/dev/null | grep -E "name|packages" -ls -la src/lib/ lib/ 2>/dev/null -``` - -### Mobile App Detection - -```bash -# Check for mobile frameworks -ls -la android/ ios/ lib/main.dart pubspec.yaml 2>/dev/null -cat package.json 2>/dev/null | jq -r '.dependencies | keys[]' | grep -iE "react-native|expo|ionic|capacitor" -``` - -### Classification - -Based on detection, classify as one of: - -- `web_app` - Web application with UI -- `cli_tool` - Command-line interface tool -- `api` - REST/GraphQL API service -- `library` - Reusable library/SDK -- `mobile_app` - Mobile application -- `desktop_app` - Desktop application -- `hybrid` - Multiple types (e.g., CLI + library) - -## Step 3: Deep Analysis - Web Applications - -### 3.1 Discover Routes/Pages - -**Rails:** - -```bash -# Get all routes -cat config/routes.rb -# Find controllers -ls -la app/controllers/*.rb -# Find views -find app/views -name "*.erb" -o -name "*.haml" -o -name "*.slim" 2>/dev/null | head -30 -``` - -**Django:** - -```bash -# Get URL patterns -cat 
*/urls.py -# Find views -find . -name "views.py" -not -path "*/venv/*" 2>/dev/null | xargs cat 2>/dev/null | head -100 -# Find templates -find . -name "*.html" -path "*/templates/*" 2>/dev/null | head -30 -``` - -**Express/Node:** - -```bash -# Find route definitions -grep -rn "router\.\(get\|post\|put\|delete\|patch\)\|app\.\(get\|post\|put\|delete\|patch\)" --include="*.js" --include="*.ts" . 2>/dev/null | - grep -v node_modules | head -30 -``` - -**Next.js/React:** - -```bash -# Find pages -find pages app src/pages src/app -name "*.tsx" -o -name "*.jsx" -o -name "*.js" 2>/dev/null | grep -v _app | - grep -v _document | head -30 -``` - -### 3.2 Discover Models/Data Structures - -**Rails:** - -```bash -# Get models -ls -la app/models/*.rb -# Get schema -cat db/schema.rb 2>/dev/null | head -100 -``` - -**Django:** - -```bash -# Get models -find . -name "models.py" -not -path "*/venv/*" 2>/dev/null | xargs cat 2>/dev/null | grep -E "class.*Model" | head -30 -``` - -### 3.3 Discover Forms - -**Rails:** - -```bash -# Find form templates -grep -rl "form_for\|form_with\|form_tag" app/views/ 2>/dev/null | head -20 -``` - -**Django:** - -```bash -# Find forms -find . -name "forms.py" -not -path "*/venv/*" 2>/dev/null | xargs cat 2>/dev/null | head -50 -``` - -**React/Vue:** - -```bash -# Find form components -grep -rl "/dev/null | - grep -v node_modules | head -20 -``` - -### 3.4 Discover Help Text and UI Guidance - -Extract help text, tooltips, placeholders, and section descriptions that guide users when filling forms. This content should be reused in the user guide. 
- -**Rails:** - -```bash -# Find help_text parameters in form views -grep -rn "help_text:" app/views/ app/helpers/ 2>/dev/null | head -30 -# Find form section descriptions and list_items -grep -rn "form_section\|description:\|list_items:" app/views/ app/helpers/ 2>/dev/null | head -30 -# Find placeholder text -grep -rn "placeholder:" app/views/ 2>/dev/null | head -20 -``` - -**Django:** - -```bash -# Find help_text in forms and models -grep -rn "help_text=" --include="*.py" . 2>/dev/null | grep -v venv | grep -v migrations | head -30 -# Find widget placeholders -grep -rn "placeholder" --include="*.py" --include="*.html" . 2>/dev/null | grep -v venv | head -20 -``` - -**React/Vue/Angular:** - -```bash -# Find help/guidance props in components -grep -rn "helperText\|tooltip\|placeholder\|description\|hint\|aria-describedby" --include="*.tsx" --include="*.jsx" --include="*.vue" . 2>/dev/null | - grep -v node_modules | head -30 -``` - -**Generic (any framework):** - -```bash -# Find common help text patterns in templates -grep -rn "help-text\|hint\|tooltip\|aria-describedby\|\.help\b\|\.hint\b" --include="*.html" --include="*.erb" --include="*.hbs" --include="*.pug" . 2>/dev/null | head -20 -``` - -### 3.5 Discover Conditional Field Visibility - -Find fields that show or hide based on user selections. This is critical for guiding users through forms. 
- -**Rails (inline JS / Stimulus):** - -```bash -# Find CSS wrapper classes used for toggling visibility (e.g., hide-*, show-*) -grep -rn "wrapper_class.*hide-\|wrapper_class.*show-" app/views/ 2>/dev/null | head -20 -# Find JavaScript toggle functions in form views -grep -rn "style.display\|\.hidden\|\.visible\|toggle\|addEventListener.*change" app/views/ app/javascript/ 2>/dev/null | head -30 -# Find Stimulus show/hide actions -grep -rn "data-action.*show\|data-action.*hide\|data-action.*toggle\|data-.*target" app/views/ 2>/dev/null | head -20 -``` - -**React/Vue/Angular:** - -```bash -# Find conditional rendering patterns -grep -rn "v-if=\|v-show=\|x-show=\|x-if=\|{.*&&.*<\|? <\|condition\|isVisible\|showField\|hidden={" --include="*.tsx" --include="*.jsx" --include="*.vue" . 2>/dev/null | - grep -v node_modules | head -30 -``` - -**Django / Generic JS:** - -```bash -# Find JavaScript show/hide in templates -grep -rn "\.show()\|\.hide()\|\.toggle()\|display.*none\|display.*block\|classList.*hidden" --include="*.html" --include="*.js" . 2>/dev/null | - grep -v node_modules | grep -v venv | head -30 -``` - -**Map the dependency chain:** For each conditional field found, record: - -1. **Trigger field** — which field the user interacts with (e.g., a checkbox or dropdown) -2. **Trigger condition** — what value or state triggers the change (e.g., "checked", "value is High") -3. **Affected fields** — which fields appear or disappear -4. **Visibility logic** — show when condition is true, or hide when condition is true (inverse logic) - -### 3.6 Discover Form Section Structure - -Find how forms group fields into logical sections with headers and descriptions. 
- -**Rails:** - -```bash -# Find form_section helpers or fieldset groupings -grep -rn "form_section\|/dev/null | head -20 -# Extract section titles and descriptions -grep -A3 "form_section" app/views/ 2>/dev/null | head -40 -``` - -**React/Vue:** - -```bash -# Find section/fieldset components -grep -rn "/dev/null | - grep -v node_modules | head -20 -``` - -**Django:** - -```bash -# Find fieldset definitions in admin or forms -grep -rn "fieldsets\|/dev/null | grep -v venv | head -20 -``` - -### 3.7 Extract UI Components - -```bash -# Find component files -find . -name "*.tsx" -o -name "*.jsx" -o -name "*.vue" -o -name "*.svelte" 2>/dev/null | grep -v node_modules | - grep -iE "component|page|view|screen" | head -30 -``` - -### 3.8 Discover Navigation - -```bash -# Find navigation components -grep -rl "nav\|menu\|sidebar\|header\|footer\|navbar\|drawer" --include="*.tsx" --include="*.jsx" --include="*.vue" --include="*.erb" --include="*.html" . 2>/dev/null | - grep -v node_modules | head -15 -``` - -## Step 4: Deep Analysis - CLI Tools - -### 4.1 Discover Commands - -**Python (Click/Typer/Argparse):** - -```bash -# Find CLI entry points -grep -rn "@click.command\|@click.group\|@app.command\|add_parser\|add_subparser" --include="*.py" . 2>/dev/null | - grep -v venv | head -30 -``` - -**Node.js (Commander/Yargs):** - -```bash -# Find command definitions -grep -rn "\.command(\|\.option(\|yargs\." --include="*.js" --include="*.ts" . 2>/dev/null | grep -v node_modules | - head -30 -``` - -**Go (Cobra):** - -```bash -# Find cobra commands -grep -rn "cobra.Command\|cmd.AddCommand" --include="*.go" . 2>/dev/null | head -30 -``` - -**Rust (Clap):** - -```bash -# Find clap definitions -grep -rn "#\[command\|#\[arg\|Command::new" --include="*.rs" . 
2>/dev/null | head -30 -``` - -### 4.2 Extract Command Help - -```bash -# Try to get help output -./bin/* --help 2>/dev/null || npm run --help 2>/dev/null || python -m * --help 2>/dev/null || echo "Cannot extract help" -``` - -### 4.3 Discover Options/Flags - -```bash -# Find option definitions -grep -rn "\.option\|\.flag\|add_argument\|--\w" --include="*.py" --include="*.js" --include="*.ts" --include="*.go" --include="*.rs" . 2>/dev/null | - grep -v node_modules | grep -v venv | head -40 -``` - -## Step 5: Deep Analysis - APIs - -### 5.1 Discover Endpoints - -**FastAPI/Flask:** - -```bash -# Find API routes -grep -rn "@app\.\(get\|post\|put\|delete\|patch\)\|@router\.\(get\|post\|put\|delete\|patch\)" --include="*.py" . 2>/dev/null | - grep -v venv | head -40 -``` - -**Express:** - -```bash -# Find Express routes -grep -rn "router\.\(get\|post\|put\|delete\|patch\)\|app\.\(get\|post\|put\|delete\|patch\)" --include="*.js" --include="*.ts" . 2>/dev/null | - grep -v node_modules | head -40 -``` - -**Rails API:** - -```bash -# Find API controllers -ls -la app/controllers/api/ 2>/dev/null -cat config/routes.rb 2>/dev/null | grep -E "namespace :api|resources" -``` - -### 5.2 Extract Request/Response Schemas - -```bash -# Find schema definitions -grep -rl "Schema\|Serializer\|DTO\|interface.*Request\|interface.*Response\|type.*Request\|type.*Response" --include="*.py" --include="*.ts" --include="*.rb" . 2>/dev/null | - grep -v node_modules | grep -v venv | head -20 -``` - -### 5.3 Discover Authentication - -```bash -# Find auth patterns -grep -rn "auth\|jwt\|bearer\|api.key\|oauth\|token" --include="*.py" --include="*.js" --include="*.ts" --include="*.rb" . 2>/dev/null | - grep -v node_modules | grep -v venv | grep -v test | head -20 -``` - -### 5.4 Check for OpenAPI/Swagger - -```bash -# Find OpenAPI specs -ls -la openapi.yaml openapi.json swagger.yaml swagger.json api-spec.* 2>/dev/null -find . 
-name "openapi*" -o -name "swagger*" 2>/dev/null | head -5 -``` - -## Step 6: Deep Analysis - Libraries - -### 6.1 Discover Public API - -**Python:** - -```bash -# Find exports in __init__.py -cat */__init__.py src/*/__init__.py 2>/dev/null | grep -E "^from|^import|__all__" -# Find public classes/functions -grep -rn "^class \|^def \|^async def " --include="*.py" . 2>/dev/null | grep -v venv | grep -v test | grep -v "_" | - head -40 -``` - -**Node.js/TypeScript:** - -```bash -# Find exports -grep -rn "^export \|module\.exports" --include="*.ts" --include="*.js" . 2>/dev/null | grep -v node_modules | - grep -v test | head -40 -``` - -**Rust:** - -```bash -# Find public items -grep -rn "^pub fn\|^pub struct\|^pub enum\|^pub trait" --include="*.rs" . 2>/dev/null | head -40 -``` - -**Go:** - -```bash -# Find exported functions (capitalized) -grep -rn "^func [A-Z]\|^type [A-Z]" --include="*.go" . 2>/dev/null | grep -v vendor | head -40 -``` - -### 6.2 Extract Docstrings/Comments - -```bash -# Find documented functions -grep -B5 "^def \|^class \|^func \|^pub fn" --include="*.py" --include="*.go" --include="*.rs" . 2>/dev/null | - grep -E '"""|///|//|#' | head -30 -``` - -### 6.3 Find Examples - -```bash -# Find example files -find . -name "example*" -o -name "*example*" -o -name "demo*" 2>/dev/null | grep -v node_modules | head -20 -ls -la examples/ example/ demo/ 2>/dev/null -``` - -## Step 7: Deep Analysis - Mobile Apps - -### 7.1 Discover Screens (Flutter) - -```bash -# Find Flutter screens/pages -find lib -name "*screen*.dart" -o -name "*page*.dart" -o -name "*view*.dart" 2>/dev/null | head -30 -grep -rl "Scaffold\|MaterialApp\|CupertinoPageScaffold" --include="*.dart" lib/ 2>/dev/null | head -20 -``` - -### 7.2 Discover Screens (React Native) - -```bash -# Find React Native screens -find . 
-name "*Screen*" -o -name "*screen*" 2>/dev/null | grep -v node_modules | head -30 -grep -rl "createStackNavigator\|createBottomTabNavigator\|Screen" --include="*.tsx" --include="*.jsx" . 2>/dev/null | - grep -v node_modules | head -20 -``` - -### 7.3 Extract Navigation Structure - -```bash -# Find navigation definitions -grep -rn "Navigator\|createNavigator\|navigation\|router" --include="*.dart" --include="*.tsx" --include="*.jsx" . 2>/dev/null | - grep -v node_modules | head -30 -``` - -## Step 8: Validate Existing User Guide Document - -If `docs/user-guide.md` exists, validate its structure. - -### Check H1 Title - -```bash -head -5 docs/user-guide.md -grep "^# " docs/user-guide.md | head -1 -``` - -**Expected:** `# User Guide` - -Note: If the file was previously named `docs/manual.md`, rename it to `docs/user-guide.md`. - -### Check Required H2 Sections - -```bash -grep "^## " docs/user-guide.md -``` - -**Must include (in order):** - -```text -## Table of Contents -## Getting Started -## Features -{Project-type-specific sections} -## Configuration -## Troubleshooting -## FAQ -``` - -### Cross-Reference with Code (MANDATORY - do not skip) - -**You MUST create a two-column comparison table:** - -| Documented Feature | Code Evidence | -|---------------------------|---------------------------------------------| -| {feature from user guide} | {file:function where it exists, or MISSING} | - -For EACH documented feature/page/command: - -1. Verify it exists in the codebase - record the specific file and function -2. Check if the description matches the implementation - record mismatches -3. Flag outdated screenshots or examples - -For EACH feature found in code (from Steps 3-7): - -1. Check if it is documented in the user guide -2. Record undocumented features with their code location - -For web applications, ALSO verify: +1. Repository info gathered +2. Product type detected +3. Deep analysis (relevant sub-phases per product type) +4. 
Existing user-guide validated (if present) +5. Report generated +6. User guide written/updated (if approved) -1. **Step-by-step guide format** - Forms are documented as guided walkthroughs, not field tables -2. **Help text accuracy** - UI help text in the user guide matches the actual help_text in form views -3. **Conditional visibility** - All fields with conditional show/hide behavior are documented with their trigger conditions +**Evidence rule:** Every check must record (a) what the guide claims, (b) what the code shows, (c) MATCH / MISMATCH / MISSING. A bare "PASS" without code evidence is invalid. -## Step 9: Generate Comprehensive Report - -**MANDATORY PRE-REPORT VERIFICATION:** - -Before generating the report, you MUST: - -1. Review your checkpoint log from the start of analysis -2. Verify ALL applicable checkpoints have actual values (not "pending") -3. If ANY checkpoint is still pending, STOP and complete that step first -4. Cross-reference findings: features found in code analysis MUST appear in the report - -**If you skipped any step, the review is incomplete and results will be inconsistent.** +## User Guide Document Structure -### Report Format +Required H1 + H2 sections: ```text -## User Guide Review Report - -### Analysis Checkpoint Log - -{Include your completed checkpoint log here - ALL values must be filled in, none should say "pending"} - -### Repository Info -- **Organization:** {org} -- **Repository:** {repo} -- **Product Type:** {web_app/cli/api/library/mobile/desktop} -- **Document Status:** {exists/missing} - -### Structure Checks -- [ ] H1 title "# User Guide": {PASS/FAIL} -- [ ] Required H2 sections present: {PASS/FAIL} -- [ ] Table of Contents links valid: {PASS/FAIL} - -### Content Accuracy Checks -- [ ] Getting Started matches actual setup: {PASS/FAIL} -- [ ] All features documented: {PASS/FAIL} -- [ ] {Project-specific checks} - -### Guide Style Checks (Web Applications) -- [ ] Forms documented as guided walkthroughs (not field 
tables): {PASS/FAIL} -- [ ] UI help text incorporated from form views: {PASS/FAIL} -- [ ] Conditional field visibility documented: {PASS/FAIL} -- [ ] Form section structure matches UI layout: {PASS/FAIL} - -### Coverage Analysis -- **Documented features:** {n} -- **Features in code:** {n} -- **Undocumented features:** {list} -- **Documented but removed:** {list} -- **Conditional fields documented:** {n} of {total} -- **Help texts incorporated:** {n} of {total} - -### Proposed Changes -{Show exact changes needed} -``` - -## Step 10: Create or Update User Guide Document - -### Web Application Template - -```markdown # User Guide ## Table of Contents - -- [Getting Started](#getting-started) -- [Features](#features) -- [Navigation](#navigation) -- [Pages](#pages) -- [Step-by-Step Guides](#step-by-step-guides) -- [Workflows](#workflows) -- [Configuration](#configuration) -- [Troubleshooting](#troubleshooting) -- [FAQ](#faq) - ## Getting Started - -### Prerequisites - -{Based on detected requirements} - -### Accessing the Application - -{URL or deployment info} - -### First-Time Setup - -1. {Step based on onboarding flow} -2. {Step} -3. 
{Step} - -### Logging In - -{Based on discovered auth mechanism} - ## Features - -### Feature Overview - -| Feature | Description | Location | -|---------|-------------|----------| - -{For each discovered feature:} | {name} | {description} | {page/section} | - -### Key Capabilities - -{Based on code analysis} - -## Navigation - -### Main Menu - -{Based on discovered navigation components} - -| Menu Item | Description | Shortcut | -|-----------|-------------|----------| -| {item} | {description} | {if any} | - -### Sidebar/Secondary Navigation - -{If applicable} - -## Pages - -{For each discovered page/view:} - -### {Page Name} - -**URL:** `{route}` - -**Purpose:** {Inferred from controller/view} - -{Describe what the user sees when they first land on this page — the main content area, key data displayed, and the overall layout.} - -#### What You Can Do Here - -{For each action, describe it from the user's perspective:} - -- **{Action}** — {Description of what happens when the user performs this action, where it leads, and any prerequisites} - -{If the page has a status workflow or lifecycle, describe it narratively:} - -#### {Entity} Status Workflow - -{Entity names} progress through the following statuses: - -1. **{Status}** — {What this status means and who can set it} -2. **{Status}** — {Description} - -## Step-by-Step Guides - -{IMPORTANT: Write this section as a guided walkthrough, NOT as field tables. Use the help text discovered from the UI code (Step 3.4) and the conditional visibility logic (Step 3.5) to write instructions that feel like a knowledgeable colleague walking the user through each form.} - -{For each discovered form:} - -### {Action Verb + Entity} (e.g., "Creating a New Service", "Filing an Incident Report") - -Navigate to **{page name}** in the sidebar and click **{action button label}**. 
- -{For each form section discovered via form_section, fieldset, or logical grouping:} - -#### {Section Title} - -{Include the section description/guidance text from the UI code. If the UI has list_items explaining enum options, reproduce them here as a bulleted guide.} - -{For each field in the section, use numbered steps:} - -1. **{Field Label}** — {Use the actual help_text from the UI code if available. Otherwise write a clear plain-language explanation of what to enter and why. For checkboxes, explain what checking it means.} - {If the field is a select/dropdown with enum values, explain the options:} - - **{Option 1}:** {Description — use list_items text from UI if available} - - **{Option 2}:** {Description} - -{When a field triggers conditional visibility, document it inline:} - -2. **{Trigger Field Label}** — {Help text / explanation of the field.} - - > When you {check this box / select "Value"}, the following fields appear: - - - **{Conditional Field 1}** — {Help text / explanation} - - **{Conditional Field 2}** — {Help text / explanation} - -{When a field triggers inverse visibility (hides other fields):} - -3. **{Trigger Field Label}** — {Help text / explanation.} - - > When you {check this box / select "Value"}, the {section name} fields below are no longer needed and will be hidden. - -{End of form sections} - -#### Saving and Next Steps - -{What happens when the user clicks Save — where they are redirected, what status the record starts in, and what they should do next.} - -## Workflows - -{For each major user workflow:} - -### {Workflow Name} - -**Goal:** {what user accomplishes} - -**Steps:** - -1. **{Step 1}** - - Navigate to {page} - - {Action} - -2. **{Step 2}** - - {Action} - - Expected result: {result} - -3. 
**{Step 3}** - - {Action} - - Completion: {final state} - +## {Project-Type-Specific Sections} ## Configuration - -### User Settings - -{Based on discovered settings pages/models} - -| Setting | Description | Default | -|---------|-------------|---------| -| {setting} | {description} | {default} | - -### Environment Variables - -{If user-configurable} - -| Variable | Description | Required | -|----------|-------------|----------| -| {var} | {description} | {yes/no} | - ## Troubleshooting - -### Common Issues - -{Based on error handling in code} - -#### {Issue 1} - -**Symptom:** {what user sees} - -**Cause:** {why it happens} - -**Solution:** {how to fix} - -#### {Issue 2} - -{Repeat} - -### Error Messages - -| Error | Meaning | Solution | -|-------|---------|----------| - -{From discovered error messages:} | {error} | {meaning} | {fix} | - ## FAQ - -### General Questions - -**Q: {Question based on common patterns}** - -A: {Answer} - -**Q: {Question}** - -A: {Answer} - -### Technical Questions - -**Q: {Question}** - -A: {Answer} ``` -### CLI Tool Template - -```markdown -# User Guide - -## Table of Contents - -- [Getting Started](#getting-started) -- [Features](#features) -- [Commands](#commands) -- [Options & Flags](#options--flags) -- [Examples](#examples) -- [Configuration](#configuration) -- [Troubleshooting](#troubleshooting) -- [FAQ](#faq) - -## Getting Started - -### Installation - -{Based on detected package manager} - -\`\`\`bash {install command} \`\`\` - -### Quick Start - -\`\`\`bash {simplest usage example} \`\`\` - -### Verifying Installation - -\`\`\`bash {command} --version \`\`\` - -## Features - -{Overview of what the CLI can do} - -## Commands - -{For each discovered command:} - -### `{command name}` - -{Description from docstring or inferred} - -**Usage:** - -\`\`\`bash {command} [options] -\`\`\` - -**Arguments:** - -| Argument | Required | Description | -|----------|----------|-------------| -| {arg} | {yes/no} | {description} | - -**Example:** 
- -\`\`\`bash {command} {example args} \`\`\` - -## Options & Flags - -### Global Options - -| Option | Short | Description | Default | -|--------|-------|-------------|---------| - -{For each global option:} | `--{option}` | `-{short}` | {description} | {default} | - -### Command-Specific Options - -{Grouped by command} - -## Examples - -### Basic Usage - -\`\`\`bash - -# {Description of what this does} - -{command} \`\`\` - -### Common Workflows - -#### {Workflow 1} - -\`\`\`bash - -# Step 1: {description} - -{command} - -# Step 2: {description} - -{command} \`\`\` - -### Advanced Usage - -\`\`\`bash - -# {Advanced example} - -{command with complex options} \`\`\` - -## Configuration - -### Configuration File - -**Location:** `{config file path}` - -**Format:** {JSON/YAML/TOML} - -\`\`\`{format} {example config} \`\`\` - -### Configuration Options - -| Option | Type | Description | Default | -|--------|------|-------------|---------| -| {option} | {type} | {description} | {default} | - -### Environment Variables - -| Variable | Description | Default | -|----------|-------------|---------| -| {var} | {description} | {default} | - -## Troubleshooting - -### Common Errors - -#### {Error message} - -**Cause:** {why it happens} - -**Solution:** - -\`\`\`bash {fix command} \`\`\` - -### Debug Mode - -\`\`\`bash {command} --verbose +### Project-Type-Specific Sections -# or +| Type | Required H2 sections | +| ---- | -------------------- | +| Web App | Navigation, Pages, Step-by-Step Guides, Workflows | +| CLI Tool | Commands, Options & Flags, Examples | +| REST API | Authentication, Endpoints, Request/Response Examples, Error Handling | +| Library / SDK | Installation, Quick Start, API Reference, Examples | +| Mobile App | Screens, Navigation, Features, Offline Mode | +| Desktop App | Windows & Views, Menus & Toolbars, Keyboard Shortcuts, Features | -{command} --debug \`\`\` +## MCP Tools with Fallbacks -## FAQ +Prefer MCP tools when available; fall back to CLI on 
errors. Don't let MCP failures block the review. -**Q: How do I {common task}?** +| Operation | Preferred | Fallback | +| --- | --- | --- | +| Get file contents | `mcp__github__get_file_contents` | `cat ` | +| Repo metadata | `gh repo view --json owner,name,description` | n/a | +| Library docs | `mcp__context7__*` | `WebSearch` → `mcp__fetch__fetch` | -A: {Answer with example} +## Phase 1: Repository Info -\`\`\`bash {example} \`\`\` +```bash +gh repo view --json owner,name,description +ls -la docs/user-guide.md docs/manual.md 2>/dev/null ``` -### API Template - -```markdown -# User Guide - -## Table of Contents - -- [Getting Started](#getting-started) -- [Features](#features) -- [Authentication](#authentication) -- [Endpoints](#endpoints) -- [Request/Response Examples](#requestresponse-examples) -- [Error Handling](#error-handling) -- [Configuration](#configuration) -- [Troubleshooting](#troubleshooting) -- [FAQ](#faq) - -## Getting Started - -### Base URL - -\`\`\`text {base URL or how to determine it} \`\`\` - -### Quick Start - -\`\`\`bash - -# Get your API key - -# Then make your first request: - -curl -X GET "{base_url}/endpoint" \\ -H "Authorization: Bearer YOUR_API_KEY" -\`\`\` - -## Features - -{Overview of API capabilities} - -## Authentication - -### Authentication Method +Record: organization, repository, description, has_user_guide. If `docs/manual.md` exists but `docs/user-guide.md` does not, plan to rename. -{Based on discovered auth: API Key / OAuth / JWT / etc.} +## Phase 2: Detect Product Type -### Getting Credentials +Classify as one of: `web_app`, `cli_tool`, `api`, `library`, `mobile_app`, `desktop_app`, `hybrid`. -1. {Step to get credentials} -2. 
{Step} +**Detection signals (use the appropriate one — don't run all):** -### Using Credentials +- **Rails** — `Gemfile` containing `rails`; `config/routes.rb`; `app/controllers/`, `app/views/` +- **Django** — `manage.py`; `*/urls.py`; `requirements.txt`/`pyproject.toml` containing `django` +- **Express / Next.js / React** — `package.json` with `express`/`koa`/`fastify`/`nest`/`next`/`react`/`vue`/`angular`/`svelte`; `pages/`, `app/`, `src/pages/`, `src/app/` +- **Laravel** — `artisan`, `routes/web.php`, `app/Http/Controllers/` +- **CLI tool** — `package.json` with `bin`; usage of `argparse`/`click`/`typer`/`commander`/`yargs`/`clap`/`cobra`; `cmd/` or `cli/` directory +- **REST API** — route decorators (`@app.route`, `@router`, `@RestController`, `router.get`); `api/` or `routes/`; FastAPI/Flask/Express/Gin/Echo/Fiber +- **Library / SDK** — `package.json` `main`/`exports`; `setup.py` / `pyproject.toml` with `name`/`packages`; no app entry point +- **Mobile** — `android/`, `ios/`, `pubspec.yaml`, `lib/main.dart`; `package.json` with `react-native`/`expo`/`ionic`/`capacitor` +- **Hybrid** — multiple types match (e.g. CLI exposing both a binary and a library) -**Header Authentication:** +## Phase 3: Deep Analysis -\`\`\`bash curl -H "Authorization: Bearer {token}" {url} \`\`\` +Run only the sub-phases that match the detected product type. For each, record exact counts and file:line locations as evidence. -**Query Parameter:** +### 3.1 Web Applications -\`\`\`bash curl "{url}?api_key={key}" -\`\`\` +**Routes / pages** — Rails: `cat config/routes.rb`, list `app/controllers/`. Django: `cat */urls.py`, find `views.py`. Express: `grep -rn "router\.\(get\|post\|put\|delete\|patch\)" --include="*.js" --include="*.ts"`. Next.js / React: list `pages/`, `app/`, `src/pages/`, `src/app/` for `*.tsx`/`*.jsx`. -## Endpoints +**Models / data** — Rails: `app/models/`, `db/schema.rb`. Django: `find . -name "models.py"`. Generic: any `Schema`/`type Entity` declarations. 
-{For each discovered endpoint:} +**Forms** — Rails: `grep -rl "form_for\|form_with\|form_tag" app/views/`. Django: `find . -name "forms.py"`. React/Vue: `grep -rl "`, ``, ``, ``, `fieldsets` (Django admin). Record section title and description text. -| Name | In | Type | Required | Description | -|------|----|----- |----------|-------------| -| {param} | {path/query/body} | {type} | {yes/no} | {description} | +**UI components, navigation** — component files; `nav`, `menu`, `sidebar`, `header`, `footer`, `navbar`, `drawer` patterns. -**Request Body:** +### 3.2 CLI Tools -\`\`\`json {example request body} \`\`\` +**Commands** — Python (Click/Typer/argparse): `@click.command`, `@app.command`, `add_parser`, `add_subparser`. Node (Commander/Yargs): `.command(`, `.option(`, `yargs.`. Go (Cobra): `cobra.Command`, `cmd.AddCommand`. Rust (Clap): `#[command`, `Command::new`. -**Response:** +**Help output** — Try `./bin/* --help` if executable. Otherwise extract from source. -\`\`\`json {example response} \`\`\` +**Options / flags** — `add_argument`, `.option(`, `.flag(`, `--` patterns. -**Status Codes:** +### 3.3 REST APIs -| Code | Description | -|------|-------------| -| 200 | Success | -| 400 | Bad Request | -| 401 | Unauthorized | -| 404 | Not Found | +**Endpoints** — FastAPI/Flask: `@app.`, `@router.`. Express: `router.`, `app.`. Rails API: `app/controllers/api/`, namespace blocks in `config/routes.rb`. -## Request/Response Examples +**Schemas** — `Schema`, `Serializer`, `DTO`, `interface .*Request`, `type .*Response`. -### {Use Case 1} +**Authentication** — search for `auth`, `jwt`, `bearer`, `api.key`, `oauth`, `token`. Identify the actual mechanism used. -**Request:** +**OpenAPI / Swagger** — `openapi.yaml`, `openapi.json`, `swagger.*`, `api-spec.*`. 
-\`\`\`bash curl -X {METHOD} "{base_url}{path}" \\ -H "Authorization: Bearer {token}" \\ -H "Content-Type: application/json" \\ -d '{request body}' \`\`\` +### 3.4 Libraries / SDKs -**Response:** +**Public API** — Python: `__all__`, `__init__.py` exports, top-level `class`/`def`. Node/TS: `export` statements. Rust: `pub fn`/`pub struct`/`pub enum`/`pub trait`. Go: capitalized `func`/`type`. -\`\`\`json {response} \`\`\` +**Docstrings** — comments preceding public items (`"""…"""`, `///`, `//`). -## Error Handling +**Examples** — `examples/`, `example/`, `demo/`, files matching `*example*` / `*demo*`. -### Error Response Format +### 3.5 Mobile Apps -\`\`\`json { -"error": { -"code": "{error_code}", -"message": "{human readable message}" -} } \`\`\` +**Screens** — Flutter: `lib/**/*screen*.dart`, `*page*.dart`, files using `Scaffold`, `MaterialApp`. React Native: `*Screen*` files, files using `createStackNavigator`, `createBottomTabNavigator`. -### Error Codes +**Navigation** — `Navigator`, `createNavigator`, route definitions. -| Code | HTTP Status | Description | Resolution | -|------|-------------|-------------|------------| +## Phase 4: Validate Existing User Guide -{For each error code:} | {code} | {status} | {description} | {how to fix} | +If `docs/user-guide.md` exists: -## Configuration +1. **H1 title** must be exactly `# User Guide` (not `# Manual`). +2. **Required H2 sections** (per product type) present in correct order. +3. **TOC links** resolve to actual sections. +4. **Cross-reference table (MANDATORY):** -### Rate Limiting + | Documented Feature | Code Evidence | + | ------------------ | ------------- | + | {claim from guide} | {file:function, or MISSING} | -{Based on discovered rate limiting} + For each documented feature: confirm it exists, confirm description matches implementation, flag outdated examples. + For each feature found in Phase 3: confirm it's documented, otherwise flag undocumented. -### Pagination +5. 
**Web-app specific** (if applicable): + - Forms documented as **guided walkthroughs**, not field tables. + - **Help text** in the guide matches the actual `help_text:` / `helperText` in form views. + - **Conditional visibility** is explicitly documented (which fields appear/disappear and when). + - **Form section structure** in the guide matches the UI layout. -{Based on discovered pagination patterns} +If `docs/manual.md` exists, plan to rename to `docs/user-guide.md` and replace the H1 title. -## Troubleshooting +## Phase 5: Generate Report -### Common Issues +Pre-report verification: every applicable phase task is marked complete; cross-reference evidence is recorded; counts are exact (not "some" / "a few"). -{Based on error handling code} +```text +## User Guide Review Report -## FAQ +### Repository Info +- Organization: {org} +- Repository: {repo} +- Product Type: {type} +- Document Status: {exists / missing} -**Q: What is the rate limit?** +### Structure Checks +- H1 title `# User Guide`: {PASS / FAIL} +- Required H2 sections: {PASS / FAIL — list missing} +- TOC links valid: {PASS / FAIL} + +### Content Accuracy +- Getting Started matches actual setup: {PASS / FAIL} +- All features documented: {PASS / FAIL} +- {Product-type-specific checks} + +### Web App Style Checks (if applicable) +- Forms as guided walkthroughs: {PASS / FAIL} +- Help text reused from UI code: {PASS / FAIL} +- Conditional visibility documented: {PASS / FAIL} +- Section structure matches UI: {PASS / FAIL} + +### Coverage +- Documented features: {n} +- Features in code: {n} +- Undocumented features: {list with file:line} +- Documented but removed: {list} +- Conditional fields documented: {n} of {total} +- Help texts incorporated: {n} of {total} -A: {answer} +### Proposed Changes +{Specific edits with file:line targets} +``` -**Q: How do I handle pagination?** +## Phase 6: Write or Update the Guide -A: {answer with example} -``` +After the report is approved, write `docs/user-guide.md` using 
the structure below. **Do not paste templates verbatim — fill them with content discovered in Phase 3.** -### Library Template +### Required structure (all product types) ```markdown # User Guide @@ -1324,137 +219,110 @@ A: {answer with example} - [Getting Started](#getting-started) - [Features](#features) -- [Installation](#installation) -- [Quick Start](#quick-start) -- [API Reference](#api-reference) -- [Examples](#examples) +- {Product-type-specific sections} - [Configuration](#configuration) - [Troubleshooting](#troubleshooting) - [FAQ](#faq) ## Getting Started -### Requirements - -{Based on detected requirements} - -### Installation +### Prerequisites / Requirements +{Detected from package files, README, deployment configs} -{Based on package manager} +### Installation / Access +{Install command, URL, or onboarding step} -\`\`\`bash {install command} \`\`\` +### First-Time Setup / Quick Start +{Minimal example or numbered steps} ## Features -{Overview of library capabilities} - -## Quick Start - -\`\`\`{language} {minimal working example} \`\`\` - -## API Reference - -{For each public class/function:} - -### `{name}` - -{Description from docstring} - -**Signature:** - -\`\`\`{language} {function/class signature} \`\`\` - -**Parameters:** - -| Name | Type | Required | Description | -|------|------|----------|-------------| -| {param} | {type} | {yes/no} | {description} | - -**Returns:** +### Feature Overview -{Return type and description} +| Feature | Description | Location | +| ------- | ----------- | -------- | +| {name} | {description} | {page/command/endpoint} | +``` -**Example:** +### Product-type-specific guidance -\`\`\`{language} {usage example} \`\`\` +**Web Apps — Pages section.** For each page: URL, purpose (inferred from controller/view), what the user sees on landing, then a "What You Can Do Here" subsection listing each action with what it does and prerequisites. If a page has a status workflow, document the lifecycle. 
-## Examples +**Web Apps — Step-by-Step Guides (CRITICAL formatting rule).** This section MUST be a guided walkthrough, NOT a field table. For each form: -### Basic Usage +- Open with: "Navigate to **{page}** in the sidebar and click **{button label}**." +- For each form section discovered in Phase 3: include the section's description text from the UI; if list_items explain enum options, reproduce them as a bulleted guide. +- For each field in the section, use **numbered steps**: -\`\`\`{language} {example} \`\`\` + 1. **{Field Label}** — {use the actual `help_text` from the UI verbatim if present; otherwise plain-language explanation}. + - For dropdowns/enums, list each option with its description (use `list_items` text from UI). +- For trigger fields with conditional visibility, document inline: -### {Use Case 1} + > When you check this box / select "{value}", the following fields appear: + > + > - **{Conditional Field}** — {help text} -\`\`\`{language} {example} \`\`\` +- For inverse visibility: -### {Use Case 2} + > When you check this box, the {section name} fields below are no longer needed and will be hidden. -\`\`\`{language} {example} \`\`\` +- Close with "Saving and Next Steps": where the user is redirected, the starting status, and what to do next. -## Configuration +**Web Apps — Workflows.** For each major end-to-end flow: goal, then numbered steps with `Navigate to`, `Action`, `Expected result`. -### Options +**CLI Tools — Commands.** For each command: description, usage line (`command [options] `), arguments table (Argument | Required | Description), one example. Then global Options & Flags table, then per-command flags. -| Option | Type | Default | Description | -|--------|------|---------|-------------| -| {option} | {type} | {default} | {description} | +**CLI Tools — Examples.** Basic Usage → Common Workflows (multi-step) → Advanced Usage (complex options). 
-## Troubleshooting +**REST APIs — Authentication.** Method (API Key / OAuth / JWT), how to obtain credentials, header example, query-parameter example. -### Common Issues +**REST APIs — Endpoints.** For each: METHOD, path, parameters table (Name | In | Type | Required | Description), example request body (JSON), example response (JSON), status codes table. -{Based on error patterns} +**REST APIs — Error Handling.** Error response format, error codes table (Code | HTTP Status | Description | Resolution). -## FAQ +**Libraries — Quick Start.** One minimal working example in the project's primary language. -**Q: {Common question}** +**Libraries — API Reference.** For each public class/function: description from docstring, signature in a fenced block, parameters table, return type, one usage example. -A: {Answer} -``` +**Mobile Apps — Screens.** For each screen: name, navigation entry point, primary actions, what the user sees, screen-specific gestures or shortcuts. -## Step 11: Run Linters +### Configuration, Troubleshooting, FAQ (all types) -After making changes to docs/user-guide.md, run the linters skill: +- **Configuration** — settings table (Setting | Description | Default), environment variables table. +- **Troubleshooting** — Common Issues (Symptom / Cause / Solution), Error Messages table. +- **FAQ** — questions grounded in real user scenarios from issue templates, support patterns, or Phase 3 findings. Don't invent generic Q&As. -```text -/co-dev:run-linters -``` +## Phase 7: Run Linters -Fix any linting errors before considering the task complete. +After writing/updating the guide, run `/co-dev:run-linters` and fix any errors before reporting completion. 
## Validation Checklist -Before completing, verify: +Before completing, confirm: -- [ ] H1 title is exactly `# User Guide` -- [ ] All required H2 sections present -- [ ] Table of Contents links work +- [ ] H1 is exactly `# User Guide` +- [ ] All required H2 sections present in correct order +- [ ] TOC links resolve - [ ] Getting Started actually gets users started -- [ ] All user-facing features are documented -- [ ] All documented features exist in code -- [ ] Examples are tested and working -- [ ] Configuration options match code -- [ ] Error messages and troubleshooting are accurate -- [ ] Step-by-step guides use walkthrough style (not field tables) -- [ ] Help text from the UI code is incorporated into the guides -- [ ] Conditional field visibility is documented (which fields appear/disappear and when) +- [ ] All user-facing features documented; nothing fabricated +- [ ] Examples tested against actual code +- [ ] Configuration matches code +- [ ] Step-by-Step Guides use walkthrough style (not field tables) +- [ ] UI help text incorporated verbatim +- [ ] Conditional field visibility documented with callout blocks ## Important Rules -1. **Write for end-users** - Not developers; assume no code knowledge -2. **Never fabricate features** - Only document what exists -3. **Include real examples** - Examples must work with actual code -4. **Keep language simple** - Avoid jargon; explain technical terms -5. **Show, don't tell** - Use screenshots, code examples, step-by-step guides -6. **Document the happy path first** - Then edge cases -7. **Cross-reference with code** - Every feature documented must exist -8. **Ask before modifying** - Show proposed changes and get approval -9. **Preserve existing content** - Only update, don't remove valid content -10. **Run linters after changes** - Always run `/co-dev:run-linters` -11. **Complete ALL steps** - Never skip analysis steps. Each step may reveal features not visible in other steps -12. 
**Output checkpoint log** - Include the completed checkpoint log in your final report to prove all steps were executed -13. **Never validate against world knowledge alone** - Do NOT use your training data to fact-check version numbers, release dates, or external claims. If uncertain about something, use web search to verify before flagging. Only validate things that can be cross-referenced against actual files in the repository or verified online. -14. **Write guided walkthroughs, not field tables** - For web application forms, NEVER output field tables (Field | Type | Required | Description). Instead, write numbered step-by-step instructions that walk the user through each form section, using the actual help text from the UI code. The tone should feel like a knowledgeable colleague guiding them through the screen. -15. **Document conditional field visibility** - When a checkbox, dropdown, or other field controls the visibility of other fields, explicitly document this behavior using callout blocks (e.g., "When you check this box, the following fields appear:"). Users need to know that form sections will change based on their selections. -16. **Reuse UI help text** - Extract and incorporate the actual help text, section descriptions, and list_items from the application's form views. This ensures the user guide matches what users see on screen. +1. **Write for end-users.** Assume no code knowledge. Explain technical terms. +2. **Never fabricate features.** Only document what exists in code. +3. **Cross-reference always.** Every documented feature must have a code-evidence pair. +4. **Examples must work** with actual code — test them. +5. **Document the happy path first**, then edge cases. +6. **Ask before modifying.** Show proposed changes and get approval. +7. **Preserve existing valid content** — only update or add, don't strip. +8. **Walkthrough style, not field tables** — for web app forms (this rule is non-negotiable). +9.
**Reuse UI help text verbatim** — extract from `help_text:`, `helperText`, `placeholder:`, `list_items:` in the actual code. +10. **Document conditional visibility** with explicit callouts ("When you check this box, the following fields appear"). +11. **Run linters after changes** — always run `/co-dev:run-linters`. +12. **Never validate against world knowledge alone.** Don't fact-check version numbers or external claims from training data — use web search or repo files. +13. **Complete all phases** — never skip one; each phase may reveal features that the others miss.
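
Reviewer note: the detection signals listed before Phase 3 in the rewritten skill can be read as a small set of file-existence and dependency checks. The sketch below is only an illustration of that idea, not part of the patch. The function name `detect_product_types`, the two dependency sets, and the choice to cover only a subset of the signals (Rails, Django, Laravel, Node web frameworks, CLI `bin`, mobile) are assumptions made for the example.

```python
import json
from pathlib import Path

# Dependency names taken from the skill's signal list (subset, for illustration).
WEB_DEPS = {"express", "koa", "fastify", "next", "react", "vue", "svelte"}
MOBILE_DEPS = {"react-native", "expo", "ionic", "capacitor"}


def detect_product_types(repo: Path) -> list[str]:
    """Return every product type whose detection signal is present in repo.

    Hypothetical helper: more than one entry corresponds to the
    "Hybrid" case described in the skill.
    """
    found: list[str] = []

    # Rails: a Gemfile mentioning rails, or a routes file.
    gemfile = repo / "Gemfile"
    has_rails_gem = gemfile.is_file() and "rails" in gemfile.read_text(errors="ignore")
    if has_rails_gem or (repo / "config" / "routes.rb").is_file():
        found.append("Rails")

    # Django and Laravel: marker scripts at the repository root.
    if (repo / "manage.py").is_file():
        found.append("Django")
    if (repo / "artisan").is_file():
        found.append("Laravel")

    # Node-based signals are read from package.json, if it parses.
    deps: dict = {}
    pkg: dict = {}
    pkg_file = repo / "package.json"
    if pkg_file.is_file():
        try:
            loaded = json.loads(pkg_file.read_text(errors="ignore"))
            pkg = loaded if isinstance(loaded, dict) else {}
        except json.JSONDecodeError:
            pkg = {}
        deps = {**(pkg.get("dependencies") or {}), **(pkg.get("devDependencies") or {})}

    if WEB_DEPS.intersection(deps):
        found.append("Express / Next.js / React")
    if "bin" in pkg:
        found.append("CLI tool")

    # Mobile: Flutter manifest, native folders, or a mobile dependency.
    has_mobile_files = (repo / "pubspec.yaml").is_file() or (repo / "android").is_dir()
    if has_mobile_files or MOBILE_DEPS.intersection(deps):
        found.append("Mobile")

    return found
```

Returning a list rather than a single label keeps the hybrid case visible to the caller, which can then run only the matching Phase 3 sub-phases.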