nightshift: tech-debt-classify analysis #28

## nightshift: Tech Debt Classification

**Repository:** Microck/traccia
**Task:** tech-debt-classify
**Category:** options
**Date:** 2026-04-23

---

### Summary

Classified tech debt across 15 source modules (~260KB source). Traccia is a Python 3.12+ local-first skill graph compiler using Pydantic v2, Typer CLI, and SQLite storage. The codebase is well-structured but has several areas of accumulated debt.

### Debt Classification

#### 🔴 HIGH — Model surface area (`src/traccia/models.py`, 296 lines)

**Issue:** `models.py` contains 20 enum types and 12 model classes in a single file. This is the most imported module in the codebase.

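**Risk:** Every enum or model change lands in the one module that the rest of the codebase imports, so even small edits have a wide blast radius. The file has no clear internal organization.

**Recommendation:** Split into `src/traccia/enums.py` (all 20 StrEnum types) and `src/traccia/models.py` (data models only). Re-export the enums from `models.py` (and from `__init__.py` if the package exposes them there) for backward compatibility.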

**Estimated effort:** 1-2 hours.

#### 🔴 HIGH — God module: `pipeline.py` (1251 lines, 51KB)

**Issue:** The pipeline module handles discovery, ingestion, extraction, canonicalization, scoring, rendering orchestration, and export — all in one file.

**Risk:** Hard to test individual stages in isolation. High cognitive load for any change.

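**Recommendation:** Convert `pipeline.py` into a package with one module per stage: `pipeline/discover.py`, `pipeline/extract.py`, `pipeline/canonicalize.py`, `pipeline/score.py`, `pipeline/render.py`, `pipeline/export.py`. Keep the orchestrator in `pipeline/__init__.py` so existing `traccia.pipeline` imports keep working; a `pipeline.py` file next to a `pipeline/` package would be shadowed by the package.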

**Estimated effort:** 4-8 hours.
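
To illustrate the split, here is a minimal sketch of an orchestrator that only composes stage functions. Everything in it is hypothetical: `PipelineContext`, the stage bodies, and `run_pipeline` are stand-ins, since the real pipeline's data structures are not shown in this issue. Each function represents what one stage module might export.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


# Hypothetical shared state handed from stage to stage.
@dataclass
class PipelineContext:
    sources: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


Stage = Callable[[PipelineContext], PipelineContext]


# Stand-in for what pipeline/discover.py might export.
def discover(ctx: PipelineContext) -> PipelineContext:
    ctx.sources = ["notes.md", "README.md"]  # placeholder data
    ctx.log.append("discover")
    return ctx


# Stand-in for pipeline/extract.py: derive one skill name per source.
def extract(ctx: PipelineContext) -> PipelineContext:
    ctx.skills = [name.rsplit(".", 1)[0].lower() for name in ctx.sources]
    ctx.log.append("extract")
    return ctx


# Stand-in for pipeline/canonicalize.py: deduplicate and sort.
def canonicalize(ctx: PipelineContext) -> PipelineContext:
    ctx.skills = sorted(set(ctx.skills))
    ctx.log.append("canonicalize")
    return ctx


# The orchestrator knows the order of stages and nothing else.
def run_pipeline(
    stages: list[Stage], ctx: Optional[PipelineContext] = None
) -> PipelineContext:
    if ctx is None:
        ctx = PipelineContext()
    for stage in stages:
        ctx = stage(ctx)
    return ctx


result = run_pipeline([discover, extract, canonicalize])
```

Because each stage is a plain function over the context, a test can call `canonicalize` directly with hand-built input instead of running discovery and extraction first.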

#### 🟡 MEDIUM — LLM backend has 19 raise statements with inconsistent error types

**Issue:** `src/traccia/llm.py` (538 lines) raises `BackendError`, `TimeoutError`, `subprocess.SubprocessError`, `json.JSONDecodeError`, and a private `_HttpResponseError`, with no common base class tying them together.

**Risk:** Without a hierarchy to match on, CLI error handling falls back to catching broad `Exception`.

**Recommendation:** Build a hierarchy under the existing `BackendError`: `BackendConnectionError`, `BackendResponseError`, `BackendAPIError`, `BackendConfigError`. Wrap the standard-library exceptions where they are raised so callers only need to catch `BackendError`.

**Estimated effort:** 3-4 hours.
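
A minimal sketch of how such a hierarchy could look. The four subclass names come from the recommendation above; `parse_completion` and `describe_failure` are hypothetical helpers added only to show wrapping and matching, and are not taken from `llm.py`.

```python
import json


class BackendError(Exception):
    """Base class for all LLM backend failures."""


class BackendConnectionError(BackendError):
    """The backend could not be reached (network failure, timeout)."""


class BackendResponseError(BackendError):
    """The backend replied, but the payload could not be used."""


class BackendAPIError(BackendError):
    """The backend replied with an explicit API-level error."""


class BackendConfigError(BackendError):
    """The backend is misconfigured (missing key, bad model name)."""


def parse_completion(raw: str) -> object:
    """Hypothetical helper: decode a response body, wrapping decode errors."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        # Chain with `from err` so the original traceback is preserved.
        raise BackendResponseError(f"backend returned invalid JSON: {err}") from err


def describe_failure(raw: str) -> str:
    """Stand-in for a CLI handler that matches on the hierarchy."""
    try:
        parse_completion(raw)
    except BackendResponseError as err:
        return f"response error: {type(err.__cause__).__name__}"
    except BackendError:
        return "backend error"
    return "ok"
```

With this shape the CLI can catch `BackendError` for a generic message and a specific subclass where it wants a tailored one, rather than catching `Exception`.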

#### 🟡 MEDIUM — `parsers.py` (700+ lines) handles 13 source types in one module

**Issue:** All parsers live in a single file with deeply nested if/elif chains.

**Risk:** Adding a new source type requires modifying the same large file. Test isolation is difficult.

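**Recommendation:** Use a registry pattern: each parser registers itself with a decorator such as `@register_parser(SourceType.MARKDOWN)`, and dispatch becomes a dictionary lookup instead of an if/elif chain.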

**Estimated effort:** 4-6 hours.
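
One way the registry could be sketched. The `SourceType` enum here is a three-member stand-in for the project's real enum (which is a StrEnum with 13 source types), and `parse_markdown`, `parse_json`, and `parse` are hypothetical; only the `@register_parser(...)` decorator name comes from the recommendation.

```python
import json
from enum import Enum
from typing import Callable


class SourceType(Enum):  # hypothetical stand-in for the project's enum
    MARKDOWN = "markdown"
    JSON = "json"
    CSV = "csv"  # deliberately left without a parser below


Parser = Callable[[str], dict]

# Maps each source type to the single function that parses it.
_PARSERS: dict[SourceType, Parser] = {}


def register_parser(source_type: SourceType) -> Callable[[Parser], Parser]:
    """Decorator factory: record the decorated function for source_type."""

    def decorator(func: Parser) -> Parser:
        if source_type in _PARSERS:
            raise ValueError(f"parser already registered for {source_type!r}")
        _PARSERS[source_type] = func
        return func

    return decorator


@register_parser(SourceType.MARKDOWN)
def parse_markdown(text: str) -> dict:
    # Toy parser: collect heading titles only.
    headings = [
        line.lstrip("# ").strip()
        for line in text.splitlines()
        if line.startswith("#")
    ]
    return {"headings": headings}


@register_parser(SourceType.JSON)
def parse_json(text: str) -> dict:
    return {"data": json.loads(text)}


def parse(source_type: SourceType, text: str) -> dict:
    """Dispatch by dictionary lookup instead of an if/elif chain."""
    try:
        parser = _PARSERS[source_type]
    except KeyError:
        raise ValueError(f"no parser registered for {source_type!r}") from None
    return parser(text)
```

Adding a source type then means adding one decorated function, ideally in its own module, without editing the dispatch code.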

#### 🟡 MEDIUM — Bare error propagation in `storage.py`

**Issue:** Many storage methods let the underlying exception propagate unchanged, with no context about which table, record, or operation failed.

**Recommendation:** Wrap errors with operation context: `raise StorageError(f"failed to upsert skill node {node.skill_id}: {err}") from err`

**Estimated effort:** 2-3 hours.
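
A small sketch of the wrapping pattern using the standard-library `sqlite3` module. The `skill_nodes` table, its columns, and the `upsert_skill_node` function are assumptions made for illustration; the actual schema and method names in `storage.py` are not shown in this issue.

```python
import sqlite3


class StorageError(Exception):
    """Raised when a storage operation fails; carries operation context."""


def upsert_skill_node(conn: sqlite3.Connection, skill_id: str, name: str) -> None:
    """Insert or replace one row in the hypothetical skill_nodes table."""
    try:
        conn.execute(
            "INSERT OR REPLACE INTO skill_nodes (skill_id, name) VALUES (?, ?)",
            (skill_id, name),
        )
    except sqlite3.Error as err:
        # Say which operation and record failed, and keep the cause chained.
        raise StorageError(
            f"failed to upsert skill node {skill_id}: {err}"
        ) from err


conn = sqlite3.connect(":memory:")

# The table has not been created yet, so this call fails inside sqlite3
# and surfaces as a StorageError that names the operation and the record.
try:
    upsert_skill_node(conn, "python", "Python")
except StorageError as err:
    message = str(err)
    cause = err.__cause__
```

The original `sqlite3` exception stays available through `__cause__`, so the added context costs nothing in debuggability.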

#### 🟢 LOW — Ruff config only enables F (Pyflakes) and I (isort) rules

**Issue:** `pyproject.toml` line 54: `select = ["F", "I"]`. This leaves out flake8-bugbear (B), flake8-comprehensions (C4), flake8-simplify (SIM), pyupgrade (UP), and other rule sets.

**Recommendation:** Enable `["F", "I", "B", "SIM", "C4", "UP"]` and fix violations incrementally.

**Estimated effort:** 1-2 hours.

#### 🟢 LOW — No type checking in CI

**Issue:** No mypy/pyright config. CI doesn't run type checking. With Pydantic v2 models, type coverage would catch schema drift early.

**Recommendation:** Add `pyright` or `mypy` to dev dependencies and CI.

**Estimated effort:** 2-3 hours.

### Debt Summary

| Priority | Count | Estimated Total Effort |
|----------|-------|----------------------|
| 🔴 HIGH | 2 | 5-10 hours |
| 🟡 MEDIUM | 3 | 9-13 hours |
| 🟢 LOW | 2 | 3-5 hours |
| **Total** | **7 items** | **17-28 hours** |

### Recommended Order

1. Split `models.py` enums (low risk, high impact, quick win)
2. Extend ruff rules (quick win)
3. Split `pipeline.py` into stage modules (highest value)
4. Improve error hierarchy in `llm.py`
5. Refactor `parsers.py` to registry pattern
6. Add type checking to CI
