Weekly tech debt audit: miso-gallery - 2026-07-01 #243

## Summary

**Overall Risk Level: Moderate**

This weekly audit covers the miso-gallery codebase (Flask app, server-rendered gallery), with a focus on code consistency, operational gaps, dependency hygiene, and incremental degradation since the last audit (2026-06-03). The app is generally healthy with a strong security posture, but the audit identified several areas of accumulated technical debt and operational gaps.

---

## Top Findings

### P0 — Critical

None identified. No active security vulnerabilities, data loss risks, or blocked workflows were found.

### P1 — High

1. **`iter_gallery_media()` / `iter_gallery_folders()` coexist with newer `iter_gallery_items()` — dead code ambiguity**  
   The newer `iter_gallery_items()` centralizes the bounded iterator pattern but `iter_gallery_media()` and `iter_gallery_folders()` are still present and used by several endpoints (`llm_images`, `llm_folders`, `llm_recent`, `llm_tags`, `add_tag`). These legacy functions duplicate the same bounded-scan logic with slightly different exclusion handling. Inconsistencies could lead to drift where one iterator excludes paths differently from another.  
   **Evidence:** `app.py` ~lines 165-210: three separate rglob-based iterators with near-identical loop bodies. The `is_excluded_gallery_path` check is applied in `iter_gallery_items()` but `iter_gallery_media()` uses inline excluded_dirs logic.

2. **`llm_tags` tag storage is a no-op (logs only)**  
   The `/api/llm/tags` endpoint and the web UI `/tag` route both accept tags but only log them — no backend storage exists. This was a UI-first partial fix for issue #91. Tags are a visible feature: the "Tag" button renders on every image card and the API accepts/validates them, but they disappear on reload. Users and LLM callers have no way to retrieve tags later.  
   **Evidence:** `app.py` `/tag` route (~line 590): `log_security_event("add_tag", "success", ...)` then `return {"status": "ok"}`, with no DB or file writes. `llm_tags` similarly only logs.

3. **Duplicate thumbnail cache removal across delete paths — logic divergence**  
   `remove_thumbnail_cache_for()` walks THUMBNAIL_CACHE_DIR with `iterdir()`, filtering by prefix. The `/delete/` route calls it, `/bulk-delete` calls it, and the LLM delete routes call it. Each call performs its own full walk of the cache directory, and the call pattern is repeated in 5+ places. A refactor into a single batch-purge function would reduce inconsistency risk.  
   **Evidence:** Appears in `delete` (~line 410), `bulk_delete` (~lines 455, 465), `llm_delete` (~line 841), `llm_bulk_delete` (~line 868), `llm_dedup` (~line 896).

4. **`RATE_LIMIT_ROUTE_LIMITS` JSON env var is fragile and untested**  
   The override mechanism in `security.py` (`_load_route_overrides()`) parses a JSON env var for custom rate limits. If the JSON is malformed, it silently falls back to defaults with only a warning log. There are no tests for this path. If an operator sets a bad value thinking it's active, the overrides silently disappear.  
   **Evidence:** `security.py` `_load_route_overrides()` — no unit tests cover this function in `test_security_edges.py` or any other test file.

5. **Release (`release.yaml`) and Publish Release (`publish-release.yml`) workflows build the image twice under different tags, and `:latest` is never updated**  
   The Release workflow (`release.yaml`) publishes `ghcr.io/misospace/miso-gallery` with SHA-based tags (via `docker/metadata-action` with `type=sha,prefix={{branch}}-`) during the merge step, but creates no semver tag. The Publish Release workflow (`publish-release.yml`) creates the semver tag and the GitHub release. Creating that tag triggers the Release workflow again, which rebuilds the multi-arch image. The image is therefore built twice: once during the PR merge build (as `main-<sha>`) and once for the release tag (as `:<semver>`). The `:latest` tag is never updated by any workflow.  
   **Evidence:** `.github/workflows/release.yaml` merges digests with `type=sha,prefix={{branch}}-` tags. `.github/workflows/publish-release.yml` tags the merge commit. `:latest` is never explicitly pushed.

### P2 — Medium

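6. **`file_sha256()` reads 1MB chunks via the `iter()` pattern; possible file-handle leak on error (needs verification)**  
   The concern is that the file handle may leak if an exception occurs mid-iteration. However, the cited line already opens the file in a `with` block, which closes the handle on exceptions as well as on normal completion. A leak is plausible only if the read loop is exposed as a generator that callers can abandon part-way, and even then the handle is released once the generator is closed or garbage-collected.  
   **Evidence:** `app.py` line ~224: `with path.open("rb") as handle:`. Confirm whether the surrounding function is a generator before adding `try/finally` or `contextlib.closing`.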

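7. **Fragile `grep`-based `APP_VERSION` extraction in release workflow**  
   The Release workflow extracts `APP_VERSION` with a `grep '^APP_VERSION = ' | head -1 | grep -oP 'or "\K[^"]+' | head -1` pipeline. It can pick the wrong line if another line starting with `APP_VERSION = ` (for example inside a docstring) precedes the real assignment, and it returns an empty value if the spacing or the `or "<default>"` form of the assignment changes. A more deterministic extraction (e.g., parsing `app.py` with Python) is safer.  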
   **Evidence:** `.github/workflows/release.yaml`: `VERSION=$(grep '^APP_VERSION = ' app.py | head -1 | grep -oP 'or "\K[^"]+' | head -1)`

8. **`dir_size()` and `_dir_size()` functions are duplicated with divergent symlink handling**  
   `trash.py` has both a module-level `dir_size()` and a private `_dir_size()`. The module-level one skips symlinks (security), while the private one does not skip symlinks. `move_to_trash()` calls `_dir_size()` (not `dir_size()`) for the post-move metadata, meaning the trash metadata size estimate may follow symlinks.  
   **Evidence:** `trash.py` lines ~16-26 (`dir_size` skips symlinks) vs lines ~135-141 (`_dir_size` does not skip symlinks).

9. **Python 3.14 in CI but 3.12 in dependency audit — inconsistency**  
   The lint and test workflows use `python-version: '3.14'`, while the dependency audit workflow uses `"3.12"`. This means dependency audit runs against a different Python version than the code actually runs on. Vulnerabilities specific to 3.14 (or deps compiled for 3.14) won't be caught.  
   **Evidence:** `.github/workflows/lint.yaml` (3.14), `.github/workflows/tests.yaml` (3.14), `.github/workflows/dependency-audit.yaml` (3.12).

10. **`bulk_delete` folder preflight silently skips paths that fail validation**  
    The `sanitize_path()` call inside the folder size estimation loop carries a `# sanitize_path() rejects paths containing ...` comment but no error handling: on failure it just calls `continue`, without logging.  
    **Evidence:** `app.py` ~lines 445-448: `if not sanitize_path(rel_path): # continue`.

11. **`health.py` defines routes via Blueprint but `app.py` registers them redundantly via `app.add_url_rule`**  
    `health.py` creates a `health_bp` Blueprint with routes like `/health/storage`. But `app.py` registers them again via `app.add_url_rule("/health", ...)` etc. The Blueprint routes in `health.py` are never actually registered on the app — only the `add_url_rule` registrations in `app.py` take effect. This is confusing and the Blueprint is dead code.  
    **Evidence:** `health.py`: `health_bp = Blueprint("health", __name__)` with `@health_bp.route("/health/storage")` — but `app.py` registers via `app.add_url_rule("/health/storage", ...)`.
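
For reference on finding 6, here is a minimal sketch of the chunked-hash pattern, assuming `file_sha256()` is a plain function that returns a hex digest (the real body is not reproduced in this audit, so the shape below is an assumption). In this form the `with` statement already closes the handle on both normal exit and exceptions.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB, matching the chunk size named in the finding


def file_sha256(path: Path) -> str:
    """Hash a file in fixed-size chunks.

    The `with` block closes the handle on normal exit and when an exception
    is raised inside the loop, so no extra try/finally is required here.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter(callable, sentinel) keeps calling handle.read() until it returns b"".
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the real function instead yields chunks to its caller, the handle stays open until the generator is exhausted, closed, or collected; that variant is the one worth checking for.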

### P3 — Low

12. **`APP_VERSION` default is `"0.1.18"` — should match semantic release**  
    The hardcoded default in `app.py` has not been bumped since the last release. While the env var override handles deployments, the source-of-truth default is stale. This creates confusion when reading the source.

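13. **`conftest.py` converts `ROOT` with `str()` before adding it to `sys.path` (no change needed)**  
    `ROOT` is a `pathlib.Path`, and `conftest.py` inserts `str(ROOT)` into `sys.path`. The conversion is not redundant: `sys.path` entries are expected to be strings, and the import system ignores entries of other types. Recorded here only to close out the observation.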

14. **Trash restore uses `shutil.copytree` for directories (non-atomic) instead of `rename`**  
    `restore_from_trash()` uses `shutil.copytree()` for directory restore while `move_to_trash()` uses `rename`. This means directory restores are not atomic and can leave partial state on failure.

15. **`docs/runbook.md` release section references `npm version` — miso-gallery has no `package.json`**  
    The runbook shows "npm version 0.1.x --no-git-tag-version" for bumping APP_VERSION, but the repo is Python-only with no npm.

---

## Evidence: Files and Observations

### Code Duplication — `iter_gallery_items` vs `iter_gallery_media` vs `iter_gallery_folders`
- `app.py` lines 165-210: three separate functions doing `DATA_FOLDER.rglob("*")` bounded by `GALLERY_SCAN_LIMIT`
- `iter_gallery_items()` was added later as a unified function but `iter_gallery_media()` and `iter_gallery_folders()` remain deployed
- Callers: `find_duplicate_media()` uses `iter_gallery_media()`; `iter_gallery_folders()` is called by `llm_folders`; `llm_images` uses `iter_gallery_media()`; `iter_gallery_items()` currently has no callers
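
One way the consolidation could look is sketched below, with the legacy name kept as a thin wrapper so callers can migrate one at a time. This is an illustration under assumptions, not the code in `app.py`: the extension set, the excluded directory names, the `kinds` parameter, and the body of `is_excluded_gallery_path()` are all hypothetical.

```python
from collections.abc import Iterator
from pathlib import Path

# Assumed values for illustration; the real constants live in app.py.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
EXCLUDED_DIRS = {".thumbnails", ".trash"}
ALL_KINDS = frozenset({"media", "folder"})


def is_excluded_gallery_path(path: Path, root: Path) -> bool:
    """Single exclusion rule shared by every caller (hypothetical body)."""
    return any(part in EXCLUDED_DIRS for part in path.relative_to(root).parts)


def iter_gallery_items(
    root: Path, scan_limit: int, kinds: frozenset[str] = ALL_KINDS
) -> Iterator[tuple[str, Path]]:
    """Yield ("media" | "folder", path) pairs, inspecting at most scan_limit entries."""
    for scanned, path in enumerate(root.rglob("*")):
        if scanned >= scan_limit:
            break
        if is_excluded_gallery_path(path, root):
            continue
        if path.is_dir():
            if "folder" in kinds:
                yield "folder", path
        elif "media" in kinds and path.suffix.lower() in MEDIA_EXTENSIONS:
            yield "media", path


def iter_gallery_media(root: Path, scan_limit: int) -> Iterator[Path]:
    """Thin wrapper so existing callers keep working during the migration."""
    for _kind, path in iter_gallery_items(root, scan_limit, frozenset({"media"})):
        yield path
```

Because every variant goes through the same exclusion check, the drift risk described in finding 1 disappears even before the wrappers are removed.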

### Tag storage — no-op
- `/tag` route (`app.py` ~line 590): logs event and returns `{"status": "ok"}`
- `/api/llm/tags` (`app.py` ~line 810): logs event and returns `{"status": "ok", "updated": [...], "tags": [...]}`
- The `updated` list in the API response is misleading — it shows paths that *would* be tagged but the tags are never persisted
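
To make the sidecar option concrete, here is a minimal sketch of JSON-sidecar persistence. Everything in it is an assumption for illustration: the `.tags.json` naming scheme, the helper names, and the choice to store a sorted list. It is not wired to the `/tag` or `/api/llm/tags` handlers.

```python
import json
import os
import tempfile
from pathlib import Path


def sidecar_path(image_path: Path) -> Path:
    """Hypothetical naming scheme: photo.jpg -> photo.jpg.tags.json."""
    return image_path.with_name(image_path.name + ".tags.json")


def load_tags(image_path: Path) -> list[str]:
    """Return the stored tags, or an empty list if no sidecar exists."""
    try:
        return json.loads(sidecar_path(image_path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return []


def add_tags(image_path: Path, new_tags: list[str]) -> list[str]:
    """Merge tags and write the sidecar via a temp file plus os.replace."""
    merged = sorted(set(load_tags(image_path)) | set(new_tags))
    target = sidecar_path(image_path)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(merged, tmp)
        os.replace(tmp_name, target)  # readers never see a half-written file
    except Exception:
        os.unlink(tmp_name)
        raise
    return merged
```

The read-merge-write sequence is not safe against two concurrent writers for the same image; a SQLite store would handle that case more naturally, at the cost of a schema and migration story.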

### Dead Blueprint in health.py
```python
# health.py creates this:
health_bp = Blueprint("health", __name__)

@health_bp.route("/health/storage")
def storage_health():  # handler name assumed from the add_url_rule call below
    ...  # body omitted in this excerpt

# ... but this Blueprint is never imported/registered in app.py.
# Instead app.py uses:
app.add_url_rule("/health/storage", "storage_health", storage_health, methods=["GET"])
```

### Duplicate thumbnail cleanup
Pattern repeated at `app.py` lines:
- 410: `remove_thumbnail_cache_for(rel_path)` in `delete()`
- 455, 465: in `bulk_delete()` (for files and folders)
- 841: in `llm_delete()`
- 868: in `llm_bulk_delete()`
- 896: in `llm_dedup()`
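
A batch variant could replace these per-path calls with one directory walk. The sketch below assumes cache files are named with a prefix derived from the relative path; `thumbnail_prefix()` and its hashing scheme are hypothetical, as is the explicit `cache_dir` parameter (added here only to keep the example self-contained).

```python
import hashlib
from pathlib import Path


def thumbnail_prefix(rel_path: str) -> str:
    """Assumed naming scheme: cache files start with a hash of the relative path."""
    return hashlib.sha256(rel_path.encode("utf-8")).hexdigest()[:16]


def batch_remove_thumbnails(cache_dir: Path, rel_paths: list[str]) -> int:
    """Delete cached thumbnails for many paths with a single directory walk."""
    prefixes = tuple(thumbnail_prefix(p) for p in rel_paths)
    if not prefixes or not cache_dir.is_dir():
        return 0
    removed = 0
    for entry in cache_dir.iterdir():
        # str.startswith accepts a tuple, so one check covers every prefix.
        if entry.is_file() and entry.name.startswith(prefixes):
            entry.unlink(missing_ok=True)
            removed += 1
    return removed
```

With this shape, `bulk_delete` and `llm_bulk_delete` would collect the deleted paths first and purge once at the end, instead of walking the cache directory once per deleted item.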

---

## Recommended Issue Breakdown

1. **P1 — Consolidate gallery iterators: replace `iter_gallery_media()` and `iter_gallery_folders()` with unified `iter_gallery_items()`**  
   Migrate all callers (`llm_images`, `llm_folders`, `llm_recent`, `llm_tags`, `find_duplicate_media`) to use the single bounded iterator. Remove legacy duplicates.

2. **P1 — Implement tag persistence: store tags as sidecar files or SQLite**  
   Add actual storage backend for tags (JSON sidecar per image or flat SQLite DB). Update `/tag` and `/api/llm/tags` to persist and `/api/llm/images` to include tags in metadata response.

3. **P2 — Extract thumbnail cache cleanup into a single batch function**  
   Replace 5+ inline `remove_thumbnail_cache_for()` calls with a single `batch_remove_thumbnails(paths: list[str])` function that takes a list of rel_paths and does one directory walk.

4. **P2 — Add tests for `_load_route_overrides()`**  
   Cover valid JSON, malformed JSON, empty string, non-dict JSON, and boundary values.

5. **P1 — Fix release workflow: restore `:latest` tag publishing and eliminate double build**  
   Add `type=raw,value=latest` to the Release workflow's metadata-action tags, and consolidate release/publish workflows to avoid building the image twice per release.

6. **P2 — Fix `file_sha256()` file handle safety**  
   Add explicit `finally` block or wrap the generator in a context manager to guarantee file handle release on error.

7. **P2 — Strengthen `APP_VERSION` extraction in release workflow**  
   Replace `grep -oP` with a Python one-liner that imports the module and reads the constant directly.

8. **P2 — Standardize `dir_size()` and `_dir_size()` in trash.py**  
   Either make both skip symlinks or eliminate the private variant and use the public one consistently (including in `move_to_trash()`).

9. **P3 — Align dependency-audit Python version with CI Python version**  
   Change dependency-audit.yaml from `python-version: "3.12"` to `"3.14"` to match lint/tests workflows.

10. **P3 — Fix dead comment/logging in `bulk_delete` folder preflight**  
    Remove dead `continue` with no logging and add proper security event logging for sanitize_path failures.

11. **P2 — Remove dead `health_bp` Blueprint from health.py**  
    Either register the Blueprint in `app.py` or remove it to eliminate confusion.

12. **P3 — Remove `npm version` reference from docs/runbook.md**  
    Replace with `sed -i ...` or Python script instruction.
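
For item 7, importing `app.py` in CI would run module-level code and let an `APP_VERSION` env var mask the source default, so the sketch below swaps in a different technique: parsing the file with `ast` and reading the string literal. It assumes the assignment has the form `APP_VERSION = <expr> or "<default>"`, which is inferred from the existing `grep` pattern rather than confirmed from the source; `default_app_version()` is a hypothetical helper.

```python
import ast


def default_app_version(source: str) -> str:
    """Return the string-literal fallback assigned to APP_VERSION."""
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        if not any(isinstance(t, ast.Name) and t.id == "APP_VERSION" for t in node.targets):
            continue
        value = node.value
        # `X or "literal"` parses as BoolOp(Or); the fallback is the last operand.
        if isinstance(value, ast.BoolOp) and isinstance(value.op, ast.Or):
            value = value.values[-1]
        if isinstance(value, ast.Constant) and isinstance(value.value, str):
            return value.value
    raise ValueError("APP_VERSION default not found")


sample = (
    "import os\n"
    '# APP_VERSION = "9.9.9" (comment, ignored by the parser)\n'
    'APP_VERSION = os.environ.get("APP_VERSION") or "0.1.18"\n'
)
print(default_app_version(sample))  # prints 0.1.18
```

Unlike the `grep` pipeline, this ignores comments and docstrings by construction and raises an error instead of producing an empty version when the assignment changes shape.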

---

## Not Worth Doing Yet

- **The dead Blueprint issue is cosmetic and low impact.** The `health.py` routes work fine via `app.add_url_rule()`. Only fix this if the file is already being touched.
- **Python 3.14 in CI vs 3.12 for dep audit:** the dependency audit runs weekly and checks for known CVEs; for pure-Python deps, the interpreter version does not significantly change the vulnerability surface. Only fix if the audit starts reporting false positives that apply only to 3.12.
- **`conftest.py` `str(ROOT)` call:** no action needed; `sys.path` entries should be strings, so the conversion is correct.
- **Tag UX is visible but functionless, although the API exists.** A full tag store may be more work than the feature's value justifies at this point. Decide whether tags are worth keeping as a user-facing feature before investing.
- **`npm version` in runbook:** low priority since the manual-release workflow doesn't actually run npm; the runbook instructions are stale but harmless.

## Decomposed into

- #271: Consolidate gallery iterators: replace `iter_gallery_media()` and `iter_gallery_folders()` with unified `iter_gallery_items()`
- #272: Implement tag persistence: store tags as sidecar files or SQLite
- #273: Extract thumbnail cache cleanup into a single batch function
- #274: Add tests for `_load_route_overrides()`
- #275: Fix release workflow: restore `:latest` tag publishing and eliminate double build
- #276: Fix `file_sha256()` file handle safety
- #277: Strengthen `APP_VERSION` extraction in release workflow
- #278: Standardize `dir_size()` and `_dir_size()` in trash.py
- #279: Align dependency-audit Python version with CI Python version
- #280: Fix dead comment/logging in `bulk_delete` folder preflight
- #281: Remove dead `health_bp` Blueprint from health.py
- #282: Remove `npm version` reference from docs/runbook.md
