nightshift: idea-generator #23

@nightshift-micr

Nightshift Idea Generator — Microck/traccia

Analyzed repo structure, docs, and source code (6,710 lines across 18 Python modules). Generated improvement ideas prioritized by impact and feasibility.


1. Incremental Ingest with File-Watch Mode

Priority: High | Effort: Medium | Area: Pipeline

The current ingest model is batch-only (ingest-dir). For ongoing personal archive management, a traccia watch command that monitors a directory for new or modified files and ingests them incrementally would reduce friction considerably. It could use the watchdog library (a Python package for filesystem events) and reuse the existing Pipeline.ingest_dir() logic with a changed-file filter.

Why: The README says the tool is "built for mixed archives rather than one clean source of truth." Real archives grow continuously. Watch mode makes traccia a living tool instead of a one-shot batch processor.
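A minimal sketch of the changed-file filter, using only the standard library. The function names and the in-memory `seen` mapping are hypothetical, not part of traccia; a real watch command would persist the digests and call this from a watchdog event handler or a polling loop.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, seen: dict[str, str]) -> list[Path]:
    """Return files under `root` whose content differs from the digest in `seen`.

    `seen` maps a path string to the last ingested digest (hypothetical
    bookkeeping) and is updated in place.
    """
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = file_digest(path)
        if seen.get(str(path)) != digest:
            seen[str(path)] = digest
            changed.append(path)
    return changed
```

Comparing content digests rather than modification times means a file that is touched but not edited is skipped.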


2. Evidence Deduplication Across Re-Ingests

Priority: High | Effort: Low | Area: Storage/Pipeline

When the same file is re-ingested (content unchanged), Storage.replace_source_evidence() deletes and re-inserts all evidence. This means evidence IDs change, downstream references break, and the graph unnecessarily re-scores. A dedup layer that compares evidence by (source_id, span_hash, evidence_type) before inserting would make re-ingest idempotent.

Why: The plan document (Phase 1) says "ignore unchanged files" as a done-when criterion, but evidence-level dedup is still missing. Without it, any re-ingest churns the graph.
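One way to sketch the dedup layer is a UNIQUE constraint on the (source_id, span_hash, evidence_type) key plus INSERT OR IGNORE. The table layout and the `insert_evidence` helper below are assumptions for illustration; the real traccia schema may differ.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE evidence (
        id INTEGER PRIMARY KEY,
        source_id TEXT NOT NULL,
        span_hash TEXT NOT NULL,
        evidence_type TEXT NOT NULL,
        UNIQUE (source_id, span_hash, evidence_type)
    )
    """
)


def insert_evidence(source_id: str, span_text: str, evidence_type: str) -> int:
    """Insert evidence unless an identical row exists; return its row id."""
    span_hash = hashlib.sha256(span_text.encode("utf-8")).hexdigest()
    key = (source_id, span_hash, evidence_type)
    # The UNIQUE constraint turns a repeated insert into a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO evidence (source_id, span_hash, evidence_type) "
        "VALUES (?, ?, ?)",
        key,
    )
    row = conn.execute(
        "SELECT id FROM evidence "
        "WHERE source_id = ? AND span_hash = ? AND evidence_type = ?",
        key,
    ).fetchone()
    return row[0]
```

With this shape, re-ingesting an unchanged file returns the same evidence IDs, so downstream references stay valid.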


3. Skill Diff / Change Log Between Ingest Runs

Priority: Medium | Effort: Low | Area: Rendering

Add a traccia diff command that shows what changed in the skill graph between the last two ingests (or between two timestamps). Output: new skills, level changes, freshness transitions, new evidence. This directly supports the project's goal of "long-range memory" by making the graph's evolution visible.

Why: The tree/log.md is append-only but unstructured. A formal diff command gives users a clear "what changed" view that the current log rendering doesn't provide.
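The core of such a command could be a comparison of two snapshots. The sketch below assumes each snapshot is a plain {skill_name: level} mapping, which is a simplification; freshness and evidence changes would need richer records.

```python
def diff_skills(before: dict[str, int], after: dict[str, int]) -> dict[str, list]:
    """Compare two {skill_name: level} snapshots (hypothetical shape)."""
    new = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    level_changes = sorted(
        (name, before[name], after[name])
        for name in set(before) & set(after)
        if before[name] != after[name]
    )
    return {"new": new, "removed": removed, "level_changes": level_changes}
```

For example, going from {"python": 2, "sql": 1} to {"python": 3, "rust": 1} reports rust as new, sql as removed, and python moving from level 2 to 3.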


4. LLM Backend Abstraction (Beyond OpenAI-Compatible)

Priority: Medium | Effort: Medium | Area: LLM

LLMBackend is defined as a Protocol with only one real implementation (OpenAICompatibleBackend) and a FakeLLMBackend for tests. Adding an AnthropicBackend or a LiteLLMBackend wrapper would broaden compatibility. The Protocol design already supports this; only the adapters are missing.

Why: The README says "any provider that clones the same request and response shape can be used," but Anthropic and Google don't clone the OpenAI shape. LiteLLM would unify all of them without changing the extraction contract.
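A rough sketch of the adapter pattern this relies on. The `complete` method is a hypothetical stand-in, since the actual methods on traccia's LLMBackend Protocol are not shown here; the point is only that any class with matching methods satisfies the Protocol without inheriting from it.

```python
from typing import Callable, Protocol


class LLMBackend(Protocol):
    """Hypothetical shape; the real Protocol in traccia may differ."""

    def complete(self, prompt: str) -> str: ...


class CallableBackend:
    """Adapter that wraps any `prompt -> text` function as a backend."""

    def __init__(self, call: Callable[[str], str]) -> None:
        self._call = call

    def complete(self, prompt: str) -> str:
        return self._call(prompt)


def run_extraction(backend: LLMBackend, prompt: str) -> str:
    """Caller code depends only on the Protocol, not on a provider SDK."""
    return backend.complete(prompt)
```

An AnthropicBackend or LiteLLMBackend would take the same form, with the wrapped callable replaced by the provider client call.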


5. Confidence Score Calibration Report

Priority: Medium | Effort: Low | Area: Pipeline Support / CLI

Add a traccia audit command that generates a report showing: skills grouped by confidence bucket (high/medium/low), skills where evidence count is 1 (fragile), and skills where level was boosted despite only consumption evidence (potential violation of the L2 cap rule from the plan). This gives users a way to inspect scoring quality without reading graph JSON.

Why: The plan (Phase 4) has detailed scoring rules (consumption cap at L2, confidence model, recency model), but there's no user-facing tool to verify the rules are working correctly on real data.
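The audit checks could look roughly like this. The `SkillRecord` fields, the bucket thresholds (0.75 and 0.4), and the "consumption" type label are all assumptions for illustration; only the L2 cap comes from the plan as described above.

```python
from dataclasses import dataclass, field

CONSUMPTION_LEVEL_CAP = 2  # the "L2 cap" for consumption-only evidence


@dataclass
class SkillRecord:
    name: str
    level: int
    confidence: float
    evidence_types: list[str] = field(default_factory=list)


def confidence_bucket(confidence: float) -> str:
    """Map a confidence score to a bucket (thresholds are assumed)."""
    if confidence >= 0.75:
        return "high"
    if confidence >= 0.4:
        return "medium"
    return "low"


def audit(skills: list[SkillRecord]) -> dict[str, list[str]]:
    """Group skills by bucket and flag fragile or cap-violating ones."""
    report = {"high": [], "medium": [], "low": [], "fragile": [], "cap_violations": []}
    for skill in skills:
        report[confidence_bucket(skill.confidence)].append(skill.name)
        if len(skill.evidence_types) == 1:
            report["fragile"].append(skill.name)
        only_consumption = bool(skill.evidence_types) and all(
            t == "consumption" for t in skill.evidence_types
        )
        if only_consumption and skill.level > CONSUMPTION_LEVEL_CAP:
            report["cap_violations"].append(skill.name)
    return report
```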


6. Archive Family Plugin System

Priority: Medium | Effort: Medium | Area: Source Detection / Family Normalizer

The current SourceFamily enum has 7 families (generic, google_takeout, discord_data_package, twitter_archive, reddit_export, instagram_export, facebook_export). Each new family requires modifying the enum, the normalizer, and the source detector. A plugin system where families are discovered from entry points (traccia.families) would let third-party packages add support for new export formats (LinkedIn, GitHub, Spotify, etc.) without modifying core.

Why: The README explicitly says "broader archive direction" is in scope and "the system is meant to grow toward bigger archive imports." The current monolithic approach doesn't scale to the long tail of export formats.


7. SQLite Migration Support

Priority: Medium | Effort: Low | Area: Storage

Storage._ensure_schema() creates tables and adds columns via _ensure_columns(), but there's no versioned migration system. If the schema changes between releases, users need to re-ingest everything. A simple migration table (_migrations (version INTEGER, applied_at TEXT)) with numbered SQL patches would protect user data across upgrades.

Why: The config already tracks PipelineVersions with schema_version and extraction_version, but the actual migration mechanism is missing. This is a data-loss risk for early adopters.
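A minimal sketch of numbered patches tracked in a _migrations table. The two example patches and the `migrate` function are hypothetical; they only illustrate the mechanism, not traccia's actual schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical numbered patches; real ones would mirror traccia's schema.
MIGRATIONS = {
    1: "CREATE TABLE evidence (id INTEGER PRIMARY KEY, source_id TEXT NOT NULL)",
    2: "ALTER TABLE evidence ADD COLUMN span_hash TEXT",
}


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every patch newer than the recorded version; return the new version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS _migrations (version INTEGER, applied_at TEXT)"
    )
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM _migrations"
    ).fetchone()[0]
    for version in sorted(v for v in MIGRATIONS if v > current):
        conn.execute(MIGRATIONS[version])
        # Record the patch so it is skipped on the next run.
        conn.execute(
            "INSERT INTO _migrations (version, applied_at) VALUES (?, ?)",
            (version, datetime.now(timezone.utc).isoformat()),
        )
        conn.commit()
        current = version
    return current
```

Running `migrate` twice on the same database applies each patch once, which is the property that protects existing data across upgrades.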


8. Streaming Extraction Progress with ETA

Priority: Low | Effort: Low | Area: Pipeline / CLI

Long ingest runs (large archives) have no progress indication beyond IngestManifestEntry counts. Adding a rich.progress or simple tqdm progress bar showing files processed / total, evidence extracted, and estimated time remaining would improve the CLI experience significantly.

Why: The README mentions "long-running scan can be inspected," but inspection requires reading the manifest file manually. Real-time progress in the terminal is more practical.
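The ETA itself is simple arithmetic that either rich.progress or tqdm would otherwise compute internally. As a sketch, assuming files take roughly equal time to process (the function name is illustrative):

```python
from typing import Optional


def eta_seconds(done: int, total: int, elapsed: float) -> Optional[float]:
    """Estimate remaining seconds from the average time per processed file.

    Returns None until at least one file has been processed.
    """
    if done <= 0:
        return None
    return elapsed * (total - done) / done
```

For instance, 25 of 100 files processed in 10 seconds gives an estimate of 30 seconds remaining.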


9. Multi-Project Skill Graph Merge

Priority: Low | Effort: High | Area: Pipeline / Storage

Support merging two traccia projects (e.g., a work archive and a personal archive) with conflict detection. When both projects have evidence for the same skill, the merge should combine evidence, resolve level conflicts (take the higher confidence), and flag duplicates for review.

Why: The README says the tool handles "mixed archives." Users often have multiple archive sources that they might want to analyze separately first, then merge. Multi-project support is a natural extension.
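The merge rule described above can be sketched as follows. Each project is assumed to be a {skill_name: {"level", "confidence", "evidence"}} mapping, which is a hypothetical simplification of the real graph; duplicate-evidence flagging is reduced here to a set union.

```python
def merge_projects(left: dict, right: dict) -> tuple[dict, list[str]]:
    """Merge two skill maps; return the merged map and names with level conflicts."""
    merged: dict = {}
    conflicts: list[str] = []
    for name in sorted(set(left) | set(right)):
        a, b = left.get(name), right.get(name)
        if a is None or b is None:
            only = a if a is not None else b
            merged[name] = {**only, "evidence": set(only["evidence"])}
            continue
        # Level conflicts resolve toward the higher-confidence side.
        winner = a if a["confidence"] >= b["confidence"] else b
        if a["level"] != b["level"]:
            conflicts.append(name)
        merged[name] = {
            "level": winner["level"],
            "confidence": winner["confidence"],
            "evidence": set(a["evidence"]) | set(b["evidence"]),
        }
    return merged, conflicts
```

Skills listed in the returned conflicts would go to the review queue rather than being silently resolved.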


10. Test Coverage Expansion for Edge Cases

Priority: High | Effort: Low | Area: Tests

The current test suite is ~2,800 lines across 7 test files. The largest modules (pipeline.py at 1,251 lines, rendering.py at 930 lines, storage.py at 517 lines) have limited direct test coverage. Specific gaps:

  • No tests for the document normalizer fallback chain
  • No tests for concurrent/re-entrant ingest scenarios
  • No tests for the review queue accept/reject flow
  • No tests for the Obsidian export path
  • No tests for malformed input handling in parsers

Why: The verification path in the README (uv run pytest -q) suggests the test suite is meaningful, but the large pipeline and rendering modules are undertested relative to their complexity.


Summary

#   Idea                            Priority  Effort
1   File-watch ingest mode          High      Medium
2   Evidence dedup on re-ingest     High      Low
3   Skill diff command              Medium    Low
4   LLM backend abstraction         Medium    Medium
5   Confidence calibration report   Medium    Low
6   Archive family plugin system    Medium    Medium
7   SQLite migration support        Medium    Low
8   Streaming extraction progress   Low       Low
9   Multi-project merge             Low       High
10  Test coverage expansion         High      Low

This report was generated by nightshift — autonomous code quality bot.
