add tag-based indexing by Ashex · Pull Request #3 · Ashex/atproto-mcp

Ashex · 2026-02-22T18:53:23Z

Documents are now indexed with structured tags (source, content_type,
domain, topic, namespace, lexicon_type, language) using a hybrid
vocabulary: controlled core enums plus generated tags derived from
content metadata. Tags are stored in txtai's tags column and queried
via SQL WHERE clauses, replacing the previous post-retrieval filtering
that frequently returned empty results when top-k ANN candidates were
dominated by other sources.
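
As a rough illustration of the failure mode described above, the toy sketch below compares filtering after a top-k cut with restricting by tag before ranking. The `Hit` type, the scores, and the `source` values are made up for the example. It does not use txtai and does not model how txtai executes a query.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    doc_id: str
    source: str   # hypothetical tag value
    score: float  # similarity score, higher is better


# Toy corpus: three "cookbook" documents outscore the only "lexicon" document.
HITS = [
    Hit("cookbook-1", "cookbook", 0.95),
    Hit("cookbook-2", "cookbook", 0.93),
    Hit("cookbook-3", "cookbook", 0.91),
    Hit("lexicon-1", "lexicon", 0.80),
]


def top_k(hits, k):
    """Return the k highest-scoring hits."""
    return sorted(hits, key=lambda h: h.score, reverse=True)[:k]


def post_filter(hits, source, k):
    """Take the top-k candidates first, then keep only the wanted source."""
    return [h for h in top_k(hits, k) if h.source == source]


def pre_filter(hits, source, k):
    """Restrict to the wanted source first, then rank what is left."""
    return top_k([h for h in hits if h.source == source], k)


print(post_filter(HITS, "lexicon", k=3))  # the lexicon hit never makes the cut
print(pre_filter(HITS, "lexicon", k=3))
```

With `k=3`, the three cookbook hits fill the candidate list, so the post-filter has nothing left to return, while the tag-first variant still finds the lexicon document.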

Key changes:
- parser: add tags field to ContentChunk, tag builder functions per
  source, encode_tags() for pipe-delimited txtai storage
- indexer: write tags at index time, persist in chunk_meta.json,
  SQL-filtered _filtered_search() with fallback, backward-compat for old
  indexes without tags
- tools: add content_type filter to search_atproto_docs
- tests: update regression tests for tag-aware fake embeddings, add
  test_tags.py (37 tests covering encoding, builders, filtered search,
  metadata round-trip), add compare_search_quality.py
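
The pipe-delimited tag encoding and the SQL WHERE filter named in the key changes could look roughly like the sketch below. This is not the code from the PR: the `|key:value|` layout, the `tag_pattern()` helper, the `docs` table, and the sample rows are all assumptions for illustration, and a plain `sqlite3` table stands in for txtai's content store.

```python
import sqlite3


def encode_tags(tags: dict) -> str:
    """Encode key/value tags as one pipe-delimited string.

    Assumed layout: "|key:value|key:value|". The leading and trailing
    pipes let each tag be matched as a whole token.
    """
    parts = [f"{key}:{value}" for key, value in sorted(tags.items()) if value]
    if not parts:
        return ""
    return "|" + "|".join(parts) + "|"


def tag_pattern(key: str, value: str) -> str:
    """Build a LIKE pattern that matches one whole tag (hypothetical helper)."""
    return f"%|{key}:{value}|%"


# Stand-in for the indexed content table, with a tags column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, text TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        ("lex-1", "record schema", encode_tags({"source": "lexicon", "language": "en"})),
        ("cook-1", "posting example", encode_tags({"source": "cookbook", "language": "en"})),
        ("old-1", "chunk from an index without tags", ""),
    ],
)

# Filter by tag in the WHERE clause rather than after retrieval.
matches = conn.execute(
    "SELECT id FROM docs WHERE tags LIKE ? ORDER BY id",
    (tag_pattern("source", "lexicon"),),
).fetchall()
print(matches)
```

Wrapping every tag in pipes keeps a pattern for `source:lexicon` from matching a longer value that merely starts with the same text. Rows with an empty tags string, such as chunks from an older index, simply never match a tag filter.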

Ashex added 3 commits February 22, 2026 19:50

Cleanup type checking

11c3b36

Fixed config referencing

50689a3

Ashex merged commit b7cae5b into main Feb 22, 2026
1 check passed

Ashex deleted the feat/tag-indexing branch February 22, 2026 19:15

