Add dynamic metadata filtering for RAG queries #21

@samkeen

Feature Request: Dynamic Metadata Filtering for RAG Queries

Problem Statement

Currently, the RAG implementation only supports filtering by:

  • Number of results (n_results)
  • Distance threshold (distance_threshold)

However, ChromaDB supports powerful metadata filtering through its where parameter, which could significantly improve the precision of document retrieval.

Proposed Solution

Add a dynamic filter builder UI that allows users to create metadata-based filters for their RAG queries.

Implementation Overview

Backend Changes

  1. RAG Config Service (rag_config_service.py)

    • Add method to detect available metadata fields from collection
    • Modify query_collection() to accept where parameter
    • Store filter preferences in config
  2. API Endpoints (routes.py)

    • GET /api/rag/metadata-fields - Return available fields with types and unique values
    • Update POST /api/chat to accept filter parameters
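A minimal sketch of how the field-detection method might work, assuming metadata values are scalars (strings, numbers, booleans). The function name `detect_metadata_fields` and the `max_unique` cap are hypothetical; in ChromaDB's Python client the input list would typically come from `collection.get(include=["metadatas"])["metadatas"]`.

```python
from collections import defaultdict
from typing import Any, Optional


def detect_metadata_fields(
    metadatas: list[Optional[dict[str, Any]]], max_unique: int = 50
) -> dict[str, dict[str, Any]]:
    """Summarize each metadata field: observed value types and unique values."""
    types: dict[str, set[str]] = defaultdict(set)
    values: dict[str, set[Any]] = defaultdict(set)
    for metadata in metadatas:
        if not metadata:  # skip records that carry no metadata
            continue
        for field, value in metadata.items():
            types[field].add(type(value).__name__)
            values[field].add(value)

    summary: dict[str, dict[str, Any]] = {}
    for field in sorted(types):
        unique = sorted(values[field], key=str)  # key=str tolerates mixed types
        summary[field] = {
            "types": sorted(types[field]),
            "unique_values": unique[:max_unique],  # cap what the UI receives
            "truncated": len(unique) > max_unique,
        }
    return summary


# Hypothetical sample data standing in for a real collection.
sample = [
    {"author": "John Doe", "chapter": 1},
    {"author": "Jane Roe", "chapter": 3},
    None,
    {"author": "John Doe", "chapter": 2, "draft": True},
]
fields = detect_metadata_fields(sample)
```

The resulting dictionary could be returned as-is by GET /api/rag/metadata-fields. For large collections, scanning a sample of records rather than the full set may be preferable.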

Frontend Changes

  1. Main Chat Interface (script.js, index.html)

    • Collapsible filter panel below RAG toggle
    • Dynamic filter rows with field/operator/value selectors
    • Support AND/OR logic between filters
    • Show active filter count badge
  2. Settings Page (settings.js, settings.html)

    • Preview available metadata fields when collection selected
    • Configure default filters

Filter Types Support

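  • Text fields: equals ($eq), one-of ($in). Note that $in matches against a list of exact values, so it cannot express a substring "contains" match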
  • Numbers: equals, $gt, $lt, $gte, $lte, range
  • Lists: multi-select with $in/$nin
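  • Dates: date picker with comparison operators (dates would likely need to be stored as numeric timestamps, since $gt/$gte/$lt/$lte compare numbers)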
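The "range" and date cases could be sketched as follows. This assumes dates are stored as Unix timestamps in a hypothetical numeric field named `date_ts`, and that a range is expressed as a `$gte` condition combined with a `$lte` condition under `$and`. Both helper names are illustrative only.

```python
from datetime import datetime, timezone


def date_to_timestamp(iso_date: str) -> int:
    """Convert a 'YYYY-MM-DD' string to Unix seconds at midnight UTC."""
    parsed = datetime.strptime(iso_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def range_clause(field: str, low: float, high: float) -> dict:
    """Build an inclusive range filter from two comparison conditions."""
    return {"$and": [{field: {"$gte": low}}, {field: {"$lte": high}}]}


# Filter for everything dated within 2024 (date_ts is an assumed field name).
clause = range_clause(
    "date_ts",
    date_to_timestamp("2024-01-01"),
    date_to_timestamp("2024-12-31"),
)
```

With this approach the date picker can keep sending ISO date strings, and the backend converts them before querying.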

Example Filter Format

{
  "filters": [
    {"field": "author", "operator": "$eq", "value": "John Doe"},
    {"field": "chapter", "operator": "$in", "value": [1, 2, 3]},
    {"field": "date", "operator": "$gte", "value": "2024-01-01"}
  ],
  "logic": "$and"
}

The "logic" field accepts "$and" or "$or".
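One way the backend might translate this payload into a ChromaDB `where` clause is sketched below. The function name `build_where` is hypothetical. The sketch assumes a single condition should be passed unwrapped, since `$and`/`$or` expect a list of at least two expressions.

```python
from typing import Any, Optional

LOGIC_OPERATORS = {"$and", "$or"}


def build_where(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Translate the UI filter payload into a where-style dictionary."""
    logic = payload.get("logic", "$and")
    if logic not in LOGIC_OPERATORS:
        raise ValueError(f"unsupported logic operator: {logic!r}")

    # Each UI row becomes {field: {operator: value}}.
    conditions = [
        {f["field"]: {f["operator"]: f["value"]}}
        for f in payload.get("filters", [])
    ]
    if not conditions:
        return None  # no filtering requested
    if len(conditions) == 1:
        return conditions[0]  # a lone condition needs no $and/$or wrapper
    return {logic: conditions}


payload = {
    "filters": [
        {"field": "author", "operator": "$eq", "value": "John Doe"},
        {"field": "chapter", "operator": "$in", "value": [1, 2, 3]},
    ],
    "logic": "$and",
}
where = build_where(payload)
```

The result could then be forwarded by `query_collection()` as the `where` argument of the collection query, with `None` meaning "no metadata filter".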

User Benefits

  1. Precision: Target specific document subsets (e.g., "only search in chapter 3")
  2. Efficiency: Reduce noise from irrelevant content
  3. Flexibility: Build complex queries without writing code
  4. Discovery: Explore metadata patterns in the corpus
  5. Performance: Smaller, more relevant result sets

Additional Features to Consider

  • Save/load filter presets
  • Quick filter templates ("Recent docs", "By author")
  • Filter match explanations in results
  • Visual indicators for active filters
  • Recently used filters history

ChromaDB Reference

ChromaDB supports these metadata filter operators:

  • Comparison: $eq, $ne, $gt, $gte, $lt, $lte
  • Logical: $and, $or
  • Inclusion: $in, $nin

Documentation: https://docs.trychroma.com/docs/querying-collections/metadata-filtering
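Before a filter reaches the database, the operator list above can be used to reject malformed conditions early. The following is a sketch only: `validate_condition` is a hypothetical helper, and the rule that ordering operators require numeric values is an assumption that should be checked against the linked documentation.

```python
from typing import Any

COMPARISON = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte"}
INCLUSION = {"$in", "$nin"}
ORDERING = {"$gt", "$gte", "$lt", "$lte"}  # assumed to require numeric values


def validate_condition(operator: str, value: Any) -> list[str]:
    """Return a list of problems with one filter condition (empty if valid)."""
    errors: list[str] = []
    if operator not in COMPARISON | INCLUSION:
        errors.append(f"unknown operator: {operator}")
    elif operator in INCLUSION:
        if not isinstance(value, list) or not value:
            errors.append(f"{operator} needs a non-empty list of values")
    elif operator in ORDERING:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{operator} needs a numeric value")
    return errors
```

Returning messages rather than raising lets the API report every invalid filter row to the UI in one response.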

Acceptance Criteria

  • Users can add/remove filter conditions dynamically
  • Filters persist across page refreshes
  • Filter UI shows available fields from current collection
  • Applied filters are visible in chat details modal
  • Clear documentation on how to use filters

Labels: enhancement (New feature or request)