This repository was archived by the owner on May 27, 2026. It is now read-only.

feat: add BM25 text indexing support for hybrid search #908

Draft
Askir wants to merge 1 commit into main from feature/text-indexing-bm25


@Askir Askir commented Dec 22, 2025

Contributor

Add a text_indexing parameter to create_vectorizer that enables BM25 full-text search indexing via the pg_textsearch extension. This enables hybrid search (vector + BM25) on vectorized data.

  • ai.text_indexing_bm25(text_config, k1, b) - enable BM25 indexing with optional parameters
  • ai.text_indexing_none() - explicitly disable (default)
  • Auto-infers target column based on chunking config
  • Index created immediately when vectorizer is created
```sql
SELECT ai.create_vectorizer(
    'my_documents'::regclass,
    embedding => ai.embedding_openai('text-embedding-3-small', 768),
    text_indexing => ai.text_indexing_bm25()
);
```

Add a `text_indexing` parameter to `create_vectorizer` that enables BM25
full-text search indexing via the pg_textsearch extension. This enables
hybrid search (vector + BM25) on vectorized data.

Features:
- `ai.text_indexing_bm25(text_config, k1, b)` function with sane defaults
- `ai.text_indexing_none()` to explicitly disable (default)
- Auto-infers target column: chunk column if chunking enabled, source
  column if chunking disabled
- Index created immediately when vectorizer is created
- Fails with descriptive error if pg_textsearch extension not installed
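
For readers unfamiliar with the `k1` and `b` parameters, the following is a minimal Python sketch of the textbook Okapi BM25 formula. It assumes these parameters carry their conventional meaning (term-frequency saturation and length normalization); it is not the pg_textsearch implementation. The function name `bm25_scores` and the defaults `k1=1.2`, `b=0.75` are illustrative assumptions, not values confirmed by this PR.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score tokenized docs against a tokenized query with textbook BM25.

    Hypothetical helper for illustration only. Assumes `docs` is a
    non-empty list of non-empty token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # b controls how strongly long documents are penalized.
        length_norm = 1.0 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # k1 controls how quickly repeated terms stop adding score.
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores
```

With `b=0` length normalization is switched off, so a document that repeats a query term more often always scores higher; raising `k1` lets repeated terms keep contributing for longer.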

Example usage:
```sql
SELECT ai.create_vectorizer(
    'my_documents'::regclass,
    embedding => ai.embedding_openai('text-embedding-3-small', 768),
    text_indexing => ai.text_indexing_bm25()
);
```
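
The description does not say how vector and BM25 results should be combined. One common option is reciprocal rank fusion (RRF), sketched below in Python under that assumption. The function name, the constant `k=60`, and the document IDs are illustrative and not part of this PR.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Hypothetical helper: each input list is ordered best-first, for
    example one from a vector similarity query and one from a BM25 query.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the documents it returns.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


vector_hits = ["a", "b", "c"]  # assumed output of a vector search
bm25_hits = ["b", "d", "a"]    # assumed output of a BM25 search
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

RRF uses only ranks, so it avoids having to put cosine distances and BM25 scores on a common scale.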
@Askir temporarily deployed to internal-contributors on December 22, 2025 at 17:10 with GitHub Actions (inactive)