Create PZ Index Class which Can Be Used by Semantic Filter and Semantic Top-K Operators #137

## Overview
In practice, Semantic Filters should be able to use vector databases to accelerate their operations. Given a set of documents to filter, a naive solution could involve:
1. Embedding the query
2. Embedding each document and ingesting it into a vector database
3. Retrieving the top-k documents or all documents with a similarity score greater than some threshold
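The three steps above can be sketched as a toy, dependency-free example. `embed` is a hypothetical stand-in for a real embedding model (it only counts letters), and a plain Python list plays the role of the vector database.

```python
import math
from typing import Optional


def embed(text: str) -> list[float]:
    """Hypothetical embedding: a 26-dim letter-frequency vector (toy only)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def naive_search(query: str, documents: list[str], k: Optional[int] = None,
                 threshold: Optional[float] = None) -> list[str]:
    # 1. Embed the query.
    q = embed(query)
    # 2. Embed each document; the list stands in for a vector database.
    store = [(doc, embed(doc)) for doc in documents]
    # 3. Score every document, then keep those above the threshold and/or the top-k.
    scored = sorted(((cosine(q, v), doc) for doc, v in store), reverse=True)
    if threshold is not None:
        scored = [(s, d) for s, d in scored if s >= threshold]
    if k is not None:
        scored = scored[:k]
    return [d for _, d in scored]
```

For example, `naive_search("apple", ["apple pie", "banana bread", "apple tart"], k=1)` returns `["apple pie"]`. The cost is one embedding per document plus a full scan, which is what a real vector database and a reusable index would amortize.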

While we have implemented support for using vector database(s) as indices in our Semantic Top-K operator, this support is currently limited to `chromadb` and two embedding models (`text-embedding-3-small` for text-only queries and the CLIP model for text / image queries).

The primary goals of this issue are two-fold:
1. Implement a `BaseIndex` class within PZ which provides an abstraction / interface that can be implemented for any vector database and/or embedding model
2. Create a physical implementation of PZ's semantic filter operator which can construct an index on-the-fly and use it to efficiently execute a semantic filter.
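A minimal sketch of what a `BaseIndex` interface could look like, assuming an index only needs to ingest items and answer top-k and threshold queries. The method names (`add`, `search`) and the brute-force `InMemoryIndex` backend are hypothetical and are not taken from the existing PZ codebase; a `chromadb`-backed subclass would implement the same two methods.

```python
import math
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional


class BaseIndex(ABC):
    """Hypothetical index interface; method names are illustrative only."""

    @abstractmethod
    def add(self, ids: list[str], items: list[Any]) -> None:
        """Embed `items` and ingest them under the matching `ids`."""

    @abstractmethod
    def search(self, query: Any, k: Optional[int] = None,
               threshold: Optional[float] = None) -> list[tuple[str, float]]:
        """Return (id, score) pairs in descending order of similarity."""


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryIndex(BaseIndex):
    """Brute-force backend; a vector-database backend would subclass BaseIndex too."""

    def __init__(self, embed_fn: Callable[[Any], list[float]]):
        self.embed_fn = embed_fn  # any callable mapping an item to a vector
        self.vectors: dict[str, list[float]] = {}

    def add(self, ids, items):
        for item_id, item in zip(ids, items):
            self.vectors[item_id] = self.embed_fn(item)

    def search(self, query, k=None, threshold=None):
        q = self.embed_fn(query)
        scored = [(item_id, _cosine(q, v)) for item_id, v in self.vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        if threshold is not None:
            scored = [pair for pair in scored if pair[1] >= threshold]
        return scored if k is None else scored[:k]
```

Under this design, semantic top-k reduces to `search(query, k=k)` and an index-based filter to `search(query, threshold=t)`, which is what would let both operators share one abstraction while the embedding model is injected through `embed_fn`.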

Secondary goals of this issue include:
1. Refactor the semantic top-k operator to use the new index abstraction
2. Implement the index abstraction for a few standard vector database(s) and embedding models
3. Implement the index abstraction so that it supports any combination of text / image / audio queries. There may be fundamental limitations for queries involving text + image + audio, image + audio, and even text + audio, but for every combination where we can compute embeddings, we should aim to implement an index.
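One way to approach the multimodal goal, sketched under assumptions: keep a registry from each set of modalities to an embedding model that embeds all of them into a shared space, and report when no registered model covers a query. The registry, the function names, and the placeholder model identifier used for text + audio are hypothetical; the two initial entries only mirror the models already mentioned above.

```python
from typing import Iterable, Optional

# Hypothetical registry: set of modalities -> embedding model that can embed
# every modality in the set into one shared vector space.
EMBEDDING_REGISTRY: dict[frozenset, str] = {
    frozenset({"text"}): "text-embedding-3-small",
    frozenset({"text", "image"}): "clip",
}


def register_model(modalities: Iterable[str], model_name: str) -> None:
    """Record that `model_name` can embed every modality in `modalities`."""
    EMBEDDING_REGISTRY[frozenset(modalities)] = model_name


def select_embedding_model(query_modalities: Iterable[str],
                           field_modalities: Iterable[str]) -> Optional[str]:
    """Pick a model covering both the query and the indexed field, if any."""
    needed = frozenset(query_modalities) | frozenset(field_modalities)
    if needed in EMBEDDING_REGISTRY:
        return EMBEDDING_REGISTRY[needed]
    # Otherwise take the smallest registered model whose coverage is a superset.
    candidates = [(len(cov), name) for cov, name in EMBEDDING_REGISTRY.items()
                  if needed <= cov]
    return min(candidates)[1] if candidates else None
```

Returning `None` instead of raising could let the planner fall back to a non-index physical operator, and supporting a new combination (e.g. `register_model({"text", "audio"}, ...)`) then only requires registering a model plus an index backend.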

## Acceptance Criteria
- Implement a `BaseIndex` class within PZ (some starter code [may already exist here](https://github.com/mitdbg/palimpzest/blob/main/src/palimpzest/core/data/index_dataset.py)).
- Refactor the Semantic Top-K physical operator to use this `BaseIndex` class
- Create a physical operator for Semantic Filter which constructs an index on-the-fly and uses it to perform the filter
- Modify the `sem_filter()` and `sem_topk()` functions in `pz.Dataset` to accept an index if the user has already constructed one outside of their PZ program.
- Aim to support indices for as many multimodal query types as possible (i.e., not just text-only and text + image queries).
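A minimal sketch of an index-backed semantic filter that builds an index on the fly unless the caller passes one in, which is roughly how an optional index argument to `sem_filter()` could be threaded through to the physical operator. The `index_sem_filter` function, its `index` parameter, and the `BruteForceIndex` class are hypothetical; `embed_fn` stands in for a real embedding model.

```python
import math
from typing import Callable, Optional


class BruteForceIndex:
    """Hypothetical minimal index: stores vectors, scores them by cosine similarity."""

    def __init__(self, embed_fn: Callable[[str], list[float]]):
        self.embed_fn = embed_fn
        self.vectors: dict[str, list[float]] = {}

    def add(self, ids: list[str], items: list[str]) -> None:
        for item_id, item in zip(ids, items):
            self.vectors[item_id] = self.embed_fn(item)

    def search(self, query: str, threshold: float) -> list[tuple[str, float]]:
        q = self.embed_fn(query)
        hits = []
        for item_id, v in self.vectors.items():
            dot = sum(x * y for x, y in zip(q, v))
            norm = math.hypot(*q) * math.hypot(*v)
            score = dot / norm if norm else 0.0
            if score >= threshold:
                hits.append((item_id, score))
        return sorted(hits, key=lambda h: h[1], reverse=True)


def index_sem_filter(records: dict[str, str], predicate: str,
                     embed_fn: Callable[[str], list[float]], threshold: float,
                     index: Optional[BruteForceIndex] = None) -> list[str]:
    """Return ids of records whose similarity to `predicate` is >= `threshold`."""
    if index is None:
        # No user-supplied index: build one on the fly over the input records.
        index = BruteForceIndex(embed_fn)
        index.add(list(records), list(records.values()))
    # A pre-built index may hold items outside this input, so keep only ours.
    return [rid for rid, _ in index.search(predicate, threshold) if rid in records]
```

Intersecting the hits with the input ids is what makes an externally constructed index safe to reuse across programs. A similarity threshold only approximates an LLM-evaluated predicate, so a real operator may still want to verify the retrieved candidates with a model.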
