Add test-time scaling strategies like resampling and ensembling #200

@liana313

Problem Statement

When running operations like sem_filter or sem_map, I would like to be able to specify an n_sample parameter and an ensembling strategy.

Proposed Solution

Allow users to specify a test-time scaling strategy and an ensembling strategy per operator. For example:

df.sem_filter("the {abstract} is relevant to vector databases", n_sample=3, ensemble='majority_vote', temperature=1.0)
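To make the intended semantics concrete, here is a minimal sketch of what n_sample with ensemble='majority_vote' could mean for a single row. This is not existing library code: the names majority_vote and resample_and_ensemble, and the predict callable standing in for one model call at a non-zero temperature, are all hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Hashable, List


def majority_vote(samples: List[Hashable]) -> Hashable:
    """Return the most frequent sample; ties go to the value seen first."""
    if not samples:
        raise ValueError("need at least one sample")
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(samples).most_common(1)[0][0]


def resample_and_ensemble(
    predict: Callable[[Any], Hashable],  # hypothetical: one model call per row
    row: Any,
    n_sample: int = 3,
    ensemble: Callable[[List[Hashable]], Hashable] = majority_vote,
) -> Hashable:
    """Call predict n_sample times on the same row and combine the outputs."""
    if n_sample < 1:
        raise ValueError("n_sample must be at least 1")
    samples = [predict(row) for _ in range(n_sample)]
    return ensemble(samples)


if __name__ == "__main__":
    # Stand-in for a model whose answers vary between trials.
    answers = iter([True, False, True])
    row = {"abstract": "We study indexing for vector databases."}
    print(resample_and_ensemble(lambda r: next(answers), row, n_sample=3))
```

Using an odd n_sample avoids ties for boolean filters. For sem_map, where outputs are free text, exact-match voting may rarely agree, so a different ensembling strategy would probably be needed there.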

Use Cases

When I perform a sem_filter (e.g. "is this row relevant to XX"), the results often change across repeated trials, so it would help to have built-in functionality for resampling and ensembling.

Checklist

  • [x] I have searched existing issues to avoid duplicates
  • [x] I have provided a clear problem statement
  • [x] I have considered alternative solutions
  • [x] I have assessed the impact and priority
  • [x] I am willing to contribute to implementation (if applicable)
