Problem Statement
When running operations such as sem_filter or sem_map, I would like to be able to specify an n_sample parameter and an ensembling strategy.
Proposed Solution
Allow users to specify a test-time scaling strategy and an ensembling strategy per operator. For example:
df.sem_filter("the {abstract} is relevant to vector databases", n_sample=3, ensemble='majority_vote', temperature=1.0)
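A minimal sketch of how per-row majority voting could work, assuming the model call is wrapped as a callable that returns one boolean answer per invocation (sampled at temperature > 0, so repeated calls may disagree). `ensembled_filter`, `majority_vote`, and `sample_fn` are hypothetical names for illustration only, not part of the existing sem_filter API. Treating ties as False is also an assumption; it only matters when n_sample is even.

```python
import itertools
from collections import Counter
from typing import Callable, Sequence


def majority_vote(votes: Sequence[bool]) -> bool:
    """Return True only if strictly more votes are True than False (ties -> False)."""
    counts = Counter(votes)
    return counts[True] > counts[False]


def ensembled_filter(
    rows: Sequence[str],
    sample_fn: Callable[[str], bool],  # hypothetical: one noisy LLM judgment per call
    n_sample: int = 3,
    ensemble: str = "majority_vote",
) -> list[bool]:
    """Call sample_fn n_sample times per row and aggregate the boolean answers."""
    if n_sample < 1:
        raise ValueError("n_sample must be >= 1")
    # Hypothetical strategy names; "any"/"all" give permissive/strict filters.
    aggregators = {"majority_vote": majority_vote, "any": any, "all": all}
    if ensemble not in aggregators:
        raise ValueError(f"unknown ensemble strategy: {ensemble!r}")
    agg = aggregators[ensemble]
    mask = []
    for row in rows:
        votes = [sample_fn(row) for _ in range(n_sample)]
        mask.append(agg(votes))
    return mask


# Usage: a deterministic stand-in for a model that answers inconsistently.
fake_answers = itertools.cycle([True, True, False])
print(ensembled_filter(["abstract A"], lambda row: next(fake_answers), n_sample=3))
# [True]  (two of the three samples said True)
```

Keeping the aggregator as a plain function of the vote list would let the same mechanism serve sem_map, where the votes would be strings and majority_vote would pick the most common output instead of a boolean.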
Use Cases
When I perform a sem_filter (e.g. "is this row relevant to XX"), the results often change across repeated trials, so it would help to have this functionality built in.
Checklist
- [x] I have searched existing issues to avoid duplicates
- [x] I have provided a clear problem statement
- [x] I have considered alternative solutions
- [x] I have assessed the impact and priority
- [x] I am willing to contribute to implementation (if applicable)