[Feat]: Automated batching to respect quotas #29

Description

@zilto

Problem Statement

A single batch request to OpenAI embedding models is limited to 300k tokens in total (reference). When a request exceeds that limit, the API returns an error, and catsu properly surfaces that error response to the user.

import catsu

lots_of_text = ["foo", "bar", "baz", ...]  # totals >300k tokens

client = catsu.client()
response = client.embed("openai:text-embedding-3-small", lots_of_text)

This can be fixed by batching manually. For illustration:

import itertools
import catsu

lots_of_text = ["foo", "bar", "baz", ...]  # totals >300k tokens

client = catsu.client()

responses = []
# batch items in groups of 500 (itertools.batched needs Python 3.12+);
# the size is arbitrary, so a batch could still exceed the token limit
for batch in itertools.batched(lots_of_text, 500):
    response = client.embed("openai:text-embedding-3-small", batch)
    responses.append(response)

Proposed Solution

It would be amazing if catsu could batch inputs automatically. This would involve tokenizing all inputs, though, which would be expensive. Maybe this should live in chonkie?

import catsu

lots_of_text = ["foo", "bar", "baz", ...]  # totals >300k tokens

client = catsu.client()

responses = []
# proposed helper: yields batches that each stay under the token limit
for batch in catsu.batch_inputs(lots_of_text):
    response = client.embed("openai:text-embedding-3-small", batch)
    responses.append(response)

# OR, with batching handled inside embed()
responses = client.embed("openai:text-embedding-3-small", lots_of_text, batch_inputs=True)
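
To make the idea concrete, here is a minimal sketch of what token-aware batching might look like. Nothing in it is an existing catsu API: `batch_by_tokens` and `approx_token_count` are hypothetical names, the 4-characters-per-token estimate is a crude stand-in for a real tokenizer (something like tiktoken could be passed in as `count_tokens`), and the `max_items=2048` default is an assumed per-request input cap.

```python
from collections.abc import Callable, Iterable, Iterator


def approx_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def batch_by_tokens(
    texts: Iterable[str],
    max_tokens: int = 300_000,  # per-request token limit from the problem statement
    max_items: int = 2048,  # assumed cap on inputs per request
    count_tokens: Callable[[str], int] = approx_token_count,
) -> Iterator[list[str]]:
    """Yield lists of texts whose estimated token total stays within max_tokens."""
    batch: list[str] = []
    batch_tokens = 0
    for text in texts:
        n = count_tokens(text)
        if n > max_tokens:
            raise ValueError(
                f"single input of ~{n} tokens exceeds max_tokens={max_tokens}"
            )
        # Close the current batch if adding this text would break either limit.
        if batch and (batch_tokens + n > max_tokens or len(batch) >= max_items):
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += n
    if batch:
        yield batch


# Three inputs of ~10 estimated tokens each, with a deliberately tiny limit.
texts = ["a" * 40, "b" * 40, "c" * 40]
batches = list(batch_by_tokens(texts, max_tokens=25))
print([len(b) for b in batches])  # [2, 1]
```

Because the helper is a generator, inputs are counted once and in order, so the tokenization cost is a single pass. An estimate-based counter can undercount, so a safety margin below the real limit would probably be wise.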

There could be both a sync and an async option.
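
As a rough illustration of the async variant, the sketch below sends pre-built batches concurrently with a bounded number of in-flight requests. It assumes some async embedding callable exists; `embed_batches_async`, `embed_fn`, and `fake_embed` are hypothetical names used only for this example and are not catsu APIs.

```python
import asyncio
from collections.abc import Awaitable, Callable, Sequence


async def embed_batches_async(
    embed_fn: Callable[[Sequence[str]], Awaitable[object]],
    batches: Sequence[Sequence[str]],
    max_concurrency: int = 4,
) -> list[object]:
    """Embed pre-built batches concurrently, returning responses in batch order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(batch: Sequence[str]) -> object:
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await embed_fn(batch)

    # gather preserves the order of its arguments, so results line up with batches.
    return await asyncio.gather(*(run_one(b) for b in batches))


async def fake_embed(batch: Sequence[str]) -> list[int]:
    # Stand-in for a real async embedding call; returns one length per input.
    await asyncio.sleep(0)
    return [len(text) for text in batch]


results = asyncio.run(embed_batches_async(fake_embed, [["foo", "bar"], ["baz"]]))
print(results)  # [[3, 3], [3]]
```

Bounding concurrency matters here because providers typically enforce rate limits as well as per-request limits; the sync option would simply loop over the same batches one at a time.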

Labels

enhancement: New feature or request
