Problem Statement
When sending a batch request to OpenAI embedding models, the total number of tokens across all inputs is limited to 300k (reference). catsu properly surfaces the resulting error response to the user:
import catsu
lots_of_text = ["foo", "bar", "baz", ...] # totals >300k tokens
client = catsu.client()
response = client.embed("openai:text-embedding-3-small", lots_of_text)
This can be worked around by batching manually. For illustration:
import itertools
import catsu
lots_of_text = ["foo", "bar", "baz", ...] # totals >300k tokens
client = catsu.client()
responses = []
# batch items in groups of 500; the size is arbitrary and a batch could still hit the token limit
# (itertools.batched requires Python 3.12+)
for batch in itertools.batched(lots_of_text, 500):
    response = client.embed("openai:text-embedding-3-small", list(batch))
    responses.append(response)
Proposed Solution
It would be amazing if catsu could automatically batch inputs. This would involve tokenizing all inputs, which would be expensive, though. Maybe this should live in chonkie?
import catsu
lots_of_text = ["foo", "bar", "baz", ...] # totals >300k tokens
client = catsu.client()
responses = []
# proposed helper: split inputs into batches that stay under the token limit
for batch in catsu.batch_inputs(lots_of_text):
    response = client.embed("openai:text-embedding-3-small", batch)
    responses.append(response)
# OR, as a proposed keyword argument
responses = client.embed("openai:text-embedding-3-small", lots_of_text, batch_inputs=True)
There could be a sync and async option.