Feature Request: TPM-Aware Rate Limiting
Problem Statement
The current implementation of rate limiting in lotus is purely RPM-based (Requests Per Minute). While this works for small data samples, it fails when processing "Large Context Tasks" such as long agent traces or document analysis.
When a user has a low-to-mid tier API account (e.g., OpenAI Tier 1 with a 200,000 TPM limit), firing off a batch of requests with large prompts immediately exhausts the TPM (Tokens Per Minute) budget, even if the RPM limit is respected; a batch of just five 50k-token rows already consumes roughly 250k prompt tokens, more than the entire per-minute budget in a single dispatch. This leads to 429 Rate Limit Reached errors and makes it impossible to reliably process datasets with large per-row context (rows with 50k+ tokens).
Proposed Solution
Implement a TPM rate limiter alongside the existing RPM logic in lm.py, so that request dispatch is throttled by estimated prompt tokens per minute in addition to request count.
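A minimal sketch of what such a limiter could look like: a sliding one-minute window over estimated prompt tokens that blocks dispatch when the budget would be exceeded. Class and method names here are illustrative, not existing lotus API:

```python
import time
import threading
from collections import deque


class TPMRateLimiter:
    """Illustrative sliding-window token budget (not the existing lotus API)."""

    def __init__(self, tokens_per_minute: int):
        self.tokens_per_minute = tokens_per_minute
        self.window = deque()  # (timestamp, token_count) pairs from the last 60s
        self.lock = threading.Lock()

    def _used_tokens(self, now: float) -> int:
        # Drop entries older than the 60-second window, then sum what remains.
        while self.window and now - self.window[0][0] >= 60:
            self.window.popleft()
        return sum(count for _, count in self.window)

    def acquire(self, token_count: int) -> None:
        """Block until `token_count` tokens fit in the current minute's budget."""
        while True:
            with self.lock:
                now = time.monotonic()
                used = self._used_tokens(now)
                if used + token_count <= self.tokens_per_minute or not self.window:
                    # Admit the request; a single over-budget request cannot be split.
                    self.window.append((now, token_count))
                    return
                # Otherwise wait until the oldest entry ages out of the window.
                wait = 60 - (now - self.window[0][0])
            time.sleep(max(wait, 0.1))
```

Before each batch dispatch in lm.py, the caller would estimate prompt tokens (via the model's tokenizer or a character-based heuristic) and call limiter.acquire(estimated_tokens); completion tokens could optionally be charged against the same window.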
Use Cases
- Agent Trace Analysis: Processing traces that are 300k+ characters long per row.
- Document Filtering/Joins: Running sem_filter or sem_join on long research papers, legal documents, or logs (see the example after this list).
- Stable Batch Processing: Ensuring that large DataFrames can be processed without manual restarts or "Retry-Heads" on standard API accounts (OpenAI Tier 1/2).
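For concreteness, a call roughly along these lines (assuming the usual lotus configure pattern; the data and model choice are illustrative) is enough to exhaust a 200k TPM budget on the first dispatch:

```python
import pandas as pd
import lotus
from lotus.models import LM

# Tier 1-style account: ~200k tokens per minute, regardless of RPM headroom.
lm = LM(model="gpt-4o-mini")
lotus.settings.configure(lm=lm)

# Toy stand-in for rows that each carry tens of thousands of tokens of context.
long_doc = "clause 4.2: the supplier shall indemnify the buyer... " * 5_000
df = pd.DataFrame({"document": [long_doc] * 8})

# A handful of such rows dispatched together already exceeds the per-minute
# token budget, so today this tends to fail with 429s even at a low RPM.
relevant = df.sem_filter("{document} discusses a breach of contract")
```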
Alternative Solutions
- Lowering max_batch_size: Users can manually set a tiny batch size (e.g., 1), but this is "token-blind" and inefficient if row sizes fluctuate.
- Retry Logic: Relying on library-level retries (e.g., litellm retries), which leads to a "wave of failures" where entire batches crash simultaneously, creating massive overhead and latency.
Additional Context
A TPM-aware approach lets batches run close to the theoretical maximum throughput for a given account: the token stream stays near the limit of the user's specific API tier without overflowing it. I'm working on an implementation and am happy to contribute it as a PR.
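One possible shape for the user-facing side, kept close to the existing LM constructor; the tpm keyword is only a suggested name, not current lotus API:

```python
from lotus.models import LM

# Hypothetical keyword: cap the estimated prompt tokens dispatched per minute,
# alongside the existing request/batch controls. Name and placement are
# suggestions only, not the current lotus API.
lm = LM(model="gpt-4o-mini", tpm=200_000)
```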
Checklist