TPM-Aware Rate Limiting #233

@suhaasteja

Feature Request: TPM-Aware Rate Limiting

Problem Statement

The current implementation of rate limiting in lotus is purely RPM-based (Requests Per Minute). While this works for small data samples, it fails when processing "Large Context Tasks" such as long agent traces or document analysis.

When a user has a low-to-mid tier API account (e.g., OpenAI Tier 1 with a 200,000 TPM limit), firing off a batch of requests with large prompts immediately exhausts the TPM (Tokens Per Minute) budget, even if the RPM limit is respected. This leads to 429 Rate Limit Reached errors and makes it impossible to reliably process datasets with large per-row context (rows with 50k+ tokens).
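To make the arithmetic concrete, here is a small sketch of how the token budget, rather than the request budget, becomes the binding constraint. The 200,000 TPM and 50k-token figures come from the description above; the RPM value of 500 is an assumed placeholder, not a quoted tier limit.

```python
def max_requests_per_minute(tpm_limit: int, tokens_per_request: int, rpm_limit: int) -> int:
    """Effective requests per minute when both an RPM and a TPM limit apply."""
    # The tighter of the two limits decides the real throughput.
    return min(rpm_limit, tpm_limit // tokens_per_request)


# 200,000 TPM with 50k-token rows; rpm_limit=500 is an assumed value.
print(max_requests_per_minute(200_000, 50_000, 500))  # 4
```

With rows this large, an RPM-only limiter would happily send hundreds of requests while the account can only absorb about four per minute.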

Proposed Solution

Implement a TPM rate limiter alongside the existing RPM logic in lm.py.
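As a rough illustration of what such a limiter could look like, here is a minimal sliding-window sketch. It is not the planned implementation and does not reflect the existing code in lm.py; the class name `TPMLimiter`, the `acquire` method, and the injectable `clock`/`sleep` parameters are all hypothetical names chosen for this example.

```python
import threading
import time
from collections import deque


class TPMLimiter:
    """Sliding-window limiter: admits at most `tpm` tokens per `window` seconds.

    Hypothetical sketch; names and structure are assumptions, not lotus APIs.
    """

    def __init__(self, tpm, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.tpm = tpm
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._events = deque()  # (timestamp, tokens) of admitted requests
        self._used = 0          # tokens admitted inside the current window
        self._lock = threading.Lock()

    def _evict(self, now):
        # Drop requests that have aged out of the window and free their tokens.
        while self._events and now - self._events[0][0] >= self.window:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def acquire(self, tokens):
        """Block until `tokens` fit into the budget, then reserve them."""
        if tokens > self.tpm:
            raise ValueError("request exceeds the whole per-window token budget")
        while True:
            with self._lock:
                now = self._clock()
                self._evict(now)
                if self._used + tokens <= self.tpm:
                    self._events.append((now, tokens))
                    self._used += tokens
                    return
                # Budget is full: wait until the oldest request leaves the window.
                wait = self.window - (now - self._events[0][0])
            self._sleep(wait)
```

A caller would estimate the prompt size with a tokenizer and call `acquire(estimated_tokens)` before dispatching each request, next to whatever RPM check already runs. The estimate covers only prompt tokens in this sketch; a fuller version would also reserve room for the expected completion.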

Use Cases

  • Agent Trace Analysis: Processing traces that are 300k+ characters long per row.
  • Document Filtering/Joins: Running sem_filter or sem_join on long research papers, legal documents, or logs.
  • Stable Batch Processing: Ensuring that large DataFrames can be processed without manual restarts or "Retry-Heads" on standard API accounts (OpenAI Tier 1/2).

Alternative Solutions

  • Lowering max_batch_size: Users can manually set a tiny batch size (e.g., 1), but this is "token-blind" and inefficient if row sizes fluctuate.
  • Retry Logic: Relying on library-level retries (e.g., litellm retries) leads to a "wave of failures" where entire batches crash simultaneously, creating massive overhead and latency.
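To show why a fixed batch size is "token-blind", the sketch below groups rows by their estimated token counts instead of by a fixed row count. The function `pack_by_tokens` is a hypothetical helper written for this illustration, not part of lotus, and the token counts are made-up inputs.

```python
def pack_by_tokens(token_counts, budget):
    """Greedily group row indices so each batch stays within `budget` tokens."""
    batches, current, used = [], [], 0
    for index, tokens in enumerate(token_counts):
        if tokens > budget:
            raise ValueError(f"row {index} alone exceeds the token budget")
        if used + tokens > budget:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += tokens
    if current:
        batches.append(current)
    return batches


# Small rows share a batch; large rows end up on their own.
print(pack_by_tokens([50_000, 1_000, 2_000, 150_000, 60_000], 200_000))
# [[0, 1, 2], [3], [4]]
```

A batch size of 1 would send these five rows as five separate waves, while a larger fixed batch size could put rows 3 and 4 together and overshoot the budget.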

Additional Context

A TPM-aware approach aims for the theoretical maximum throughput: it keeps the "Token Pipe" filled up to the limit of the user's specific API tier without overflowing it. I'm working on an implementation and am happy to contribute this as a PR.

Checklist

  • I have searched existing issues to avoid duplicates
  • I have provided a clear problem statement
  • I have considered alternative solutions
  • I have assessed the impact and priority
  • I am willing to contribute to implementation (if applicable)
