feat: use tokie for accurate token counting by chonknick · Pull Request #31 · feyninc/catsu

chonknick · 2026-04-21T01:27:05Z

Summary

Adds tokie as a dependency for accurate token counting in providers that don't return token counts from their API (Cloudflare and Gemini)
Tokenizers are lazily loaded from HuggingFace Hub and cached in-process for reuse
Supports both HuggingFace tokenizers and tiktoken (cl100k_base mapped via xenova/gpt-4)
Falls back to cl100k tokenizer when a model's specific tokenizer is unavailable, with a final len/5 fallback if cl100k itself can't be loaded
Deserializes the existing tokenizer metadata from models.json into TokenizerInfo on ModelInfo
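
The fallback order described above (model tokenizer, then cl100k, then len/5) can be sketched roughly as follows. This is a minimal illustration, not the code in src/tokenizer.rs: the `Tokenizer` type here is a hypothetical stand-in that counts whitespace-separated words, `example/known-model` is a made-up name, and len/5 is assumed to mean byte length divided by five per input.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a loaded tokenizer (not tokie's type).
struct Tokenizer;

impl Tokenizer {
    /// Stand-in loader: only these two names "load"; a real loader
    /// would fetch tokenizer files from the hub.
    fn load(name: &str) -> Option<Self> {
        matches!(name, "example/known-model" | "xenova/gpt-4").then_some(Tokenizer)
    }

    /// Stand-in count: whitespace-separated words.
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Name used for the cl100k fallback, as mentioned in the summary.
const CL100K: &str = "xenova/gpt-4";

/// Process-wide cache, created on first use.
fn cache() -> &'static Mutex<HashMap<String, Arc<Tokenizer>>> {
    static CACHE: OnceLock<Mutex<HashMap<String, Arc<Tokenizer>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Return the cached tokenizer, loading it on first request.
fn get_or_load(name: &str) -> Option<Arc<Tokenizer>> {
    {
        let guard = cache().lock().ok()?;
        if let Some(tok) = guard.get(name) {
            return Some(Arc::clone(tok));
        }
    } // lock released before the (potentially slow) load
    let tok = Arc::new(Tokenizer::load(name)?);
    let mut guard = cache().lock().ok()?;
    let shared = guard.entry(name.to_string()).or_insert(tok).clone();
    Some(shared)
}

/// Last-resort estimate (assumed form of the len/5 heuristic).
fn estimate(inputs: &[String]) -> usize {
    inputs.iter().map(|s| s.len() / 5).sum()
}

/// Model tokenizer first, then cl100k, then the estimate.
fn count_tokens(model_tokenizer: Option<&str>, inputs: &[String]) -> usize {
    let tok = model_tokenizer
        .and_then(get_or_load)
        .or_else(|| get_or_load(CL100K));
    match tok {
        Some(tok) => inputs.iter().map(|s| tok.count(s)).sum(),
        None => estimate(inputs),
    }
}

fn main() {
    let inputs = vec!["the quick brown fox".to_string()];
    println!("known model:   {}", count_tokens(Some("example/known-model"), &inputs));
    println!("missing model: {}", count_tokens(Some("example/missing-model"), &inputs));
    println!("estimate only: {}", estimate(&inputs));
}
```

In this sketch a model with no tokenizer metadata (`None`) takes the same cl100k path as a model whose tokenizer fails to load.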

Coverage

Provider	Model	Token Counting
Cloudflare	bge-small/base/large-en-v1.5, bge-m3, qwen3-embedding-0.6b	Accurate (tokie)
Cloudflare	embeddinggemma-300m, plamo-embedding-1b	cl100k fallback
Gemini	gemini-embedding-001	Accurate (cl100k via tokie)

Files changed

Cargo.toml — added tokie dependency
src/models.rs — added TokenizerInfo struct and field on ModelInfo
src/catalog.rs — deserialize tokenizer field from models.json
src/tokenizer.rs — new module: cached tokenizer loading and counting
src/providers/cloudflare.rs — replaced len/5 with tokie
src/providers/gemini.rs — replaced len/5 with tokie
src/lib.rs — registered new module, exported TokenizerInfo

Test plan

cargo build compiles cleanly
cargo test — all 11 tests + 2 doc-tests pass
Manual test with Cloudflare provider to verify accurate token counts
Manual test with Gemini provider to verify cl100k counting

🤖 Generated with Claude Code

…providers

Replace the rough len/5 heuristic with real tokenizer-based counting via tokie. Tokenizers are lazily loaded from HuggingFace Hub and cached. For models without a known tokenizer, falls back to cl100k (GPT-4).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-04-21T01:27:38Z

Deploying catsu with Cloudflare Pages

Latest commit:	`aab3170`
Status:	✅ Deploy successful!
Preview URL:	https://15a6a257.catsu-3ib.pages.dev
Branch Preview URL:	https://feat-tokie-token-counting.catsu-3ib.pages.dev

cloudflare-workers-and-pages · 2026-04-21T01:28:04Z

Deploying with Cloudflare Workers

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	catsu-docs	`aab3170`	Commit Preview URL Branch Preview URL	Apr 21 2026, 01:28 AM

Copilot

Pull request overview

This PR introduces accurate token counting via the tokie tokenizer library for providers that don’t return token usage (notably Cloudflare and Gemini), using lazily loaded/cached tokenizers with sensible fallbacks.

Changes:

Add tokie dependency and introduce a cached tokenizer loader/token counter module.
Extend the model catalog data model to include optional tokenizer metadata (TokenizerInfo) loaded from models.json.
Replace len/5 token estimation with tokie-based counting in Cloudflare and Gemini providers (with fallbacks).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

File	Description
Cargo.toml	Adds `tokie` dependency (HF-enabled) to support accurate tokenization.
src/tokenizer.rs	New module for cached tokenizer loading and token counting with fallbacks.
src/models.rs	Adds `TokenizerInfo` and attaches optional tokenizer config to `ModelInfo`.
src/catalog.rs	Deserializes optional tokenizer metadata from the embedded catalog JSON.
src/providers/cloudflare.rs	Uses tokenizer-based counting (via `spawn_blocking`) instead of `len/5`.
src/providers/gemini.rs	Uses tokenizer-based counting (via `spawn_blocking`) instead of `len/5`.
src/lib.rs	Registers tokenizer module and re-exports `TokenizerInfo`.

Copilot · 2026-04-21T01:30:04Z

+    let mut cache = TOKENIZER_CACHE.lock().ok()?;
+
+    if let Some(tok) = cache.get(name) {
+        return Some(Arc::clone(tok));
+    }
+
+    match Tokenizer::from_pretrained(name) {
+        Ok(tok) => {
+            let tok = Arc::new(tok);


get_or_load_tokenizer holds the global TOKENIZER_CACHE mutex while calling Tokenizer::from_pretrained(name), which can perform disk/network I/O and block for a long time. This effectively serializes all concurrent token counting calls and can stall unrelated requests. Consider dropping the lock before loading (double-checked locking: check cache, release lock, load, then re-lock to insert) or using a per-tokenizer OnceCell/DashMap so only the same tokenizer load is deduplicated.

Suggested change

-    let mut cache = TOKENIZER_CACHE.lock().ok()?;
-
-    if let Some(tok) = cache.get(name) {
-        return Some(Arc::clone(tok));
-    }
-
-    match Tokenizer::from_pretrained(name) {
-        Ok(tok) => {
-            let tok = Arc::new(tok);
+    {
+        let cache = TOKENIZER_CACHE.lock().ok()?;
+        if let Some(tok) = cache.get(name) {
+            return Some(Arc::clone(tok));
+        }
+    }
+
+    match Tokenizer::from_pretrained(name) {
+        Ok(tok) => {
+            let tok = Arc::new(tok);
+            let mut cache = TOKENIZER_CACHE.lock().ok()?;
+            if let Some(existing) = cache.get(name) {
+                return Some(Arc::clone(existing));
+            }
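
The other option the comment mentions, a per-tokenizer once-cell, might look roughly like the sketch below. It uses only the standard library (`OnceLock` instead of an external `OnceCell` or `DashMap` crate); the `Tokenizer` struct, `slow_load`, and the `LOADS` counter are hypothetical stand-ins for illustration and are not part of this PR.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use std::time::Duration;

/// Hypothetical stand-in for a loaded tokenizer.
struct Tokenizer {
    name: String,
}

/// Counts how often the slow load runs (demonstration only).
static LOADS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a load that does slow disk or network I/O.
fn slow_load(name: &str) -> Option<Tokenizer> {
    LOADS.fetch_add(1, Ordering::SeqCst);
    std::thread::sleep(Duration::from_millis(20));
    Some(Tokenizer { name: name.to_string() })
}

/// One slot per tokenizer name; the slot is filled at most once.
type Slot = Arc<OnceLock<Option<Arc<Tokenizer>>>>;

fn slots() -> &'static Mutex<HashMap<String, Slot>> {
    static SLOTS: OnceLock<Mutex<HashMap<String, Slot>>> = OnceLock::new();
    SLOTS.get_or_init(|| Mutex::new(HashMap::new()))
}

fn get_or_load_tokenizer(name: &str) -> Option<Arc<Tokenizer>> {
    // Hold the map lock only long enough to fetch or create the slot.
    let slot: Slot = {
        let mut map = slots().lock().ok()?;
        map.entry(name.to_string()).or_default().clone()
    };
    // The load runs outside the map lock. Only callers asking for this
    // same name wait here, and the closure runs once per slot.
    // Caveat of this sketch: a failed load is cached as `None`.
    let loaded = slot.get_or_init(|| slow_load(name).map(Arc::new)).clone();
    loaded
}

fn main() {
    // Four threads request the same tokenizer concurrently.
    let handles: Vec<_> = (0..4)
        .map(|_| std::thread::spawn(|| get_or_load_tokenizer("example/model")))
        .collect();
    for handle in handles {
        let tok = handle.join().unwrap().expect("stand-in load succeeds");
        println!("got tokenizer {}", tok.name);
    }
    println!("slow loads performed: {}", LOADS.load(Ordering::SeqCst));
}
```

Compared with the double-checked version in the suggested change, this avoids loading the same tokenizer twice when two callers miss the cache at the same moment, at the cost of an extra level of indirection per entry.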

Copilot · 2026-04-21T01:30:04Z

+        .await
+        .unwrap_or_else(|_| {
+            crate::tokenizer::fallback_count(&request.inputs)
+        });


In the spawn_blocking(...).await error path, the fallback calls crate::tokenizer::fallback_count(&request.inputs) on the async runtime thread. Since fallback_count may attempt to load a tokenizer (and uses a blocking std::sync::Mutex), this can block the reactor thread. Consider making the error-path fallback a pure len/5 estimate, or running fallback_count in spawn_blocking as well.

Copilot · 2026-04-21T01:30:04Z

+        .await
+        .unwrap_or_else(|_| {
+            crate::tokenizer::fallback_count(&request.inputs)
+        });


In the spawn_blocking(...).await error path, the fallback calls crate::tokenizer::fallback_count(&request.inputs) on the async runtime thread. Since fallback_count may attempt to load a tokenizer (and uses a blocking std::sync::Mutex), this can block the runtime thread. Consider making the error-path fallback a pure len/5 estimate, or running fallback_count inside spawn_blocking too.
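
A pure error-path estimate, the first option raised in the two comments above, could be sketched like this. The function names are hypothetical, the `Result` parameter merely models the outcome of awaiting the blocking task (no async runtime is used here), and len/5 is assumed to mean byte length divided by five per input.

```rust
/// Rough estimate with no locking and no I/O, so it is safe to call
/// from any thread (assumed form of the len/5 heuristic).
fn estimate_tokens(inputs: &[String]) -> usize {
    inputs.iter().map(|s| s.len() / 5).sum()
}

/// Resolve the outcome of a blocking count without doing further
/// blocking work: use the task's result, or the estimate if it failed.
/// `joined` models what awaiting the blocking task would return.
fn resolve_count<E>(joined: Result<usize, E>, inputs: &[String]) -> usize {
    joined.unwrap_or_else(|_| estimate_tokens(inputs))
}

fn main() {
    let inputs = vec!["hello world".to_string(), "tokens".to_string()];

    // Task completed: its count is used as-is.
    println!("ok path:    {}", resolve_count::<()>(Ok(4), &inputs));

    // Task failed (for example, it panicked): fall back to the estimate.
    println!("error path: {}", resolve_count(Err("join error"), &inputs));
}
```

The trade-off is accuracy on a path that should be rare: the estimate ignores the tokenizer entirely, but it cannot stall other work on the runtime thread.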

Copilot AI review requested due to automatic review settings April 21, 2026 01:27

Copilot started reviewing on behalf of chonknick April 21, 2026 01:27

chonknick wants to merge 1 commit into main from feat/tokie-token-counting
