Remove Cartesia provider — keep AWS Polly + Microsoft Azure only#80

Merged
Saqoosha merged 4 commits into develop from remove-cartesia-provider on May 22, 2026


@Saqoosha Saqoosha commented May 22, 2026

Summary

  • Drop the Cartesia provider entirely from the Premium TTS path on iOS + Worker. PremiumVoiceProvider loses the .cartesia case. PremiumVoiceCatalog drops 25 Cartesia voices (22 JA + 3 EN). Polly (14) + Azure (16) = 30 voices still covers US/UK/AU + JA/EN with race-announcer and friendly-narrator personas.
  • Worker: proxyCartesia deleted, CARTESIA_API_KEY binding removed, ALLOWED_CARTESIA_MODELS allow-list gone, cartesia branches removed from contentTypeFor / sampleRateFor / responseHeadersFor. buildCacheKey drops the model field; R2 key prefix bumps v2 → v3.
  • iOS: parseSSE + handleEventJSON deleted (SSE was Cartesia-only — Polly + Azure stream raw PCM directly). prefetch + sendAndStream collapse to a single Polly/Azure raw-PCM path. TTSCache.key drops the model field; local cache prefix bumps v5 → v6.
  • All Cartesia labels / hint strings / comments removed from PaywallView, PremiumVoicePickerView, AudioSettingsView, LapAnnouncer, and the HDZapPremium.storekit product descriptions ("30+ voices across AWS Polly and Microsoft Azure"). Paywall sample teaser is now 2-row (Polly + Azure) instead of 3-row.
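
To make the cache-key change in the bullets above concrete, here is a minimal TypeScript sketch of a versioned key with no model field. The field names (`provider`, `voiceId`, `lang`, `rate`, `pitch`, `text`), the number formatting, and the JSON-array encoding are assumptions for illustration; the PR only states that buildCacheKey drops model and that the R2 prefix moves from v2 to v3.

```typescript
// Hypothetical request shape; the real Worker's fields may differ.
interface TtsRequest {
  provider: "polly" | "azure";
  voiceId: string;
  lang: string;
  rate: number;
  pitch: number;
  text: string;
}

// Bumping the prefix orphans every entry written under the old layout,
// so a key-format change invalidates the cache without an explicit purge.
const CACHE_PREFIX = "v3";

// Canonical key: fixed field order, fixed number formatting, no model field.
function canonicalKey(req: TtsRequest): string {
  const parts = [
    req.provider,
    req.voiceId,
    req.lang,
    req.rate.toFixed(2),
    req.pitch.toFixed(1),
    req.text,
  ];
  // A JSON array stays unambiguous even if a field contains "/" or ",".
  return CACHE_PREFIX + "/" + JSON.stringify(parts);
}
```

A production key would more likely be a hash of this canonical string, since object-store keys have length limits; that step is left out to keep the sketch dependency-free.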

Why

TestFlight Build 17 tester report: Cartesia's per-IP rate limit on parallel SSE bursts kept dropping prewarm prefetches — even with the 3-concurrent cap added in PR #79, 8 of 14 phrases returned 429 at countdownStartSeconds=15 and Start audio failed to play because the user-visible request hit the rate-limit ceiling. Polly + Azure are both materially more permissive on TPS, so removing Cartesia eliminates the entire rate-limit class of bugs while retaining a strong race-announcer voice catalogue.

Bonus: ~450 lines removed across Worker + iOS, no more SSE parsing, no model-field special-cases, no Cartesia-specific UI disclaimers. Lower-surface-area Premium TTS for the remaining roadmap work.

Test plan

  • Build the iOS app and confirm it compiles (no leftover .cartesia references) — already done locally, ** BUILD SUCCEEDED **.
  • Deploy the Worker to staging / dev (wrangler deploy) and confirm a /tts POST with provider=polly and one with provider=azure each return audio. A POST with provider=cartesia should now return bad-provider (400).
  • Open Settings → Audio → Premium voice picker, confirm only Polly + Azure sections render and the catalogue lists 30 voices.
  • If the operator previously had a Cartesia voice selected (premiumLapVoiceId UUID), the voice falls back to the System path because currentPremiumVoiceIfActive() returns nil (voice ID no longer in catalog). Re-pick a Polly or Azure voice.
  • Race with Azure Daichi at 1.45× / 0.0st pitch — countdown should now play every number without the rate-limit drops the user saw with Cartesia Ayumi.
  • Confirm the v6 local cache prefix invalidates earlier entries: first race after this deploy fetches fresh from Polly / Azure, subsequent races hit local cache.
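
The second test-plan item expects provider=cartesia to be rejected with bad-provider (400). A minimal sketch of such a check, assuming the Worker validates the provider against a fixed allow-list; `ALLOWED_PROVIDERS`, `validateProvider`, and the result shape are hypothetical names, not taken from the Worker source.

```typescript
// Only the two remaining providers are accepted.
const ALLOWED_PROVIDERS = ["polly", "azure"] as const;
type Provider = (typeof ALLOWED_PROVIDERS)[number];

type Validation =
  | { ok: true; provider: Provider }
  | { ok: false; status: 400; error: "bad-provider" };

// Accepts any JSON value from the request body and narrows it to a Provider.
function validateProvider(raw: unknown): Validation {
  const allowed = ALLOWED_PROVIDERS as readonly string[];
  if (typeof raw === "string" && allowed.indexOf(raw) !== -1) {
    return { ok: true, provider: raw as Provider };
  }
  // "cartesia" lands here now that it is no longer on the allow-list.
  return { ok: false, status: 400, error: "bad-provider" };
}
```

Keeping the allow-list as the single source of truth means removing a provider is a one-line change and stale clients get an explicit 400 rather than an upstream failure.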

Summary by CodeRabbit

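  • Chores

    • Bumped the build version from 16 to 17.
  • Improvements

    • Consolidated Premium voices on AWS Polly and Microsoft Azure (Cartesia removed).
    • Updated the audio cache spec; existing cache keys are invalidated.
    • Strengthened Premium voice prewarm handling and concurrency control.
    • Saved premium voice IDs that no longer exist are now reset at launch.
  • UI Updates

    • Updated provider labels and copy in Settings and the paywall to assume Polly/Azure.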


coderabbitai Bot commented May 22, 2026

Warning

Rate limit exceeded

@Saqoosha has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 10 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 89c85f52-976f-4a48-8480-5ec7e73d3ef1

📥 Commits

Reviewing files that changed from the base of the PR and between 25234da and 3f3328c.

📒 Files selected for processing (3)
  • app/HDZap/Models/Speech/PremiumSpeechSynthesizer.swift
  • app/HDZap/Views/Settings/AudioSettingsView.swift
  • app/HDZap/Views/Settings/PremiumVoicePickerView.swift

Walkthrough

Removes Cartesia support from the Premium TTS providers, leaving only Polly (AWS) and Azure (Microsoft). Updates the TTS cache-key contract from v5 to v6 (v2 to v3 on the Worker) and drops the model parameter. Streamlines the iOS and Worker streaming paths around raw PCM, and adds prewarm concurrency control and cancel/resume logic to LapAnnouncer.

Changes

Cartesia removal and Premium TTS consolidation on Polly/Azure

| Layer | File(s) | Summary |
| --- | --- | --- |
| TTS cache-key contract update (v5→v6, model parameter removed) | app/HDZap/Models/Speech/TTSCache.swift, app/HDZap/Models/Speech/PremiumSpeechSynthesizer.swift, workers/hdzap-premium/src/index.ts | Updates TTSCache's canonical string and removes model from the key. Aligns the iOS and Worker cache calls and descriptions, breaking compatibility with existing caches. |
| Worker-side Cartesia removal and unified environment/routing | workers/hdzap-premium/src/index.ts | Removes the CARTESIA_API_KEY binding, the model field, and proxyCartesia. Restricts allowed providers to polly/azure and bumps the cache prefix v2→v3. Aligns the response headers and the comments ahead of the R2 write with Polly/Azure. |
| iOS PremiumSpeechSynthesizer consolidation on Polly/Azure | app/HDZap/Models/Speech/PremiumSpeechSynthesizer.swift | Removes .cartesia from PremiumVoiceProvider and reorganizes the catalog. Streaming always takes the raw-PCM path; SSE parsing and its debug counters are removed; prefetch gains a 429 retry; the AVAudioConverter end-of-input status is changed to .endOfStream. Cache-key calls no longer pass model. |
| LapAnnouncer prewarm concurrency control and cancellation management | app/HDZap/Models/LapAnnouncer.swift | Introduces currentPrewarmTask. Just before a Premium utterance, any in-flight prewarm is cancel()ed and set to nil; when the utterance ends, prewarmFixedPhrases() restarts only if inflightUtteranceCount == 0. prewarmFixedPhrases() now uses withTaskGroup (maxConcurrent=3). |
| Build number bump and Cartesia removal from UI/StoreKit copy | app/HDZap.xcodeproj/project.pbxproj, app/project.yml, app/HDZap/Resources/StoreKit/HDZapPremium.storekit, app/HDZap/Views/Settings/AudioSettingsView.swift, app/HDZap/Views/Settings/PaywallView.swift, app/HDZap/Views/Settings/PremiumVoicePickerView.swift | Bumps CURRENT_PROJECT_VERSION from 16 to 17 in Xcode and project.yml. StoreKit and UI copy drop Cartesia (35+→30+); the Premium voice list, hints, and rate/pitch labels are tidied up for Polly/Azure. |
| App initialization: consistency check for the saved premium voice ID | app/HDZap/HDZapApp.swift | Adds a launch-time guard that resets the premium voice ID saved in UserDefaults to the default ID when it is not present in the current PremiumVoiceCatalog.voices. |
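
The walkthrough above mentions that prewarm now runs with a concurrency cap of 3. The app does this in Swift with withTaskGroup; the sketch below is only a TypeScript analogue of the same bounded-concurrency idea, and `runWithLimit` is a hypothetical helper, not code from this PR.

```typescript
// Runs `task` over `items` with at most `limit` calls in flight at once.
// Each worker pulls the next unclaimed index, so no item is processed twice.
async function runWithLimit<T>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<void>,
): Promise<void> {
  let next = 0;

  async function worker(): Promise<void> {
    while (next < items.length) {
      const item = items[next++]; // claimed synchronously, before any await
      await task(item);
    }
  }

  const workers: Promise<void>[] = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
}
```

With 14 fixed phrases and a limit of 3, at most three prefetch requests are outstanding at any moment, which is the property the cap is meant to guarantee against an upstream rate limit.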

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

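  • Saqoosha/HDZap#12: Changes the LapAnnouncer area (TTS utterance and integration), so some code may overlap.
  • Saqoosha/HDZap#78: Directly related through its change to when LapAnnouncer.prewarmFixedPhrases is called and to its implementation.
  • Saqoosha/HDZap#74: Overlaps with its changes to in-flight utterance management in PremiumSpeechSynthesizer / LapAnnouncer.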

Poem

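🐰 Farewell to Cartesia; with Polly and Azure
the cache hops up to v6,
prewarming three at a time, singing without hurry,
the PCM stream runs clean and tidy.
Now, on to a new morning with 30+ voices.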

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Remove Cartesia provider — keep AWS Polly + Microsoft Azure only' directly and clearly summarizes the main objective of the pull request, which is to remove the Cartesia TTS provider while retaining AWS Polly and Microsoft Azure. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 86.96% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/HDZap/Models/LapAnnouncer.swift (1)

1025-1034: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop the existing prewarm when Premium is turned off as well.

Because the guard on line 1026 returns first, the previous currentPrewarmTask keeps running even right after a settings change switches back to system or deselects the voice. Leftover prefetches add upstream load and raise the 429 risk again, so the cancel should always run before the guard.

💡 Suggested fix
     func prewarmFixedPhrases() {
+        currentPrewarmTask?.cancel()
+        currentPrewarmTask = nil
         guard let voice = currentPremiumVoiceIfActive() else { return }
         let phrases = fixedPrewarmPhrases(for: currentLanguage())
         let synth = premiumSynth
         let lang = voice.lang
-        currentPrewarmTask?.cancel()
-        currentPrewarmTask = Task.detached { @MainActor in
+        currentPrewarmTask = Task.detached { @MainActor in
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/HDZap/Models/LapAnnouncer.swift` around lines 1025 - 1034, In
prewarmFixedPhrases(), always cancel any existing currentPrewarmTask before the
early exit: move the currentPrewarmTask?.cancel() call to the top of the
function (before the guard let voice = currentPremiumVoiceIfActive() else {
return }) so switching off premium or deselecting voice stops any running
prewarm; keep the rest of the logic (computing phrases, synth, lang, and
creating Task.detached) unchanged and ensure currentPrewarmTask is then set to
the new Task only when a new prewarm starts.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/HDZap/Models/LapAnnouncer.swift`:
- Around line 355-367: premiumUtteranceEnded currently always calls
prewarmFixedPhrases after utteranceDidEnd/deactivateSession, which can restart
prewarm while a replacement utterance is still playing; modify
premiumUtteranceEnded to only call prewarmFixedPhrases when there are no active
utterances (check inflightUtteranceCount == 0) after invoking utteranceDidEnd(),
keeping deactivateSession() as-is; reference premiumUtteranceEnded,
utteranceDidEnd, prewarmFixedPhrases, deactivateSession, and
inflightUtteranceCount when making the change.

In `@app/HDZap/Models/Speech/PremiumSpeechSynthesizer.swift`:
- Around line 578-590: The retry path swallows Task cancellation because `try?
await Task.sleep(...)` ignores CancellationError and causes a retry even after
prewarm was cancelled; in PremiumSpeechSynthesizer's prewarm code replace the
`try? await Task.sleep(nanoseconds: 1_000_000_000)` with a
cancellation-respecting approach (e.g. use `try await Task.sleep(...)` and let
CancellationError propagate, or explicitly check `Task.isCancelled` and abort
before re-sending the request) so the subsequent `(data, response) = try await
URLSession.shared.data(for: req)` is not executed when the task has been
cancelled.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5245d31b-7132-4dfb-b85b-56f7e20b7927

📥 Commits

Reviewing files that changed from the base of the PR and between 62b9a96 and 3763fe1.

⛔ Files ignored due to path filters (1)
  • workers/hdzap-premium/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (10)
  • app/HDZap.xcodeproj/project.pbxproj
  • app/HDZap/Models/LapAnnouncer.swift
  • app/HDZap/Models/Speech/PremiumSpeechSynthesizer.swift
  • app/HDZap/Models/Speech/TTSCache.swift
  • app/HDZap/Resources/StoreKit/HDZapPremium.storekit
  • app/HDZap/Views/Settings/AudioSettingsView.swift
  • app/HDZap/Views/Settings/PaywallView.swift
  • app/HDZap/Views/Settings/PremiumVoicePickerView.swift
  • app/project.yml
  • workers/hdzap-premium/src/index.ts

Comment thread app/HDZap/Models/LapAnnouncer.swift
Comment thread app/HDZap/Models/Speech/PremiumSpeechSynthesizer.swift
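
The inline comment on PremiumSpeechSynthesizer.swift asks that the 429 retry not run once the prewarm has been cancelled. The real fix is in Swift (let `Task.sleep` throw, or check `Task.isCancelled`); the sketch below shows the same control flow in TypeScript, with a plain `CancelToken` object standing in for Swift's task cancellation. `fetchWithOneRetry`, `CancelToken`, and `HttpResult` are hypothetical names, and `send` and `sleep` are injected so the example has no runtime dependencies.

```typescript
// Hypothetical stand-in for Swift's Task cancellation state.
interface CancelToken {
  cancelled: boolean;
}

interface HttpResult {
  status: number;
  body: string;
}

// Retries once after a 429, but never re-sends if the caller cancelled
// while this function was waiting out the back-off.
async function fetchWithOneRetry(
  send: () => Promise<HttpResult>,
  sleep: (ms: number) => Promise<void>,
  token: CancelToken,
): Promise<HttpResult> {
  const first = await send();
  if (first.status !== 429) return first;

  await sleep(1000);
  // Without this check a cancellation during the sleep is swallowed and
  // the retry request still goes out, which is the bug the review flags.
  if (token.cancelled) throw new Error("cancelled");
  return send();
}
```

The check sits between the back-off and the second request because that is the only window in which a cancellation can arrive unnoticed.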
@Saqoosha Saqoosha force-pushed the remove-cartesia-provider branch 2 times, most recently from 78ff8aa to f0cf7ba Compare May 22, 2026 04:00
- PremiumVoiceProvider enum loses the .cartesia case. PremiumVoiceCatalog drops 25 Cartesia voices (22 JA + 3 EN); Polly (14) + Azure (16) = 30 voices total, plenty for race-announcer / friendly-narrator personas across US/UK/AU accents
- Worker drops the proxyCartesia function, the CARTESIA_API_KEY env binding, the ALLOWED_CARTESIA_MODELS allow-list, the cartesia branch in the /tts handler, and the cartesia / SSE special-cases in contentTypeFor / sampleRateFor / responseHeadersFor. The Cartesia model field is removed from buildCacheKey; R2 key prefix bumps v2 -> v3 to invalidate the old entries
- iOS drops parseSSE + handleEventJSON (SSE was Cartesia-only — Polly + Azure stream raw PCM directly). prefetch + sendAndStream collapse to a single Polly/Azure raw-PCM path. accumulatedPCM, currentSampleRate, and the resampler logic stay (Polly is still 16 kHz, needs upsample). TTSCache key drops the trailing model segment and bumps v5 -> v6
- All Cartesia comments / labels / hint strings removed from PaywallView, PremiumVoicePickerView, AudioSettingsView, LapAnnouncer, and the HDZapPremium.storekit product descriptions ('30+ voices across AWS Polly and Microsoft Azure'). Sample voice teaser on paywall is now 2-row (Polly + Azure) instead of 3-row
- Motivation: Cartesia's per-IP rate limit on parallel SSE bursts kept dropping prewarm prefetches (observed 8 of 14 phrases 429'd at countdownStartSeconds=15 with the 3-concurrent cap from PR #79). Polly + Azure are both more permissive on TPS, so removing Cartesia eliminates the rate-limit class of bugs entirely while retaining a strong race-announcer voice catalog
- Code simplification: ~450 lines removed across Worker + iOS, no SSE parsing, no model-field special-cases, no Cartesia-specific UI disclaimers. Lower-surface-area Premium TTS for the remaining roadmap work

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Saqoosha Saqoosha force-pushed the remove-cartesia-provider branch from f0cf7ba to 25234da Compare May 22, 2026 08:21
Saqoosha and others added 2 commits May 22, 2026 17:51
…lyphase tail

Polly 16 kHz countdown utterances were truncating ~10-20 ms at the end because
the one-shot resampler input block returned .noDataNow instead of .endOfStream.
The polyphase upsampler's FIR-filter tail stayed buffered inside AVAudioConverter
and never reached the output buffer. Azure (24 kHz native) bypasses the converter
so was unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- PremiumVoicePickerView: selecting a premium voice now also flips
  `ttsEngine` to "premium" (both on direct tap when entitled and on
  post-purchase auto-commit). Previously the selectedId was written
  but the router kept routing through System voice, so the picker
  looked like it did nothing.
- AudioSettingsView: the Voice section's Reset button no longer
  touches Announcement-section keys (master toggle, announce-best,
  countdown enable/start). It now also resets ttsEngine and the
  premium voice / rate / pitch so a single tap returns the entire
  Voice section to defaults, while leaving the Announcement section
  untouched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Saqoosha Saqoosha force-pushed the remove-cartesia-provider branch from 25234da to 01ba91c Compare May 22, 2026 08:52
…rsions

Two defensive hardenings in `buildOverlapBuffer` that protect against the
exact silent-failure mode the prior `.endOfStream` fix targeted, in case
AVAudioConverter behaves at the edges of its documented contract.

- Raise `outputCapacity` headroom from +64 to +1024 frames. The FIR group
  delay for 16 kHz → 24 kHz upsample is ~300-600 output samples; +64
  (~2.7 ms) leaves no margin if Apple's converter ever needs slightly
  more room after `.endOfStream` than the prior frame ratio suggested.
  +1024 (~43 ms) costs ~2 KB per utterance and gives a 2× safety factor.
- Throw on `outputBuffer.frameLength == 0`. `convert()` can return a
  non-error status while still producing zero frames; scheduling such a
  buffer on AVAudioPlayerNode is a silent no-op (no completion callback
  path that signals failure). The throw lets `speakOverlap`'s catch set
  `lastError` so the missing audio is visible in the Settings banner /
  dev panel instead of an unexplained silent countdown beat.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
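
The headroom figures in the commit message above can be checked with a little arithmetic. This is a sketch under stated assumptions: the 16 kHz and 24 kHz rates come from the commit message, while 2 bytes per frame (16-bit mono) is an assumption chosen because it matches the quoted ~2 KB cost; `headroomMs` and `outputCapacity` are hypothetical helper names.

```typescript
const INPUT_RATE = 16000;  // Polly PCM sample rate, per the commit message
const OUTPUT_RATE = 24000; // upsample target, per the commit message
const BYTES_PER_FRAME = 2; // assumption: 16-bit mono output

// Duration covered by `frames` output frames, in milliseconds.
function headroomMs(frames: number): number {
  return (frames / OUTPUT_RATE) * 1000;
}

// Output buffer size for a one-shot resample: ratio-scaled input plus headroom.
function outputCapacity(inputFrames: number, headroom: number): number {
  return Math.ceil((inputFrames * OUTPUT_RATE) / INPUT_RATE) + headroom;
}

const oldHeadroomMs = headroomMs(64);          // about 2.7 ms
const newHeadroomMs = headroomMs(1024);        // about 42.7 ms
const headroomBytes = 1024 * BYTES_PER_FRAME;  // 2048 bytes
```

Against the quoted FIR tail of roughly 300 to 600 output samples, 64 frames of headroom is below the low end of that range, while 1024 frames exceeds the high end by a factor of about 1.7.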
@Saqoosha Saqoosha merged commit eaef49a into develop May 22, 2026
1 check passed
@Saqoosha Saqoosha deleted the remove-cartesia-provider branch May 22, 2026 09:09
Saqoosha added a commit that referenced this pull request May 22, 2026
…ning

Build-number-only bump — MARKETING_VERSION stays at 1.1.0 so the build
ships to existing 1.1.0 beta testers via Apple's fast-track build
review, without re-running beta-review approval.

Includes PR #80 in develop: Cartesia provider removed, Polly countdown
tail truncation fixed (.endOfStream flush), voice picker engine flip
on selection, Voice section Reset scope narrowed, AVAudioConverter
defensive zero-frame guard + wider FIR tail headroom.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>