Fix/realtime tts voice rewire #181

Open
atomer-nvidia wants to merge 8 commits into nvidia-riva:main from atomer-nvidia:fix/realtime-tts-voice-rewire

@atomer-nvidia
Contributor

No description provided.

atomer-nvidia and others added 5 commits May 14, 2026 11:11
Defer the pyaudio import to the points where it is actually needed
(MicrophoneStream.__enter__, SoundCallBack.__init__, list_*_devices,
get_*_info). Default WAV-output flows now work on machines without
PortAudio headers installed. When pyaudio is missing, raise an
ImportError that explicitly tells the user to install portaudio19-dev
first, addressing the VDR finding that fresh-box users got blocked by
a bare ModuleNotFoundError with no install instructions.
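
The deferred-import pattern described above can be sketched roughly as follows. This is an illustration only, not the code in `riva.client.audio_io`: `require_pyaudio`, `PYAUDIO_HINT`, and `MicrophoneStreamSketch` are hypothetical names, and the hint wording is an assumption.

```python
import importlib

# Hypothetical hint text; the real message in the client may be worded differently.
PYAUDIO_HINT = (
    "pyaudio is required for microphone and speaker I/O. Install the PortAudio "
    "headers first (Debian/Ubuntu: `apt-get install -y portaudio19-dev`), "
    "then run `pip install pyaudio`."
)


def require_pyaudio():
    """Import pyaudio on first use and return the module.

    ModuleNotFoundError is a subclass of ImportError, so a missing package
    is re-raised here with install instructions attached.
    """
    try:
        return importlib.import_module("pyaudio")
    except ImportError as exc:
        raise ImportError(PYAUDIO_HINT) from exc


class MicrophoneStreamSketch:
    """Simplified stand-in for a stream class that needs pyaudio only when opened."""

    def __init__(self, rate, chunk):
        # No pyaudio import here, so constructing the object is always safe.
        self.rate = rate
        self.chunk = chunk
        self._pa = None

    def __enter__(self):
        pyaudio = require_pyaudio()  # the import happens only at this point
        self._pa = pyaudio.PyAudio()
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._pa is not None:
            self._pa.terminate()
        return False
```

With this shape, a WAV-output code path that never enters the context manager never touches pyaudio at all.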

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The riva-asr/nmt/tts client scripts historically exit 0 on most error
paths — including "Unavailable model", connection refused, empty/invalid
input, and missing files — which causes CI pipelines composing these
scripts via && chains to silently swallow real failures.

Add a cli_main decorator that translates uncaught exceptions into a
small, consistent set of exit codes:

  2 = bad input (missing/empty file, ValueError, IsADirectoryError)
  3 = gRPC UNAVAILABLE (server down, wrong port, network)
  4 = gRPC INVALID_ARGUMENT / NOT_FOUND (bad model/lang/voice)
  1 = anything else
  130 = SIGINT

The decorator also writes the error to stderr so CI logs surface the
cause rather than the script swallowing it. Follow-up commit wires
this into each client script.
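
A minimal sketch of such a decorator, using the exit codes listed above. It is not the implementation in this PR: the constant names other than `EXIT_BAD_INPUT`, the helper functions, and the duck-typed check for a gRPC status (calling `exc.code()` and reading `.name`, which is how `grpc.RpcError` call objects and the `grpc.StatusCode` enum behave) are assumptions chosen so the example runs without grpc installed.

```python
import functools
import sys

EXIT_OK = 0
EXIT_ERROR = 1
EXIT_BAD_INPUT = 2
EXIT_UNAVAILABLE = 3      # hypothetical constant name
EXIT_BAD_REQUEST = 4      # hypothetical constant name
EXIT_SIGINT = 130


def _grpc_status_name(exc):
    """Return the status name if exc exposes a callable code(), else None."""
    code = getattr(exc, "code", None)
    if not callable(code):
        return None
    return getattr(code(), "name", None)


def _exit_code_for(exc):
    """Map an exception to one of the exit codes listed in the commit message."""
    if isinstance(exc, KeyboardInterrupt):
        return EXIT_SIGINT
    status = _grpc_status_name(exc)
    if status == "UNAVAILABLE":
        return EXIT_UNAVAILABLE
    if status in ("INVALID_ARGUMENT", "NOT_FOUND"):
        return EXIT_BAD_REQUEST
    if isinstance(exc, (FileNotFoundError, IsADirectoryError, ValueError)):
        return EXIT_BAD_INPUT
    return EXIT_ERROR


def cli_main(func):
    """Wrap a main() so uncaught exceptions become exit codes plus a stderr line."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
        except (Exception, KeyboardInterrupt) as exc:
            print(f"error: {type(exc).__name__}: {exc}", file=sys.stderr)
            return _exit_code_for(exc)
        # A main() that returns nothing is treated as success.
        return EXIT_OK if result is None else result

    return wrapper
```

The wrapper returns the code rather than calling `sys.exit` itself, so the decorated function stays easy to test and the script decides when to exit.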

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ation

Address the VDR 26.02 finding that python-clients CLIs exit 0 on most
error paths across all three modalities. Each script now:

  - Wraps main() with @cli_main so gRPC and OS errors propagate to a
    real exit code instead of being printed and swallowed.
  - Calls sys.exit(main()) so the chosen exit code reaches the shell.

Script-specific fixes:

  scripts/nmt/nmt.py
    - Drop the inner request() try/except that swallowed every gRPC
      status; let cli_main translate it. Empty/whitespace --text and
      missing --text-file now return EXIT_BAD_INPUT (was: silent
      exit 0). Document --max-len-variation as decoder-token units
      with valid range [0, 256], default 20, and Arabic chunking note.

  scripts/tts/talk.py
    - Reject whitespace-only --text up front (defense-in-depth pair to
      the server-side fix in riva-speech that closed the hang on
      `--text "   "`). Drop the broad `except Exception` that
      stringified gRPC errors and exited 0.

  scripts/asr/transcribe_file*.py
    - Replace `print(...); return` on missing input files with
      EXIT_BAD_INPUT. Remove the silent grpc.RpcError swallow in
      transcribe_file_offline.py.

  scripts/asr/transcribe_mic.py + realtime_asr_client.py + tts/talk.py
    - Pyaudio install hint now mentions `apt-get install -y
      portaudio19-dev` (Debian/Ubuntu) and `brew install portaudio`
      (macOS), pairing with the prereqs doc landed in documentation_2.

  scripts/tts/realtime_tts_client.py
    - Drop the module-level `from riva.client.audio_io import
      SoundCallBack` import (it was unused and pulled pyaudio in
      eagerly, defeating the lazy import). Drop the broad
      `except Exception` that mapped every failure to exit 1.

  scripts/nmt/nmt_speech_to_{text,speech}.py
    - Drop unused `import grpc`; remove the catch-all that printed
      "Error during translation" and exited 0.
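
As an illustration of the input checks described for the nmt and tts scripts, here is a rough sketch of a `main()` that returns `EXIT_BAD_INPUT` for empty, whitespace-only, or missing input instead of exiting 0. The `load_texts` helper and the argument handling are hypothetical and much simpler than the real scripts.

```python
import argparse
import sys
from pathlib import Path

EXIT_BAD_INPUT = 2


def load_texts(text, text_file):
    """Return the usable input lines; an empty list means there is nothing to send."""
    if text is not None:
        return [text] if text.strip() else []
    if text_file is not None:
        path = Path(text_file)
        if not path.is_file():
            return []
        lines = path.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    return []


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--text")
    parser.add_argument("--text-file")
    args = parser.parse_args(argv)

    texts = load_texts(args.text, args.text_file)
    if not texts:
        print("error: no non-empty input text", file=sys.stderr)
        return EXIT_BAD_INPUT

    for item in texts:
        print(item)  # placeholder for the real request to the server
    return 0

# A script would finish with sys.exit(main()) so the returned code reaches the shell.
```

Returning the code from `main()` and passing it to `sys.exit` at the call site is what lets `&&` chains in CI stop on a bad input.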

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

VDR 26.02 found that realtime_tts_client.py silently ignored --voice and
fell back to the server default (Mia). Tracing the WebSocket flow, the
synthesize_session.update payload was built by deep-mutating the response
from POST /v1/realtime/synthesis_sessions — an InitialSynthesisSessionConfig
that carries id/object/client_secret fields not present in
BaseSynthesisSessionConfig (the type the server validates the update
against). Carrying those keys through to the override, combined with the
shallow .copy() and the _safe_update_config nested-dict mutation, is what
caused the voice_name override to fail to land on published 26.02 NIMs.

Build the update payload explicitly from CLI args instead, so only fields
the user actually overrode reach the server, in the exact shape documented
in the SynthesisSessionUpdateMessage schema. Bump the override summary to
INFO so users can see which fields were sent. After the
synthesize_session.updated response, compare the server-applied voice_name
and language_code against what was requested and log a WARNING on
mismatch — defense-in-depth so any future server-side drop surfaces in the
client log instead of as a wrong-sounding audio file.
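
The approach can be sketched as below. The message envelope (a `type` key plus a `session` object) is an assumption made for illustration and is not taken from the SynthesisSessionUpdateMessage schema; `build_session_update` and `find_mismatches` are hypothetical helper names. Only `voice_name`, `language_code`, and the `synthesize_session.update` event name come from the commit message.

```python
import logging

logger = logging.getLogger("realtime_tts_sketch")


def build_session_update(voice_name=None, language_code=None):
    """Build the update message from explicit overrides only.

    Fields left as None are omitted, so nothing from the session-creation
    response (id, object, client_secret, ...) can leak into the payload.
    """
    requested = {"voice_name": voice_name, "language_code": language_code}
    overrides = {key: value for key, value in requested.items() if value is not None}
    logger.info("Session overrides sent: %s", overrides)
    # Assumed envelope shape; check the server schema before relying on it.
    return {"type": "synthesize_session.update", "session": overrides}


def find_mismatches(requested, applied):
    """Return {field: (requested, applied)} for overrides the server did not apply."""
    mismatches = {}
    for field in ("voice_name", "language_code"):
        if field in requested and applied.get(field) != requested[field]:
            mismatches[field] = (requested[field], applied.get(field))
            logger.warning(
                "Server applied %s=%r but %r was requested",
                field, applied.get(field), requested[field],
            )
    return mismatches
```

A caller would pass the `session` dict it sent and the session object from the `synthesize_session.updated` response to `find_mismatches`, and treat a non-empty result as a sign that the override was dropped.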

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Only import parse_custom_configuration and pass custom_configuration to
synthesize/synthesize_online when --custom-configuration is supplied,
so talk.py keeps working against older riva-client wheels that lack
the function and the kwarg.
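
One way to picture this is a helper that builds the extra keyword arguments only when the flag is present. This is a sketch under assumptions: `optional_synthesis_kwargs`, `load_parser`, and `synthesize_legacy` are hypothetical, and the deferred import is represented by a callable because the commit message does not say which module provides `parse_custom_configuration`.

```python
def optional_synthesis_kwargs(custom_configuration, load_parser):
    """Return extra kwargs for a synthesize call; empty unless the flag was given.

    load_parser is a zero-argument callable that performs the deferred import
    and returns the parsing function. It is invoked only on the flagged path.
    """
    if not custom_configuration:
        return {}
    parse = load_parser()
    return {"custom_configuration": parse(custom_configuration)}


def synthesize_legacy(text, voice_name=None):
    """Stand-in for an older client method that does not accept the new kwarg."""
    return f"{voice_name}:{text}"
```

Calling `synthesize_legacy("hi", voice_name="v", **optional_synthesis_kwargs(None, loader))` passes no unknown keyword, so the older signature keeps working when the flag is absent.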

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

atomer-nvidia and others added 3 commits May 18, 2026 19:08
cli_main and EXIT_BAD_INPUT were added recently in argparse_utils and
are not present in older riva-client wheels. Wrap their imports in a
try/except across all asr/nmt/tts client scripts, falling back to a
no-op decorator and EXIT_BAD_INPUT=2 so the scripts keep running
against older installed wheels (only the structured exit codes are
lost in that case).
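
The fallback could look roughly like this. The import path `riva.client.argparse_utils` is an assumption (the commit message names only `argparse_utils`), and the fallback decorator is the simplest possible no-op.

```python
try:
    # Assumed import path; present only in newer riva-client wheels.
    from riva.client.argparse_utils import EXIT_BAD_INPUT, cli_main
except ImportError:
    # Older wheel (or no wheel): keep the script runnable without structured exit codes.
    EXIT_BAD_INPUT = 2

    def cli_main(func):
        """No-op fallback decorator: return the function unchanged."""
        return func


@cli_main
def main():
    # Placeholder body; a real script would parse arguments and call the server.
    return 0
```

Because `ModuleNotFoundError` subclasses `ImportError`, the same `except` clause covers both a missing package and a package that lacks the two names.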

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix: nmt: surface EXIT_BAD_INPUT when --text-file has no non-empty lines

2 participants