Point RT quickstart to new Python client #228
Merged
8 commits:

- `9bed98f` Point quickstart to new Python client
- `d0f0a04` address rt spelling changes
- `fe591b5` Match batch and rt steps
- `2d3a409` uplift rt JS docs
- `767c3dd` rename extension
- `43d0b3b` point quickstart to new mjs module (J-Jaywalker)
- `1c3ca30` Update docs/speech-to-text/realtime/quickstart.mdx (J-Jaywalker)
- `bea34ab` fix quickstart missing quote mark (J-Jaywalker)
**docs/speech-to-text/realtime/assets/javascript-realtime-example.mjs** (new file, 58 additions, 0 deletions)
```javascript
import { spawn } from "node:child_process";
import { createSpeechmaticsJWT } from "@speechmatics/auth";
import { RealtimeClient } from "@speechmatics/real-time-client";

const apiKey = "YOUR_API_KEY";
const client = new RealtimeClient();

const audio_format = {
  type: "raw",
  encoding: "pcm_s16le",
  sample_rate: 44100,
};

async function transcribe() {
  client.addEventListener("receiveMessage", ({ data }) => {
    if (data.message === "AddTranscript") {
      const transcript = data.metadata?.transcript;
      if (transcript) console.log(`[Final]: ${transcript}`);
    } else if (data.message === "Error") {
      console.error(`Error [${data.type}]: ${data.reason}`);
      process.exit(1);
    }
  });

  const jwt = await createSpeechmaticsJWT({ type: "rt", apiKey, ttl: 60 });

  await client.start(jwt, {
    transcription_config: {
      language: "en",
      max_delay: 0.7,
    },
    audio_format,
  });

  const recorder = spawn("sox", [
    "-d", // default audio device (mic)
    "-q", // quiet
    "-r", String(audio_format.sample_rate), // sample rate
    "-e", "signed-integer", // match pcm_s16le
    "-b", "16", // match pcm_s16le
    "-c", "1", // mono
    "-t", "raw", // raw PCM output
    "-", // pipe to stdout
  ]);

  recorder.stdout.on("data", (chunk) => client.sendAudio(chunk));
  recorder.stderr.on("data", (d) => console.error(`sox: ${d}`));

  process.on("SIGINT", () => {
    recorder.kill();
    client.stopRecognition({ noTimeout: true });
  });
}

transcribe().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
**docs/speech-to-text/realtime/assets/sm-rt-example.py** (new file, 60 additions, 0 deletions)
```python
import asyncio
from speechmatics.rt import (
    AudioEncoding, AudioFormat, AuthenticationError,
    Microphone, ServerMessageType, TranscriptResult,
    TranscriptionConfig, AsyncClient,
)

API_KEY = "YOUR_API_KEY"

# Set up config and format for transcription
audio_format = AudioFormat(
    encoding=AudioEncoding.PCM_S16LE,
    sample_rate=16000,
    chunk_size=4096,
)
config = TranscriptionConfig(
    language="en",
    max_delay=0.7,
)

async def main():
    # Set up microphone
    mic = Microphone(
        sample_rate=audio_format.sample_rate,
        chunk_size=audio_format.chunk_size,
    )
    if not mic.start():
        print("Mic not started — please install PyAudio")
        return

    try:
        async with AsyncClient(api_key=API_KEY) as client:
            # Handle ADD_TRANSCRIPT messages
            @client.on(ServerMessageType.ADD_TRANSCRIPT)
            def handle_finals(msg):
                if final := TranscriptResult.from_message(msg).metadata.transcript:
                    print(f"[Final]: {final}")

            try:
                # Begin transcribing
                await client.start_session(
                    transcription_config=config,
                    audio_format=audio_format,
                )
                while True:
                    await client.send_audio(
                        await mic.read(chunk_size=audio_format.chunk_size)
                    )
            except KeyboardInterrupt:
                pass
            finally:
                mic.stop()

    except AuthenticationError as e:
        print(f"Auth error: {e}")

if __name__ == "__main__":
    asyncio.run(main())
```
**docs/speech-to-text/realtime/quickstart.mdx** (modified, `@@ -1,98 +1,189 @@`)
---
pagination_prev: null
pagination_next: null
description: Learn how to transcribe streaming audio to text in real-time.
---

import Admonition from '@theme/Admonition';
import CodeBlock from '@theme/CodeBlock';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import { Grid } from '@radix-ui/themes';
import { LinkCard } from "@site/src/theme/LinkCard";
import { Users, BookMarked, Zap, Mic, Radio, Clock } from 'lucide-react';

import javascriptRtExample from "./assets/javascript-realtime-example.mjs?raw"
import pythonRtExample from "./assets/sm-rt-example.py?raw"

# Quickstart

:::tip
The quickest way to try Realtime transcription is via the [web portal](https://portal.speechmatics.com/jobs/create/real-time) — no code required.
:::

## Using the Realtime API

The Realtime API streams audio over a WebSocket connection and returns transcript results as you speak. Unlike the [Batch API](/speech-to-text/batch/quickstart), results arrive continuously — within milliseconds of the spoken words.

### 1. Create an API key

[Create an API key in the portal](https://portal.speechmatics.com/settings/api-keys), which you'll use to securely access the API. Store the key as a managed secret.

:::info
Enterprise customers may need to contact [Support](https://support.speechmatics.com) to get their API keys.
:::
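
Under the hood, the client libraries open the WebSocket and send a JSON `StartRecognition` message before streaming audio. Below is a minimal sketch of constructing that message; the field names mirror the `transcription_config` and `audio_format` objects used in this quickstart, but treat the exact schema as an assumption and consult the API reference for the authoritative message format.

```python
import json

# Hypothetical sketch of the first message a Realtime client sends over the
# WebSocket. Field names follow the config objects in the examples below;
# the authoritative schema lives in the Realtime API reference.
start_recognition = {
    "message": "StartRecognition",
    "audio_format": {
        "type": "raw",
        "encoding": "pcm_s16le",
        "sample_rate": 44100,
    },
    "transcription_config": {
        "language": "en",
        "max_delay": 0.7,
    },
}

# Serialize before sending as the first WebSocket text frame
payload = json.dumps(start_recognition)
print(payload[:40])
```

After the server replies with a recognition-started acknowledgement, raw audio chunks are sent as binary frames and transcript messages flow back, as the client libraries below handle for you.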

### 2. Install the library

<Tabs groupId="language">
<TabItem value="python" label="Python">
Install using pip:
```
pip install speechmatics-rt pyaudio
```
:::note
`pyaudio` is required for microphone input in this quickstart.
:::
</TabItem>
<TabItem value="javascript" label="JavaScript">
Install using npm:
```
npm install @speechmatics/real-time-client @speechmatics/auth
```
:::note
This quickstart uses `sox` for microphone input. Install it with `brew install sox` (macOS) or `apt install sox` (Linux).
:::
</TabItem>
</Tabs>

### 3. Run the example

Replace `YOUR_API_KEY` with your key, then run the script.

<Tabs groupId="language">
<TabItem value="python" label="Python">
<CodeBlock language="python">
{pythonRtExample}
</CodeBlock>
</TabItem>
<TabItem value="javascript" label="NodeJS + Sox">
<CodeBlock language="javascript">
{javascriptRtExample}
</CodeBlock>
</TabItem>
</Tabs>

Speak into your microphone. You should see output like:
```
[Final]: Hello, welcome to Speechmatics.
[Final]: This is a real-time transcription example.
```
Press `Ctrl+C` to stop.

## Understanding the output

The API returns two types of transcript results: **Finals** and **Partials**.

**Finals** represent the best transcription for a span of audio and are never updated once emitted.

**Partials** are emitted immediately as audio arrives and may be revised as more context is processed.

| Type | Latency | Stability | Best for |
|------|---------|-----------|----------|
| **Final** | ~0.7–2s | Definitive, never revised | Accurate transcripts, subtitles |
| **Partial** | <500ms | May be revised | Live captions, voice interfaces |

You can combine partials with finals for a responsive user experience — show partials first, then replace them with finals as they arrive.

You control the latency and accuracy tradeoff using the [`max_delay` setting](/speech-to-text/realtime/output#latency) in your `transcription_config`. Larger values of `max_delay` increase accuracy by giving the system more time to process audio context.

## Receiving Finals and Partials

To receive partials, add the following changes and handlers to your code:
<Tabs groupId="language">
<TabItem value="python" label="Python">
```python {4,8-11}
config = TranscriptionConfig(
    language="en",
    max_delay=0.7,
    enable_partials=True,
)

async with AsyncClient(api_key=API_KEY) as client:
    @client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT)
    def handle_partials(msg):
        if partial := TranscriptResult.from_message(msg).metadata.transcript:
            print(f"[Partial]: {partial}")

    @client.on(ServerMessageType.ADD_TRANSCRIPT)
    def handle_finals(msg):
        if final := TranscriptResult.from_message(msg).metadata.transcript:
            print(f"[Final]: {final}")
```
</TabItem>
<TabItem value="javascript" label="JavaScript">
```javascript {5,12-13}
await client.start(jwt, {
  transcription_config: {
    max_delay: 0.7,
    language: "en",
    enable_partials: true,
  },
});

client.addEventListener("receiveMessage", ({ data }) => {
  if (data.message === "AddTranscript") {
    console.log(`[Final]: ${data.metadata.transcript}`);
  } else if (data.message === "AddPartialTranscript") {
    console.log(`[Partial]: ${data.metadata.transcript}\r`);
  }
});
```
</TabItem>
</Tabs>

With both handlers registered, you'll see partials arrive first, followed by the final result:
```
[Partial]: Hello
[Partial]: Hello welcome to
[Final]: Hello, welcome to Speechmatics.
```
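
One way to implement the "show partials first, then replace them with finals" pattern is to keep committed final text separate from a single pending partial. Here is a minimal sketch; the `CaptionBuffer` helper is hypothetical and not part of either SDK:

```python
class CaptionBuffer:
    """Hypothetical caption state: committed finals plus one pending partial."""

    def __init__(self):
        self.finals = []   # final transcripts, never revised once received
        self.partial = ""  # latest partial, replaced on each update

    def on_partial(self, text):
        # Each new partial supersedes the previous one
        self.partial = text

    def on_final(self, text):
        # The final commits the segment and clears the pending partial
        self.finals.append(text)
        self.partial = ""

    def render(self):
        # Committed text first, pending partial (if any) at the end
        return " ".join(self.finals + ([self.partial] if self.partial else []))

buf = CaptionBuffer()
buf.on_partial("Hello")
buf.on_partial("Hello welcome to")
buf.on_final("Hello, welcome to Speechmatics.")
print(buf.render())  # → Hello, welcome to Speechmatics.
```

Wire `on_partial` and `on_final` into the `AddPartialTranscript` and `AddTranscript` handlers shown above to drive a live caption display.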

## Next steps

Now that you have Realtime transcription working, explore these features to build more powerful applications.

<Grid columns={{initial: "1", md: "2"}} gap="3">
<LinkCard
  title="Speaker Diarization"
  description="Identify and label individual speakers in a multi-person conversation"
  href="/speech-to-text/realtime/realtime-diarization"
  icon={<Users/>}
/>
<LinkCard
  title="Custom Dictionary"
  description="Boost accuracy for domain-specific terms, names, and acronyms"
  href="/speech-to-text/features/custom-dictionary"
  icon={<BookMarked/>}
/>
<LinkCard
  title="Turn Detection"
  description="Detect when a speaker finishes their utterance — ideal for voice assistants"
  href="/speech-to-text/realtime/turn-detection"
  icon={<Clock/>}
/>
<LinkCard
  title="Output & Latency"
  description="Fine-tune transcript timing with max_delay and partial transcripts"
  href="/speech-to-text/realtime/output"
  icon={<Zap/>}
/>
<LinkCard
  title="Audio Input"
  description="Supported formats, sample rates, and how to send audio from any source"
  href="/speech-to-text/realtime/input"
  icon={<Mic/>}
/>
<LinkCard
  title="Speaker Identification"
  description="Recognize known speakers by enrolling voice profiles"
  href="/speech-to-text/realtime/speaker-identification"
  icon={<Radio/>}
/>
</Grid>