diff --git a/authors/assets/boris-github-avatar.jpg b/authors/assets/boris-github-avatar.jpg
new file mode 100644
index 00000000..dbd43dec
Binary files /dev/null and b/authors/assets/boris-github-avatar.jpg differ
diff --git a/authors/boris.md b/authors/boris.md
new file mode 100644
index 00000000..6eb4495b
--- /dev/null
+++ b/authors/boris.md
@@ -0,0 +1,13 @@
+Author: Boris
+Title: Software Engineer
+Description: Boris is a software engineer focused on practical developer
+workflows, automation, and reliable local tooling. He writes guides that turn
+small open-source utilities into reproducible workflows developers can run,
+test, and adapt inside isolated development environments.
+Author Image: /authors/assets/boris-github-avatar.jpg
+Author LinkedIn:
+Author Twitter:
+Company Name:
+Company Description:
+Company Logo Dark:
+Company Logo White:
diff --git a/definitions/20260524_definition_on_device_speech_to_text.md b/definitions/20260524_definition_on_device_speech_to_text.md
new file mode 100644
index 00000000..b6316e00
--- /dev/null
+++ b/definitions/20260524_definition_on_device_speech_to_text.md
@@ -0,0 +1,22 @@
+---
+title: 'On-device speech-to-text'
+description:
+  'On-device speech-to-text converts recorded or live audio into text on the
+  same machine that holds the audio, reducing network exposure and making
+  transcription workflows easier to test in isolated development environments.'
+---
+
+# On-device speech-to-text
+
+## Definition
+
+On-device speech-to-text is a transcription approach where the speech
+recognition engine runs locally on the machine that stores the audio. Instead
+of uploading recordings to a hosted transcription API, the workflow loads a
+local model or SDK, processes the file in place, and writes the transcript back
+to local storage.
+
+This pattern is useful for private recordings, regulated data, offline demos,
+and reproducible development environments. It may still require a license key,
+model file, or vendor SDK, but the audio payload itself does not need to leave
+the workspace during transcription.
diff --git a/guides/20260524_run_picovoice_leopard_transcription_with_sapat_in_daytona.md b/guides/20260524_run_picovoice_leopard_transcription_with_sapat_in_daytona.md
new file mode 100644
index 00000000..a3b4d246
--- /dev/null
+++ b/guides/20260524_run_picovoice_leopard_transcription_with_sapat_in_daytona.md
@@ -0,0 +1,336 @@
+---
+title: 'Run Picovoice Leopard with Sapat in Daytona'
+description:
+  'Build a private, on-device transcription workflow with Sapat, Picovoice
+  Leopard, ffmpeg, and a reproducible Daytona sandbox.'
+date: 2026-05-24
+author: 'Boris'
+tags: ['daytona', 'python', 'speech-to-text', 'sapat']
+---
+
+# Run Picovoice Leopard with Sapat in Daytona
+
+## Introduction
+
+Sapat is a small Python command-line tool that turns video files into text. It
+already handles the boring parts of a transcription workflow: taking an input
+file, converting the audio with `ffmpeg`, calling a selected transcription
+backend, and writing a `.txt` file next to the source media.
+
+This guide shows how to run Sapat with Picovoice Leopard inside a Daytona
+sandbox. Leopard is useful when you want
+[on-device speech-to-text](../definitions/20260524_definition_on_device_speech_to_text.md):
+the audio is processed locally in the workspace instead of being uploaded as
+the request body of a hosted transcription API. Picovoice still requires an
+AccessKey, but the recording stays inside the Daytona environment.
+
+The workflow is intentionally plain. You will create an isolated Daytona
+sandbox, install Sapat with the optional Leopard provider, add your Picovoice
+AccessKey as an environment variable, and run one transcription command against
+a video or audio file.
+
+The main benefit is operational clarity. Hosted speech-to-text APIs are often
+the right choice for large-scale production pipelines, but they make local
+testing harder because the input audio leaves the development environment and
+the request depends on provider-side limits. A local provider lets an engineer
+debug the conversion step, transcript file handling, and source
+audio quality in one place. Daytona adds a clean boundary around that work: the
+workspace can be created, rebuilt, and discarded without changing the host
+machine.
+
+![Sapat Picovoice Leopard workflow](assets/20260524_run_picovoice_leopard_transcription_with_sapat_in_daytona.svg)
+
+## TL;DR
+
+- Create a Daytona sandbox so the transcription workflow is reproducible.
+- Install Sapat with the optional `pvleopard` dependency.
+- Keep `PICOVOICE_ACCESS_KEY` in your shell or `.env`; never commit it.
+- Run `sapat --api leopard` to convert audio and write the transcript.
+- Use the troubleshooting checklist when `ffmpeg`, Python, or the AccessKey is
+  the real blocker.
+
+## Prerequisites
+
+You need:
+
+- A Daytona account and a working `daytona` CLI.
+- Python 3.9 or newer. Picovoice Leopard's Python SDK requires Python 3.9+.
+- `ffmpeg` in the sandbox.
+- A Picovoice AccessKey from the Picovoice Console.
+- A short `.mp4`, `.mp3`, `.wav`, or `.flac` file for validation.
+
+The companion Sapat provider implementation for this guide is available in
+`nibzard/sapat#44`. If you are testing before that PR is merged, fetch the
+branch directly as shown below.
+
+## Step 1: Create a Daytona sandbox
+
+Create a new Daytona sandbox. The exact command depends on how you use
+Daytona, but the CLI supports creating a sandbox directly:
+
+```bash
+daytona create
+```
+
+Once the sandbox is ready, open its terminal. A clean sandbox keeps the
+transcription stack separate from your laptop and makes it easy to repeat the
+steps later.
+
+Install system packages used by Sapat:
+
+```bash
+sudo apt-get update
+sudo apt-get install -y ffmpeg git python3 python3-venv
+```
+
+Confirm the tools are present:
+
+```bash
+python3 --version
+ffmpeg -version | head -n 1
+git --version
+```
+
+If Python is older than 3.9, create the sandbox from a newer Python image or
+install a newer interpreter before continuing.
+
+## Step 2: Clone Sapat and install the Leopard extra
+
+Clone Sapat:
+
+```bash
+git clone https://github.com/nibzard/sapat.git
+cd sapat
+```
+
+If the Picovoice provider PR has not been merged yet, switch to the companion
+branch:
+
+```bash
+git fetch https://github.com/kpoxo6op/sapat.git \
+  codex/leopard-transcription-provider
+git checkout FETCH_HEAD
+```
+
+Create and activate a virtual environment:
+
+```bash
+python3 -m venv .venv
+. .venv/bin/activate
+python -m pip install --upgrade pip
+```
+
+Install Sapat with Leopard support:
+
+```bash
+pip install -e ".[leopard]"
+```
+
+This installs the normal Sapat dependencies plus `pvleopard`, the official
+Picovoice Leopard Python SDK. The provider is lazy-loaded, so the rest of Sapat
+can still run without `pvleopard` when you choose `--api openai`, `--api groq`,
+or `--api azure`.
+
+Check that the `sapat` CLI lists the new backend:
+
+```bash
+sapat --help
+```
+
+The API option should include `leopard`:
+
+```text
+--api [openai|groq|azure|leopard]
+```
+
+## Step 3: Configure Picovoice Leopard
+
+Set your Picovoice AccessKey in the shell:
+
+```bash
+export PICOVOICE_ACCESS_KEY="paste-your-access-key-here"
+```
+
+For repeat runs inside the same repo, you can also create a local `.env` file.
+Do not commit this file.
+
+```bash
+cat > .env <<'EOF'
+PICOVOICE_ACCESS_KEY=paste-your-access-key-here
+PICOVOICE_LEOPARD_ENABLE_PUNCTUATION=true
+PICOVOICE_LEOPARD_ENABLE_DIARIZATION=false
+EOF
+```
+
+Optional settings:
+
+```bash
+export PICOVOICE_LEOPARD_MODEL_PATH="/absolute/path/to/custom-model.pv"
+export PICOVOICE_LEOPARD_DEVICE="best"
+export PICOVOICE_LEOPARD_ENABLE_PUNCTUATION="true"
+export PICOVOICE_LEOPARD_ENABLE_DIARIZATION="false"
+```
+
+Use `PICOVOICE_LEOPARD_MODEL_PATH` when you have a custom `.pv` model from
+Picovoice Console. Use `PICOVOICE_LEOPARD_DEVICE=best` when your sandbox or
+workstation has more than one possible execution device and you want the SDK to
+select the best available target.
+
+Leopard supports several languages through model files. If your recording is
+not in English, download or create the correct model in Picovoice Console and
+point `PICOVOICE_LEOPARD_MODEL_PATH` at that `.pv` file. Keeping the model path
+explicit is also useful in a team setting because every developer can see which
+model was used for a transcript.
+
+## Step 4: Add a media file
+
+Upload a short test recording to `~/Downloads` in the sandbox, then copy it:
+
+```bash
+mkdir -p samples
+cp ~/Downloads/standup-recording.mp4 samples/standup-recording.mp4
+```
+
+If you only want to verify that the conversion path works, generate a tiny MP3
+file with `ffmpeg`:
+
+```bash
+ffmpeg -f lavfi -i sine=frequency=880:duration=2 \
+  -ar 44100 -ac 1 samples/tone.mp3
+```
+
+The tone file will not produce a useful transcript, but it can confirm that
+`ffmpeg`, paths, and the Leopard provider are wired correctly. For a real
+transcript, use a file with spoken audio.
+
+## Step 5: Run Sapat with Leopard
+
+Run Sapat against a video file:
+
+```bash
+sapat samples/standup-recording.mp4 --api leopard --quality M
+```
+
+Sapat will:
+
+1. Convert `samples/standup-recording.mp4` to
+   `samples/standup-recording.mp3`.
+2. Initialize Picovoice Leopard with your AccessKey and optional settings.
+3. Process the MP3 locally.
+4. Write `samples/standup-recording.txt`.
+5. Remove the temporary MP3 file.
+
+Review the transcript:
+
+```bash
+sed -n '1,120p' samples/standup-recording.txt
+```
+
+For directories, Sapat processes each `.mp4` file:
+
+```bash
+sapat samples --api leopard --quality M
+```
+
+That is useful when you have a folder of meeting recordings and want one text
+file per video.
+
+Before batch processing a folder of long recordings, transcribe one short
+sample. That confirms the AccessKey, model, and language settings before you
+spend time on every file. It also gives you a quick quality check: if the
+transcript misses names or product terms, create a custom Leopard model or
+choose a clearer audio source before running the full set.
+
+## When to use this workflow
+
+Use the Leopard backend when privacy, offline behavior, or repeatable local
+testing matters more than using a hosted transcription endpoint. Examples
+include internal engineering standups, customer-call excerpts that cannot leave
+your controlled workspace, and regression fixtures where a test should not
+depend on a remote API being available.
+
+Use a hosted provider when you need a managed service, centralized billing,
+very large batch throughput, or an API feature that Leopard does not provide
+for your use case. Sapat keeps those choices behind the same `--api` flag, so
+teams can use Leopard for local validation and switch to OpenAI, Groq, or Azure
+for another environment without changing the rest of the workflow.
+
+## Step 6: Validate and capture evidence
+
+Before sharing the workflow with a teammate, capture a short validation log:
+
+```bash
+python -m unittest discover -s tests -v
+sapat --help
+ls -lh samples/*.txt
+```
+
+For a content or code contribution, include:
+
+- The Sapat provider PR link.
+- The exact Sapat command used.
+- The Python version.
+- Whether the input was a real speech sample or a generated smoke-test file.
+- Confirmation that no `.env` file or AccessKey was committed.
+
+## Common Issues and Troubleshooting
+
+**Problem:** `RuntimeError: Picovoice Leopard support requires pvleopard.`
+
+**Solution:** Install the optional dependency in the active virtual environment:
+
+```bash
+pip install -e ".[leopard]"
+```
+
+**Problem:** `PICOVOICE_ACCESS_KEY is required for --api leopard.`
+
+**Solution:** Export the key in the terminal or add it to an uncommitted `.env`
+file:
+
+```bash
+export PICOVOICE_ACCESS_KEY="paste-your-access-key-here"
+```
+
+**Problem:** `ffmpeg` is missing.
+
+**Solution:** Install it in the sandbox:
+
+```bash
+sudo apt-get update
+sudo apt-get install -y ffmpeg
+```
+
+**Problem:** The transcript is empty.
+
+**Solution:** Confirm that the sample contains spoken audio. A generated tone
+file is useful for smoke testing, but it will not produce meaningful words.
+Also check whether the selected Leopard model matches the language of the
+recording.
+
+**Problem:** Leopard cannot load the custom model.
+
+**Solution:** Use an absolute path for `PICOVOICE_LEOPARD_MODEL_PATH` and make
+sure the `.pv` file is available inside the Daytona sandbox, not just on your
+local machine.
+
+## Conclusion
+
+Sapat plus Picovoice Leopard gives you a small, repeatable transcription
+workflow that keeps audio processing local to the Daytona sandbox. The command
+surface stays the same as for the hosted providers: install the backend, set the
+environment variable, and choose `--api leopard`.
+
+This pattern works well for internal meetings, private demos, and test fixtures
+where uploading recordings to a hosted transcription endpoint is unnecessary.
+Because the setup lives in an isolated sandbox, you can rebuild it, hand it to a
+teammate, or attach it to a pull request without depending on whatever happens
+to be installed on your laptop.
+
+## References
+
+- [Sapat repository](https://github.com/nibzard/sapat)
+- [Companion Sapat Picovoice provider PR](https://github.com/nibzard/sapat/pull/44)
+- [Picovoice Leopard Python Quick Start](https://picovoice.ai/docs/quick-start/leopard-python/)
+- [Picovoice Leopard Python API](https://picovoice.ai/docs/api/leopard-python/)
+- [Daytona Getting Started](https://www.daytona.io/docs/getting-started)
diff --git a/guides/assets/20260524_run_picovoice_leopard_transcription_with_sapat_in_daytona.svg b/guides/assets/20260524_run_picovoice_leopard_transcription_with_sapat_in_daytona.svg
new file mode 100644
index 00000000..5a27e6b2
--- /dev/null
+++ b/guides/assets/20260524_run_picovoice_leopard_transcription_with_sapat_in_daytona.svg
@@ -0,0 +1,32 @@
[SVG markup not recoverable. Figure "Sapat Picovoice Leopard transcription workflow in Daytona": inside a Daytona sandbox (isolated Python workspace with ffmpeg, Sapat, and the optional pvleopard SDK), video file (meeting.mp4) → ffmpeg (MP3 conversion) → Leopard (local STT engine) → text (.txt); labels `sapat meeting.mp4 --api leopard` and `PICOVOICE_ACCESS_KEY=...`.]
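The guide's troubleshooting section comes down to a handful of checks: Python 3.9 or newer, `ffmpeg` on `PATH`, `pvleopard` importable, `PICOVOICE_ACCESS_KEY` set, and an absolute `PICOVOICE_LEOPARD_MODEL_PATH` that exists inside the sandbox. The sketch below runs those checks before `sapat --api leopard` is invoked. `preflight` is a hypothetical helper, not part of Sapat or the Picovoice SDK. It reads only the mapping it is given (for example `os.environ`), so a key that lives only in `.env` would be reported as missing unless that file has been loaded first.

```python
# Hypothetical preflight check for `sapat --api leopard`; not part of Sapat.
# It mirrors the guide's troubleshooting checklist and only inspects the
# interpreter version, PATH, installed modules, and the given environment.
import importlib.util
import os
import shutil
import sys
from pathlib import Path
from typing import Callable, List, Mapping, Optional, Tuple


def _module_installed(name: str) -> bool:
    """True if `name` can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None


def preflight(
    env: Mapping[str, str],
    version: Tuple[int, int] = tuple(sys.version_info[:2]),
    which: Callable[[str], Optional[str]] = shutil.which,
    has_module: Callable[[str], bool] = _module_installed,
) -> List[str]:
    """Return human-readable problems; an empty list means ready to run."""
    problems: List[str] = []
    if version < (3, 9):
        problems.append("Python 3.9+ is required by pvleopard")
    if which("ffmpeg") is None:
        problems.append("ffmpeg is not on PATH")
    if not has_module("pvleopard"):
        problems.append('pvleopard is missing; run pip install -e ".[leopard]"')
    if not env.get("PICOVOICE_ACCESS_KEY"):
        problems.append("PICOVOICE_ACCESS_KEY is not set")
    # The custom model is optional, but when set it must be an absolute path
    # to a file that exists inside the sandbox, not only on the laptop.
    model_path = env.get("PICOVOICE_LEOPARD_MODEL_PATH")
    if model_path:
        path = Path(model_path)
        if not path.is_absolute():
            problems.append("PICOVOICE_LEOPARD_MODEL_PATH should be absolute")
        elif not path.is_file():
            problems.append(f"model file not found in the sandbox: {model_path}")
    return problems


if __name__ == "__main__":
    # Print one line per problem found in the current shell environment.
    for problem in preflight(os.environ):
        print(f"- {problem}")
```

The version, `PATH` lookup, and module lookup are passed in as parameters so the checks can be exercised without `ffmpeg` or `pvleopard` installed; running the file from the activated virtual environment uses the real ones and prints nothing when the sandbox is ready.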