docs: add whisper-webui #2786
Open: fnalways wants to merge 3 commits into beclab:main from fnalways:docs/feat/add-whisper-webui
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,147 @@ | ||
| --- | ||
| outline: [2, 3] | ||
| description: Learn how to use Whisper-WebUI on Olares for speech-to-text transcription, subtitle generation, real-time recording, subtitle translation, and vocal separation across 96 languages. | ||
| head: | ||
| - - meta | ||
| - name: keywords | ||
| content: Olares, Whisper-WebUI, speech to text, transcription, subtitles, AI, self-hosted, vocal separation | ||
| app_version: "1.0.3" | ||
| doc_version: "1.0" | ||
| doc_updated: "2026-04-10" | ||
| --- | ||
|
|
||
| # Transcribe audio and video with Whisper-WebUI | ||
|
|
||
| Whisper-WebUI is an open-source speech-to-text tool powered by OpenAI's Whisper model, supporting 96 languages. It accepts audio files, video files, YouTube links, and live microphone input, and generates timestamped subtitles in formats like SRT and TXT. Beyond transcription, it can also translate subtitle files and separate vocals from background music. | ||
|
|
||
| ## Install Whisper-WebUI | ||
|
|
||
| 1. Open Market and search for "Whisper-WebUI". | ||
| {width=80%} | ||
|
|
||
| 2. Click **Get**, then **Install**, and wait for installation to complete. | ||
|
|
||
| ## Understand the basics | ||
|
|
||
| ### Main workflows | ||
|
|
||
| Whisper-WebUI includes five main tabs. Each tab is designed for a different workflow. | ||
|
|
||
| | Tab | Input | Output | Best for | | ||
| | :--- | :--- | :--- | :--- | | ||
| | File | Local audio/video files | Transcripts & subtitle files | Generating subtitles for podcasts, interviews, or local media. | | ||
| | YouTube | YouTube video URL | Transcripts & subtitle files | Transcribing online videos without downloading them first. | | ||
| | Mic | Microphone recording | Transcripts from recorded audio | Dictating voice notes, spoken drafts, or short speeches. | | ||
| | T2T Translation | Subtitle files | Translated subtitle files | Localizing video content into other languages. | | ||
| | BGM Separation | Audio files | Isolated vocal & instrumental tracks | Removing background music for better transcription, or remixing. | | ||
|
|
||
|  | ||
|
|
||
| ### Choose an output format | ||
|
|
||
| When using the **File**, **YouTube**, or **Mic** tabs, choose the output format based on how you plan to use the result. | ||
|
|
||
| | Format | Best for | | ||
| |:--|:--| | ||
| | SRT | Standard subtitle files for video players and editors. | | ||
| | WebVTT | Web video subtitles and browser-based playback. | | ||
| | TXT | Plain text transcripts without timestamps. | | ||
| | LRC | Synchronized lyrics for music players and audio applications. | | ||
|
|
||
| ### Choose a transcription model | ||
|
|
||
| The **File**, **YouTube**, and **Mic** tabs each let you pick a transcription model. The exact list of available models may vary slightly, but the names follow the same Whisper naming patterns. | ||
|
|
||
| - **Smaller models**: models starting with `tiny` or `base` | ||
| Choose these when you want faster results or have limited hardware resources. | ||
|
|
||
| - **Mid-size models**: models starting with `small` or `medium` | ||
| Choose these for a balance between speed and accuracy. | ||
|
|
||
| - **Larger models**: `large-v1`, `large-v2`, `large-v3`, `large` | ||
| Choose these when accuracy matters more than speed. | ||
|
|
||
| - **Distilled models**: models starting with `distil-` | ||
| Choose these when you want a lighter and faster alternative. | ||
|
|
||
| - **Turbo models**: `large-v3-turbo`, `turbo` | ||
| Choose these for fast transcription, but not for speech-to-English translation. | ||
|
|
||
| - **English-only models**: models ending in `.en` | ||
| Choose these when the source audio is English only. | ||
|
|
||
| - **Multilingual models**: models without `.en` | ||
| Choose these for non-English audio, mixed-language audio, or speech-to-English translation. | ||
|
|
||
| For most transcription tasks, start with `small` or `medium`. If the result is not accurate enough, move to a larger model. | ||
|
|
||
| ## Use Whisper-WebUI | ||
|
|
||
| ### Transcribe local files | ||
|
|
||
| 1. Click the upload area and select an audio or video file. | ||
| 2. Under **Model**, select a transcription model (e.g., `large-v3` for better accuracy). | ||
| 3. Under **Language**, specify the source language. | ||
| 4. Under **File Format**, choose your preferred output format (e.g., SRT). | ||
| 5. Click **GENERATE SUBTITLE FILE**. | ||
|
|
||
| Once complete, preview the result in the left panel and download the subtitle file. | ||
|
|
||
| {width=80%} | ||
|
Review comment (Member, Author): Can we make the screenshot slightly larger?
Review comment (Member, Author): All screenshots in this doc perhaps. |
||
|
|
||
| ### Transcribe YouTube videos | ||
|
|
||
| 1. Paste the YouTube video URL into the input field. Whisper-WebUI automatically detects the video's thumbnail, title, and description. | ||
|
|
||
| {width=80%} | ||
|
|
||
| 2. Under **Model**, select a transcription model. | ||
| 3. Under **Language**, specify the video's language. | ||
| 4. Under **File Format**, choose your preferred output format. | ||
| 5. Adjust additional settings if needed, such as filtering background music from the audio before transcription. | ||
| 6. Click **GENERATE SUBTITLE FILE**. | ||
|
|
||
| Once complete, preview the result in the left panel and download the subtitle file. | ||
|
|
||
| {width=80%} | ||
|
|
||
| ### Record and transcribe with microphone | ||
|
|
||
| 1. Click the record button to start recording. You can pause at any time. | ||
|
|
||
| {width=80%} | ||
|
|
||
| 2. Click **Stop** to end the recording. You can preview and trim the audio. | ||
|
|
||
| {width=80%} | ||
|
|
||
| 3. Select the **Model** and **File Format** for transcription. | ||
| 4. Click **GENERATE SUBTITLE FILE**. | ||
|
|
||
| Once complete, preview and download the transcription. | ||
|
|
||
| ### Translate subtitles | ||
|
|
||
| 1. Upload the subtitle file you want to translate. | ||
| 2. Under the **NLLB** tab, select a translation model. | ||
|
|
||
| {width=80%} | ||
|
|
||
| 3. Set the **Source Language** and **Target Language**. | ||
| 4. Click **Generate Translation File**. | ||
|
|
||
| Once complete, preview and download the translated subtitle file. | ||
|
|
||
| ### Separate vocals from background music | ||
|
|
||
| 1. Upload the audio file you want to process. | ||
| 2. Under **Device**, select a processing device based on your hardware. | ||
| 3. Under **Model**, select a separation model. The default segment size is `256`. | ||
| 4. Click **SEPARATE BACKGROUND MUSIC**. | ||
|
|
||
| {width=90%} | ||
|
|
||
| Once complete, find the separated audio tracks in the Files app at the following paths. | ||
| - Instrumental: | ||
| `External/olares/ai/output/whisperwebui/UVR/instrumental` | ||
| - Vocal: `External/olares/ai/output/whisperwebui/UVR/vocals` | ||
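
A possible illustration for the "Choose an output format" section: the four formats differ mainly in how they write timestamps. The sketch below renders the same segments as SRT, WebVTT, LRC, and TXT using the general conventions of each format. The function names and the `(start, end, text)` segment tuples are assumptions made for illustration. This is not Whisper-WebUI's own code, and the files it produces may differ in detail.

```python
# Minimal sketch: render (start_seconds, end_seconds, text) segments in the
# four output formats listed in the guide. All names here are hypothetical.

def timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (sep is ',' for SRT, '.' for WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    cues = [
        f"{i}\n{timestamp(start, ',')} --> {timestamp(end, ',')}\n{text}\n"
        for i, (start, end, text) in enumerate(segments, start=1)
    ]
    return "\n".join(cues)


def to_vtt(segments) -> str:
    """WebVTT: 'WEBVTT' header, dot before the milliseconds."""
    cues = [
        f"{timestamp(start, '.')} --> {timestamp(end, '.')}\n{text}\n"
        for start, end, text in segments
    ]
    return "\n".join(["WEBVTT\n"] + cues)


def to_lrc(segments) -> str:
    """LRC: one [mm:ss.xx] start tag per line, no end time."""
    lines = []
    for start, _end, text in segments:
        minutes, cs = divmod(round(start * 100), 6000)
        secs, cs = divmod(cs, 100)
        lines.append(f"[{minutes:02d}:{secs:02d}.{cs:02d}]{text}")
    return "\n".join(lines) + "\n"


def to_txt(segments) -> str:
    """TXT: text only, timestamps dropped."""
    return "\n".join(text for _start, _end, text in segments) + "\n"
```

Calling `to_srt([(0.0, 2.5, "Hello")])` and `to_vtt([(0.0, 2.5, "Hello")])` on the same input shows the difference at a glance: SRT numbers each cue and writes `00:00:02,500`, while WebVTT starts with a `WEBVTT` header and writes `00:00:02.500`.
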
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| <!--@include: ../../use-cases/whisper-webui.md--> |
Use a table format perhaps?
And group models by, for example, speed/language
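
One way to prototype that grouping before turning it into a table is to classify model names by the naming patterns the doc already describes (`tiny`/`base`, `small`/`medium`, `large*`, `distil-`, turbo, and the `.en` suffix). The helper below is a hypothetical sketch: `describe_model` and its group labels are made up for this comment and are not part of Whisper-WebUI.

```python
# Minimal sketch: group a Whisper model name by size family and language
# scope, following only the naming patterns stated in the doc.

def describe_model(name: str) -> dict:
    """Return the size family and language scope implied by a model name."""
    english_only = name.endswith(".en")
    base = name[: -len(".en")] if english_only else name

    # Check the more specific patterns first, because turbo and distilled
    # names can also contain "large".
    if base.startswith("distil-"):
        family = "distilled"
    elif base in ("turbo", "large-v3-turbo"):
        family = "turbo"
    elif base.startswith(("tiny", "base")):
        family = "smaller"
    elif base.startswith(("small", "medium")):
        family = "mid-size"
    elif base.startswith("large"):
        family = "larger"
    else:
        family = "unknown"

    return {
        "family": family,
        "language": "English-only" if english_only else "multilingual",
    }
```

Running this over the model list shown in the app would give the rows for a table with "speed/size" and "language" columns, for example `describe_model("tiny.en")` falls under smaller and English-only, while `describe_model("large-v3-turbo")` falls under turbo and multilingual.
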