Synthesis
gitpavleenbali edited this page Feb 17, 2026 · 2 revisions
The synthesis module converts text to speech using AI voice models.
```python
from pyai.voice import Synthesizer
from pyai.voice.synthesis import SynthesisResult
```

```python
Synthesizer(
    model: str = "tts-1",           # TTS model
    voice: str = "alloy",           # Voice selection
    speed: float = 1.0,             # Speech speed (0.25-4.0)
    response_format: str = "mp3"    # Output format
)
```

| Voice | Description |
|---|---|
| `alloy` | Neutral, balanced |
| `echo` | Warm, conversational |
| `fable` | Expressive, British accent |
| `onyx` | Deep, authoritative |
| `nova` | Friendly, energetic |
| `shimmer` | Clear, professional |
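The constructor parameters and voice table above lend themselves to a small pre-flight check before any audio is requested. The following is a minimal sketch, not part of pyai: `validate_options`, `VOICES`, `MIN_SPEED`, and `MAX_SPEED` are hypothetical names introduced here for illustration, while the voice names and the 0.25-4.0 speed range are copied from this page.

```python
# Hypothetical pre-flight check; none of these names come from pyai.
# Voice names and the speed range are copied from this page.
VOICES = {
    "alloy": "Neutral, balanced",
    "echo": "Warm, conversational",
    "fable": "Expressive, British accent",
    "onyx": "Deep, authoritative",
    "nova": "Friendly, energetic",
    "shimmer": "Clear, professional",
}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_options(voice: str = "alloy", speed: float = 1.0) -> dict:
    """Return the options unchanged if they are valid, else raise ValueError."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {sorted(VOICES)}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed {speed} is outside {MIN_SPEED}-{MAX_SPEED}")
    return {"voice": voice, "speed": speed}


options = validate_options(voice="nova", speed=1.25)
```

The returned dictionary could then be unpacked into the constructor, for example `Synthesizer(**options)`, assuming the constructor accepts these keyword arguments as documented above.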
```python
synthesizer = Synthesizer(voice="nova")

# Generate audio
result = synthesizer.speak("Hello, how can I help you today?")

# Save to file
result.save("greeting.mp3")

# Get bytes
audio_bytes = result.audio_data
```

For longer text, stream the audio to reduce latency:
```python
# play_audio is a placeholder for your own playback function
for chunk in synthesizer.stream("This is a longer message that will be streamed..."):
    play_audio(chunk.data)
```

The result object contains:
```python
result.audio_data   # Raw audio bytes
result.format       # Audio format (mp3, wav, etc.)
result.duration     # Duration in seconds
result.sample_rate  # Sample rate
result.voice        # Voice used
```

Save with an explicit format:
```python
result.save("output.mp3")
result.save("output.wav", format="wav")
result.save("output.ogg", format="opus")
```

```python
# MP3 (default, smallest)
synth = Synthesizer(response_format="mp3")

# Opus (low latency streaming)
synth = Synthesizer(response_format="opus")

# AAC (high quality)
synth = Synthesizer(response_format="aac")

# FLAC (lossless)
synth = Synthesizer(response_format="flac")

# WAV (uncompressed)
synth = Synthesizer(response_format="wav")

# PCM (raw audio)
synth = Synthesizer(response_format="pcm")
```

```python
# Standard quality (faster, cheaper)
synth = Synthesizer(model="tts-1")

# HD quality (higher fidelity)
synth = Synthesizer(model="tts-1-hd")
```

```python
# Slower speech
synth = Synthesizer(speed=0.75)

# Faster speech
synth = Synthesizer(speed=1.5)

# Range: 0.25 to 4.0
```

```python
texts = [
    "Welcome to our service.",
    "How can I assist you today?",
    "Thank you for your patience."
]

results = synthesizer.batch_speak(texts)

for i, result in enumerate(results):
    result.save(f"audio_{i}.mp3")
```

For advanced control (when supported):
```python
ssml_text = """
<speak>
    <emphasis level="strong">Welcome</emphasis> to our service.
    <break time="500ms"/>
    How may I <prosody rate="slow">assist you</prosody> today?
</speak>
"""

result = synthesizer.speak(ssml_text, ssml=True)
```

```python
async def generate_speech_async():
    synth = Synthesizer()

    # Async generation
    result = await synth.speak_async("Hello, world!")

    # Async streaming; play_audio_async is a placeholder for your own playback coroutine
    async for chunk in synth.stream_async("Long text here..."):
        await play_audio_async(chunk)
```

- Voice-Module - Module overview
- VoiceSession - Real-time voice
- Transcription - Speech-to-text
Intelligence, Embedded.