AudioContent
gitpavleenbali edited this page Feb 17, 2026 · 2 revisions
The Audio class handles audio data for multimodal AI interactions.
```python
import numpy as np

from pyai.multimodal import Audio

# From a local file
audio = Audio.from_file("recording.mp3")

# From a URL
audio = Audio.from_url("https://example.com/audio.wav")

# From raw bytes
with open("audio.wav", "rb") as f:
    audio = Audio.from_bytes(f.read(), format="wav")

# From a NumPy array: generate one second of a 440 Hz sine tone
samples = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 44100))
audio = Audio.from_numpy(samples, sample_rate=44100)
```

| Property | Type | Description |
|---|---|---|
| `duration` | float | Duration in seconds |
| `sample_rate` | int | Samples per second |
| `channels` | int | Number of channels |
| `format` | str | Audio format |
| `size_bytes` | int | File size in bytes |
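To make the properties listed above concrete, here is a minimal sketch that derives the same five values for a WAV payload using only Python's standard-library `wave` module. It does not use pyai and is not how `Audio` is implemented; the helper names `make_test_wav` and `describe_wav`, and the dictionary keys, are assumptions chosen to mirror the table.

```python
import io
import math
import struct
import wave


def make_test_wav(seconds=1.0, sample_rate=8000, freq=440.0):
    """Build a mono 16-bit PCM WAV file in memory (illustrative input only)."""
    n_frames = int(seconds * sample_rate)
    frames = b"".join(
        struct.pack("<h", int(16000 * math.sin(2 * math.pi * freq * i / sample_rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(sample_rate)
        w.writeframes(frames)
    return buf.getvalue()


def describe_wav(data):
    """Hypothetical helper: compute the table's properties for WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as w:
        sample_rate = w.getframerate()
        return {
            "duration": w.getnframes() / sample_rate,  # frames / rate = seconds
            "sample_rate": sample_rate,
            "channels": w.getnchannels(),
            "format": "wav",
            "size_bytes": len(data),  # header plus sample data
        }


info = describe_wav(make_test_wav())
print(info)
```

Duration is simply the frame count divided by the sample rate, which is why resampling changes the number of samples but not the duration.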
Convert to a different format:

```python
# Convert to MP3
mp3_audio = audio.convert(format="mp3", bitrate=192)

# Convert to WAV
wav_audio = audio.convert(format="wav")

# Convert sample rate
resampled = audio.convert(sample_rate=16000)
```

Trim audio:
```python
# Trim to a specific range
trimmed = audio.trim(start=0.0, end=30.0)  # First 30 seconds

# Trim from start
trimmed = audio.trim(start=5.0)  # Skip first 5 seconds
```

Split into segments:
```python
# Split into 30-second chunks
segments = audio.split(segment_duration=30.0)
```

Save to file:
```python
audio.save("output.mp3")
audio.save("output.wav", format="wav")
```

Transcribe audio to text:
```python
text = audio.transcribe()
print(text)

# With options
result = audio.transcribe(
    language="en",
    timestamps=True,
)
```

Send audio to an agent with `MultimodalContent`:

```python
from pyai.multimodal import MultimodalContent, Audio

content = MultimodalContent()
content.add_text("Please transcribe and summarize this audio:")
content.add_audio(Audio.from_file("meeting.mp3"))

# `agent` is assumed to be an already-configured pyai agent
response = agent.run(content)
```

Pass audio directly to `ask()`:

```python
from pyai import ask
from pyai.multimodal import Audio

audio = Audio.from_file("speech.wav")
response = ask(
    "What language is being spoken and what is the topic?",
    audio=[audio],
)
```

Supported formats:

| Format | Read | Write | Notes |
|---|---|---|---|
| MP3 | ✅ | ✅ | Most common |
| WAV | ✅ | ✅ | Uncompressed |
| M4A | ✅ | ✅ | AAC encoded |
| FLAC | ✅ | ✅ | Lossless |
| OGG | ✅ | ✅ | Open format |
| WebM | ✅ | ✅ | Web optimized |
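The table above can be mirrored in a small lookup for validating user-supplied paths before they are handed to `Audio.from_file`. The sketch below is plain Python and not part of pyai: `SUPPORTED_FORMATS` and `detect_format` are hypothetical names, and it assumes each format uses its conventional file extension.

```python
from pathlib import Path

# Assumption: the formats from the table, keyed by conventional file extension.
SUPPORTED_FORMATS = {"mp3", "wav", "m4a", "flac", "ogg", "webm"}


def detect_format(path):
    """Hypothetical helper: return the lowercase format name or raise ValueError."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio format: {ext or '(none)'}")
    return ext


print(detect_format("meeting.MP3"))
print(detect_format("clips/intro.flac"))
```

Checking the extension is only a first-pass filter; a file's actual encoding can differ from what its name suggests.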
Adjust volume:

```python
# Increase volume
louder = audio.adjust_volume(gain_db=3.0)

# Decrease volume
quieter = audio.adjust_volume(gain_db=-3.0)

# Normalize
normalized = audio.normalize()
```

Convert channels:

```python
# Convert to mono
mono = audio.to_mono()

# Convert to stereo
stereo = audio.to_stereo()

# Extract channel
left_channel = audio.get_channel(0)
```

See also:

- Multimodal-Module - Module overview
- ImageContent - Image handling
- VideoContent - Video handling
- Voice-Module - Real-time voice
Intelligence, Embedded.