Problem Statement
Semantic operators currently do not support audio data. It would be great if they accepted audio input (.wav, .mp4, .mp3).
Proposed Solution
A possible solution would be to add an AudioArray class, similar to the existing ImageArray. A possible model that supports audio is GPT-4o.
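To make the proposal more concrete, here is a rough sketch of what such a class might look like. It is not an existing API: the class name comes from this request, while `SUPPORTED_EXTENSIONS`, `to_base64`, and the plain-list storage are assumptions for illustration. A real implementation would presumably mirror however ImageArray is built.

```python
import base64
from pathlib import Path

# Extensions named in this request; an assumed set, not a library constant.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".mp4"}


class AudioArray:
    """Hypothetical container for audio file paths (illustrative only)."""

    def __init__(self, paths):
        self._paths = [Path(p) for p in paths]
        # Reject unsupported formats early, before any model call is made.
        for p in self._paths:
            if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
                raise ValueError(f"Unsupported audio format: {p.suffix!r} ({p})")

    def __len__(self):
        return len(self._paths)

    def __getitem__(self, idx):
        return self._paths[idx]

    def to_base64(self, idx):
        """Read one file and return its bytes as base64 text for a model request."""
        return base64.b64encode(self._paths[idx].read_bytes()).decode("ascii")
```

The base64 helper is included because audio-capable models typically receive the file contents encoded in the request, but the exact payload format depends on the model provider.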
Use Cases
Specific use cases include semantic filtering based on sound and generating a label from a sound, e.g., for multi-modal emotion recognition, animal sound detection, or multi-modal electronic health records that combine relational tables, images, and audio.
Alternative Solutions
I tried to use ImageArray, but it does not accept .wav files.
Additional Context
Checklist