Skip to content

Audio Data Support (Similar to the ImageArray) #196

@OlgaOvcharenko

Description

@OlgaOvcharenko

Problem Statement

Currently, audio data is not supported. Therefore, it would be great if semantic operators supported audio input (.wav, .mp4, .mp3).

Proposed Solution

A possible solution would be to create the AudioArray class (similar to the existing AudioArray). A possible model that supports audio is GPT-4o.

Use Cases

A specific use case is, for instance, semantic filtering based on sound, generation of a label from the sound, e.g., for the multi-modal emotion recognition, animal sound detection, or multi-modal electronic health records that include relational tables, images, and audio.

Alternative Solutions

I tried to use ImageArray, but it does not accept .wav files.

Additional Context

Checklist

  • I have searched existing issues to avoid duplicates
  • I have provided a clear problem statement
  • I have considered alternative solutions
  • I have assessed the impact and priority
  • I am willing to contribute to implementation (if applicable)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions