✨(summary) extended support for all video / audio files #1358
FloChehab
commented
May 21, 2026
- Removed constraint on file extension
- Infer audio / video streams from the media with ffmpeg
- Infer the correct processed audio file extension based on actual codec to avoid ffmpeg errors.
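A minimal sketch of how the last two points could work, not the code from this PR. It assumes `ffprobe` is on the PATH; the names `CODEC_TO_EXTENSION`, `probe_streams`, and `pick_audio_extension` are hypothetical, and so is the mapping table and the `.mka` fallback.

```python
import json
import subprocess
from pathlib import Path

# Hypothetical codec -> extension table; the mapping used in the PR may differ.
CODEC_TO_EXTENSION = {
    "aac": ".m4a",
    "mp3": ".mp3",
    "opus": ".opus",
    "vorbis": ".ogg",
    "flac": ".flac",
    "pcm_s16le": ".wav",
}


def probe_streams(path: Path) -> list[dict]:
    """Ask ffprobe for the streams of a media file, as a list of dicts."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", str(path)],
        capture_output=True,
        check=True,
    )
    return json.loads(result.stdout).get("streams", [])


def pick_audio_extension(streams: list[dict], default: str = ".mka") -> str:
    """Return an extension matching the first audio stream's codec.

    Falls back to `default` (assumed here: Matroska audio) for unknown codecs
    and raises if the media has no audio stream at all.
    """
    for stream in streams:
        if stream.get("codec_type") == "audio":
            return CODEC_TO_EXTENSION.get(stream.get("codec_name"), default)
    raise ValueError("no audio stream found")
```

Keeping the probing and the extension choice in separate functions means the mapping can be checked without a media file or an ffmpeg install.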
7c99781 to 571fe7f
cameledev
left a comment
Looks mostly good to me.
One minor structure question/suggestion.
Otherwise, just naming stuff
Thanks for the review!
lebaudantoine
left a comment
I think the current structure evolved in a way that makes the file service harder to extend safely.
My initial choice was to encapsulate file utilities inside a service class to support interchangeability of implementations. However, in practice we’ve ended up continuously modifying the same implementation for new features, which increases regression risk for v1 and reduces flexibility.
In hindsight, it may have been better to introduce a stable interface with separate implementations (e.g. v1/v2) and a factory or dependency injection layer to select between them, instead of evolving a single service over time.
We already adapt the behavior of the service with if logic based on the file url/path.
This would likely have preserved v1 behavior while allowing faster iteration on v2 without affecting existing logic.
We can’t assume all summary callers will require the same file service implementation. This suggests we may need a more explicit abstraction (e.g. interface + multiple implementations) rather than a single tightly coupled service.
As a general rule, I think we should try to keep clear boundaries between implementations rather than mixing extension logic into a single evolving class.
I think we should clarify the ownership boundaries between module-level utilities and service classes. Right now the split is not always consistent, which makes it harder to understand where new logic should live.
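To make the suggestion concrete, here is a rough sketch of the interface + multiple implementations + factory shape. Every name in it (`FileService`, `FileServiceV1`, `FileServiceV2`, `prepare_audio`, `get_file_service`) is hypothetical, and the method bodies are placeholders, not the project's actual logic.

```python
from typing import Protocol


class FileService(Protocol):
    """Stable interface that callers depend on (hypothetical)."""

    def prepare_audio(self, source: str) -> str: ...


class FileServiceV1:
    """Frozen behavior: only a fixed set of extensions is accepted."""

    ALLOWED = (".mp3", ".wav")  # illustrative list, not the real one

    def prepare_audio(self, source: str) -> str:
        if not source.lower().endswith(self.ALLOWED):
            raise ValueError(f"unsupported extension: {source}")
        return source


class FileServiceV2:
    """Iterated freely: no extension constraint."""

    def prepare_audio(self, source: str) -> str:
        # Placeholder for stream probing and audio extraction.
        return source


_IMPLEMENTATIONS = {"v1": FileServiceV1, "v2": FileServiceV2}


def get_file_service(version: str = "v1") -> FileService:
    """Factory: choose an implementation without touching the others."""
    try:
        return _IMPLEMENTATIONS[version]()
    except KeyError:
        raise ValueError(f"unknown file service version: {version}") from None
```

With this shape, a caller that needs the old behavior keeps asking for "v1", and new features only ever land in `FileServiceV2`.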
transcription_res = WhisperXResponse(
    **transcribe_audio(  # type: ignore
        task_id=job_id,
        cloud_storage_url=payload.cloud_storage_url,
        language=payload.language,
        raises=True,
    ).model_dump()
)
Not related to this PR, but can't transcribe_audio return a WhisperXResponse instance directly?
There is indeed work to be done around using pydantic models, especially in the helpers by @cameledev. I left that for another PR.
except BaseException as e:
    if isinstance(e, FileNotFoundError):
        logger.error("ffmpeg not found. Please install ffmpeg.")
    elif isinstance(e, subprocess.CalledProcessError):
        logger.error("Audio extraction failed: %s", e.stderr.decode())
    else:
        logger.error("Unexpected error during audio extraction: %s", e)

    if output_path.exists():
        os.remove(output_path)
    raise RuntimeError("Failed to extract audio from file") from e
In terms of Python style, I found the initial syntax with several except clauses easier to scan/parse.
Maybe extract the cleanup into a function?
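Something along these lines, as a sketch only. `extract_audio` and `_cleanup_and_wrap` are hypothetical names and the ffmpeg command is passed in rather than built here. One difference to be aware of: this catches `Exception` rather than `BaseException`, so a `KeyboardInterrupt` would skip the cleanup.

```python
import logging
import subprocess
from pathlib import Path

logger = logging.getLogger(__name__)


def _cleanup_and_wrap(output_path: Path) -> RuntimeError:
    """Delete a partially written output file and build the error to raise."""
    output_path.unlink(missing_ok=True)
    return RuntimeError("Failed to extract audio from file")


def extract_audio(command: list[str], output_path: Path) -> Path:
    """Run an extraction command; on any failure, clean up and re-raise."""
    try:
        subprocess.run(command, check=True, capture_output=True)
    except FileNotFoundError as e:
        logger.error("ffmpeg not found. Please install ffmpeg.")
        raise _cleanup_and_wrap(output_path) from e
    except subprocess.CalledProcessError as e:
        logger.error("Audio extraction failed: %s", e.stderr.decode())
        raise _cleanup_and_wrap(output_path) from e
    except Exception as e:
        logger.error("Unexpected error during audio extraction: %s", e)
        raise _cleanup_and_wrap(output_path) from e
    return output_path
```

Each branch stays a two-liner, and the cleanup lives in one place.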
I think this should be reworked as part of a bigger cleanup of those file handles.
Regarding your big comment on the file service, @lebaudantoine: in #1362 I get rid of the old "read from minio" logic and other stuff from the file service. From my PoV, summary was considered somewhat experimental with regard to the other hosters in the community (there is a bunch of forced posthog in the config, for instance).
- Removed constraint on file extension
- Infer audio/video streams from the media with ffmpeg
- Infer the correct processed audio file extension based on actual codec to avoid ffmpeg errors

We need to support more extensions and make audio extraction dynamic, as we shipped transcript in production and it led to user complaints requesting more formats.
1392009 to 919b7f3