Live-retrieval: fix file upload crash caused by output path conflict between dispatcher and post-processor #269

Open
AZOGOAT wants to merge 1 commit into EPFLiGHT:master from AZOGOAT:fix/file-upload-indexing-crash

@AZOGOAT commented Apr 4, 2026

Member

Summary

Fixes a bug where uploading a file via the live-retrieval API (/v1/files, /v1/files/bulk, /v1/files/{fileId}) crashes with IsADirectoryError.

Bug

When uploading a file via the /v1/files API, the process_files_default function uses the same path (./tmp/my_docs) for two different purposes:

  1. The Dispatcher creates it as a folder to store intermediate processing results (images, JSONL files)
  2. The post-processing pipeline tries to open it as a file to save chunked output

Since the folder already exists when the post-processor runs, it crashes with IsADirectoryError: [Errno 21] Is a directory: './tmp/my_docs'. The file gets saved to ./uploads/ but is never indexed, so retrieval returns nothing.
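The conflict described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `reproduce_conflict` is a hypothetical helper, the directory layout only mirrors the paths seen in the logs, and the exception type assumes a POSIX system (on Windows, opening a directory typically raises `PermissionError` instead).

```python
import os
import tempfile


def reproduce_conflict(base_dir: str) -> str:
    """Use one path first as a directory, then as a file (hypothetical helper)."""
    shared_path = os.path.join(base_dir, "my_docs")

    # Stage 1: a dispatcher-like step creates the path as a directory
    # for its intermediate results.
    os.makedirs(os.path.join(shared_path, "processors", "PDFProcessor"))

    # Stage 2: a post-processor-like step opens the same path as a file.
    try:
        with open(shared_path, "w") as f:
            f.write("{}\n")
    except IsADirectoryError as exc:
        return type(exc).__name__
    return "no error"


with tempfile.TemporaryDirectory() as tmp:
    outcome = reproduce_conflict(tmp)
```

Running the sketch in a temporary directory keeps it from touching a real `./tmp/my_docs`.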

Error logs

[INDEX API 🗂️] Results saved to ./tmp/my_docs/processors/PDFProcessor/results.jsonl
[INDEX API 🗂️] PP Pipeline:
[INDEX API 🗂️] > 1. 🦛 Chunker
🦛 Chunker: 100%|██████████| 1/1 [00:00<00:00, 55.85it/s]
[INDEX API 🗂️] Failed to save samples to ./tmp/my_docs: [Errno 21] Is a directory: './tmp/my_docs'
[INDEX API 🗂️] Error uploading file: [Errno 21] Is a directory: './tmp/my_docs'

  File "/app/src/mmore/utils.py", line 240, in process_files_default
    chunked = pipeline(raw_documents)
  File "/app/src/mmore/process/post_processor/pipeline.py", line 68, in __call__
    return self.run(samples)
  File "/app/src/mmore/process/post_processor/pipeline.py", line 98, in run
    save_samples(samples, self.output_config.output_path)
  File "/app/src/mmore/process/utils.py", line 101, in save_samples
    with open(path, mode) as f:
IsADirectoryError: [Errno 21] Is a directory: './tmp/my_docs'

Fix

Append /results.jsonl to the post-processor output path so that it writes a file inside the existing directory instead of trying to open the directory itself as a file.
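The idea behind the fix can be sketched as follows. This is an illustration under assumptions, not the actual diff: `post_processor_output_path` and `save_samples_sketch` are hypothetical stand-ins, and the sample records are made up for the example.

```python
import json
import os
import tempfile


def post_processor_output_path(dispatcher_dir: str) -> str:
    """Derive a file path inside the dispatcher's directory (hypothetical helper)."""
    return os.path.join(dispatcher_dir, "results.jsonl")


def save_samples_sketch(samples: list, path: str) -> None:
    """Write one JSON object per line; a stand-in for the real save step."""
    with open(path, "w") as f:
        for sample in samples:
            f.write(json.dumps(sample) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    dispatcher_dir = os.path.join(tmp, "my_docs")
    os.makedirs(dispatcher_dir)  # the directory already exists, as in the bug

    output_file = post_processor_output_path(dispatcher_dir)
    save_samples_sketch([{"text": "chunk 1"}, {"text": "chunk 2"}], output_file)

    wrote_file = os.path.isfile(output_file)
    with open(output_file) as f:
        line_count = sum(1 for _ in f)
```

Because the file lives inside the directory, both stages can keep using `./tmp/my_docs` as their common root without colliding.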

@fabnemEPFL

Collaborator

Is this still needed, @AZOGOAT?
