Summary
During review of PR #1857 (fix: fix ingestion without page number), it was identified that ingestion may fail when owner_name or owner_email contains non-ASCII characters (e.g., accented letters, CJK characters, or Greek characters).
Problem
owner_name and owner_email are passed via X-Langflow-Global-Var-* HTTP headers in src/services/langflow_file_service.py. RFC 7230 historically allowed ISO-8859-1 text in header field values and recommends that new header fields stay within US-ASCII, so passing non-ASCII values directly without encoding can raise encoding errors or silently corrupt the data.
Example values that would currently break:
owner_name = "José Łukasz 山田"
owner_email = "δοκιμή@παράδειγμα.δοκιμή"
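The failure can be reproduced without any HTTP client. The sketch below is illustrative only: fits_in_header is a hypothetical helper, and the assumption is that the client serializes header values as Latin-1 or ASCII (the charset actually used depends on the HTTP library).

```python
def fits_in_header(value: str, charset: str = "latin-1") -> bool:
    """Return True if `value` can be encoded in `charset` without error.

    Hypothetical helper for illustration; not part of the code base.
    """
    try:
        value.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# Example values from this issue.
owner_name = "José Łukasz 山田"
owner_email = "δοκιμή@παράδειγμα.δοκιμή"

name_ok = fits_in_header(owner_name)    # False: "Ł" and the CJK characters are outside Latin-1
email_ok = fits_in_header(owner_email)  # False: Greek letters are outside Latin-1
```

Note that a value such as "José" does fit in Latin-1 but not in ASCII, which is the case where data may be altered silently rather than rejected, depending on how the receiving side decodes the bytes.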
Expected Behaviour
Non-ASCII owner_name and owner_email values should be safely encoded (e.g., percent-encoded or base64-encoded) when passed through HTTP headers, and correctly decoded on the receiving end, so ingestion proceeds without errors and the metadata is preserved faithfully.
Suggested Fix
- Encode owner_name and owner_email using a UTF-8-safe mechanism (e.g., urllib.parse.quote or RFC 5987 encoding) when setting the X-Langflow-Global-Var-* headers in src/services/langflow_file_service.py.
- Add a corresponding test verifying ingestion succeeds and metadata is preserved for non-ASCII owner names and emails.
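A minimal sketch of the percent-encoding option, assuming urllib.parse.quote is used on the sending side and urllib.parse.unquote on the receiving side. The helper names and the OWNER_NAME / OWNER_EMAIL header suffixes are hypothetical; only the X-Langflow-Global-Var- prefix comes from this issue, and the receiving end would need a matching decode step for this to work.

```python
from urllib.parse import quote, unquote

# Prefix taken from the issue; the suffixes used below are assumptions.
HEADER_PREFIX = "X-Langflow-Global-Var-"


def encode_header_value(value: str) -> str:
    """Percent-encode a string as UTF-8 so the result is pure ASCII."""
    # safe="" also escapes "/" so the value round-trips unambiguously.
    return quote(value, safe="")


def decode_header_value(value: str) -> str:
    """Reverse encode_header_value on the receiving end."""
    return unquote(value)


def build_owner_headers(owner_name: str, owner_email: str) -> dict:
    """Build the owner headers with ASCII-safe values (hypothetical helper)."""
    return {
        HEADER_PREFIX + "OWNER_NAME": encode_header_value(owner_name),
        HEADER_PREFIX + "OWNER_EMAIL": encode_header_value(owner_email),
    }


headers = build_owner_headers("José Łukasz 山田", "δοκιμή@παράδειγμα.δοκιμή")
```

A test along the lines of the second bullet could assert that every header value is ASCII and that decoding returns the original strings. One trade-off of percent-encoding is that plain ASCII values containing characters such as "@" or spaces also change form, so both sides must agree that the value is always encoded.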