Need a zlib/deflate inflate primitive in std/zip (or a new std/deflate) so AILANG modules can decompress standalone zlib (RFC 1950) and raw deflate (RFC 1951) streams without going through a ZIP archive container.
Context
While building PDF annotation extraction in ailang-parse (extracting highlights/comments without invoking the AI multimodal path), we hit a limitation: PDF object streams (/ObjStm) and compressed dictionary entries use FlateDecode, which is a zlib stream (RFC 1950: deflate data with a 2-byte header and a 4-byte Adler-32 trailer). Modern PDFs (PDF 1.5+, anything 'optimized for web') commonly bundle small objects, including annotations, into ObjStm, so any AILANG module that wants to read PDF metadata without shelling out needs zlib inflate.
Current stdlib gap
- std/zip exposes ZIP-entry-level APIs only (listEntries, readEntry, readEntryBytes); no way to feed it a raw deflate or zlib byte string
- std/gzip.decompress requires the gzip wrapper (RFC 1952), so it cannot read zlib (RFC 1950) or raw deflate (RFC 1951) streams
- No std/deflate module exists
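To make the gap concrete, here is a minimal Python sketch (not AILANG) of the three containers around the same deflate data. It uses Python's zlib module, where the wbits argument selects the wrapper: -15 for raw deflate, 15 for zlib, 31 for gzip. The payload and the compress_with helper are hypothetical, chosen only for illustration.

```python
import gzip
import zlib

# Hypothetical payload standing in for a PDF object stream body.
payload = b"<< /Type /Annot /Subtype /Highlight >>" * 8


def compress_with(wbits: int) -> bytes:
    """Compress the payload with the wrapper selected by wbits."""
    comp = zlib.compressobj(wbits=wbits)
    return comp.compress(payload) + comp.flush()


raw_stream = compress_with(-15)   # raw deflate (RFC 1951), no header or trailer
zlib_stream = compress_with(15)   # zlib (RFC 1950), what PDF FlateDecode carries
gzip_stream = compress_with(31)   # gzip (RFC 1952), what a gzip decompressor expects

# Each decoder reads only its own wrapper.
assert zlib.decompress(raw_stream, wbits=-15) == payload
assert zlib.decompress(zlib_stream, wbits=15) == payload
assert gzip.decompress(gzip_stream) == payload

# Stripping the 2-byte header and 4-byte Adler-32 trailer from a zlib stream
# (no preset dictionary) leaves a raw deflate body.
assert zlib.decompress(zlib_stream[2:-4], wbits=-15) == payload

# A gzip-only decompressor rejects a zlib stream because the magic bytes differ.
try:
    gzip.decompress(zlib_stream)
    gzip_accepts_zlib = True
except OSError:
    gzip_accepts_zlib = False
```

The last check shows why a gzip-only API is not enough for FlateDecode data, even though the compressed body uses the same algorithm.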
Use cases beyond PDF
- PDF FlateDecode (annotations, metadata, content streams)
- Any wire protocol using zlib or raw deflate (HTTP Content-Encoding: deflate and PNG IDAT chunks are zlib-wrapped; WebSocket permessage-deflate uses raw deflate)
- Custom binary formats with embedded zlib payloads
Proposed API
Either extend std/zip:
inflate(input: string) -> Result[string, string] -- raw deflate, no header
inflateZlib(input: string) -> Result[string, string] -- zlib-wrapped (RFC 1950)
deflate(input: string, level: int) -> Result[string, string] -- raw deflate output, no header
deflateZlib(input: string, level: int) -> Result[string, string] -- zlib-wrapped output (RFC 1950)
Or add a new std/deflate module with the same shape. Either way, binary data should follow the same base64 string convention as std/gzip and std/zip's readEntryBytes.
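As a rough reference model of the intended semantics, the four functions could behave like the Python sketch below. This is not AILANG code and not a proposed implementation. It assumes that both input and output are base64 strings, that Result is modelled as an ("ok", value) or ("err", message) tuple, and that level ranges from 0 to 9; none of these details are fixed by this request.

```python
import base64
import zlib

RAW_DEFLATE = -15   # zlib wbits value for raw deflate (RFC 1951)
ZLIB_WRAPPED = 15   # zlib wbits value for zlib-wrapped data (RFC 1950)


def _decode(input_b64: str) -> bytes:
    # binascii.Error is a ValueError subclass, so callers catch ValueError.
    return base64.b64decode(input_b64, validate=True)


def _encode(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def _inflate(input_b64: str, wbits: int):
    try:
        data = _decode(input_b64)
    except ValueError as exc:
        return ("err", f"invalid base64 input: {exc}")
    try:
        return ("ok", _encode(zlib.decompress(data, wbits=wbits)))
    except zlib.error as exc:
        return ("err", f"inflate failed: {exc}")


def _deflate(input_b64: str, level: int, wbits: int):
    if not 0 <= level <= 9:  # assumed level range, mirroring zlib
        return ("err", f"level must be 0..9, got {level}")
    try:
        data = _decode(input_b64)
    except ValueError as exc:
        return ("err", f"invalid base64 input: {exc}")
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return ("ok", _encode(comp.compress(data) + comp.flush()))


def inflate(input_b64: str):
    return _inflate(input_b64, RAW_DEFLATE)


def inflate_zlib(input_b64: str):
    return _inflate(input_b64, ZLIB_WRAPPED)


def deflate(input_b64: str, level: int):
    return _deflate(input_b64, level, RAW_DEFLATE)


def deflate_zlib(input_b64: str, level: int):
    return _deflate(input_b64, level, ZLIB_WRAPPED)
```

In this model, a PDF reader would pass the base64-encoded FlateDecode stream body to inflate_zlib, while a WebSocket permessage-deflate consumer would use inflate.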
Workaround for ailang-parse today
For Arwin's specific PDFs (Quartz PDFContext output, no ObjStm), we can ship an AILANG-only annotation extractor with pure string scanning; it works because annotations are inline. But it will silently miss annotations on any PDF that compresses objects into object streams, so the feature is fragile without this primitive.
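One way to make the interim extractor less fragile is to detect object streams up front and report them instead of returning an incomplete result. The Python sketch below illustrates that check and is not part of ailang-parse. The function name uses_object_streams is hypothetical, and the byte scan is a heuristic: it relies on the object stream's own dictionary being stored uncompressed in the file, and it would miss names written with #-escapes.

```python
import re

# An object stream is an indirect stream object whose dictionary contains
# /Type /ObjStm. The dictionary sits outside the compressed stream body.
_OBJSTM = re.compile(rb"/Type\s*/ObjStm\b")


def uses_object_streams(pdf_bytes: bytes) -> bool:
    """Return True if the file appears to contain at least one object stream."""
    return _OBJSTM.search(pdf_bytes) is not None


# Hypothetical fragments, not complete PDFs.
inline_pdf = b"1 0 obj\n<< /Type /Annot /Subtype /Highlight >>\nendobj\n"
packed_pdf = b"5 0 obj\n<< /Type /ObjStm /N 3 /First 18 /Filter /FlateDecode >>\nstream\n"

assert uses_object_streams(inline_pdf) is False
assert uses_object_streams(packed_pdf) is True
```

With a check like this, the string-scanning path can warn that annotations may be missing rather than silently returning a partial list.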
Reported by: cli via ailang messages