feat(documents): plugin format surface plus PPTX/POTX adapter #941
Open
mimeding wants to merge 10 commits into osaurus-ai:main from
This was referenced Apr 24, 2026
…ntParser through the registry
Migrates the three ingress paths already handled by DocumentParser onto the adapter surface introduced in the foundations PR, without changing any user-observable behaviour. parseAll now consults the registry first and falls back to its existing switch for anything an adapter hasn't claimed or has declined, specifically image-only PDFs, which continue to render via the legacy fallback until the layout-aware PDF rework lands.
- PlainTextAdapter wraps the existing UTF-8 / ISO-Latin-1 retry path and the 500K-character truncation marker so the legacy behaviour stays byte-identical.
- PDFAdapter wraps PDFKit text extraction; it throws emptyContent when there is no text layer so the shim falls through to the legacy image-render path rather than claiming a result it cannot produce.
- RichDocumentAdapter wraps NSAttributedString across docx/doc/rtf/html; a single adapter for all four because they share the framework call today, splitting when high-fidelity DOCX lands.
- DocumentAdaptersBootstrap registers the three on the shared registry from AppDelegate.applicationDidFinishLaunching exactly once so the shim sees adapters on the first file ingress.
- PlainTextRepresentation is the neutral text shape for adapters that cannot yet publish a format-native representation; replaced per-format by Workbook / WordDocument / etc. in later PRs.
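The registry-first, legacy-fallback flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: every type and function name here (SketchAdapter, SketchRegistry, parseThroughRegistry, the legacy closure) is hypothetical, and only the emptyContent and unsupportedFormat error names come from the commit messages. The PDF stand-in always declines, which models the image-only case.

```swift
import Foundation

// Hypothetical error type; only the case names appear in the commit messages.
enum SketchAdapterError: Error {
    case emptyContent
    case unsupportedFormat
}

// Hypothetical adapter protocol, reduced to what the routing needs.
protocol SketchAdapter {
    var id: String { get }
    func canHandle(fileExtension: String) -> Bool
    func parse(_ data: Data) throws -> String
}

struct SketchPlainTextAdapter: SketchAdapter {
    let id = "plaintext"
    func canHandle(fileExtension: String) -> Bool { fileExtension == "txt" }
    func parse(_ data: Data) throws -> String {
        // Try UTF-8 first, then retry as ISO-Latin-1.
        if let text = String(data: data, encoding: .utf8) { return text }
        if let text = String(data: data, encoding: .isoLatin1) { return text }
        throw SketchAdapterError.emptyContent
    }
}

struct SketchPDFAdapter: SketchAdapter {
    let id = "pdf"
    func canHandle(fileExtension: String) -> Bool { fileExtension == "pdf" }
    func parse(_ data: Data) throws -> String {
        // Stand-in for "no text layer": decline so the caller can fall back.
        throw SketchAdapterError.emptyContent
    }
}

final class SketchRegistry {
    private var adapters: [SketchAdapter] = []
    func register(_ adapter: SketchAdapter) { adapters.append(adapter) }
    // Later registrations win, so search from the end.
    func adapter(forExtension ext: String) -> SketchAdapter? {
        adapters.last(where: { $0.canHandle(fileExtension: ext) })
    }
}

// Registry first; the legacy path runs when no adapter claims the file
// or when the claiming adapter declines with emptyContent.
func parseThroughRegistry(
    _ data: Data,
    fileExtension: String,
    registry: SketchRegistry,
    legacy: (Data, String) throws -> String
) throws -> String {
    if let adapter = registry.adapter(forExtension: fileExtension) {
        do {
            return try adapter.parse(data)
        } catch SketchAdapterError.emptyContent {
            // Declined: fall through to the legacy path below.
        }
    }
    return try legacy(data, fileExtension)
}
```

Errors other than emptyContent propagate to the caller unchanged in this sketch; whether the real shim does the same is not stated in the commit message.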
First real-fidelity document adapter. Reads .xlsx into a typed Workbook representation carrying sheet names, cells with formula source strings, merged-range references, shared strings, and cell types (number, shared string, inline string, boolean). The text fallback renders each sheet as a tab-separated table so callers still on the legacy Attachment.Kind.document path see something readable.
The adapter deliberately does NOT call CoreXLSX's parseStyles(): that entry point crashes on openpyxl-generated workbooks because the library's PatternFill.patternType is non-optional while Excel's default empty pattern omits the attribute. Everything we surface today is style-independent; lifting that limitation (number formats, column widths, dates stored as styled numbers) lives in a follow-up slice behind a hand-rolled styles fallback.
- Package.swift: CoreXLSX 0.14.2 dependency for the core target, testTarget resource declaration for the xlsxwriter-produced fixture.
- Workbook / Sheet / Row / Cell / CellValue / CellRange: the typed intermediate that both the XLSX read path and the eventual XLSX write emitter round-trip through.
- XLSXAdapter: the actual CoreXLSX → Workbook translator + markdown-style text fallback.
- DocumentAdaptersBootstrap: registers XLSXAdapter alongside PlainText / PDF / RichDocument, so DocumentParser.parseAll now routes .xlsx through the registry instead of throwing unsupportedFormat.
- Tests/Documents/Fixtures/xlsx/sample.xlsx: 5.9 KB fixture with two sheets, a SUM formula, a merged range (A5:B5), shared strings, and explicit booleans. Exercises the parse paths for each fidelity feature.
- XLSXAdapterTests: 7 tests pinning format routing, sheet/cell structure, formulas, merged ranges, shared strings, booleans, text fallback formatting, and size-limit refusal.
- DocumentParserShimTests: expands the bootstrap assertion to include "xlsx" alongside the three existing adapter ids.
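To make the typed shape and the tab-separated fallback concrete, here is a small sketch. The type names echo the ones listed above, but their fields, the "# name" sheet header, the TRUE/FALSE rendering, and the integer formatting are all assumptions made for illustration; the sketch also ignores cell references when laying out columns, which the real adapter presumably does not.

```swift
// Assumed shapes: the commit names these types but does not show their fields.
enum CellValue: Equatable {
    case number(Double)
    case sharedString(String)
    case inlineString(String)
    case boolean(Bool)
}

struct Cell {
    let ref: String          // A1-style reference, e.g. "B3"
    let value: CellValue?
    let formula: String?     // formula source without the leading "="
}

struct Sheet {
    let name: String
    let rows: [[Cell]]
}

struct Workbook {
    let sheets: [Sheet]
}

// Render one cell value as display text.
func render(_ value: CellValue?) -> String {
    guard let value = value else { return "" }
    switch value {
    case .number(let n):
        // Print whole numbers without a trailing ".0" (a formatting assumption).
        if n == n.rounded(), abs(n) < 1e15 { return String(Int(n)) }
        return String(n)
    case .sharedString(let s), .inlineString(let s):
        return s
    case .boolean(let b):
        return b ? "TRUE" : "FALSE"
    }
}

// One header line per sheet, then one tab-separated line per row.
func textFallback(_ workbook: Workbook) -> String {
    workbook.sheets.map { sheet -> String in
        let lines = sheet.rows.map { row in
            row.map { render($0.value) }.joined(separator: "\t")
        }
        return (["# " + sheet.name] + lines).joined(separator: "\n")
    }.joined(separator: "\n\n")
}
```

Formulas are carried on the cell but not printed here, so a consumer of the text fallback sees the cached value while a consumer of the typed Workbook can still read the formula source.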
Pairs with XLSXAdapter so agents can ingest a workbook, modify the typed Workbook in-process, and emit it back as a fresh .xlsx attachment. libxlsxwriter ships a first-party Swift Package as a pure C SwiftPM target, so no XCFramework / vendored C source is needed in osaurus itself; it's just a dependency add.
- Package.swift: libxlsxwriter 1.2.4 dependency for the core target.
- XLSXEmitter: Workbook -> .xlsx via libxlsxwriter. Parses A1 cell references into 0-indexed row/col, dispatches strings / numbers / booleans / formulas to the right write_* function, handles merged ranges via worksheet_merge_range with a nil string so the top-left cell's already-written content is preserved. Cleans up a partial .xlsx on any emit error so a failed round trip never masquerades as a readable file.
- DocumentAdaptersBootstrap: registers XLSXEmitter alongside XLSXAdapter.
- XLSXEmitterTests: 7 tests pinning the round trip end-to-end. Builds a Workbook in memory, writes via XLSXEmitter, reads via XLSXAdapter, asserts sheet names / formulas / merged ranges / strings / numbers / booleans all survive.
Licensing footnote: libxlsxwriter is BSD-2-Clause, but bundles third_party/tmpfileplus/tmpfileplus.c under MPL 2.0. Statically linking is permitted. A follow-up to AcknowledgementsView should list both; deliberately out of scope for this PR.
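The A1-to-index step mentioned for XLSXEmitter can be sketched as below. The names CellIndex, parseA1, and parseRange are hypothetical, and the strictness choices (uppercase only, no "$" absolute markers, at most 10 characters) are assumptions of this sketch rather than documented behaviour of the emitter.

```swift
// Hypothetical result type: 0-indexed row and column.
struct CellIndex: Equatable {
    let row: Int
    let col: Int
}

// Parse an uppercase A1-style reference such as "B5" or "AA10".
// Returns nil for anything that is not letters followed by a positive row number.
func parseA1(_ ref: String) -> CellIndex? {
    // "XFD1048576" (Excel's last cell) is 10 characters; longer input is rejected.
    guard ref.unicodeScalars.count <= 10 else { return nil }
    var col = 0
    var row = 0
    var sawDigit = false
    for scalar in ref.unicodeScalars {
        let v = Int(scalar.value)
        if v >= 65, v <= 90, !sawDigit {
            // Letters form a bijective base-26 number: A=1 ... Z=26, AA=27.
            col = col * 26 + (v - 64)
        } else if v >= 48, v <= 57, col > 0 {
            row = row * 10 + (v - 48)
            sawDigit = true
        } else {
            return nil
        }
    }
    guard sawDigit, row > 0 else { return nil }
    return CellIndex(row: row - 1, col: col - 1)
}

// Parse a merged-range reference such as "A5:B5" into its two corners.
func parseRange(_ range: String) -> (first: CellIndex, last: CellIndex)? {
    let parts = range.split(separator: ":", omittingEmptySubsequences: false)
    guard parts.count == 2,
          let first = parseA1(String(parts[0])),
          let last = parseA1(String(parts[1])) else { return nil }
    return (first, last)
}
```

For example, parseA1("AA10") yields row 9, column 26, which is the 0-indexed pair that row/column based write functions expect.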
…te_workbook
Exposes the typed Workbook surface to folder-mode agents. Stacks on top of the XLSX read (osaurus-ai#929) + write (osaurus-ai#936) PRs and completes the stage-4 round-trip goal: an agent can now ingest a spreadsheet, reason about cells and formulas in their native types, and emit a modified workbook, all without the model having to hand-roll XML.
- read_workbook: returns a compact JSON summary of every sheet (names, row counts, merged ranges, truncated cell sample). Capped at 200 cells per sheet so large workbooks don't blow the context window; agents drop to read_workbook_cell for specific values.
- read_workbook_cell: single-cell lookup by (path, sheet, A1 ref). Returns value, formula source, and type in a one-line JSON payload.
- write_workbook: accepts a structured sheets array and emits the file via XLSXEmitter. Each cell carries its A1 ref, typed value, and optional formula; the schema enum guards against unknown types. write_workbook creates parent directories and surfaces a sheetCount / totalCells summary on success.
- All three plug into FolderToolFactory.buildCoreTools alongside file_read / file_write, so they're registered the moment a working folder is selected and go away when it's cleared.
- Tests: 8 tests covering sheet summary rendering, missing-file and out-of-root rejection, formula preservation on cell lookup, missing-sheet error, end-to-end write + re-parse fidelity, non-xlsx path refusal, and empty-sheets validation. Tests reuse the sample.xlsx fixture from the XLSX read PR.
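The per-sheet cap on read_workbook can be illustrated with a short sketch. The JSON field names (name, rowCount, mergedRanges, cells, truncated), the input types, and the function names are all assumptions; the commit only states that the summary is compact JSON and that the sample is capped at 200 cells per sheet.

```swift
import Foundation

// Hypothetical input shapes for the sketch.
struct SketchCell {
    let ref: String
    let value: String
    let formula: String?
}

struct SketchSheet {
    let name: String
    let rowCount: Int
    let mergedRanges: [String]
    let cells: [SketchCell]
}

// Hypothetical summary payload; the field names are illustrative only.
struct SheetSummary: Codable, Equatable {
    struct CellSample: Codable, Equatable {
        let ref: String
        let value: String
        let formula: String?
    }
    let name: String
    let rowCount: Int
    let mergedRanges: [String]
    let cells: [CellSample]
    let truncated: Bool
}

// Keep at most `cellCap` cells and record whether anything was dropped.
func summarize(_ sheet: SketchSheet, cellCap: Int = 200) -> SheetSummary {
    let sample = sheet.cells.prefix(cellCap).map {
        SheetSummary.CellSample(ref: $0.ref, value: $0.value, formula: $0.formula)
    }
    return SheetSummary(
        name: sheet.name,
        rowCount: sheet.rowCount,
        mergedRanges: sheet.mergedRanges,
        cells: sample,
        truncated: sheet.cells.count > cellCap
    )
}

// Encode all sheets as compact JSON with stable key order.
func summaryJSON(_ sheets: [SketchSheet]) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    let data = try encoder.encode(sheets.map { summarize($0) })
    return String(decoding: data, as: UTF8.self)
}
```

An explicit truncated flag is one way to tell the agent that it should switch to single-cell lookups; the real tool may signal this differently.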
Replaces the legacy 'CSV as plain text' ingestion with a typed
CSVTable representation that preserves encoding, delimiter, line-ending
style, and per-row cell boundaries. Pairs a batch adapter for chat
attachments with a streaming variant for multi-GB exports.
- CSVTable / CSVRecord: typed representation + one streamed row shape.
- CSVParser: shared RFC-4180-ish state machine. Handles quoted fields,
'""' quote escapes, embedded newlines in quoted cells, CRLF / LF /
bare-CR line endings.
- CSVAdapter: eager, in-memory. Delimiter defaults per extension
('.csv' -> ',', '.tsv' -> '\t'). UTF-8 BOM stripping + ISO-Latin-1
fallback decode. Conservative header heuristic (first row is a
header when at least one cell is non-numeric and there's a body
row below). Renders a markdown-style text fallback for chat display.
- CSVStreamer: row-at-a-time AsyncThrowingStream for large files.
Reads 64 KB chunks, splits at the last complete UTF-8 scalar so
multi-byte scalars never cross a chunk boundary, feeds bytes
through the same CSVParser.Machine so quoting / newline semantics
match the batch path exactly. Honours Task cancellation so the
agent tool surface can back-pressure.
- Registers both in DocumentAdaptersBootstrap after PlainText so
later-wins routing picks the typed adapter for '.csv' / '.tsv'.
- Tests: 10 adapter tests (header split, TSV delimiter, quoted commas
+ newlines, '""' escape, UTF-8 BOM, numeric-only header rejection,
size-limit refusal, empty-file emptyContent, CRLF, canHandle) + 7
streamer tests (in-order yield, 1-based line numbering, TSV,
quoted newlines across chunks, cancellation mid-file, UTF-8
boundary helper coverage).
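A rough sketch of the kind of scalar-at-a-time state machine described above, covering quoted fields, the '""' escape, embedded newlines, and CRLF / LF / bare-CR terminators. SketchCSVMachine and parseCSV are hypothetical names, and two behaviours are assumptions of this sketch: blank lines are skipped, and an unterminated quoted field at end of input is accepted as-is. It iterates Unicode scalars rather than Characters because Swift treats "\r\n" as a single Character.

```swift
// Hypothetical machine: feed scalars one at a time, then call finish().
struct SketchCSVMachine {
    private enum State { case unquoted, quoted, quoteInQuoted }

    let delimiter: Unicode.Scalar
    private var state = State.unquoted
    private var field = ""
    private var row: [String] = []
    private var sawCR = false       // previous scalar was a row-ending CR
    private var rowStarted = false  // current row has at least one field or character
    private(set) var rows: [[String]] = []

    init(delimiter: Unicode.Scalar = ",") { self.delimiter = delimiter }

    mutating func feed(_ s: Unicode.Scalar) {
        if sawCR {
            sawCR = false
            if s == "\n" { return }  // LF completing a CRLF terminator
        }
        switch state {
        case .quoted:
            if s == "\"" { state = .quoteInQuoted } else { field.unicodeScalars.append(s) }
        case .quoteInQuoted:
            if s == "\"" {
                field.unicodeScalars.append(s)  // '""' is an escaped quote
                state = .quoted
            } else {
                state = .unquoted               // the quote closed the field
                feedUnquoted(s)
            }
        case .unquoted:
            feedUnquoted(s)
        }
    }

    private mutating func feedUnquoted(_ s: Unicode.Scalar) {
        switch s {
        case delimiter:
            endField()
        case "\n":
            endRow()
        case "\r":
            endRow()
            sawCR = true
        case "\"" where field.isEmpty:
            state = .quoted
            rowStarted = true
        default:
            field.unicodeScalars.append(s)
            rowStarted = true
        }
    }

    private mutating func endField() {
        row.append(field)
        field = ""
        rowStarted = true
    }

    private mutating func endRow() {
        guard rowStarted else { return }  // skip blank lines (assumption)
        row.append(field)
        rows.append(row)
        row = []
        field = ""
        rowStarted = false
    }

    // Flush a final row that has no trailing newline and return everything parsed.
    mutating func finish() -> [[String]] {
        state = .unquoted
        endRow()
        return rows
    }
}

// Batch convenience wrapper over the same machine.
func parseCSV(_ text: String, delimiter: Unicode.Scalar = ",") -> [[String]] {
    var machine = SketchCSVMachine(delimiter: delimiter)
    for scalar in text.unicodeScalars { machine.feed(scalar) }
    return machine.finish()
}
```

Because all state lives in the machine, a streaming caller can feed it chunk by chunk and get the same rows as the batch wrapper, which is the property the shared-parser design relies on.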
Upgrades the PDFAdapter from flat text to a typed PDFDocumentRepresentation
that carries per-page text PLUS a list of detected tables. Turns invoices,
bank statements, and 10-Ks from 'run-together numeric columns' into
proper cell grids without changing the flat-text contract other consumers
rely on.
Detection strategy:
1. Walk each page's characters and capture (scalar, rect) from
PDFPage.characterBounds(at:). Whitespace glyphs are dropped up
front because PDFKit reports their bounds as spanning the visual
gap they introduce, which would hide column boundaries.
2. Cluster glyphs into rows by y-coordinate tolerance (3pt).
3. Within each row, split into cells wherever the inter-glyph gap
exceeds 8pt — clearly above word-space (~3pt at 12pt body) but
well below intentional column gaps (>20pt typical).
4. Collect runs of multi-cell rows into PDFTable regions. Isolated
single-tabular rows are dropped so form lines like
'Invoice No. 1234' don't masquerade as tables.
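The four stages above can be sketched on synthetic glyphs, with no PDFKit involved. The 3pt row tolerance and 8pt gap threshold come from the description; everything else (the Glyph type, function names, the layout helper, and joining a cell's glyphs without re-inserting spaces) is a simplifying assumption of this sketch.

```swift
// Hypothetical glyph record: a character plus the left edge, baseline y, and width.
struct Glyph {
    let character: Character
    let x: Double
    let y: Double
    let width: Double
}

// Stage 2: cluster glyphs into rows by y tolerance, top of page first
// (PDF coordinates grow upward, so larger y sorts earlier).
func clusterRows(_ glyphs: [Glyph], tolerance: Double = 3) -> [[Glyph]] {
    var rows: [[Glyph]] = []
    for glyph in glyphs.sorted(by: { $0.y > $1.y }) {
        if let anchor = rows.last?.first, abs(anchor.y - glyph.y) <= tolerance {
            rows[rows.count - 1].append(glyph)
        } else {
            rows.append([glyph])
        }
    }
    return rows
}

// Stage 3: split a row into cells wherever the horizontal gap exceeds the threshold.
func cellsForRow(_ row: [Glyph], gap: Double = 8) -> [String] {
    var cells: [String] = []
    var current = ""
    var previousMaxX: Double?
    for glyph in row.sorted(by: { $0.x < $1.x }) {
        if let edge = previousMaxX, glyph.x - edge > gap {
            cells.append(current)
            current = ""
        }
        current.append(glyph.character)
        previousMaxX = glyph.x + glyph.width
    }
    if !current.isEmpty { cells.append(current) }
    return cells
}

// Stage 4: keep runs of two or more consecutive multi-cell rows as tables.
func detectTables(_ glyphs: [Glyph]) -> [[[String]]] {
    var tables: [[[String]]] = []
    var run: [[String]] = []
    func flush() {
        if run.count >= 2 { tables.append(run) }
        run = []
    }
    for row in clusterRows(glyphs) {
        let cells = cellsForRow(row)
        if cells.count >= 2 { run.append(cells) } else { flush() }
    }
    flush()
    return tables
}

// Synthetic layout helper: 6pt-wide glyphs, 3pt word spaces, whitespace dropped (stage 1).
func layout(_ text: String, x: Double, y: Double) -> [Glyph] {
    var glyphs: [Glyph] = []
    var cursor = x
    for character in text {
        if character == " " { cursor += 3; continue }
        glyphs.append(Glyph(character: character, x: cursor, y: y, width: 6))
        cursor += 6
    }
    return glyphs
}
```

With this layout, a line such as 'Invoice No. 1234' stays a single cell because its 3pt word gaps are below the 8pt threshold, so it never starts a table run.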
- PDFDocumentRepresentation / PDFPageRepresentation / PDFTable: the new
typed shape emitted by the adapter.
- PDFAdapter now emits PDFDocumentRepresentation instead of
PlainTextRepresentation. textFallback stays the flat concatenation of
page text so chat attachments render unchanged.
- PDFTableDetector: pure-function stages (clusterRows, cellsForRow,
groupConsecutiveTabularRows, detect(glyphs:)) exposed internally for
unit testing without PDFKit, so the heuristic can be pinned against
synthetic glyph grids that aren't subject to Core Graphics' habit of
reporting character bounds that span trailing whitespace.
- Image-only PDFs still throw emptyContent so the DocumentParser shim
can fall through to the legacy image-render path.
Test suite (16 new):
- Row clustering by y (including descending PDF-coord sort).
- Cell splitting for wide gaps, word-in-cell, single glyph.
- Tabular row grouping: multi-row collection, single-cell row split,
drop-isolated-single-row, empty input.
- Full detect(glyphs:) on a 3x3 synthetic grid.
- End-to-end adapter integration (emits PDFDocumentRepresentation,
preserves text fallback; blank PDF still throws emptyContent).
Adds the host-side bridge between the plugin ABI and the document format registry so plugin-provided parsers and emitters plug into DocumentFormatRegistry the same way the in-tree adapters do. A plugin that registers a parser through this surface ends up as a regular adapter that consumers can look up via registry.adapter(for:), with no plugin-specific branch in the consumer.
The plugin-side invocation (how a plugin's invoke pointer gets wired back into the shim adapter) is structured around a PluginDocumentInvoker protocol so the host-to-plugin callback is a single seam. This PR wires the Swift side end-to-end and tests it with a fake invoker; the PluginManager plumbing that threads each plugin's real invoke pointer into PluginDocumentInvoker lands with a follow-up since it needs access to PluginManager internals.
- osaurus_plugin.h: adds osr_register_parser_fn / register_emitter_fn / unregister_format_fn signatures and the trailing struct fields, with the full request/response JSON contract documented inline. The fields are trailing, so older plugins compiled against the v2 layout before this PR keep loading because the host allocates the struct and zero-inits the new tail.
- PluginBackedDocumentAdapter.swift: Swift shims implementing DocumentFormatAdapter and DocumentFormatEmitter by forwarding to a plugin via PluginDocumentInvoker.invoke(type:id:payload:). Surfaces only the textFallback representation today; richer representations (Workbook, PDFDocumentRepresentation) come with a response-schema extension once a first plugin needs them.
- PluginDocumentRegistry.swift: owns the format_id -> plugin_id mapping so one plugin can't unregister another's format (or overwrite an in-tree built-in). Returns JSON envelopes matching the C-header contract.
- Tests: 8 scenarios covering happy-path registration, adapter → plugin invocation threading, plugin error propagation, emitter routing, another-plugin-cannot-overwrite, reject-unregister-by-other, unregisterAll teardown on plugin unload, and malformed-JSON rejection.
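The ownership rule (a format belongs to the plugin that registered it, and built-ins cannot be overwritten) can be sketched as a small map with checks. SketchPluginFormatRegistry and its methods are hypothetical; the real PluginDocumentRegistry returns JSON envelopes rather than throwing, and whether a plugin may re-register its own format is an assumption here.

```swift
// Hypothetical ownership registry: format_id -> plugin_id.
final class SketchPluginFormatRegistry {
    enum Failure: Error, Equatable {
        case builtInFormat       // an in-tree adapter already owns this id
        case ownedByOtherPlugin  // another plugin registered it first
        case notRegistered
    }

    private let builtInFormats: Set<String>
    private var owners: [String: String] = [:]

    init(builtInFormats: Set<String>) {
        self.builtInFormats = builtInFormats
    }

    func owner(of formatID: String) -> String? {
        owners[formatID]
    }

    // A plugin may claim a free id or re-register one it already owns.
    func register(formatID: String, pluginID: String) throws {
        if builtInFormats.contains(formatID) { throw Failure.builtInFormat }
        if let current = owners[formatID], current != pluginID {
            throw Failure.ownedByOtherPlugin
        }
        owners[formatID] = pluginID
    }

    // Only the owning plugin may unregister a format.
    func unregister(formatID: String, pluginID: String) throws {
        guard let current = owners[formatID] else { throw Failure.notRegistered }
        guard current == pluginID else { throw Failure.ownedByOtherPlugin }
        owners[formatID] = nil
    }

    // Teardown on plugin unload: drop everything the plugin owns and
    // return the removed ids in sorted order.
    @discardableResult
    func unregisterAll(pluginID: String) -> [String] {
        let removed = owners.filter { $0.value == pluginID }.map { $0.key }.sorted()
        for formatID in removed { owners[formatID] = nil }
        return removed
    }
}
```

Keeping the checks in one place means the adapter lookup path never needs to know which plugin, if any, sits behind a format id.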
Business rationale: keeping the plugin document-format PR green makes it reviewable as part of the file-fidelity harness, so specialized parsers and emitters can extend osaurus without changing the app binary.
Coding rationale: this only applies touched-file style fixes after rebasing onto the stabilized main gate; no behavior changes are included, and the plugin registration surface remains the sole logical change in the PR.
Co-authored-by: OpenAI Codex <codex@openai.com>
Business rationale
First-party adapters cover the common document formats, but the high-fidelity harness needs plugins to add specialized parsers and emitters without changing the app binary. A register_parser/register_emitter surface lets format-specific plugins, including the companion PPTX plugin, appear through the same DocumentFormatRegistry lookup path that in-tree adapters use.
Coding rationale
The ABI grows by adding trailing fields to the host struct so older plugins compiled against the existing v2 layout continue to load. Plugin-backed adapters conform to the same DocumentFormatAdapter and DocumentFormatEmitter protocols as in-tree implementations, avoiding plugin-specific branches in document consumers. Ownership lives in PluginDocumentRegistry so one plugin cannot overwrite or unregister another plugin's format. The Swift side is tested with a fake invoker; real PluginManager invocation plumbing remains a separate follow-up because it needs to thread per-plugin invoke pointers through existing manager internals.
What changed
- osaurus_plugin.h.
- PluginBackedDocumentAdapter shims that forward adapter/emitter calls to a plugin invoker.
- PluginDocumentRegistry to own format registration, collision checks, unregister, and unload cleanup.
Validation
- git fetch origin && git rebase origin/main - completed after resolving Package.resolved, keeping main's lean folder tool list plus workbook tools only, and dropping stale unrelated commits.
- swift build --package-path Packages/OsaurusCore - passed.
- swift build --package-path Packages/OsaurusCore -c release - passed.
- swift test --package-path Packages/OsaurusCore - passed, 1513 tests in 203 suites, with sandbox integration tests skipped by their normal environment gate.
- xcrun swift-format lint --strict on every touched Swift file - passed.
- swiftlint lint --strict on every touched Swift file - passed file-by-file.
- git diff --check origin/main...HEAD - passed.
- Packages/OsaurusCLI.
Non-scope
Residual risks
The Swift registry path is covered with fake plugin invocations, but the final host-to-plugin callback hookup still needs a follow-up that threads each plugin's real invoke pointer through PluginManager. Reviewers may also see lower-stack document commits in the GitHub diff until #927/#929/#936/#937/#939/#940 merge.