Skip to content

[codex] Clean up repo for S+ baseline#230

Draft
arcaputo3 wants to merge 3 commits into
mainfrom
splus-repo-cleanup
Draft

[codex] Clean up repo for S+ baseline#230
arcaputo3 wants to merge 3 commits into
mainfrom
splus-repo-cleanup

Conversation

@arcaputo3
Copy link
Copy Markdown
Contributor

Summary

  • Move CI to JDK 25 and clean up compile warnings in the CLI paths.
  • Route in-memory --stream workbook writes through the SAX/StAX OOXML backend so metadata such as merges, comments, tables, row/column properties, and freeze panes is preserved.
  • Enable previously ignored OOXML edge tests, add streaming metadata regressions, and make unsupported streaming batch operations fail clearly.
  • Reconcile docs around streaming modes, evaluator status, security status, test guidance, and published artifacts.

Validation

  • ./mill --no-server __.checkFormat
  • ./mill --no-server __.compile
  • ./mill --no-server __.test (901/901 success)

Notes

gh auth status reports an invalid local token, so this PR was opened through the GitHub connector after pushing the branch with git.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 26, 2026

Code Review: [codex] Clean up repo for S+ baseline

Overview

This PR does four things: upgrades CI to JDK 25, radically simplifies writeWorkbookStream by delegating to the full SAX/StAX OOXML writer, enables previously-ignored tests, and reconciles docs to match the new reality. The core idea — replace a fragile 170-line two-phase implementation with two lines — is excellent.


Strengths

  • writeWorkbookStream simplification is a significant win. The old implementation (temp files per sheet, manual cleanup, custom ZIP assembly without SST/comments/tables, style remapping) was complex and incomplete. The new delegation to writeWith with XmlBackend.SaxStax is correct, simpler, and actually preserves the full metadata the old path silently dropped.
  • Test enablement is substantive. The four previously-ignored XlsxWriterRichTextEdgeCasesSpec tests each required real fixes to assertions (self-closing row regex, <family> element vs attribute, SST reference path) — these aren't just flip-the-switch enablements.
  • nextOption() fix (ImageMagick.scala:275) is a correct Scala 3.4+ fix for the deprecation of nullary method calls without parentheses.
  • Docs are now accurate. The previous STATUS.md listed security hardening as despite being implemented; performance-guide.md referred to a never-shipped P7.5 two-phase writer. These corrections reduce maintenance confusion.

Issues and Suggestions

1. Potential pattern-match bug in streaming batch unsupported ops

StreamingWriteCommands.scala — the new unsupported-ops case:

_: BatchParser.BatchOp.Freeze | BatchParser.BatchOp.Unfreeze |
_: BatchParser.BatchOp.CopyRange =>

Freeze uses _: Type (typed pattern, matches any instance of Freeze). Unfreeze has no _: prefix — it's matched as a value/singleton. If Unfreeze is a case class (not a case object), this pattern will only match the companion object itself, not instances, and the throw will never be reached for Unfreeze ops.

The test suite covers freeze rejection but has no corresponding test for unfreeze rejection. Please verify whether Unfreeze is a case object or case class. If it's a case class, add _: and add a test.

2. writeWorkbookStream guards redundant isEmpty check

if wb.sheets.isEmpty then
  Async[F].raiseError(new IllegalArgumentException("Workbook must have at least one sheet"))
else
  val saxConfig = config.copy(backend = com.tjclp.xl.ooxml.XmlBackend.SaxStax)
  writeWith(wb, path, saxConfig)

If writeWith already handles empty workbooks (raises or writes an empty-but-valid file), the guard here is either redundant or it masks a different behaviour. Worth checking whether writeWith on an empty workbook errors, succeeds silently, or does something else — and document the intent explicitly.

3. JDK 25 is non-LTS

JDK 25 is not an LTS release (JDK 21 is the current LTS; JDK 25 will be superseded by JDK 27 LTS). For a library targeting broad compatibility, pinning CI to a non-LTS JDK can surprise users on LTS runtimes (21, 17) if there are accidental 25-only APIs or behaviour changes. What drove the 21 → 25 jump? If it's a compile-warning cleanup (e.g. the PR title mentions "compile warnings"), a comment in the CI file noting the reason would be useful. Consider also running a JDK 21 matrix job to retain LTS coverage.

4. Duplicate comment in Main.scala

// Sheet-level write: --file, --sheet, and --output (required)
// Sheet-level write: --file, --sheet, and --output (required)

The second line is newly edited but the first line remains a duplicate of it. Minor, but worth cleaning up.

5. Test assertion on user-facing string

assert(result.contains("Saved (streaming)"), result)

This ties the test to a specific output string. If the message is ever rephrased, the test breaks silently (it passes until the message changes). Prefer testing a semantic property (the output file exists and is readable with the expected metadata) rather than the exact human-readable message.


Minor Notes

  • The @SuppressWarnings on XlsxWriterRichTextEdgeCasesSpec suppresses OptionPartial — the get calls that were safe because isDefined was asserted first. No issue here.
  • Import cleanup (FileInputStream, XmlEvent) is correct given the removed two-phase code.
  • The StreamingXmlWriter.worksheetBodyStyled docstring update is accurate; the method is still used by pure row-stream paths, not writeWorkbookStream.

Summary

The core change (simplifying writeWorkbookStream) is the right call and should be merged. The pattern-match bug risk for Unfreeze (#1) is the only thing worth confirming before merge; everything else is minor polish. The JDK 25 choice (#3) is worth a quick comment explaining the rationale.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: PR #230 — Clean up repo for S+ baseline

Overview

This PR does four things: upgrades CI to JDK 25, simplifies writeWorkbookStream by delegating to the SAX/StAX OOXML backend, fixes a security bypass in WriterConfig.secure, and enables previously-ignored OOXML edge tests. The direction is good—the old two-phase implementation was complex and incomplete—but there are a few items worth addressing.


Strengths

writeWorkbookStream simplification (ExcelIO.scala:971–654)

The old 165-line two-phase implementation was silently dropping merges, comments, tables, freeze panes, and row/column properties. Replacing it with:

val saxConfig = config.copy(backend = com.tjclp.xl.ooxml.XmlBackend.SaxStax)
writeWith(wb, path, saxConfig)

is the right call. Correctness over a complex bespoke path.

Security fix in XlsxWriter.scala

This is a real bug fix. Previously, WriterConfig.secure was silently bypassed for "clean" workbooks because isClean triggered a verbatim file copy before formula escaping was applied. The dual fix—guarding the fast-path check and hoisting sheetsToRegenerate to include escapeFormulas—closes the bypass correctly. The new tests in FormulaInjectionSpec and ExcelIOSpec verify both the SST and round-trip behavior.

Test enablements

Enabling four previously-ignored OOXML edge tests with accurate updated assertions (self-closing row regex, SST-referenced assertions for xml:space, family element checks) is solid work.


Issues

1. Pattern match inconsistency in streaming batch rejection (StreamingWriteCommands.scala:707–989)

case _: BatchParser.BatchOp.Freeze | BatchParser.BatchOp.Unfreeze |
    _: BatchParser.BatchOp.CopyRange =>

Freeze and CopyRange use _: (case classes), but Unfreeze is a bare reference with no _:. If Unfreeze is a case class or case class with parameters, this matches the companion object, not instances—the rejection would silently not fire. The existing cases all use _: consistently. Either add _: if Unfreeze takes parameters, or document why it's a case object. A test for unfreeze rejection (parallel to the freeze test added) would make this unambiguous.

2. Performance claim drift

The old writeWorkbookStream was documented as "O(1) output memory." That claim drove its existence. The new implementation uses the SAX/StAX backend, which has lower allocation than the DOM backend but is not O(1)—the full Workbook is materialized and the SAX emitter buffers per-sheet output. The docs have been updated to say "lower-allocation" which is accurate, but the --stream flag in the CLI still implies streaming semantics to users. Consider whether the flag name/description in help text needs a note that --stream for writes means "SAX backend" not "constant-memory streaming."

3. StylePerformanceSpec un-ignored (xl-ooxml/test/...StylePerformanceSpec.scala:1462)

test("performance comparison: 100 vs 1000 styles scales sub-quadratically") {

Performance tests in CI are inherently environment-sensitive. The previous .ignore may have been defensive. If this test asserts timing ratios rather than absolute thresholds it's probably fine, but verify it doesn't use wall-clock assertions that could flap on loaded CI runners.


Minor

  • nextOption() fix in ImageMagick.scala — correct Scala 3 call-site syntax.
  • Import cleanup in ExcelIO.scala (removing FileInputStream, ZipInputStream, XmlEvent) is clean.
  • The writeWorkbook private helper is duplicated verbatim across CellCommands, CommentCommands, ImportCommands, SheetCommands, and WriteCommands. Not introduced by this PR, but worth a follow-up to extract to a shared utility.
  • CLAUDE.md lists 82 functions but docs/LIMITATIONS.md still says 81 functions—minor doc drift (line 23 of the LIMITATIONS diff).

Summary

The security fix and writeWorkbookStream simplification are the most valuable changes here. The Unfreeze pattern match is the one item that needs verification before merge—it could silently fail to reject that batch op in streaming mode. Everything else is either correct or minor cleanup.

@claude
Copy link
Copy Markdown

claude Bot commented May 1, 2026

Code Review — PR #230: Clean up repo for S+ baseline

Overview

This PR delivers four distinct improvements:

  1. CI JDK upgrade: 21 → 25 (LTS)
  2. Security fix: WriterConfig.secure now correctly escapes formula-injection strings even when the source workbook is "clean" (was silently bypassed via verbatim copy path)
  3. writeWorkbookStream simplification: Replaces ~150 lines of two-phase temp-file assembly with a 5-line delegation to XlsxWriter.writeWith(SaxStax), fixing the metadata-loss bug (merges, comments, freeze panes, tables)
  4. Test enablement + new tests: 4 previously-ignored edge tests re-enabled, plus new streaming metadata regressions

Overall the PR is well-structured and addresses real correctness gaps. A few things worth discussing below.


Security Fix (XlsxWriter.scala + SharedStrings.scala)

This is the most important change. The old code had a silent bypass:

// OLD — verbatim copy even with WriterConfig.secure
case Some(ctx) if ctx.isClean => copyVerbatim(...)

// NEW — also gate on escape flag
case Some(ctx) if ctx.isClean && !escapeFormulas => copyVerbatim(...)

The fix is correct and the test in FormulaInjectionSpec validates the exact bypass scenario (read clean workbook → write with WriterConfig.secure → verify SST contains '=DANGER and not raw =DANGER). The SST-level assertion is a nice belt-and-suspenders check.

The sheetsRequiringRemapping parameter threaded through StyleIndex.fromWorkbook is the right way to prevent TJC-751 style-stripping from regressing on secure rewrites of clean workbooks. Good catch.


writeWorkbookStream Simplification

// Before: ~150 lines — two-phase temp files, partial ZIP assembly, no merges/comments/freeze
// After:
val saxConfig = config.copy(backend = com.tjclp.xl.ooxml.XmlBackend.SaxStax)
writeWith(wb, path, saxConfig)

The simplification is clearly correct. The old implementation was essentially a partial re-implementation of XlsxWriter that missed significant parts of the OOXML spec (merge cells, comments, table relationships, freeze panes). Delegating to the full writer with the SaxStax backend is the right call.

One thing to confirm: BoundsAccumulator was used by the removed code. Is it still referenced elsewhere, or is it now dead code that should also be removed?


Streaming Batch Operations (StreamingWriteCommands.scala)

_: BatchParser.BatchOp.AddSheet | _: BatchParser.BatchOp.RenameSheet |
_: BatchParser.BatchOp.Freeze | BatchParser.BatchOp.Unfreeze |
_: BatchParser.BatchOp.CopyRange =>

Minor: BatchParser.BatchOp.Unfreeze lacks the _: prefix that all sibling patterns have. If Unfreeze is a case object this is equivalent, but it's inconsistent with the surrounding style. Worth making uniform either way for readability.


JDK 25

JDK 25 is LTS (released September 2025) so the upgrade is sound long-term. A few questions:

  • Is there a plan to maintain a CI matrix that also tests JDK 21 for users who haven't upgraded? JDK 21 is still the widely-deployed LTS.
  • Have any Mill/Coursier plugins been confirmed compatible with JDK 25? The test suite passes (901/901 per the PR description), which is good evidence, but a note in the PR would help reviewers evaluate the risk.

Re-enabled Tests (XlsxWriterRichTextEdgeCasesSpec.scala)

The regex fix for empty row matching is correct:

// OLD — dotall greedy, can match across child elements
"""<row r="1"[^>]*>.*?</row>"""

// NEW — handles self-closing tags correctly
"""<row r="1"[^>]*/>|<row r="1"[^>]*></row>"""

The xml:space assertion change (A1 now validated through SST instead of inline string) reflects real behavior and is asserted correctly. The family element check "<family" && """val="1""" is more precise than the old family= string match.


Documentation

The doc changes are accurate and improve clarity:

  • Renaming "Streaming Path" → "Row-Streaming Path" removes a long-standing source of confusion
  • Security status updated from ❌ to ✅ for ZIP bomb, XXE, formula injection — these were already shipped
  • Removing hardcoded test counts (263, 636) in favor of ./mill __.test is strictly better
  • writeWorkbookStream correctly described as "lower-allocation full OOXML write, not row-input streaming"

One small inconsistency: CLAUDE.md still says 82 functions but docs/LIMITATIONS.md and docs/STATUS.md say 81 functions. This pre-exists the PR but surfaced while reading.


Minor

  • ImageMagick.scala: nextOptionnextOption() is a correct compiler-warning fix.
  • Import cleanup (FileInputStream, XmlEvent removed) is tidy.
  • ExcelIOSpec secure test mirrors the FormulaInjectionSpec test well; the two-layer coverage (unit + integration) is appropriate given the security sensitivity.

Verdict

Approve with minor nits. The security fix and writeWorkbookStream correctness fix are the right calls and well-tested. Suggest resolving:

  1. Whether BoundsAccumulator is now orphaned
  2. Unfreeze pattern consistency
  3. JDK matrix question (can be post-merge if intentional)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant