10 changes: 5 additions & 5 deletions CLAUDE.md
@@ -32,7 +32,7 @@ xl-core/ → Pure domain model (Cell, Sheet, Workbook, Patch, Style), ma
xl-ooxml/ → Pure OOXML mapping (XlsxReader, XlsxWriter, SharedStrings, Styles)
xl-cats-effect/ → IO interpreters and streaming (Excel[F], ExcelIO, SAX-based streaming)
xl-benchmarks/ → JMH performance benchmarks
xl-evaluator/ → Formula parser/evaluator (TExpr GADT, 82 functions, dependency graphs)
xl-evaluator/ → Formula parser/evaluator (TExpr GADT, 88 functions, dependency graphs)
xl-testkit/ → Test laws, generators, helpers [future]
xl-agent/ → AI agent benchmark runner (Anthropic API, skill comparison)
```
@@ -92,7 +92,7 @@ excel.read(path).flatMap(wb => excel.write(wb, outPath))

```bash
./mill __.compile # Compile all
./mill __.test # Run all tests (731+)
./mill __.test # Run all tests (1080+)
./mill xl-core.test # Test specific module
./mill __.reformat # Format (Scalafmt 3.10.1)
./mill __.checkFormat # CI check
@@ -355,7 +355,7 @@ sheet.evaluateFormula("=SUM(A1:A10)") // XLResult[CellValue]
sheet.evaluateWithDependencyCheck() // Safe eval with cycle detection
```

**82 Functions**: SUM, SUMIF, SUMIFS, SUMPRODUCT, COUNT, COUNTA, COUNTBLANK, COUNTIF, COUNTIFS, AVERAGE, AVERAGEIF, AVERAGEIFS, MEDIAN, STDEV, STDEVP, VAR, VARP, MIN, MAX, IF, AND, OR, NOT, ISNUMBER, ISTEXT, ISBLANK, ISERR, ISERROR, CONCATENATE, LEFT, RIGHT, MID, LEN, UPPER, LOWER, TRIM, SUBSTITUTE, TEXT, VALUE, TODAY, NOW, DATE, YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, EOMONTH, ABS, ROUND, ROUNDUP, ROUNDDOWN, INT, MOD, POWER, SQRT, LOG, LN, EXP, FLOOR, CEILING, TRUNC, SIGN, PMT, FV, PV, RATE, NPER, NPV, IRR, VLOOKUP, XLOOKUP, PI, ROW, COLUMN, ROWS, COLUMNS, ADDRESS, TRANSPOSE
**88 Functions**: SUM, SUMIF, SUMIFS, SUMPRODUCT, COUNT, COUNTA, COUNTBLANK, COUNTIF, COUNTIFS, AVERAGE, AVERAGEIF, AVERAGEIFS, MEDIAN, STDEV, STDEVP, VAR, VARP, MIN, MAX, IF, IFERROR, AND, OR, NOT, ISNUMBER, ISTEXT, ISBLANK, ISERR, ISERROR, CONCATENATE, LEFT, RIGHT, MID, LEN, UPPER, LOWER, TRIM, FIND, SUBSTITUTE, TEXT, VALUE, TODAY, NOW, DATE, YEAR, MONTH, DAY, EOMONTH, EDATE, DATEDIF, NETWORKDAYS, WORKDAY, YEARFRAC, ABS, ROUND, ROUNDUP, ROUNDDOWN, INT, MOD, POWER, SQRT, LOG, LN, EXP, FLOOR, CEILING, TRUNC, SIGN, PMT, FV, PV, RATE, NPER, NPV, IRR, XNPV, XIRR, VLOOKUP, XLOOKUP, INDEX, MATCH, PI, ROW, COLUMN, ROWS, COLUMNS, ADDRESS, TRANSPOSE

### Rich Text
```scala
@@ -387,12 +387,12 @@ Styles deduplicated by `CellStyle.canonicalKey`. Build style index before emitti

**Framework**: MUnit + ScalaCheck | **Generators**: `xl-core/test/src/com/tjclp/xl/Generators.scala`

**980+ tests**: addressing (17), patch (21), style (60), datetime (8), codec (42), batch (46), syntax (18), optics (34), OOXML (24), streaming (18), RichText (5), formula (51+), v0.3.0 regressions (36), CLI (100+)
**1080+ tests**: addressing (17), patch (21), style (60), datetime (8), codec (42), batch (46), syntax (18), optics (34), OOXML (24), streaming (18), RichText (5), formula (51+), v0.3.0 regressions (36), CLI (100+)

## Documentation

- **Roadmap**: `docs/plan/roadmap.md` (single source of truth for work scheduling)
- **Status**: `docs/STATUS.md` (current capabilities, 980+ tests)
- **Status**: `docs/STATUS.md` (current capabilities, 1080+ tests)
- **Design**: `docs/design/*.md` (architecture, purity charter, domain model)
- **Reference**: `docs/reference/*.md` (examples, scaffolds, performance guide)

8 changes: 4 additions & 4 deletions docs/LIMITATIONS.md
@@ -1,7 +1,7 @@
# XL Current Limitations and Future Roadmap

**Last Updated**: 2025-12-27 (Docs Cleanup)
**Current Phase**: Core domain + OOXML + streaming I/O complete; formula system complete (**81 functions** + cross-sheet support); tables + benchmarks complete; row/column serialization complete; **security hardening complete** (ZIP bomb detection, XXE prevention, formula injection guards in both in-memory and streaming writes).
**Current Phase**: Core domain + OOXML + streaming I/O complete; formula system complete (**88 functions** + cross-sheet support); tables + benchmarks complete; row/column serialization complete; **security hardening complete** (ZIP bomb detection, XXE prevention, formula injection guards in both in-memory and streaming writes).

This document provides a comprehensive overview of what XL can and cannot do today, with clear links to future implementation plans.

@@ -113,7 +113,7 @@ This document provides a comprehensive overview of what XL can and cannot do tod

#### 6. Formula System ✅ **PRODUCTION READY**
**Status**: Complete (WI-07, WI-08, WI-09a-h + TJC-351 cross-sheet formulas)
**Features**: Parser, evaluator, **81 functions** (including SUMIF, COUNTIF, SUMIFS, COUNTIFS, XLOOKUP, INDEX, MATCH, XIRR, XNPV), dependency graph, cycle detection, cross-sheet references
**Features**: Parser, evaluator, **88 functions** (including SUMIF, COUNTIF, SUMIFS, COUNTIFS, XLOOKUP, INDEX, MATCH, XIRR, XNPV), dependency graph, cycle detection, cross-sheet references
**Plan**: [Formula System](plan/formula-system.md)
**Phase**: WI-07, WI-08, WI-09a/b/c/d Complete + Financial Functions + Cross-Sheet Formulas

@@ -700,7 +700,7 @@ See: [plan/23-security.md](plan/23-security.md)
| **Streaming Read** | ✅ | ✅ | XL: 55k rows/s, POI: ~40k rows/s |
| **Multi-sheet** | ✅ | ✅ | XL: Arbitrary, POI: Sequential |
| **Styles** | ✅ | ✅ | XL: Full in-memory; streaming uses minimal default styles |
| **Formulas (eval)** | ✅ | ✅ | XL: 81 functions, dependency graph, cycle detection |
| **Formulas (eval)** | ✅ | ✅ | XL: 88 functions, dependency graph, cycle detection |
| **Tables** | ✅ | ✅ | XL: Full table support with AutoFilter, structured refs |
| **Charts** | ❌ | ✅ | POI: Full support |
| **Drawings** | ❌ | ✅ | POI: Images/shapes |
@@ -757,7 +757,7 @@ SAX parsing is inherently synchronous - the `parser.parse()` call blocks until t
- Multi-sheet workbooks
- Core cell types and rich text
- Styling in in-memory workflows (full styles supported)
- Formula evaluation (81 functions, dependency graph, cycle detection)
- Formula evaluation (88 functions, dependency graph, cycle detection)
- Excel Tables (structured data with AutoFilter, headers, styling)
- Performance-critical workloads (benchmarked vs POI)

27 changes: 14 additions & 13 deletions docs/STATUS.md
@@ -42,7 +42,7 @@
- ✅ HTML export: `sheet.toHtml(range"A1:B10")`
- ✅ **Formula Parsing** (WI-07 complete): TExpr GADT, FormulaParser, FormulaPrinter with round-trip verification and scientific notation
- ✅ **Formula Evaluation** (WI-08 complete): Pure functional evaluator with total error handling, short-circuit semantics, and Excel-compatible behavior
- ✅ **Function Library** (WI-09a-h complete): **81 built-in functions** (aggregate, conditional, logical, text, date, financial, lookup, math), extensible type class parser, evaluation API
- ✅ **Function Library** (WI-09a-h + TJC-1055 complete): **88 built-in functions** (aggregate, conditional, logical, text, date, financial, lookup, math), extensible type class parser, evaluation API. Text functions include TRIM, MID, FIND, SUBSTITUTE, VALUE, TEXT (added in TJC-1055 / GH-116).
- ✅ **Dependency Graph** (WI-09d complete): Circular reference detection (Tarjan's SCC), topological sort (Kahn's algorithm), safe evaluation with cycle detection
- ✅ **Cross-Sheet Formula References** (TJC-351): Single cell refs (`=Sales!A1`), range refs (`=SUM(Sales!A1:A10)`), arithmetic with cross-sheet refs, workbook-level cycle detection (`DependencyGraph.fromWorkbook`)
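
The dependency-graph bullet above names Kahn's algorithm for evaluation order and cycle detection. The following is a minimal, self-contained sketch of that idea, not XL's actual `DependencyGraph` code: cell references are plain strings, and `evaluationOrder` is a hypothetical name introduced only for illustration.

```scala
// Hypothetical sketch: cells are plain strings, not XL's ARef type.
// `deps` maps each formula cell to the cells it reads (its precedents).
def evaluationOrder(deps: Map[String, Set[String]]): Either[Set[String], List[String]] =
  val nodes = deps.keySet ++ deps.values.flatten
  // In-degree = number of precedents that have not been evaluated yet.
  val inDegree0 = nodes.map(n => n -> deps.getOrElse(n, Set.empty).size).toMap
  // Reverse edges: precedent -> cells that read it.
  val dependents: Map[String, Set[String]] =
    deps.toList
      .flatMap((cell, reads) => reads.map(_ -> cell))
      .groupMap(_._1)(_._2)
      .view.mapValues(_.toSet).toMap

  @annotation.tailrec
  def loop(ready: List[String], inDegree: Map[String, Int], done: List[String]): List[String] =
    ready match
      case Nil => done.reverse
      case cell :: rest =>
        // Emitting `cell` releases one precedent from each of its dependents.
        val (nextReady, nextDegree) =
          dependents.getOrElse(cell, Set.empty).toList.sorted.foldLeft((rest, inDegree)) {
            case ((r, d), dep) =>
              val remaining = d(dep) - 1
              (if remaining == 0 then dep :: r else r, d.updated(dep, remaining))
          }
        loop(nextReady, nextDegree, cell :: done)

  val start = inDegree0.collect { case (n, 0) => n }.toList.sorted
  val order = loop(start, inDegree0, Nil)
  // Cells that were never released sit on a cycle or downstream of one.
  if order.size == nodes.size then Right(order) else Left(nodes -- order)
```

With `Map("C1" -> Set("A1", "B1"), "B1" -> Set("A1"))` the sketch returns `Right(List("A1", "B1", "C1"))`; with two cells that read each other it returns a `Left` containing both.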

@@ -78,7 +78,7 @@

### Test Coverage

**980+ tests across 6 modules** (includes P7+P8 string interpolation + WI-07/08/09/09d formula system + TJC-351 cross-sheet formulas + WI-10 table support + WI-15 benchmarks + WI-17 SAX streaming write + v0.3.0 regressions):
**1080+ tests across 6 modules** (includes P7+P8 string interpolation + WI-07/08/09/09d formula system + TJC-351 cross-sheet formulas + WI-10 table support + WI-15 benchmarks + WI-17 SAX streaming write + v0.3.0 regressions + TJC-1055 text functions):
- **xl-core**: ~500+ tests
- 17 addressing (Column, Row, ARef, CellRange laws)
- 21 patch (Monoid laws, application semantics)
@@ -102,7 +102,7 @@
- **xl-cats-effect**: ~30+ tests
- True streaming I/O with fs2-data-xml (constant memory, 100k+ rows)
- Memory tests (O(1) verification, concurrent streams)
- **xl-evaluator**: ~280 tests (parser, evaluator, function library, evaluation API, dependency graph, cross-sheet formulas, integration)
- **xl-evaluator**: ~338 tests (parser, evaluator, function library, evaluation API, dependency graph, cross-sheet formulas, integration)
- **Parser (WI-07)**: 57 tests
- 7 property-based round-trip tests (parse ∘ print = id)
- 26 parser unit tests (literals, operators, functions, edge cases)
@@ -130,16 +130,17 @@
**Formula System** (WI-07, WI-08, WI-09a/b/c/d - Production Ready):
- ✅ **Parsing** (WI-07): Typed AST (TExpr GADT), FormulaParser, FormulaPrinter, round-trip verification, 57 tests
- ✅ **Evaluation** (WI-08): Pure functional evaluator, total error handling, short-circuit semantics, 58 tests
- ✅ **Function Library** (WI-09a-h complete): **81 built-in functions**, extensible type class parser, evaluation API, 174 tests
- **Aggregate** (9): SUM, COUNT, COUNTA, COUNTBLANK, AVERAGE, MEDIAN, MIN, MAX, STDEV, STDEVP, VAR, VARP
- **Conditional** (6): SUMIF, COUNTIF, SUMIFS, COUNTIFS, AVERAGEIF, AVERAGEIFS, SUMPRODUCT
- **Logical** (8): IF, AND, OR, NOT, ISNUMBER, ISTEXT, ISBLANK, ISERR, ISERROR
- **Text** (12): CONCATENATE, LEFT, RIGHT, MID, LEN, UPPER, LOWER, TRIM, SUBSTITUTE, TEXT, VALUE
- **Date** (13): TODAY, NOW, DATE, YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, EOMONTH, EDATE, DATEDIF, NETWORKDAYS, WORKDAY, YEARFRAC
- ✅ **Function Library** (WI-09a-h + TJC-1055 complete): **88 built-in functions**, extensible type class parser, evaluation API, 236 tests
- **Aggregate** (12): SUM, COUNT, COUNTA, COUNTBLANK, AVERAGE, MEDIAN, MIN, MAX, STDEV, STDEVP, VAR, VARP
- **Conditional** (7): SUMIF, COUNTIF, SUMIFS, COUNTIFS, AVERAGEIF, AVERAGEIFS, SUMPRODUCT
- **Logical** (10): IF, IFERROR, AND, OR, NOT, ISNUMBER, ISTEXT, ISBLANK, ISERR, ISERROR
- **Text** (12): CONCATENATE, LEFT, RIGHT, MID, LEN, UPPER, LOWER, TRIM, FIND, SUBSTITUTE, TEXT, VALUE
- **Date** (12): TODAY, NOW, DATE, YEAR, MONTH, DAY, EOMONTH, EDATE, DATEDIF, NETWORKDAYS, WORKDAY, YEARFRAC
- **Math** (16): ABS, ROUND, ROUNDUP, ROUNDDOWN, INT, MOD, POWER, SQRT, LOG, LN, EXP, FLOOR, CEILING, TRUNC, SIGN, PI
- **Financial** (7): NPV, IRR, XNPV, XIRR, PMT, FV, PV, RATE, NPER
- **Financial** (9): NPV, IRR, XNPV, XIRR, PMT, FV, PV, RATE, NPER
- **Lookup** (4): VLOOKUP, XLOOKUP, INDEX, MATCH
- **Info** (4): ROW, COLUMN, ROWS, COLUMNS, ADDRESS
- **Info** (5): ROW, COLUMN, ROWS, COLUMNS, ADDRESS
- **Array** (1): TRANSPOSE
- FunctionSpec registry: macro-collected specs with extensible registry
- APIs: sheet.evaluateFormula(), sheet.evaluateCell(), sheet.evaluateAllFormulas()
- Clock trait for pure date/time functions (deterministic testing)
@@ -210,7 +211,7 @@
- ✅ P7: String interpolation Phase 1 (runtime validation for all macros)
- ✅ P8: String interpolation Phase 2 (compile-time optimization)
- ✅ P31: Optics, RichText, HTML export, enhanced ergonomics
- ✅ **Formula System** (WI-07/08/09): Parser, evaluator, 81 functions, dependency graph, cycle detection
- ✅ **Formula System** (WI-07/08/09): Parser, evaluator, 88 functions, dependency graph, cycle detection
- ✅ **Excel Tables** (WI-10): Structured data with headers, AutoFilter, styling
- ✅ **Benchmarks** (WI-15): JMH performance suite (XL vs POI)
- ✅ **SAX Write** (WI-17): Fast SAX/StAX streaming write path
@@ -304,7 +305,7 @@ xl-cats-effect/src/com/tjclp/xl/io/
```

### Completed Modules (Additional)
- `xl-evaluator/` ✅ **Complete** (WI-07/08/09 - formula parsing, evaluation, 81 functions, dependency graph)
- `xl-evaluator/` ✅ **Complete** (WI-07/08/09 - formula parsing, evaluation, 88 functions, dependency graph)
- `xl-benchmarks/` ✅ **Complete** (WI-15 - JMH performance benchmarks)

### Not Started (Future Phases)
2 changes: 1 addition & 1 deletion docs/design/architecture.md
@@ -184,6 +184,6 @@ The evaluator implements: `Evaluator.eval: TExpr[A] => Sheet => Either[EvalError
- Topological sort for evaluation order (Kahn's algorithm)
- Short-circuit evaluation for And/Or
- Division by zero handling (returns `CellError.Div0`)
- 81 Excel functions: SUM, AVERAGE, IF, VLOOKUP, XLOOKUP, SUMIF, COUNTIF, NPV, IRR, and more
- 88 Excel functions: SUM, AVERAGE, IF, VLOOKUP, XLOOKUP, SUMIF, COUNTIF, NPV, IRR, and more

See `docs/STATUS.md` for the complete function list.
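
The bullets above mention short-circuit evaluation for And/Or and division by zero surfacing as an error value. A rough illustration of how a typed expression tree can give both properties follows. `Expr`, `EvalError`, and this `eval` are hypothetical miniatures introduced here as assumptions; they are not XL's `TExpr`, `CellError`, or `Evaluator.eval`, and there is no `Sheet` argument.

```scala
// Hypothetical miniature of a typed expression tree (GADT); not XL's TExpr.
enum Expr[A]:
  case Num(value: BigDecimal) extends Expr[BigDecimal]
  case Bool(value: Boolean) extends Expr[Boolean]
  case Div(left: Expr[BigDecimal], right: Expr[BigDecimal]) extends Expr[BigDecimal]
  case Gt(left: Expr[BigDecimal], right: Expr[BigDecimal]) extends Expr[Boolean]
  case And(left: Expr[Boolean], right: Expr[Boolean]) extends Expr[Boolean]

// Hypothetical error type standing in for the library's error values.
enum EvalError:
  case Div0

import Expr.*

// Total evaluator: every failure is a Left, nothing is thrown.
def eval[A](expr: Expr[A]): Either[EvalError, A] =
  expr match
    case Num(v)  => Right(v)
    case Bool(v) => Right(v)
    case Div(l, r) =>
      for
        a <- eval(l)
        b <- eval(r)
        q <- if b.signum == 0 then Left(EvalError.Div0) else Right(a / b)
      yield q
    case Gt(l, r) =>
      for
        a <- eval(l)
        b <- eval(r)
      yield a > b
    case And(l, r) =>
      // Short-circuit: the right operand is evaluated only when the left is true.
      eval(l).flatMap(lv => if lv then eval(r) else Right(false))
```

In this sketch `eval(Div(Num(BigDecimal(1)), Num(BigDecimal(0))))` yields `Left(EvalError.Div0)`, while wrapping that same division on the right-hand side of an `And` whose left operand is `Bool(false)` yields `Right(false)`, because the division is never evaluated.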
5 changes: 3 additions & 2 deletions docs/plan/roadmap.md
@@ -8,7 +8,7 @@

## TL;DR

**Current Status**: Production-ready with **81 formula functions**, SAX streaming (36% faster than POI), Excel tables, and full OOXML round-trip. 733+ tests passing.
**Current Status**: Production-ready with **88 formula functions**, SAX streaming (36% faster than POI), Excel tables, and full OOXML round-trip. 1080+ tests passing.

**Current Version**: 0.6.1

@@ -73,7 +73,8 @@ CLI expansion with 7 new commands and evaluator fixes:
All completed phases are documented in git history. Key milestones:

- **P0-P8**: Foundation, OOXML, streaming, codecs, macros
- **WI-07/08/09**: Formula parser, evaluator, 81 functions
- **WI-07/08/09**: Formula parser, evaluator, 88 functions
- **TJC-1055** (closes GH-116): Text functions — TRIM, MID, FIND, SUBSTITUTE, VALUE, TEXT (88 functions total)
- **WI-10**: Excel table support
- **WI-17**: SAX streaming write (36% faster than POI)
- **WI-19**: Row/column property serialization
1 change: 1 addition & 0 deletions docs/reference/examples.md
@@ -252,6 +252,7 @@ scala-cli examples/dependency-analysis.sc
scala-cli examples/data-validation.sc
scala-cli examples/sales-pipeline.sc
scala-cli examples/evaluator-demo.sc
scala-cli examples/text_functions_demo.sc # TRIM, MID, FIND, SUBSTITUTE, VALUE, TEXT
```

## 4) Chart spec (Future - WI-11)
117 changes: 117 additions & 0 deletions examples/text_functions_demo.sc
@@ -0,0 +1,117 @@
#!/usr/bin/env -S scala-cli shebang
//> using file project.scala


// Demonstrates the 6 text functions added in TJC-1055 / GH-116:
// TRIM, MID, FIND, SUBSTITUTE, VALUE, TEXT
//
// Each section: build a small workbook, apply realistic formulas, and print
// "formula = result (expected: ...)" so any divergence pops out visually.
//
// Run with:
// 1. Publish locally: ./mill xl.publishLocal
// 2. Run script: scala-cli run examples/text_functions_demo.sc

import com.tjclp.xl.{*, given}
import com.tjclp.xl.cells.CellValue

println("=== XL Text Functions Demo (TJC-1055 / GH-116) ===\n")

/** Evaluate a formula on the given sheet and stringify the result. */
def eval(formula: String, sheet: Sheet): String =
  sheet.evaluateFormula(formula) match
    case Right(CellValue.Text(s))   => s"\"$s\""
    case Right(CellValue.Number(n)) => n.toString
    case Right(CellValue.Bool(b))   => b.toString
    case Right(other)               => other.toString
    case Left(err)                  => s"<ERROR: $err>"

/** Print a formula result alongside the expected value. Mismatches visually pop. */
def show(formula: String, sheet: Sheet, expected: String): Unit =
  val got = eval(formula, sheet)
  val mark = if got == expected then "✓" else "✗"
  println(f" $mark%s $formula%-50s = $got%-30s (expected: $expected)")


// =====================================================================
// 1. TRIM + SUBSTITUTE — clean messy CSV-imported data
// =====================================================================
println("\n--- 1. Cleanup pipeline (TRIM, SUBSTITUTE) ---")

val cleanup = Sheet("Cleanup")
  .put(ref"A1", CellValue.Text(" alice@example.com "))
  .put(ref"A2", CellValue.Text("Name: Bob; Age: 42"))
  .put(ref"A3", CellValue.Text("a,b,,c,,,d"))

show("=TRIM(A1)", cleanup, "\"alice@example.com\"")
show("=SUBSTITUTE(A2, \"; \", \" | \")", cleanup, "\"Name: Bob | Age: 42\"")
show("=SUBSTITUTE(A3, \",,\", \",\")", cleanup, "\"a,b,c,,d\"")
show("=SUBSTITUTE(SUBSTITUTE(A3, \",,\", \",\"), \",,\", \",\")", cleanup, "\"a,b,c,d\"")


// =====================================================================
// 2. VALUE — parse currency / percent / accounting strings
// =====================================================================
println("\n--- 2. Numeric parsing (VALUE) ---")

val parsing = Sheet("Parsing")
  .put(ref"A1", CellValue.Text("$1,234.56"))
  .put(ref"A2", CellValue.Text("(500)"))
  .put(ref"A3", CellValue.Text("45.5%"))
  .put(ref"A4", CellValue.Text(" $-1,000 "))

show("=VALUE(A1)", parsing, "1234.56")
show("=VALUE(A2)", parsing, "-500")
show("=VALUE(A3)", parsing, "0.455")
show("=VALUE(A4)", parsing, "-1000")


// =====================================================================
// 3. TEXT — format numbers / dates for display
// =====================================================================
println("\n--- 3. Display formatting (TEXT) ---")

val formatting = Sheet("Formatting")
  .put(ref"A1", CellValue.Number(BigDecimal("1234567.89")))
  .put(ref"A2", CellValue.Number(BigDecimal("0.075")))
  .put(ref"A3", CellValue.Number(BigDecimal("-1234.5")))

show("=TEXT(A1, \"#,##0.00\")", formatting, "\"1,234,567.89\"")
show("=TEXT(A2, \"0.00%\")", formatting, "\"7.50%\"")
show("=TEXT(A3, \"#,##0.00;-#,##0.00\")", formatting, "\"-1,234.50\"")
show("=TEXT(A1, \"0\")", formatting, "\"1234568\"")


// =====================================================================
// 4. FIND + MID — extract email domain (function composition)
// =====================================================================
println("\n--- 4. Extract email domain (FIND + MID) ---")

val emails = Sheet("Emails")
  .put(ref"A1", CellValue.Text("alice@example.com"))
  .put(ref"A2", CellValue.Text("bob@tjclp.com"))
  .put(ref"A3", CellValue.Text("charlie+filter@gmail.co.uk"))

// =MID(A1, FIND("@", A1) + 1, 100) — MID handles overflow by clamping
show("=MID(A1, FIND(\"@\", A1) + 1, 100)", emails, "\"example.com\"")
show("=MID(A2, FIND(\"@\", A2) + 1, 100)", emails, "\"tjclp.com\"")
show("=MID(A3, FIND(\"@\", A3) + 1, 100)", emails, "\"gmail.co.uk\"")


// =====================================================================
// 5. Round-trip: TEXT(VALUE(s)) — normalize messy currency input
// =====================================================================
println("\n--- 5. Round-trip: messy → number → canonical (TEXT(VALUE(...))) ---")

val roundtrip = Sheet("Roundtrip")
  .put(ref"A1", CellValue.Text("$1,234.56"))
  .put(ref"A2", CellValue.Text("(2,500)"))
  .put(ref"A3", CellValue.Text("78.9%"))

show("=TEXT(VALUE(A1), \"#,##0.00\")", roundtrip, "\"1,234.56\"")
show("=TEXT(VALUE(A2), \"#,##0.00;-#,##0.00\")", roundtrip, "\"-2,500.00\"")
show("=TEXT(VALUE(A3), \"0.00%\")", roundtrip, "\"78.90%\"")


println("\n=== Demo Complete ===")
println("Tip: change a formula above and re-run to explore behavior.")
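
Section 4 of the script composes FIND and MID and relies on MID clamping when asked for more characters than remain. A library-free model of that composition may help when reading the expected values. It assumes Excel's documented semantics (1-based positions, case-sensitive FIND, MID clamped at the end of the text) and is not XL's implementation; `find`, `mid`, and `emailDomain` are hypothetical helper names.

```scala
// Plain-Scala model of Excel's documented FIND and MID semantics; not XL's code.

// FIND: 1-based, case-sensitive position of `needle`, or None when absent
// (Excel reports #VALUE! in that case).
def find(needle: String, haystack: String): Option[Int] =
  val i = haystack.indexOf(needle)
  if i < 0 then None else Some(i + 1)

// MID: `start` is 1-based; asking for more characters than remain is clamped
// to the end of the text, and a start past the end yields the empty string.
def mid(text: String, start: Int, count: Int): String =
  require(start >= 1 && count >= 0, "start must be >= 1 and count >= 0")
  text.slice(start - 1, start - 1 + count)

// Mirrors =MID(A1, FIND("@", A1) + 1, 100) from the demo.
def emailDomain(email: String): Option[String] =
  find("@", email).map(at => mid(email, at + 1, 100))
```

Under these assumptions `emailDomain("alice@example.com")` is `Some("example.com")`, and an input without `@` gives `None` rather than an error string.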