Fix to_markdown producing double spaces for empty cells (`|  |` → `| |`).
When YAML frontmatter has a title AND actual H1 headers exist in the document, the frontmatter is now treated as a Document section instead of the workbook root. The H1 header is used for workbook auto-detection as usual. Previously, frontmatter title unconditionally preempted H1 auto-detection, causing 0 sheets to be parsed when both coexisted.
YAML frontmatter is now preserved during round-trip (parse → generate → re-parse). Frontmatter data is stored in metadata["frontmatter"] sub-dict with metadata["header_type"] set to "frontmatter", enabling accurate regeneration of the original format.
Supports YAML Frontmatter block parsing and Multi-H1 routing.
Additionally, introduces GFM Task List items ([ ] / [x]) to boolean conversion mapping in ConversionSchema.
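The task-list mapping can be pictured with a small standalone sketch. The helper name `task_marker_to_bool` is hypothetical and does not come from the library; it only illustrates the `[ ]` / `[x]` to boolean idea and is not the actual ConversionSchema implementation.

```python
import re
from typing import Optional

# Matches a cell that consists solely of a GFM task marker: "[ ]", "[x]" or "[X]".
_TASK_MARKER = re.compile(r"^\[( |x|X)\]$")


def task_marker_to_bool(cell: str) -> Optional[bool]:
    """Hypothetical helper: True for "[x]", False for "[ ]", None otherwise."""
    match = _TASK_MARKER.match(cell.strip())
    if match is None:
        return None  # Not a task marker; leave the value to other converters.
    return match.group(1).lower() == "x"
```

Returning `None` for non-matching cells is a design choice of this sketch, so that ordinary text is never silently coerced to a boolean.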
fix: update repository URL to fy-labs for npm provenance
Add flexible workbook detection and doc sheet support.
Workbook Auto-Detection (root_marker=None):
- Single H1 header is auto-detected as workbook root
- H1 containing `<!-- md-spreadsheet- ... -->` metadata comment is detected
- Fallback search for `# Tables` or `# Workbook`
- Sheet/table header levels auto-calculated from workbook level
New Sheet Fields:
- `Sheet.type`: `"table"` or `"doc"`
- `Sheet.content`: Raw markdown for doc sheets
New Workbook Fields:
- `Workbook.name`: Extracted from detected root marker
- `Workbook.root_content`: Captures content between workbook header and first sheet header
Backward Compatibility:
- Explicit `root_marker` preserves existing behavior
- `table_header_level=None` with explicit root still means "no table headers"
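As a rough illustration of the auto-detection order, the sketch below covers only two of the rules (a single H1, then the `# Tables` / `# Workbook` fallback). The function name `detect_root` and the fence-skipping detail are assumptions made for this example, not the parser's actual code.

```python
import re
from typing import List, Optional

# An H1 line: exactly one "#" followed by a space and a title.
_H1 = re.compile(r"^# (.+)$")


def detect_root(markdown: str) -> Optional[str]:
    """Hypothetical sketch: return the title of the detected workbook root."""
    titles: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # Ignore "#" lines inside code fences.
            continue
        if in_fence:
            continue
        match = _H1.match(line)
        if match:
            titles.append(match.group(1).strip())
    if len(titles) == 1:
        return titles[0]  # Rule: a single H1 is the workbook root.
    for fallback in ("Tables", "Workbook"):
        if fallback in titles:
            return fallback  # Rule: fall back to "# Tables" or "# Workbook".
    return None
```

The metadata-comment rule is omitted here because the changelog elides the exact comment name.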
docs: Add PengSheets announcement and update NPM READMEs
- Core: Added PengSheets announcement to `README.md` and `README.ja.md`.
- Core: Fixed invalid `update_table_metadata` example in `README.md`.
- NPM: Added `packages/npm/README.ja.md` (Japanese translation).
- NPM: Updated logic parity claims and Excel limitations text.
Ensure JSON output for CLI and metadata uses Unicode characters instead of escape sequences (e.g. Japanese).
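The difference is easy to see with the standard library alone. This sketch uses `json.dumps` with `ensure_ascii=False`; whether the library uses exactly this call internally is an assumption.

```python
import json

metadata = {"author": "山田"}

# Default behaviour: non-ASCII characters become \uXXXX escape sequences.
escaped = json.dumps(metadata)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(metadata, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same dictionary; only the on-disk readability changes.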
Added consistent CRUD operation methods to Workbook, Sheet, and Table models:
Workbook:
- `move_sheet(from_index, to_index)` - reorder sheets
- `replace_sheet(index, sheet)` - replace sheet at index
- `rename_sheet(index, new_name)` - rename sheet at index
Sheet:
- `rename(new_name)` - rename the sheet
- `add_table(name?)` - append a new empty table
- `delete_table(index)` - remove table at index
- `replace_table(index, table)` - replace table at index
- `move_table(from_index, to_index)` - reorder tables
Table:
- `rename(new_name)` - rename the table
- `move_row(from_index, to_index)` - reorder rows
- `move_column(from_index, to_index)` - reorder columns
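The `move_*` operations share one reordering idea, sketched below on a plain list. The helper `move_item` is hypothetical, and the exact index semantics of the library methods (for example, how out-of-range indexes are handled) are not stated in this changelog, so treat the bounds checks as an assumption.

```python
from typing import List, TypeVar

T = TypeVar("T")


def move_item(items: List[T], from_index: int, to_index: int) -> List[T]:
    """Return a new list with items[from_index] relocated to to_index."""
    if not 0 <= from_index < len(items):
        raise IndexError(f"from_index out of range: {from_index}")
    if not 0 <= to_index < len(items):
        raise IndexError(f"to_index out of range: {to_index}")
    reordered = list(items)  # Copy so the input list is left untouched.
    reordered.insert(to_index, reordered.pop(from_index))
    return reordered
```

Returning a new list rather than mutating in place keeps the sketch side-effect free; the real models may behave differently.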
Comprehensive improvements to the NPM package testing infrastructure:
- Migrated from monolithic `scripts/test.mjs` to modular `tests/` directory
- Split tests into dedicated files: `parsing.test.mjs`, `table.test.mjs`, `sheet.test.mjs`, `workbook.test.mjs`, `tomodels.test.mjs`
- Added shared `helpers.mjs` with assertion utilities and `runner.mjs` test orchestrator
- `replaceTable` and `replaceSheet` now auto-convert model instances to DTO (no explicit `.toDTO()` required)
- Full API parity with Python: users can pass `Table`/`Sheet` instances directly
- Added tests for: `deleteRow`, `deleteColumn`, `insertRow`, `insertColumn`, `clearColumnData`
- Added `Sheet.getTable` tests
- Added `parseTableFromFile`, `parseWorkbookFromFile`, `scanTablesFromFile` function verification
- Created `DEVELOPMENT.md` documenting the Python-to-NPM workflow
Added comprehensive E2E tests for NPM package (118 test cases) covering parseTable, parseWorkbook, scanTables, Table/Sheet/Workbook methods, deep metadata type verification, toModels with Plain Object and Zod schemas, and mutation API return values. Also improved .gitignore documentation, fixed verify_api_coverage.py, and added Limitations section to README documenting parseExcel unavailability.
Fixed Object.assign(this, res) usage in NPM package causing metadata to remain as a JSON string.
The TypeScript wrapper generator (scripts/generate_wit.py) now produces proper hydration code that reconstructs objects via the constructor, ensuring:
- `metadata` is parsed from JSON string to `Record<string, any>` (matching Python's `dict[str, Any]`)
- Nested models (e.g., `Sheet` within `Workbook`, `Table` within `Sheet`) are properly instantiated as class instances
Fixed nested models in NPM package constructors to properly wrap child elements.
- Sheet constructor now wraps tables array items as Table instances
- Workbook constructor now wraps sheets array items as Sheet instances
- This ensures the `json` getter recursively returns objects with proper metadata types
- Previously, nested metadata was returned as strings instead of objects
Added json getter to Table, Sheet, and Workbook classes in the NPM package.
- The `json` getter mirrors Python's `.json` property
- Returns a JSON-compatible plain object representation
- Recursively converts nested models (e.g., Sheet.json includes all tables.json)
Fixed WASM loading in Vite dev mode by using import.meta.url for proper path resolution.
- Modified build script to post-process JCO transpile output
- Replaced relative path `fetch('./parser.core.wasm')` with `fetch(new URL('./parser.core.wasm', import.meta.url))`
- This ensures WASM files are correctly resolved in both bundled and development environments
NPM package now builds and works correctly in browser environments (Vite, Webpack, etc.).
- Core APIs (`parseTable`, `parseWorkbook`, `scanTables`, etc.) work seamlessly in both Node.js and browser environments
- File-based APIs (`parseTableFromFile`, `parseWorkbookFromFile`, `scanTablesFromFile`) are now async functions that:
  - Work correctly in Node.js with lazy WASI filesystem initialization
  - Throw a clear error message in browser environments with guidance to use string-based alternatives
- Fixed `_addPreopen is not exported` error when using Vite or other browser bundlers
Fixed Workbook.to_markdown() to accept an optional schema argument, defaulting to a standard MultiTableParsingSchema. This aligns the API with Sheet.to_markdown() and Table.to_markdown().
Reduced NPM package size by excluding redundant intermediate WASM files.
Fixed an issue where parseWorkbookFromFile failed with FileNotFoundError in the NPM package environment.
- Configured WASI preopens to map the system root (e.g., `/` on macOS, `C:\` on Windows) to the Guest root.
- Implemented `resolveToVirtualPath` to automatically resolve relative paths against the Host's CWD and absolute paths against the system root.
- `parseWorkbookFromFile` now correctly handles both relative and absolute paths in Node.js environments.
Fixed a critical bug in the NPM package where Workbook.getSheet() and Sheet.getTable() returned plain objects instead of class instances. Now verifies that proper Sheet and Table instances are returned, restoring API compatibility.
Also fixed an issue where optional return types (like optional<Sheet>) were not correctly handled in the wrapper.
Move @bytecodealliance/preview2-shim to dependencies to ensure it is available at runtime. This fixes ERR_MODULE_NOT_FOUND when using the package in a fresh environment.
Update NPM publishing workflow to use Trusted Publishing (OIDC) instead of secret tokens.
Introduced comprehensive support for building an NPM package (md-spreadsheet-parser) powered by the Python core via WebAssembly (WASM).
- WASM Compilation: Uses `componentize-py` to compile the Python library into a WASM Component, enabling usage in Node.js environments.
- TypeScript Wrappers: Automatically generates high-fidelity TypeScript class wrappers that mirror the Python object model (API Parity).
  - Python `Table`, `Workbook`, `Sheet` classes are fully exposed in TypeScript.
  - Methods like `toMarkdown`, `updateCell`, and `addSheet` are available directly on TypeScript objects.
- Seamless Integration:
  - JSON Marshalling: Metadata dictionaries are automatically handled (serialized/deserialized) across the boundary.
  - Optional Arguments: Python default arguments are correctly mapped to optional TypeScript parameters (e.g., `schema?`).
  - Client-Side Mapping: `Table.toModels` supports passing browser-side schema classes or Zod-like validators.
- Verification: Added a robust verification environment (`verification-env`) ensuring cross-language compatibility.
Fixed a bug in parsing.py where the parser was incorrectly looking for <!-- md-spreadsheet-metadata: ... --> instead of <!-- md-spreadsheet-table-metadata: ... --> when extracting tables from blocks. This ensures consistency with the generator and specification.
Metadata: Updated PyPI Development Status to Production/Stable.
Documentation: Added announcement for the official VS Code Extension PengSheets release. Remove outdated roadmap and features section from READMEs. Complete README.ja.md translation and update metadata tag example in README.md.
Added Japanese translation for README.md and COOKBOOK.md.
Configured mkdocs-static-i18n to support bilingual documentation (English/Japanese).
Added language switcher with globe icon to the documentation site.
- Added robustness test `test_root_marker_robustness.py` to verify behavior when `# Tables` root marker is missing.
- BREAKING: Renamed `<!-- md-spreadsheet-metadata: ... -->` to `<!-- md-spreadsheet-table-metadata: ... -->` for consistency.
- Backward compatibility for the old tag has been dropped. Existing files with the old tag will still be parsed as tables, but the visual metadata (column widths, validation, etc.) will be ignored until manually updated.
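For files that still carry the old tag, the manual update amounts to a text substitution. The sketch below is one possible way to do it with the standard library; `migrate_metadata_tags` is a hypothetical helper, not something shipped with the package.

```python
import re

# Matches only the old tag name. The new table tag and the workbook tag
# have extra text after "md-spreadsheet-", so they are left untouched.
_OLD_TAG = re.compile(r"<!--\s*md-spreadsheet-metadata:")


def migrate_metadata_tags(markdown: str) -> str:
    """Rewrite old table-metadata comments to the renamed tag."""
    return _OLD_TAG.sub("<!-- md-spreadsheet-table-metadata:", markdown)
```

Running it twice is harmless because the renamed tag no longer matches the pattern. Back up files before rewriting them in bulk.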
Added SECURITY.md with reporting instructions.
Add GitHub Actions workflows for PyPI and TestPyPI publishing.
- Fix: Relaxed the location requirement for Workbook metadata. It can now appear anywhere in the file (e.g., before additional documentation sections), not just at the strictly last non-empty line.
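The relaxed rule can be thought of as a whole-file search instead of a last-line check. The sketch below assumes the comment format `<!-- md-spreadsheet-workbook-metadata: {...} -->` on a single line, as shown in this changelog's workbook metadata example; `find_workbook_metadata` is a hypothetical name and the parser's real matching logic may differ.

```python
import json
import re
from typing import Any, Dict, Optional

# One comment per line, anywhere in the file; the JSON object is captured.
_WORKBOOK_META = re.compile(
    r"^<!--\s*md-spreadsheet-workbook-metadata:\s*(\{.*\})\s*-->\s*$",
    re.MULTILINE,
)


def find_workbook_metadata(markdown: str) -> Optional[Dict[str, Any]]:
    """Return the first workbook metadata object found, or None."""
    match = _WORKBOOK_META.search(markdown)
    if match is None:
        return None
    return json.loads(match.group(1))
```

With this approach, a metadata comment followed by extra documentation sections is still found.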
Added metadata field to the Workbook model, allowing arbitrary data storage at the workbook level. This aligns the Workbook model with Sheet and Table models.
```python
wb = Workbook(sheets=[], metadata={"author": "Alice"})
# Metadata is persisted at the end of the file:
# <!-- md-spreadsheet-workbook-metadata: {"author": "Alice"} -->
```
- Fix: Improved hierarchical header flattening for vertically merged cells (e.g., prohibiting trailing separators like `Status -`).
- Enhancement: Cleaner string conversion for Excel numbers; integer-floats (e.g., `1.0`) are now automatically converted to valid integers (`"1"`) instead of preserving the decimal (`"1.0"`).
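The integer-float rule can be sketched in a few lines. `excel_number_to_str` is a hypothetical helper written for this note; it shows the conversion idea only and is not the library's code.

```python
def excel_number_to_str(value: object) -> str:
    """Convert a cell value to text, dropping a redundant ".0" from floats."""
    # float.is_integer() is False for NaN and infinity, so those fall through.
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)
```

Floats with a real fractional part, such as `1.5`, keep their decimal form.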
Add Excel parsing support with merged cell handling
New functions:
- `parse_excel()`: Parse Excel data from Worksheet, TSV/CSV string, or 2D array
- `parse_excel_text()`: Core function for processing 2D string arrays
Features:
- Forward-fill for merged header cells
- 2-row header flattening ("Parent - Child" format)
- Auto-detect openpyxl.Worksheet if installed
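To make the first two features concrete, here is a minimal sketch of forward-filling a merged parent header row and flattening it with a child row into the "Parent - Child" format. The function names and the handling of empty cells are assumptions for illustration, not the behavior of `parse_excel()` itself.

```python
from typing import List


def forward_fill(row: List[str]) -> List[str]:
    """Repeat the last non-empty cell into the empty cells that follow it."""
    filled: List[str] = []
    last = ""
    for cell in row:
        if cell.strip():
            last = cell.strip()
        filled.append(last)
    return filled


def flatten_headers(
    parent_row: List[str], child_row: List[str], sep: str = " - "
) -> List[str]:
    """Combine two header rows into single "Parent - Child" labels."""
    headers: List[str] = []
    for parent, child in zip(forward_fill(parent_row), child_row):
        child = child.strip()
        if parent and child and parent != child:
            headers.append(f"{parent}{sep}{child}")
        else:
            # Only one side has text: use it alone, with no dangling separator.
            headers.append(parent or child)
    return headers
```

For example, `flatten_headers(["Status", "Contact", ""], ["", "Email", "Phone"])` yields `["Status", "Contact - Email", "Contact - Phone"]`.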
Added a script `scripts/build_pyc_wheel.py` to generate optimized wheels containing pre-compiled bytecode (`.pyc` only) for faster loading in Pyodide environments (specifically for the VS Code extension).
See GitHub Releases: https://github.com/f-y/md-spreadsheet-parser/releases