Clone entity-table for NANSEN R2025a+ table backend by jesshaley · Pull Request #42 · Waltham-Data-Science/nansen-pulakat

jesshaley · 2026-05-06T18:53:11Z

Summary

Adapts nansen-pulakat to VervaekeLab/NANSEN PR #77, which modernizes NANSEN's MetaTableViewer for MATLAB R2025a+. The PR splits the viewer into a legacy Java backend (pre-R2025a) and a modern uifigure/EntityTable backend (R2025a+); the modern path requires the new ehennestad/entity-table package on the MATLAB path. Pre-R2025a installs continue to use the legacy path with no behavioral change.

What changed

install.m — clone entity-table as a sibling of the other repos under ~/ndi/tools/ (step 5b, between NANSEN clone and nansen_install). NANSEN's nansen.util.ensureEntityTableOnPath probes <nansenRoot>/../entity-table as one of its candidate paths, so the sibling clone is automatically discovered.
src/ndi/+ndi/+nansen/startup.m — sync entity-table alongside NANSEN/openMINDS/NDI-matlab on each ndi.nansen.startup invocation so updates flow through the same git pull mechanism as the rest of the upstream repos.
.github/workflows/matlab-tests.yml — add actions/checkout@v4 of ehennestad/entity-table and addpath(genpath('entity-table/src')) in both the offline-tests and cloud-tests jobs. Necessary so the cloud Startup test (which launches the Nansen GUI) does not break under R2025a, and as defense-in-depth for the offline plugin-discovery tests.
CLAUDE.md — add entity-table to the upstream-repos table, the install-flow clone order, the cross-repo coupling table, and a new "MATLAB version coupling" section that documents getCellDisplayStringForContext and notes that pulakat's plain-text tablevariables work under both backends without per-context branches.

Why no source changes in pulakat

The new API hook nansen.metadata.abstract.TableColumnFormatter.getCellDisplayStringForContext(obj, displayContext) is additive — the abstract base class supplies a default that delegates to the existing getCellDisplayString(). Pulakat's +tablevariable/* classes override only getCellDisplayString and emit plain strings (no <html> markup, no getIconHtmlString icon embeds), so they render correctly under the modern uifigure backend through the default fallback. If a future formatter ever needs HTML/icon rendering it should override getCellDisplayStringForContext directly, as documented in CLAUDE.md.

Test plan

Offline CI on latest MATLAB (default) — entity-table clone + addpath should be a no-op for plugin-discovery tests
Manually trigger workflow with R2025a to confirm the modern table backend loads (no NANSEN:MissingEntityTable error)
Manually trigger workflow with R2024b to confirm pre-R2025a path still passes
Run install.m end-to-end on a clean machine; verify ~/ndi/tools/entity-table/ is created and a subsequent pulakat.startup syncs it without errors
Cloud Startup test passes against R2025a (when secrets are configured)

https://claude.ai/code/session_01S7oxviMzuwcExHpW2J1eEs

Generated by Claude Code

project_info.subjectFileColumns describes two distinct sets of columns that the import flow conflated: * Genuine source-CSV columns (Strain, BiologicalSex, Animal, Cage, Label, Treatment, ...) that the user supplies in animal_mapping.csv. * Auto-computed columns (ElectronicFileName, added by tableFromFile itself; SubjectIdentifier, added by ndi.nansen.import.subject) that documents.m needs to know about for ontologyTableRow document creation but that are never present in the source file. Listing the auto-computed pair in subjectFileColumns produced two problems on Session > Import > Subjects > File: 1. validateTable warned every time about missing ElectronicFileName / SubjectIdentifier — a spurious warning because those columns are by design not in the CSV. 2. setvartype and SelectedVariableNames threw "Unknown variable name: 'SubjectIdentifier'", aborting the import. tableFromFile now distinguishes the two sets: * Pass only the genuinely-expected-in-CSV columns to validateTable (filtered by post-rename name not appearing in the AUTO_ADDED list). * Use intersect with importOptions.VariableNames to scope setvartype and SelectedVariableNames to columns the file actually carries. * Rename only the columns that ended up in the stacked table. documents.m's ontology-row logic still reads the full subjectFileColumns list, so SubjectIdentifier and ElectronicFileName remain part of the document schema — the patch is contained to the CSV-shape checks.

Replaces the hardcoded AUTO_ADDED list in tableFromFile with a proper "auto" boolean on each project_info column entry. Two fields in pulakat's project_info.json now declare their origin explicitly: * SubjectIdentifier — auto: true (added by ndi.nansen.import.subject) * ElectronicFileName — auto: true (added by ndi.nansen.import.subject.tableFromFile) * FileIdentifier in dataFileColumns — auto: true (added by ndi.nansen.import.file) * everything else — auto: false (genuine source CSV column) The flag is read via a new ndi.nansen.fun.isAutoColumn helper that defaults to false when the field is absent, so older project_info files (and the testlab unit-test fixture) keep working without changes. Imports use the mask to skip auto columns when validating CSV shape; documents.m and ontologyTableRow.m read the full column list because the auto columns are present in the post-import table. manual.m's input dialog now also skips auto columns — the user shouldn't be prompted for a SubjectIdentifier or ElectronicFileName since those get populated by import code, not by the operator. FunHelpers gains two regression cases: * isAutoColumnReadsFlag — mixed entries return the right mask. * isAutoColumnDefaultsFalseWhenFieldMissing — older project_info files without the auto field continue to work.

nansen.metadata.MetaTable.updateTableVariable invokes Cloud.update once per metatable row. The Cloud tablevariable's update method calls ndi.nansen.fun.getCloudStatus, which previously did two heavyweight operations per call: * ndi.nansen.fun.datasetID2Object(datasetID) — re-opens the NDI dataset from disk via ndi.dataset.dir, parsing config files and initializing the document store. * ndi.nansen.sync.status(dataset) — reads the sync index file and enumerates every local document, then computes the upload diff. For a Pulakat-scale Subject metatable (300 rows) that's ~600 disk- heavy operations per Cloud column refresh; clicking Document or Import-and-refresh triggers a multi-minute hang. The earlier missing- columns fix (which made addMissingVarsToMetaTable add Cloud as a real column with HasUpdateFunction) exposed this latent perf issue. Add a persistent cache keyed by DatasetIdentifier with a 30-second TTL. Same approach as listUniqueMetaTableValues and countUniqueMetaTableValues. The TTL is short enough that a Sync click followed by a metatable refresh picks up the new cloud state on the next pass; `clear functions` is the explicit reset for anything tighter.

ndi.nansen.startup runs ndi.cloud.sync.downloadNew against an already-downloaded local dataset, which calls ndi.cloud.api.files.waitForAllBulkUploads. That polling hangs when the server thinks an upload is still in flight (e.g. after a force-quit MATLAB left an upload half-committed) and there is no clean way to interrupt it once it has started. New SkipCloudSync option (default false) on ndi.nansen.startup and pulakat.startup lets the caller bypass downloadNew when needed: pulakat.startup('SkipCloudSync', true) When set, the local dataset loads as-is, the GUI launches against the local state, and the dataset Sync table-method action remains the explicit way to reconcile with the cloud (still does the full round-trip when clicked). The initial dataset download is unaffected — SkipCloudSync only suppresses the incremental sync against an existing local copy.

When the caller asks to skip cloud sync at startup, also short-circuit the ndi.cloud.testLogin verification + ndi.cloud.uilogin prompt. The same root cause that motivates SkipCloudSync (testLogin or downloadNew hanging against a flaky cloud) makes the login dance pointless and adds another way to hang at startup. Bundling them under one flag keeps the escape hatch effective.

User is fixing testLogin behaviour upstream in NDI-matlab. SkipCloudSync goes back to its narrower meaning: skip ndi.cloud.sync.downloadNew only. testLogin / uilogin run on every startup regardless of the flag, so users who genuinely aren't authenticated still get the login prompt.

When the dataset and a session share a directory (the common case), ndi.session.dir(path) without a reference creates a brand-new session object with a freshly-minted identifier instead of opening the existing session at that path. The metatable was populated via dataset.open_session(id), which sets .identifier to the persistent value, so subsequent lookups by ndi.session.dir(path).id never matched anything in the metatable — Subject and File imports were filtering on a SessionIdentifier that didn't exist anywhere. Fix: pass SessionName as the reference everywhere we construct an ndi.session.dir from a path. Six callsites updated: - +sessionmethod/+methods/+import/+data/file.m, folder.m - +sessionmethod/+methods/+import/+subjects/file.m, manual.m - +objectmethod/+subject/+methods/document.m - +objectmethod/+file/+methods/document.m The two document.m sites loop over unique session paths from a working table; they now also look up the parallel SessionName from that table. ndi.nansen.import.session already passed (sessionName, sessionPath) so it was correct.

Replace class(metaTable) with metaTable.MetaTableClass because class() method override of MetaTable was removed in NANSEN

- Use getMetaTableFilePath of MetaTableCatalog instead of entry.SavePath, entry.FileName because entry.SavePath was removed in Nansen - Use isfile instead of exist(filename, 'file') for better performance

…riables NANSEN PR #78 removed MetaTable.addMissingVarsToMetaTable in favour of Project.synchronizeMetaTableVariables, which derives the table type from the metatable itself. PR #39 caught getMetaTableFilePath and MetaTableClass but missed this third call site, so MetatableMerge tests fail in CI with an undefined-method error against NANSEN dev.

…s-xfNaj Replace removed addMissingVarsToMetaTable in merge.m

…table-refactor Fix: Adapt to NANSEN `MetaTable` refactor

NANSEN PR #78 changed MetaTable.updateTableVariable from looking up the project's TableVariable update function internally to taking it as an explicit third argument: updateTableVariable(obj, variableName, tableRowIndices, updateFunction) The new signature defaults updateFunction to function_handle.empty() and unconditionally invokes it, so any 2-arg call now errors with "Invoked function handles must be scalar." This fires from ndi.nansen.metatable.updateAll's second pass during pulakat.startup and trips the GUI-launch guard in ndi.nansen.startup. Look up UpdateFunctionName from the project's TableVariable table at each call site (matching NANSEN's own addMissingTableVariables) and pass str2func(name) as the third argument.

ndi.nansen.import.subject line 155 assigned the raw return of ndi.nansen.fun.getMetaTableValue (a char like 'DS-123' or '') to subjectTable_new{:,'DatasetIdentifier'}. When subjectTable_new has more than one row, MATLAB can't broadcast a 1xM char to N rows of a cell-typed column and errors with "the number of rows must match the height of the table." The three lines just above already wrap their scalar values in { } for exactly this reason; this aligns line 155 with that pattern.

When a dataset and a session share a directory (the empty dataset-default 'pulakat' session sitting on top of the same path as the active 'test_2026' session), ndi.session.dir(reference, path) silently resolves to the dataset's default session. Downstream code in subject/file/data imports then writes the dataset session's id into SessionIdentifier columns, producing duplicate Subject rows under the wrong session and orphaning per-session metadata. ndi.session.dir takes an undocumented third positional arg -- session_id -- that pins resolution to a specific session. The GUI's sessionObject already carries SessionIdentifier as the source of truth, so feed it in. An assertion right after the constructor ensures any future regression in NDI's resolution fails loud rather than misroutes silently. Applied to all four sessionmethod entry points that opened an existing session via sessionObject: - import/subjects/file.m (the original repro path) - import/subjects/manual.m - import/data/file.m - import/data/folder.m The two +objectmethod/document.m loops use unique(SessionPath) and need a different shape; left for a follow-up.

The +objectmethod/document.m loops in subject and file did unique(SessionPath) and pulled SessionName from the first matching row, then called ndi.session.dir(name, path). When a dataset and a session share a directory (the empty 'pulakat' default session sitting on top of the same path as the active session), unique-by- path collapsed both onto a single iteration and ndi.session.dir silently resolved to the dataset's default session -- documents got attached to the wrong session. Group by SessionIdentifier instead; that's the actual identity of a session in the metatable. Pull the path/name for the call from any matching row, and pass the id as ndi.session.dir's third arg (undocumented but supported) so the path-collision case can't misroute. Assert the resolved id matches.

The orphan subject rows produced by the earlier session-misrouting bug carry an empty DatasetIdentifier (a 0x1 cell), because getMetaTableValue returned '' when the row was first inserted and the import wrapped that empty into the cell-typed column. Selecting one of those rows for removal hit datasetID2Object's text-scalar validator, blocking the very cleanup needed to delete the orphans. Read DatasetIdentifier from the project's Dataset metatable instead of the selected subject row -- a project has exactly one Dataset entry, and its identifier is always populated. The dataset object here is only used for the post-remove metatable.update calls, which don't care which subject row supplied the id. Other object-method paths (edit, document, mergeDuplicates) still pull DatasetIdentifier from the selected row; they can hit the same error if the user happens to act on an orphan first, but the immediate need is to unblock cleanup. Once orphans are removed they fall out of every other path.

…ough it A datasetID column on a metatable row can be empty -- orphan rows produced before their parent Session was registered carry a 0x1 cell. Until now, every code path that opened the dataset object via ndi.nansen.fun.datasetID2Object(<row>.DatasetIdentifier) crashed when the user happened to act on an orphan: the validator on datasetID2Object rejects empty input and the operation aborts before any cleanup can run. Add ndi.nansen.fun.resolveDatasetID, which returns the candidate identifier when it is a non-empty char/string (unwrapping a 1x1 cell on the way) and falls back to the DatasetIdentifier stored in the project's Dataset metatable otherwise. A project has exactly one Dataset entry, so the fallback is unambiguous. Wrap every per-row datasetID2Object call site through resolveDatasetID: Object methods: - +subject/+methods/{remove, edit, mergeDuplicates, document}.m - +subject/+methods/+export/files.m - +file/+methods/{document, export}.m Session methods: - +methods/remove.m - +methods/+export/files.m - +methods/+import/+subjects/{file, manual}.m - +methods/+import/+data/{file, folder}.m Table variables: - +tablevariable/+{subject,file,session}/DateAdded.m The Dataset table-variable callsites (+tablevariable/+dataset/*) are left alone -- they iterate the Dataset row itself, whose DatasetIdentifier is by construction never empty. getCloudStatus is the one exception: an orphan row's Cloud value can't be looked up in any sync table, so it short-circuits to the default value when DatasetIdentifier is empty rather than reaching for the project's dataset (which would return a status for the wrong row).

Two bugs and a UX change in subject and file Document object methods: 1. isDocument was computed as isequal(table.<X>DocumentIdentifier, {'N/A'}) which compares the entire cell column against a length-1 cell. That is only true for a single-row table; for any multi-row selection it returns scalar false, so ~isDocument is scalar true and every row (including already-documented ones) reaches documents(). The upstream NDI framework then runs createOntologyTableRowDoc, which has its own bug: it calls isnan() on options.OldDocs{1} before checking isa(..., 'ndi.document'), so as soon as a pre-existing document is reached the run errors out. Use per-row strcmp(..., 'N/A') so already-documented rows are correctly excluded from the loop. Even with the upstream isnan bug unfixed, this stops triggering it in practice. 2. Surface validation up front. The previous flow validated, ran documents() against the surviving rows, and only showed the validation report at the end. A user with bad inputs would learn about it after partial work was already on disk. Now: if any row is invalid or already documented, prompt with a count breakdown and Continue/Cancel before any document is created. Cancel shows the detailed report and returns; Continue proceeds with the valid, not-yet-documented rows. When everything is clean the prompt is skipped and the original silent path runs. When nothing can be created (every selected row is invalid or already documented), show an informational dialog instead of pretending to do work. The upstream isnan-on-ndi.document bug at ndi.setup.NDIMaker.tableDocMaker.createOntologyTableRowDoc:129 is left for a separate VH-Lab/NDI-matlab patch.

The earlier prompt was a plain Continue/Cancel without surfacing what exactly was wrong. Users couldn't tell from the count breakdown which fields on which rows were missing -- they had to cancel and discover the report dialog separately. Restructure: when any selected row is invalid or already documented, show the validation report dialog first (so the user sees row-level error messages with red/green styling), then prompt: - "Edit invalid first" -- close, return; user fixes and re-triggers - "Create valid documents" -- proceed with isValid & ~isDocument - "Cancel" -- close, return When every selected row is invalid or already documented, the prompt collapses to a single OK dialog ("Nothing to create. Edit the invalid subjects and try again."). When everything is clean (no invalid, no already-documented), the report and prompt are both skipped and the old silent path runs through to creation. Drop the post-creation showValidationReport call. The up-front report covers the "here's what's wrong" surface; the metatable's cell updates after creation cover the "here's what we did" confirmation. A trailing report on the all-clean path was a redundant "All N subjects are valid" toast.

…eport Two related changes to the Document flow: 1. Show whether each row already has a document. The validation report now takes an optional IsDocumented logical (same length as isValid) and renders rows in three colors instead of two: - blue-tinted: already documented; ErrorMessage replaced with "Already has a document" - red-tinted: invalid; ErrorMessage from the report table - green-tinted: valid and ready for creation The note under the header now mentions that documented rows are listed for context and will be skipped, so users can tell at a glance which rows would be acted on. 2. Move the decision into the report window. Previously the dialog showed the table and a Close button, then the calling code popped a separate questdlg with three options. Now showValidationReport accepts a Buttons cell and a Default highlight; with two or more buttons it becomes a blocking modal and returns the chosen text. The single-button form ({'Close'}) keeps the legacy non-blocking behaviour, including the "All selected subjects are valid" alert on the all-clean path. The +subject/ and +file/ Document object methods now drive everything through one showValidationReport call: - dirty (any invalid or any already documented): buttons = {Edit invalid first, Create N document(s), Cancel} Default = Edit invalid first Cancel via title-bar X resolves to "Cancel" - dirty but nothing to create (only invalid + documented): buttons = {Close} - everything clean: skip the dialog, proceed silently. Title-bar X mapping uses any Cancel/Close button in the list when present so the X behaves the same as the explicit cancel.

Previously 'Edit invalid first' just returned from Document, leaving the user to re-select the invalid rows manually and click Edit themselves. Now Document hands the invalid subject objects directly to pulakat.objectmethod.subject.methods.edit, which opens its own import-table GUI populated with only those rows. After edit returns, Document returns; the user re-issues Document once their changes settle. Match invalid rows back to subjectObject by SubjectIdentifier rather than positional index because metaTable.getEntry doesn't promise to return rows in the requested order, so subjectTable's row order may differ from subjectObject's. When nToCreate == 0 but there are still invalid rows to fix, offer 'Edit invalid' / 'Cancel' instead of just 'Close', so the path to fixing things is one click away even when nothing is creatable. File Document drops 'Edit invalid first' entirely because file rows are derived from on-disk file headers and there's no row-edit object method for them; the dirty-row buttons collapse to 'Create N' / 'Cancel'.

The "Failed to update variable Cloud / Dot indexing is not supported" warning fires somewhere inside the lookup path but the trace doesn't say where -- NANSEN's variable-update wrapper swallows the stack and just reports the message text. Move the lookup body into an inner helper, wrap the call in try/catch, and on failure print obj's class and field names so the next occurrence is diagnosable instead of opaque. Returns the default value on failure rather than letting the exception propagate as a row-update warning. Also harden the document-id strcmp at the bottom: unwrap a 1x1 cell DocumentIdentifier and skip the lookup if the value isn't text. This covers an orphan-style row where the per-row DocumentIdentifier is empty or a non-text default.

After clicking 'Edit invalid first' the user fixed identifying fields and the editor closed, but Document then returned -- forcing them to manually re-trigger Document and walk through the dialog a second time to confirm the edits resolved the issues. With several rounds of fixes per import that's a lot of clicking. Wrap the validate/review/decide block in a while loop. On each iteration we re-fetch the selected rows from the metatable (Edit mutates them) and re-run validate, so: - if all rows are now clean, the loop exits and creation proceeds silently; - if some are still invalid, the report dialog reopens with fresh isValid/isDocument and an updated count so the user can keep correcting until satisfied; - clicking 'Create N document(s)' breaks out and proceeds with the rows that passed; - Cancel / Close / X return without creating anything. Also tightens the button-set logic: 'Edit invalid first' is only offered when at least one row is actually invalid. Previously the nToCreate>0 branch always included the Edit button regardless, so a selection where everything was valid but some were already documented would still surface a dead Edit option.

Three small fixes around the per-row Validate object method: 1. Focus. Subject Validate's report uifigure went behind the main MATLAB window because NANSEN's task runner prints "Task completed: validate" to the command window after the method returns and the command window steals focus. Force the report forward right after the figure is built (drawnow + figure(reportFig)) in both the interactive and single-button paths. 2. Already-documented rows. Subject Validate now passes IsDocumented to the report, so already-documented rows render in blue with the "Already has a document" message instead of being indistinguishable from rows pending creation. Symmetric change for File Validate even though files can't be edited. 3. Edit loop. Subject Validate now offers an "Edit invalid" button when any selected row is invalid, hands the invalid rows to the Subject Edit object method, and re-fetches/re-validates after the edit finishes. The user can iterate edits until everything is clean and click Close. File Validate stays informational because there's no file edit object method.

NANSEN PR #77 (VervaekeLab/NANSEN) split MetaTableViewer into a legacy Java backend (pre-R2025a) and a modern uifigure/EntityTable backend (R2025a+). The modern path requires the entity-table package on the MATLAB path; NANSEN resolves it via nansen.util.ensureEntityTableOnPath, which probes a fixed set of candidate paths including <nansenRoot>/../entity-table. Clone entity-table as a sibling of the other repos in install.m and sync it on each ndi.nansen.startup so the modern backend loads without requiring users to re-run nansen_install. Mirror the checkout in both CI jobs so R2025a runs do not break when the viewer initializes. Pulakat's tablevariable classes only override getCellDisplayString with plain text, so they inherit the abstract default for the new getCellDisplayStringForContext hook and render correctly under both backends with no per-context branching.

codecov · 2026-05-06T18:59:02Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag	Coverage Δ
cloud	`5.96% <ø> (-0.54%)`	⬇️
offline	`21.95% <ø> (-1.76%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 33 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ProjectTestCase.createFreshProject and CreateProject.createProject both mkdir-ed the project path before calling nansen.config.project.ProjectManager.createProject. The current NANSEN dev branch refuses to write into a pre-existing folder ("Can not create project because a folder already exist in this location"), so every test inheriting from ProjectTestCase errored in setup. Pass a non-existent tempname-based path and let createProject create the folder itself. Surfaced in CI on the matlab-2025a-compatibility branch but unrelated to the entity-table changes.

claude and others added 26 commits April 29, 2026 18:51

Update mergeMetaTableCells.m

e6d4726

Replace class(metaTable) with metaTable.MetaTableClass because class() method override of MetaTable was removed in NANSEN

fix: use metatable catalog path resolver

cc96d84

- Use getMetaTableFilePath of MetaTableCatalog instead of entry.SavePath, entry.FileName because entry.SavePath was removed in Nansen - Use isfile instead of exist(filename, 'file') for better performance

Merge pull request #40 from Waltham-Data-Science/claude/fix-pr39-test…

ecd9237

…s-xfNaj Replace removed addMissingVarsToMetaTable in merge.m

Merge pull request #39 from Waltham-Data-Science/adapt-to-nansen-meta…

c8b57a7

…table-refactor Fix: Adapt to NANSEN `MetaTable` refactor

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clone entity-table for NANSEN R2025a+ table backend#42

Clone entity-table for NANSEN R2025a+ table backend#42
jesshaley wants to merge 27 commits into
mainfrom
claude/matlab-2025a-compatibility-bjclg

jesshaley commented May 6, 2026

Uh oh!

codecov Bot commented May 6, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jesshaley commented May 6, 2026

Summary

What changed

Why no source changes in pulakat

Test plan

Uh oh!

codecov Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented May 6, 2026 •

edited

Loading