Skip to content

Clone entity-table for NANSEN R2025a+ table backend#42

Draft
jesshaley wants to merge 27 commits into
mainfrom
claude/matlab-2025a-compatibility-bjclg
Draft

Clone entity-table for NANSEN R2025a+ table backend#42
jesshaley wants to merge 27 commits into
mainfrom
claude/matlab-2025a-compatibility-bjclg

Conversation

@jesshaley
Copy link
Copy Markdown
Contributor

Summary

Adapts nansen-pulakat to VervaekeLab/NANSEN PR #77, which modernizes NANSEN's MetaTableViewer for MATLAB R2025a+. The PR splits the viewer into a legacy Java backend (pre-R2025a) and a modern uifigure/EntityTable backend (R2025a+); the modern path requires the new ehennestad/entity-table package on the MATLAB path. Pre-R2025a installs continue to use the legacy path with no behavioral change.

What changed

  • install.m — clone entity-table as a sibling of the other repos under ~/ndi/tools/ (step 5b, between NANSEN clone and nansen_install). NANSEN's nansen.util.ensureEntityTableOnPath probes <nansenRoot>/../entity-table as one of its candidate paths, so the sibling clone is automatically discovered.
  • src/ndi/+ndi/+nansen/startup.m — sync entity-table alongside NANSEN/openMINDS/NDI-matlab on each ndi.nansen.startup invocation so updates flow through the same git pull mechanism as the rest of the upstream repos.
  • .github/workflows/matlab-tests.yml — add actions/checkout@v4 of ehennestad/entity-table and addpath(genpath('entity-table/src')) in both the offline-tests and cloud-tests jobs. Necessary so the cloud Startup test (which launches the Nansen GUI) does not break under R2025a, and as defense-in-depth for the offline plugin-discovery tests.
  • CLAUDE.md — add entity-table to the upstream-repos table, the install-flow clone order, the cross-repo coupling table, and a new "MATLAB version coupling" section that documents getCellDisplayStringForContext and notes that pulakat's plain-text tablevariables work under both backends without per-context branches.

Why no source changes in pulakat

The new API hook nansen.metadata.abstract.TableColumnFormatter.getCellDisplayStringForContext(obj, displayContext) is additive — the abstract base class supplies a default that delegates to the existing getCellDisplayString(). Pulakat's +tablevariable/* classes override only getCellDisplayString and emit plain strings (no <html> markup, no getIconHtmlString icon embeds), so they render correctly under the modern uifigure backend through the default fallback. If a future formatter ever needs HTML/icon rendering it should override getCellDisplayStringForContext directly, as documented in CLAUDE.md.

Test plan

  • Offline CI on latest MATLAB (default) — entity-table clone + addpath should be a no-op for plugin-discovery tests
  • Manually trigger workflow with R2025a to confirm the modern table backend loads (no NANSEN:MissingEntityTable error)
  • Manually trigger workflow with R2024b to confirm pre-R2025a path still passes
  • Run install.m end-to-end on a clean machine; verify ~/ndi/tools/entity-table/ is created and a subsequent pulakat.startup syncs it without errors
  • Cloud Startup test passes against R2025a (when secrets are configured)

https://claude.ai/code/session_01S7oxviMzuwcExHpW2J1eEs


Generated by Claude Code

claude and others added 26 commits April 29, 2026 18:51
project_info.subjectFileColumns describes two distinct sets of
columns that the import flow conflated:

  * Genuine source-CSV columns (Strain, BiologicalSex, Animal,
    Cage, Label, Treatment, ...) that the user supplies in
    animal_mapping.csv.
  * Auto-computed columns (ElectronicFileName, added by
    tableFromFile itself; SubjectIdentifier, added by
    ndi.nansen.import.subject) that documents.m needs to know
    about for ontologyTableRow document creation but that are
    never present in the source file.

Listing the auto-computed pair in subjectFileColumns produced two
problems on Session > Import > Subjects > File:

  1. validateTable warned every time about missing
     ElectronicFileName / SubjectIdentifier — a spurious warning
     because those columns are by design not in the CSV.
  2. setvartype and SelectedVariableNames threw "Unknown variable
     name: 'SubjectIdentifier'", aborting the import.

tableFromFile now distinguishes the two sets:

  * Pass only the genuinely-expected-in-CSV columns to
    validateTable (filtered by post-rename name not appearing in
    the AUTO_ADDED list).
  * Use intersect with importOptions.VariableNames to scope
    setvartype and SelectedVariableNames to columns the file
    actually carries.
  * Rename only the columns that ended up in the stacked table.

documents.m's ontology-row logic still reads the full
subjectFileColumns list, so SubjectIdentifier and ElectronicFileName
remain part of the document schema — the patch is contained to
the CSV-shape checks.
Replaces the hardcoded AUTO_ADDED list in tableFromFile with a
proper "auto" boolean on each project_info column entry. Two
fields in pulakat's project_info.json now declare their origin
explicitly:

  * SubjectIdentifier — auto: true (added by
    ndi.nansen.import.subject)
  * ElectronicFileName — auto: true (added by
    ndi.nansen.import.subject.tableFromFile)
  * FileIdentifier in dataFileColumns — auto: true (added by
    ndi.nansen.import.file)
  * everything else — auto: false (genuine source CSV column)

The flag is read via a new ndi.nansen.fun.isAutoColumn helper that
defaults to false when the field is absent, so older project_info
files (and the testlab unit-test fixture) keep working without
changes. Imports use the mask to skip auto columns when validating
CSV shape; documents.m and ontologyTableRow.m read the full
column list because the auto columns are present in the
post-import table.

manual.m's input dialog now also skips auto columns — the user
shouldn't be prompted for a SubjectIdentifier or
ElectronicFileName since those get populated by import code, not
by the operator.

FunHelpers gains two regression cases:
  * isAutoColumnReadsFlag — mixed entries return the right mask.
  * isAutoColumnDefaultsFalseWhenFieldMissing — older project_info
    files without the auto field continue to work.
nansen.metadata.MetaTable.updateTableVariable invokes Cloud.update
once per metatable row. The Cloud tablevariable's update method
calls ndi.nansen.fun.getCloudStatus, which previously did two
heavyweight operations per call:

  * ndi.nansen.fun.datasetID2Object(datasetID) — re-opens the NDI
    dataset from disk via ndi.dataset.dir, parsing config files and
    initializing the document store.
  * ndi.nansen.sync.status(dataset) — reads the sync index file and
    enumerates every local document, then computes the upload diff.

For a Pulakat-scale Subject metatable (300 rows) that's ~600 disk-
heavy operations per Cloud column refresh; clicking Document or
Import-and-refresh triggers a multi-minute hang. The earlier missing-
columns fix (which made addMissingVarsToMetaTable add Cloud as a
real column with HasUpdateFunction) exposed this latent perf issue.

Add a persistent cache keyed by DatasetIdentifier with a 30-second
TTL. Same approach as listUniqueMetaTableValues and
countUniqueMetaTableValues. The TTL is short enough that a Sync
click followed by a metatable refresh picks up the new cloud state
on the next pass; `clear functions` is the explicit reset for
anything tighter.
ndi.nansen.startup runs ndi.cloud.sync.downloadNew against an
already-downloaded local dataset, which calls
ndi.cloud.api.files.waitForAllBulkUploads. That polling hangs when
the server thinks an upload is still in flight (e.g. after a
force-quit MATLAB left an upload half-committed) and there is no
clean way to interrupt it once it has started.

New SkipCloudSync option (default false) on ndi.nansen.startup and
pulakat.startup lets the caller bypass downloadNew when needed:

    pulakat.startup('SkipCloudSync', true)

When set, the local dataset loads as-is, the GUI launches against
the local state, and the dataset Sync table-method action remains
the explicit way to reconcile with the cloud (still does the full
round-trip when clicked). The initial dataset download is
unaffected — SkipCloudSync only suppresses the incremental sync
against an existing local copy.
When the caller asks to skip cloud sync at startup, also short-circuit the ndi.cloud.testLogin verification + ndi.cloud.uilogin prompt. The same root cause that motivates SkipCloudSync (testLogin or downloadNew hanging against a flaky cloud) makes the login dance pointless and adds another way to hang at startup. Bundling them under one flag keeps the escape hatch effective.
User is fixing testLogin behaviour upstream in NDI-matlab. SkipCloudSync goes back to its narrower meaning: skip ndi.cloud.sync.downloadNew only. testLogin / uilogin run on every startup regardless of the flag, so users who genuinely aren't authenticated still get the login prompt.
When the dataset and a session share a directory (the common case), ndi.session.dir(path) without a reference creates a brand-new session object with a freshly-minted identifier instead of opening the existing session at that path. The metatable was populated via dataset.open_session(id), which sets .identifier to the persistent value, so subsequent lookups by ndi.session.dir(path).id never matched anything in the metatable — Subject and File imports were filtering on a SessionIdentifier that didn't exist anywhere.

Fix: pass SessionName as the reference everywhere we construct an ndi.session.dir from a path. Six callsites updated:

  - +sessionmethod/+methods/+import/+data/file.m, folder.m

  - +sessionmethod/+methods/+import/+subjects/file.m, manual.m

  - +objectmethod/+subject/+methods/document.m

  - +objectmethod/+file/+methods/document.m

The two document.m sites loop over unique session paths from a working table; they now also look up the parallel SessionName from that table. ndi.nansen.import.session already passed (sessionName, sessionPath) so it was correct.
Replace class(metaTable) with metaTable.MetaTableClass because class() method override of MetaTable was removed in NANSEN
- Use getMetaTableFilePath of MetaTableCatalog instead of entry.SavePath, entry.FileName because entry.SavePath was removed in Nansen
- Use isfile instead of exist(filename, 'file') for better performance
…riables

NANSEN PR #78 removed MetaTable.addMissingVarsToMetaTable in favour of
Project.synchronizeMetaTableVariables, which derives the table type from
the metatable itself. PR #39 caught getMetaTableFilePath and
MetaTableClass but missed this third call site, so MetatableMerge tests
fail in CI with an undefined-method error against NANSEN dev.
…s-xfNaj

Replace removed addMissingVarsToMetaTable in merge.m
…table-refactor

Fix: Adapt to NANSEN `MetaTable` refactor
NANSEN PR #78 changed MetaTable.updateTableVariable from looking up
the project's TableVariable update function internally to taking it
as an explicit third argument:

    updateTableVariable(obj, variableName, tableRowIndices, updateFunction)

The new signature defaults updateFunction to function_handle.empty()
and unconditionally invokes it, so any 2-arg call now errors with
"Invoked function handles must be scalar." This fires from
ndi.nansen.metatable.updateAll's second pass during pulakat.startup
and trips the GUI-launch guard in ndi.nansen.startup.

Look up UpdateFunctionName from the project's TableVariable table at
each call site (matching NANSEN's own addMissingTableVariables) and
pass str2func(name) as the third argument.
ndi.nansen.import.subject line 155 assigned the raw return of
ndi.nansen.fun.getMetaTableValue (a char like 'DS-123' or '') to
subjectTable_new{:,'DatasetIdentifier'}. When subjectTable_new has
more than one row, MATLAB can't broadcast a 1xM char to N rows of a
cell-typed column and errors with "the number of rows must match
the height of the table." The three lines just above already wrap
their scalar values in { } for exactly this reason; this aligns
line 155 with that pattern.
When a dataset and a session share a directory (the empty
dataset-default 'pulakat' session sitting on top of the same path
as the active 'test_2026' session), ndi.session.dir(reference, path)
silently resolves to the dataset's default session. Downstream code
in subject/file/data imports then writes the dataset session's id
into SessionIdentifier columns, producing duplicate Subject rows
under the wrong session and orphaning per-session metadata.

ndi.session.dir takes an undocumented third positional arg --
session_id -- that pins resolution to a specific session. The
GUI's sessionObject already carries SessionIdentifier as the
source of truth, so feed it in. An assertion right after the
constructor ensures any future regression in NDI's resolution
fails loud rather than misroutes silently.

Applied to all four sessionmethod entry points that opened an
existing session via sessionObject:
  - import/subjects/file.m   (the original repro path)
  - import/subjects/manual.m
  - import/data/file.m
  - import/data/folder.m

The two +objectmethod/document.m loops use unique(SessionPath) and
need a different shape; left for a follow-up.
The +objectmethod/document.m loops in subject and file did
unique(SessionPath) and pulled SessionName from the first matching
row, then called ndi.session.dir(name, path). When a dataset and a
session share a directory (the empty 'pulakat' default session
sitting on top of the same path as the active session), unique-by-
path collapsed both onto a single iteration and ndi.session.dir
silently resolved to the dataset's default session -- documents got
attached to the wrong session.

Group by SessionIdentifier instead; that's the actual identity of a
session in the metatable. Pull the path/name for the call from any
matching row, and pass the id as ndi.session.dir's third arg
(undocumented but supported) so the path-collision case can't
misroute. Assert the resolved id matches.
The orphan subject rows produced by the earlier session-misrouting bug
carry an empty DatasetIdentifier (a 0x1 cell), because getMetaTableValue
returned '' when the row was first inserted and the import wrapped that
empty into the cell-typed column. Selecting one of those rows for
removal hit datasetID2Object's text-scalar validator, blocking the very
cleanup needed to delete the orphans.

Read DatasetIdentifier from the project's Dataset metatable instead of
the selected subject row -- a project has exactly one Dataset entry, and
its identifier is always populated. The dataset object here is only
used for the post-remove metatable.update calls, which don't care which
subject row supplied the id.

Other object-method paths (edit, document, mergeDuplicates) still pull
DatasetIdentifier from the selected row; they can hit the same error if
the user happens to act on an orphan first, but the immediate need is
to unblock cleanup. Once orphans are removed they fall out of every
other path.
…ough it

A datasetID column on a metatable row can be empty -- orphan rows
produced before their parent Session was registered carry a 0x1 cell.
Until now, every code path that opened the dataset object via
ndi.nansen.fun.datasetID2Object(<row>.DatasetIdentifier) crashed when
the user happened to act on an orphan: the validator on
datasetID2Object rejects empty input and the operation aborts before
any cleanup can run.

Add ndi.nansen.fun.resolveDatasetID, which returns the candidate
identifier when it is a non-empty char/string (unwrapping a 1x1 cell
on the way) and falls back to the DatasetIdentifier stored in the
project's Dataset metatable otherwise. A project has exactly one
Dataset entry, so the fallback is unambiguous.

Wrap every per-row datasetID2Object call site through resolveDatasetID:

Object methods:
  - +subject/+methods/{remove, edit, mergeDuplicates, document}.m
  - +subject/+methods/+export/files.m
  - +file/+methods/{document, export}.m

Session methods:
  - +methods/remove.m
  - +methods/+export/files.m
  - +methods/+import/+subjects/{file, manual}.m
  - +methods/+import/+data/{file, folder}.m

Table variables:
  - +tablevariable/+{subject,file,session}/DateAdded.m

The Dataset table-variable callsites (+tablevariable/+dataset/*) are
left alone -- they iterate the Dataset row itself, whose
DatasetIdentifier is by construction never empty.

getCloudStatus is the one exception: an orphan row's Cloud value can't
be looked up in any sync table, so it short-circuits to the default
value when DatasetIdentifier is empty rather than reaching for the
project's dataset (which would return a status for the wrong row).
Two bugs and a UX change in subject and file Document object methods:

1. isDocument was computed as
       isequal(table.<X>DocumentIdentifier, {'N/A'})
   which compares the entire cell column against a length-1 cell. That
   is only true for a single-row table; for any multi-row selection it
   returns scalar false, so ~isDocument is scalar true and every row
   (including already-documented ones) reaches documents(). The
   upstream NDI framework then runs createOntologyTableRowDoc, which
   has its own bug: it calls isnan() on options.OldDocs{1} before
   checking isa(..., 'ndi.document'), so as soon as a pre-existing
   document is reached the run errors out.

   Use per-row strcmp(..., 'N/A') so already-documented rows are
   correctly excluded from the loop. Even with the upstream isnan
   bug unfixed, this stops triggering it in practice.

2. Surface validation up front. The previous flow validated, ran
   documents() against the surviving rows, and only showed the
   validation report at the end. A user with bad inputs would learn
   about it after partial work was already on disk. Now: if any row
   is invalid or already documented, prompt with a count breakdown
   and Continue/Cancel before any document is created. Cancel shows
   the detailed report and returns; Continue proceeds with the valid,
   not-yet-documented rows. When everything is clean the prompt is
   skipped and the original silent path runs.

   When nothing can be created (every selected row is invalid or
   already documented), show an informational dialog instead of
   pretending to do work.

The upstream isnan-on-ndi.document bug at
ndi.setup.NDIMaker.tableDocMaker.createOntologyTableRowDoc:129 is left
for a separate VH-Lab/NDI-matlab patch.
The earlier prompt was a plain Continue/Cancel without surfacing what
exactly was wrong. Users couldn't tell from the count breakdown which
fields on which rows were missing -- they had to cancel and discover
the report dialog separately.

Restructure: when any selected row is invalid or already documented,
show the validation report dialog first (so the user sees row-level
error messages with red/green styling), then prompt:

  - "Edit invalid first" -- close, return; user fixes and re-triggers
  - "Create valid documents" -- proceed with isValid & ~isDocument
  - "Cancel"               -- close, return

When every selected row is invalid or already documented, the prompt
collapses to a single OK dialog ("Nothing to create. Edit the invalid
subjects and try again."). When everything is clean (no invalid, no
already-documented), the report and prompt are both skipped and the
old silent path runs through to creation.

Drop the post-creation showValidationReport call. The up-front report
covers the "here's what's wrong" surface; the metatable's cell
updates after creation cover the "here's what we did" confirmation.
A trailing report on the all-clean path was a redundant "All N
subjects are valid" toast.
…eport

Two related changes to the Document flow:

1. Show whether each row already has a document. The validation report
   now takes an optional IsDocumented logical (same length as isValid)
   and renders rows in three colors instead of two:
     - blue-tinted: already documented; ErrorMessage replaced with
       "Already has a document"
     - red-tinted:  invalid; ErrorMessage from the report table
     - green-tinted: valid and ready for creation
   The note under the header now mentions that documented rows are
   listed for context and will be skipped, so users can tell at a
   glance which rows would be acted on.

2. Move the decision into the report window. Previously the dialog
   showed the table and a Close button, then the calling code popped
   a separate questdlg with three options. Now showValidationReport
   accepts a Buttons cell and a Default highlight; with two or more
   buttons it becomes a blocking modal and returns the chosen text.
   The single-button form ({'Close'}) keeps the legacy non-blocking
   behaviour, including the "All selected subjects are valid" alert
   on the all-clean path.

The +subject/ and +file/ Document object methods now drive everything
through one showValidationReport call:
  - dirty (any invalid or any already documented):
      buttons = {Edit invalid first, Create N document(s), Cancel}
      Default = Edit invalid first
      Cancel via title-bar X resolves to "Cancel"
  - dirty but nothing to create (only invalid + documented):
      buttons = {Close}
  - everything clean: skip the dialog, proceed silently.

Title-bar X mapping uses any Cancel/Close button in the list when
present so the X behaves the same as the explicit cancel.
Previously 'Edit invalid first' just returned from Document, leaving
the user to re-select the invalid rows manually and click Edit
themselves. Now Document hands the invalid subject objects directly
to pulakat.objectmethod.subject.methods.edit, which opens its own
import-table GUI populated with only those rows. After edit returns,
Document returns; the user re-issues Document once their changes
settle.

Match invalid rows back to subjectObject by SubjectIdentifier rather
than positional index because metaTable.getEntry doesn't promise to
return rows in the requested order, so subjectTable's row order may
differ from subjectObject's.

When nToCreate == 0 but there are still invalid rows to fix, offer
'Edit invalid' / 'Cancel' instead of just 'Close', so the path to
fixing things is one click away even when nothing is creatable.

File Document drops 'Edit invalid first' entirely because file rows
are derived from on-disk file headers and there's no row-edit object
method for them; the dirty-row buttons collapse to 'Create N' / 'Cancel'.
The "Failed to update variable Cloud / Dot indexing is not supported"
warning fires somewhere inside the lookup path but the trace doesn't
say where -- NANSEN's variable-update wrapper swallows the stack and
just reports the message text. Move the lookup body into an inner
helper, wrap the call in try/catch, and on failure print obj's class
and field names so the next occurrence is diagnosable instead of
opaque. Returns the default value on failure rather than letting the
exception propagate as a row-update warning.

Also harden the document-id strcmp at the bottom: unwrap a 1x1 cell
DocumentIdentifier and skip the lookup if the value isn't text. This
covers an orphan-style row where the per-row DocumentIdentifier is
empty or a non-text default.
After clicking 'Edit invalid first' the user fixed identifying fields
and the editor closed, but Document then returned -- forcing them to
manually re-trigger Document and walk through the dialog a second
time to confirm the edits resolved the issues. With several rounds of
fixes per import that's a lot of clicking.

Wrap the validate/review/decide block in a while loop. On each
iteration we re-fetch the selected rows from the metatable (Edit
mutates them) and re-run validate, so:

  - if all rows are now clean, the loop exits and creation proceeds
    silently;
  - if some are still invalid, the report dialog reopens with fresh
    isValid/isDocument and an updated count so the user can keep
    correcting until satisfied;
  - clicking 'Create N document(s)' breaks out and proceeds with the
    rows that passed;
  - Cancel / Close / X return without creating anything.

Also tightens the button-set logic: 'Edit invalid first' is only
offered when at least one row is actually invalid. Previously the
nToCreate>0 branch always included the Edit button regardless, so a
selection where everything was valid but some were already documented
would still surface a dead Edit option.
Three small fixes around the per-row Validate object method:

1. Focus. Subject Validate's report uifigure went behind the main
   MATLAB window because NANSEN's task runner prints "Task completed:
   validate" to the command window after the method returns and the
   command window steals focus. Force the report forward right after
   the figure is built (drawnow + figure(reportFig)) in both the
   interactive and single-button paths.

2. Already-documented rows. Subject Validate now passes IsDocumented
   to the report, so already-documented rows render in blue with the
   "Already has a document" message instead of being indistinguishable
   from rows pending creation. Symmetric change for File Validate
   even though files can't be edited.

3. Edit loop. Subject Validate now offers an "Edit invalid" button
   when any selected row is invalid, hands the invalid rows to the
   Subject Edit object method, and re-fetches/re-validates after the
   edit finishes. The user can iterate edits until everything is
   clean and click Close. File Validate stays informational because
   there's no file edit object method.
NANSEN PR #77 (VervaekeLab/NANSEN) split MetaTableViewer into a
legacy Java backend (pre-R2025a) and a modern uifigure/EntityTable
backend (R2025a+). The modern path requires the entity-table package
on the MATLAB path; NANSEN resolves it via
nansen.util.ensureEntityTableOnPath, which probes a fixed set of
candidate paths including <nansenRoot>/../entity-table.

Clone entity-table as a sibling of the other repos in install.m and
sync it on each ndi.nansen.startup so the modern backend loads
without requiring users to re-run nansen_install. Mirror the
checkout in both CI jobs so R2025a runs do not break when the
viewer initializes. Pulakat's tablevariable classes only override
getCellDisplayString with plain text, so they inherit the abstract
default for the new getCellDisplayStringForContext hook and render
correctly under both backends with no per-context branching.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
cloud 5.96% <ø> (-0.54%) ⬇️
offline 21.95% <ø> (-1.76%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 33 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ProjectTestCase.createFreshProject and CreateProject.createProject
both mkdir-ed the project path before calling
nansen.config.project.ProjectManager.createProject. The current
NANSEN dev branch refuses to write into a pre-existing folder
("Can not create project because a folder already exist in this
location"), so every test inheriting from ProjectTestCase errored in
setup. Pass a non-existent tempname-based path and let
createProject create the folder itself.

Surfaced in CI on the matlab-2025a-compatibility branch but
unrelated to the entity-table changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants