Clone entity-table for NANSEN R2025a+ table backend#42
Draft
jesshaley wants to merge 27 commits into
Draft
Conversation
project_info.subjectFileColumns describes two distinct sets of
columns that the import flow conflated:
* Genuine source-CSV columns (Strain, BiologicalSex, Animal,
Cage, Label, Treatment, ...) that the user supplies in
animal_mapping.csv.
* Auto-computed columns (ElectronicFileName, added by
tableFromFile itself; SubjectIdentifier, added by
ndi.nansen.import.subject) that documents.m needs to know
about for ontologyTableRow document creation but that are
never present in the source file.
Listing the auto-computed pair in subjectFileColumns produced two
problems on Session > Import > Subjects > File:
1. validateTable warned every time about missing
ElectronicFileName / SubjectIdentifier — a spurious warning
because those columns are by design not in the CSV.
2. setvartype and SelectedVariableNames threw "Unknown variable
name: 'SubjectIdentifier'", aborting the import.
tableFromFile now distinguishes the two sets:
* Pass only the genuinely-expected-in-CSV columns to
validateTable (filtered by post-rename name not appearing in
the AUTO_ADDED list).
* Use intersect with importOptions.VariableNames to scope
setvartype and SelectedVariableNames to columns the file
actually carries.
* Rename only the columns that ended up in the stacked table.
documents.m's ontology-row logic still reads the full
subjectFileColumns list, so SubjectIdentifier and ElectronicFileName
remain part of the document schema — the patch is contained to
the CSV-shape checks.
Replaces the hardcoded AUTO_ADDED list in tableFromFile with a
proper "auto" boolean on each project_info column entry. Two
fields in pulakat's project_info.json now declare their origin
explicitly:
* SubjectIdentifier — auto: true (added by
ndi.nansen.import.subject)
* ElectronicFileName — auto: true (added by
ndi.nansen.import.subject.tableFromFile)
* FileIdentifier in dataFileColumns — auto: true (added by
ndi.nansen.import.file)
* everything else — auto: false (genuine source CSV column)
The flag is read via a new ndi.nansen.fun.isAutoColumn helper that
defaults to false when the field is absent, so older project_info
files (and the testlab unit-test fixture) keep working without
changes. Imports use the mask to skip auto columns when validating
CSV shape; documents.m and ontologyTableRow.m read the full
column list because the auto columns are present in the
post-import table.
manual.m's input dialog now also skips auto columns — the user
shouldn't be prompted for a SubjectIdentifier or
ElectronicFileName since those get populated by import code, not
by the operator.
FunHelpers gains two regression cases:
* isAutoColumnReadsFlag — mixed entries return the right mask.
* isAutoColumnDefaultsFalseWhenFieldMissing — older project_info
files without the auto field continue to work.
nansen.metadata.MetaTable.updateTableVariable invokes Cloud.update
once per metatable row. The Cloud tablevariable's update method
calls ndi.nansen.fun.getCloudStatus, which previously did two
heavyweight operations per call:
* ndi.nansen.fun.datasetID2Object(datasetID) — re-opens the NDI
dataset from disk via ndi.dataset.dir, parsing config files and
initializing the document store.
* ndi.nansen.sync.status(dataset) — reads the sync index file and
enumerates every local document, then computes the upload diff.
For a Pulakat-scale Subject metatable (300 rows) that's ~600 disk-
heavy operations per Cloud column refresh; clicking Document or
Import-and-refresh triggers a multi-minute hang. The earlier missing-
columns fix (which made addMissingVarsToMetaTable add Cloud as a
real column with HasUpdateFunction) exposed this latent perf issue.
Add a persistent cache keyed by DatasetIdentifier with a 30-second
TTL. Same approach as listUniqueMetaTableValues and
countUniqueMetaTableValues. The TTL is short enough that a Sync
click followed by a metatable refresh picks up the new cloud state
on the next pass; `clear functions` is the explicit reset for
anything tighter.
ndi.nansen.startup runs ndi.cloud.sync.downloadNew against an
already-downloaded local dataset, which calls
ndi.cloud.api.files.waitForAllBulkUploads. That polling hangs when
the server thinks an upload is still in flight (e.g. after a
force-quit MATLAB left an upload half-committed) and there is no
clean way to interrupt it once it has started.
New SkipCloudSync option (default false) on ndi.nansen.startup and
pulakat.startup lets the caller bypass downloadNew when needed:
pulakat.startup('SkipCloudSync', true)
When set, the local dataset loads as-is, the GUI launches against
the local state, and the dataset Sync table-method action remains
the explicit way to reconcile with the cloud (still does the full
round-trip when clicked). The initial dataset download is
unaffected — SkipCloudSync only suppresses the incremental sync
against an existing local copy.
When the caller asks to skip cloud sync at startup, also short-circuit the ndi.cloud.testLogin verification + ndi.cloud.uilogin prompt. The same root cause that motivates SkipCloudSync (testLogin or downloadNew hanging against a flaky cloud) makes the login dance pointless and adds another way to hang at startup. Bundling them under one flag keeps the escape hatch effective.
User is fixing testLogin behaviour upstream in NDI-matlab. SkipCloudSync goes back to its narrower meaning: skip ndi.cloud.sync.downloadNew only. testLogin / uilogin run on every startup regardless of the flag, so users who genuinely aren't authenticated still get the login prompt.
When the dataset and a session share a directory (the common case), ndi.session.dir(path) without a reference creates a brand-new session object with a freshly-minted identifier instead of opening the existing session at that path. The metatable was populated via dataset.open_session(id), which sets .identifier to the persistent value, so subsequent lookups by ndi.session.dir(path).id never matched anything in the metatable — Subject and File imports were filtering on a SessionIdentifier that didn't exist anywhere. Fix: pass SessionName as the reference everywhere we construct an ndi.session.dir from a path. Six callsites updated: - +sessionmethod/+methods/+import/+data/file.m, folder.m - +sessionmethod/+methods/+import/+subjects/file.m, manual.m - +objectmethod/+subject/+methods/document.m - +objectmethod/+file/+methods/document.m The two document.m sites loop over unique session paths from a working table; they now also look up the parallel SessionName from that table. ndi.nansen.import.session already passed (sessionName, sessionPath) so it was correct.
Replace class(metaTable) with metaTable.MetaTableClass because class() method override of MetaTable was removed in NANSEN
- Use getMetaTableFilePath of MetaTableCatalog instead of entry.SavePath, entry.FileName because entry.SavePath was removed in Nansen - Use isfile instead of exist(filename, 'file') for better performance
…riables NANSEN PR #78 removed MetaTable.addMissingVarsToMetaTable in favour of Project.synchronizeMetaTableVariables, which derives the table type from the metatable itself. PR #39 caught getMetaTableFilePath and MetaTableClass but missed this third call site, so MetatableMerge tests fail in CI with an undefined-method error against NANSEN dev.
…s-xfNaj Replace removed addMissingVarsToMetaTable in merge.m
…table-refactor Fix: Adapt to NANSEN `MetaTable` refactor
NANSEN PR #78 changed MetaTable.updateTableVariable from looking up
the project's TableVariable update function internally to taking it
as an explicit third argument:
updateTableVariable(obj, variableName, tableRowIndices, updateFunction)
The new signature defaults updateFunction to function_handle.empty()
and unconditionally invokes it, so any 2-arg call now errors with
"Invoked function handles must be scalar." This fires from
ndi.nansen.metatable.updateAll's second pass during pulakat.startup
and trips the GUI-launch guard in ndi.nansen.startup.
Look up UpdateFunctionName from the project's TableVariable table at
each call site (matching NANSEN's own addMissingTableVariables) and
pass str2func(name) as the third argument.
ndi.nansen.import.subject line 155 assigned the raw return of
ndi.nansen.fun.getMetaTableValue (a char like 'DS-123' or '') to
subjectTable_new{:,'DatasetIdentifier'}. When subjectTable_new has
more than one row, MATLAB can't broadcast a 1xM char to N rows of a
cell-typed column and errors with "the number of rows must match
the height of the table." The three lines just above already wrap
their scalar values in { } for exactly this reason; this aligns
line 155 with that pattern.
When a dataset and a session share a directory (the empty dataset-default 'pulakat' session sitting on top of the same path as the active 'test_2026' session), ndi.session.dir(reference, path) silently resolves to the dataset's default session. Downstream code in subject/file/data imports then writes the dataset session's id into SessionIdentifier columns, producing duplicate Subject rows under the wrong session and orphaning per-session metadata. ndi.session.dir takes an undocumented third positional arg -- session_id -- that pins resolution to a specific session. The GUI's sessionObject already carries SessionIdentifier as the source of truth, so feed it in. An assertion right after the constructor ensures any future regression in NDI's resolution fails loud rather than misroutes silently. Applied to all four sessionmethod entry points that opened an existing session via sessionObject: - import/subjects/file.m (the original repro path) - import/subjects/manual.m - import/data/file.m - import/data/folder.m The two +objectmethod/document.m loops use unique(SessionPath) and need a different shape; left for a follow-up.
The +objectmethod/document.m loops in subject and file did unique(SessionPath) and pulled SessionName from the first matching row, then called ndi.session.dir(name, path). When a dataset and a session share a directory (the empty 'pulakat' default session sitting on top of the same path as the active session), unique-by- path collapsed both onto a single iteration and ndi.session.dir silently resolved to the dataset's default session -- documents got attached to the wrong session. Group by SessionIdentifier instead; that's the actual identity of a session in the metatable. Pull the path/name for the call from any matching row, and pass the id as ndi.session.dir's third arg (undocumented but supported) so the path-collision case can't misroute. Assert the resolved id matches.
The orphan subject rows produced by the earlier session-misrouting bug carry an empty DatasetIdentifier (a 0x1 cell), because getMetaTableValue returned '' when the row was first inserted and the import wrapped that empty into the cell-typed column. Selecting one of those rows for removal hit datasetID2Object's text-scalar validator, blocking the very cleanup needed to delete the orphans. Read DatasetIdentifier from the project's Dataset metatable instead of the selected subject row -- a project has exactly one Dataset entry, and its identifier is always populated. The dataset object here is only used for the post-remove metatable.update calls, which don't care which subject row supplied the id. Other object-method paths (edit, document, mergeDuplicates) still pull DatasetIdentifier from the selected row; they can hit the same error if the user happens to act on an orphan first, but the immediate need is to unblock cleanup. Once orphans are removed they fall out of every other path.
…ough it
A datasetID column on a metatable row can be empty -- orphan rows
produced before their parent Session was registered carry a 0x1 cell.
Until now, every code path that opened the dataset object via
ndi.nansen.fun.datasetID2Object(<row>.DatasetIdentifier) crashed when
the user happened to act on an orphan: the validator on
datasetID2Object rejects empty input and the operation aborts before
any cleanup can run.
Add ndi.nansen.fun.resolveDatasetID, which returns the candidate
identifier when it is a non-empty char/string (unwrapping a 1x1 cell
on the way) and falls back to the DatasetIdentifier stored in the
project's Dataset metatable otherwise. A project has exactly one
Dataset entry, so the fallback is unambiguous.
Wrap every per-row datasetID2Object call site through resolveDatasetID:
Object methods:
- +subject/+methods/{remove, edit, mergeDuplicates, document}.m
- +subject/+methods/+export/files.m
- +file/+methods/{document, export}.m
Session methods:
- +methods/remove.m
- +methods/+export/files.m
- +methods/+import/+subjects/{file, manual}.m
- +methods/+import/+data/{file, folder}.m
Table variables:
- +tablevariable/+{subject,file,session}/DateAdded.m
The Dataset table-variable callsites (+tablevariable/+dataset/*) are
left alone -- they iterate the Dataset row itself, whose
DatasetIdentifier is by construction never empty.
getCloudStatus is the one exception: an orphan row's Cloud value can't
be looked up in any sync table, so it short-circuits to the default
value when DatasetIdentifier is empty rather than reaching for the
project's dataset (which would return a status for the wrong row).
Two bugs and a UX change in subject and file Document object methods:
1. isDocument was computed as
isequal(table.<X>DocumentIdentifier, {'N/A'})
which compares the entire cell column against a length-1 cell. That
is only true for a single-row table; for any multi-row selection it
returns scalar false, so ~isDocument is scalar true and every row
(including already-documented ones) reaches documents(). The
upstream NDI framework then runs createOntologyTableRowDoc, which
has its own bug: it calls isnan() on options.OldDocs{1} before
checking isa(..., 'ndi.document'), so as soon as a pre-existing
document is reached the run errors out.
Use per-row strcmp(..., 'N/A') so already-documented rows are
correctly excluded from the loop. Even with the upstream isnan
bug unfixed, this stops triggering it in practice.
2. Surface validation up front. The previous flow validated, ran
documents() against the surviving rows, and only showed the
validation report at the end. A user with bad inputs would learn
about it after partial work was already on disk. Now: if any row
is invalid or already documented, prompt with a count breakdown
and Continue/Cancel before any document is created. Cancel shows
the detailed report and returns; Continue proceeds with the valid,
not-yet-documented rows. When everything is clean the prompt is
skipped and the original silent path runs.
When nothing can be created (every selected row is invalid or
already documented), show an informational dialog instead of
pretending to do work.
The upstream isnan-on-ndi.document bug at
ndi.setup.NDIMaker.tableDocMaker.createOntologyTableRowDoc:129 is left
for a separate VH-Lab/NDI-matlab patch.
The earlier prompt was a plain Continue/Cancel without surfacing what
exactly was wrong. Users couldn't tell from the count breakdown which
fields on which rows were missing -- they had to cancel and discover
the report dialog separately.
Restructure: when any selected row is invalid or already documented,
show the validation report dialog first (so the user sees row-level
error messages with red/green styling), then prompt:
- "Edit invalid first" -- close, return; user fixes and re-triggers
- "Create valid documents" -- proceed with isValid & ~isDocument
- "Cancel" -- close, return
When every selected row is invalid or already documented, the prompt
collapses to a single OK dialog ("Nothing to create. Edit the invalid
subjects and try again."). When everything is clean (no invalid, no
already-documented), the report and prompt are both skipped and the
old silent path runs through to creation.
Drop the post-creation showValidationReport call. The up-front report
covers the "here's what's wrong" surface; the metatable's cell
updates after creation cover the "here's what we did" confirmation.
A trailing report on the all-clean path was a redundant "All N
subjects are valid" toast.
…eport
Two related changes to the Document flow:
1. Show whether each row already has a document. The validation report
now takes an optional IsDocumented logical (same length as isValid)
and renders rows in three colors instead of two:
- blue-tinted: already documented; ErrorMessage replaced with
"Already has a document"
- red-tinted: invalid; ErrorMessage from the report table
- green-tinted: valid and ready for creation
The note under the header now mentions that documented rows are
listed for context and will be skipped, so users can tell at a
glance which rows would be acted on.
2. Move the decision into the report window. Previously the dialog
showed the table and a Close button, then the calling code popped
a separate questdlg with three options. Now showValidationReport
accepts a Buttons cell and a Default highlight; with two or more
buttons it becomes a blocking modal and returns the chosen text.
The single-button form ({'Close'}) keeps the legacy non-blocking
behaviour, including the "All selected subjects are valid" alert
on the all-clean path.
The +subject/ and +file/ Document object methods now drive everything
through one showValidationReport call:
- dirty (any invalid or any already documented):
buttons = {Edit invalid first, Create N document(s), Cancel}
Default = Edit invalid first
Cancel via title-bar X resolves to "Cancel"
- dirty but nothing to create (only invalid + documented):
buttons = {Close}
- everything clean: skip the dialog, proceed silently.
Title-bar X mapping uses any Cancel/Close button in the list when
present so the X behaves the same as the explicit cancel.
Previously 'Edit invalid first' just returned from Document, leaving the user to re-select the invalid rows manually and click Edit themselves. Now Document hands the invalid subject objects directly to pulakat.objectmethod.subject.methods.edit, which opens its own import-table GUI populated with only those rows. After edit returns, Document returns; the user re-issues Document once their changes settle. Match invalid rows back to subjectObject by SubjectIdentifier rather than positional index because metaTable.getEntry doesn't promise to return rows in the requested order, so subjectTable's row order may differ from subjectObject's. When nToCreate == 0 but there are still invalid rows to fix, offer 'Edit invalid' / 'Cancel' instead of just 'Close', so the path to fixing things is one click away even when nothing is creatable. File Document drops 'Edit invalid first' entirely because file rows are derived from on-disk file headers and there's no row-edit object method for them; the dirty-row buttons collapse to 'Create N' / 'Cancel'.
The "Failed to update variable Cloud / Dot indexing is not supported" warning fires somewhere inside the lookup path but the trace doesn't say where -- NANSEN's variable-update wrapper swallows the stack and just reports the message text. Move the lookup body into an inner helper, wrap the call in try/catch, and on failure print obj's class and field names so the next occurrence is diagnosable instead of opaque. Returns the default value on failure rather than letting the exception propagate as a row-update warning. Also harden the document-id strcmp at the bottom: unwrap a 1x1 cell DocumentIdentifier and skip the lookup if the value isn't text. This covers an orphan-style row where the per-row DocumentIdentifier is empty or a non-text default.
After clicking 'Edit invalid first' the user fixed identifying fields
and the editor closed, but Document then returned -- forcing them to
manually re-trigger Document and walk through the dialog a second
time to confirm the edits resolved the issues. With several rounds of
fixes per import that's a lot of clicking.
Wrap the validate/review/decide block in a while loop. On each
iteration we re-fetch the selected rows from the metatable (Edit
mutates them) and re-run validate, so:
- if all rows are now clean, the loop exits and creation proceeds
silently;
- if some are still invalid, the report dialog reopens with fresh
isValid/isDocument and an updated count so the user can keep
correcting until satisfied;
- clicking 'Create N document(s)' breaks out and proceeds with the
rows that passed;
- Cancel / Close / X return without creating anything.
Also tightens the button-set logic: 'Edit invalid first' is only
offered when at least one row is actually invalid. Previously the
nToCreate>0 branch always included the Edit button regardless, so a
selection where everything was valid but some were already documented
would still surface a dead Edit option.
Three small fixes around the per-row Validate object method: 1. Focus. Subject Validate's report uifigure went behind the main MATLAB window because NANSEN's task runner prints "Task completed: validate" to the command window after the method returns and the command window steals focus. Force the report forward right after the figure is built (drawnow + figure(reportFig)) in both the interactive and single-button paths. 2. Already-documented rows. Subject Validate now passes IsDocumented to the report, so already-documented rows render in blue with the "Already has a document" message instead of being indistinguishable from rows pending creation. Symmetric change for File Validate even though files can't be edited. 3. Edit loop. Subject Validate now offers an "Edit invalid" button when any selected row is invalid, hands the invalid rows to the Subject Edit object method, and re-fetches/re-validates after the edit finishes. The user can iterate edits until everything is clean and click Close. File Validate stays informational because there's no file edit object method.
NANSEN PR #77 (VervaekeLab/NANSEN) split MetaTableViewer into a legacy Java backend (pre-R2025a) and a modern uifigure/EntityTable backend (R2025a+). The modern path requires the entity-table package on the MATLAB path; NANSEN resolves it via nansen.util.ensureEntityTableOnPath, which probes a fixed set of candidate paths including <nansenRoot>/../entity-table. Clone entity-table as a sibling of the other repos in install.m and sync it on each ndi.nansen.startup so the modern backend loads without requiring users to re-run nansen_install. Mirror the checkout in both CI jobs so R2025a runs do not break when the viewer initializes. Pulakat's tablevariable classes only override getCellDisplayString with plain text, so they inherit the abstract default for the new getCellDisplayStringForContext hook and render correctly under both backends with no per-context branching.
Codecov Report✅ All modified and coverable lines are covered by tests.
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
ProjectTestCase.createFreshProject and CreateProject.createProject
both mkdir-ed the project path before calling
nansen.config.project.ProjectManager.createProject. The current
NANSEN dev branch refuses to write into a pre-existing folder
("Can not create project because a folder already exist in this
location"), so every test inheriting from ProjectTestCase errored in
setup. Pass a non-existent tempname-based path and let
createProject create the folder itself.
Surfaced in CI on the matlab-2025a-compatibility branch but
unrelated to the entity-table changes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adapts nansen-pulakat to VervaekeLab/NANSEN PR #77, which modernizes NANSEN's
MetaTableViewerfor MATLAB R2025a+. The PR splits the viewer into a legacy Java backend (pre-R2025a) and a modern uifigure/EntityTable backend (R2025a+); the modern path requires the newehennestad/entity-tablepackage on the MATLAB path. Pre-R2025a installs continue to use the legacy path with no behavioral change.What changed
install.m— cloneentity-tableas a sibling of the other repos under~/ndi/tools/(step 5b, between NANSEN clone andnansen_install). NANSEN'snansen.util.ensureEntityTableOnPathprobes<nansenRoot>/../entity-tableas one of its candidate paths, so the sibling clone is automatically discovered.src/ndi/+ndi/+nansen/startup.m— syncentity-tablealongside NANSEN/openMINDS/NDI-matlab on eachndi.nansen.startupinvocation so updates flow through the same git pull mechanism as the rest of the upstream repos..github/workflows/matlab-tests.yml— addactions/checkout@v4ofehennestad/entity-tableandaddpath(genpath('entity-table/src'))in both theoffline-testsandcloud-testsjobs. Necessary so the cloudStartuptest (which launches the Nansen GUI) does not break under R2025a, and as defense-in-depth for the offline plugin-discovery tests.CLAUDE.md— add entity-table to the upstream-repos table, the install-flow clone order, the cross-repo coupling table, and a new "MATLAB version coupling" section that documentsgetCellDisplayStringForContextand notes that pulakat's plain-text tablevariables work under both backends without per-context branches.Why no source changes in pulakat
The new API hook
nansen.metadata.abstract.TableColumnFormatter.getCellDisplayStringForContext(obj, displayContext)is additive — the abstract base class supplies a default that delegates to the existinggetCellDisplayString(). Pulakat's+tablevariable/*classes override onlygetCellDisplayStringand emit plain strings (no<html>markup, nogetIconHtmlStringicon embeds), so they render correctly under the modern uifigure backend through the default fallback. If a future formatter ever needs HTML/icon rendering it should overridegetCellDisplayStringForContextdirectly, as documented in CLAUDE.md.Test plan
latestMATLAB (default) — entity-table clone + addpath should be a no-op for plugin-discovery testsR2025ato confirm the modern table backend loads (noNANSEN:MissingEntityTableerror)R2024bto confirm pre-R2025a path still passesinstall.mend-to-end on a clean machine; verify~/ndi/tools/entity-table/is created and a subsequentpulakat.startupsyncs it without errorsStartuptest passes against R2025a (when secrets are configured)https://claude.ai/code/session_01S7oxviMzuwcExHpW2J1eEs
Generated by Claude Code