Admin-only DwC-A export (taxa + localities + occurrences)#1152

Draft
karilint wants to merge 23 commits into main from feature/1150-dwc-export
Collaborator

@karilint karilint commented Apr 23, 2026

Implements the first, admin-only Darwin Core Archive (DwC-A) exports for taxa, localities, and occurrences (v1).

Taxa export

  • Backend: GET /species/export/dwc-archive (Role.Admin only) returns a ZIP with taxon.csv, measurementorfact.csv, meta.xml, eml.xml.
  • Frontend: adds an export menu item on /species for admins only.
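Each archive carries a meta.xml descriptor that tells a DwC-A reader which CSV is the core and which are extensions. The sketch below shows one way such a descriptor could be generated. The `DwcFile` type, the `buildMetaXml` helper, and the convention that column 0 holds the id are assumptions made for illustration, not the code in this PR; the namespace and term URIs are the standard Darwin Core ones.

```typescript
// Hypothetical descriptor for one CSV file in the archive (not from the PR).
interface DwcFile {
  location: string; // file name inside the ZIP
  rowType: string; // Darwin Core class URI
  terms: string[]; // term URI per column, starting at column 1 (column 0 is the id)
}

const DWC = 'http://rs.tdwg.org/dwc/terms/';

const escapeXml = (value: string): string =>
  value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');

// The core file identifies rows with <id>; extensions point back with <coreid>.
const fileBlock = (tag: 'core' | 'extension', file: DwcFile): string => {
  const idTag = tag === 'core' ? 'id' : 'coreid';
  const fields = file.terms
    .map((term, i) => `    <field index="${i + 1}" term="${escapeXml(term)}"/>`)
    .join('\n');
  return [
    `  <${tag} encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n" fieldsEnclosedBy="&quot;" ignoreHeaderLines="1" rowType="${escapeXml(file.rowType)}">`,
    `    <files><location>${escapeXml(file.location)}</location></files>`,
    `    <${idTag} index="0"/>`,
    fields,
    `  </${tag}>`,
  ].join('\n');
};

const buildMetaXml = (core: DwcFile, extensions: DwcFile[]): string =>
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">',
    fileBlock('core', core),
    ...extensions.map(extension => fileBlock('extension', extension)),
    '</archive>',
  ].join('\n');

// Example: a taxon core with one MeasurementOrFact extension (columns are illustrative).
const taxaMetaXml = buildMetaXml(
  { location: 'taxon.csv', rowType: `${DWC}Taxon`, terms: [`${DWC}scientificName`] },
  [
    {
      location: 'measurementorfact.csv',
      rowType: `${DWC}MeasurementOrFact`,
      terms: [`${DWC}measurementType`, `${DWC}measurementValue`],
    },
  ],
);
```

Keeping the column list in one descriptor per file means the CSV header and the meta.xml field indexes can be derived from the same array, which makes it harder for them to drift apart.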

Locality export

  • Backend: GET /locality/export/dwc-archive (Role.Admin only) returns a ZIP with location.csv, geologicalcontext.csv, measurementorfact.csv, meta.xml, eml.xml.
  • Frontend: adds an export menu item on /localities for admins only.

Occurrence export

  • Backend: GET /occurrence/export/dwc-archive (Role.Admin only) returns a ZIP with occurrence.csv, measurementorfact.csv, location.csv, geologicalcontext.csv, taxon.csv, meta.xml, eml.xml.
  • Frontend: adds an export menu item on /occurrence for admins only.
  • now_ls MeasurementOrFact rows prefix verbatimMeasurementType values with "now_ls." so that fields such as body_mass, mesowear, mw_value, and microwear do not collide with same-named com_species measurements.
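The prefixing rule can be sketched as a small mapping function. This is a minimal illustration, assuming com_species measurement names are emitted unprefixed and empty values are skipped; `MeasurementRow`, `verbatimType`, and `toMeasurementRows` are hypothetical names, not the identifiers used in the PR.

```typescript
type MeasurementSource = 'com_species' | 'now_ls';

// Hypothetical shape of one measurementorfact.csv row.
interface MeasurementRow {
  coreId: string;
  verbatimMeasurementType: string;
  measurementValue: string;
}

// Only now_ls fields get a table prefix (assumption: com_species names stay as-is).
const verbatimType = (source: MeasurementSource, field: string): string =>
  source === 'now_ls' ? `now_ls.${field}` : field;

// Turn a record of column values into one row per non-empty value.
const toMeasurementRows = (
  coreId: string,
  source: MeasurementSource,
  values: Record<string, string | number | null>,
): MeasurementRow[] =>
  Object.entries(values)
    .filter(([, value]) => value !== null && value !== '')
    .map(([field, value]) => ({
      coreId,
      verbatimMeasurementType: verbatimType(source, field),
      measurementValue: String(value),
    }));
```

With this rule, a body_mass value from now_ls and a body_mass value from com_species end up with different verbatimMeasurementType strings, so a consumer can tell occurrence-level measurements from species-level ones.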

Tests

  • Mapping/ZIP unit tests + admin-only API test(s).

Docs

  • documentation/functionality/dwc_export.md
  • documentation/functionality/dwc_export_localities.md
  • documentation/functionality/dwc_export_occurrences.md

Fixes #1150.

@karilint karilint changed the title from "Admin-only DwC-A export (taxa + measurements)" to "Admin-only DwC-A export (taxa + localities)" on Apr 27, 2026
Collaborator Author

Pushed follow-up commit ec4edd46 for the locality export field adjustments:

  • Removed now_plr project rows and now_lau last-update rows from the locality DwC-A export and Prisma selection.
  • Added explicit MeasurementOrFact rows for basin, subbasin, plant_pres, and invert_pres.
  • Confirmed/covered existing export rows for bipedal_footprints, nutrients, and pers_pollen_*.
  • Updated locality export docs and unit coverage.

Validation run locally:

  • src/unit-tests/dwcArchiveExportLocalities.test.ts
  • npm run tsc:backend
  • npm run lint:backend
  • locality export API test against isolated db-test
  • commit hook: full lint + full TypeScript

@karilint karilint changed the title from "Admin-only DwC-A export (taxa + localities)" to "Admin-only DwC-A export (taxa + localities + occurrences)" on Apr 28, 2026
Collaborator Author

Pushed commit 51fca00a for the separate occurrence DwC-A export.

Summary:

  • Restored locality export scope to the earlier Location + GeologicalContext + MeasurementOrFact package.
  • Added admin-only GET /occurrence/export/dwc-archive and an Occurrences-list menu item.
  • Occurrence ZIP includes occurrence.csv, measurementorfact.csv, location.csv, geologicalcontext.csv, taxon.csv, meta.xml, eml.xml.
  • now_ls MeasurementOrFact rows use verbatimMeasurementType values prefixed with "now_ls." to avoid collisions with same-named com_species fields.
  • Added occurrence export docs and unit/API coverage.

Validation:

  • src/unit-tests/dwcArchiveExportOccurrences.test.ts
  • src/unit-tests/dwcArchiveExportLocalities.test.ts
  • src/api-tests/occurrence/dwcArchiveExportOccurrences.test.ts against isolated localhost:3307/now_test
  • commit hook: full lint + full TypeScript
  • additional local checks: npm run lint:backend, npm run lint:frontend, npm run tsc:backend, npm run tsc:frontend

Collaborator Author

Follow-up for the occurrence export OOM: added commit e90e24d2 (Stream DwC occurrence export).

What changed:

  • occurrence DwC-A export now pages through now_ls instead of loading the full occurrence graph
  • occurrence and measurement CSVs are written to temp files with backpressure
  • location and taxon lookup files still reuse the existing locality/taxon mapping structures
  • the route now pipes a JSZip node stream instead of sending a fully buffered zip
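The paging and backpressure steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses offset paging through a hypothetical `fetchPage` callback, and `WritableLike` is a structural stand-in for a Node.js writable stream so the block needs no imports.

```typescript
// Structural subset of a Node.js Writable: write() returns false when the
// internal buffer is full, and 'drain' fires once it is safe to write again.
interface WritableLike {
  write(chunk: string): boolean;
  once(event: 'drain', listener: () => void): unknown;
}

// Hypothetical page fetcher: returns up to `take` rows starting at `skip`.
type FetchPage<T> = (skip: number, take: number) => Promise<T[]>;

// Yield rows one page at a time instead of loading the whole table.
async function* pageThrough<T>(fetchPage: FetchPage<T>, pageSize: number): AsyncGenerator<T> {
  for (let skip = 0; ; skip += pageSize) {
    const page = await fetchPage(skip, pageSize);
    yield* page;
    if (page.length < pageSize) return; // a short page means the end was reached
  }
}

// Quote a cell only when it contains a delimiter, quote, or line break.
const csvCell = (value: unknown): string => {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
};

// Write one CSV line and wait for 'drain' if the sink signals backpressure.
const writeLine = (sink: WritableLike, cells: unknown[]): Promise<void> =>
  new Promise(resolve => {
    const line = `${cells.map(csvCell).join(',')}\n`;
    if (sink.write(line)) resolve();
    else sink.once('drain', () => resolve());
  });

// Stream every row to the sink; memory use is bounded by one page of rows.
async function exportRows<T>(
  rows: AsyncIterable<T>,
  sink: WritableLike,
  toCells: (row: T) => unknown[],
): Promise<number> {
  let written = 0;
  for await (const row of rows) {
    await writeLine(sink, toCells(row));
    written += 1;
  }
  return written;
}
```

A call such as `exportRows(pageThrough(fetchPage, 1000), fileStream, toCells)` would keep at most one page in memory at a time. Offset paging is the simplest variant to show; paging by a key cursor avoids rescanning skipped rows on large tables.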

Validation:

  • cd backend && npm run build
  • cd backend && npm run lint
  • cd backend && npx jest src/unit-tests/dwcArchiveExportOccurrences.test.ts --runInBand --config jest-config.js
  • cd backend && npm run test:api:local -- --runTestsByPath src/api-tests/occurrence/dwcArchiveExportOccurrences.test.ts
  • commit hook also ran root lint + root tsc successfully

Collaborator Author

Follow-up: added commit 3a5e68fc (Show DwC occurrence export progress).

What changed:

  • occurrence export accepts a client-generated exportId and reports generation progress server-side
  • added admin-only progress polling endpoint for that export id
  • occurrence export notification now updates with messages like "Generating occurrence rows: 1000/10000 generated", then switches to ZIP download progress once the response stream starts
  • progress entries are cleaned up after completion/failure
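The progress flow described above can be sketched as an in-memory registry keyed by the client-generated exportId. This is a simplified illustration: `ExportProgressRegistry`, `withProgress`, and `progressMessage` are hypothetical names, and the sketch deletes an entry immediately on completion or failure, which may differ from how the PR times its cleanup.

```typescript
// Hypothetical progress record for one running export.
interface ExportProgress {
  generated: number;
  total: number;
}

class ExportProgressRegistry {
  private readonly entries = new Map<string, ExportProgress>();

  start(exportId: string, total: number): void {
    this.entries.set(exportId, { generated: 0, total });
  }

  report(exportId: string, generated: number): void {
    const entry = this.entries.get(exportId);
    if (entry) entry.generated = Math.min(generated, entry.total);
  }

  // What a polling endpoint would return; undefined once the export is gone.
  get(exportId: string): ExportProgress | undefined {
    const entry = this.entries.get(exportId);
    return entry ? { ...entry } : undefined;
  }

  finish(exportId: string): void {
    this.entries.delete(exportId);
  }
}

// Message format taken from the notification text quoted in the PR comment.
const progressMessage = (progress: ExportProgress): string =>
  `Generating occurrence rows: ${progress.generated}/${progress.total} generated`;

// Run an export and always remove its entry, whether it succeeds or throws.
async function withProgress<T>(
  registry: ExportProgressRegistry,
  exportId: string,
  total: number,
  run: (report: (generated: number) => void) => Promise<T>,
): Promise<T> {
  registry.start(exportId, total);
  try {
    return await run(generated => registry.report(exportId, generated));
  } finally {
    registry.finish(exportId);
  }
}
```

Putting the cleanup in a `finally` block keeps failed exports from leaving stale entries behind, which matters for a process-local map that would otherwise grow with every aborted request.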

Validation:

  • cd backend && npm run build
  • cd backend && npm run lint
  • cd frontend && npm run lint
  • cd frontend && npx tsc --noEmit
  • cd backend && npx jest src/unit-tests/dwcArchiveExportOccurrences.test.ts --runInBand --config jest-config.js
  • cd backend && npm run test:api:local -- --runTestsByPath src/api-tests/occurrence/dwcArchiveExportOccurrences.test.ts
  • commit hook also ran root lint + root tsc successfully

Collaborator Author

Follow-up: added commit 190859c3 (Tolerate invalid locality reference dates).

This fixes the dev backend crash where GET /locality/:id failed while Prisma was deserializing a nested ref_ref.exact_date containing a legacy MySQL zero month/day value. Locality details now load nested update references without selecting exact_date, then attach exact_date: null, matching the existing species-detail tolerance pattern.
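The tolerance pattern can be sketched in two parts: a select that leaves out exact_date so the zero-date value is never deserialized, and a mapper that restores the field as null so the response shape stays the same. The column names in the select, the `UpdateRow` shape (update row, then link rows, then ref_ref), and the helper names are assumptions for illustration; only ref_ref.exact_date and the "select without it, then attach null" idea come from the comment above.

```typescript
// Hypothetical narrow select for ref_ref: every needed column except exact_date.
const referenceSelectWithoutExactDate = {
  rid: true,
  title_primary: true,
  date_primary: true,
} as const;

type WithExactDate<T> = T & { exact_date: null };

// Restore the field the API response is expected to carry.
const attachNullExactDate = <T extends object>(reference: T): WithExactDate<T> => ({
  ...reference,
  exact_date: null,
});

// Assumed nesting: an update row holds link rows, each pointing at a reference.
interface UpdateRow<R> {
  luid: number;
  now_lr: { ref_ref: R }[];
}

const restoreExactDate = <R extends object>(update: UpdateRow<R>): UpdateRow<WithExactDate<R>> => ({
  ...update,
  now_lr: update.now_lr.map(link => ({
    ...link,
    ref_ref: attachNullExactDate(link.ref_ref),
  })),
});
```

A caller would pass something like `{ select: referenceSelectWithoutExactDate }` for the nested ref_ref relation and then map each returned update through `restoreExactDate` before responding. The trade-off is that valid exact_date values are also reported as null on these paths.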

Validation:

  • cd backend && npm run build
  • cd backend && npm run lint
  • cd backend && npm run test:api:local -- --runTestsByPath src/api-tests/locality/projectLink.test.ts
  • commit hook also ran root lint + root tsc successfully

Collaborator Author

Follow-up: added commit 172aa16c (Tolerate invalid occurrence reference dates).

This fixes the occurrence detail crash where getOccurrenceUpdates() loaded nested update references through now_lau/now_lr and now_sau/now_sr, causing Prisma to deserialize legacy ref_ref.exact_date zero-date values. Occurrence update references now use a no-exact_date select and restore exact_date: null in the response shape.

Validation:

  • cd backend && npm run build
  • cd backend && npm run lint
  • cd backend && npm run test:api:local -- --runTestsByPath src/api-tests/occurrence/getByCompositeKey.test.ts
  • commit hook also ran root lint + root tsc successfully

Collaborator Author

Follow-up: added commit 647773d1 (Harden reference date reads).

I swept for the remaining places that could hit the same Prisma zero-date failure from ref_ref.exact_date.

Found and addressed:

  • backend/src/services/timeUnit.ts: nested now_tau -> now_tr -> ref_ref
  • backend/src/services/timeBound.ts: nested now_bau -> now_br -> ref_ref
  • backend/src/services/reference.ts: direct getReferenceDetails() full ref_ref.findUnique()
  • consolidated the locality/occurrence fixes through backend/src/services/utils/referenceDate.ts

Also checked remaining ref_ref.findMany usages: getAllReferences() and species details already use narrow selects that do not include exact_date, so they should not deserialize the bad date column.

Validation:

  • cd backend && npm run build
  • cd backend && npm run lint
  • focused API batch against isolated test DB: occurrence detail, reference create/detail, time-unit fetch, time-bound update/detail (4 passed, 18 tests)
  • commit hook also ran root lint + root tsc successfully

Development

Successfully merging this pull request may close these issues.

DwC export for taxa and measurements

1 participant