
DwC export for taxa and measurements #1150

@karilint

You are working in the GitHub repository nowcommunity/nowdatabase.

Your task is to IMPLEMENT a first, simple Darwin Core Archive (https://dwc.tdwg.org/terms/) export for TAXA and MEASUREMENTS, intended for ADMIN USERS ONLY for initial testing.

Goal

  • Add an admin-only export option that generates standard DwC output files for:
    1. taxon.csv
    2. measurementorfact.csv
    3. archive metadata files needed for a simple DwC-A package:
      • meta.xml
      • eml.xml or, if a full EML generator is too heavy for the first step, a minimal metadata XML file with clear TODO markers and a simple implementation path
  • Keep the first version intentionally simple.
  • We will add more measurements/traits later.

Important scope constraints

  • This is a TEST / ADMIN-ONLY feature for now.
  • Add it to the /species page as one export option.
  • Do NOT try to implement the full final production workflow yet.
  • Do NOT add all possible traits from com_species yet.
  • Start with a limited, clear subset of fields and make the code easy to extend.

Implementation target

  • Add a new export option in the backend that is only available to admin users (for now).
  • The export should produce a ZIP file containing the DwC files.
  • Use UTF-8 encoded CSV files.
  • The ZIP should be downloadable through an admin-only route or admin-only backend action from the /species page, as one export option.
  • Keep the design modular so we can extend the mappings later.

Repository/database context
The Prisma schema contains at least these relevant models and fields:

com_species

  • species_id
  • class_name
  • order_name
  • family_name
  • subclass_or_superorder_name
  • suborder_or_superfamily_name
  • subfamily_name
  • genus_name
  • species_name
  • unique_identifier
  • taxonomic_status
  • common_name
  • sp_author
  • body_mass
  • brain_mass
  • diet1
  • diet2
  • diet3
  • diet_description
  • locomo1
  • locomo2
  • locomo3
  • activity
  • crowntype
  • microwear
  • mesowear
  • mw_value
  • sp_comment

com_taxa_synonym

  • synonym_id
  • species_id
  • syn_genus_name
  • syn_species_name
  • syn_comment

DwC output design for v1

  1. taxon.csv
    Create one row per taxon from com_species.

Use these columns in taxon.csv for v1:

  • taxonID
  • scientificName
  • scientificNameAuthorship
  • vernacularName
  • taxonRank
  • taxonomicStatus
  • class
  • order
  • family
  • genus
  • specificEpithet
  • infraspecificEpithet
  • higherClassification
  • taxonRemarks
  • taxonConceptID

Mapping rules:

  • taxonID = com_species.species_id
  • scientificName:
    • build from genus_name, species_name, and authorship
  • scientificNameAuthorship = sp_author
  • vernacularName = common_name
  • taxonRank:
    • default to species
    • this needs more validation because of values like 'indet.', 'gen.', 'sp.' in the taxonomic fields
  • taxonomicStatus = taxonomic_status
    • if empty/null, fallback to "accepted"
    • if unique_identifier <> '-', the row may be a subspecies; this needs further specification later
  • class = class_name
  • order = order_name
  • family = family_name
  • genus = genus_name
  • specificEpithet = species_name
  • infraspecificEpithet = unique_identifier (a good candidate, to be confirmed)
  • higherClassification:
    • concatenate available higher ranks in order:
      class_name | subclass_or_superorder_name | order_name | suborder_or_superfamily_name | family_name | subfamily_name
    • skip empty values
    • use | as separator
  • taxonRemarks = sp_comment
  • taxonConceptID = consider whether anything fits here
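
The mapping rules above could be sketched roughly as follows. This is only an illustration under assumptions: the SpeciesRecord interface, the clean helper, and the function name mapSpeciesToTaxonRow are hypothetical and not taken from the repo; treating unique_identifier '-' as "no infraspecific epithet" and leaving taxonConceptID empty are placeholders until those rules are specified.

```typescript
// Hypothetical shape: only the com_species fields that taxon.csv needs.
interface SpeciesRecord {
  species_id: number;
  class_name: string | null;
  subclass_or_superorder_name: string | null;
  order_name: string | null;
  suborder_or_superfamily_name: string | null;
  family_name: string | null;
  subfamily_name: string | null;
  genus_name: string | null;
  species_name: string | null;
  unique_identifier: string | null;
  taxonomic_status: string | null;
  common_name: string | null;
  sp_author: string | null;
  sp_comment: string | null;
}

// Column order for taxon.csv; meta.xml field indices must follow this order.
const TAXON_COLUMNS = [
  "taxonID", "scientificName", "scientificNameAuthorship", "vernacularName",
  "taxonRank", "taxonomicStatus", "class", "order", "family", "genus",
  "specificEpithet", "infraspecificEpithet", "higherClassification",
  "taxonRemarks", "taxonConceptID",
] as const;

type TaxonRow = Record<(typeof TAXON_COLUMNS)[number], string>;

// Normalise null/undefined to "" and trim whitespace.
const clean = (v: string | null | undefined): string => (v ?? "").trim();

function mapSpeciesToTaxonRow(s: SpeciesRecord): TaxonRow {
  const genus = clean(s.genus_name);
  const epithet = clean(s.species_name);
  const author = clean(s.sp_author);
  const uid = clean(s.unique_identifier);
  return {
    taxonID: String(s.species_id),
    // Built from genus, species and authorship, skipping empty parts.
    scientificName: [genus, epithet, author].filter(Boolean).join(" "),
    scientificNameAuthorship: author,
    vernacularName: clean(s.common_name),
    // TODO: validate rank against values like 'indet.', 'gen.', 'sp.'.
    taxonRank: "species",
    taxonomicStatus: clean(s.taxonomic_status) || "accepted",
    class: clean(s.class_name),
    order: clean(s.order_name),
    family: clean(s.family_name),
    genus,
    specificEpithet: epithet,
    // Assumption: '-' means "no identifier"; subspecies handling is still open.
    infraspecificEpithet: uid === "-" ? "" : uid,
    // Higher ranks in the order given in the issue, empty values skipped.
    higherClassification: [
      s.class_name, s.subclass_or_superorder_name, s.order_name,
      s.suborder_or_superfamily_name, s.family_name, s.subfamily_name,
    ].map(clean).filter(Boolean).join("|"),
    taxonRemarks: clean(s.sp_comment),
    // TODO: decide whether anything fits taxonConceptID.
    taxonConceptID: "",
    // TODO: synonym export from com_taxa_synonym would add further rows elsewhere.
  };
}
```

Keeping TAXON_COLUMNS as the single source of column order means the CSV writer and meta.xml generator can both read from it when fields are added later.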

Important:

  • For v1, export only com_species rows as taxa.
  • Do NOT yet create separate synonym Taxon rows from com_taxa_synonym.
  • Leave clear TODO comments where synonym export can be added later.
  2. measurementorfact.csv
    Create a long-format measurement file linked back to taxa.

Use these columns in measurementorfact.csv for v1:

  • taxonID
  • measurementID
  • measurementType
  • measurementValue
  • measurementUnit
  • measurementMethod
  • measurementRemarks

For v1, ONLY export these selected measurements from com_species:

  • body_mass
  • brain_mass
  • diet1
  • diet2
  • diet3
  • diet_description
  • locomo1
  • locomo2
  • locomo3
  • activity
  • crowntype
  • microwear
  • mesowear
  • mw_value

Mapping rules:

  • taxonID = species_id
  • measurementID = deterministic string like:
    • NOW:<species_id>:<field_name>
  • measurementType values:
    • body_mass -> body mass
    • brain_mass -> brain mass
    • diet1 -> diet category 1
    • diet2 -> diet category 2
    • diet3 -> diet category 3
    • diet_description -> diet description
    • locomo1 -> locomotion 1
    • locomo2 -> locomotion 2
    • locomo3 -> locomotion 3
    • activity -> activity
    • crowntype -> crown type
    • microwear -> microwear
    • mesowear -> mesowear
    • mw_value -> mesowear value
  • measurementValue = the field value converted to string
  • measurementUnit:
    • body_mass -> g
    • brain_mass -> g
    • everything else -> empty string (or NA or similar, if DwC recommends a convention for it)

Important:

  • Only emit a measurement row if the source field is non-null and non-empty.
  • Keep implementation simple and deterministic.
  • This should be easy to extend later by adding more field mappings.
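
One possible way to keep the measurement mapping table-driven is sketched below. The names MEASUREMENT_SPECS, SpeciesMeasurementSource, and mapSpeciesToMeasurementRows are hypothetical; measurementMethod and measurementRemarks are left empty because no mapping is defined for them yet, and the unit for non-mass fields is an empty string pending a decision.

```typescript
// One entry per exported com_species field; extend this list to add traits.
interface MeasurementSpec {
  field: string;
  type: string;
  unit: string;
}

const MEASUREMENT_SPECS: readonly MeasurementSpec[] = [
  { field: "body_mass", type: "body mass", unit: "g" },
  { field: "brain_mass", type: "brain mass", unit: "g" },
  { field: "diet1", type: "diet category 1", unit: "" },
  { field: "diet2", type: "diet category 2", unit: "" },
  { field: "diet3", type: "diet category 3", unit: "" },
  { field: "diet_description", type: "diet description", unit: "" },
  { field: "locomo1", type: "locomotion 1", unit: "" },
  { field: "locomo2", type: "locomotion 2", unit: "" },
  { field: "locomo3", type: "locomotion 3", unit: "" },
  { field: "activity", type: "activity", unit: "" },
  { field: "crowntype", type: "crown type", unit: "" },
  { field: "microwear", type: "microwear", unit: "" },
  { field: "mesowear", type: "mesowear", unit: "" },
  { field: "mw_value", type: "mesowear value", unit: "" },
];

interface MeasurementRow {
  taxonID: string;
  measurementID: string;
  measurementType: string;
  measurementValue: string;
  measurementUnit: string;
  measurementMethod: string;
  measurementRemarks: string;
}

// Hypothetical loose input type: species_id plus any of the fields above.
interface SpeciesMeasurementSource {
  species_id: number;
  [field: string]: string | number | null | undefined;
}

function mapSpeciesToMeasurementRows(s: SpeciesMeasurementSource): MeasurementRow[] {
  const rows: MeasurementRow[] = [];
  for (const spec of MEASUREMENT_SPECS) {
    const raw = s[spec.field];
    // Skip null/undefined and empty or whitespace-only values.
    if (raw === null || raw === undefined) continue;
    const value = String(raw).trim();
    if (value === "") continue;
    rows.push({
      taxonID: String(s.species_id),
      // Deterministic ID: NOW:<species_id>:<field_name>
      measurementID: `NOW:${s.species_id}:${spec.field}`,
      measurementType: spec.type,
      measurementValue: value,
      measurementUnit: spec.unit,
      measurementMethod: "", // TODO: no mapping defined for v1
      measurementRemarks: "", // TODO: no mapping defined for v1
    });
  }
  return rows;
}
```

Note that a numeric 0 is kept as the string "0" in this sketch, since only null, undefined, and empty strings are treated as missing.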
  3. meta.xml
    Create a valid DwC-A meta.xml that describes:
  • one core file: taxon.csv
  • one extension file: measurementorfact.csv

Requirements:

  • point to the correct file names
  • define field indices in the order written to CSV
  • use a Taxon core
  • use MeasurementOrFact extension
  • keep this valid and minimal
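
A minimal sketch of a meta.xml generator for these requirements might look like this. The function and constant names are hypothetical; the CSV dialect attributes assume comma-separated files with one header line, double-quote enclosure, and "\n" line endings, so they must be adjusted if the CSV writer differs. The output has not been run through a DwC-A validator here.

```typescript
const DWC_TERMS = "http://rs.tdwg.org/dwc/terms/";

// Column order must match the order written to each CSV file.
const taxonFields = [
  "taxonID", "scientificName", "scientificNameAuthorship", "vernacularName",
  "taxonRank", "taxonomicStatus", "class", "order", "family", "genus",
  "specificEpithet", "infraspecificEpithet", "higherClassification",
  "taxonRemarks", "taxonConceptID",
];
const measurementFields = [
  "taxonID", "measurementID", "measurementType", "measurementValue",
  "measurementUnit", "measurementMethod", "measurementRemarks",
];

// Assumed CSV dialect; "\\n" emits the literal two characters \n into the XML.
const CSV_ATTRS =
  'encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n" ' +
  'fieldsEnclosedBy="&quot;" ignoreHeaderLines="1"';

// Emit <field> elements for columns from firstIndex onward.
function fieldElements(columns: readonly string[], firstIndex: number): string[] {
  return columns
    .slice(firstIndex)
    .map((name, i) => `    <field index="${i + firstIndex}" term="${DWC_TERMS}${name}"/>`);
}

// metadataFile is the name of the EML (or temporary metadata) file in the ZIP.
function buildMetaXml(metadataFile: string): string {
  const lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="${metadataFile}">`,
    `  <core ${CSV_ATTRS} rowType="${DWC_TERMS}Taxon">`,
    "    <files><location>taxon.csv</location></files>",
    '    <id index="0"/>',
  ]
    .concat(fieldElements(taxonFields, 0))
    .concat([
      "  </core>",
      `  <extension ${CSV_ATTRS} rowType="${DWC_TERMS}MeasurementOrFact">`,
      "    <files><location>measurementorfact.csv</location></files>",
      // Column 0 (taxonID) links each measurement back to the core row.
      '    <coreid index="0"/>',
    ])
    .concat(fieldElements(measurementFields, 1))
    .concat(["  </extension>", "</archive>"]);
  return lines.join("\n") + "\n";
}
```

In this sketch the core lists taxonID both as the id column and as a field, while the extension uses column 0 only as coreid. The column names are hard-coded term names, so no XML escaping is applied to them.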
  4. eml.xml / metadata file
    Implement a simple metadata file for the archive.

For v1:

  • If feasible, generate a minimal eml.xml
  • If a full EML implementation is too much for the first change, generate a simple minimal metadata XML file and structure the code so replacing it with real EML later is easy

Populate with placeholder/admin-configurable values such as:

  • title: NOW database Darwin Core test export or similar
  • abstract: short explanation that this is an admin-only test export or similar
  • creator / contact: placeholder values from config or constants or similar
  • publication date: current date
  • rights: placeholder string
  • packageId: deterministic simple identifier if needed
  • see https://nowdatabase.org/ for some ideas

Code design requirements

  • Implement this cleanly and minimally.
  • Prefer a dedicated export service/module rather than spreading logic across controllers.
  • Separate:
    1. data fetching
    2. DwC row mapping
    3. CSV writing
    4. XML metadata generation
    5. ZIP packaging
  • Add clear TODO comments for:
    • synonym export
    • additional traits/measurements
    • richer EML generation
    • broader access than admin-only
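
For the CSV writing step in the separation above, a dependency-free sketch could be as small as the following. The function names escapeCsvField and toCsv are hypothetical; the quoting follows the usual CSV convention of wrapping fields that contain commas, quotes, or line breaks and doubling embedded quotes. Data fetching and ZIP packaging are left out because they depend on the repo's existing Prisma access and on whichever ZIP library is chosen.

```typescript
// Quote a field only when it contains a comma, a double quote, or a line break.
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialise rows to CSV text with a header line, using the given column order.
function toCsv(
  columns: readonly string[],
  rows: ReadonlyArray<Record<string, string>>,
): string {
  const header = columns.join(",");
  const body = rows.map((row) =>
    columns.map((column) => escapeCsvField(row[column] ?? "")).join(","),
  );
  // "\n" line endings; meta.xml must declare the same terminator.
  return [header].concat(body).join("\n") + "\n";
}
```

The result is a plain string; it still needs to be encoded as UTF-8 when the bytes are written into the archive, for example with Buffer.from(csv, "utf8") in Node.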

Admin-only access requirement

  • The new export route/action must be restricted to admin users only.
  • Reuse existing auth/authorization patterns already present in the project.
  • Do not invent a parallel auth system.
  • If there is an existing admin role/group check, use that.

Output behavior

  • The admin action should generate a ZIP archive containing:
    • taxon.csv
    • measurementorfact.csv
    • meta.xml
    • eml.xml or the temporary metadata XML file
  • Use stable filenames.
  • The archive filename can include the date, for example:
    • now_dwc_test_export_YYYYMMDD.zip
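
A small helper for the dated archive name might look like this. The function name dwcArchiveFileName is hypothetical, and using the UTC date rather than server-local time is an assumption.

```typescript
// Build now_dwc_test_export_YYYYMMDD.zip from a date (UTC), defaulting to now.
function dwcArchiveFileName(date: Date = new Date()): string {
  // toISOString() yields e.g. "2024-03-09T00:00:00.000Z"; keep the date part.
  const yyyymmdd = date.toISOString().slice(0, 10).replace(/-/g, "");
  return `now_dwc_test_export_${yyyymmdd}.zip`;
}
```

Passing the date in explicitly keeps the name deterministic in tests.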

Testing requirements
Add lightweight tests for:

  • taxon row mapping from a sample com_species record
  • measurement row generation for a sample com_species record
  • omission of null/empty measurement fields
  • admin-only access protection
  • ZIP contains expected files

Implementation guidance

  • First inspect the current backend structure and identify:
    • existing routing
    • auth/role checks
    • file download patterns
    • Prisma access patterns
  • Then implement the feature in the style already used by the repo.
  • Do not refactor unrelated code.
  • Keep the patch focused.

Deliverables

  1. Backend implementation
  2. Any route/controller/service additions needed
  3. Minimal tests
  4. Short developer note in comments or a small markdown file explaining:
    • what was implemented
    • which fields are included in v1
    • where to extend mappings later

Before coding, briefly summarize:

  • where you plan to place the export code
  • how admin-only access will be enforced
  • how the ZIP and files will be generated

Then implement the change.
