Feat/improve datacite workflow #482

Open
JessyBarrette wants to merge 15 commits into development from feat/improve-datacite-workflow

@JessyBarrette

Member

Datacite DOI creation workflow revision

This PR attempts to fix the issues raised by @gannej in this document:
https://docs.google.com/spreadsheets/d/1NH7dltkaiwV1rqhb_e7hUxc_4XFsdTAp8MXia0lz6NM/edit?gid=994908484#gid=994908484

metadata-entry-form

DOI generation uses record UID as suffix — The generated DOI now includes the first 8 characters of record.recordID as the DataCite DOI suffix, instead of relying on DataCite's random auto-generation. This makes DOIs deterministic and traceable back to the record.
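
A minimal sketch of the suffix rule described above, written in Python for illustration only. The function name, the placeholder prefix `10.12345`, and the lower-casing step are assumptions, not the code in this PR.

```python
def doi_from_record_id(prefix: str, record_id: str, length: int = 8) -> str:
    """Build a DOI whose suffix is the first `length` characters of the record ID.

    Hypothetical helper: lower-casing is an assumption, made because DataCite
    treats DOI suffixes as case-insensitive.
    """
    suffix = record_id[:length].lower()
    if len(suffix) < length:
        raise ValueError(f"record ID is shorter than {length} characters")
    return f"{prefix}/{suffix}"


# "10.12345" is a placeholder prefix, not a real DataCite prefix.
print(doi_from_record_id("10.12345", "A1B2C3D4-E5F6-4789-ABCD-0123456789AB"))
```

Because the suffix is derived from the record ID, generating the DOI twice for the same record yields the same string.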

Catalogue URL uses record's primary language — The DOI catalogue URL now uses record.language (the form's primary language setting) rather than the browser's current UI language, ensuring the DOI always links to the correct language page.

Language validation on DOI creation — If no primary language is assigned to the record, DOI operations now throw a clear user-facing error: "Please assign a primary language to the record before creating a DOI."
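
The language rule and its validation could look roughly like the Python sketch below. The URL layout (`<base>/<language>/dataset/<recordID>`) and the function name are hypothetical; only the `record.language` field and the error message come from the description above.

```python
LANGUAGE_ERROR = "Please assign a primary language to the record before creating a DOI."


def catalogue_url(record: dict, base_url: str) -> str:
    """Return the catalogue landing page URL in the record's primary language."""
    language = record.get("language")
    if not language:
        # No primary language set: refuse to build a DOI target URL.
        raise ValueError(LANGUAGE_ERROR)
    # Hypothetical URL layout, for illustration only.
    return f"{base_url.rstrip('/')}/{language}/dataset/{record['recordID']}"
```

The point of the design is that the URL depends only on the record, so two users with different UI languages produce the same DOI target.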

Region default publisher with ROR — When no contact has the publisher role, the region's organization name is used as the default publisher in the DataCite record. Each region now has a ror field in regions.js (pacific, stlaurent, amundsen have ROR IDs; atlantic, canwin, test are pending), and the ROR identifier is included in the DataCite publisher metadata.
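
As an illustration of the fallback, here is a hedged Python sketch. The `REGIONS` table, the contact fields (`role`, `orgName`), and the ROR value are placeholders I made up; the output keys follow the DataCite publisher object (`publisherIdentifier`, `publisherIdentifierScheme`, `schemeUri`).

```python
# Placeholder region table; the real data lives in regions.js.
REGIONS = {
    "pacific": {"name": "Example Pacific Organization", "ror": "https://ror.org/000000000"},
    "atlantic": {"name": "Example Atlantic Organization", "ror": None},  # ROR pending
}


def datacite_publisher(contacts: list, region: str) -> dict:
    """Use the contact with the publisher role, else the region default."""
    for contact in contacts:
        if "publisher" in contact.get("role", []):
            return {"name": contact["orgName"]}
    info = REGIONS[region]
    publisher = {"name": info["name"]}
    if info.get("ror"):
        # Only attach identifier fields when the region has a ROR ID.
        publisher["publisherIdentifier"] = info["ror"]
        publisher["publisherIdentifierScheme"] = "ROR"
        publisher["schemeUri"] = "https://ror.org/"
    return publisher
```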

Publisher visibility in the form — The Contact tab shows an info alert when no contact has the publisher role, indicating the region default will be used. The DOI section shows which publisher is active (contact name or region default).

cioos-metadata-conversion

Fixed resourceType mapping: resourceType (biological, oceanographic, other) was lost during the Firebase→CIOOS conversion and hardcoded to "" in the DataCite output. It is now correctly passed through and populated. resourceTypeGeneral was reading from the wrong path (record["metadataScope"] instead of record["metadata"]["scope"]); this is now fixed.

Fixed multilingual title handling — All titles were incorrectly marked as TranslatedTitle. Now the default language title has no titleType (primary title per DataCite schema), and only other languages get TranslatedTitle.
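
A small Python sketch of the title rule, assuming titles arrive as a language-to-string mapping. The function name and input shape are illustrative and may not match the converter's internals.

```python
def datacite_titles(titles: dict, default_language: str) -> list:
    """Primary title gets no titleType; other languages get TranslatedTitle."""
    result = []
    for lang, title in titles.items():
        if not title:
            continue  # skip empty translations
        entry = {"title": title, "lang": lang}
        if lang != default_language:
            entry["titleType"] = "TranslatedTitle"
        result.append(entry)
    # Stable sort: entries without titleType (the primary title) come first.
    result.sort(key=lambda entry: "titleType" in entry)
    return result
```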

Fixed abstract/description — Only the default language abstract is now included in the DataCite output; translated abstracts are dropped.

Fixed subjects/keywords: _get_eov_subjects and _get_keyword_subjects were reading from record["metadata"] (a nonexistent path); they now correctly read from record["identification"]["keywords"], with both EN and FR keywords included.
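
To make the corrected path concrete, here is a hedged Python sketch. It assumes record["identification"]["keywords"] maps a keyword group to a per-language list of terms; that shape and the function name are assumptions, not a copy of _get_eov_subjects or _get_keyword_subjects.

```python
def keyword_subjects(record: dict) -> list:
    """Collect DataCite subjects from every keyword group and language."""
    keywords = record.get("identification", {}).get("keywords", {})
    subjects = []
    for by_language in keywords.values():
        for lang, terms in by_language.items():
            for term in terms:
                subjects.append({"subject": term, "lang": lang})
    return subjects
```

A record with no identification block simply yields an empty list rather than raising.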

Enhanced geoLocationPlace — Now includes vertical extent info (depth/height range in meters) alongside the location description, using the record's default language.
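
One possible shape for the combined string, sketched in Python. The exact wording of the output, the argument layout, and the function name are assumptions; the converter may format the extent differently.

```python
from typing import Optional, Tuple


def geo_location_place(description: dict, vertical: Optional[Tuple[float, float]], language: str) -> str:
    """Join the default-language place description with the vertical extent in meters."""
    place = description.get(language, "")
    if vertical is None:
        return place
    low, high = vertical
    extent = f"vertical extent: {low} m to {high} m"
    return f"{place} ({extent})" if place else extent
```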

Test plan

  • Verify DOI generation creates a DOI with the first 8 chars of the record UID as suffix
  • Verify a French-language record produces a catalogue URL using the French catalogue page
  • Verify generating a DOI without a primary language shows an error in the form
  • Verify the DOI section displays the correct publisher (contact or region default with label)
  • Verify the Contact tab shows the default publisher notice when no publisher role is assigned
  • Verify the DataCite record includes ROR identifiers for regions that have them
  • Verify resourceType and resourceTypeGeneral are correctly populated in DataCite output
  • Verify the primary title has no titleType and the translated title has TranslatedTitle
  • Verify keywords and EOVs appear as subjects in both languages
  • Verify vertical extent appears in geoLocationPlace

JessyBarrette requested a review from gannej on April 6, 2026 19:03
JessyBarrette changed the base branch from main to development on April 6, 2026 19:04
@github-actions

github-actions Bot commented Apr 6, 2026

Visit the preview URL for this PR (updated for commit 27ea052):

https://cioos-metadata-form-dev-258dc--pr482-feat-improve-data-1991skuv.web.app

(expires Fri, 03 Jul 2026 18:09:26 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: c9b6275cb4b6311b719349f5e25e457b5691d09c

@JessyBarrette

Member Author

@gannej I think I fixed most of the issues brought up during the revision. The only thing I wasn't too sure about is how to map the UID to the new DOI. For now it grabs the first 8 characters of the UID and generates the DOI suffix from them.

@gannej

gannej commented Apr 7, 2026

Contributor

> @gannej I think I fixed most of the issues brought up during the revision. The only thing I wasn't too sure about is how to map the UID to the new DOI. For now it grabs the first 8 characters of the UID and generates the DOI suffix from them.

Thanks! Would it be possible to use the first 16 characters instead of the first 8 (this is what we use at the moment)?

@JessyBarrette

Member Author

> Thanks! Would it be possible to use the first 16 characters instead of the first 8 (this is what we use at the moment)?

We can certainly do that! My only concern is that the resulting format goes a bit against the DataCite recommendations at https://support.datacite.org/docs/doi-basics, quoted here:

The easiest and recommended option is to use a randomly generated suffix. The auto-generated DOI strings use a-z and 0-9. They avoid i, l, o as they are easily mixed up with 0, 1. We group the suffix into blocks of 4, separated by a hyphen. You can generate a random suffix in both Fabrica and the API and your DOI will look something like this: 10.5438/9te8-5h68.

If you choose not to use this option, remember:

- The DOI suffix must be unique within each prefix. The optimum length of a DOI suffix is 6–10 characters.
- Only use a-z, 0-9 and - in a DOI suffix. Other characters might have special meaning or will be escaped. DOI suffixes are [not case sensitive](https://support.datacite.org/docs/datacite-doi-display-guidelines#dois-urls-and-case-sensitivity).
- Avoid human-readable information in a DOI suffix because any meaning may change over time.
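
To see how a given suffix fares against the quoted guidance, here is a rough Python check. It only covers the character set and the 6 to 10 character length hint from the text above; the function name and warning strings are made up for illustration.

```python
import re

# Characters DataCite recommends for a suffix: a-z, 0-9 and "-".
SUFFIX_PATTERN = re.compile(r"[a-z0-9-]+")


def check_suffix(suffix: str) -> list:
    """Return warnings for a DOI suffix; an empty list means no concerns."""
    warnings = []
    # Suffixes are not case sensitive, so compare in lower case.
    if not SUFFIX_PATTERN.fullmatch(suffix.lower()):
        warnings.append("suffix uses characters other than a-z, 0-9 and -")
    if not 6 <= len(suffix) <= 10:
        warnings.append("suffix length is outside the recommended 6-10 characters")
    return warnings


print(check_suffix("a1b2c3d4"))          # 8 characters from a record ID
print(check_suffix("a1b2c3d4-e5f6-47"))  # 16 characters from a record ID
```

The 8-character form passes both checks, while the 16-character form only trips the length hint, which is a recommendation rather than a hard limit.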

@JessyBarrette

Member Author

One thing I just realized is that the DOI remains in draft on DataCite even when the record is set as published within the metadata form.

Should we change the status of the DOI when a record is published? Or should we leave it to the region manager to go to the DataCite interface and change the status of the record?

@gannej Let me know your thoughts.

@gannej

gannej commented May 28, 2026

Contributor

> One thing I just realized is that the DOI remains in draft on DataCite even when the record is set as published within the metadata form.
>
> Should we change the status of the DOI when a record is published? Or should we leave it to the region manager to go to the DataCite interface and change the status of the record?
>
> @gannej Let me know your thoughts.

After consulting the data team, we would rather keep the process as is and go to DataCite to change the DOI status (draft to public). This will be easier for embargoed data, where the DOI should not be public yet.

@JessyBarrette

Member Author

OK, I think I will create an option in the admin to define what to do on publish, since not everybody has the same needs. We can also add a check to see whether the DOI is active or not.

@gannej

gannej commented May 29, 2026

Contributor

> OK, I think I will create an option in the admin to define what to do on publish, since not everybody has the same needs. We can also add a check to see whether the DOI is active or not.

Perfect! Thanks!

- Added DataciteStatusDialog component for managing DOI state transitions during publish/unpublish actions.
- Integrated DOI state management into DOIInput component, allowing users to register, publish, or demote DOIs.
- Updated Admin component to include DOI status management settings.
- Enhanced RecordActions component to handle DOI state transitions with the new dialog.
- Updated UserProvider to include new functions for DOI management (publishDoi, registerDoi, hideDoi).
- Modified package-lock.json to resolve merge conflicts and ensure proper dependency management.
@JessyBarrette

Member Author

@gannej

OK, I think I finally took the time to iron out the DOI management. Here's the general workflow, depending on whether or not the form manages the DOI status:

DOI Workflow

DOIs are reserved via DataCite directly from the metadata form and follow DataCite's three-state lifecycle: Draft → Registered → Findable. Each region chooses who drives the status transitions through an admin setting, doiStatusManagement, with two modes:

Common to both modes

  • Generate — Reserve a draft DOI with DataCite (suffix auto-generated, from the record identifier, or manual). No metadata is sent yet.
  • Update metadata — Once the record is submitted or published, the full metadata is pushed to DataCite (automatically after generation, and on demand).
  • Delete — Only draft DOIs can be deleted; registered/findable DOIs are permanent.

The live DataCite record and catalogue landing page links are always viewable.

form — "Managed from this form"

The DOI's lifecycle state is driven from within the app:

  • The DOI status dropdown is editable in the form, transitioning Draft → Registered → Findable (or demoting Findable → Registered), each with a confirmation dialog explaining the consequences. Editing requires the dataset to be submitted/published.
  • Publishing/un-publishing a record from the record list prompts a DataCite status dialog, so the DOI state stays in sync with the record's publication state.
  • The DOI status column is shown in the record table.

datacite — "Managed via DataCite portal" (default)

The form does not drive the DOI's lifecycle state — that's handled externally on the DataCite portal:

  • The status field is read-only; it only reflects the current state DataCite reports (fetched on load), and the form never transitions it.
  • Publishing/un-publishing a record performs the normal action without a DataCite status dialog.
  • The DOI status column is hidden in the record table, since the value isn't authoritative/managed here.

In short: form mode makes the entry form the source of truth for DOI state and exposes the full set of transition controls; datacite mode treats the form as create-and-push-metadata only, leaving state changes to the DataCite portal.
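
The rules above can be summarized as a small state machine. This Python sketch is only an illustration: the function names and error messages are hypothetical, and the mapping of transitions to the DataCite events register, publish and hide is my assumption about what registerDoi, publishDoi and hideDoi send.

```python
# Transitions described for form mode, with the assumed DataCite event for each.
TRANSITIONS = {
    ("draft", "registered"): "register",
    ("registered", "findable"): "publish",
    ("findable", "registered"): "hide",
}


def transition_event(mode: str, record_status: str, current: str, target: str) -> str:
    """Return the event for a DOI state change, or raise if it is not allowed."""
    if mode != "form":
        # datacite mode: state changes happen on the DataCite portal.
        raise PermissionError("DOI state is managed on the DataCite portal")
    if record_status not in {"submitted", "published"}:
        raise ValueError("the record must be submitted or published first")
    try:
        return TRANSITIONS[(current, target)]
    except KeyError:
        raise ValueError(f"transition {current} -> {target} is not allowed") from None


def can_delete(state: str) -> bool:
    """Only draft DOIs can be deleted; registered/findable DOIs are permanent."""
    return state == "draft"
```

Keeping the allowed transitions in one table makes it easy for the status dropdown and the publish/un-publish dialog to share the same rules.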

@JessyBarrette

Member Author

I think the only thing left for me right now is to prevent anyone but admins and reviewers from generating a DOI or modifying its status.

Not sure what the thoughts are at SLGO, @gannej.

Labels

enhancement New feature or request

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

Review Datacite JSON/Schema Output for CIOOS Metadata Mapping

2 participants