diff --git a/docs/source/_static/doi_format.png b/docs/source/_static/doi_format.png
new file mode 100644
index 0000000..6fc4aea
Binary files /dev/null and b/docs/source/_static/doi_format.png differ
diff --git a/docs/source/_static/repository_obligations_table.png b/docs/source/_static/repository_obligations_table.png
new file mode 100644
index 0000000..76efa98
Binary files /dev/null and b/docs/source/_static/repository_obligations_table.png differ
diff --git a/docs/source/data_management/citing_and_publishing_datasets.md b/docs/source/data_management/citing_and_publishing_datasets.md
new file mode 100644
index 0000000..1c58ed0
--- /dev/null
+++ b/docs/source/data_management/citing_and_publishing_datasets.md
@@ -0,0 +1,69 @@
+# Publishing and Citing Datasets
+
+Guidelines for making datasets publicly available, creating DOIs, and properly citing datasets.
+
+## Purpose
+
+This guideline supports Data Systems workflows by ensuring datasets referenced in publications are
+openly accessible, properly identified with DOIs, and cited according to community standards. It
+aligns with publisher and funder policies and promotes scientific reproducibility.
+
+## How to Publish and Cite Datasets
+
+**Make Data Open and Accessible**
+  Ensure datasets associated with publications are stored in publicly accessible, machine-readable formats.
+
+**Create a DOI and Landing Page**
+  Digital Object Identifiers (DOIs) are machine-readable identifiers that resolve to information about a resource.
+  In addition to DOIs for datasets, researchers can have an ORCID digital identifier; see https://orcid.org/. Publishers
+  now generally require DOIs that point to data referenced in publications, and often ORCIDs as well.
+  - See [Digital Object Identifiers](digital_object_identifiers.md) for an introduction to DOIs.
+  - See [Creating a DOI via CU Libraries and DataCite](creating_a_doi.md) for a quick start on creating a DOI and a
+    landing page for a dataset.
+  - LASP could build resources to create and manage DOIs and associated landing pages.
+
+**Cite Datasets in Publications**
+  Follow established data citation principles to ensure datasets are properly cited in scholarly works. Reference
+  the [Force 11 Joint Declaration of Data Citation Principles](https://www.force11.org/datacitationprinciples) and
+  follow practices described in the [ESIP Dataset Citation Guidelines](https://doi.org/10.6084/m9.figshare.8441816).
+
+**Understand Publisher Requirements**
+  Ensure that DOIs and ORCIDs are included as required by publishers to maintain compliance with submission guidelines.
+
+## Options
+
+There are several options for publishing datasets:
+
+1. **CU Libraries and DataCite DOI Creation**
+   Researchers can create DOIs and landing pages for datasets using CU Libraries' integration with DataCite.
+
+2. **CU Scholar Hosting**
+   CU Scholar can host articles, reports, and datasets of limited size. CU Scholar prefers to generate and manage DOIs
+   for hosted datasets.
+
+3. **LASP DOI Management (Future Direction)**
+   LASP can develop internal resources for creating and managing DOIs and dataset landing pages, streamlining the
+   process for LASP-affiliated data products.
+
+4. **External Repositories**
+   For larger datasets or specialized data types, external repositories that support DOI assignment can be considered.
+
+## Useful Links
+
+- [Creating a DOI via CU Libraries and DataCite](creating_a_doi.md)
+- [CU Scholar](https://scholar.colorado.edu/about)
+- [Force 11 Joint Declaration of Data Citation Principles](https://www.force11.org/datacitationprinciples)
+- [ESIP Dataset Citation Guidelines](https://doi.org/10.6084/m9.figshare.8441816)
+- [Zenodo DOI Citation Guide](https://doi.org/10.5281/zenodo.1451971)
+- [Data Citation Roadmap (Scholarly Repositories)](https://www.biorxiv.org/content/biorxiv/early/2017/10/09/097196.full.pdf)
+- [ORCID](https://orcid.org/)
+- [DOIs for SORCE Data Products](https://confluence.lasp.colorado.edu/pages/viewpage.action?pageId=21464459)
+  (Confluence)
+
+## Acronyms
+
+- **DOI** = Digital Object Identifier
+- **ORCID** = Open Researcher and Contributor ID
+- **ESIP** = Earth Science Information Partners
+
+Credit: Content taken from a Confluence guide written by Anne Wilson and Shawn Polson.
diff --git a/docs/source/data_management/creating_a_doi.md b/docs/source/data_management/creating_a_doi.md
new file mode 100644
index 0000000..3adc6b8
--- /dev/null
+++ b/docs/source/data_management/creating_a_doi.md
@@ -0,0 +1,89 @@
+# Creating a DOI via CU Libraries and DataCite
+
+As of 2018, CU Libraries is a member of DataCite. Through this membership, LASP can mint and
+register DOIs for datasets housed in our repositories, enabling data to be persistently identified,
+accessed, and cited.
+
+Guidelines for assigning Digital Object Identifiers (DOIs) to datasets using this
+membership, including steps for request, metadata requirements, and long-term
+responsibilities are outlined here.
+
+## Purpose
+
+This guideline supports LASP’s data publishing workflows by enabling the creation and registration
+of persistent identifiers (DOIs) for datasets using CU resources. These identifiers help ensure
+long-term access, discoverability, and proper citation of data.
+
+## How to create a DOI
+
+Dataset DOIs should resolve to a dataset landing page providing information about the dataset, such as where it can be
+accessed. CU Libraries automatically generates a generic landing page populated with the high-level dataset metadata
+provided in a DOI request form. CU Scholar, another CU resource, instead requires data providers to have a reference to
+such a landing page when creating a DOI.
+
+Note that the number of DOIs allocated to LASP is limited.
+
+1. **Submit a Request**
+   - Researchers: File a Jira issue with type "DOI" in the [Data Management Jira project](https://jira.lasp.colorado.edu/projects/DATAMAN/).
+
+2. **Prepare Required Metadata**
+   - Work with the Data Management team to ensure that proper metadata and a landing page are available.
+   - Minimum required metadata for DOI creation:
+     - URL of the landing page (not the dataset itself)
+     - Creators (list of names)
+     - Title
+     - Publisher (typically LASP or a project within LASP)
+     - ResourceType (usually `dataset`)
+   - DataCite supports additional metadata. Those properties are described here: https://support.datacite.org/docs/metadata-quality.
+
+3. **Create DOI via DataCite**
+   - Data Management team logs in to [doi.datacite.org](https://doi.datacite.org/) using the `CUB.LASP` repository ID.
+   - Click "DOIs" → "Create DOI (Form)"
+   - Use the form to enter metadata. See full field descriptions at: [DataCite Field Descriptions](https://support.datacite.org/docs/field-descriptions-for-form)
+   - For developers: There is an [API](https://support.datacite.org/docs/api) that accepts the full metadata schema.
+
+4. **Maintain DOI Metadata**
+   - Keep DOI metadata up to date in the [DataCite Metadata Store](https://support.datacite.org/docs)
+   - If a dataset is moved, update the registry.
+   - If a dataset is removed, maintain a “tombstone” landing page.
+
+5. **Follow DOI Best Practices**
+   - Use landing pages (not direct links to datasets).
+   - Maintain metadata quality and completeness as information changes.
+   - See [Metadata Guidelines](metadata.md) for dataset metadata requirements.
+
+6. **Adhere to Roles and Responsibilities**
+
+   LASP (as a DataCite Client) must meet responsibilities outlined in:
+   - [DataCite Community Responsibility](https://support.datacite.org/docs/community-responsibility)
+   - [Data Citation Roadmap for Scholarly Data Repositories](https://www.biorxiv.org/content/biorxiv/early/2017/10/09/097196.full.pdf)
+
+![DataCite_Repository_Guidelines](../_static/repository_obligations_table.png)
+
+## Getting Help
+
+Please use the [DATAMAN](https://jira.lasp.colorado.edu/secure/RapidBoard.jspa?rapidView=1430) project on MODS-Jira to
+submit a ticket, and someone from the Data Management Working Group will respond to it.
+
+## Useful Links
+
+- [Intro to Digital Object Identifiers](digital_object_identifiers.md)
+- [DataCite](https://doi.datacite.org/)
+- [Field Descriptions for DOI Form](https://support.datacite.org/docs/field-descriptions-for-form)
+- [DataCite Metadata Quality](https://support.datacite.org/docs/metadata-quality)
+- [DataCite Community Responsibility](https://support.datacite.org/docs/community-responsibility)
+- [Data Citation Roadmap (Scholarly Repositories)](https://www.biorxiv.org/content/biorxiv/early/2017/10/09/097196.full.pdf)
+- [Intro to DataCite REST API](https://support.datacite.org/docs/api)
+- [Metadata Requirements](metadata.md)
+- [NASA EOSDIS DOI Guidelines](https://wiki.earthdata.nasa.gov/display/DOIsforEOSDIS)
+- [CU Scholar](https://scholar.colorado.edu/about)
+- [Creating a DOI for Software](../workflows/open_source/citing_software.md)
+
+## Acronyms
+
+- **DOI** = Digital Object Identifier
+- **NASA** = National Aeronautics and Space Administration
+- **EOSDIS** = Earth Observing System Data and Information System
+- **API** = Application Programming Interface
+
+Credit: Content taken from a Confluence guide written by Anne Wilson and updated by Doug Lindholm.
diff --git a/docs/source/data_management/digital_object_identifiers.md b/docs/source/data_management/digital_object_identifiers.md
new file mode 100644
index 0000000..5d4c928
--- /dev/null
+++ b/docs/source/data_management/digital_object_identifiers.md
@@ -0,0 +1,137 @@
+# Digital Object Identifiers
+
+A Digital Object Identifier (DOI) is a code used to uniquely
+identify content of various types. DOIs enable easy online
+access to research data for discovery, attribution, and reuse,
+and enable accurate data citation and other metrics. DOIs are
+persistent identifiers, and as such carry expectations of
+curation, persistent access, and rich metadata.
+
+There is a system and practices associated with DOI usage,
+for "persistent and actionable identification and interoperable
+exchange of managed information on digital networks"
+(https://support.datacite.org/docs/doi-basics).
+
+DOIs are intended to be "resolvable," usually to information
+about the object to which the DOI refers, including information
+about where the object can be found. For a dataset, that would
+be a dataset landing page providing information about the
+dataset, such as where it can be accessed. The DOI should not
+point to the dataset itself. The DOI remains fixed over the
+lifetime of the object, whereas its location and metadata may
+change. When the location changes, the publisher of the
+object is responsible for updating the metadata for the DOI
+to point to the new location.
+
+The developer and administrator of the DOI system is the
+International DOI Foundation (IDF), which introduced DOIs
+in 2000. Organizations that meet the contractual obligations
+of the DOI system and that are willing to pay to become a
+member (such as DataCite, see below) can assign DOIs.
+
+The DOI system is implemented through a federation of
+registration agencies coordinated by the IDF.
+See https://www.doi.org/, and particularly
+https://www.doi.org/hb.html, the DOI Handbook, for details.
+
+## Purpose of DOIs
+
+Funding agencies and publishers increasingly recognize that
+datasets and scientific software are valuable research outputs
+that should be openly available, identifiable, and citable, often
+through DOIs.
+
+At LASP, digital objects worthy of identification include
+datasets and associated outputs (e.g., documentation, papers,
+workflows, algorithms, software, etc.).
+
+## DOI registries
+
+To enable accessibility, a DOI needs to reside in a registry
+where it can be resolved. The registry collects and provides
+high-level information, assigns DOIs, and links to references.
+
+[DataCite](https://datacite.org/) is a not-for-profit, global
+initiative to "help the research community locate, identify,
+and cite research data with confidence," through DOI minting
+and registration. It is the leading global provider of DOIs
+for datasets. From their website:
+
+> By working closely with data centres to assign DOIs to
+> datasets and other research objects, we are developing a
+> robust infrastructure that supports simple and effective
+> methods of data citation, discovery, and access. Citable
+> data become legitimate contributions to scholarly
+> communication, paving the way for new metrics and
+> publication models that recognize and reward data sharing.
+
+CU Libraries is now a member of DataCite. Through this
+membership, LASP can mint and register DOIs for datasets
+housed in our repositories, enabling data to be persistently
+identified, accessed, and cited.
+
+[Crossref](https://www.crossref.org/) is another registry that
+is often mentioned in Earth and space science contexts. It's
+a not-for-profit association of ~2000 voting member publishers
+who represent 4300 societies and publishers. It exists to
+facilitate the links between distributed content hosted at
+other sites, and uses DOIs to do so.
+
+[Zenodo](https://zenodo.org/) is a free repository developed
+under the European OpenAIRE program and operated by CERN. It is a general-purpose
+repository that allows researchers to deposit datasets,
+research software, reports, and any other research-related
+digital artifacts. Zenodo assigns DOIs to the deposited
+content, making it citable and discoverable.
+See [citing software](../workflows/open_source/citing_software.md)
+for more on using Zenodo to cite software.
+
+[ORCIDs](https://orcid.org/) are like DOIs but provide
+persistent digital identifiers for people.
+
+## DOI Format
+
+When a LASP researcher needs a DOI, they will provide some information and receive a DOI back.
+They will never actually create a DOI. Nevertheless, it is worth understanding the form of a DOI
+and the goals behind its format.
+
+DataCite goals for DOIs include enabling robots and crawlers to recognize DataCite DOIs as URLs,
+making them easy to cut and paste, and helping users recognize that DOIs are both a persistent link
+and a persistent identifier.
+
+This is a DOI:
+
+https://doi.org/10.5281/ZENODO.31780
+A DOI name consists of three parts:
+
+![DOI_Format](../_static/doi_format.png)
+
+The proxy is an HTTP URL. DataCite recommends that all DOIs be displayed as permanent URLs.
+(Using the old DOI protocol, e.g. doi:/10.5281/ZENODO.31780 is NOT recommended.)
+
+A DOI prefix always starts with "10." and continues with a number. This number
+defines a globally unique namespace. (The scope of "global" depends on the organization
+managing multiple repositories.) Prefixes should not have semantic meaning. Adding
+meaning to the identifier is risky because "despite best intentions, all names can
+change over time" ([DataCite DOI Basics](https://support.datacite.org/docs/doi-basics)).
+
+The suffix for a DOI can be almost any string. Here is where information provided in an
+input form may be integrated into the DOI.
+
+Note that DOI names are not case-sensitive, while URLs are case-sensitive:
+https://support.datacite.org/docs/datacite-doi-display-guidelines.
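The three-part structure described in the DOI Format section can be sketched in a few lines of Python. This is only an illustration, not part of DataCite's or LASP's tooling: `DoiParts`, `parse_doi`, and `same_doi` are hypothetical names, and the pattern assumes nothing beyond what the text states (an HTTP proxy at `doi.org`, a prefix of "10." followed by a number, and an almost arbitrary suffix).

```python
import re
from typing import NamedTuple


class DoiParts(NamedTuple):
    """Hypothetical container for the three parts of a DOI URL."""
    proxy: str
    prefix: str
    suffix: str


# Assumption: the prefix is "10." followed by digits, as described in the text.
_DOI_URL = re.compile(r"^(https?://doi\.org)/(10\.\d+)/(\S+)$")


def parse_doi(url: str) -> DoiParts:
    """Split a DOI URL into proxy, prefix, and suffix."""
    match = _DOI_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a recognizable DOI URL: {url!r}")
    return DoiParts(*match.groups())


def same_doi(a: str, b: str) -> bool:
    """Compare two DOI URLs, ignoring case in the DOI name (prefix/suffix)."""
    pa, pb = parse_doi(a), parse_doi(b)
    return (pa.prefix, pa.suffix.lower()) == (pb.prefix, pb.suffix.lower())


parts = parse_doi("https://doi.org/10.5281/ZENODO.31780")
print(parts.proxy)   # https://doi.org
print(parts.prefix)  # 10.5281
print(parts.suffix)  # ZENODO.31780
```

Because DOI names are not case-sensitive, `same_doi` lowercases the suffix before comparing, so `ZENODO.31780` and `zenodo.31780` are treated as the same identifier.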
+
+## Useful Links
+
+- [DataCite: DOI Basics](https://support.datacite.org/docs/doi-basics)
+- [DOI Foundation: DOI Handbook](https://www.doi.org/the-identifier/resources/handbook/)
+- [DataCite: DOI Display Guidelines](https://support.datacite.org/docs/datacite-doi-display-guidelines)
+- [Creating a DOI via CU Libraries and DataCite](creating_a_doi.md)
+
+## Acronyms
+
+- **DOI** = Digital Object Identifier
+- **IDF** = International DOI Foundation
+- **ORCID** = Open Researcher and Contributor ID
+
+Credit: Content taken from a Confluence guide written by Anne Wilson and Shawn Polson.
\ No newline at end of file
diff --git a/docs/source/data_management/index.rst b/docs/source/data_management/index.rst
index c5ada3e..ed29b55 100644
--- a/docs/source/data_management/index.rst
+++ b/docs/source/data_management/index.rst
@@ -8,4 +8,7 @@ Data Management
    file_formats/index
    metadata.md
    fair_principles.md
-   data_stewardship.md
\ No newline at end of file
+   data_stewardship.md
+   citing_and_publishing_datasets.md
+   digital_object_identifiers.md
+   creating_a_doi.md
\ No newline at end of file
diff --git a/pyproject.toml b/pyproject.toml
index bae4313..aeb7ae9 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -36,4 +36,4 @@ homepage = "https://github.com/lasp/"
 repository = "https://github.com/lasp/developer-guide"
 
 [tool.codespell]
-ignore-words-list = "nd"
\ No newline at end of file
+ignore-words-list = "nd, SORCE"
\ No newline at end of file