DOI Request Handler (Email → DataCite XML)

A small command-line utility that parses .eml messages generated by a DOI request form, reads their HTML body, and produces a DataCite Kernel-4 metadata XML file suitable for import into DataCite Fabrica.

Requirements

Python 3.8+
Dependencies: lxml, requests (see requirements.txt)

Quick start

Create a virtualenv and install dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run the converter:

python src/main.py path/to/request.eml

This writes path/to/request.xml (same directory as the input .eml).
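
The output location can be derived from the input path alone. A minimal sketch of that rule, assuming a hypothetical helper named output_path_for (the actual function in src/main.py may be named and structured differently):

```python
from pathlib import Path


def output_path_for(eml_path: str) -> Path:
    """Return the sibling .xml path for a given .eml path (hypothetical helper)."""
    # with_suffix swaps only the final extension and keeps the directory.
    return Path(eml_path).with_suffix(".xml")


print(output_path_for("path/to/request.eml"))
```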

What gets extracted

The script applies simple regex patterns to the email’s HTML body to extract:

Requester name and email
URL
Creator(s)
Title
Publisher
Publication year
Resource type
Description/abstract

It prints a readable “Extracted data” table to the terminal and then generates DataCite XML.
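
The extraction step can be sketched roughly as follows. This is an illustration only: it assumes the form renders each field as `<b>Label:</b> value<br>`, and the helper names html_body and extract_field are hypothetical. The real patterns in src/main.py depend on the actual form markup.

```python
import re
from email import message_from_string, policy
from html import unescape
from typing import Optional


def html_body(raw_eml: str) -> str:
    """Return the HTML body of an .eml message, or "" if there is none."""
    msg = message_from_string(raw_eml, policy=policy.default)
    part = msg.get_body(preferencelist=("html",))
    return part.get_content() if part is not None else ""


def extract_field(html: str, label: str) -> Optional[str]:
    """Pull the value after '<b>Label:</b>' up to the next <br> or </p>.

    The markup shape is an assumption about the form, not a known fact.
    """
    pattern = re.compile(
        rf"<b>\s*{re.escape(label)}\s*:?\s*</b>\s*(.*?)\s*(?:<br\s*/?>|</p>)",
        re.IGNORECASE | re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        return None
    # Drop any nested tags, then decode entities such as &amp;.
    text = re.sub(r"<[^>]+>", "", match.group(1))
    return unescape(text).strip()


SAMPLE_EML = (
    "From: form@example.org\n"
    "Subject: DOI request\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p><b>Title:</b> Ocean &amp; Ice Data<br>"
    "<b>Publication year:</b> 2024<br></p>\n"
)

body = html_body(SAMPLE_EML)
for field in ("Title", "Publication year", "Publisher"):
    print(f"{field:<18} {extract_field(body, field)}")
```

Regex matching of this kind is fragile if the form's HTML changes, so a missing field is returned as None rather than raising.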

DataCite output details

The <identifier> is a placeholder (10.XXXX/XXXXX). DataCite assigns the real DOI when you register it.
Creator parsing:
- Multiple creators are separated by semicolons (;).
- Names containing a comma are treated as Personal (e.g., Last, First).
- Names without a comma are treated as Organizational.
The <resourceType> is currently emitted with resourceTypeGeneral="Dataset".
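
The creator rules and the fixed elements above can be sketched as below. This is not the script's code: it uses the standard-library xml.etree.ElementTree in place of lxml so that it runs without dependencies, builds only a fragment of the record, and the function names parse_creators and build_resource are hypothetical.

```python
import xml.etree.ElementTree as ET

DATACITE_NS = "http://datacite.org/schema/kernel-4"


def parse_creators(raw: str):
    """Split on ';' and classify each name by whether it contains a comma."""
    creators = []
    for name in raw.split(";"):
        name = name.strip()
        if not name:
            continue  # ignore empty entries such as a trailing ';'
        name_type = "Personal" if "," in name else "Organizational"
        creators.append((name, name_type))
    return creators


def build_resource(raw_creators: str, resource_type: str) -> ET.Element:
    """Build a partial DataCite <resource> element (illustrative subset only)."""
    # The namespace is written as a plain attribute to keep the sketch short.
    resource = ET.Element("resource", {"xmlns": DATACITE_NS})

    identifier = ET.SubElement(resource, "identifier", {"identifierType": "DOI"})
    identifier.text = "10.XXXX/XXXXX"  # placeholder, replaced at registration

    creators_el = ET.SubElement(resource, "creators")
    for name, name_type in parse_creators(raw_creators):
        creator = ET.SubElement(creators_el, "creator")
        creator_name = ET.SubElement(creator, "creatorName", {"nameType": name_type})
        creator_name.text = name

    rtype = ET.SubElement(resource, "resourceType", {"resourceTypeGeneral": "Dataset"})
    rtype.text = resource_type
    return resource


element = build_resource("Doe, Jane; Example University", "Survey data")
print(ET.tostring(element, encoding="unicode"))
```

Note that the comma rule is a heuristic: an organization whose name contains a comma would be classified as Personal.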

ROR organization identifiers

For organizational creators and the publisher, the script attempts to add a ROR ID:

- Checks a small local mapping (ROR_MAPPINGS) in src/main.py first.
- Falls back to the public ROR API (https://api.ror.org/organizations) if the mapping has no entry.

If the lookup fails (offline, timeout, no matches), it continues without a ROR ID.
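
The lookup order and the fail-soft behaviour can be sketched as follows. Several things here are assumptions: the mapping entry is a made-up placeholder (not a real ROR ID), the function names are hypothetical, urllib stands in for requests so the sketch has no dependencies, and the response is assumed to carry matches in an "items" list with an "id" field.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional

ROR_API = "https://api.ror.org/organizations"

# Placeholder entry for illustration; the real ROR_MAPPINGS lives in src/main.py.
ROR_MAPPINGS: Dict[str, str] = {"Example University": "https://ror.org/00example0"}


def query_ror_api(name: str, timeout: float = 5.0) -> dict:
    """Query the public ROR API by organization name and return the parsed JSON."""
    url = ROR_API + "?" + urllib.parse.urlencode({"query": name})
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def lookup_ror(name: str, fetch: Callable[[str], dict] = query_ror_api) -> Optional[str]:
    """Return a ROR ID for name, or None if nothing usable is found."""
    # 1. The local mapping wins and needs no network access.
    if name in ROR_MAPPINGS:
        return ROR_MAPPINGS[name]
    # 2. Otherwise ask the API; network and parse errors are swallowed.
    try:
        items = fetch(name).get("items", [])
    except (OSError, ValueError):
        return None
    # Take the first match, assuming results are ordered by relevance.
    return items[0].get("id") if items else None


print(lookup_ror("Example University"))
```

Passing the fetch function as a parameter keeps the fallback testable offline; taking the first API match is a simplification and may pick the wrong organization for ambiguous names.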

Repository layout

src/main.py — CLI entrypoint and conversion logic
data/ — sample .eml / .xml files
output/ — spare output folder (not currently used by the script)

License

See LICENSE.
