This script uses lxml to build a representation of an EAD XML document that you can then query or modify in different ways. The basic EAD class handles the XML namespace clutter that EAD includes and has a built-in function to extract all the item- or folder-level info from the finding aid. One major assumption is that the EAD is generated by ArchivesSpace, but the script should also work with EAD from other sources, as long as they use the same XML namespaces (the ones defined by the EAD spec and used by ArchivesSpace).
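A minimal sketch of the namespace handling, assuming the EAD 2002 namespace (urn:isbn:1-931666-22-9) that ArchivesSpace exports use. load_ead is a hypothetical name, not the script's actual EAD class. It shows the core trick: lxml reports the default namespace under the key None, which XPath cannot use, so it gets re-mapped to the prefix e.

```python
from lxml import etree

# Assumed default: the EAD 2002 namespace written by ArchivesSpace.
EAD_NS = "urn:isbn:1-931666-22-9"


def load_ead(source):
    """Parse an EAD file (path or file-like object) and return (tree, namespaces)."""
    tree = etree.parse(source)
    root = tree.getroot()
    # Keep the prefixed namespaces (xlink, xsi, ...) as they are.
    ns = {prefix: uri for prefix, uri in root.nsmap.items() if prefix is not None}
    # XPath has no syntax for a default namespace, so give it the prefix "e".
    ns["e"] = root.nsmap.get(None, EAD_NS)
    return tree, ns
```

With that map, `tree.xpath("//e:c02", namespaces=ns)` finds components that a bare `//c02` would miss.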
You have to choose one of two modes: items or replace.
items is the default. In this mode the script produces a CSV file in which each row represents an item- or folder-level description from the finding aid, with columns for the System ID, Title, Date, and Scope & Content Note.
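Roughly how items mode could work, as a sketch and not the script's own code. Assumptions: "item or folder level" means a `<c>`/`<c01>`...`<c12>` component with level="item" or level="file", the System ID is the component's id attribute, and the other columns come from `<unittitle>`, `<unitdate>` and `<scopecontent>`. The function names are hypothetical.

```python
import csv
import re
from lxml import etree

EAD_NS = "urn:isbn:1-931666-22-9"  # EAD 2002 namespace (assumed)
NS = {"e": EAD_NS}
FIELDS = ["System ID", "Title", "Date", "Scope & Content Note"]
# Matches unnumbered <c> and numbered <c01> ... <c12> components.
COMPONENT = re.compile(r"c(0[1-9]|1[0-2])?")


def text_of(el, path):
    """Whitespace-normalised text of the first element matching path, or ''."""
    hits = el.xpath(path, namespaces=NS)
    return " ".join(" ".join(hits[0].itertext()).split()) if hits else ""


def item_rows(root):
    """Yield one dict per component whose level is 'item' or 'file' (folder)."""
    for el in root.iter(f"{{{EAD_NS}}}*"):
        if COMPONENT.fullmatch(etree.QName(el).localname) and el.get("level") in ("item", "file"):
            yield {
                "System ID": el.get("id", ""),
                "Title": text_of(el, "e:did/e:unittitle"),
                "Date": text_of(el, "e:did/e:unitdate"),
                "Scope & Content Note": text_of(el, "e:scopecontent"),
            }


def write_items_csv(root, out_path):
    """Write the rows to out_path (header row first) and return how many were written."""
    rows = list(item_rows(root))
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Walking the tree with `iter()` and a regex on the local name keeps the component test in Python, which is easier to read than the equivalent XPath 1.0 string functions.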
replace lets you select a part of the finding aid that you want to modify in bulk, given a condition to be met and a replacement value. This requires a CSV with three columns: a valid XPath expression that selects the target data you want to replace, a value for the XPath expression to match against, and the replacement value. The example CSV in this directory shows the format to use. The first row must contain the column headers.
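A sketch of what replace mode might do with each CSV row, under loud assumptions: the column headers are taken to be xpath, value and replacement, and the expression refers to the row's value through the XPath variable $value (lxml binds extra keyword arguments to `xpath()` as XPath variables). The real script may plug the value in differently; check the example CSV. The e prefix stands for EAD's default namespace, and xlink is included because `<dao>` links are usually xlink attributes.

```python
import csv
from lxml import etree

# Assumed namespace map: "e" for EAD's default namespace, "xlink" for <dao> links.
NS = {"e": "urn:isbn:1-931666-22-9", "xlink": "http://www.w3.org/1999/xlink"}


def apply_replacements(tree, rows):
    """Apply replace-mode rows to an lxml tree and return the number of changes.

    Each row is a dict with hypothetical keys 'xpath', 'value' and 'replacement';
    the expression can refer to the row's value as the XPath variable $value.
    """
    changed = 0
    for row in rows:
        hits = tree.xpath(row["xpath"], namespaces=NS, value=row["value"])
        if not isinstance(hits, list):
            continue  # the expression evaluated to a number, string or boolean
        for hit in hits:
            if isinstance(hit, etree._Element):
                # Selected an element: replace its leading text (child elements are kept).
                hit.text = row["replacement"]
            elif getattr(hit, "is_attribute", False):
                # Selected an attribute: lxml's "smart string" knows its owner and name.
                hit.getparent().set(hit.attrname, row["replacement"])
            else:
                continue  # e.g. a text() node; left unchanged here
            changed += 1
    return changed
```

Usage would look like `apply_replacements(tree, csv.DictReader(open("changes.csv", newline="")))`, where changes.csv is a placeholder name. DictReader consumes the header row, which is one reason the first row has to hold the column names.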
Getting the right XPath expression can be a pain, but there are many online XPath testers where you can try out your expression.
Note! You also need to add the e prefix to tag names in your XPath expression. The unprefixed tags in an EAD file actually sit in EAD's default namespace, and an unprefixed name in XPath only matches tags with no namespace at all, so the e prefix is how the script reaches them. For example, write //*[e:c03] instead of //*[c03]. If you're using an online XPath validator you may not need to include namespaces there, but be sure to add the prefix in the CSV you create for this script.
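A small demonstration of why the prefix matters, using a made-up three-level component tree. The only assumption is that e is bound to the EAD 2002 namespace URI.

```python
from lxml import etree

SAMPLE = b"""<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection"><dsc>
    <c01 id="aspace_1" level="series"><c02 id="aspace_2" level="file"><c03 id="aspace_3" level="item"/></c02></c01>
  </dsc></archdesc>
</ead>"""

root = etree.fromstring(SAMPLE)
ns = {"e": "urn:isbn:1-931666-22-9"}

# Unprefixed names only match elements in no namespace, so this finds nothing.
bare = root.xpath("//*[c03]")
# With the prefix bound to the EAD namespace, it finds the parent of <c03>.
prefixed = root.xpath("//*[e:c03]", namespaces=ns)
print(len(bare), [el.get("id") for el in prefixed])  # prints: 0 ['aspace_2']
```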
For example, if you want to change all the URLs in the digital object <dao> tags, each row of the CSV needs these three values:
- An XPath expression that gets to the href attribute inside the <dao> tag (this is probably the same for all the rows)
- Some ID or other hook for the XPath expression to search for, for example the id attribute of the parent <c> tag, or perhaps the unittitle or some other unique way to get to the right dao
- The URL that you want to apply to the <dao href="__"> attribute
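A hypothetical helper that builds such a CSV from a mapping of component IDs to new URLs. It assumes the column headers xpath, value and replacement, and an expression that refers to the ID through the XPath variable $value; both may differ from the example CSV in this directory. It also assumes the link is stored as xlink:href, which is how ArchivesSpace EAD 2002 exports usually write it. If your file has a plain href, drop the xlink: prefix.

```python
import csv
import io

# Hypothetical XPath: the xlink:href of the <dao> under the component whose id is $value.
DAO_HREF_XPATH = "//*[@id=$value]/e:did/e:dao/@xlink:href"


def dao_rows_csv(new_urls):
    """Build replace-mode CSV text from a {component id: new URL} mapping.

    The column names xpath/value/replacement are assumptions.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["xpath", "value", "replacement"])  # first row holds the headers
    for comp_id, url in new_urls.items():
        # Same expression on every row; only the hook and the new URL change.
        writer.writerow([DAO_HREF_XPATH, comp_id, url])
    return buf.getvalue()
```

Because the XPath column repeats on every row, generating the file this way avoids copy-paste mistakes in the one part that is hardest to get right.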
You could also replace the text content of a tag using the same process.
The output is a new XML file, saved in the same directory as the input EAD file, with _new appended to the file name.
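One way to produce that output file, as a sketch with hypothetical function names: insert _new before the extension, keep the directory, and let lxml serialise the modified tree.

```python
from pathlib import Path
from lxml import etree


def new_output_path(input_path):
    """finding_aid.xml -> finding_aid_new.xml, in the same directory."""
    p = Path(input_path)
    return p.with_name(f"{p.stem}_new{p.suffix}")


def save_modified(tree, input_path):
    """Write an lxml ElementTree next to the input file and return the new path."""
    out = new_output_path(input_path)
    # Write an XML declaration and encode the output as UTF-8.
    tree.write(str(out), xml_declaration=True, encoding="UTF-8")
    return out
```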
The next logical step would be to include an add mode where you can add new tags to the finding aid, like a new <scopecontent> note for all the items, or what have you.
The script depends on lxml, which you can install with:
pip3 install lxml