A Python command-line tool that extracts standard references from DITA XML topic files and generates DITA key definition maps.
This tool parses DITA topic files containing structured lists of standards and documentation references, then automatically generates a DITA keymap file with key definitions that can be reused across your documentation project.
- Multiple XML Structure Support: Handles various DITA XML structures for standard references
- Intelligent Parsing: Uses a chain-of-responsibility pattern with specialized handlers for different XML patterns
- Automatic ID Generation: Creates standardized IDs when not explicitly provided
- Debugging Tools: Extract and inspect individual elements for troubleshooting
- Configurable Logging: Multiple verbosity levels for detailed operation insight
# clone the repository:
git clone git@github.com:elizaluszczyk/dita-topic-to-keymap.git
cd dita-topic-to-keymap
# install development dependencies:
pip install -e .
pip install -r ./requirements/dev.txt
pre-commit install
Convert a DITA topic file to a keymap:
ditatk parse input.xml
With custom output file:
ditatk parse input.xml -o standards-keymap.xml
With verbose logging:
# INFO level
ditatk parse input.xml -v
# DEBUG level (most detailed)
ditatk parse input.xml -vv
Extract and display a specific list item element (useful for debugging):
ditatk extract input.xml 5
This displays the 5th <li> element from the input file.
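The lookup behind the extract command can be pictured roughly as follows. This is a minimal sketch using the standard library's `xml.etree.ElementTree`; the helper name `nth_list_item` and the 1-based, document-order indexing are assumptions inferred from the example above, not the tool's actual implementation.

```python
import xml.etree.ElementTree as ET


def nth_list_item(xml_text: str, position: int) -> str:
    """Return the serialized <li> at a 1-based position (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Collect every <li> in document order, at any nesting depth.
    items = list(root.iter("li"))
    if not 1 <= position <= len(items):
        raise IndexError(f"position {position} is outside 1..{len(items)}")
    return ET.tostring(items[position - 1], encoding="unicode").strip()


sample = "<ul><li>IEEE 802.11</li><li><keyword id='iso-9001'>ISO 9001</keyword></li></ul>"
print(nth_list_item(sample, 1))
```

Printing the raw element this way makes it easier to see which of the supported XML patterns, if any, a problematic item matches.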
The tool handles eight different XML patterns commonly found in DITA topics:
<li>
<keyword id="iso-9001">ISO 9001</keyword>
</li>
Handler: KeywordWithIdHandler
Result: (iso-9001, "ISO 9001")
<li id="std-iso-14001">
<keyword>ISO 14001</keyword>
</li>
Handler: KeywordWithoutIdHandler
Result: (std-iso-14001, "ISO 14001")
<li>
<keyword>ISO 27001</keyword>
</li>
Handler: KeywordWithoutIdHandler
Result: (std_iso-27001, "ISO 27001") - ID auto-generated from keyword text
<li>IEEE 802.11</li>
Handler: ListItemWithoutKeywordHandler
Result: (std_ieee-802-11, "IEEE 802.11") - ID auto-generated from text
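The two auto-generated IDs shown above are consistent with a simple slug rule: lowercase the text, collapse every run of non-alphanumeric characters into a hyphen, and prepend `std_`. The sketch below implements that rule; the function name `generate_id` and the exact rule are assumptions reverse-engineered from these two examples, so the tool's real behavior may differ for other inputs (for instance, non-ASCII text).

```python
import re


def generate_id(text: str, prefix: str = "std_") -> str:
    """Build an ID from free text (hypothetical rule inferred from the examples)."""
    # Lowercase, then replace each run of characters outside a-z/0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return f"{prefix}{slug}"


print(generate_id("ISO 27001"))    # std_iso-27001
print(generate_id("IEEE 802.11"))  # std_ieee-802-11
```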
<li id="nist-sp-800-53">
<cite>NIST Special Publication 800-53</cite>
</li>
Handler: ListItemWithoutKeywordHandler
Result: (nist-sp-800-53, "NIST Special Publication 800-53")
<li id="fips-140-2">
<cite>Federal Information Processing Standard 140-2</cite>
<keyword keyref="nist-fips"/>
</li>
Handler: ListItemWithCiteHandler
Result: (fips-140-2, "Federal Information Processing Standard 140-2") - Uses cite text as description
<li id="rfc-7540">
<cite><keyword keyref="ietf-rfc"/>Hypertext Transfer Protocol Version 2 (HTTP/2)</cite>
</li>
Handler: ListItemWithCiteHandler
Result: (rfc-7540, "Hypertext Transfer Protocol Version 2 (HTTP/2)") - Uses only the text following the keyword reference
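In `xml.etree.ElementTree`, text that follows a child element is stored on that child's `tail` attribute, which is one way to pick up "only the text following the keyword reference". The sketch below illustrates the idea; `cite_description` is a hypothetical helper, not the handler's actual code.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def cite_description(li: ET.Element) -> Optional[str]:
    """Return the description carried by a <cite> child (hypothetical helper)."""
    cite = li.find("cite")
    if cite is None:
        return None
    keyword = cite.find("keyword")
    # Text after <keyword .../> inside <cite> lives in keyword.tail.
    if keyword is not None and keyword.tail and keyword.tail.strip():
        return keyword.tail.strip()
    # Otherwise fall back to all text inside <cite>.
    text = "".join(cite.itertext()).strip()
    return text or None


li = ET.fromstring(
    '<li id="rfc-7540"><cite><keyword keyref="ietf-rfc"/>'
    "Hypertext Transfer Protocol Version 2 (HTTP/2)</cite></li>"
)
print((li.get("id"), cite_description(li)))
```

The same helper also covers the case where the `<keyword keyref="..."/>` is a sibling of `<cite>`, because it then falls back to the full cite text.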
<li>
<cite>
<keyword id="gdpr">General Data Protection Regulation</keyword>
</cite>
</li>
Handler: KeywordNestedInCiteHandler
Result: (gdpr, "General Data Protection Regulation")
The tool may encounter <li> elements that cannot be processed by any handler. This occurs when an element lacks sufficient content to generate a valid keymap entry (e.g., no keyword text, no citation text, or empty content).
Example scenario:
$ ditatk parse data/r_standards.xml
[2025-10-08 18:41:01,991] dita_topic_to_keymap.cli [WARNING] No handler was able to parse element 286
[2025-10-08 18:41:01,991] dita_topic_to_keymap.cli [WARNING] No handler was able to parse element 287
[2025-10-08 18:41:01,991] dita_topic_to_keymap.cli [WARNING] No handler was able to parse element 288
Generated keymap files follow DITA map standards:
<?xml version="1.0" ?>
<!DOCTYPE map PUBLIC '-//OASIS//DTD DITA Map//EN' 'map.dtd'>
<!-- Generated automatically on 2025-10-08 16:15:34 -->
<map>
<title>Standards and Documentation Key Definitions</title>
<keydef keys="iso-9001">
<topicmeta>
<keywords>
<keyword>ISO 9001</keyword>
</keywords>
</topicmeta>
</keydef>
<!-- Additional keydef elements... -->
</map>
The tool uses specialized handlers in priority order:
- KeywordWithIdHandler - Processes <keyword> with explicit id
- KeywordWithoutIdHandler - Processes <keyword> without id
- ListItemWithoutKeywordHandler - Processes <li> without <keyword>
- ListItemWithCiteHandler - Processes <li> with <cite> elements
- KeywordNestedInCiteHandler - Processes <keyword> nested in <cite>
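The priority-ordered dispatch can be sketched as a small chain of responsibility: each handler either returns an `(id, description)` pair or declines, and the first success wins. The version below is deliberately simplified and is not the tool's code. Plain functions stand in for the handler classes, only three of the patterns are covered, ID auto-generation is omitted, and every name in it is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

Entry = Tuple[str, str]
Handler = Callable[[ET.Element], Optional[Entry]]


def keyword_with_id(li: ET.Element) -> Optional[Entry]:
    # <li><keyword id="...">text</keyword></li>
    kw = li.find("keyword")
    if kw is not None and kw.get("id") and (kw.text or "").strip():
        return kw.get("id"), kw.text.strip()
    return None


def keyword_without_id(li: ET.Element) -> Optional[Entry]:
    # <li id="..."><keyword>text</keyword></li> (ID generation omitted here)
    kw = li.find("keyword")
    if kw is not None and (kw.text or "").strip() and li.get("id"):
        return li.get("id"), kw.text.strip()
    return None


def list_item_without_keyword(li: ET.Element) -> Optional[Entry]:
    # <li id="...">text</li> or <li id="..."><cite>text</cite></li>
    text = "".join(li.itertext()).strip()
    if li.find("keyword") is None and li.get("id") and text:
        return li.get("id"), text
    return None


# Handlers are tried in this order; the first non-None result is used.
CHAIN: List[Handler] = [keyword_with_id, keyword_without_id, list_item_without_keyword]


def parse_item(li: ET.Element) -> Optional[Entry]:
    for handler in CHAIN:
        result = handler(li)
        if result is not None:
            return result
    return None  # no handler could parse the element


print(parse_item(ET.fromstring('<li><keyword id="iso-9001">ISO 9001</keyword></li>')))
```

A caller that receives `None` is the natural place to emit a warning for an element that no handler could parse.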
- No flag: WARNING (default)
- -v: INFO
- -vv: DEBUG
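A common way to implement this kind of mapping is a counted `-v` flag with `argparse`, translated into a `logging` level. The sketch below shows that approach; whether the tool itself uses `argparse`, and the names `level_for` and `ditatk-sketch`, are assumptions made for illustration.

```python
import argparse
import logging


def level_for(verbosity: int) -> int:
    """Map the number of -v flags to a logging level (hypothetical helper)."""
    if verbosity >= 2:
        return logging.DEBUG
    if verbosity == 1:
        return logging.INFO
    return logging.WARNING


parser = argparse.ArgumentParser(prog="ditatk-sketch")
# action="count" makes -v yield 1 and -vv yield 2; default=0 means no flag.
parser.add_argument("-v", dest="verbose", action="count", default=0)

args = parser.parse_args(["-vv"])
logging.basicConfig(level=level_for(args.verbose))
print(logging.getLevelName(level_for(args.verbose)))  # DEBUG
```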
This project is licensed under the MIT License - see the LICENSE file for details.
Eliza Łuszczyk