PDF-Auto-A11y is a Java-based PDF accessibility remediation tool that validates and automatically fixes common tagging and document-structure issues in tagged PDFs. It is designed for accessibility professionals who want to automate the repetitive structural corrections that would otherwise be done manually in Acrobat or similar remediation tools.
Requirements:
- Java 21+
- Maven 3.x
Clone the repository and run the CLI wrapper:
git clone https://github.com/boyechko/pdf-autoa11y.git
cd pdf-autoa11y
./pdf-autoa11y input.pdf
# writes: input_autoa11y.pdfLaunch the GUI:
./pdf-autoa11y-guiThe wrapper scripts compile the project on first run and download Maven dependencies as needed. If you prefer to build explicitly first:
mvn clean packagePDF-Auto-A11y runs a pipeline of checks against the input PDF. Each check gets its own pipeline step so intermediate results can be inspected. Checks cover document-level metadata and annotations as well as structure-tree validation and repair.
By default, the tool can automatically remediate issues such as:
- Missing document language (
Lang, defaulting toen-US) - Missing logical tab order (
Tabs/S) - Missing tagged-PDF marker (
MarkInfo/Marked) - Missing
Documentroot element - Untagged link annotations by creating
Linktags - Unexpected pushbutton widget annotations
- Broken ligature mappings
- Unverified PDF/UA conformance claims in XMP metadata
- Common list-structure problems such as missing
LI/LBody, flattened list hierarchies, and mislabeled bullets - Needless nesting, empty tags, some mistagged artifacts, and related structure cleanup
Some issues are detected but not automatically fixed. For example:
- Missing alt text on figures
- PDFs that are image-only and need OCR first
- PDFs with no structure tree at all
Documents must already be tagged. The tool validates missing structure trees, but it does not create a full structure tree for an untagged PDF.
This tool is useful when:
- You have tagged PDFs with recurring list or tag-shape issues from Word, InDesign, or similar authoring tools
- You want consistent remediation across many files
- You want to identify structural issues before manual review
- You want to automate the repetitive fixes and focus manual effort on issues that require judgment
./pdf-autoa11y [options] <input.pdf> [output.pdf|output_dir/]If output is omitted, the result is written next to the input as
<base>_autoa11y.pdf. If output is a directory, the output filename is
generated inside that directory.
Common options:
-a, --analyzeanalyze only; do not write an output PDF-r, --reportsave the console output to a text report file-r=<file>,--report=<file>write the report to a custom path-p, --password <pw>open encrypted PDFs-q, --quietshow only errors and final status-v, --verboseshow detailed processing information-vv, --debugshow full debug output-t, --print-treeprint the structure tree during processing--dump-treeprint the structure tree with MCRs/annotations and exit--dump-rolesprint the structure tree roles and exit--skip-checks <a,b,c>skip specific checks--only-checks <a,b,c>run only specific checks--sidecar <file>use a specific sidecar config file--create-sidecarcreate a template sidecar config file and exit-f, --forcewrite an output PDF even if no fixes were applied
Run ./pdf-autoa11y --help for the full CLI help text.
Examples:
# Remediate using the default output name
./pdf-autoa11y document.pdf
# Analyze only and save a report next to the input
./pdf-autoa11y -a -r document.pdf
# Choose both output and report paths
./pdf-autoa11y --report=out/report.txt document.pdf out/fixed.pdf
# Write output into a directory
./pdf-autoa11y document.pdf out/
# Process an encrypted PDF
./pdf-autoa11y -p mypassword document.pdf
# Inspect the structure tree without remediating
./pdf-autoa11y --dump-tree document.pdf./pdf-autoa11y-guiThe GUI lets you select a PDF, enter a password if needed, view processing output in real time, and save the remediated PDF.
Use -r/--report to save the same console output to a text file.
If you pass -r/--report without a path, the report defaults to
<base>_autoa11y.txt:
- In analyze mode, the report is written next to the input PDF
- In remediation mode, the report is written next to the output PDF
Analysis and remediation can report different issue counts. Analysis scans the original document in a single pass, while remediation applies fixes sequentially, so earlier fixes may resolve issues before later checks run.
The output includes:
- Checks that passed
- Issues found
- Fixes applied
- Warnings that still require manual review
- Page references or object references when available
Remediation runs as a stepwise pipeline and writes intermediate PDFs to a temp directory. By default, that location is:
/tmp/pdf-autoa11y/pipeline/<input-filename>/
You can override the base directory with either:
AUTOA11Y_PIPELINE_DIR=/path/to/dir-Dautoa11y.pipeline.dir=/path/to/dir
Intermediate files are currently kept by default, which is useful for debugging and inspecting each processing step.
CHANGELOG.mdfor release notesCONTRIBUTING.mdfor development workflowdocs/sidecar.mdfor per-PDF sidecar config referencedocs/checks-and-fixes.mdfor the full list of checks and fixesdocs/architecture-decisions.mdfor architecture decision records
Run tests:
mvn testPersist test PDFs for review:
mvn test -Dpdf.autoa11y.testOutputDir=/tmp/pdf-autoa11y-testsRun the app through Maven:
mvn exec:java -Dexec.args="input.pdf"Apply code formatting:
mvn spotless:applyThis project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
This project uses iText PDF, which is also licensed under AGPL. As a derivative work, PDF-Auto-A11y must remain under AGPL-compatible terms.
In practical terms:
- You may use, modify, and distribute this software
- If you modify it and make it available over a network, you must also make the modified source available under AGPL-3.0
- Software incorporating this code must also comply with AGPL obligations
See LICENSE for the full license text.