Fix issue where form fields can be skipped#166
Conversation
…rent object; pikepdf handles this fine but we were skipping valid form fields
There was a problem hiding this comment.
Pull request overview
This PR fixes PDF form-field traversal so that fields aren’t skipped when the field name/type live on a parent field dictionary while the visible widget annotation (Rect/P/F) lives on a child in /Kids (common in some “unusual layout” PDFs).
Changes:
- Update
_unnest_pdf_fieldsto propagate effective field type/flags down to child widgets and treat widget-only leaves as fields when their type is inherited. - Add a regression test that constructs a parent
/FT+/Tfield with a child widget annotation and assertsget_existing_pdf_fieldsreturns the correctly named/positioned field. - Apply minor formatting-only changes in a few tests and in
lit_explorerfor readability.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
formfyxer/pdf_wrangling.py |
Fixes field unnesting so named-parent + widget-child fields are recognized and not skipped. |
formfyxer/tests/test_pdf_labeling_rules.py |
Adds regression coverage for parent-named + child-widget AcroForm fields; minor formatting tweak. |
formfyxer/tests/test_lit_explorer_pdf_labeling.py |
Formatting-only changes to test decorators and namespaces. |
formfyxer/lit_explorer.py |
Formatting-only changes in field rename loop for readability. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Sorry formatting changes got dragged in here; didn't realize we didn't have a black action yet. I'll do that in a separate PR |
|
Going to merge, this is easily reproduced and tested, but later feedback welcome |
BryceStevenWilley
left a comment
There was a problem hiding this comment.
Today I learned! Good PR, sorry for the late review.
For future reference, here's the quote from the PDF spec about situations like this:
A field's children in the hierarchy may also include widget annotations (see 12.5.6.19) that define its appearance on the page... As a convenience, when a field has only a single associated widget annotation, the contents of the field dictionary and the annotation dictionary may be merged into a single dictionary...
Which helped alleviate my confusion.
| if effective_type == "/Btn" and bool((effective_flags or 0) & 0x10000): | ||
| return [] | ||
|
|
||
| if hasattr(field, "FT") or hasattr(field, "F"): |
There was a problem hiding this comment.
I don't think this affects the particular PDF you were trying to fix, but from what I understand there could be terminal fields where the field type is defined in the parent. The annotation flag ("F") can either be a separate Widget annotation (like you've addressed in this PR) or it can be merged with the terminal field.
So I think this should be:
| if hasattr(field, "FT") or hasattr(field, "F"): | |
| if effective_type or hasattr(field, "F"): |
I'm still unclear if needs to be present or not. It most cases it is (to set the "Print" flag for the field to show up when the document is printed).
Tested this PDF eoir-59_rop_request (2).pdf
; old FormFyxer was failing to identify these four, valid forms with just a slightly unusual layout:
Here's what they looked like internally:
So the field name/type live on the parent, and the visible placement lives on the child.
Our old code in /home/quinten/FormFyxer/formfyxer/pdf_wrangling.py only recognized: