Fix issue where form fields can be skipped#166

Merged
nonprofittechy merged 1 commit into main from better-field-detection on Apr 1, 2026

@nonprofittechy
Member

Tested this PDF: eoir-59_rop_request (2).pdf. Old FormFyxer was failing to identify these five valid form fields, which just have a slightly unusual layout:

  • recipient_name
  • recipient_org
  • recipient_mailing_care_name
  • recipient_mailing_address_address
  • recipient_mailing_address_unit

Here's what they looked like internally:

  • Parent field object:
    • has the field name: /T (recipient_org)
    • has the field type: /FT /Tx
    • has /Kids [...]
    • does not have widget-only properties like /Rect, /P, sometimes no /F
  • Child widget annotation:
    • has the visible box: /Rect [...]
    • has page link: /P
    • has annotation flags: /F
    • points back to the parent with /Parent
    • does not repeat /T or /FT

So the field name/type live on the parent, and the visible placement lives on the child.
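As a rough illustration of that split, the sketch below models the two dictionaries with plain Python dicts (not pikepdf objects) and looks up a key by walking `/Parent`. The helper name `inherited` and the coordinate values are made up for the example; only the key names come from the description above.

```python
# Plain-dict stand-ins for PDF dictionaries; these are NOT pikepdf objects.
parent = {"/T": "recipient_org", "/FT": "/Tx", "/Kids": []}
child = {"/Rect": [72, 700, 300, 720], "/F": 4, "/Parent": parent}
parent["/Kids"].append(child)


def inherited(node, key):
    """Return node[key], falling back to ancestors through /Parent.

    Hypothetical helper for illustration only.
    """
    while node is not None:
        if key in node:
            return node[key]
        node = node.get("/Parent")
    return None


# The widget child has no /T or /FT of its own, so both come from the parent.
name = inherited(child, "/T")
field_type = inherited(child, "/FT")
# The visible box only exists on the child.
rect = inherited(child, "/Rect")
```

Looking up `/Rect` starting from the parent returns `None`, which is why neither dictionary on its own looks like a complete field.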

Our old code in /home/quinten/FormFyxer/formfyxer/pdf_wrangling.py only recognized:

  • a leaf with both FT and F, or
  • a child inheriting from a parent only if parent_flags was truthy
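A loose paraphrase of those two conditions, written against plain dicts, shows why the widget-only child slipped through. This is not the actual FormFyxer code: `old_rule_accepts` is a hypothetical name, and the example assumes the parent carried no flags.

```python
def old_rule_accepts(leaf, parent_type=None, parent_flags=None):
    """Hypothetical paraphrase of the two conditions listed above."""
    # Condition 1: the leaf itself carries both /FT and /F.
    own = "/FT" in leaf and "/F" in leaf
    # Condition 2: inherit from the parent, but only when parent_flags is truthy.
    inherits = parent_type is not None and bool(parent_flags)
    return own or inherits


# Widget-only child: it has /Rect and /F but no /T or /FT.
widget_child = {"/Rect": [72, 700, 300, 720], "/F": 4}

# Assumption: the parent has a type but no flags, so neither condition holds.
missed = not old_rule_accepts(widget_child, parent_type="/Tx", parent_flags=None)
```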

…rent object; pikepdf handles this fine but we were skipping valid form fields
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes PDF form-field traversal so that fields aren’t skipped when the field name/type live on a parent field dictionary while the visible widget annotation (Rect/P/F) lives on a child in /Kids (common in some “unusual layout” PDFs).

Changes:

  • Update _unnest_pdf_fields to propagate effective field type/flags down to child widgets and treat widget-only leaves as fields when their type is inherited.
  • Add a regression test that constructs a parent /FT + /T field with a child widget annotation and asserts get_existing_pdf_fields returns the correctly named/positioned field.
  • Apply minor formatting-only changes in a few tests and in lit_explorer for readability.
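The first change in the list can be sketched as a small recursive walk that carries the effective name, type, and flags down to each child. This is a minimal sketch on plain dicts, assuming a simplified output of `(name, type, rect)` tuples; `unnest_fields` is a hypothetical stand-in, not the real `_unnest_pdf_fields`, and it does not build fully qualified names.

```python
def unnest_fields(node, parent_name=None, parent_type=None, parent_flags=None):
    """Flatten a field tree into (name, type, rect) tuples.

    Hypothetical stand-in for illustration; operates on plain dicts.
    """
    # A key set on the node wins; otherwise fall back to the inherited value.
    name = node.get("/T", parent_name)
    effective_type = node.get("/FT", parent_type)
    effective_flags = node.get("/Ff", parent_flags)

    kids = node.get("/Kids")
    if kids:
        found = []
        for kid in kids:
            found.extend(unnest_fields(kid, name, effective_type, effective_flags))
        return found

    # A leaf counts as a field when it is placed on a page and has a type,
    # whether that type is its own or inherited.
    if effective_type is not None and "/Rect" in node:
        return [(name, effective_type, node["/Rect"])]
    return []


tree = {
    "/Kids": [
        # Named parent with a widget-only child.
        {"/T": "recipient_org", "/FT": "/Tx",
         "/Kids": [{"/Rect": [72, 700, 300, 720], "/F": 4}]},
        # Field and widget merged into one dictionary.
        {"/T": "merged", "/FT": "/Tx", "/Rect": [72, 650, 300, 670], "/F": 4},
    ]
}
fields = unnest_fields(tree)
```

Both layouts end up in `fields`, each with the name and type taken from wherever they were defined.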

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| formfyxer/pdf_wrangling.py | Fixes field unnesting so named-parent + widget-child fields are recognized and not skipped. |
| formfyxer/tests/test_pdf_labeling_rules.py | Adds regression coverage for parent-named + child-widget AcroForm fields; minor formatting tweak. |
| formfyxer/tests/test_lit_explorer_pdf_labeling.py | Formatting-only changes to test decorators and namespaces. |
| formfyxer/lit_explorer.py | Formatting-only changes in field rename loop for readability. |

@nonprofittechy
Member Author

Sorry, formatting changes got dragged in here; I didn't realize we didn't have a Black action yet. I'll do that in a separate PR.

@nonprofittechy
Member Author

Going to merge, this is easily reproduced and tested, but later feedback welcome

@nonprofittechy nonprofittechy merged commit ab51c48 into main Apr 1, 2026
6 checks passed
@nonprofittechy nonprofittechy deleted the better-field-detection branch April 1, 2026 22:18
Contributor

@BryceStevenWilley BryceStevenWilley left a comment

Today I learned! Good PR, sorry for the late review.

For future reference, here's the quote from the PDF spec about situations like this:

A field's children in the hierarchy may also include widget annotations (see 12.5.6.19) that define its appearance on the page... As a convenience, when a field has only a single associated widget annotation, the contents of the field dictionary and the annotation dictionary may be merged into a single dictionary...

Which helped alleviate my confusion.
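To make the two layouts in that quote concrete, here is a small sketch that labels a dictionary by which keys it carries. It uses plain dicts and a simplified assumption: `/T` or `/FT` signals field information and `/Rect` signals a widget. The function name `describe` is hypothetical.

```python
def describe(node):
    """Label a dict as field, widget, or merged (simplified, hypothetical)."""
    has_field_keys = "/T" in node or "/FT" in node
    has_widget_keys = "/Rect" in node
    if has_field_keys and has_widget_keys:
        return "merged field + widget"
    if has_field_keys:
        return "field only"
    if has_widget_keys:
        return "widget only"
    return "neither"


# Single widget merged into the field dictionary (the "convenience" case).
merged = describe({"/T": "a", "/FT": "/Tx", "/Rect": [0, 0, 10, 10]})
# Field and widget kept as separate dictionaries (the case this PR fixes).
split_parent = describe({"/T": "b", "/FT": "/Tx", "/Kids": []})
split_child = describe({"/Rect": [0, 0, 10, 10], "/F": 4})
```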

if effective_type == "/Btn" and bool((effective_flags or 0) & 0x10000):
    return []

if hasattr(field, "FT") or hasattr(field, "F"):

I don't think this affects the particular PDF you were trying to fix, but from what I understand there could be terminal fields where the field type is defined in the parent. The annotation flag ("F") can either be a separate Widget annotation (like you've addressed in this PR) or it can be merged with the terminal field.

So I think this should be:

Suggested change
- if hasattr(field, "FT") or hasattr(field, "F"):
+ if effective_type or hasattr(field, "F"):

I'm still unclear whether "F" needs to be present or not. In most cases it is (to set the "Print" flag so the field shows up when the document is printed).
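The difference between the merged condition and the suggested one can be sketched on a terminal field that inherits its type and has no "F" of its own. `SimpleNamespace` stands in for the field object here, and the function names are hypothetical; `PRINT_FLAG` reflects the Print bit (bit position 3, value 4) of the annotation flags.

```python
from types import SimpleNamespace

PRINT_FLAG = 1 << 2  # Print is bit position 3 of the annotation flags /F


def merged_check(field):
    """Condition as merged: only looks at the field's own attributes."""
    return hasattr(field, "FT") or hasattr(field, "F")


def suggested_check(field, effective_type):
    """Suggested condition: also accepts a type inherited from a parent."""
    return bool(effective_type) or hasattr(field, "F")


# Terminal field whose type lives on the parent and which has no F entry.
terminal = SimpleNamespace(T="child_name", Rect=[72, 700, 300, 720])

accepted_by_merged = merged_check(terminal)
accepted_by_suggested = suggested_check(terminal, effective_type="/Tx")

# A widget with F == 4 has the Print flag set.
is_printable = bool(4 & PRINT_FLAG)
```

Under these assumptions the merged condition rejects the terminal field while the suggested one accepts it.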

3 participants