Implement LiteraryFormPack: mapping, LCSH suffix extraction, conflict resolution #4

Open
modi02 wants to merge 5 commits into Open-Book-Genome-Project:main from modi02:raj/literary-form-pack
Conversation

modi02 commented Apr 21, 2026

This builds on top of the architecture in #3.

The LiteraryFormPack in #3 only had PrefixRule("form"), so real subject strings like "fiction" or "Pirates--Fiction" didn't match anything. This PR fills that in.

Changes:

  • MappingRule against literary_form.json (49 mappings for Fiction/Nonfiction variants)
  • LCSHSuffixRule for "--" patterns like "Pirates--Fiction" → Fiction. In the legacy script these silently fell through to unmapped; this rule fixes that.
  • DroppableRule for noise strings ("accessible book", "internet archive wishlist", "texts", etc.)
  • Conflict resolution logic: Fiction wins by default unless there's a strong Nonfiction signal like biography or memoir. This matters for historical fiction works that have both "history" and "fiction" in their subjects.
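
For reviewers, the first three bullets can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: `classify_subject`, `FORM_MAP`, and `DROPPABLE` are hypothetical names, the two small tables stand in for literary_form.json and droppable.json, and the actual rule classes may expose a different interface.

```python
from typing import Optional

# Hypothetical stand-ins for literary_form.json and droppable.json.
FORM_MAP = {
    "fiction": "Fiction",
    "short story": "Fiction",
    "biography": "Nonfiction",
}
DROPPABLE = {"accessible book", "internet archive wishlist", "texts"}


def classify_subject(subject: str) -> Optional[str]:
    """Return 'Fiction' or 'Nonfiction', 'DROP' for noise, or None if unmapped."""
    normalized = subject.strip().lower()
    # Noise strings are dropped before any mapping is attempted.
    if normalized in DROPPABLE:
        return "DROP"
    # Direct lookup, the role MappingRule plays in the description above.
    if normalized in FORM_MAP:
        return FORM_MAP[normalized]
    # LCSH subdivision: look only at the segment after the last "--".
    if "--" in normalized:
        suffix = normalized.rsplit("--", 1)[1].strip()
        return FORM_MAP.get(suffix)
    return None
```

Under these assumptions, "Pirates--Fiction" resolves through its suffix, while a subdivision with no mapped suffix (say "France--History" with this toy table) stays unmapped instead of being guessed.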

Tested against 11 cases, all pass.

Built to be compatible with #3, happy to rebase once that merges.

Kaftow and others added 4 commits April 21, 2026 12:07
…solution

Replaces the PrefixRule-only skeleton with a fully working implementation:

- MappingRule against literary_form.json (49 mappings covering Fiction and
  Nonfiction variants including general fiction, short story, fictitious works,
  biographical, autobiographical)
- LCSHSuffixRule: extracts genre signals from LCSH '--' patterns
  e.g. 'Pirates--Fiction' correctly resolves to Fiction
- DroppableRule: silently drops noise strings (import artifacts, access markers)
- Conflict resolution: Fiction wins by default unless strong unambiguous
  Nonfiction markers present (biography, memoir, autobiography etc.)
  Prevents topic subdivisions like 'history' from overriding fiction classification
  on historical fiction works
- Adds resources/mappings/literary_form.json and updated droppable.json

All 11 manual tests pass.
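
The conflict-resolution bullet in the commit message above could look something like the following sketch. It is an assumption-laden illustration rather than the committed logic: `resolve_form`, `STRONG_NONFICTION`, and `WEAK_NONFICTION` are hypothetical names, and treating "history" as a weak Nonfiction signal is a guess at how topic subdivisions are handled.

```python
from typing import Iterable, Optional

# Assumed marker sets; the commit message names biography, memoir, autobiography.
STRONG_NONFICTION = {"biography", "memoir", "autobiography"}
# Assumption: topic words like "history" only hint at Nonfiction.
WEAK_NONFICTION = {"history"}


def resolve_form(subjects: Iterable[str]) -> Optional[str]:
    """Pick one literary form for a work from all of its subject strings."""
    seen = set()
    for subject in subjects:
        # Split LCSH subdivisions so "Pirates--Fiction" contributes "fiction".
        for part in subject.lower().split("--"):
            seen.add(part.strip())
    if seen & STRONG_NONFICTION:
        return "Nonfiction"  # strong markers override Fiction
    if "fiction" in seen:
        return "Fiction"     # Fiction beats weak topic signals
    if seen & WEAK_NONFICTION:
        return "Nonfiction"
    return None
```

With this ordering, a historical novel tagged `["History", "Pirates--Fiction"]` resolves to Fiction, while `["Fiction", "Biography"]` resolves to Nonfiction.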

Kaftow commented Apr 21, 2026

Great work!
But I think LCSHSuffixRule should live in the rules/ folder as its own file, like the other rules such as mapping_rule.py.

- Extract LCSHSuffixRule into rules/lcsh_suffix_rule.py
- Export from rules/__init__.py alongside other rules
- Import from rules in literary_form.py instead of defining inline
- All 11 tests still pass

modi02 commented Apr 21, 2026

Done, moved to rules/lcsh_suffix_rule.py and exported from rules/__init__.py.

All tests still pass.
