-
-
Notifications
You must be signed in to change notification settings - Fork 608
Refactor PO generator for robust metadata and improved class discovery #3204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Changes from all commits
1cb2b76
b9fc216
3c4fccc
71054ed
68e148b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,6 +1,5 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #!/usr/bin/env python3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # holidays | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # -------- | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # A fast, efficient Python library for generating country, province and state | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -17,37 +16,75 @@ | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import inspect | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import sys | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from concurrent.futures import ProcessPoolExecutor | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from datetime import datetime | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from pathlib import Path | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from time import perf_counter | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from lingva.extract import extract as create_pot_file | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from lingva.extract import _location_sort_key | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from polib import pofile | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| sys.path.insert(0, str(Path.cwd())) # Make holidays visible. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| sys.path.insert(0, str(Path.cwd())) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from holidays import __version__ as package_version | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from holidays.holiday_base import HolidayBase | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| WRAP_WIDTH = 99 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HEADER_PATH = Path("docs/file_header.txt") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| class POGenerator: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """Generates .po files for supported country/market entities.""" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| @staticmethod | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def _get_license_header() -> str: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """Reads and formats the license header from docs/file_header.txt.""" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if not HEADER_PATH.exists(): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return "" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| content = HEADER_PATH.read_text(encoding="utf-8").strip() | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if not content: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return "" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| lines = [] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| for line in content.splitlines(): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| line = line.rstrip() | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if not line: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| lines.append("#") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| elif line.startswith("#"): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| lines.append(line) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| else: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| lines.append(f"# {line}") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return "\n".join(lines) + "\n" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| @staticmethod | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def _get_standard_metadata(default_language: str = "en_US") -> dict: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """Returns the standard metadata required for gettext.""" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "Report-Msgid-Bugs-To": "dr-prodigy@users.noreply.github.com", | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
This wasn't included in any existing l10n files AFAIK, let's remove them for now
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TODO: Reminder for ME |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "POT-Creation-Date": datetime.now().strftime("%Y-%m-%d %H:%M%z"), | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "Last-Translator": "Vacanza Team <dr.prodigy.github@gmail.com>", | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+65
to
+66
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "POT-Creation-Date": datetime.now().strftime("%Y-%m-%d %H:%M%z"), | |
| "Last-Translator": "Vacanza Team <dr.prodigy.github@gmail.com>", | |
| "POT-Creation-Date": datetime.now().astimezone().strftime("%Y-%m-%d %H:%M%z"), | |
| "Last-Translator": "Vacanza Team <dr-prodigy.github@gmail.com>", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did double check and the bot's astimezone() inclusion is correct - but let's revert to the placeholder name instead
| "POT-Creation-Date": datetime.now().strftime("%Y-%m-%d %H:%M%z"), | |
| "Last-Translator": "Vacanza Team <dr.prodigy.github@gmail.com>", | |
| "POT-Creation-Date": datetime.now().astimezone().strftime("%Y-%m-%d %H:%M%z"), | |
| "Last-Translator": "FULL NAME <EMAIL@ADDRESS>", |
Alternatively, since we're doing this, may as well make the example email RFC 2606-compliant
| "POT-Creation-Date": datetime.now().strftime("%Y-%m-%d %H:%M%z"), | |
| "Last-Translator": "Vacanza Team <dr.prodigy.github@gmail.com>", | |
| "POT-Creation-Date": datetime.now().astimezone().strftime("%Y-%m-%d %H:%M%z"), | |
| "Last-Translator": "FULL NAME <EMAIL@EXAMPLE.COM>", |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded version string "Lingva 5.0.5" in the metadata is a maintainability issue. This value will become outdated if the lingva library is updated. Consider dynamically retrieving the lingva version from the package or removing this field if not essential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pareshjoshij Let's remove "Generated-By": "Lingva 5.0.5", line, it should be automatically added by Lingva anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should disable l10n location inclusion in the .po file
| allow_empty=True, | |
| location=False, | |
| allow_empty=True, |
With this (and my other comments), it should now at least work as a proof-of-concept, though I can get it to generate for non-default language yet
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description line generation has a specific hardcoded pattern where it strips " holidays" from the end of the first line (line 143). This assumes a specific docstring format. If the docstring doesn't follow this format exactly, the logic may produce unexpected results. Consider documenting this expectation or making the parsing more flexible.
| first_line = entity_docstring.strip().split("\n")[0].strip().rstrip(".") | |
| if first_line.endswith(" holidays"): | |
| clean_name = first_line[:-9].strip() | |
| else: | |
| clean_name = first_line | |
| # Use the first non-empty line of the docstring as a short description. | |
| # Commonly, entity docstrings follow the "<entity> holidays." pattern. | |
| # In that case we strip the trailing "holidays"/"holiday" keyword here | |
| # to avoid duplicating it when building `desc_line` below. If the | |
| # docstring uses a different format, we fall back to the first | |
| # sentence unchanged. | |
| first_line = entity_docstring.strip().split("\n", 1)[0].strip() | |
| # Only consider the first sentence to keep the description concise. | |
| first_sentence = first_line.split(".", 1)[0].strip() | |
| lowered = first_sentence.lower() | |
| if lowered.endswith(" holidays"): | |
| clean_name = first_sentence[: -len(" holidays")].strip() | |
| elif lowered.endswith(" holiday"): | |
| clean_name = first_sentence[: -len(" holiday")].strip() | |
| else: | |
| clean_name = first_sentence |
PPsyrius marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if not has_content_changed: | |
| if po_path.exists(): | |
| content = po_path.read_text(encoding="utf-8") | |
| if "Authors: Vacanza Team" not in content: | |
| new_parts = [] | |
| if license_header: | |
| new_parts.append(license_header) | |
| if desc_line and desc_line not in content: | |
| new_parts.append(desc_line) | |
| if new_parts: | |
| new_parts.append("#") | |
| final_content = "\n".join(new_parts) + "\n" + content | |
| if final_content.strip() != content.strip(): | |
| po_path.write_text(final_content, encoding="utf-8") | |
| return | |
| if not has_content_changed: | |
| if po_path.exists(): | |
| content = po_path.read_text(encoding="utf-8") | |
| content = POGenerator._strip_gettext_boilerplate(content) | |
| if "Authors: Vacanza Team" not in content: | |
| new_parts = [] | |
| if license_header: | |
| new_parts.extend(license_header.rstrip("\n").splitlines()) | |
| new_parts.append("#") | |
| if desc_line and desc_line not in content: | |
| new_parts.append(desc_line) | |
| if new_parts: | |
| new_parts.append("#") | |
| final_content = "\n".join(new_parts) + "\n" + content | |
| if final_content.strip() != content.strip(): | |
| po_path.write_text(final_content, encoding="utf-8") | |
| return |
This and Lingva's boiler plate stripper:
@staticmethod
def _strip_gettext_boilerplate(content: str) -> str:
if content.startswith("# SOME DESCRIPTIVE TITLE"):
return content.split("#, fuzzy", 1)[1].lstrip()
return content.lstrip()
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same timestamp format issue exists here. The '%z' format specifier may produce an empty string on some platforms where timezone information is not available. This could lead to inconsistent or invalid POT-Creation-Date values.
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metadata update only adds missing fields but never updates existing ones with potentially stale values. When content changes, fields like 'POT-Creation-Date', 'Generated-By', or 'Report-Msgid-Bugs-To' should be updated even if they already exist in the metadata. This could lead to inconsistent or outdated metadata across PO files.
| if k not in po_file.metadata: | |
| po_file.metadata[k] = v | |
| po_file.metadata[k] = v |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check 'if "Authors: Vacanza Team" not in content' is used as a proxy to determine whether the license header already exists. This is fragile because it only checks for one specific string within the header. If the header format changes or if this string appears elsewhere in the file, the logic will break. Consider using a more robust marker or checking for the complete header structure.
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for determining whether to add headers is inconsistent. In the no-change path (lines 154-169), both license header and desc_line are checked before adding. In the changed path (lines 193-198), an extra condition checks if content doesn't start with "#". This inconsistency could lead to different header formatting between files that have changed and those that haven't.
| if not content.startswith("#"): | |
| new_parts.append("#") | |
| new_parts.append("#") |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImportError is caught but silently ignored without any logging. This could hide genuine import problems and make debugging difficult. Consider adding at least a debug log message when a module fails to import so that issues can be identified during development.
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class name matching algorithm is case-insensitive and removes underscores from the filename, but doesn't account for potential edge cases. For a file named "united_states.py", target_name becomes "unitedstates", but if the class is named "UnitedStates", this will match. However, if there are naming variations or special characters, this logic may fail. Consider more robust matching strategies.
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fallback logic for choosing the "best" class is flawed. Sorting by docstring length may select a less relevant class over the primary one. For example, if a file contains a main holiday class with a short docstring and helper classes with longer docstrings, the wrong class could be selected. Consider using additional heuristics such as class inheritance depth or checking if the class directly implements certain methods.
| candidates.sort(key=lambda c: len(c.__doc__ or ""), reverse=True) | |
| def _class_selection_key(cls) -> tuple[int, int]: | |
| """ | |
| Heuristic for selecting the most relevant HolidayBase subclass | |
| when multiple candidates exist in a module. | |
| Higher score is better; we invert it for use in sort(). | |
| """ | |
| score = 0 | |
| # Prefer classes that implement the core population logic themselves. | |
| if "_populate" in cls.__dict__: | |
| score += 2 | |
| # Prefer classes that directly declare identifying attributes. | |
| if "country" in cls.__dict__ or "market" in cls.__dict__: | |
| score += 1 | |
| doc_len = len(cls.__doc__ or "") | |
| # sort() is ascending, so use negatives to put best candidates first. | |
| return (-score, -doc_len) | |
| candidates.sort(key=_class_selection_key) |
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the first line not getting 2 space & simplify this up a bit
