Refactor PO generator for robust metadata and improved class discovery #3204

pareshjoshij wants to merge 5 commits into vacanza:dev from
Conversation
Pull request overview
This PR refactors the generate_po_files.py script to enhance the localization workflow with standardized metadata, license headers, and improved class discovery. The changes aim to improve compatibility with translation tools and reduce git diff noise.
Key changes:

- Added standard gettext headers (POT-Creation-Date, MIME-Version, Report-Msgid-Bugs-To) to generated PO files
- Implemented license header injection from docs/file_header.txt
- Enhanced class discovery logic with name matching and fallback selection based on docstring length
- Improved timestamp handling to minimize unnecessary updates when content hasn't changed
```python
if k not in po_file.metadata:
    po_file.metadata[k] = v
```
The metadata update only adds missing fields but never updates existing ones with potentially stale values. When content changes, fields like 'POT-Creation-Date', 'Generated-By', or 'Report-Msgid-Bugs-To' should be updated even if they already exist in the metadata. This could lead to inconsistent or outdated metadata across PO files.
Suggested change:

```python
po_file.metadata[k] = v
```
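A middle ground between the two behaviors discussed above is to fill in missing fields while always refreshing the ones expected to change between runs. A minimal sketch; the helper name and the exact field list are illustrative, not part of the PR:

```python
# Hypothetical helper: keep translator-maintained fields untouched, but
# always refresh fields whose values are expected to change between runs.
VOLATILE_FIELDS = {"POT-Creation-Date", "Generated-By", "Report-Msgid-Bugs-To"}

def merge_metadata(existing: dict, defaults: dict) -> dict:
    merged = dict(existing)
    for k, v in defaults.items():
        # Add the field if missing, or overwrite it if it is volatile.
        if k not in merged or k in VOLATILE_FIELDS:
            merged[k] = v
    return merged
```

With this, an existing `Language` entry survives regeneration while `POT-Creation-Date` is always brought up to date.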
```python
"MIME-Version": "1.0",
"Content-Type": "text/plain; charset=UTF-8",
"Content-Transfer-Encoding": "8bit",
"Generated-By": "Lingva 5.0.5",
```
The hardcoded version string "Lingva 5.0.5" in the metadata is a maintainability issue. This value will become outdated if the lingva library is updated. Consider dynamically retrieving the lingva version from the package or removing this field if not essential.
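If the field were kept, one way to avoid the hardcoded version is to look it up at runtime with the stdlib `importlib.metadata`, assuming `lingva` is installed as a distribution of that name:

```python
from importlib.metadata import PackageNotFoundError, version

# Look up the installed lingva version dynamically instead of hardcoding it.
try:
    lingva_version = version("lingva")
except PackageNotFoundError:
    # Fall back gracefully when the package is not installed.
    lingva_version = "unknown"

generated_by = f"Lingva {lingva_version}"
```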
@pareshjoshij Let's remove the `"Generated-By": "Lingva 5.0.5",` line; it should be added automatically by Lingva anyway.
```python
if not content.startswith("#"):
    new_parts.append("#")
```
The logic for determining whether to add headers is inconsistent. In the no-change path (lines 154-169), both license header and desc_line are checked before adding. In the changed path (lines 193-198), an extra condition checks if content doesn't start with "#". This inconsistency could lead to different header formatting between files that have changed and those that haven't.
Suggested change:

```python
new_parts.append("#")
```
```python
try:
    mod = importlib.import_module(module)
except ImportError:
    continue
```
ImportError is caught but silently ignored without any logging. This could hide genuine import problems and make debugging difficult. Consider adding at least a debug log message when a module fails to import so that issues can be identified during development.
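A minimal sketch of the suggested logging; the function name and logger setup are illustrative, not from the PR:

```python
import importlib
import logging

logger = logging.getLogger(__name__)

def iter_importable(module_names):
    """Yield each module that imports cleanly, logging the ones that don't."""
    for name in module_names:
        try:
            yield importlib.import_module(name)
        except ImportError as exc:
            # Keep going, but leave a trace for debugging instead of
            # silently swallowing the failure.
            logger.debug("Could not import %s: %s", name, exc)
```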
```python
content = po_path.read_text(encoding="utf-8")
new_parts = []

if license_header and "Authors: Vacanza Team" not in content:
```
The check 'if "Authors: Vacanza Team" not in content' is used as a proxy to determine whether the license header already exists. This is fragile because it only checks for one specific string within the header. If the header format changes or if this string appears elsewhere in the file, the logic will break. Consider using a more robust marker or checking for the complete header structure.
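One more robust option for the check discussed above, sketched with a hypothetical explicit sentinel line (not the project's actual header text) that is only searched for inside the leading comment block:

```python
# Hypothetical sentinel line appended to the injected license header.
LICENSE_MARKER = "# vacanza-license-header"

def has_license_header(content: str) -> bool:
    # Scan only the leading comment block, so the marker text appearing
    # later in the file (e.g. inside a msgid) cannot cause a false positive.
    for line in content.splitlines():
        if not line.startswith("#"):
            return False
        if line.strip() == LICENSE_MARKER:
            return True
    return False
```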
```python
"POT-Creation-Date": datetime.now().strftime("%Y-%m-%d %H:%M%z"),
"Last-Translator": "Vacanza Team <dr.prodigy.github@gmail.com>",
```
The timestamp format string '%z' may produce an empty string on some platforms where timezone information is not available. Consider using a fixed timezone or handling the case where timezone offset is not available to ensure consistent output across different environments.
Suggested change:

```python
"POT-Creation-Date": datetime.now().astimezone().strftime("%Y-%m-%d %H:%M%z"),
"Last-Translator": "Vacanza Team <dr-prodigy.github@gmail.com>",
```
I did double-check and the bot's astimezone() inclusion is correct, but let's revert to the placeholder name instead:
Suggested change:

```python
"POT-Creation-Date": datetime.now().astimezone().strftime("%Y-%m-%d %H:%M%z"),
"Last-Translator": "FULL NAME <EMAIL@ADDRESS>",
```
Alternatively, since we're doing this, may as well make the example email RFC 2606-compliant
Suggested change:

```python
"POT-Creation-Date": datetime.now().astimezone().strftime("%Y-%m-%d %H:%M%z"),
"Last-Translator": "FULL NAME <EMAIL@EXAMPLE.COM>",
```
```python
po_path.write_text(final_content, encoding="utf-8")
return

timestamp = datetime.now().strftime("%Y-%m-%d %H:%M%z")
```
The same timestamp format issue exists here. The '%z' format specifier may produce an empty string on some platforms where timezone information is not available. This could lead to inconsistent or invalid POT-Creation-Date values.
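A quick illustration of the `%z` behavior: on a naive `datetime` the specifier renders as an empty string, while an aware one (which `astimezone()` would produce) carries its offset:

```python
from datetime import datetime, timezone

naive = datetime(2024, 1, 2, 3, 4)           # no tzinfo attached
print(naive.strftime("%Y-%m-%d %H:%M%z"))    # "%z" renders as an empty string

aware = naive.replace(tzinfo=timezone.utc)   # an aware datetime, as from astimezone()
print(aware.strftime("%Y-%m-%d %H:%M%z"))    # prints "2024-01-02 03:04+0000"
```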
```python
all_po_update_tasks: list[tuple[str, str]] = []
```

```python
if not chosen_cls:
    candidates.sort(key=lambda c: len(c.__doc__ or ""), reverse=True)
```
The fallback logic for choosing the "best" class is flawed. Sorting by docstring length may select a less relevant class over the primary one. For example, if a file contains a main holiday class with a short docstring and helper classes with longer docstrings, the wrong class could be selected. Consider using additional heuristics such as class inheritance depth or checking if the class directly implements certain methods.
Suggested change:

```python
def _class_selection_key(cls) -> tuple[int, int]:
    """
    Heuristic for selecting the most relevant HolidayBase subclass
    when multiple candidates exist in a module.

    Higher score is better; we invert it for use in sort().
    """
    score = 0
    # Prefer classes that implement the core population logic themselves.
    if "_populate" in cls.__dict__:
        score += 2
    # Prefer classes that directly declare identifying attributes.
    if "country" in cls.__dict__ or "market" in cls.__dict__:
        score += 1
    doc_len = len(cls.__doc__ or "")
    # sort() is ascending, so use negatives to put best candidates first.
    return (-score, -doc_len)

candidates.sort(key=_class_selection_key)
```
```python
target_name = path.stem.replace("_", "").lower()

for cls in candidates:
    if cls.__name__.lower() == target_name:
```
The class name matching algorithm is case-insensitive and removes underscores from the filename, but doesn't account for potential edge cases. For a file named "united_states.py", target_name becomes "unitedstates", but if the class is named "UnitedStates", this will match. However, if there are naming variations or special characters, this logic may fail. Consider more robust matching strategies.
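One way to harden the matching, sketched as a hypothetical normalization helper (not in the PR):

```python
import re

def normalize(name: str) -> str:
    # Lowercase and drop every character that is not a letter or digit, so
    # "united_states", "UnitedStates" and "United-States" all compare equal.
    return re.sub(r"[^a-z0-9]", "", name.lower())
```

With this, comparing `normalize(path.stem)` against `normalize(cls.__name__)` tolerates underscores, hyphens, and case differences alike.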
```python
first_line = entity_docstring.strip().split("\n")[0].strip().rstrip(".")
if first_line.endswith(" holidays"):
    clean_name = first_line[:-9].strip()
else:
    clean_name = first_line
```
The description line generation has a specific hardcoded pattern where it strips " holidays" from the end of the first line (line 143). This assumes a specific docstring format. If the docstring doesn't follow this format exactly, the logic may produce unexpected results. Consider documenting this expectation or making the parsing more flexible.
Suggested change:

```python
# Use the first non-empty line of the docstring as a short description.
# Commonly, entity docstrings follow the "<entity> holidays." pattern.
# In that case we strip the trailing "holidays"/"holiday" keyword here
# to avoid duplicating it when building `desc_line` below. If the
# docstring uses a different format, we fall back to the first
# sentence unchanged.
first_line = entity_docstring.strip().split("\n", 1)[0].strip()
# Only consider the first sentence to keep the description concise.
first_sentence = first_line.split(".", 1)[0].strip()
lowered = first_sentence.lower()
if lowered.endswith(" holidays"):
    clean_name = first_sentence[: -len(" holidays")].strip()
elif lowered.endswith(" holiday"):
    clean_name = first_sentence[: -len(" holiday")].strip()
else:
    clean_name = first_sentence
```
I see the automated review suggestions (regarding robustness, timestamps, etc.) and will definitely address them in the next revision. However, before I apply those fixes and refactor scripts/l10n/l10n_helper.py, could you confirm you are happy with this overall approach for headers and metadata? I want to ensure the strategy is sound before doing the final integration.
The code looks overcomplicated.
I've included most of the formatting fixes, but I haven't figured out how this PR accidentally adds the l10n location trackers back in again after we removed them a few years back, e.g.:
```po
#. Monday following %s.
#: ./holidays/countries/spain.py:105
#, c-format
msgid "Lunes siguiente a %s"
msgstr ""

#. New Year's Day.
#: ./holidays/countries/spain.py:172 ./holidays/countries/spain.py:215
#: ./holidays/countries/spain.py:249 ./holidays/countries/spain.py:283
#: ./holidays/countries/spain.py:371 ./holidays/countries/spain.py:426
#: ./holidays/countries/spain.py:564 ./holidays/countries/spain.py:671
#: ./holidays/countries/spain.py:754
msgid "Año Nuevo"
msgstr ""
```

```python
"MIME-Version": "1.0",
"Content-Type": "text/plain; charset=UTF-8",
"Content-Transfer-Encoding": "8bit",
"Generated-By": "Lingva 5.0.5",
```
@pareshjoshij Let's remove the `"Generated-By": "Lingva 5.0.5",` line; it should be added automatically by Lingva anyway.
```python
def _get_standard_metadata(default_language: str = "en_US") -> dict:
    """Returns the standard metadata required for gettext."""
    return {
        "Report-Msgid-Bugs-To": "dr-prodigy@users.noreply.github.com",
```
Suggested change (remove this line):

```python
"Report-Msgid-Bugs-To": "dr-prodigy@users.noreply.github.com",
```
This wasn't included in any existing l10n files AFAIK; let's remove it for now.
TODO: Reminder for ME:

```python
"Report-Msgid-Bugs-To: l10n@vacanza.dev\n"
```
```python
content = HEADER_PATH.read_text(encoding="utf-8").strip()
if not content:
    return ""

lines = []
for line in content.splitlines():
    line = line.rstrip()
    if not line:
        lines.append("#")
    elif line.startswith("#"):
        lines.append(line)
    else:
        lines.append(f"# {line}")

return "\n".join(lines) + "\n"
Fix the first line not getting 2 spaces and simplify this up a bit:
Suggested change:

```python
content = HEADER_PATH.read_text(encoding="utf-8").rstrip("\n")
if not content:
    return ""

return "\n".join(
    "#" if not line.rstrip() else f"# {line.rstrip()}"
    for line in content.splitlines()
) + "\n"
```
```python
if not has_content_changed:
    if po_path.exists():
        content = po_path.read_text(encoding="utf-8")
        if "Authors: Vacanza Team" not in content:
            new_parts = []
            if license_header:
                new_parts.append(license_header)
            if desc_line and desc_line not in content:
                new_parts.append(desc_line)

            if new_parts:
                new_parts.append("#")
                final_content = "\n".join(new_parts) + "\n" + content
                if final_content.strip() != content.strip():
                    po_path.write_text(final_content, encoding="utf-8")
    return
```
Suggested change:

```python
if not has_content_changed:
    if po_path.exists():
        content = po_path.read_text(encoding="utf-8")
        content = POGenerator._strip_gettext_boilerplate(content)
        if "Authors: Vacanza Team" not in content:
            new_parts = []
            if license_header:
                new_parts.extend(license_header.rstrip("\n").splitlines())
                new_parts.append("#")
            if desc_line and desc_line not in content:
                new_parts.append(desc_line)
            if new_parts:
                new_parts.append("#")
                final_content = "\n".join(new_parts) + "\n" + content
                if final_content.strip() != content.strip():
                    po_path.write_text(final_content, encoding="utf-8")
    return
```
This, and Lingva's boilerplate stripper:

```python
@staticmethod
def _strip_gettext_boilerplate(content: str) -> str:
    if content.startswith("# SOME DESCRIPTIVE TITLE"):
        return content.split("#, fuzzy", 1)[1].lstrip()
    return content.lstrip()
```

```diff
@@ -58,50 +95,107 @@ def _process_entity_worker(
     allow_empty=True,
```
This should disable l10n location inclusion in the .po file
Suggested change:

```python
location=False,
allow_empty=True,
```
With this (and my other comments), it should now at least work as a proof-of-concept, though I can't get it to generate for non-default languages yet.
Here's the updated table of mismatching country names from the new implementation (read from individual

*Maybe
I will wait for the next revision for my review. :)
Would it be acceptable if I simply updated the docstrings in the actual country/market files to match the preferred names (the ones with ✅)?
@PPsyrius Thank you so much for the incredibly detailed review! ❤️ I really respect the time and effort you put into guiding me through this. Your suggestions gave me exactly the path I was looking for to make this easier and less complex. I am currently focused on my WOC task in this repo, so I might not update this right this second. However, I will definitely carve out some time to address all your suggestions very soon. Thanks again! 🚀
Proposed change

Closes #3180

This PR refactors the `generate_po_files.py` script to significantly improve the robustness and standardization of our localization workflow.

- Added `POT-Creation-Date`, `MIME-Version`, `Report-Msgid-Bugs-To`, and other standard gettext headers to ensure better compatibility with translation tools (like Weblate/Crowdin).
- Injected the license header from `docs/file_header.txt` into generated `.po` files.
- Added a description line (e.g. `# United States holidays.`) to the `.po` files.

Testing & Notes:

- Tested by modifying `holidays/countries/india.py` (adding/changing holidays) and creating dummy blank `.po` files.
- Ran `make check` locally and all checks passed.
- This likely affects `scripts/l10n/l10n_helper.py`; I plan to make necessary changes there accordingly.

Type of change

- Enhancement (improves `holidays` functionality in general)

Checklist

- [x] Ran `make check` locally; all checks and tests passed.