Skip to content

Conversation

@abarone-btf
Copy link

@abarone-btf abarone-btf commented Dec 3, 2025

The Issue

Some valid, real-world Mexican RFCs are currently failing validation with an InvalidChecksum error.

Hypothesis for the cause

The library strictly enforces the official SAT algorithm, which uses an extended alphabet containing & and Ñ. However, historically, many legacy systems generated RFC check digits using a standard Base36 alphabet (0-9, A-Z) without these special characters. This results in a mathematical mismatch for valid IDs.

Solution

I implemented a fallback strategy for the checksum calculation:

Primary Check: Attempt validation using the Official SAT Alphabet (Strict).

Fallback Check: If the primary check fails, retry using the Legacy Alphabet (Base36).

Test Case

Added a unit test with SOTO800101110, which is mathematically invalid under the Strict rule but valid under the Legacy rule (and passes the regex structure check).

@koblas 🙏

@gemini-code-assist
Copy link

Summary of Changes

Hello @abarone-btf, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where certain valid Mexican RFCs were failing validation due to differences in the checksum calculation alphabet. The changes introduce a dual-validation approach, first attempting to validate RFCs using the official SAT alphabet, and then falling back to a legacy Base36 alphabet if the initial check fails. This ensures that both modern and historically generated RFCs are correctly recognized as valid.

Highlights

  • Legacy RFC Validation: Implemented a fallback mechanism for Mexican RFC checksum validation to correctly process RFCs generated using older, legacy systems.
  • New Alphabet Definition: Introduced a checkAlphabetLegacy constant representing the Base36 alphabet, which excludes special characters like '&' and 'Ñ'.
  • Checksum Calculation Refactoring: Encapsulated the checksum calculation logic into a reusable calculateChecksum helper function to support validation with different alphabets.
  • New Test Case: Added a unit test for SOTO800101110, a specific RFC that is valid under the legacy rule but would fail under the strict official SAT rule, ensuring the new logic works as intended.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a fallback mechanism for RFC checksum validation to support legacy RFCs generated with a Base36 alphabet. The implementation is clean, using a helper function to avoid code duplication. I've added one suggestion to refactor the new helper function into a pure function, which will improve its modularity and testability. Otherwise, the changes look good and the new test case is a valuable addition.

it('validate:SOTO800101110', () => {
const result = validate('SOTO800101110');

expect(result.isValid && result.compact).toEqual('SOTO800101110');

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This combined assertion result.isValid && result.compact can lead to confusing test failure messages. If result.isValid is false, the expression evaluates to false, and the test fails with expected false to equal 'SOTO800101110', which doesn't clearly indicate the root cause. Asserting on an object containing the properties you want to check provides much clearer output on failure, making tests easier to debug.

Suggested change
expect(result.isValid && result.compact).toEqual('SOTO800101110');
expect({ isValid: result.isValid, compact: result.compact }).toEqual({ isValid: true, compact: 'SOTO800101110' });

Comment on lines +156 to +176
const calculateChecksum = (alphabet: string) => {
const sum = weightedSum(paddedInput, {
modulus: 11,
alphabet: alphabet,
weights: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
reverse: true,
});

const mod = 11 - (sum % 11);
if (mod === 11) return '0';
if (mod === 10) return 'A';
return String(mod);
};

// Try with official SAT alphabet first
const valOfficial = calculateChecksum(checkAlphabet);

if (check !== valOfficial) {
// If it fails, try with Legacy alphabet (Base 36)
// This handles older RFCs generated without '&' or 'Ñ' support
const valLegacy = calculateChecksum(checkAlphabetLegacy);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The calculateChecksum function is a great improvement for reducing code duplication. To make it a pure function, which improves readability and prepares it for potential extraction into a standalone testable utility, consider passing paddedInput as an argument instead of capturing it from the outer scope.

      const calculateChecksum = (input: string, alphabet: string) => {
        const sum = weightedSum(input, {
          modulus: 11,
          alphabet: alphabet,
          weights: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
          reverse: true,
        });

        const mod = 11 - (sum % 11);
        if (mod === 11) return '0';
        if (mod === 10) return 'A';
        return String(mod);
      };

      // Try with official SAT alphabet first
      const valOfficial = calculateChecksum(paddedInput, checkAlphabet);

      if (check !== valOfficial) {
        // If it fails, try with Legacy alphabet (Base 36)
        // This handles older RFCs generated without '&' or 'Ñ' support
        const valLegacy = calculateChecksum(paddedInput, checkAlphabetLegacy);

@abarone-btf
Copy link
Author

Hi! @koblas When you get a chance, could you give this PR a quick look? 🙏
Sorry for the inconvenience, really appreciate the help!

'RUIN',
]);

// Official alphabet per SAT (Anexo 20). Includes '&' and 'Ñ'.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about this, unless there is a really good reason there should be two distinct validators for efc and rfcLegacy or similar names.

Copy link
Author

@abarone-btf abarone-btf Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reply! @koblas . But making all the consumers knows which one to call?, isnt that more invasive? (because they are "just" rfc). Like:

1 - Consumer would need to apply a logic to know if the rfc is "legacy" or not, then call the correct function

or

2 - Consumer would call rfc, and if it fails then call rfcLegacy to check again (not as ugly imo, but more logic to the consumer)

Open to the conversation, whatever you preferred I can make the change to fit that 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants