-
Notifications
You must be signed in to change notification settings - Fork 30
[Fix] Update RFC validation to include legacy alphabet #143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @abarone-btf, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue where certain valid Mexican RFCs were failing validation due to differences in the checksum calculation alphabet. The changes introduce a dual-validation approach, first attempting to validate RFCs using the official SAT alphabet, and then falling back to a legacy Base36 alphabet if the initial check fails. This ensures that both modern and historically generated RFCs are correctly recognized as valid. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a fallback mechanism for RFC checksum validation to support legacy RFCs generated with a Base36 alphabet. The implementation is clean, using a helper function to avoid code duplication. I've added one suggestion to refactor the new helper function into a pure function, which will improve its modularity and testability. Otherwise, the changes look good and the new test case is a valuable addition.
| it('validate:SOTO800101110', () => { | ||
| const result = validate('SOTO800101110'); | ||
|
|
||
| expect(result.isValid && result.compact).toEqual('SOTO800101110'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This combined assertion result.isValid && result.compact can lead to confusing test failure messages. If result.isValid is false, the expression evaluates to false, and the test fails with expected false to equal 'SOTO800101110', which doesn't clearly indicate the root cause. Asserting on an object containing the properties you want to check provides much clearer output on failure, making tests easier to debug.
| expect(result.isValid && result.compact).toEqual('SOTO800101110'); | |
| expect({ isValid: result.isValid, compact: result.compact }).toEqual({ isValid: true, compact: 'SOTO800101110' }); |
| const calculateChecksum = (alphabet: string) => { | ||
| const sum = weightedSum(paddedInput, { | ||
| modulus: 11, | ||
| alphabet: alphabet, | ||
| weights: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], | ||
| reverse: true, | ||
| }); | ||
|
|
||
| const mod = 11 - (sum % 11); | ||
| if (mod === 11) return '0'; | ||
| if (mod === 10) return 'A'; | ||
| return String(mod); | ||
| }; | ||
|
|
||
| // Try with official SAT alphabet first | ||
| const valOfficial = calculateChecksum(checkAlphabet); | ||
|
|
||
| if (check !== valOfficial) { | ||
| // If it fails, try with Legacy alphabet (Base 36) | ||
| // This handles older RFCs generated without '&' or 'Ñ' support | ||
| const valLegacy = calculateChecksum(checkAlphabetLegacy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculateChecksum function is a great improvement for reducing code duplication. To make it a pure function, which improves readability and prepares it for potential extraction into a standalone testable utility, consider passing paddedInput as an argument instead of capturing it from the outer scope.
const calculateChecksum = (input: string, alphabet: string) => {
const sum = weightedSum(input, {
modulus: 11,
alphabet: alphabet,
weights: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
reverse: true,
});
const mod = 11 - (sum % 11);
if (mod === 11) return '0';
if (mod === 10) return 'A';
return String(mod);
};
// Try with official SAT alphabet first
const valOfficial = calculateChecksum(paddedInput, checkAlphabet);
if (check !== valOfficial) {
// If it fails, try with Legacy alphabet (Base 36)
// This handles older RFCs generated without '&' or 'Ñ' support
const valLegacy = calculateChecksum(paddedInput, checkAlphabetLegacy);|
Hi! @koblas When you get a chance, could you give this PR a quick look? 🙏 |
| 'RUIN', | ||
| ]); | ||
|
|
||
| // Official alphabet per SAT (Anexo 20). Includes '&' and 'Ñ'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking about this, unless there is a really good reason there should be two distinct validators for efc and rfcLegacy or similar names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reply! @koblas . But making all the consumers knows which one to call?, isnt that more invasive? (because they are "just" rfc). Like:
1 - Consumer would need to apply a logic to know if the rfc is "legacy" or not, then call the correct function
or
2 - Consumer would call rfc, and if it fails then call rfcLegacy to check again (not as ugly imo, but more logic to the consumer)
Open to the conversation, whatever you preferred I can make the change to fit that 🙌
The Issue
Some valid, real-world Mexican RFCs are currently failing validation with an InvalidChecksum error.
Hypothesis for the cause
The library strictly enforces the official SAT algorithm, which uses an extended alphabet containing & and Ñ. However, historically, many legacy systems generated RFC check digits using a standard Base36 alphabet (0-9, A-Z) without these special characters. This results in a mathematical mismatch for valid IDs.
Solution
I implemented a fallback strategy for the checksum calculation:
Primary Check: Attempt validation using the Official SAT Alphabet (Strict).
Fallback Check: If the primary check fails, retry using the Legacy Alphabet (Base36).
Test Case
Added a unit test with
SOTO800101110, which is mathematically invalid under the Strict rule but valid under the Legacy rule (and passes the regex structure check).@koblas 🙏