fix(backend): resolve OCRService strict base64 line break rejection and CPU-blocking DoS #794

Open
harshitanagpal05 wants to merge 8 commits into ritesh-1918:gssoc from harshitanagpal05:fix/ocr-service-base64
Conversation

@harshitanagpal05
Contributor

Summary

This PR fixes two problems in the OCR service: the base64 input validation bug that rejected valid payloads containing line breaks, and the CPU-exhaustion (DoS) vulnerability caused by the blocking validation loop.

Proposed Changes

  • Optimized Regex Lookup: Replaced the synchronous, CPU-blocking per-character loop in OCRService.extract_text with a precompiled regular expression match, which runs in C and accepts standard whitespace and line wraps (\n, \r).
  • MIME Base64 Normalization: Strips internal line breaks and whitespace before computing size bounds and decoding, so line-wrapped payloads from browser and mobile encoders are accepted.
  • Asynchronous Unit Test Refactoring: Rewrote test_ocr_service.py to be fully asynchronous using pytest-anyio.
  • Pillow Mocking: Patched PIL.Image.open in the unit tests to return a mock Image with custom bounds, so that image-format exceptions do not fail the tests.

Closes bug(backend): strict all() base64 checking in OCRService rejects valid base64 strings with newlines and causes CPU-blocking DoS #793
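
For readers without the diff at hand, the first two changes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `normalize_base64` and `decode_image_payload`, the `max_bytes` default, and the exact pattern are assumptions made for the example.

```python
import base64
import re

# Precompiled once at import time. Both patterns are illustrative assumptions,
# not the ones used in OCRService.
_WHITESPACE_RE = re.compile(r"\s+")
_BASE64_RE = re.compile(r"[A-Za-z0-9+/]*={0,2}")


def normalize_base64(payload: str) -> str:
    """Strip line wraps and whitespace, then validate the base64 alphabet."""
    compact = _WHITESPACE_RE.sub("", payload)
    # One regex pass replaces a Python-level all(...) loop over every character.
    if not compact or len(compact) % 4 != 0 or not _BASE64_RE.fullmatch(compact):
        raise ValueError("payload is not valid base64")
    return compact


def decode_image_payload(payload: str, max_bytes: int = 10 * 1024 * 1024) -> bytes:
    """Check the decoded size bound on the normalized string before decoding."""
    compact = normalize_base64(payload)
    padding = len(compact) - len(compact.rstrip("="))
    decoded_size = (len(compact) // 4) * 3 - padding
    if decoded_size > max_bytes:
        raise ValueError("payload exceeds the size limit")
    return base64.b64decode(compact, validate=True)
```

Computing the size bound after normalization matters: if the bound were computed on the raw string, inserted line breaks would inflate the apparent length of an otherwise acceptable payload.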

Verification Results

All 8 async unit tests pass cleanly:

backend\tests\test_ocr_service.py ........                               [100%]
======================== 8 passed in 60.26s (0:01:00) =========================
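
The mocking approach behind these tests might look roughly like the sketch below. It is an assumption-laden illustration rather than the contents of test_ocr_service.py: `read_image_size` is a hypothetical stand-in for the service code, the test is driven with `asyncio.run` instead of pytest-anyio, and fake `PIL` modules are injected through `sys.modules` so the example runs without Pillow installed. With Pillow available, `unittest.mock.patch("PIL.Image.open", ...)` is the more direct route.

```python
import asyncio
import io
import sys
import types
from unittest.mock import MagicMock, patch


async def read_image_size(raw: bytes) -> tuple:
    """Hypothetical stand-in for service code that opens bytes with Pillow."""
    from PIL import Image  # imported at call time, so the patched module is used

    img = Image.open(io.BytesIO(raw))
    return img.size


def make_fake_pil(width: int, height: int):
    """Build fake PIL modules whose Image.open returns a mock with fixed bounds."""
    fake_image = MagicMock(name="Image")
    fake_image.size = (width, height)
    image_module = types.ModuleType("PIL.Image")
    image_module.open = MagicMock(return_value=fake_image)
    pil_package = types.ModuleType("PIL")
    pil_package.Image = image_module
    return pil_package, image_module


async def check_size_comes_from_mock() -> tuple:
    pil_package, image_module = make_fake_pil(640, 480)
    fake_modules = {"PIL": pil_package, "PIL.Image": image_module}
    with patch.dict(sys.modules, fake_modules):
        # The bytes are not a real image; the mock keeps format errors out of the test.
        size = await read_image_size(b"not a real image")
    image_module.open.assert_called_once()
    return size


result = asyncio.run(check_size_comes_from_mock())
```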

@vercel

vercel Bot commented May 30, 2026

@harshitanagpal05 is attempting to deploy a commit to the ritesh Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented May 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 11e8766f-3a49-4978-a658-181dfc3983b0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@ritesh-1918 added the labels gssoc (GirlScript Summer of Code), gssoc:approved (GSSoC Approved PR), level:critical (Critical level difficulty), quality:exceptional (Exceptional code quality), and type:bug (Bug fix) on May 31, 2026
@ritesh-1918
Owner

Superb implementation, @harshitanagpal05! I've successfully resolved all conflicts in your PR and queued it for merging into gssoc.

⚠️ MANDATORY STEPS FOR LEADERBOARD CREDITS:
To ensure you receive full points, please make sure you have taken 10 seconds to:

Keep up the outstanding work! Let's build together! 🔥

