feat: implement atomic writes, locking, and PII redaction for AI corrections (closes #368) #853

Open
singhanurag0317-bit wants to merge 1 commit into ritesh-1918:main from singhanurag0317-bit:feat/atomic-corrections-368

Conversation

@singhanurag0317-bit

Description

This pull request addresses Issue #368 ("Corrections log writes are non-atomic (race leads to corrupted training/audit data)"). It adds a cross-process file lock, atomic writes, PII redaction, and structured rotating logging to the corrections logging flow, so that concurrent writes no longer corrupt the log.

Key Enhancements

  1. Directory-Based Cross-Process File Lock: Uses os.mkdir (directory creation is atomic: it either succeeds or fails because the directory already exists) to implement a lock context manager that is safe across processes and threads and needs no third-party dependencies.
  2. Atomic Temp + Rename Write Strategy: Prevents data corruption and interleaved concurrent modifications by writing the updated log array to a temporary file (.json.tmp) and atomically replacing the target log file (corrections_log.json).
  3. Structured Logging with Rotation: Adds a dedicated structured logger with a RotatingFileHandler that writes JSON Lines (JSONL) to data/corrections_structured.log (1MB limit, 5 backups) for auditing.
  4. PII Redaction Layer: Applies regex redaction for common email and telephone number patterns to the raw fields (original_text and ocr_text), so that values matching these patterns are not written to the logs.
  5. FastAPI Compilation Fix: Resolved a pre-existing NameError in type hints by importing Response from fastapi.
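
The directory-based lock in item 1 can be sketched as follows. This is a minimal illustration and not the PR's actual code: the name dir_lock and the timeout and poll_interval parameters are assumptions made for the example.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def dir_lock(lock_dir, timeout=10.0, poll_interval=0.05):
    """Hold an exclusive lock represented by the existence of lock_dir.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # os.mkdir raises FileExistsError if another holder created it first.
            os.mkdir(lock_dir)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"could not acquire {lock_dir} within {timeout}s"
                )
            time.sleep(poll_interval)
    try:
        yield
    finally:
        # Release the lock by removing the directory.
        os.rmdir(lock_dir)
```

A caller would wrap the read-modify-write of the log in `with dir_lock("data/corrections_log.lock"):` (the path here is hypothetical). One limitation of this kind of lock is that a process that dies while holding it leaves the directory behind, so later callers time out until it is removed.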

Verification

  • Confirmed error-free compilation of backend/main.py.
  • Wrote and ran a test script, scratch/test_concurrency_corrections.py, which spawns 20 parallel threads that write log entries concurrently, and verified:
    • The log file remains valid JSON with no corrupted entries.
    • Consistent ordering and no lost updates (all 20 entries written).
    • Emails and phone numbers are redacted.
    • Rotated structured logs are written correctly.
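
A check along the lines described above might look like the sketch below. It is not the contents of scratch/test_concurrency_corrections.py: log_correction is a hypothetical stand-in for the PR's writer, and run_concurrency_check is an illustrative name.

```python
import json
import os
import tempfile
import threading
import time


def log_correction(log_path, entry, timeout=10.0):
    """Hypothetical stand-in writer: mkdir lock, then temp file + os.replace."""
    lock_dir = log_path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.mkdir(lock_dir)  # acquire: only one caller can create the directory
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("lock not acquired")
            time.sleep(0.005)
    try:
        entries = []
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as f:
                entries = json.load(f)
        entries.append(entry)
        tmp_path = log_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(entries, f)
        os.replace(tmp_path, log_path)  # swap the new file into place
    finally:
        os.rmdir(lock_dir)  # release


def run_concurrency_check(n_threads=20):
    """Write one entry per thread and return the sorted ids found in the log."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "corrections_log.json")
        threads = [
            threading.Thread(target=log_correction, args=(log_path, {"id": i}))
            for i in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(log_path, encoding="utf-8") as f:
            entries = json.load(f)  # raises if the file is not valid JSON
    return sorted(e["id"] for e in entries)
```

The ids are sorted before comparison because the order in which threads acquire the lock is not deterministic. A lost update would show up as a missing id, and a corrupted file would fail in json.load.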

@vercel

vercel Bot commented May 31, 2026

@singhanurag0317-bit is attempting to deploy a commit to the ritesh Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented May 31, 2026

Warning

Review limit reached

@singhanurag0317-bit, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 19 minutes and 7 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e39a4274-a541-4076-bbdc-c5aaa12a8877

📥 Commits

Reviewing files that changed from the base of the PR and between da8faf2 and b200c81.

📒 Files selected for processing (2)
  • backend/main.py
  • scratch/test_concurrency_corrections.py

Comment thread backend/main.py
# Phone number pattern (10-digit numbers with an optional 1-3 digit country code)
phone_pattern = r'\b(?:\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b'

redacted = re.sub(email_pattern, "[EMAIL_REDACTED]", text)
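
For context, the excerpt above could sit inside a helper like the following sketch. The phone pattern is copied from the excerpt. The email pattern, the function name redact_pii, and the "[PHONE_REDACTED]" placeholder are assumptions, since the excerpt does not show them.

```python
import re

# Assumed email pattern; the PR's own email_pattern is not shown in the excerpt.
EMAIL_PATTERN = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
# Phone pattern copied from the excerpt above.
PHONE_PATTERN = r'\b(?:\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b'


def redact_pii(text):
    """Replace email addresses and phone numbers with placeholder tokens."""
    if not text:
        return text
    redacted = re.sub(EMAIL_PATTERN, "[EMAIL_REDACTED]", text)
    # "[PHONE_REDACTED]" is an assumed placeholder, mirroring the email one.
    redacted = re.sub(PHONE_PATTERN, "[PHONE_REDACTED]", redacted)
    return redacted


print(redact_pii("Contact jane.doe@example.com or 555-123-4567 today"))
# → Contact [EMAIL_REDACTED] or [PHONE_REDACTED] today
```

Because the phone pattern starts with `\b`, a leading "+" or "(" falls outside the match and stays in the output, and 7-digit local numbers are not matched at all. Both may be worth covering in tests.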
Comment thread backend/main.py
        os.remove(tmp_path)
    except Exception:
        pass
    return {"status": "error", "message": f"Failed to log correction: {str(e)}"}
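
The excerpt above is the tail of the error path. A self-contained sketch of the temp-file-plus-replace write it belongs to is shown below, assuming the log is a JSON array and the temporary file sits next to it. The function name append_entry_atomically is hypothetical, and the caller is assumed to already hold the lock.

```python
import json
import os


def append_entry_atomically(log_path, entry):
    """Append entry to the JSON array at log_path via a temp file and os.replace.

    Hypothetical helper; assumes the caller already holds the lock.
    """
    tmp_path = log_path + ".tmp"  # e.g. corrections_log.json.tmp
    try:
        entries = []
        if os.path.exists(log_path):
            with open(log_path, "r", encoding="utf-8") as f:
                entries = json.load(f)
        entries.append(entry)
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(entries, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the swap
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, log_path)
        return {"status": "success"}
    except (OSError, ValueError, TypeError) as e:
        # Best-effort cleanup of a leftover temp file.
        try:
            os.remove(tmp_path)
        except OSError:
            pass
        return {"status": "error", "message": f"Failed to log correction: {e}"}
```

Catching specific exception types for the cleanup step, rather than a bare `except Exception: pass`, keeps unexpected failures visible while still tolerating a temp file that was never created.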
@ritesh-1918
Owner

Hi @singhanurag0317-bit! 🙌

Thank you so much for your excellent contribution: "feat: implement atomic writes, locking, and PII redaction for AI corrections (closes #368)"! We really appreciate the high-quality code and effort you have put into the platform.

Just a quick, friendly heads-up as we prepare our manual merging and verification queues—please make sure to complete all the mandatory community steps listed below.

Once those manual steps are verified, we'll get your PR officially merged into the gssoc branch (or keep it neatly cataloged if closed as integrated) and assign it the highest possible GSSoC S-Tier labels to maximize your leaderboard points!

Let's build something amazing together! 🚀🔥


🌟 Community Support & Network Steps (Take 10 Seconds!)

As we prepare our manual verification and merging queues, please make sure you have taken a moment to complete these required steps to finalize your points:

  1. Star this repository: https://github.com/ritesh-1918/HELPDESK.AI (Mandatory)
  2. 🍴 Fork this repository: https://github.com/ritesh-1918/HELPDESK.AI/fork (Mandatory)
  3. 👤 Follow @ritesh-1918 on GitHub: https://github.com/ritesh-1918 (Mandatory - manual step)
  4. 💼 Connect on LinkedIn: https://www.linkedin.com/in/ritesh1908/ (Mandatory)

Note: Having these steps completed manually is required before your PR points are officially cleared.

@ritesh-1918 added the gssoc, gssoc:approved, quality:exceptional, level:intermediate, and type:feature labels on May 31, 2026