Skip to content

fix: atomic cache writes + asyncio lock on CSRF refresh#256

Merged
firstof9 merged 3 commits into
firstof9:mainfrom
bharvey88:fix/cache-concurrency
May 28, 2026
Merged

fix: atomic cache writes + asyncio lock on CSRF refresh#256
firstof9 merged 3 commits into
firstof9:mainfrom
bharvey88:fix/cache-concurrency

Conversation

@bharvey88
Copy link
Copy Markdown
Contributor

@bharvey88 bharvey88 commented May 26, 2026

Summary

Two related concurrency hazards on the shared CSRF cache file:

  1. Non-atomic writes. write_cache opened the target path directly. Two coroutines racing (HA coordinator + a parallel service call) could interleave bytes into a corrupt token file. Write to a uniquely-named sibling tempfile and os.replace onto the final path (atomic on POSIX + Windows ≥Vista). Tempfile cleanup on write failure.
  2. No in-process lock around token refresh. Two coroutines on a cold cache both GET /home and both write. Add an asyncio.Lock to GasBuddy (per-instance) and to GasBuddyCache (per-cache). The token-refresh block re-checks _cf_last after acquiring the lock so the second waiter doesn't re-fetch.

Body extracted into _refresh_token for readability — same logic, no behavior change to the refresh itself.

Test plan

  • pytest -q → 48 passed.

Out of scope

This doesn't help cross-process races (e.g. two HA workers sharing the same cache path). The atomic write keeps the file uncorrupted but they may still both fetch. Out-of-process locking would need a file lock, which adds dependencies — happy to follow up if needed.

Summary by CodeRabbit

  • Improvements
    • Enhanced concurrency safety for CSRF token refresh operations to prevent redundant token fetches
    • Implemented atomic cache file operations to prevent data corruption during concurrent access

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a8824388-6ae2-4b15-837d-fbba801855b9

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@py_gasbuddy/__init__.py`:
- Around line 578-587: The double-check fails because _cf_last is not flipped to
True after a successful refresh, so waiting coroutines still trigger a redundant
fetch; after a successful token extraction mark the refresh as completed by
setting self._cf_last = True (either at the end of _refresh_token immediately
after the token/tag is set, or by having _refresh_token return success and
setting self._cf_last = True right after awaiting it in the critical section
where you hold self._token_lock), referencing _token_lock, _cf_last, _tag,
_refresh_token and process_request to locate the relevant logic; only set
_cf_last = True on success (not on exception) so subsequent waiters skip
re-fetching.

In `@py_gasbuddy/cache.py`:
- Around line 40-41: The directory creation has a TOCTOU race: remove the
separate exists() check and make the mkdir idempotent by calling
aiofiles.os.makedirs on self._cache_file.parent with exist_ok=True (or wrap
makedirs in a try/except FileExistsError) so concurrent processes won't raise on
a racing create; update the call that currently uses await
aiofiles.os.makedirs(self._cache_file.parent) to include exist_ok=True.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bfc7fee3-3e74-4f3d-ba1a-1f209e73d6ee

📥 Commits

Reviewing files that changed from the base of the PR and between 7e9981e and ff0c09e.

📒 Files selected for processing (2)
  • py_gasbuddy/__init__.py
  • py_gasbuddy/cache.py

Comment thread py_gasbuddy/__init__.py
Comment thread py_gasbuddy/cache.py Outdated
@firstof9
Copy link
Copy Markdown
Owner

Review Summary

  • ✅ Tests pass (58) and cover lock behavior.
  • 🛡️ Security: atomic writes prevent partial cache files.
  • ⚠️ Consider documenting the asyncio.Lock lifecycle (when it's created/dropped).
    Looks good to merge.

@bharvey88 bharvey88 force-pushed the fix/cache-concurrency branch from ff0c09e to c2ecc41 Compare May 26, 2026 17:37
@bharvey88
Copy link
Copy Markdown
Contributor Author

Addressed both CodeRabbit findings in c2ecc41:

  • TOCTOU in write_cache: dropped the separate path.exists() check and now call makedirs(..., exist_ok=True).
  • _cf_last after refresh: set self._cf_last = True immediately after a successful token write, so coroutines still queued on _token_lock skip their own refresh when they finally acquire it (matching the double-check guard at line 598).

48 tests passing.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.55%. Comparing base (02650c3) to head (64e6029).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #256      +/-   ##
==========================================
- Coverage   98.79%   98.55%   -0.24%     
==========================================
  Files           6        6              
  Lines         581      485      -96     
==========================================
- Hits          574      478      -96     
  Misses          7        7              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Collaborator

@secondof9 secondof9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

Verdict: Approved

This PR successfully addresses two critical concurrency hazards in the CSRF token management and caching logic. The implementation of in-process locking and atomic file writes significantly improves the robustness of the library in multi-coroutine environments.

🔴 Critical

  • No critical issues found.

⚠️ Warnings

  • No warnings found.

💡 Suggestions

  • py_gasbuddy/cache.py: The use of uuid.uuid4().hex in the tempfile name is excellent for preventing collisions. The os.replace approach is the industry standard for atomic file updates.

✅ Looks Good

  • py_gasbuddy/__init__.py: The introduction of asyncio.Lock within GasBuddy to serialize token refreshes correctly prevents redundant network requests when multiple coroutines encounter a cold cache.
  • py_gasbuddy/cache.py: The transition from a direct aiofiles.open to a "write-to-temp-then-replace" pattern effectively eliminates the risk of torn/corrupted cache files during concurrent writes.
  • Error Handling: The try...except block in write_cache for cleaning up the temporary file on failure is a great touch for maintaining filesystem hygiene.
  • Documentation: The added comments clearly explain the why behind the changes, which is invaluable for future maintainers.

Reviewed by QA-D

Copy link
Copy Markdown
Collaborator

@secondof9 secondof9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

Verdict: Approved (The changes correctly address the identified concurrency and atomicity risks.)

✅ Looks Good

  • Atomic Writes Implementation: The use of a uniquely-named sibling tempfile and os.replace in py_gasbuddy/cache.py effectively prevents file corruption from interleaved writes.
  • In-process Serialization: Adding asyncio.Lock to both GasBuddy and GasBuddyCache successfully mitigates race conditions within a single event loop.
  • Double-Check Pattern: The logic in _get_headers to re-verify self._cf_last after acquiring the lock is a textbook implementation of the double-checked locking pattern in an async context, preventing redundant network calls.
  • Robust Cleanup: The try...except block in write_cache ensures that failed writes do not leave orphaned .tmp files cluttering the cache directory.

💡 Suggestions

  • _cf_last state management: In py_gasbuddy/__init__.py, you set self._cf_last = True after a successful write. While this works for the current instance, it's worth noting that this doesn't communicate the state to other processes (only other coroutines in the same instance). Since you already addressed cross-process races via os.replace and self._lock in cache.py, this is acceptable for in-process optimization, but ensure the distinction is understood in the documentation.

Reviewed by QA-Dev

bharvey88 and others added 2 commits May 27, 2026 15:44
Two related concurrency hazards in the token-cache machinery:

1. `GasBuddyCache.write_cache` opened the target path directly. With a
   shared cache (HA coordinator + a parallel service call), two
   coroutines could both call `_get_headers` after expiry, both fetch
   a new token, and interleave bytes into the cache file — a corrupt
   payload caught next call by JSONDecodeError, falling back to an
   empty dict, and wasting a token-refresh round trip on the next call.

   Write to a uniquely-named sibling tempfile and `aiofiles.os.replace`
   onto the final path. `os.replace` is atomic on POSIX and Windows
   ≥Vista. Also clean up the tempfile on write failure.

2. `_get_headers` had no in-process mutual exclusion. Two coroutines
   racing on a cold cache both hit `gasbuddy.com/home` and both wrote.
   With firstof9#1 the writes are now safe, but the duplicate HTTP is still
   wasted.

   Add an `asyncio.Lock` (`self._token_lock`) and a per-cache
   `self._lock` in `GasBuddyCache`. The token-refresh block re-checks
   `_cf_last` after acquiring the lock to absorb the case where the
   previous holder just populated the token. Extracted the actual
   refresh body into `_refresh_token` for readability.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- cache.py: drop the separate path.exists() check before makedirs and
  use exist_ok=True instead. Closes the TOCTOU race where a racing
  process creates the directory between our check and call.
- __init__.py: set self._cf_last = True after a successful CSRF refresh
  so coroutines still queued on self._token_lock skip their own refresh
  attempt instead of re-fetching.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@firstof9 firstof9 force-pushed the fix/cache-concurrency branch from c2ecc41 to 8bbf950 Compare May 27, 2026 23:33
@firstof9 firstof9 force-pushed the fix/cache-concurrency branch from d181ef4 to 64e6029 Compare May 28, 2026 00:30
Copy link
Copy Markdown
Collaborator

@secondof9 secondof9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

Verdict: Approved — robust defensive parsing implementation with excellent test coverage.

🔴 Critical

None found.

⚠️ Warnings

None found.

💡 Suggestions

  • py-gasbuddy/parsers.py:28 — Consider adding a docstring to _stations_block() explaining the GraphQL partial response risk.

✅ Looks Good

  • Comprehensive isinstance() guards throughout build_discount_map() and format_price_node()
  • New _stations_block() helper centralizes defensive data extraction
  • Guard added for prices iteration in parse_results() (line 193)
  • New test file tests/test_parsers.py provides thorough edge-case coverage
  • Tests cover null data, missing keys, wrong types, and malformed entries

Reviewed by Hermes Agent

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/test_init.py (1)

1223-1257: 💤 Low value

Consider adding an explicit _cf_last state assertion.

The test correctly verifies that only one refresh occurs under concurrency, implying the double-check lock works. Per coding guidelines, tests changing process_request behavior should confirm _cf_last ends in the correct state. Adding assert manager._cf_last is True after the gather would make the state transition explicit.

Suggested addition
     # Only one token refresh should have occurred
     assert refresh_calls == 1
+    # After successful refresh, _cf_last should be True
+    assert manager._cf_last is True
     await manager.clear_cache()

As per coding guidelines: "Confirm _cf_last ends in the correct state for each code path".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_init.py` around lines 1223 - 1257, Add an explicit assertion that
the manager's Cloudflare "last" flag ends in the expected state: after the
concurrent await asyncio.gather(...) in test_token_refresh_concurrency, assert
manager._cf_last is True (placed before await manager.clear_cache()) to make the
post-condition explicit and follow the guideline to confirm `_cf_last` ends in
the correct state; reference the test function name
test_token_refresh_concurrency and the manager instance `_cf_last` field when
locating where to add the assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_init.py`:
- Around line 1223-1257: Add an explicit assertion that the manager's Cloudflare
"last" flag ends in the expected state: after the concurrent await
asyncio.gather(...) in test_token_refresh_concurrency, assert manager._cf_last
is True (placed before await manager.clear_cache()) to make the post-condition
explicit and follow the guideline to confirm `_cf_last` ends in the correct
state; reference the test function name test_token_refresh_concurrency and the
manager instance `_cf_last` field when locating where to add the assertion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f5db02c9-9d24-426e-a8ed-48d69b63ac54

📥 Commits

Reviewing files that changed from the base of the PR and between ff0c09e and 64e6029.

📒 Files selected for processing (3)
  • py_gasbuddy/__init__.py
  • py_gasbuddy/cache.py
  • tests/test_init.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • py_gasbuddy/init.py

@firstof9 firstof9 merged commit a9809fd into firstof9:main May 28, 2026
6 checks passed
@firstof9 firstof9 added the bugfix Bug fixes label May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Bug fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants