fix: tolerate malformed GraphQL payloads in parsers#250
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #250 +/- ##
==========================================
- Coverage 98.79% 98.63% -0.17%
==========================================
Files 6 6
Lines 581 511 -70
==========================================
- Hits 574 504 -70
Misses 7 7 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Review Summary
|
|
Friendly nudge: this one looks like it got missed in the last batch merge. It's still green and mergeable, and the issue is still present on |
|
❌ Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review. |
|
Covered in 98a4b37. The two guard branches (the non-list |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@py_gasbuddy/parsers.py`:
- Around line 70-82: The _stations_block function currently assumes
response["data"] and data["locationBySearchTerm"] are dict-like before calling
.get(), which can crash on non-dict truthy values; update _stations_block to
first retrieve response.get("data") and response.get("data") into local
variables and check isinstance(..., dict) before using .get() (i.e., only call
.get() on dict instances), similarly check that location is a dict before
accessing location.get("stations"), and return {} if any check fails; apply the
same pattern to the analogous function(s) around lines 206-210 (the other
defensive-parsing helper) so all JSON path accesses validate types before
calling .get().
- Around line 126-129: The current parse_results flow trusts entries that merely
have an "id" and later directly indexes required fields causing crashes for
partially malformed station entries; update the validation to only include
entries that are dicts and contain all required top-level keys used later (e.g.,
the keys accessed around lines 135-138 and 157), and when iterating nested lists
like "offers" and "prices" ensure each element is a dict with the expected keys
before indexing—skip any entry or nested element that fails these checks so
malformed but id-bearing records are safely ignored (update the logic that
builds results from raw_results and the code paths that process offers/prices
accordingly).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 14b3258d-7ec5-4dfc-bc30-941ca89bb8b9
📒 Files selected for processing (2)
py_gasbuddy/parsers.pytests/test_parsers.py
The GraphQL spec allows partial responses (`data: null`, or `locationBySearchTerm: null`), and real responses occasionally contain `stations: null` or non-dict `cursor` values. The parsers used unconditional key/index access and would raise `TypeError` / `KeyError`, which then propagated as raw aiohttp/JSON errors to the caller instead of clean empty results. - Introduce a shared `_stations_block` helper that returns an empty dict for any unexpected shape. - `parse_cursor`, `parse_location_results`, `parse_results`, and `parse_trends` now skip malformed entries and return empty results instead of raising. - Add unit tests covering the `data: null`, null-segment, wrong-type, and missing-key cases. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The existing not-a-list tests passed None, which is handled by the `or []` fallback before the isinstance guard, so the guard branches (parse_location_results raw_results reset, parse_trends early return) were uncovered. Add tests with a truthy non-list value to exercise them. parsers.py back to 100%.
98a4b37 to
aa09dbc
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
py_gasbuddy/parsers.py (1)
162-164:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard
brands[0]shape before calling.get().Line 162 assumes the first
brandselement is a dict. If it’s a string/number/object from malformed data, this still raises instead of degrading safely.Proposed fix
+ brands = result.get("brands") raw: dict[str, Any] = { @@ - "image_url": result["brands"][0].get("imageUrl") - if isinstance(result.get("brands"), list) and result["brands"] - else None, + "image_url": ( + brands[0].get("imageUrl") + if isinstance(brands, list) + and brands + and isinstance(brands[0], dict) + else None + ), @@ - "brands": result.get("brands") or [], + "brands": brands or [],🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@py_gasbuddy/parsers.py` around lines 162 - 164, The current construction that sets "image_url" accesses result["brands"][0].get("imageUrl") assuming the first brands element is a dict; change it to first retrieve result.get("brands") into a variable and only call .get("imageUrl") when isinstance(brands, list) and brands and isinstance(brands[0], dict), otherwise set image_url to None so malformed brands entries (e.g., strings or numbers) don't raise; update the assignment that creates the "image_url" value in parsers.py accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@py_gasbuddy/parsers.py`:
- Around line 32-33: The runtime check "if not isinstance(offer, dict): continue
# type: ignore[unreachable]" indicates the declared parameter types are too
narrow; update the relevant function signatures to accept a wider type (e.g.,
use Sequence[object] or Sequence[Any] for the iterable parameter and import
Any/Sequence from typing) so the defensive isinstance checks are valid, then
remove the "# type: ignore[unreachable]" comments (apply the same change for the
other occurrence around lines 196-197). Ensure any downstream TypedDict casts
remain explicit where you know the shape after the isinstance(dict) guard.
- Around line 47-51: The loop handling disc.get("grades") may attempt to use
unhashable or non-string items as dict keys; update the loop in the section that
iterates over grades (variable names: grades, grade, discount_map, amount,
disc.get) to validate each grade before using it as a key—either by checking
type (e.g., ensure isinstance(grade, (str, int))) or by attempting to
hash(grade) inside a try/except and skipping items that raise TypeError; only
add to discount_map when the grade passes validation so malformed payload items
are ignored.
---
Outside diff comments:
In `@py_gasbuddy/parsers.py`:
- Around line 162-164: The current construction that sets "image_url" accesses
result["brands"][0].get("imageUrl") assuming the first brands element is a dict;
change it to first retrieve result.get("brands") into a variable and only call
.get("imageUrl") when isinstance(brands, list) and brands and
isinstance(brands[0], dict), otherwise set image_url to None so malformed brands
entries (e.g., strings or numbers) don't raise; update the assignment that
creates the "image_url" value in parsers.py accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a45deb96-3f8d-4e3a-9857-5287494f56bd
📒 Files selected for processing (3)
py_gasbuddy/parsers.pytests/test_init.pytests/test_parsers.py
✅ Files skipped from review due to trivial changes (1)
- tests/test_init.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_parsers.py
aa09dbc to
5c2c4f4
Compare
secondof9
left a comment
There was a problem hiding this comment.
LGTM. The changes add robust defensive parsing for GraphQL payloads, introduce _stations_block helper, add type checks, and update docs/tests. All tests now pass (61 passed). No functional regressions observed. Approve.
5c2c4f4 to
c40f086
Compare
secondof9
left a comment
There was a problem hiding this comment.
Code Review Summary
Verdict: Approved
✅ Looks Good
- Defensive Parsing: The updates to
py_gasbuddy/parsers.pysignificantly improve the robustness of the parser against malformed or unexpected GraphQL responses. The addition of explicitisinstancechecks and the new_stations_blockhelper effectively mitigate potentialTypeErrorandKeyErrorexceptions during payload processing. - Test Coverage: The introduction of
tests/test_parsers.pyis excellent. It provides comprehensive coverage for the newly added defensive branches, ensuring that edge cases likenulldata, missing keys, and incorrect types are handled as intended. - Code Cleanliness: The refactoring in
tests/test_init.pyto use parenthesizedwithstatements is a good move for readability and follows modern Python style.
💡 Suggestions
- Type Hinting Consistency: In
py_gasbuddy/parsers.py, the signature ofbuild_discount_mapwas changed fromoffers: list[dict[str, Any]]tooffers: list[Any]. While this allows the defensiveisinstance(offer, dict)check to work, you might consider keeping the more specific type hint if you can validate the elements, or perhaps usinglist[dict[str, Any]]if you can guarantee the outer list contains dicts. However, given the defensive nature of this PR, the current change is acceptable for safety.
Reviewed by QA-Dev
secondof9
left a comment
There was a problem hiding this comment.
Code Review Summary
Verdict: Approved
✅ Looks Good
- Defensive Programming: The primary focus of this PR is adding robust type checking and defensive guards against malformed or unexpected GraphQL response shapes. This is a highly valuable addition for a library parsing external, third-party API data.
- Improved Robustness: The introduction of
_stations_blockprovides a single, reliable way to extract the core stations dictionary, significantly reducing the risk ofKeyErrororTypeErroracross multiple parsing functions. - Comprehensive Test Coverage: The addition of
tests/test_parsers.pyis excellent. It covers a wide range of edge cases, includingNonevalues, missing keys, and incorrect types, ensuring that the new defensive logic is actually working and stays working. - Cleaner Logic: Using list comprehensions with type guards (e.g.,
[o for o in ... if isinstance(o, dict)]) makes the parsing loops more readable and safer. - API Stability: Despite the internal logic changes, the public-facing signatures and return types of the parsing functions remain consistent, ensuring no breaking changes for users.
💡 Suggestions
- Small Nitpick (tests/test_init.py): The change from single quotes to double quotes for the HTML body in
test_retry_succeeds_on_second_attemptis purely stylistic but keeps it consistent with the rest of the file. No action required.
Reviewed by QA-Dev
Summary
data: null,locationBySearchTerm: null), and real responses occasionally returnstations: null/ non-dictcursor. The current parsers used unconditional key access and raised on those shapes._stations_blockhelper and makeparse_cursor,parse_location_results,parse_results, andparse_trendsskip malformed payloads instead of raising.Test plan
tests/test_parsers.pycoversdata: null, null intermediate segments,stationsnot a dict, malformed entries.Summary by CodeRabbit