Skip to content

fix: tolerate malformed GraphQL payloads in parsers#250

Merged
firstof9 merged 3 commits into
firstof9:mainfrom
bharvey88:fix/defensive-parsers
May 28, 2026
Merged

fix: tolerate malformed GraphQL payloads in parsers#250
firstof9 merged 3 commits into
firstof9:mainfrom
bharvey88:fix/defensive-parsers

Conversation

@bharvey88
Copy link
Copy Markdown
Contributor

@bharvey88 bharvey88 commented May 26, 2026

Summary

  • The GraphQL spec allows partial responses (data: null, locationBySearchTerm: null), and real responses occasionally return stations: null / non-dict cursor. The current parsers used unconditional key access and raised on those shapes.
  • Add a shared _stations_block helper and make parse_cursor, parse_location_results, parse_results, and parse_trends skip malformed payloads instead of raising.

Test plan

  • New tests/test_parsers.py covers data: null, null intermediate segments, stations not a dict, malformed entries.
  • Full suite: 61 passed (48 prior + 13 new).

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced resilience when handling incomplete or malformed API responses
    • Improved parsing to gracefully handle missing or unexpected data fields
    • Fixed edge cases preventing proper station and pricing data retrieval

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b248b3f2-fe03-4c90-823d-fc4d3aeed0c8

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.63%. Comparing base (02650c3) to head (c40f086).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #250      +/-   ##
==========================================
- Coverage   98.79%   98.63%   -0.17%     
==========================================
  Files           6        6              
  Lines         581      511      -70     
==========================================
- Hits          574      504      -70     
  Misses          7        7              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@firstof9
Copy link
Copy Markdown
Owner

Review Summary

  • ✅ Tests pass (61) and includes new parser tests.
  • 🛡️ Security: defensive parsing prevents crashes on malformed GraphQL responses.
  • 📝 Documentation: clear summary of changes and test plan.
    Looks good to merge.

@bharvey88
Copy link
Copy Markdown
Contributor Author

Friendly nudge: this one looks like it got missed in the last batch merge. It's still green and mergeable, and the issue is still present on main (parsers.py lines 72, 78, 105, and 193 still do unconditional response["data"]["locationBySearchTerm"]["stations"] access, which raises on the partial/null GraphQL payloads the spec allows). Happy to rebase or split it smaller if that's easier to review.

@firstof9
Copy link
Copy Markdown
Owner

❌ Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.

@bharvey88
Copy link
Copy Markdown
Contributor Author

Covered in 98a4b37. The two guard branches (the non-list results reset in parse_location_results and the non-list trends early return in parse_trends) were only exercised with None, which the or [] fallback handles before the isinstance check, so the guards themselves never ran. Added tests passing a truthy non-list value; parsers.py is back to 100%.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@py_gasbuddy/parsers.py`:
- Around line 70-82: The _stations_block function currently assumes
response["data"] and data["locationBySearchTerm"] are dict-like before calling
.get(), which can crash on non-dict truthy values; update _stations_block to
first retrieve response.get("data") and response.get("data") into local
variables and check isinstance(..., dict) before using .get() (i.e., only call
.get() on dict instances), similarly check that location is a dict before
accessing location.get("stations"), and return {} if any check fails; apply the
same pattern to the analogous function(s) around lines 206-210 (the other
defensive-parsing helper) so all JSON path accesses validate types before
calling .get().
- Around line 126-129: The current parse_results flow trusts entries that merely
have an "id" and later directly indexes required fields causing crashes for
partially malformed station entries; update the validation to only include
entries that are dicts and contain all required top-level keys used later (e.g.,
the keys accessed around lines 135-138 and 157), and when iterating nested lists
like "offers" and "prices" ensure each element is a dict with the expected keys
before indexing—skip any entry or nested element that fails these checks so
malformed but id-bearing records are safely ignored (update the logic that
builds results from raw_results and the code paths that process offers/prices
accordingly).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 14b3258d-7ec5-4dfc-bc30-941ca89bb8b9

📥 Commits

Reviewing files that changed from the base of the PR and between 7e9981e and 98a4b37.

📒 Files selected for processing (2)
  • py_gasbuddy/parsers.py
  • tests/test_parsers.py

Comment thread py_gasbuddy/parsers.py
Comment thread py_gasbuddy/parsers.py Outdated
bharvey88 and others added 2 commits May 27, 2026 16:34
The GraphQL spec allows partial responses (`data: null`, or
`locationBySearchTerm: null`), and real responses occasionally contain
`stations: null` or non-dict `cursor` values. The parsers used
unconditional key/index access and would raise `TypeError` / `KeyError`,
which then propagated as raw aiohttp/JSON errors to the caller instead
of clean empty results.

- Introduce a shared `_stations_block` helper that returns an empty dict
  for any unexpected shape.
- `parse_cursor`, `parse_location_results`, `parse_results`, and
  `parse_trends` now skip malformed entries and return empty results
  instead of raising.
- Add unit tests covering the `data: null`, null-segment, wrong-type, and
  missing-key cases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The existing not-a-list tests passed None, which is handled by the
`or []` fallback before the isinstance guard, so the guard branches
(parse_location_results raw_results reset, parse_trends early return)
were uncovered. Add tests with a truthy non-list value to exercise them.
parsers.py back to 100%.
@firstof9 firstof9 force-pushed the fix/defensive-parsers branch from 98a4b37 to aa09dbc Compare May 27, 2026 23:36
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
py_gasbuddy/parsers.py (1)

162-164: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard brands[0] shape before calling .get().

Line 162 assumes the first brands element is a dict. If it’s a string/number/object from malformed data, this still raises instead of degrading safely.

Proposed fix
+        brands = result.get("brands")
         raw: dict[str, Any] = {
@@
-            "image_url": result["brands"][0].get("imageUrl")
-            if isinstance(result.get("brands"), list) and result["brands"]
-            else None,
+            "image_url": (
+                brands[0].get("imageUrl")
+                if isinstance(brands, list)
+                and brands
+                and isinstance(brands[0], dict)
+                else None
+            ),
@@
-            "brands": result.get("brands") or [],
+            "brands": brands or [],
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@py_gasbuddy/parsers.py` around lines 162 - 164, The current construction that
sets "image_url" accesses result["brands"][0].get("imageUrl") assuming the first
brands element is a dict; change it to first retrieve result.get("brands") into
a variable and only call .get("imageUrl") when isinstance(brands, list) and
brands and isinstance(brands[0], dict), otherwise set image_url to None so
malformed brands entries (e.g., strings or numbers) don't raise; update the
assignment that creates the "image_url" value in parsers.py accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@py_gasbuddy/parsers.py`:
- Around line 32-33: The runtime check "if not isinstance(offer, dict): continue
# type: ignore[unreachable]" indicates the declared parameter types are too
narrow; update the relevant function signatures to accept a wider type (e.g.,
use Sequence[object] or Sequence[Any] for the iterable parameter and import
Any/Sequence from typing) so the defensive isinstance checks are valid, then
remove the "# type: ignore[unreachable]" comments (apply the same change for the
other occurrence around lines 196-197). Ensure any downstream TypedDict casts
remain explicit where you know the shape after the isinstance(dict) guard.
- Around line 47-51: The loop handling disc.get("grades") may attempt to use
unhashable or non-string items as dict keys; update the loop in the section that
iterates over grades (variable names: grades, grade, discount_map, amount,
disc.get) to validate each grade before using it as a key—either by checking
type (e.g., ensure isinstance(grade, (str, int))) or by attempting to
hash(grade) inside a try/except and skipping items that raise TypeError; only
add to discount_map when the grade passes validation so malformed payload items
are ignored.

---

Outside diff comments:
In `@py_gasbuddy/parsers.py`:
- Around line 162-164: The current construction that sets "image_url" accesses
result["brands"][0].get("imageUrl") assuming the first brands element is a dict;
change it to first retrieve result.get("brands") into a variable and only call
.get("imageUrl") when isinstance(brands, list) and brands and
isinstance(brands[0], dict), otherwise set image_url to None so malformed brands
entries (e.g., strings or numbers) don't raise; update the assignment that
creates the "image_url" value in parsers.py accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a45deb96-3f8d-4e3a-9857-5287494f56bd

📥 Commits

Reviewing files that changed from the base of the PR and between 98a4b37 and aa09dbc.

📒 Files selected for processing (3)
  • py_gasbuddy/parsers.py
  • tests/test_init.py
  • tests/test_parsers.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_init.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_parsers.py

Comment thread py_gasbuddy/parsers.py Outdated
Comment thread py_gasbuddy/parsers.py
@firstof9 firstof9 force-pushed the fix/defensive-parsers branch from aa09dbc to 5c2c4f4 Compare May 27, 2026 23:42
Copy link
Copy Markdown
Collaborator

@secondof9 secondof9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The changes add robust defensive parsing for GraphQL payloads, introduce _stations_block helper, add type checks, and update docs/tests. All tests now pass (61 passed). No functional regressions observed. Approve.

@firstof9 firstof9 force-pushed the fix/defensive-parsers branch from 5c2c4f4 to c40f086 Compare May 28, 2026 13:50
@firstof9 firstof9 merged commit 318ad23 into firstof9:main May 28, 2026
6 checks passed
Copy link
Copy Markdown
Collaborator

@secondof9 secondof9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

Verdict: Approved

✅ Looks Good

  • Defensive Parsing: The updates to py_gasbuddy/parsers.py significantly improve the robustness of the parser against malformed or unexpected GraphQL responses. The addition of explicit isinstance checks and the new _stations_block helper effectively mitigate potential TypeError and KeyError exceptions during payload processing.
  • Test Coverage: The introduction of tests/test_parsers.py is excellent. It provides comprehensive coverage for the newly added defensive branches, ensuring that edge cases like null data, missing keys, and incorrect types are handled as intended.
  • Code Cleanliness: The refactoring in tests/test_init.py to use parenthesized with statements is a good move for readability and follows modern Python style.

💡 Suggestions

  • Type Hinting Consistency: In py_gasbuddy/parsers.py, the signature of build_discount_map was changed from offers: list[dict[str, Any]] to offers: list[Any]. While this allows the defensive isinstance(offer, dict) check to work, you might consider keeping the more specific type hint if you can validate the elements, or perhaps using list[dict[str, Any]] if you can guarantee the outer list contains dicts. However, given the defensive nature of this PR, the current change is acceptable for safety.

Reviewed by QA-Dev

Copy link
Copy Markdown
Collaborator

@secondof9 secondof9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

Verdict: Approved

✅ Looks Good

  • Defensive Programming: The primary focus of this PR is adding robust type checking and defensive guards against malformed or unexpected GraphQL response shapes. This is a highly valuable addition for a library parsing external, third-party API data.
  • Improved Robustness: The introduction of _stations_block provides a single, reliable way to extract the core stations dictionary, significantly reducing the risk of KeyError or TypeError across multiple parsing functions.
  • Comprehensive Test Coverage: The addition of tests/test_parsers.py is excellent. It covers a wide range of edge cases, including None values, missing keys, and incorrect types, ensuring that the new defensive logic is actually working and stays working.
  • Cleaner Logic: Using list comprehensions with type guards (e.g., [o for o in ... if isinstance(o, dict)]) makes the parsing loops more readable and safer.
  • API Stability: Despite the internal logic changes, the public-facing signatures and return types of the parsing functions remain consistent, ensuring no breaking changes for users.

💡 Suggestions

  • Small Nitpick (tests/test_init.py): The change from single quotes to double quotes for the HTML body in test_retry_succeeds_on_second_attempt is purely stylistic but keeps it consistent with the rest of the file. No action required.

Reviewed by QA-Dev

@firstof9 firstof9 added the bugfix Bug fixes label May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Bug fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants