
feat(snapshots): add raw event export behind include_raw_events flag #14

Open
sujalgoel wants to merge 5 commits into AOSSIE-Org:main from sujalgoel:raw-event-export

Conversation


@sujalgoel sujalgoel commented Mar 31, 2026

What this does

Adds an include_raw_events flag to SnapshotConfig that, when enabled, writes an events.json file alongside the other snapshot files in each snapshot directory.

The flag defaults to False, so existing deployments are unaffected.

Changes

config/models.py

  • Added include_raw_events: bool = False to SnapshotConfig

engine/snapshots.py

  • Passes include_raw_events from config down to _collect_snapshot_data
  • In _collect_snapshot_data, uses getattr(storage, "list_contributions", None) + callable() guard, consistent with how other optional storage methods are called in this module
  • When enabled, writes events.json with the same envelope schema as other snapshot files (schema_version, generated_at, org, run_id, period_start, period_end, data)
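
A minimal sketch of the envelope named in the last bullet. Only the seven key names come from this description; the helper name `build_events_envelope`, the `schema_version` value, the ISO-8601 timestamp format, and the `event_type` field in the sample record are assumptions made for illustration and may not match the actual code.

```python
import json
from datetime import datetime, timezone


def build_events_envelope(
    events: list[dict],
    *,
    org: str,
    run_id: str,
    period_start: datetime,
    period_end: datetime,
) -> dict:
    """Wrap raw event records in the envelope keys listed in the PR description."""
    return {
        "schema_version": "1.0",  # assumed value; the PR does not state it
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "org": org,
        "run_id": run_id,
        "period_start": period_start.isoformat(),
        "period_end": period_end.isoformat(),
        "data": events,
    }


envelope = build_events_envelope(
    [{"github_user": "alice", "event_type": "pr_merged"}],  # illustrative record
    org="example-org",
    run_id="run-1",
    period_start=datetime(2026, 3, 1, tzinfo=timezone.utc),
    period_end=datetime(2026, 3, 31, tzinfo=timezone.utc),
)
print(json.dumps(envelope, indent=2))
```

Keeping the event records under `data` means a consumer that already reads the other snapshot files could parse events.json with the same envelope handling.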

tests/test_snapshots.py

  • Added list_contributions to MockStorage and a contributions list for test data
  • Four new tests:
    • test_raw_events_excluded_when_false - flag off, no events.json
    • test_raw_events_included_when_enabled - flag on, events.json present with correct fields
    • test_raw_events_respects_period_start - only events at or after period_start appear
    • test_write_snapshots_raw_events_via_config - end-to-end: config flag triggers file write via github_writer

Why the getattr guard

The call to storage.list_contributions is guarded the same way list_pending_issue_requests and list_recent_notifications are guarded elsewhere in this file. The snapshot module accepts storage: Any and is designed to degrade gracefully if a storage adapter does not implement a given method.
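
The guard pattern can be sketched roughly as follows. The two storage classes and the `collect_raw_events` helper are hypothetical stand-ins, not the module's real code; only the getattr + callable() check itself is taken from this description.

```python
from typing import Any, Optional


class FullStorage:
    """Hypothetical adapter that implements the optional method."""

    def list_contributions(self, since):
        return [{"github_user": "alice"}]


class MinimalStorage:
    """Hypothetical adapter that does not implement list_contributions."""


def collect_raw_events(storage: Any, since) -> Optional[list]:
    """Return the events, or None when the adapter cannot provide them."""
    list_contributions = getattr(storage, "list_contributions", None)
    if not callable(list_contributions):
        return None  # degrade gracefully: the caller omits events.json
    return list(list_contributions(since))


print(collect_raw_events(FullStorage(), None))
print(collect_raw_events(MinimalStorage(), None))
```

Returning None rather than an empty list lets the caller tell "no events in the period" apart from "backend cannot provide events", which is one way to decide whether to write the file at all.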

Testing

pytest tests/test_snapshots.py  # 11 passed

Summary by CodeRabbit

  • New Features

    • Snapshot export can optionally include raw contribution events. When enabled, an events.json is produced with event metadata, timestamps, and run/org info; it respects the snapshot period (filters out older events) and is omitted gracefully if the backend cannot provide events.
  • Tests

    • Added coverage for events.json inclusion/exclusion, content structure and ordering, period filtering, and backend-absence behavior.

- Wrap storage.list_contributions call in getattr/callable check,
  consistent with how other optional storage methods are accessed in
  this module (list_pending_issue_requests, list_recent_notifications).
- Remove redundant getattr on snapshot_config.include_raw_events; field
  is now a real Pydantic field with a default, so direct access is safe.
- Rename test_raw_events_excluded_by_default ->
  test_raw_events_excluded_when_false; the old name implied the test
  exercised the default, but it passed include_raw_events=False
  explicitly.
Contributor

coderabbitai bot commented Mar 31, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fdc6cb43-58ad-4a38-a3a5-f98ae698b829

📥 Commits

Reviewing files that changed from the base of the PR and between 47ef3f6 and 110f1b2.

📒 Files selected for processing (1)
  • tests/test_snapshots.py

Walkthrough

Adds include_raw_events: bool = False to snapshot configuration and implements optional collection/export of raw ContributionEvent records as events.json; tests added to verify presence, period filtering, write behavior, ordering, and graceful omission when storage lacks list_contributions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration Model: src/ghdcbot/config/models.py | Added include_raw_events: bool = False to SnapshotConfig. |
| Snapshot Engine: src/ghdcbot/engine/snapshots.py | Propagates include_raw_events into _collect_snapshot_data; adds conditional assembly of events.json (schema_version, metadata, data array of event records) when storage.list_contributions exists and inclusion is enabled; changed one call style from list_notifications(limit=...) to list_notifications(...). |
| Tests: tests/test_snapshots.py | Extended MockStorage with contributions and list_contributions; added _make_config helper; added tests asserting events.json presence/absence, content schema and ordering, period_start/period_end filtering, write-to-repo behavior, and graceful omission when storage lacks list_contributions. |

Sequence Diagram(s)

sequenceDiagram
    participant Config as SnapshotConfig
    participant Engine as _write_snapshots
    participant Collector as _collect_snapshot_data
    participant Storage as StorageAdapter
    participant Repo as GitHub_Repo

    Config->>Engine: include_raw_events=true
    Engine->>Collector: collect(snapshot_config, period_start, period_end, ...)
    alt storage supports list_contributions
        Collector->>Storage: list_contributions(since=period_start)
        Storage-->>Collector: [ContributionEvent,...]
        Collector->>Collector: filter by created_at <= period_end, build events.json
        Collector-->>Engine: files dict (includes events.json)
    else storage lacks list_contributions
        Collector-->>Engine: files dict (no events.json)
    end
    Engine->>Repo: write files (may include events.json)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

Python Lang

Poem

🐰 I hopped through configs, tiny and bright,

Gathered events from day and night,
Sorted by time, then packed just so,
Tests checked the order—onward we go! 🥕

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding raw event export functionality controlled by an include_raw_events flag to the snapshot system. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ghdcbot/engine/snapshots.py`:
- Line 168: This boolean parameter is acceptable for an internal function;
either silence Ruff for this name or make it keyword-only: update the function
signature that currently declares "include_raw_events: bool = False" to be
preceded by the bare "*" keyword-only marker (add "*, include_raw_events: bool =
False") if you want to enforce keyword use, or add a local noqa comment to the
parameter to suppress FBT001/FBT002 warnings while keeping the default as-is
(refer to the function that contains the "include_raw_events" parameter).

In `@tests/test_snapshots.py`:
- Around line 362-498: Add a test that exercises the case where the storage
object does not implement list_contributions so _collect_snapshot_data
gracefully omits events.json; create a MinimalStorage class without
list_contributions (but with any other stubbed methods used elsewhere), call
_collect_snapshot_data with include_raw_events=True and assert that
"events.json" is not present in the returned snapshots, referencing the helper
under test _collect_snapshot_data and the SnapshotConfig/include_raw_events
behavior to ensure the guard using getattr/callable is covered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e34cc510-d205-4f22-a570-b04c6e0ee864

📥 Commits

Reviewing files that changed from the base of the PR and between ac7c493 and ed94b7b.

📒 Files selected for processing (3)
  • src/ghdcbot/config/models.py
  • src/ghdcbot/engine/snapshots.py
  • tests/test_snapshots.py

Comment thread src/ghdcbot/engine/snapshots.py
Comment thread tests/test_snapshots.py
- Make include_raw_events keyword-only in _collect_snapshot_data to
  silence Ruff FBT001/FBT002 (boolean positional arg warnings)
- Replace timezone.utc with datetime.UTC alias in new tests (UP017)
- Add test_raw_events_graceful_when_storage_missing_method: verifies
  that events.json is omitted silently when storage does not implement
  list_contributions, documenting the getattr/callable guard behaviour
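
The keyword-only change in the first bullet can be sketched as below. The function here is a simplified, hypothetical stand-in for _collect_snapshot_data; its other parameters and the `summary.json` placeholder are assumptions for illustration only.

```python
def collect_snapshot_data(storage, period_start, *, include_raw_events: bool = False):
    """Hypothetical stand-in; parameters after the bare * must be passed by name."""
    files = {"summary.json": {}}  # placeholder for the other snapshot files
    if include_raw_events:
        files["events.json"] = {"data": []}
    return files


# Passing the flag by name works and makes the call site self-describing.
files = collect_snapshot_data(None, None, include_raw_events=True)

# Passing it positionally is rejected at call time.
rejected = False
try:
    collect_snapshot_data(None, None, True)
except TypeError as exc:
    rejected = True
    print(f"rejected: {exc}")
```

Ruff's FBT001/FBT002 rules flag boolean positional parameters because a bare `True` at a call site says nothing about what it toggles; the keyword-only form avoids that without a noqa comment.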
@github-actions github-actions bot added size/L and removed size/L labels Mar 31, 2026
@sujalgoel
Author

@coderabbitai review

Contributor

coderabbitai bot commented Mar 31, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions github-actions bot added size/L and removed size/L labels Mar 31, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ghdcbot/engine/snapshots.py`:
- Around line 329-338: The events payload currently collects contributions using
only period_start (for event in list_contributions(period_start)) which can
include items after period_end; update the collection to enforce the upper bound
by either calling list_contributions with both period_start and period_end
(e.g., list_contributions(period_start, period_end)) or post-filtering
events_data to only include events whose timestamp/created_at <= period_end
before assigning files["events.json"] (referencing list_contributions,
events_data, period_start, period_end, and files["events.json"]).

In `@tests/test_snapshots.py`:
- Line 504: The parameter `limit` in the `list_recent_notifications` method of
the `MinimalStorage` stub is intentionally unused and should be renamed to
`_limit` to silence ARG002; update the method signature `def
list_recent_notifications(self, limit: int = 1000) -> list[dict]:` to use
`_limit` instead (keep the same type hint and default) and leave the
implementation/return behavior unchanged so linting passes without altering
functionality.
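
The post-filtering option from the first inline comment above might look like this. `Event` is a hypothetical stand-in for ContributionEvent with only the two fields used here, and `events_in_period` is an illustrative helper name; the inclusive bounds follow the "at or after period_start" and "created_at <= period_end" wording elsewhere in this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    """Hypothetical stand-in for ContributionEvent; only these fields are assumed."""

    github_user: str
    created_at: datetime


def events_in_period(events, period_start, period_end):
    """Keep events with period_start <= created_at <= period_end."""
    return [e for e in events if period_start <= e.created_at <= period_end]


start = datetime(2026, 3, 1, tzinfo=timezone.utc)
end = datetime(2026, 3, 31, tzinfo=timezone.utc)
events = [
    Event("alice", datetime(2026, 3, 15, tzinfo=timezone.utc)),  # inside the period
    Event("bob", datetime(2026, 4, 2, tzinfo=timezone.utc)),  # after period_end
]
kept = events_in_period(events, start, end)
print([e.github_user for e in kept])
```

Filtering after the storage call keeps the upper bound enforced even if an adapter's list_contributions only accepts a lower bound.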

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 45e2b41a-3eca-43d2-8768-c7178d8efd08

📥 Commits

Reviewing files that changed from the base of the PR and between ed94b7b and 56e86ee.

📒 Files selected for processing (2)
  • src/ghdcbot/engine/snapshots.py
  • tests/test_snapshots.py

Comment thread src/ghdcbot/engine/snapshots.py
Comment thread tests/test_snapshots.py Outdated
@github-actions github-actions bot added size/L and removed size/L labels Mar 31, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_snapshots.py`:
- Line 3: The file mixes UTC and timezone.utc; standardize on
datetime.UTC by replacing uses of timezone.utc with UTC and update the
import to "from datetime import UTC, datetime" so all tests in
tests/test_snapshots.py use UTC consistently; search for any datetime(...,
tzinfo=timezone.utc) or references to timezone.utc and change them to tzinfo=UTC
(or pass UTC directly) and remove the unused timezone import.
- Around line 433-468: Add a new unit test function (e.g.,
test_raw_events_respects_period_end) that mirrors
test_raw_events_respects_period_start but verifies the upper-bound filtering:
create a MockStorage with two ContributionEvent objects (one with created_at
inside the period and one after period_end), call _collect_snapshot_data with
include_raw_events=True and the desired period_start/period_end, then assert
snapshots["events.json"]["data"] only contains the in-period event (check length
== 1 and github_user matches the expected in-period contributor); reference
MockStorage, ContributionEvent, _collect_snapshot_data, and the events.json data
structure to locate where to add the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7d8037ec-2dba-459c-a141-f2d04ef19fd9

📥 Commits

Reviewing files that changed from the base of the PR and between 56e86ee and 47ef3f6.

📒 Files selected for processing (2)
  • src/ghdcbot/engine/snapshots.py
  • tests/test_snapshots.py

Comment thread tests/test_snapshots.py Outdated
Comment thread tests/test_snapshots.py
@github-actions github-actions bot added size/L and removed size/L labels Mar 31, 2026
@sujalgoel
Author

@coderabbitai review

Contributor

coderabbitai bot commented Mar 31, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions github-actions bot added size/L and removed size/L labels Mar 31, 2026
@sujalgoel
Author

@shubham5080 would love a review on this when you get a chance!

@shubham5080
Member

I will check!
