feat: add Google Health ingestion and storage#175
Conversation
WalkthroughAPI経由で Google Health ingest run を作成し、Extractor→Normalizer→Writer→Repository 経由で events を保存。続けて compact で monthly-partitioned compacted Parquet を範囲置換するワークフロー、DB移行、API・クライアント・ユニット/統合テスト、ドキュメントを追加。 ChangesGoogle Health データ取り込みワークフロー
Sequence DiagramsequenceDiagram
participant Client as API Client
participant API as Pipelines API
participant Service as PipelineService
participant Workflow as Workflow Engine
participant Extractor as GoogleHealthExtractor
participant Normalizer as Normalizer
participant Writer as GoogleHealthWriter
participant Repo as GoogleHealthRepository
Client->>API: POST /v1/sources/google-health/runs
API->>Service: validate & trigger_google_health_ingest(request)
Service->>Workflow: enqueue google_health_ingest_workflow
Workflow->>Extractor: extract(data_type, date_from, date_to)
Extractor->>API: call reconcile/rollup endpoints
Extractor->>Normalizer: payloads
Normalizer->>Writer: normalized rows
Writer->>Repo: save_events / save_raw / compact_range
Repo-->>Workflow: sync cursor results
Workflow->>Writer: compact_range(selected_data_types, date_from, date_to)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 分 Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (4)
egograph/pipelines/README.md (1)
129-129: 💤 Low valueエンドポイント説明文の詳細さを docs/architecture/pipelines.md と比較
README.md line 129 の説明「Google Health取り込みrun作成」は正確だが、docs/architecture/pipelines.md line 489 の説明「Google Health backfill・期間指定run作成」と比べると、「backfill・期間指定」といった mode 情報が missing している。
README.md はユーザーが first touch する可能性が高いドキュメントなので、他のエンドポイント説明と並べたとき情報量が consistent であるほど good。説明を少し詳細にすることを検討してもよい。
Optional な改善案:
- | `POST` | `/v1/sources/google-health/runs` | Google Health取り込みrun作成 | + | `POST` | `/v1/sources/google-health/runs` | Google Health backfill・期間指定run作成 |🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@egograph/pipelines/README.md` at line 129, Update the README.md entry for the endpoint `POST /v1/sources/google-health/runs` to match the detail level in docs/architecture/pipelines.md by adding mode information (e.g., "backfill" and "period-specified run" capability); locate the table row containing the endpoint string `POST /v1/sources/google-health/runs` and replace the brief description "Google Health取り込みrun作成" with a more explicit description such as "Google Health backfill・期間指定 run 作成(backfill / period-specified)" or equivalent phrasing to clearly indicate supported modes.docs/data-sources/google-health.md (1)
293-314: 同期状態・再取得ロジックの実装、ドキュメントと一致してるよ(CodeRabbit)
writer.pyのcompact_rangeは、月partitionごとに既存compacted .../data.parquetを読み、_retain_outside_targetでconnection_idとdata_typeが対象&date_from <= row_date < date_toの行だけを置換して、他data type/期間外の行は保持したままretained + currentを再生成してるeventsの.../{run_id}.parquetはsave_eventsで追加するだけで、削除はcompacted_key側の_delete_if_existsのみ(events側は消さない)google_health_sync_cursorsはrepository.pyのsave_sync_resultがINSERT ... ON CONFLICT DO UPDATEで data type単位にstatus/range_start/range_end/last_run_id/record_count/last_error_message/updated_atを保存・更新し、compact側はlist_sync_results_for_runでlast_run_id=run.run_idを参照してる- 7.3の補足として、
sessionsかつdata_type=="sleep"の場合はターゲット判定にended_at_utcを使う特例があるので一言あると安心🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/data-sources/google-health.md` around lines 293 - 314, Docs currently describe re-fetch and sync-cursor behavior but lack a note about the special session/sleep target and the code-level mappings; update the Google Health doc to explicitly state (1) compact_range in writer.py replaces only rows matching connection_id/data_type and date_from <= row_date < date_to via _retain_outside_target while preserving other data types in the same month, (2) events saved by save_events (…/{run_id}.parquet) are never deleted (only compact-side _delete_if_exists removes compacted_key files), (3) sync cursors are persisted by repository.py::save_sync_result (INSERT … ON CONFLICT DO UPDATE) and read by list_sync_results_for_run using last_run_id, and (4) add the clarifying special-case note that for sessions with data_type == "sleep" the target selection uses ended_at_utc instead of row_date.egograph/pipelines/tests/unit/google_health/test_client.py (1)
104-109: ⚡ Quick win
filter_expressionの検証も入れておこう。テスト名は
filter送信をうたってるのに、params["filter"]の確認がない。ここを明示すると退行を拾いやすくなる。差分イメージ
# Assert params = session.calls[0][2]["params"] + assert params["filter"] == 'steps.interval.start_time >= "2026-06-01T00:00:00Z"' assert params["pageToken"] == "next-token" assert params["pageSize"] == 10000 assert params["dataSourceFamily"].endswith("/google-wearables")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@egograph/pipelines/tests/unit/google_health/test_client.py` around lines 104 - 109, テストが「filter を送信する」ことを謳っているのに params["filter"] の検証が抜けているので、session.calls[0][2]["params"] から取得した params に対して期待する filter 値(テストで設定したフィルタ文字列)を assert するアサーションを追加して退行を検出できるようにしてください(参照: params, session.calls を使っている箇所)。egograph/pipelines/tests/unit/google_health/test_extractor.py (1)
111-123: ⚡ Quick winテスト名どおり
reversedケースも足そう。今は
date_from == date_toだけ確認してる。date_from > date_toも同じテストで押さえると意図がブレない。差分イメージ
def test_extract_rejects_empty_or_reversed_range(): @@ with pytest.raises(ValueError, match="date_from must be earlier"): extractor.extract( connection_id="connection-1", data_type=DATA_TYPE_BY_NAME["steps"], date_from=date(2026, 6, 3), date_to=date(2026, 6, 3), ) + with pytest.raises(ValueError, match="date_from must be earlier"): + extractor.extract( + connection_id="connection-1", + data_type=DATA_TYPE_BY_NAME["steps"], + date_from=date(2026, 6, 4), + date_to=date(2026, 6, 3), + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@egograph/pipelines/tests/unit/google_health/test_extractor.py` around lines 111 - 123, The test test_extract_rejects_empty_or_reversed_range only asserts the equal-dates case; add the reversed-range case so it also covers date_from > date_to by invoking GoogleHealthExtractor.extract with date_from later than date_to and asserting it raises the same ValueError (match "date_from must be earlier"); you can either add a second with pytest.raises block calling extractor.extract(...) with date_from=date(2026,6,4) and date_to=date(2026,6,3) or parametrize the test to include both scenarios, referencing the existing test name and GoogleHealthExtractor.extract to locate where to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/data-sources/google-health.md`:
- Around line 142-147: Update the docs entry for
google_health_samples.measured_at_utc to match the implementation in
egograph/pipelines/sources/google_health/normalizer.py: state that the source is
`{dataType}.sampleTime.physicalTime` with a fallback to `{dataType}.instantTime`
(i.e., "sampleTime.physicalTime (if missing, instantTime)"), and make this
wording consistent with the `{dataType}.sampleTime.physicalTime` rows so both
locations reference the same precedence.
- Line 351: The sample JSON uses the field "dataPointName" which conflicts with
the API spec and surrounding docs that expect the DataPoint identifier field
"name"; update the sample to use "name" (e.g.,
"users/me/dataTypes/heart-rate/dataPoints/example") or, if you intentionally
want a different input shape, add a clear note describing the alternate
convention and why "dataPointName" is used to keep the document consistent with
the Google Health API v4 DataPoint identifier format; ensure references in the
doc that mention the identifier (e.g., examples and explanatory text) are
aligned with the chosen field.
In `@egograph/pipelines/infrastructure/db/schema.py`:
- Around line 151-155: The loop building dynamic SQL with an f-string in the
block using additions, columns and conn.execute (altering
google_health_sync_cursors) must be changed to use fixed SQL statements per
column instead of string interpolation; for each name in additions that is
missing from columns, call conn.execute with a constant query like "ALTER TABLE
google_health_sync_cursors ADD COLUMN {col} {type}" constructed by selecting
between a small set of allowed column names/types (or mapping additions to
explicit SQL templates) and pass any variable parts as parameters/validated
values rather than via f-strings so you remove direct interpolation and satisfy
the placeholder-based execute pattern.
In `@egograph/pipelines/sources/google_health/extractor.py`:
- Around line 116-131: The pagination loop calling
self._client.reconcile_data_points can infinite-loop if the API keeps returning
the same nextPageToken; add a duplicate-token guard: track the last seen
page_token (and a set of seen tokens) inside the while True in the function
using reconcile_data_points and raise an exception (or break) if the token
repeats or is unchanged between iterations. Apply the same guard to the
analogous loops in _fetch_daily_rollups and _fetch_interval_rollups; reference
the page_token variable and nextPageToken response field, and ensure you still
append responses and respect SESSION_PAGE_SIZE/DEFAULT_PAGE_SIZE behavior.
In `@egograph/pipelines/sources/google_health/normalizer.py`:
- Around line 274-285: _daily_date and _parse_datetime currently pass external
strings directly to date.fromisoformat()/datetime.fromisoformat() which can
raise ValueError and crash normalize_google_health_payload, causing an entire
data type to be marked FAILED; update _daily_date (function name: _daily_date)
and _parse_datetime (function name: _parse_datetime) to catch ValueError around
the fromisoformat calls and return None on parse failure, so callers like
_append_daily_metrics and _normalize_data_point will skip None dates rather than
letting the exception bubble up.
In `@egograph/pipelines/sources/google_health/workflow.py`:
- Around line 151-162: ループ内で extractor の返却値に含まれる cursor を
GoogleHealthRepository.save_sync_result に渡していないため GoogleHealthSyncCursor
が永続化されていません。workflow.py のループ(for result in results:)で result から cursor を取り出し(例:
result.get("cursor") または str(result["cursor"])
を必要に応じて使用)、dependencies.repository.save_sync_result の呼び出しに cursor=...
引数を追加して渡すように修正してください(参照シンボル: GoogleHealthSyncCursor,
GoogleHealthRepository.save_sync_result)。変更後は cursor が None
の場合でも明示的に渡されるようにして、sync-state の再開点が保存されることを確認してください。
- Around line 140-175: The top-level record_count should reflect per-data-type
result["record_count"] (which is set to 0 on save failure) instead of summing
the in-memory records; update the return value to compute record_count by
summing int(result["record_count"]) over results (use the existing results list
and result["record_count"]) or, alternatively, clear entries in records for
failed data_types when you set result.status to FAILED; modify the code around
the return block that currently sums len(dataset_records) for dataset_records in
records.values() to use the results-based count so failures correctly produce 0
saved records.
In `@egograph/pipelines/tests/integration/google_health/__init__.py`:
- Line 1: The module docstring in __init__.py is in English and should be
changed to Japanese to match project guidelines; update the top-level module
docstring in egograph.pipelines.tests.integration.google_health.__init__.py to a
concise Japanese string (e.g., "Google Health 統合テスト。" or similar) using the
project's preferred full-width punctuation and style so it aligns with other
Python files' docstrings.
In `@egograph/pipelines/tests/unit/google_health/test_workflow.py`:
- Around line 68-83: FakeRepository.list_sync_results_for_run currently returns
all self.sync_results and overwrites last_run_id with the call argument, which
masks bugs in filtering; change list_sync_results_for_run to filter
self.sync_results by the incoming run_id and connection_id (e.g., require
item["last_run_id"] == run_id and item["connection_id"] == connection_id or
similar keys used in your test fixtures) and preserve each item's actual
last_run_id when constructing GoogleHealthSyncCursor instead of blindly using
the run_id parameter so the fake repo accurately reproduces run-scoped results.
---
Nitpick comments:
In `@docs/data-sources/google-health.md`:
- Around line 293-314: Docs currently describe re-fetch and sync-cursor behavior
but lack a note about the special session/sleep target and the code-level
mappings; update the Google Health doc to explicitly state (1) compact_range in
writer.py replaces only rows matching connection_id/data_type and date_from <=
row_date < date_to via _retain_outside_target while preserving other data types
in the same month, (2) events saved by save_events (…/{run_id}.parquet) are
never deleted (only compact-side _delete_if_exists removes compacted_key files),
(3) sync cursors are persisted by repository.py::save_sync_result (INSERT … ON
CONFLICT DO UPDATE) and read by list_sync_results_for_run using last_run_id, and
(4) add the clarifying special-case note that for sessions with data_type ==
"sleep" the target selection uses ended_at_utc instead of row_date.
In `@egograph/pipelines/README.md`:
- Line 129: Update the README.md entry for the endpoint `POST
/v1/sources/google-health/runs` to match the detail level in
docs/architecture/pipelines.md by adding mode information (e.g., "backfill" and
"period-specified run" capability); locate the table row containing the endpoint
string `POST /v1/sources/google-health/runs` and replace the brief description
"Google Health取り込みrun作成" with a more explicit description such as "Google Health
backfill・期間指定 run 作成(backfill / period-specified)" or equivalent phrasing to
clearly indicate supported modes.
In `@egograph/pipelines/tests/unit/google_health/test_client.py`:
- Around line 104-109: テストが「filter を送信する」ことを謳っているのに params["filter"]
の検証が抜けているので、session.calls[0][2]["params"] から取得した params に対して期待する filter
値(テストで設定したフィルタ文字列)を assert するアサーションを追加して退行を検出できるようにしてください(参照: params,
session.calls を使っている箇所)。
In `@egograph/pipelines/tests/unit/google_health/test_extractor.py`:
- Around line 111-123: The test test_extract_rejects_empty_or_reversed_range
only asserts the equal-dates case; add the reversed-range case so it also covers
date_from > date_to by invoking GoogleHealthExtractor.extract with date_from
later than date_to and asserting it raises the same ValueError (match "date_from
must be earlier"); you can either add a second with pytest.raises block calling
extractor.extract(...) with date_from=date(2026,6,4) and date_to=date(2026,6,3)
or parametrize the test to include both scenarios, referencing the existing test
name and GoogleHealthExtractor.extract to locate where to change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 218339f6-8af2-463e-9f47-9bd36db66c6c
📒 Files selected for processing (29)
docs/architecture/pipelines.mddocs/data-sources/google-health.mdegograph/pipelines/README.mdegograph/pipelines/api/google_health.pyegograph/pipelines/domain/workflow.pyegograph/pipelines/infrastructure/db/schema.pyegograph/pipelines/infrastructure/dispatching/run_dispatcher.pyegograph/pipelines/service.pyegograph/pipelines/sources/google_health/client.pyegograph/pipelines/sources/google_health/data_types.pyegograph/pipelines/sources/google_health/extractor.pyegograph/pipelines/sources/google_health/models.pyegograph/pipelines/sources/google_health/normalizer.pyegograph/pipelines/sources/google_health/repository.pyegograph/pipelines/sources/google_health/workflow.pyegograph/pipelines/sources/google_health/writer.pyegograph/pipelines/tests/integration/google_health/__init__.pyegograph/pipelines/tests/integration/google_health/test_pipeline.pyegograph/pipelines/tests/support/dummy_steps.pyegograph/pipelines/tests/unit/google_health/test_api.pyegograph/pipelines/tests/unit/google_health/test_client.pyegograph/pipelines/tests/unit/google_health/test_data_types.pyegograph/pipelines/tests/unit/google_health/test_extractor.pyegograph/pipelines/tests/unit/google_health/test_normalizer.pyegograph/pipelines/tests/unit/google_health/test_repository.pyegograph/pipelines/tests/unit/google_health/test_workflow.pyegograph/pipelines/tests/unit/google_health/test_writer.pyegograph/pipelines/tests/unit/test_dispatcher.pyegograph/pipelines/workflows/registry.py
| for result in results: | ||
| status = SyncStatus(str(result["status"])) | ||
| dependencies.repository.save_sync_result( | ||
| connection_id=connection.connection_id, | ||
| data_type=str(result["data_type"]), | ||
| status=status, | ||
| range_start=request.date_from, | ||
| range_end=request.date_to, | ||
| run_id=run.run_id, | ||
| record_count=int(result["record_count"]), | ||
| error_message=str(result["error"]) if "error" in result else None, | ||
| ) |
There was a problem hiding this comment.
cursor が永続化経路に乗ってない
GoogleHealthSyncCursor と GoogleHealthRepository.save_sync_result(..., cursor=...) まで足してるのに、この呼び出しでは cursor を一度も渡してない。今の実装だと google_health_sync_cursors.cursor は全 run で NULL のままで、PR で追加した sync-state の再開点が残らない。extractor の返却値からここまで cursor を通す配線が必要。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@egograph/pipelines/sources/google_health/workflow.py` around lines 151 - 162,
ループ内で extractor の返却値に含まれる cursor を GoogleHealthRepository.save_sync_result
に渡していないため GoogleHealthSyncCursor が永続化されていません。workflow.py のループ(for result in
results:)で result から cursor を取り出し(例: result.get("cursor") または
str(result["cursor"]) を必要に応じて使用)、dependencies.repository.save_sync_result の呼び出しに
cursor=... 引数を追加して渡すように修正してください(参照シンボル: GoogleHealthSyncCursor,
GoogleHealthRepository.save_sync_result)。変更後は cursor が None
の場合でも明示的に渡されるようにして、sync-state の再開点が保存されることを確認してください。
There was a problem hiding this comment.
♻️ Duplicate comments (2)
egograph/pipelines/sources/google_health/workflow.py (2)
113-119:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
cursorが sync-state 永続化まで流れてないLine 113 の result 生成でも Line 156 の
save_sync_result()呼び出しでもcursorを保持してない。これだとGoogleHealthSyncCursor.cursorがずっと埋まらず、今回足した再開点管理が機能しない。extractor で得た cursor を result か別変数で持ち回して、そのまま repository 保存まで配線したい。GoogleHealthSyncCursorがcursorを持つ契約とも噛み合ってない。Also applies to: 154-165
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@egograph/pipelines/sources/google_health/workflow.py` around lines 113 - 119, The results being appended in the block building the sync result (the dict created where "data_type", "status", "record_count", "raw_ref" are set) do not include the extractor cursor, so the cursor never reaches save_sync_result or the GoogleHealthSyncCursor.cursor field; include the cursor produced by the extractor (e.g., add "cursor": cursor or store extractor.cursor in a local variable) when constructing each result entry, and ensure save_sync_result(...) is called with that cursor (or with a results object that includes cursor) so that the repository persists GoogleHealthSyncCursor.cursor; update any code paths around the results.append(...) site and the save_sync_result(...) call to forward the same cursor value rather than dropping it.
146-151:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
save_events()失敗後もトップレベルrecord_countだけ成功件数のままLine 146 で各 data type の
record_countを 0 に落としてるのに、返り値はrecordsから再集計してる。これだと run summary だけ「保存できた件数」が残って、失敗監視と再実行判断がずれる。💡 最小修正案
"saved_keys": saved_keys, - "record_count": sum( - len(dataset_records) for dataset_records in records.values() - ), + "record_count": sum(int(result["record_count"]) for result in results), "errors": _result_errors(results),Also applies to: 174-176
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@egograph/pipelines/sources/google_health/workflow.py` around lines 146 - 151, When save_events() fails and you mark each result's record_count to 0 (e.g., in the loop that updates result.update(status=SyncStatus.FAILED.value, record_count=0, error=...)), the top-level aggregation still uses the original "records" tally and leaves the run summary showing successful counts; change the final aggregation logic that builds the return/summary to recompute totals from the updated per-data-type entries (sum(result["record_count"]) and count statuses from result["status"]) instead of using the pre-existing "records" variable so the top-level record_count and summary reflect the per-result changes (apply the same change at the other analogous site where results are adjusted).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@egograph/pipelines/sources/google_health/workflow.py`:
- Around line 113-119: The results being appended in the block building the sync
result (the dict created where "data_type", "status", "record_count", "raw_ref"
are set) do not include the extractor cursor, so the cursor never reaches
save_sync_result or the GoogleHealthSyncCursor.cursor field; include the cursor
produced by the extractor (e.g., add "cursor": cursor or store extractor.cursor
in a local variable) when constructing each result entry, and ensure
save_sync_result(...) is called with that cursor (or with a results object that
includes cursor) so that the repository persists GoogleHealthSyncCursor.cursor;
update any code paths around the results.append(...) site and the
save_sync_result(...) call to forward the same cursor value rather than dropping
it.
- Around line 146-151: When save_events() fails and you mark each result's
record_count to 0 (e.g., in the loop that updates
result.update(status=SyncStatus.FAILED.value, record_count=0, error=...)), the
top-level aggregation still uses the original "records" tally and leaves the run
summary showing successful counts; change the final aggregation logic that
builds the return/summary to recompute totals from the updated per-data-type
entries (sum(result["record_count"]) and count statuses from result["status"])
instead of using the pre-existing "records" variable so the top-level
record_count and summary reflect the per-result changes (apply the same change
at the other analogous site where results are adjusted).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8d4ae61e-2f86-41cc-981d-8a48f285cb34
📒 Files selected for processing (15)
docs/data-sources/google-health.mdegograph/pipelines/api/google_health.pyegograph/pipelines/config.pyegograph/pipelines/service.pyegograph/pipelines/sources/google_health/client.pyegograph/pipelines/sources/google_health/extractor.pyegograph/pipelines/sources/google_health/normalizer.pyegograph/pipelines/sources/google_health/timezone.pyegograph/pipelines/sources/google_health/workflow.pyegograph/pipelines/sources/google_health/writer.pyegograph/pipelines/tests/unit/google_health/test_api.pyegograph/pipelines/tests/unit/google_health/test_client.pyegograph/pipelines/tests/unit/google_health/test_extractor.pyegograph/pipelines/tests/unit/google_health/test_normalizer.pyegograph/pipelines/tests/unit/google_health/test_writer.py
🚧 Files skipped from review as they are similar to previous changes (10)
- egograph/pipelines/service.py
- egograph/pipelines/tests/unit/google_health/test_client.py
- egograph/pipelines/tests/unit/google_health/test_writer.py
- egograph/pipelines/tests/unit/google_health/test_extractor.py
- egograph/pipelines/tests/unit/google_health/test_normalizer.py
- egograph/pipelines/sources/google_health/extractor.py
- egograph/pipelines/sources/google_health/normalizer.py
- egograph/pipelines/sources/google_health/writer.py
- docs/data-sources/google-health.md
- egograph/pipelines/api/google_health.py
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
egograph/pipelines/tests/unit/google_health/test_workflow.py (1)
68-84: ⚡ Quick winフィルタリング修正は good だけど、Line 79 も
.get()にしとくと一貫性が出るよ。Line 83 でちゃんと
connection_idとrun_idでフィルタリングするようになって、過去レビューで指摘された問題が解決されてる!ただ、Line 73 はitem.get("cursor")を使ってるのに、Line 79 はitem["error_message"]を直接アクセスしてるんだよね。workflow コードでは
error_messageは常に渡されるから動くけど、テストコードの一貫性と将来の保守性を考えると、Line 79 も.get("error_message")にしといた方が安全だよ。♻️ 一貫性を高める提案
last_run_id=item["run_id"], record_count=item["record_count"], - last_error_message=item["error_message"], + last_error_message=item.get("error_message"), updated_at=datetime(2026, 6, 4, tzinfo=UTC), )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@egograph/pipelines/tests/unit/google_health/test_workflow.py` around lines 68 - 84, The test helper list_sync_results_for_run currently constructs GoogleHealthSyncCursor instances and accesses item["error_message"] directly; change that to use item.get("error_message") for consistency with item.get("cursor") and to make the test more robust—update the construction in list_sync_results_for_run so the last_error_message field is assigned item.get("error_message") instead of item["error_message"].
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@egograph/pipelines/tests/unit/google_health/test_workflow.py`:
- Around line 195-212: The test function
test_save_failure_reports_zero_saved_records is missing AAA comment markers; add
"# Arrange" before repository and FailingWriter setup and the
monkeypatch.setattr block, add "# Act" immediately before calling
run_google_health_ingest(_run(["steps"])), and add "# Assert" immediately before
the two assert lines so the test follows the Arrange/Act/Assert pattern used by
other tests.
---
Nitpick comments:
In `@egograph/pipelines/tests/unit/google_health/test_workflow.py`:
- Around line 68-84: The test helper list_sync_results_for_run currently
constructs GoogleHealthSyncCursor instances and accesses item["error_message"]
directly; change that to use item.get("error_message") for consistency with
item.get("cursor") and to make the test more robust—update the construction in
list_sync_results_for_run so the last_error_message field is assigned
item.get("error_message") instead of item["error_message"].
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6d10da2e-51b3-4698-9c3f-2946641e4f8d
📒 Files selected for processing (11)
docs/data-sources/google-health.mdegograph/pipelines/README.mdegograph/pipelines/infrastructure/db/schema.pyegograph/pipelines/sources/google_health/extractor.pyegograph/pipelines/sources/google_health/normalizer.pyegograph/pipelines/sources/google_health/workflow.pyegograph/pipelines/tests/integration/google_health/__init__.pyegograph/pipelines/tests/unit/google_health/test_client.pyegograph/pipelines/tests/unit/google_health/test_extractor.pyegograph/pipelines/tests/unit/google_health/test_normalizer.pyegograph/pipelines/tests/unit/google_health/test_workflow.py
✅ Files skipped from review due to trivial changes (2)
- egograph/pipelines/README.md
- egograph/pipelines/tests/integration/google_health/init.py
🚧 Files skipped from review as they are similar to previous changes (7)
- egograph/pipelines/tests/unit/google_health/test_client.py
- egograph/pipelines/tests/unit/google_health/test_extractor.py
- egograph/pipelines/tests/unit/google_health/test_normalizer.py
- egograph/pipelines/infrastructure/db/schema.py
- docs/data-sources/google-health.md
- egograph/pipelines/sources/google_health/workflow.py
- egograph/pipelines/sources/google_health/normalizer.py
| def test_save_failure_reports_zero_saved_records(monkeypatch): | ||
| """events保存失敗時はメモリ上の正規化件数を返さない。""" | ||
| repository = FakeRepository() | ||
|
|
||
| class FailingWriter(FakeWriter): | ||
| def save_events(self, **kwargs): | ||
| raise RuntimeError("save failed") | ||
|
|
||
| monkeypatch.setattr( | ||
| "pipelines.sources.google_health.workflow._build_dependencies", | ||
| lambda: _dependencies(repository, FailingWriter()), | ||
| ) | ||
|
|
||
| result = run_google_health_ingest(_run(["steps"])) | ||
|
|
||
| assert result["status"] == "failed" | ||
| assert result["record_count"] == 0 | ||
|
|
There was a problem hiding this comment.
AAA pattern のコメントマーカーを追加してね。
このテスト、構造的には AAA pattern に従ってるんだけど、# Arrange, # Act, # Assert のコメントマーカーが抜けてるよ。コーディングガイドラインで「Test code must follow the AAA pattern」って指定されてるし、同じファイルの他のテスト(lines 141-163, 165-193, 214-251, 253-281)は全部コメントマーカーを付けてるから、一貫性を保つために追加しよう。
🐛 AAA コメントを追加する提案
def test_save_failure_reports_zero_saved_records(monkeypatch):
"""events保存失敗時はメモリ上の正規化件数を返さない。"""
+ # Arrange
repository = FakeRepository()
class FailingWriter(FakeWriter):
def save_events(self, **kwargs):
raise RuntimeError("save failed")
monkeypatch.setattr(
"pipelines.sources.google_health.workflow._build_dependencies",
lambda: _dependencies(repository, FailingWriter()),
)
+ # Act
result = run_google_health_ingest(_run(["steps"]))
+ # Assert
assert result["status"] == "failed"
assert result["record_count"] == 0As per coding guidelines, test code must follow the AAA pattern (Arrange, Act, Assert).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@egograph/pipelines/tests/unit/google_health/test_workflow.py` around lines
195 - 212, The test function test_save_failure_reports_zero_saved_records is
missing AAA comment markers; add "# Arrange" before repository and FailingWriter
setup and the monkeypatch.setattr block, add "# Act" immediately before calling
run_google_health_ingest(_run(["steps"])), and add "# Assert" immediately before
the two assert lines so the test follows the Arrange/Act/Assert pattern used by
other tests.
Source: Coding guidelines
概要
Google Health / Fitbit Air連携のPhase 2として、期間指定のデータ取得、正規化、Raw JSON・Parquet保存、同期状態管理を実装しました。
変更内容
initial_backfill、range、data_type_rangeのrun作成APIを追加events/.../{uuid}.parquetを保存compacted/.../data.parquetの対象期間を置換success、no_data、failed、件数、期間、エラーをSQLiteへ記録partial_failedをworkflow runへ反映保存フロー
events/google_health/.../{uuid}.parquetへ追加no_dataのdata typeだけをcompact対象にするcompacted/events/google_health/.../data.parquetを再生成影響
検証
uv run ruff check(変更Pythonファイル)git diff --checkuv run pytest egograph/pipelines/tests -qcoderabbit review --agent -t uncommittedSummary by CodeRabbit