Discover BCM Data Exports CUR 2.0 buckets for IAM policy by silvexis · Pull Request #110 · Cloudzero/provision-account

silvexis · 2026-06-15T19:06:47Z

Summary

Extends the discovery Lambda's CUR bucket discovery to also enumerate BCM Data Exports (CUR 2.0) S3 destination buckets, so the master-payer IAM policy grants s3:Get*/s3:List* on them alongside classic CUR buckets.

coeffects_bcm_data_exports calls bcm-data-exports:ListExports (paginated) then GetExport per export to read DestinationConfigurations.S3Destination.S3Bucket.
get_all_local_cur_bucket_names unions classic CUR report buckets with the data-export buckets, keeping only locally-owned ones, and feeds them into MasterPayerBillingBucketArns.
The DiscoveryFunction execution role gains bcm-data-exports:ListExports and bcm-data-exports:GetExport. (The cross-account role policies already grant bcm-data-exports:Get*/List*.)
Graceful degradation: an AccessDeniedException from BCM Data Exports logs a warning and falls back to classic CUR buckets.

Behavior notes

IAM bucket access is type-agnostic — access is granted to the destination bucket of every export, regardless of type.
Ingest remains COST_AND_USAGE_REPORT-only — CUR 2.0 is not wired into primary-bucket / BillingReportFormat selection; the classic CUR still drives what CloudZero ingests.

Testing

pytest — 18 passed, 89% coverage; flake8 clean. New tests cover: local CUR 2.0 bucket included, all export buckets covered regardless of type, remote bucket excluded, and graceful degradation on access-denied.

Discovery now calls bcm-data-exports ListExports/GetExport to find the S3 destination buckets of CUR 2.0 exports and folds every export's bucket into MasterPayerBillingBucketArns. Bucket access is granted type-agnostically (only COST_AND_USAGE_REPORT is ingested), and the discovery Lambda role gains the two bcm-data-exports actions. Author: Erik Peterson <erik@cloudzero.com>

khill2018

approved, but I agree with greptile about the test additions. Let's add those in (happy to do it myself) and then merge! Edit to add I see Erik is on a plane going to make those changes myself and have somebody additionally review the PR

Address review feedback on BCM Data Exports discovery: - Wrap each get_export call in its own try/except so a transient failure on a single export no longer drops buckets already resolved from other exports; the failing export is logged and skipped. - Keep list_exports failure as graceful fallback to empty exports. - Add tests for the multi-page list_exports NextToken pagination path and for per-export get_export error isolation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

khill2018 · 2026-06-16T14:35:07Z

@greptile

greptile-apps · 2026-06-16T14:38:05Z

Greptile Summary

This PR extends the discovery Lambda to enumerate BCM Data Exports (CUR 2.0) S3 destination buckets and fold them into MasterPayerBillingBucketArns, so the master-payer IAM policy grants s3:Get*/List* on both classic CUR and CUR 2.0 buckets. It also removes the toolz pipeline style in favour of straight procedural code, removes the deprecated audit/cloudtrail-owner account types, replaces the inline Custom::CostAndUsageReport Lambda with the native AWS::CUR::ReportDefinition resource, and flattens the connected-account stack from six nested stacks to one.

list_data_export_bucket_names() paginates bcm:ListExports and resolves each export ARN with bcm:GetExport; per-export errors are isolated so a single GetExport failure doesn't drop the rest.
all_local_billing_bucket_names() unions classic CUR report buckets with BCM Data Export buckets, keeping only locally-owned ones; graceful AccessDeniedException degradation falls back to classic CUR buckets.
DiscoveryFunction execution role gains bcm-data-exports:ListExports + bcm-data-exports:GetExport; the new account_resources.yaml nested template replaces the old per-account-type templates.

Confidence Score: 4/5

Safe to merge with one fix: a mid-pagination ListExports failure discards all ARNs from earlier pages instead of using them.

The list_data_export_bucket_names function wraps the entire list_exports pagination loop in a single try/except. If the first page succeeds but any subsequent page raises, the except block executes return [], silently discarding all ARNs collected from previous pages. For accounts with enough exports to span multiple pages, this means their export buckets are missing from the IAM policy until the next stack update. Everything else looks correct.

services/discovery/src/app.py — the list_data_export_bucket_names pagination error handling.

Important Files Changed

Filename	Overview
services/discovery/src/app.py	Major refactor from toolz-pipeline style to procedural; adds `list_data_export_bucket_names()` for BCM Data Exports. One issue: mid-pagination `list_exports` failure discards all ARNs collected from earlier pages.
services/discovery/tests/unit/test_app.py	Comprehensive test coverage for BCM Data Exports paths: local bucket included, all-export-types covered, remote bucket excluded, access-denied degradation, pagination across two pages, per-export isolation, and missing-destination guard.
services/discovery/template.yaml	Adds `bcm-data-exports:ListExports` and `bcm-data-exports:GetExport` to the DiscoveryFunction execution role; removes deprecated `cloudtrail:DescribeTrails`.
services/account_type/account_resources.yaml	New template merging former resource_owner + master_payer templates; replaces Custom::CostAndUsageReport Lambda with native AWS::CUR::ReportDefinition (us-east-1 gated). IAM policies and Conditions look correct.
services/notification/src/app.py	Removed toolz pipeline and CFN stack-scraping; notification now builds payload directly from event properties. Fixed reactor payload contract is preserved including deprecated cloudtrail/audit fields.
services/connected_account.yaml	Flattened to a single nested AccountResources stack; inlines Discovery and Notification Lambdas; removes deprecated audit/cloudtrail-owner nested stacks. Adds Stage parameter for prod/dev callback URL selection.

Comments Outside Diff (1)

services/discovery/src/app.py, line 2398-2409 (link)

Mid-pagination ListExports failure silently discards already-collected ARNs

If list_exports succeeds on page 1 but raises on page 2, the export_arns list that was populated from page 1 is owned by the try block. When the except fires, return [] discards those page-1 ARNs entirely — so no export buckets at all end up in the IAM policy, even though half the exports were successfully enumerated.

The fix is to move the exception handling inside the loop so a page failure breaks out with partial results rather than discarding everything. The per-export get_export isolation pattern already in the code below is the right model to follow here.

_{Reviews (7): Last reviewed commit: "Drop CUR-format log line that trips Code..." | Re-trigger Greptile}

qiuz-cz · 2026-06-16T15:03:26Z

+            next_token = response.get('NextToken')
+            if not next_token:
+                break
+    except ClientError:


This except ClientError may be too narrow to deliver the intended graceful degradation. Two non-ClientError failure modes can bypass it:

Import-time client construction (line 19): boto3.client('bcm-data-exports', ...) runs at module scope. If the Lambda runtime's bundled botocore predates this relatively new service, it raises UnknownServiceError (a botocore.exceptions error, not ClientError) on cold start — before any try/except runs — crashing the whole custom resource.

Call-time connectivity — e.g. EndpointConnectionError is a BotoCoreError, not a ClientError.

Suggestion: confirm the runtime's botocore version actually supports bcm-data-exports, and consider broadening the catch here to also include BotoCoreError so unexpected client-side failures still fall back to classic CUR rather than failing the stack.

qiuz-cz · 2026-06-16T15:03:26Z

+        try:
+            export = bcm.get_export(ExportArn=export_arn).get('Export', {})
+        except ClientError:
+            logger.warning(f'Failed to access BCM Data Exports GetExport for {export_arn}', exc_info=True)


Nit: prefer lazy %s logging over an f-string for the message arg — it defers interpolation and matches the other logger.warning(...) calls in this module:

logger.warning('Failed to access BCM Data Exports GetExport for %s', export_arn, exc_info=True)

qiuz-cz · 2026-06-16T15:03:26Z

+    return {'Exports': [{'ExportArn': arn, 'ExportName': arn.split('/')[-1]} for arn in export_arns]}
+
+
+def _get_export_response(bucket_name):


Small coverage gap: there's no test for the if bucket: guard in coeffects_bcm_data_exports (app.py:171) — i.e. an export whose DestinationConfigurations.S3Destination.S3Bucket is missing/None. A quick test asserting such an export is silently skipped (and doesn't add an empty/None bucket) would lock in that behavior.

qiuz-cz · 2026-06-16T15:03:27Z

@@ -0,0 +1,24 @@
+# [1.0.101](https://github.com/Cloudzero/provision-account/compare/1.0.100...1.0.101) (2026-06-15)


The version (1.0.101) and date (2026-06-15) are hardcoded here. The compare-URL format suggests releases may be auto-generated (semantic-release style) — if so, a hand-written release file could conflict with or duplicate the generated one. Worth confirming this is the intended workflow for the repo.

Collapse the 6 nested stacks into one: connected_account.yaml inlines the Discovery and Notification Lambdas and provisions account resources through a single nested AccountResources stack (merged resource_owner + master_payer). Replace the Custom::CostAndUsageReport Lambda with native AWS::S3::Bucket, AWS::S3::BucketPolicy and AWS::CUR::ReportDefinition, gated on us-east-1 with DeletionPolicy: Retain. Remove the deprecated audit and cloudtrail-owner account types (templates, discovery detection, cloudtrail:DescribeTrails); the reactor payload contract is unchanged, emitting null/false for the deprecated fields. Rewrite the discovery and notification Lambdas into clear procedural flows; notification now reads direct resource properties instead of scraping stack outputs. Add a Stage parameter so prod/dev parents differ by one line. Author: Erik Peterson <erik@cloudzero.com>

- CodeQL (py/clear-text-logging-sensitive-data): stop logging the full event / payload / output in the discovery and notification handlers; they carry ExternalId and account-identifying fields. Log only non-sensitive summary fields. - Broaden the BCM Data Exports list/get catch to BotoCoreError so connectivity errors degrade gracefully (per review feedback). - Use lazy %s logging for the per-export GetExport warning. - Add tests for an export with no destination bucket (the if-bucket guard). - Drop now-unused toolz from discovery/notification requirements. Author: Erik Peterson <erik@cloudzero.com>

silvexis · 2026-06-16T18:34:54Z

Thanks for the reviews. Addressed the feedback in 0f5f29c (on top of the stack-flattening work):

CodeQL — clear-text logging of sensitive data (py/clear-text-logging-sensitive-data) ✅
The discovery and notification handlers no longer log the full event / payload / output — those carry ExternalId and account-identifying fields. They now log only non-sensitive summary fields (RequestType, message_type, response status, the boolean classification flags).

@qiuz-cz — except ClientError too narrow (app.py:122) ✅
Broadened the BCM Data Exports list/get catches to (ClientError, BotoCoreError) so connectivity errors (e.g. EndpointConnectionError) degrade gracefully. On the import-time UnknownServiceError concern: requirements.txt pins boto3>=1.34.91, and bcm-data-exports has shipped in botocore since 1.34.0, so the client constructs fine on the python3.12 runtime.

@qiuz-cz — lazy %s logging (app.py:134) ✅ Switched the per-export GetExport warning to logger.warning('... %s', export_arn, ...).

@qiuz-cz — coverage gap for the if bucket: guard ✅ Added test_handler_skips_export_with_no_destination_bucket.

@qiuz-cz — hardcoded version/date in docs/releases/1.0.101.md — confirmed: release notes here are hand-written and committed (see 1.0.95.md, 1.0.100.md); the ${...} compare-URL is just the file's own convention, not semantic-release generation. Same pattern followed for the new 1.0.102.md.

Greptile P2s (per-export error isolation + pagination test) — these were resolved by @khill2018's 50c9b31 and are preserved through the refactor (the discovery handler was rewritten; the isolation + pagination behavior and tests carried over).

Also dropped the now-unused toolz dependency from both Lambda requirements.

Heads-up @khill2018: this PR now also contains the connected-account stack flattening (6 nested stacks → 1), native AWS::CUR::ReportDefinition, and removal of the deprecated audit / cloudtrail-owner account types — see docs/releases/1.0.102.md. Your 50c9b31 discovery change was preserved through the rewrite.

The redacted logs still read tainted values: the notification payload dict carries ExternalId (so any field read is flagged) and the discovery log interpolated the CUR bucket name. Log a static message when posting to the reactor, and log only the (literal) billing report format when selecting a CUR. Author: Erik Peterson <erik@cloudzero.com>

CodeQL's clear-text-logging 'private data' heuristic flags the variable name billing_report_format (contains 'billing'), not its value (a literal like 'aws'). The log carried no real value beyond the format already present in the discovery output, so remove it rather than fight the scanner. Author: Erik Peterson <erik@cloudzero.com>

silvexis requested a review from a team as a code owner June 15, 2026 19:06

silvexis requested a review from khill2018 June 15, 2026 19:06

greptile-apps Bot reviewed Jun 15, 2026

View reviewed changes

Comment thread services/discovery/src/app.py Outdated

Comment thread services/discovery/tests/unit/test_app.py Outdated

khill2018 approved these changes Jun 16, 2026

View reviewed changes

Cloudzero deleted a comment from greptile-apps Bot Jun 16, 2026

khill2018 requested a review from qiuz-cz June 16, 2026 14:43

qiuz-cz reviewed Jun 16, 2026

View reviewed changes

github-advanced-security AI found potential problems Jun 16, 2026

View reviewed changes

Comment thread services/discovery/src/app.py Fixed

Comment thread services/notification/src/app.py Fixed

github-advanced-security AI found potential problems Jun 16, 2026

View reviewed changes

Comment thread services/notification/src/app.py Fixed

github-advanced-security AI found potential problems Jun 16, 2026

View reviewed changes

Comment thread services/discovery/src/app.py Fixed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discover BCM Data Exports CUR 2.0 buckets for IAM policy#110

Discover BCM Data Exports CUR 2.0 buckets for IAM policy#110
silvexis wants to merge 6 commits into
developfrom
feature/discovery-cur2-data-exports

silvexis commented Jun 15, 2026

Uh oh!

Uh oh!

Uh oh!

khill2018 left a comment •

edited

Loading

Uh oh!

khill2018 commented Jun 16, 2026

Uh oh!

greptile-apps Bot commented Jun 16, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

qiuz-cz Jun 16, 2026

Uh oh!

qiuz-cz Jun 16, 2026

Uh oh!

qiuz-cz Jun 16, 2026

Uh oh!

qiuz-cz Jun 16, 2026

Uh oh!

Uh oh!

Uh oh!

silvexis commented Jun 16, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		return {'Exports': [{'ExportArn': arn, 'ExportName': arn.split('/')[-1]} for arn in export_arns]}


		def _get_export_response(bucket_name):

		@@ -0,0 +1,24 @@
		# [1.0.101](https://github.com/Cloudzero/provision-account/compare/1.0.100...1.0.101) (2026-06-15)

Conversation

silvexis commented Jun 15, 2026

Summary

Behavior notes

Testing

Uh oh!

Uh oh!

Uh oh!

khill2018 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

khill2018 commented Jun 16, 2026

Uh oh!

greptile-apps Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Comments Outside Diff (1)

Uh oh!

qiuz-cz Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

qiuz-cz Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

qiuz-cz Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

qiuz-cz Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

silvexis commented Jun 16, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

khill2018 left a comment •

edited

Loading

greptile-apps Bot commented Jun 16, 2026 •

edited

Loading